Inline functions in C++

Here is a small question about inline functions in C++.
At what stage of compilation in C++ are inline functions actually inlined at the call site?
How does that basically work?
Let's say the compiler has decided that a particular function should be inlined after the programmer requested it with the inline keyword in front of the function. When does the compiler do that for the programmer? I mean, at what stage of the compilation?
Is it at the preprocessing stage, the way macros are expanded in C?

It will vary by compiler. And some stages in some compilers will have no corresponding stages in other compilers. So your question doesn't really have a definite answer.
But generally it's done after the parse tree for the function is created, but before code is actually generated or many optimizations are done. This is the best place to do it, because you want the maximum amount of information available for the optimizer to work with.
Doing it like a preprocessor macro expansion would generally be too early. At that point the compiler doesn't have enough information to do the appropriate type checking, and it's also easier to make mistakes that cause side effects to happen more than once, and so on.
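To make the side-effect point concrete, here is a minimal C++ sketch (MAX_MACRO, max_inline, use_macro and use_inline are hypothetical names, not from the answer). Because the macro pastes its argument text into both the condition and the selected branch, an argument such as i++ can be evaluated twice, whereas the inline function receives a value that was evaluated exactly once.

```cpp
// Function-like macro: the argument text is substituted twice.
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

// Inline function: each argument is evaluated once, before the call.
inline int max_inline(int a, int b) { return a > b ? a : b; }

// Hypothetical helpers; i is passed by reference so the caller can see
// how many times i++ actually ran.
int use_macro(int& i) { return MAX_MACRO(i++, 0); }   // i++ may run twice
int use_inline(int& i) { return max_inline(i++, 0); } // i++ runs once
```

Starting from i = 1, use_macro returns 2 and leaves i at 3, while use_inline returns 1 and leaves i at 2.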
And GMan provided an excellent Wikipedia link in a comment that goes into much more detail about the function inlining process than I do here. My answer is generally true, but there is a LOT of variation, even more than I thought there was.

Related

Can a modern C/C++ compiler optimize better with the code in header?

I've often heard that it is bad practice to place code in the header, but it's been common to place short functions in the headers, partly to help the compiler optimize better.
The inline keyword can help the compiler determine which functions should be inlined, but aside from that, is there still a reason to have short performance-critical functions in the headers? Or does it not matter any more for modern compilers?
Technically, the inline keyword only means that the definition is allowed in multiple translation units. That is, if you have an inline function defined in a header file, and that header gets included in multiple source files, then that is fine. For a non-inline, non-template function, that would be illegal.
But compilers can and do take advantage of being able to see the code of the function that is being called. This happens not only for inline functions but also any other functions whose code may be visible. Many compilers try to make a good guess about whether to inline the code. Having the code be inlined might make the program bigger or smaller, faster or slower. If the compiler can determine that the code is likely to be both faster and smaller when the code is inlined, then it will do it. Otherwise it has to consider the trade-off.
Many modern compilers can do link-time optimization, where code that wasn't inlined to begin with can be inlined during the link phase, with some cost in link time. There may be certain optimization opportunities that are lost when delayed until link-time as well.
In my own experience, I've found that making small functions inline is almost always a win for both size and speed. For larger functions, I often see it make the programs faster but larger, but I've also, rarely, seen it make the programs slower and larger. If the performance of a particular function is important, you'll need to do measurements to help decide whether to inline it or not.
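A rough sketch of such a measurement, assuming GCC or Clang: the [[gnu::noinline]] attribute is a compiler extension, not standard C++, and add_inline, add_noinline and time_loop are hypothetical names. It times the same trivial function with inlining allowed and with inlining suppressed; a real measurement should use the actual function, realistic inputs, repeated runs and the optimization flags of the real build.

```cpp
#include <chrono>

// The same hypothetical function, once with inlining allowed...
inline unsigned add_inline(unsigned a, unsigned b) { return a + b; }

// ...and once with inlining suppressed. gnu::noinline is a GCC/Clang
// attribute; other compilers may ignore it, making both variants equivalent.
[[gnu::noinline]] unsigned add_noinline(unsigned a, unsigned b) { return a + b; }

// Calls f in a loop and returns the elapsed time in microseconds.
// The accumulated value goes to `result` so the loop has an observable effect.
// Unsigned arithmetic is used so that wrap-around is well defined.
template <typename F>
long long time_loop(F f, unsigned iterations, unsigned& result) {
    auto start = std::chrono::steady_clock::now();
    unsigned acc = 0;
    for (unsigned i = 0; i < iterations; ++i) acc = f(acc, i);
    auto stop = std::chrono::steady_clock::now();
    result = acc;
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Both variants compute the same result; only the timings may differ, and whether they do (and by how much) depends on the compiler and flags. With inlining allowed, an optimizer may even collapse the whole loop, which is itself one of the effects inlining enables.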

Are the Optimization Keywords in C and C++ Reasonable?

So we've all heard the don't-use-register line, the reasoning being that trying to out-optimize a compiler is a fool's errand.
register, from what I know, doesn't actually state anything about CPU registers, just that a given variable can't be referenced indirectly. I'll hazard a guess that it's often referred to as obsolete because compilers can automatically detect that a variable's address is never taken, making such optimizations transparent.
But if we're firm on that argument, can't it be levelled at every optimization-driven keyword in C? Why do we use inline and C99's restrict for example?
I suppose that some things like aliasing make deducing some optimizations hard or even impossible, so where is the line drawn before we start venturing into Sufficiently Smart Compiler territory?
Where should the line be drawn in C and C++ between spoon-feeding a compiler optimization information and assuming it knows what it's doing?
EDIT: Jens Gustedt pointed out that my conflating of C and C++ isn't right since two of the keywords have semantic differences and one doesn't exist in standard C++. I had a good link about register in C++ which I'll add if I find it...
I would agree that register and inline are somewhat similar in this respect. If the compiler can see the body of the callee while compiling a call site, it should be able to make a good decision on inlining. The use of the inline keyword in both C and C++ has more to do with the mechanics of making the body of the function visible than with anything else.
restrict, however, is different. When compiling a function, the compiler has no idea of what the call sites are going to be. Being able to assume no aliasing can enable optimizations that would otherwise be impossible.
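A small C++ sketch of why possible aliasing blocks this kind of optimization (add_twice and add_twice_assuming_no_alias are hypothetical names). Standard C++ has no restrict; the second function uses __restrict, an extension accepted by GCC, Clang and MSVC, to express the same promise that C99 restrict makes.

```cpp
// Adds *b to *a twice. Because a and b may point to the same int,
// the compiler must reload *b after the first store and cannot
// rewrite the body as *a += 2 * *b.
void add_twice(int* a, const int* b) {
    *a += *b;
    *a += *b;
}

// The rewrite the compiler would like to make. It is only valid when
// a and b do not alias, which __restrict (a non-standard extension in C++)
// promises. Calling this with aliasing pointers is undefined behavior.
void add_twice_assuming_no_alias(int* __restrict a, const int* __restrict b) {
    *a += 2 * *b;
}
```

With x = y = 1, add_twice(&x, &y) leaves x at 3, the same as the rewritten version; but add_twice(&x, &x) leaves x at 4, because the second addition sees the updated value. That difference is why the rewrite is only allowed once the compiler has been told the pointers do not alias.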
inline is used in the scenario where you implement a non-templated function within the header then include it from multiple compilation units.
This allows every translation unit that includes the header to contain its own definition of the function; the linker keeps just one copy, so you do not get a link error for a multiply defined symbol. It does not, however, require the compiler to actually inline the function.
GCC has a force-inline attribute (__attribute__((always_inline))), but that is a compiler extension.
register doesn't even say that you can't reference the variable indirectly (at least in C++). It said that in the original C, but that has been dropped.
Whether trying to out-optimize the compiler is a fool's errand depends on the optimization. Not many compilers, for example, will convert sin(x) * sin(x) + cos(x) * cos(x) into 1.
Today, most compilers ignore register, and no one uses it, because compilers have become good enough at register allocation to do a better job than you can with register. In fact, respecting register would typically make the generated code slower. This is not the case for inline or restrict: in both cases, there exist techniques, at least theoretically, which could result in the compiler doing a better job than you can. Such techniques are not widespread, however, and (as far as I know, at least) have a very high compile-time overhead, in some cases with compile times that grow exponentially with the size of the program (which makes them more or less unusable on most real programs; compile times measured in years really aren't acceptable).
As to where to draw the line... it changes in time. When I first started programming in C, register made a significant difference, and was widely used. Today, no. I imagine that in time, the same may happen with inline or restrict; some experimental compilers are very close with inline already.
This is a flame-bait question but I will dive in anyway.
Compilers are a lot better at optimising than your average programmer. There was a time I programmed on a 25MHz 68030 and I got some advantage from the use of register because the compiler's optimizer was so poor. But that was back in 1990.
I see inline as just as bad as register.
In general, measure first before you modify. If you find that your code performs so poorly you want to use register or inline, take a deep breath, stand back and look for a better algorithm first.
In recent times (i.e. the last 5 years) I have gone through code bases and removed inline functions galore with no perceptible change in performance being visible. Code size, however, always benefits from the removal of inline methods. That isn't a big issue for your standard x86-style monster multicore marvel of the modern age but it does matter if you work in the embedded space.
It is a moving target, because compiler technology is improving. (Well, sometimes it is more changing than improving, but that has some of the same effect of rendering your optimization attempts moot, or worse.)
Generally, you should not guess at whether an optimization keyword or other optimization technique is good or not. One has to learn quite a bit about how computers work, including the particular platform you are targeting, and how compilers work.
So a rule for using various optimization techniques is to ask: do I know that the compiler will not do the best job here? Am I willing to commit to that for a while? Will the compiler remain stable while this code is in use, and am I willing to rewrite the code when the compiler changes this situation? Typically, you have to be an experienced and knowledgeable software engineer to know when you can do better than the compiler. It also helps if you can talk to the compiler developers.
This means people cannot give you an answer here that has a definite guideline. It depends on what compiler you are using, what your project is, what your resources are, and what your goals are, and so on.
Although some people say not to try to out-optimize the compiler, there are various areas of software engineering where people do better than a compiler and in which it is worth the expense of paying people for this.
The difference is as follows:
register is a very local optimization (i.e. inside one function). Register allocation is a relatively solved problem, both because compilers are smarter and because processors have more registers (mostly the former, but say x86-64 has more registers than x86, and both have more than, say, an 8-bit processor).
inline is harder, as it is an interprocedural optimization. However, as it involves a relatively small depth of recursion and a small number of procedures (if the inlined procedure is too big, there is no sense in inlining it), it may safely be left to the compiler.
restrict is much harder. To know for certain that two pointers don't alias, you would need to analyse the whole program (including libraries, the system, plug-ins, etc.), and even then you could run into problems. However, the information is clearer to the programmer AND it is part of the specification.
Consider very simple code:
#include <stddef.h>
void my_memcpy(void *dst, const void *src, size_t size) {
    for (size_t i = 0; i < size; i++) {
        ((char *)dst)[i] = ((const char *)src)[i];  /* copy one byte at a time */
    }
}
Is there a benefit to making this code efficient? Yes: memcpy tends to be very useful (say, for a copying GC). Can this code be vectorized (here, moved in words of say 128 bits instead of 8 bits)? The compiler would have to deduce that dst and src do not alias in any way and that the regions they point to are independent. size may depend on user input, runtime behaviour or other factors, which makes the analysis practically impossible. This is similar to the Halting Problem: in general we cannot analyse everything without running it. Or the function might be part of the C library (I assume shared libraries) and be called by the program, so not all call sites are even known at compile time. Without such analysis, the program could behave differently with optimization turned on. On the other hand, the programmer might ensure that they are different objects simply by knowing the (even higher-level) design, instead of needing a bottom-up analysis.
restrict can also serve as documentation, since it may be that the programmer wrote the procedure in a way that cannot handle two aliasing pointers. For example, if we want to copy memory between overlapping locations, the above code is incorrect.
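The overlap problem can be seen directly in a short C++ sketch (forward_copy is a hypothetical name for the same byte-by-byte loop as my_memcpy). Copying a buffer one byte to the right with a forward loop overwrites source bytes before they are read, whereas std::memmove is specified to handle overlapping regions.

```cpp
#include <cstddef>
#include <cstring>

// Same forward byte loop as my_memcpy, with no restrict qualifiers,
// so calling it on overlapping regions is legal but gives the wrong result.
void forward_copy(void* dst, const void* src, std::size_t size) {
    for (std::size_t i = 0; i < size; i++) {
        static_cast<char*>(dst)[i] = static_cast<const char*>(src)[i];
    }
}
```

Starting from the buffer "abcdef", forward_copy(buf + 1, buf, 5) produces "aaaaaa", because each byte written becomes the source of the next one, while std::memmove(buf + 1, buf, 5) produces the intended "aabcde". Qualifying the parameters with restrict (or __restrict as a C++ extension) states the no-overlap precondition explicitly.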
So, to sum up: a Sufficiently Smart Compiler would not be able to deduce restrict (unless we move to compilers that understand the meaning of code) without knowing the whole program. Even then, it would be close to undecidable. For local optimization, however, compilers are already sufficiently smart. My guess is that a Sufficiently Smart Compiler with whole-program analysis would nonetheless be able to deduce it in many interesting cases.
PS. By local I mean a single function. So local optimization cannot assume anything about arguments, global variables, etc.
One thing that hasn't been mentioned is that many non-x86 compilers aren't nearly as good at optimizing as gcc and other "modern" C-compilers are.
For instance, the compilers for PIC are absolutely terrible at optimizing. Also, the optimizer for cicc (the CUDA compiler), though much better, still seems to miss a lot of fairly simple optimizations.
For these cases, I've found optimization hints like register, inline, and #pragma unroll to be extremely useful.
From what I have seen back in the days when I was more involved with C/C++, these are merely orders given directly to the compiler. The compiler may try to inline a function even if it is not given a direct order to do so. That really depends on the compiler and may even raise some cross-compiler issues. As an example, Visual Studio provides different levels of optimization, which correspond to different intelligence levels of the compiler. Member functions defined inside the class body are implicitly inline, which gives the compiler a hint to minimize function call overhead. In any case, these directives are extremely helpful when you are using a less intelligent compiler, while with a more intelligent one the optimization may be obvious enough for the compiler to do it anyway.
Also, these keywords are guaranteed to be safe, whereas some compiler optimizations may not work with some libraries such as OpenGL (as I have seen myself). So in cases where you feel that compiler optimization may be harmful, you can use these keywords to make sure things are done the way you want.
The compilers such as g++ these days optimize the code very well. You might as well search for optimization elsewhere, maybe in the methods and algorithm you use or by using TBB or CUDA to make your code parallel.

Why is there no standard way to force inline in C++?

According to the wikipedia C++ article
C++ is designed to give the programmer choice, even if this makes it possible for the programmer to choose incorrectly.
If it is designed this way, why is there no standard way to force the compiler to inline something, even if I might be wrong?
Or, to put it another way: why is the inline keyword just a hint?
I think I have no choice here.
In the OOP world we call methods on the objects and directly accessing members should be avoided. If we can't force the accessors to be inlined, then we are unable to write high performance but still maintainable applications.
(I know many compilers implement their own way to force inlining but it's ugly. Using macros to make inline accessors on a class are ugly too.)
Does the compiler always do it better than the programmer?
How would a compiler inline a recursive function (especially if the compiler does not support tail-call optimization, or if it does but the function cannot be tail-call optimized)?
This is just one case where the compiler should decide whether inlining is practical or not. There may be others as well that I can't think of right now.
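A minimal illustration (factorial is just an example function): the inline specifier is allowed on a recursive function, but because the number of nested calls depends on a run-time argument, the compiler cannot paste the body in an unbounded number of times. Depending on the compiler and flags, it may inline a few levels, turn the recursion into a loop, or simply emit an ordinary call.

```cpp
// Legal to declare inline, but full expansion at every call is impossible:
// the recursion depth is only known at run time.
inline unsigned long long factorial(unsigned n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}
```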
Does the compiler always do it better than the programmer?
No, not always... but the programmer is far more error prone, and less likely to maintain the optimal tuning over a span of years. The bottom line is that inlining only helps performance if the function is really small (for at least one common/important code path), but then it can help by about an order of magnitude, depending on many things of course. It's often impractical for the programmer to assess, let alone keep a careful eye on, how trivial a function is, and the thresholds can vary with compiler implementation choices, command line options, CPU model, etc. There are so many things that could suddenly bloat a function: any non-builtin type can trigger all sorts of different behaviours (especially in templates), use of an operator (even new) can be overloaded, and the verbosity of calling conventions and exception-handling steps isn't generally obvious to the programmer.
The chances are that if the compiler isn't inlining something that's small enough for you to expect a useful performance improvement if it was inlined, then the compiler's aware of some implementation issue you're not that would actually make it worse. In those gray cases where the compiler might go either way and you're just over some threshold the performance difference isn't likely to be significant anyway.
Further, some programmers (myself included) can be lazy and deliberately abuse inline as a convenient way to put implementation in a header file, getting around the ODR, even though they know those functions are large and that it would be disastrous if the compiler (were required to) actually inline them. This doesn't preclude a forced-inline keyword/notation though... it just explains why it's hard to change the expectations around the current inline keyword.
Or, to put it another way: why is the inline keyword just a hint?
Because you "might" know better than the compiler.
Most of the time, for functions not marked inline (and correctly declared/defined), the compiler, depending on its configuration and implementation, will itself evaluate whether the function can be inlined or not.
For example, most compilers will automatically inline member functions that are fully defined in the header, if the code isn't too long or too complex. That's because, since the function is available in the header, why not inline it as much as possible?
However, this doesn't happen in Debug mode in Visual Studio, for example: in Debug, the debug information still needs to map to the binary code of the functions, so the compiler avoids inlining, but it will still inline functions marked inline, because the user asked for it. That's useful if you want to mark functions you don't need debug-time information for (like simple getters) while getting better performance in debug builds.
In Release mode (by default), the compiler will aggressively inline everything it can, making some parts of the code harder to debug even if you enable debugging information.
So, the general idea is that if you write code in a way that helps the compiler inline, it will inline as much as it can. If you write code in ways that are hard or impossible to inline, it will avoid it. If you mark something inline, you are just telling the compiler that if it finds it hard, but not impossible, to inline, it should inline it.
As inlining depends on both contexts of the caller and the callee, there is no "rule".
The usual advice is simply not to mark functions inline explicitly, except in two cases:
if you need to put a function definition in a header, it just has to be inline; this is often the case for template functions (member or not) and for other utility functions that are just shortcuts;
if you want a specific compiler to behave in specific way at compile time, like marking some member functions inline to be inlined even in Debug configuration on Visual Studio compilers, for example.
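For the header case, note that member functions defined inside the class body are implicitly inline, which is what allows such definitions to live in a header included from several .cpp files. A sketch with a hypothetical Point class; the out-of-class definition needs an explicit inline for the same reason. Being inline in this sense only permits, and does not require, inlining at the call site.

```cpp
// Imagine this is in a header included by several .cpp files.
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}
    // Defined inside the class body, so implicitly inline.
    int x() const { return x_; }
    int y() const { return y_; }
    int manhattan_length() const;  // declared here, defined below
private:
    int x_, y_;
};

// Defined outside the class body but still in the header: it needs an
// explicit inline, otherwise two .cpp files including the header would
// each define it and violate the one-definition rule.
inline int Point::manhattan_length() const {
    return (x_ < 0 ? -x_ : x_) + (y_ < 0 ? -y_ : y_);
}
```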
Does the compiler always do it better than the programmer?
No, that's why using the inline keyword can sometimes help. The programmer can sometimes have a better overall view of what's necessary than the compiler. For example, if the programmer wants their binary to be as small as possible, inlining can be harmful, depending on the code. In applications where speed is required, aggressive inlining can help a lot. How would the compiler know what's required? It has to be configured, and allowed to know in a fine-grained way, what really should be inlined.
Mistaken assumption.
There is a way. It's spelled #define. And for many early C projects, that was good enough. inline was sufficiently different - hint, better semantics - that it could be added besides macros. But once you had both, there was little room left for a third option in between, one with the nicer semantics but non-optional.
If you really need to force inlining of a function (why?), you can do it: copy and paste the code, or use a macro.

Making a long function inline

Suppose I have a 10 line function. If I add inline keyword, let's say there is a chance of 50% that compiler will make it inline.
If I have a 2 line function, there might be 90% chance it will be inlined.
Can I split the code in 10 line function into 5 functions to make it inlined with better chances?
There may be a reason why the compiler isn't inlining it, possibly something to look at. In addition, the function call overhead becomes less of an issue with longer functions, so inlining them may not be as important (if that's your only reason).
Splitting the function into 5 small functions will just make a mess of your code, and possibly confuse the compiler and end up with it not inlining anything. I would not recommend that.
Depending on your C++ compiler, you may be able to force it to inline the function. Visual C++ has the __forceinline attribute, as well as a setting for how inlining should be handled and how often it should be used in the project settings. As Tony mentions, the GCC equivalent is __attribute__((always_inline)).
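Projects that need this on several compilers often wrap these extensions in a macro. A sketch, assuming a hypothetical FORCE_INLINE macro and clamp_to_byte helper; __forceinline and __attribute__((always_inline)) are the Visual C++ and GCC spellings mentioned above (Clang also accepts the GCC form), and other compilers fall back to plain inline.

```cpp
// Hypothetical portability macro mapping to each compiler's extension.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // fall back to the ordinary hint
#endif

// A small helper that we ask the compiler to inline at every call.
FORCE_INLINE int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}
```

Even these extensions are not absolute: in some cases, such as recursion, the compiler can still decline, Visual C++ typically with a warning and GCC typically with an error.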
You may also be able to use some preprocessor trickery to inline the code itself, but I would not typically recommend that.
If it makes the code more readable, go for it. If not, trust the compiler and don't go messing up your code on the off chance that it'll help. The compiler's a lot smarter than you think, and generally knows better than you do when inlining will help -- and when it won't, or worse, will break stuff.

inline functions [duplicate]

Possible Duplicate:
Inline functions vs Preprocessor macros
Hello, can somebody please explain what exactly this means, and how it differs from a regular macro? (I know that it works at compile time and not in the preprocessor, but so what?) Thanks in advance for any help; I looked on Google but didn't find anything understandable.
(Assuming here that you're talking about C/C++.)
An inline function has its code copied into the points where it's called, much like a macro would.
The big reason you would use inline functions rather than macros to accomplish this is that the macro language is much weaker than actual C/C++ code; it's harder to write understandable, re-usable, non-buggy macros. A macro doesn't create a lexical scope, so variables in one can collide with those already in scope where it's used. It doesn't type-check its arguments. It's easy to introduce unexpected syntactic errors, since all a macro does is basically search-and-replace.
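The search-and-replace problem in a minimal sketch (SQUARE_MACRO and square are hypothetical names): the unparenthesized macro turns SQUARE_MACRO(1 + 2) into 1 + 2 * 1 + 2, which is 5, while the inline function squares the value 3 and returns 9. Adding parentheses to the macro fixes the precedence problem, but not the double evaluation of arguments with side effects.

```cpp
// Macro without parentheses: plain text substitution of the argument.
#define SQUARE_MACRO(x) x * x

// Inline function: the argument is evaluated once and converted to int.
inline int square(int x) { return x * x; }
```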
Also, IIRC, the compiler can choose to ignore an inline directive if it thinks that's really boneheaded; it can't do that with a macro.
Or, to rephrase this in a more opinionated and short way: macros (in C/C++, not, say, Lisp dialects) are an awful kludge, inline functions let you not use them.
Also, keep in mind that it's often not a great idea to mark a function as inline. Compilers will generally inline or not as they see fit; by marking a function inline, you're taking over responsibility for a low-level function that most of the time, the compiler's going to know more about than you are.
There is a big difference between macros and inline functions:
Inline is only a hint to the compiler that a function might be inlined. It does not guarantee it will be.
The compiler might inline functions not marked with inline.
Macro invocation does not perform type checking, so macros are not type-safe.
Using a function instead of a macro is more likely to give you nice and understandable compiler output in case there is an error.
It is easier to debug functions; using macros complicates debugging a lot.
Most compilers will give you an option of enabling or disabling inlining; this is not possible with macros.
The general rule in C++ is that macros should be avoided whenever possible.