Can overuse of macros hurt performance? - C++

I have a very long piece of code that is called millions of times.
I have noticed that if I change all the macros into inline functions, the code runs a lot faster.
Can you explain why this is? Aren't macros only text replacement, as opposed to inline functions, which may still end up as a function call?

A macro is a text substitution and will, as such, generally produce more executable code. Every time you call a macro, code is inserted (well, not necessarily, the macro could be empty... but in principle).
Inline functions, on the other hand, may work the same as macros, but they might also not be inlined at all.
In general, the inline keyword is a weak hint rather than a requirement anyway; compilers nowadays judiciously inline functions (or abstain from doing so) based on heuristics, mostly the number of pseudo-instructions.
Inline functions may thus cause the compiler to not inline the function at all, or to inline it at a couple of call sites and call it non-inlined elsewhere.
Surprisingly, not inlining may actually be faster than inlining, since it reduces overall code size and thus the number of cache and TLB misses.

This will depend on the particular macro and function call that you are using. A particular macro can actually compile to a longer set of operations than the equivalent inline function. It is often better not to use a macro for anything a function can express. An inline function lets the compiler type-check the arguments and optimize the call in its context. Macros are prone to a number of errors and can actually cause inefficiencies (such as having to move variables in and out of storage).
In any case, since you actually see this happening in your code, you can tell that the compiler is able to optimize your inline code rather than blindly pasting in the text expansion.
Note that a google search 'macros vs inline' shows a number of discussions of this.

Apart from forcing inlining, macros can also be detrimental to speed if they are not carefully written to avoid evaluating their arguments twice. Take for example this little function-like macro and its inline function equivalent:
#define square(x) ((x)*(x))
inline long square(long x) { return x*x; }
Now, when you call them with a variable, square(foo), they are equivalent. The macro version expands to ((foo)*(foo)), which is one multiplication, just like the function if it's inlined.
However, if you call them with square(expensiveComputation(foo)), the macro causes expensiveComputation() to be called twice. The inline function, in contrast, behaves like any function: its argument is evaluated once before the body of the function is executed.
Of course, you could write the macro using the GNU extension of statement expressions (see http://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html for documentation on this) to avoid double evaluation like this:
#define square(x) ({ \
long square_temp_variable = (x); \
square_temp_variable*square_temp_variable; \
})
But this is a lot of hassle, and it makes the code unportable. So, better stick with inline functions.
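To make the double evaluation concrete, here is a minimal sketch. expensiveComputation is a hypothetical stand-in that only counts how often it runs, and the names SQUARE_MACRO, squareInline, callsViaMacro and callsViaInline are made up so that both versions can live in one file.

```cpp
#include <cassert>

// Hypothetical stand-in for an expensive call: it just counts invocations.
static int g_calls = 0;
long expensiveComputation(long foo) {
    ++g_calls;
    return foo + 1;
}

// Function-like macro: the argument text is pasted in twice.
#define SQUARE_MACRO(x) ((x) * (x))

// Inline function: the argument is evaluated exactly once.
inline long squareInline(long x) { return x * x; }

// Returns how many times expensiveComputation ran for the macro version.
int callsViaMacro(long foo, long& result) {
    g_calls = 0;
    result = SQUARE_MACRO(expensiveComputation(foo));  // expands to two calls
    return g_calls;
}

// Returns how many times expensiveComputation ran for the inline version.
int callsViaInline(long foo, long& result) {
    g_calls = 0;
    result = squareInline(expensiveComputation(foo));  // exactly one call
    return g_calls;
}
```

Both produce the same value, but the macro pays for the expensive call twice.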

In general, it is good advice to replace function-style macros with inline functions wherever possible.
Not only do you get rid of some nasty traps (a = MIN(i++, 50), for example), you also gain type safety and, as already stated in some comments, you avoid multiple evaluation of arguments, which may have a very bad influence on performance.
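The MIN(i++, 50) trap can be reproduced with a small sketch. The MIN definition below is the common textbook form and is an assumption, since the answer does not show it; minInline, withMacro and withInline are hypothetical names. The macro version is well-defined (the ?: condition is sequenced before the chosen branch), but it increments i twice whenever i < 50.

```cpp
#include <cassert>

// Typical function-like MIN macro (assumed form): arguments may be evaluated twice.
#define MIN(a, b) ((a) < (b) ? (a) : (b))

// Type-safe inline alternative: each argument is evaluated once.
inline int minInline(int a, int b) { return a < b ? a : b; }

struct MinResult { int a; int i; };

// a = MIN(i++, 50): when i < 50, i++ runs once in the comparison
// and once more in the selected branch.
MinResult withMacro(int i) {
    int a = MIN(i++, 50);
    return {a, i};
}

// Same call through the inline function: i++ runs exactly once.
MinResult withInline(int i) {
    int a = minInline(i++, 50);
    return {a, i};
}
```

Starting from i = 10, the macro yields a = 11 and leaves i at 12, while the inline function yields a = 10 and i = 11.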

Related

Does it make sense to define an inline function which makes calls to other function(s)?

From what I know an inline function is an optimization to enhance performance, thus it should run as fast as a macro. Inline function's code should be as short as possible.
I wonder if it makes sense to embed function calls inside an inline function. If the answer is yes, in which context and what are the restrictions?
Actually, I am asking this question because I looked at the code of someone who is calling functions such as "socket()", "sendto()" and "memset()" from inline functions; something that defeats the purpose of an inline function in my opinion.
Note: In the code I have there is no use of any conditional calls to the functions, the inline function just passes arguments to the called functions.
I wonder if it makes sense to embed function calls inside an inline function.
Of course it does. Inlining a call to your function is still an optimisation, removing the cost of that function call and allowing further optimisations in the context of the calling function, whether or not it in turn calls other functions.
in which context and what are the restrictions?
I've no idea what you mean by "context"; but there are no restrictions on what you can do in an inline function. The only semantic effects of declaring a function inline are to allow multiple identical definitions of the function, and to require a definition in every translation unit that uses the function. In all other respects, it's the same as any other function definition.
Comment posted as answer, by request:
If the guy who wrote the code believed that inline had any meaningful impact on performance, he was manifestly NOT 'doing the right thing'.
Performance comes from correct algorithm selection and avoiding cache misses.
Headaches come from naive premature optimisation techniques that may have worked in 1991
I see no a priori reason why inline code could not contain function calls.
Argument passing aside, inlining inserts the lines of code as they stand, reducing call overhead and allowing local/ad-hoc optimizations.
For instance, inline void MyInline(bool Perform) { if (Perform) memset(); } could very well be skipped when invoked with MyInline(false).
Inlining could also allow inlining of the internal function calls, resulting in even more (micro)optimization opportunities.
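A runnable version of the MyInline(false) idea, as a sketch: the buf and size parameters and the caller function are hypothetical additions so that memset() has real arguments. The observable behaviour is identical either way; whether the branch and the memset() call actually disappear after inlining depends on the compiler and optimization flags.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical version of MyInline with real arguments for memset().
inline void MyInline(bool Perform, unsigned char* buf, std::size_t size) {
    if (Perform)
        std::memset(buf, 0, size);  // dead code when Perform is a known false
}

// After inlining, Perform is the constant false here, so an optimizing
// compiler may drop the branch and the memset() call entirely.
void caller(unsigned char* buf, std::size_t size) {
    MyInline(false, buf, size);
}
```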
The compiler will choose when to inline. And you should avoid attempting premature optimisation at the expense of exposing your implementation.
The compiler may be able to optimise away the forwarding of the functions you are calling. It might do that anyway with optimisation flags even if you do not use the inline keyword.
The time to use the inline keyword is when you want to make a header-only file to use in multiple projects without having to use a link library. In reality this doesn't really mean "inline" at all; it means the same definition may appear in every compilation unit that includes the header, and all those copies are treated as one definition.
In any case you should look at this wiki question / answer:
Benefits of inline functions in C++?
It makes perfect sense.
Consider a function that consists of two possible branches of execution — a fast path which is activated when certain condition holds (most of the time) and a slow path.
Inlining the whole thing would result in growing the size of the code for little benefit. The slow path complexity may prevent the compiler from inlining the function.
If you make the slow path into a separate function an interesting opportunity opens.
It makes it possible to inline the condition and the fast path while the slow path remains a function call. Inlining the fast path avoids the function call overhead most of the time. The slow path is already slow, hence the call overhead is negligible.
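The fast-path/slow-path split described above might look like this minimal sketch. IntBuffer, push and growAndPush are hypothetical names, and realloc-based growth is used only to keep it short; in real code growAndPush would live in a .cpp file so it stays out of line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical growable int buffer used to illustrate the split.
struct IntBuffer {
    int* data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;
};

// Slow path: rarely taken, kept as an ordinary out-of-line function.
void growAndPush(IntBuffer& b, int value) {
    std::size_t newCap = b.capacity == 0 ? 4 : b.capacity * 2;
    int* p = static_cast<int*>(std::realloc(b.data, newCap * sizeof(int)));
    if (!p) std::abort();
    b.data = p;
    b.capacity = newCap;
    b.data[b.size++] = value;
}

// Fast path: small enough to inline; the common case is one compare and one store.
inline void push(IntBuffer& b, int value) {
    if (b.size < b.capacity)
        b.data[b.size++] = value;  // fast path, no call
    else
        growAndPush(b, value);     // slow path, call overhead is negligible
}
```

Only the cheap test and store get copied into every caller; the reallocation code exists once.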
First, in C++, an "inline" function (one defined inside a class definition or explicitly labeled inline) is just a suggestion to the compiler. The compiler itself decides whether or not to actually inline it.
There are three reasons to inline a function:
Pushing another round of variables onto the stack is expensive, as is branching to the new point in the program.
Sometimes we can do local optimizations with the intermediates of the function (though I wouldn't count on it!)
To put the definition of a function in the header file (a workaround).
Take the following example:
void NonInlinable(int x);
inline void Inline() { NonInlinable(10);}
This makes a ton of sense to inline. I remove 1 function call, so if NonInlinable is pretty fast, then this could be a huge speedup. So regardless of whether or not I'm calling functions, I could still want to inline the call.
Now another example:
inline int Inline(int y) {return y/10;}
//main
...
int x = 5;
int y = Inline(5);
int z = x % 10;
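The modulo and division operations are usually calculated by the same instruction (on x86, div yields both the quotient and the remainder). A really nice compiler can compute y and z with a single division instruction! Magic. (With constant operands, as here, it can even fold both results at compile time.)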
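A sketch of what the caller looks like once Inline() has been expanded in place: a quotient and a remainder of the same operand by the same divisor, which a compiler may compute with one division. For comparison, the same pair is written explicitly with std::div, which returns both in a std::div_t. The helper names afterInlining and withStdDiv are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

inline int Inline(int y) { return y / 10; }

// The caller after Inline() is expanded: x / 10 and x % 10 side by side,
// so the optimizer may reuse a single division for both.
void afterInlining(int x, int& y, int& z) {
    y = Inline(x);  // x / 10
    z = x % 10;     // x % 10
}

// The same result written explicitly: std::div returns quotient and remainder together.
void withStdDiv(int x, int& y, int& z) {
    std::div_t r = std::div(x, 10);
    y = r.quot;
    z = r.rem;
}
```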
So in my mind, a better question to ask is when should I not use inline functions:
When I want to separate definition from declaration (very good practice for readability, and premature optimization is the root of all evil).
When I want to hide my implementation/use good encapsulation.

What is better in this case, macro or inline function? [duplicate]

Possible Duplicate:
Inline functions vs Preprocessor macros
What is the concept of an inline function and how does it differ from a macro?
inline unsigned int getminutes( unsigned int seconds )
{
return( seconds / 60 );
}
#define GetMinutes(seconds) (seconds) / (60)
To be honest I'd ask which one is faster, but I've seen so much on S.O. that asking which one is better would grant me knowledge. (Yes! I'm a knowledge hunter)
Never use a macro if you can use an inline function to achieve the same. The compiler is going to generate exactly the same code for both of the solutions you provided, assuming you are using a fairly decent one.
Of course there is no guarantee that inline functions will actually be inlined, but in these cases, if your compiler can't inline that function, then it's probably a really bad one.
Just don't use macros unless you really need to (header guards, repetitive code generation, etc.). Macros are evil in several ways; you can read a lot about that if you search for information online.
I guess the macro will be faster if you consider that inline is not guaranteed by the compiler to be used. If the function is not inlined, then you have the overhead of a function call.
The macro will be expanded in place by the preprocessor, so it's always going to be inline.
The macro is also not type safe and has global scope.
Functions are preferred.
With a good optimizing compiler the performance will be identical. The difference is that the inline function is more or less a suggestion to the compiler. Although the compiler should in most cases honor the suggestion, the macro version will force the compiler to inline the code.
As an aside, your macro should be written ((seconds) / 60) to make sure the intended grouping is used in all cases.
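Why the outer parentheses matter: a sketch dividing a total by the result. GetMinutes is the question's macro, while GetMinutesFixed and the perMinute* helpers are hypothetical names for the comparison. Without the outer parentheses, total / GetMinutes(120) expands to total / (120) / (60), which is (total / 120) / 60.

```cpp
#include <cassert>

// The question's macro, without outer parentheses.
#define GetMinutes(seconds) (seconds) / (60)

// The corrected form suggested above.
#define GetMinutesFixed(seconds) ((seconds) / 60)

inline unsigned int getminutes(unsigned int seconds) { return seconds / 60; }

unsigned int perMinuteMacro(unsigned int total, unsigned int seconds) {
    return total / GetMinutes(seconds);       // total / (seconds) / (60)
}
unsigned int perMinuteFixed(unsigned int total, unsigned int seconds) {
    return total / GetMinutesFixed(seconds);  // total / ((seconds) / 60)
}
unsigned int perMinuteInline(unsigned int total, unsigned int seconds) {
    return total / getminutes(seconds);       // same as the fixed macro
}
```

For 7200 and 120 seconds (2 minutes), the unparenthesized macro gives 1 instead of 3600.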
Unfortunately, which is faster is one of those cases where the only way to know is to profile. I suspect, however, that the result is the same in typical release build settings.
Which is better, however, I'd say the inline function. Easier to debug. Safer than a macro.
I avoid macros except where absolutely necessary. I think of them as compile-time find-and-replace. I consider find-and-replace to be extremely dangerous at worst. I actually wrote a post or two about why I dislike #define macros so intensely...
Another word of advice I run on: The compiler knows better than you. The macro will force inline, even if it's actually not good for performance. inline will suggest it as a candidate for inlining, but may not inline if it doesn't meet criteria to be inlined.

What are any real sets of rules compilers use to decide whether to inline a function?

We have a macro for signalling errors in a common utilities library that goes like this:
#define OurMacro( condition ) \
if( condition ) { \
} else { \
CallExternalFunctionThatWillThrowAnException( parametersListHere ); \
}
What I refer to as parametersListHere is a comma-separated list of constants and macros that is populated by the compiler at each macro expansion.
That function call always resolves into a call - the function implementation is not exposed to the compiler. The function has six parameters and in debug configuration all of them have meaningful values, while in release configuration only two have meaningful values and others are passed the same default values.
Normally the condition will hold true, so I don't care how fast the invocation is; I only care about the code bloat. Calling that function with 6 parameters requires seven x86 instructions (6 pushes and one call), and clearly 4 of those pushes can be avoided if the function signature is changed to have two parameters only. This can be done by introducing an intermediate "gate" function implemented in such a way that its implementation is not visible to the compiler.
I need to estimate whether I should insist on that change. So far the primary improvement I expect is that reducing the number of parameters will drop 4 instructions on each invocation, which means that the code surrounding the macro expansion will become smaller, and the compiler will be more likely to inline it and optimize the emitted code further.
How can I estimate that without actually trying and recompiling all our code and carefully analyzing the emitted code? Every time I read about inline there's a statement that the compiler decides whether to inline the function.
Can I see some exact set of rules of how the function internals influence compiler decision on inlining?
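The proposed "gate" could look roughly like this sketch. The six-parameter signature, the ReportFailureGate name and the checkFires helper are all hypothetical (the question does not show the real parameter list); in the real library the gate would be defined in a separate .cpp file so each macro expansion only passes two arguments.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical six-parameter reporting function (real signature not shown).
[[noreturn]] void CallExternalFunctionThatWillThrowAnException(
    const char* file, int line, const char* expr,
    const char* function, int errorCode, const char* details) {
    (void)expr; (void)function; (void)errorCode; (void)details;
    throw std::runtime_error(std::string(file) + ":" + std::to_string(line));
}

// Hypothetical two-parameter gate: call sites pass two arguments and the
// gate supplies the four release-mode defaults itself.
[[noreturn]] void ReportFailureGate(const char* file, int line) {
    CallExternalFunctionThatWillThrowAnException(file, line, "", "", 0, "");
}

// Release-mode macro routed through the gate.
#define OurMacro(condition)                    \
    if (condition) {                           \
    } else {                                   \
        ReportFailureGate(__FILE__, __LINE__); \
    }

// Returns true if the check fired (threw), false otherwise.
bool checkFires(bool condition) {
    try {
        OurMacro(condition);
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}
```

Whether the smaller expansion actually changes the inliner's decisions still has to be checked in the emitted code.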
GCC has a fairly large set of options that expose how their process works, documented here. It's of course not exact, given that it will be tweaked over time and it's CPU-dependent.
The first rule is "their body is smaller than expected function call code".
A second rule is "static functions called once".
There are also parameters affecting the inlining process, e.g. max-inline-insns-single. An insn is a pseudo-instruction in the GCC compiler, and is used here as a measure of function complexity. The documentation of parameter max-inline-insns-auto makes it clear that manually declaring a function inline might cause it to be considered for inlining even if it is too big for automatic inlining.
Inlining isn't an all-or-nothing process, since there's a -fpartial-inlining flag.
Of course, you can't consider inlining in isolation. Common Subexpression Elimination (CSE) makes code simpler. It's an optimization pass that may make a function small enough to be inlined. After inlining, new common subexpressions may be discovered so the CSE pass should be run again, which in turn might trigger further inlining. And CSE isn't the only optimization that needs rerunning.
The rules on what functions get inlined and under what conditions (e.g. selected optimization level) are specific to each compiler, so I suggest you check your compiler's documentation. However, a function that just forwards to another function (as you propose) should be a good candidate for inlining by any compiler that supports it.
Some compilers have a mechanism whereby you can flag that you really want a function to be inlined, e.g. MSVC++ has __forceinline.
If you are using Visual C++, you can use __forceinline to force the compiler to inline a function.

Is it okay to have a method declared inline if it has a for loop in C++?

I have a method like the one shown below.
Will the for loop always make the compiler forgo the "inline request"?
inline void getImsiGsmMapFrmImsi
(
const string& imsiForUEDir,
struct ImsiGsmMap& imsiGsmMap
)
{
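// Guard against an empty string: length() - 1 would wrap around to a huge value.
for (std::size_t i = 0 ; i + 1 < imsiForUEDir.length() ; i++)
{
imsiGsmMap.value[i] = imsiForUEDir[i] - '0' ;
}
imsiGsmMap.length = imsiForUEDir.empty() ? 0 : imsiForUEDir.length() - 1 ;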
}
You can specify "inline" and the compiler can ignore it if it feels like that.
Simply, no.
"inline" is just a hint to the compiler.
There are ways to force a compiler to inline something, but these ways are compiler-specific. Your code looks mobile to me, so here are some ways on some C++ compilers used on various mobile phone platforms:
Windows CE/ Windows Mobile VC++ ARM compiler uses the __forceinline keyword instead of the hint 'inline'.
A better compiler (i.e. one that makes faster output) for Windows CE/ Windows Mobile is cegcc, which uses the very latest GCC 4.4. In GCC, you write __attribute__((always_inline)) in front of the function declarator, e.g. inline __attribute__((always_inline)) void f() { ... }.
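A common way to wrap both spellings is a portability macro, sketched below. __forceinline (MSVC) and __attribute__((always_inline)) (GCC/Clang) are real compiler-specific extensions; the FORCE_INLINE name and the digitValue/digitSum helpers are hypothetical, loosely modeled on the question's digit conversion.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical portability macro around the compiler-specific spellings.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // plain hint elsewhere
#endif

// The attribute goes in front of the declarator in a definition.
FORCE_INLINE int digitValue(char c) { return c - '0'; }

// Sum the decimal digits of a NUL-terminated string of digits.
int digitSum(const char* digits) {
    int sum = 0;
    for (std::size_t i = 0; digits[i] != '\0'; ++i)
        sum += digitValue(digits[i]);  // inlining requested on GCC/Clang/MSVC
    return sum;
}
```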
The bigger thing is if it's a good idea to inline this loop. I program mobile phones for a living, and they don't have much CPU budget generally. But I'd be really surprised if this loop is a bottleneck. Strip your program of all the 'inline' decorations and when you're approaching shipping, if the program is slow, profile it!
Some compilers allow 'profile guided optimisation' where they can make an instrumented binary that you run in a realistic way, and then they use the data so gathered to make a production binary where they make informed decisions about code speed vs code size in the various parts of your program to give the very best mix of both.
"No inlining for functions with loops" is probably a heuristic from some particular compiler. It doesn't apply universally.
Every compiler uses some heuristics to determine whether the function should be inlined or not, but normally every compiler uses its own ones. So, to say that a loop will have some universal effect on inlining is not correct. It won't. There's absolutely nothing in your function that would somehow fundamentally preclude inlining. Most modern compilers can easily inline this function, if they deem it reasonable or if you force them to do it.
Yes, some compilers offer non-standard declaration specifiers (or compiler options) that will actually force the inlining, i.e. override the heuristic analysis, except for a number of situations when the inlining is truly beyond the capabilities of the compiler. For example, many modern C/C++ compilers normally can't inline functions with a variable number of parameters (variadic functions).
It is also commonly believed that recursive functions can't be inlined. In reality, many compilers can inline recursive functions to a certain fixed recursion depth, thus "compressing" the recursion.
I wonder if the inline keyword is even necessary anymore. Don't modern compilers mostly just ignore it and do whatever they think is best, anyway?
Most likely compilers will not inline a function with a loop, since what would be the point? If the code is looping, generally the cost of a function call will be unmeasurable noise compared to the looping.
But if a compiler wants to inline it (maybe the compiler is sophisticated enough to determine the loop bounds and can even unroll the loop), it's certainly allowed to.
But I wouldn't bet on it.
To summarize a previous answer I gave to this, the things you should watch out for when choosing a function for inlining are:
* local static variables
* loop constructs
* switch statements
* try/catch
* goto
* recursion
* and of course too much complexity (whatever that means)
Having said that, as the other answers here point out, it's basically unspecified whether the compiler inlines the function or not. 7.1.2/2 has:
A function declaration (8.3.5, 9.3, 11.4) with an inline specifier declares an inline function. The inline specifier indicates to the implementation that inline substitution of the function body at the point of call is to be preferred to the usual function call mechanism. An implementation is not required to perform this inline substitution at the point of call; however, even if this inline substitution is omitted, the other rules for inline functions defined by 7.1.2 shall still be respected.
An interesting detail on this is that the standard would normally label the kind of behaviour that's involved, for example "it is unspecified" or "the behaviour is undefined"; here it does neither.

What is wrong with using inline functions?

While it would be very convenient to use inline functions in some situations, are there any drawbacks with inline functions?
Conclusion:
Apparently, there is nothing wrong with using inline functions.
But it is worth noting the following points!
Overuse of inlining can actually make programs slower. Depending on a function's size, inlining it can cause the code size to increase or decrease. Inlining a very small accessor function will usually decrease code size while inlining a very large function can dramatically increase code size. On modern processors smaller code usually runs faster due to better use of the instruction cache. - Google Guidelines
The speed benefits of inline functions tend to diminish as the function grows in size. At some point the overhead of the function call becomes small compared to the execution of the function body, and the benefit is lost - Source
There are a few situations where an inline function may not be inlined:
For a function returning values; if a return statement exists.
For a function not returning any values; if a loop, switch or goto statement exists.
If a function is recursive. -Source
The __inline keyword causes a function to be inlined only if you specify the optimize option. If optimize is specified, whether or not __inline is honored depends on the setting of the inline optimizer option. By default, the inline option is in effect whenever the optimizer is run. If you specify optimize, you must also specify the noinline option if you want the __inline keyword to be ignored. -Source
It is worth pointing out that the inline keyword is actually just a hint to the compiler. The compiler may ignore the inline and simply generate code for the function someplace.
The main drawback to inline functions is that they can increase the size of your executable (depending on the number of call sites where they are expanded). This can be a problem on some platforms (e.g. embedded systems), especially if the function itself is recursive.
I'd also recommend making inlined functions very small. The speed benefits of inline functions tend to diminish as the function grows in size. At some point the overhead of the function call becomes small compared to the execution of the function body, and the benefit is lost.
It could increase the size of the executable, and I don't think compilers will always actually make them inline even though you used the inline keyword. (Or is it the other way around, like what Vaibhav said?...)
I think it's usually OK if the function has only 1 or 2 statements.
Edit: Here's what the linux CodingStyle document says about it:
Chapter 15: The inline disease
There appears to be a common misperception that gcc has a magic "make me faster" speedup option called "inline". While the use of inlines can be appropriate (for example as a means of replacing macros, see Chapter 12), it very often is not. Abundant use of the inline keyword leads to a much bigger kernel, which in turn slows the system as a whole down, due to a bigger icache footprint for the CPU and simply because there is less memory available for the pagecache. Just think about it; a pagecache miss causes a disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles that can go into these 5 milliseconds.
A reasonable rule of thumb is to not put inline at functions that have more than 3 lines of code in them. An exception to this rule are the cases where a parameter is known to be a compiletime constant, and as a result of this constantness you know the compiler will be able to optimize most of your function away at compile time. For a good example of this later case, see the kmalloc() inline function.
Often people argue that adding inline to functions that are static and used only once is always a win since there is no space tradeoff. While this is technically correct, gcc is capable of inlining these automatically without help, and the maintenance issue of removing the inline when a second user appears outweighs the potential value of the hint that tells gcc to do something it would have done anyway.
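The compile-time-constant exception can be sketched as follows. sizeToBucket and bucketForHeader are hypothetical and only imitate the shape of a kmalloc-style size check (this is not the kernel's code): when the argument is a constant at the call site and the function is inlined, the whole comparison chain can be evaluated at compile time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kmalloc-style helper: maps a requested size to a bucket index.
inline int sizeToBucket(std::size_t size) {
    if (size <= 8)   return 0;
    if (size <= 16)  return 1;
    if (size <= 32)  return 2;
    if (size <= 64)  return 3;
    if (size <= 128) return 4;
    return 5;  // "large" allocations
}

// Constant argument: after inlining, this can reduce to returning 2.
int bucketForHeader() { return sizeToBucket(24); }
```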
There is a problem with inline - once you defined a function in a header file (which implies inline, either explicit or implicit by defining a body of a member function inside class) there is no simple way to change it without forcing your users to recompile (as opposed to relink). Often this causes problems, especially if the function in question is defined in a library and header is part of its interface.
I agree with the other posts:
inline may be superfluous because the compiler will do it
inline may bloat your code
A third point is that it may force you to expose implementation details in your headers, e.g.:
class OtherObject;
class Object {
public:
void someFunc(OtherObject& otherObj) {
otherObj.doIt(); // Yikes requires OtherObj declaration!
}
};
Without the inline, a forward declaration of OtherObject was all you needed. With the inline, your header needs the definition of OtherObject.
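For comparison, a sketch of the non-inline arrangement, squeezed into one file so it compiles on its own. The comments mark which (hypothetical) file each part would live in, and the calls counter is added only so the result can be observed: the header needs just the forward declaration, and only the source file sees the full OtherObject.

```cpp
#include <cassert>

// ---- object.h (hypothetical): a forward declaration is enough ----
class OtherObject;
class Object {
public:
    void someFunc(OtherObject& otherObj);  // declared only, not defined in the class
};

// ---- other_object.h (hypothetical) ----
class OtherObject {
public:
    int calls = 0;
    void doIt() { ++calls; }
};

// ---- object.cpp (hypothetical): only here is the full definition needed ----
void Object::someFunc(OtherObject& otherObj) {
    otherObj.doIt();
}
```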
As others have mentioned, the inline keyword is only a hint to the compiler. In actual fact, most modern compilers will completely ignore this hint. The compiler has its own heuristics to decide whether to inline a function, and quite frankly doesn't want your advice, thank you very much.
If you really, really want to make something inline, if you've actually profiled it and looked at the disassembly to ensure that overriding the compiler heuristic actually makes sense, then it is possible:
In VC++, use the __forceinline keyword
In GCC, use __attribute__((always_inline))
The inline keyword does have a second, valid purpose, however: defining functions in header files outside a class definition. The inline keyword is needed so that including the header in several translation units does not produce multiple-definition errors at link time.
I doubt it. Even the compiler automatically inlines some functions for optimization.
I don't know if my answer's related to the question but:
Be very careful about inline virtual methods! Some buggy compilers (previous versions of Visual C++, for example) would generate inline code for virtual methods where the standard behaviour was to dispatch down the inheritance tree and call the appropriate override.
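The rule a correct compiler has to follow, as a small sketch with hypothetical Base/Derived classes: a call through a base reference must reach the override at run time, while a call on an object whose dynamic type is known at compile time may safely be inlined.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 1; }  // defined in the class, so implicitly inline
};

struct Derived : Base {
    int id() const override { return 2; }
};

// Call through a reference: must dispatch dynamically to the override.
int viaBase(const Base& b) { return b.id(); }

// Dynamic type known here: the compiler may inline Derived::id() directly.
int direct() {
    Derived d;
    return d.id();
}
```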
You should also note that the inline keyword is only a request. The compiler may choose not to inline it, likewise the compiler may choose to make a function inline that you did not define as inline if it thinks the speed/size tradeoff is worth it.
This decision is generally made based on a number of things, such as the setting between optimise for speed (avoids the function call) and optimise for size (inlining can cause code bloat, so it isn't great for large, repeatedly used functions).
With the VC++ compiler you can override this decision by using __forceinline.
So in general:
Use inline if you really want to have a function in a header, but elsewhere there's little point, because if you're going to gain anything from it, a good compiler will be making it inline for you anyway.
Inlining larger functions can make the program larger, resulting in more instruction cache misses and making it slower.
Deciding when a function is small enough that inlining will increase performance is quite tricky. Google's C++ Style Guide recommends only inlining functions of 10 lines or less.
(Simplified) Example:
Imagine a simple program that just calls function "X" 5 times.
If X is small and all calls are inlined: Potentially all instructions will be prefetched into the instruction cache with a single main memory access - great!
If X is large, let's say approaching the capacity of the instruction cache:
Inlining X will potentially result in fetching instructions from memory once for each inline instance of X.
If X isn't inlined, instructions may be fetched from memory on the first call to X, but could potentially remain in the cache for subsequent calls.
Excessive inlining of functions can increase the size of the compiled executable, which can have a negative impact on cache performance, but nowadays compilers decide about function inlining on their own (depending on many criteria) and largely ignore the inline keyword.
Among other issues with inline functions, which I've seen heavily overused (I've seen inline functions of 500 lines), what you have to be aware of are:
build instability
Changing the source of an inline function causes all the users of the header to recompile
#includes leak into the client. This can be very nasty if you rework an inlined function and remove a no-longer used header which some client has relied on.
executable size
Every time an inline is inlined instead of a call instruction the compiler has to generate the whole code of the inline. This is OK if the code of the function is short (one or two lines), not so good if the function is long
Some functions can produce a lot more code than at first appears. A case in point is a 'trivial' destructor of a class that has a lot of non-POD member variables (or two or three member variables with rather messy destructors). A call has to be generated for each member's destructor.
execution time
This is very dependent on your CPU cache and shared libraries, but locality of reference is important. If the code you might be inlining happens to be held in CPU cache in one place, a number of clients can find the code and not suffer from a cache miss and the subsequent memory fetch (and worse, should it happen, a disk fetch). Sadly this is one of those cases where you really need to do performance analysis.
The coding standard where I work limit inline functions to simple setters/getters, and specifically say destructors should not be inline, unless you have performance measurements to show the inlining confers a noticeable advantage.
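The hidden cost of a 'trivial' destructor can be observed with a sketch like this. Member and Holder are hypothetical; Holder has no user-written destructor, yet destroying it runs one destructor per member, and every inlined expansion of Holder's implicit destructor must contain that work.

```cpp
#include <cassert>

// Hypothetical member type that counts destructor runs.
struct Member {
    static int destroyed;
    ~Member() { ++destroyed; }
};
int Member::destroyed = 0;

// No user-written destructor, but the implicit one destroys a, b and c.
struct Holder {
    Member a, b, c;
};

int destructorsRunForOneHolder() {
    Member::destroyed = 0;
    { Holder h; }  // h's implicit destructor runs at the closing brace
    return Member::destroyed;
}
```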
In addition to other great answers, at least once I saw a case where forced inlining actually slowed down the affected code by 1.5x. There was a nested loop inside (pretty small one) and when this function was compiled as a separate unit, compiler managed to efficiently unroll and optimize it. But when same function was inlined into much bigger outer function, compiler (MSVC 2017) failed to optimize this loop.
As other people have said, inline functions can create problems if the code is large. Every instruction occupies memory, so overusing inline functions makes the code larger and can make it take more time to execute.
There are a few other situations where inline may not work:
It does not work for recursive functions.
It may also not work with static variables.
It also may not work if the function uses a loop, switch, etc., or more generally if it has multiple statements.
And main cannot be declared inline.