I'm wondering if C++ will still obey the inline keyword when a function is passed as an argument. In the following example, would a new frame for onFrame be pushed onto the stack every time frame() is called in the while loop?
#include <functional>

bool interrupt = false;
void run(std::function<void()> frame) {
while(!interrupt) frame();
}
inline void onFrame() {
// do something each frame
}
int main() {
run(onFrame);
}
Or would changing to this have any effect?
void run(std::function<inline void()> frame) {
while(!interrupt) frame();
}
If you have no definitive answer, can you help me find a way to test this? Possibly using memory addresses or some sort of debugger?
It's going to be pretty hard for the compiler to inline your function if it has to go through std::function's type-erased dispatch to get there. It's possible it'll happen anyway, but you're making it as hard as possible. Your proposed alternative (taking a std::function<inline void()> argument) is ill-formed.
If you don't need type erasure, don't use type erasure. run() can simply take an arbitrary callable:
template <class F>
void run(F frame) {
while(!interrupt) frame();
}
That is much easier for the compiler to inline. Although, simply having an inline function does not in and of itself guarantee that the function gets inlined. See this answer.
Note also that passing a function pointer makes the call less likely to get inlined, which is awkward. I'm trying to find an answer on here that had a great example, but until then: if inlining is super important, wrapping the call in a lambda may be the way to go:
run([]{ onFrame(); });
still obey the inline keyword ... would a new frame ... be pushed onto the stack
That isn't what the inline keyword does in the first place (see this question for extensive reference).
Assuming, as Barry does, that you're hoping to persuade the optimiser to inline your function call (once more for luck: this is nothing to do with the inline keyword), function template+lambda is probably the way to go.
To see why this is, consider what the optimiser has to work with in each of these cases:
function template + lambda
template <typename F>
void run(F frame) { while(!interrupt) frame(); }
// ... call site ...
run([]{ onFrame(); });
Here, the function only exists at all (is instantiated from the template) at the call site, with everything the optimiser needs to work with in scope and well-defined.
Note the optimiser may still reasonably choose not to inline a call if it thinks the extra instruction cache pressure will outweigh the savings from eliding the stack frame.
function pointer
void run(void (*frame)()) { while(!interrupt) frame(); }
// ... call site ...
run(onFrame);
Here, run may have to be compiled as a standalone function (although that copy may be thrown away by the linker if it can prove no-one uses it), and the same goes for onFrame, especially since its address is taken. Finally, the optimiser may need to consider whether run is called with many different function pointers, or just one, when deciding whether to inline these calls. Overall, it seems like more work, and may end up as a link-time optimisation.
NB. I used "standalone function" to mean the compiler likely emits the code & symbol table entry for a normal free function in both cases.
std::function
This is already getting long. Let's just notice that this class goes to great lengths (the type erasure Barry mentioned) to make the function
void run(std::function<void()> frame);
not depend on the exact type of the function, which means hiding information from the compiler at the point it generates the code for run, which means less for the optimiser to work with (or conversely, more work required to undo all that careful information hiding).
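To make that concrete, here is a rough, much-simplified sketch of the kind of machinery such a class needs (real implementations are considerably more elaborate, with small-object optimisations and so on; the names callable_base and callable_impl are made up for illustration):

// The concrete callable type F is hidden behind a virtual call, so the
// code generated for run() sees only callable_base, never F itself.
struct callable_base {
    virtual void invoke() = 0;
    virtual ~callable_base() {}
};

template <class F>
struct callable_impl : callable_base {
    F f;
    explicit callable_impl(F f) : f(f) {}
    void invoke() override { f(); }
};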
As for testing what your optimiser does, you need to examine this in the context of your whole program: it's free to choose different heuristics depending on code size and complexity.
To be totally sure what it actually did, just disassemble with source or compile to assembler. (Yes, that's potentially a big "just", but it's platform-specific, not really on-topic for the question, and a skill worth learning anyway).
Compile for release and check the listing files, or turn on disassembly in the debugger. The best way to know is to check the generated code.
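For example (assuming g++; other compilers have equivalent switches), you can compile to assembly and search for calls to the function:

g++ -O2 -S main.cpp
grep onFrame main.s

If the (mangled) name appears only at its definition and never as the target of a call instruction, the calls were inlined. The exact output is platform-specific, so treat this as a sketch.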
The basic question:
Edit: the question is the part directly below.
class foo {
public:
constexpr foo() { }
constexpr int operator()(const int& i) { return int(i); }
};
Performance is a non-trivial issue. How does the compiler actually compile the above? I know how I want it to be resolved, but how does the specification actually specify it will be resolved?
1) Seeing that the type int has a constexpr constructor, create an int object and compile the string of bytes that makes up the value directly into the code?
2) Replace any calls to the overload with a call to int's constructor (even though, for some unknown reason, int doesn't have constexpr constructors)? (Inlining the call.)
3) Create a function, call the function, and have that function call int's constructor?
Why I want to know, and how I plan to use the knowledge
Edit: the following is background only.
The real library I'm working with uses template arguments to decide how a given type should be passed between functions. That is, by reference or by value because the exact size of the type is unknown. It will be a user's responsibility to work within the limits I give them, but I want these limits to be as light and user friendly as I can sanely make them.
I expect a simple single-byte character to be passed around, in which case it should be passed by value. But I do not bar a 300-megabyte behemoth that does several minutes of recalculation every time its copy constructor is invoked, in which case passing by reference makes more sense. I have only a list of requirements that a type must comply with, not a set cap on what a type can or cannot do.
Why I want to know the answer to my question is so I can in good faith make a function object that accepts this unknown template argument and then decides how, when, or even how much of an object should be copied, via a virtual member function and a pointer allocated with new if so required. If the compiler resolves constexpr badly, I need to know so I can abandon this line of thought and/or find a new one. Again, it will be a user's responsibility to work within the limits I give them, but I want these limits to be as light and user-friendly as I can sanely make them.
Edit: Thank you for your answers. The only real question was the second sentence, and it has now been answered. Everything else is background. If more background is required, allow me to restate the above:
I have a template with four arguments. The goal of the template is a routing protocol, be that TCP/IP (unlikely) or node-to-node within a game (possible). The first two arguments are for data storage; they have no requirements beyond a list of operators each must support. The last two define how the data is passed within the template. By default this is by reference; for performance and freedom of use, it can be changed to pass information by value at a user's request.
Each is expected to be a single byte long, but in the case of a metric for an EIGRP- or OSPF-like protocol, the second template argument could be a compound of a dozen or more different variables, each taking a non-trivial time to copy or recompute.
For ease of use, I am investigating the use of a function object that accepts the third and fourth template arguments, to handle special cases and polymorphic classes that would otherwise fail to function or copy correctly. The goal is not to force a user to rebuild their objects from scratch; that would require planning for virtual functions to perform deep copies, or any number of other unknown oddities. The usefulness of the function object depends on how sanely a compiler can be depended on not to generate a cascade of function calls.
More helpful I hope?
The C++11 standard doesn't say anything about how constexpr will be compiled down to machine instructions. The standard just says that expressions that are constexpr may be used in contexts where a compile time constant value is required. How any particular compiler chooses to translate that to executable code is an implementation issue.
Now in general, with optimizations turned on you can expect a reasonable compiler to not execute any code at runtime for many uses of constexpr but there aren't really any guarantees. I'm not really clear on what exactly you're asking about in your example so it's hard to give any specifics about your use case.
constexpr expressions are not special. For all intents and purposes, they're basically const unless the context they're used in is constexpr and all the variables/functions involved are also constexpr. How the compiler chooses to handle this is an implementation detail; the Standard never deals with implementation details because it speaks in abstract terms.
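As a minimal sketch of that distinction (assuming a C++11-or-later compiler; the const on operator() is written out so the example also compiles under C++14 rules, where constexpr no longer implies const):

struct foo {
    constexpr foo() { }
    constexpr int operator()(const int& i) const { return i; }
};

int main() {
    constexpr foo f;
    // A constant-expression context forces compile-time evaluation:
    static_assert(f(3) == 3, "evaluated at compile time");
    // An ordinary runtime context: the compiler may fold the call,
    // but the Standard does not require it to.
    int x = f(42);
    (void)x;
}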
In my implementation files (.cc files), I often find that it's convenient to define member functions in class definitions (such as the functions of a Pimpl class). For example:
struct X::Impl {
void DoSomething() {
...
}
};
instead of
struct X::Impl {
void DoSomething();
};
void X::Impl::DoSomething() {
...
}
I think this is preferable to implementing the function outside of the class definition for several reasons. It enhances readability and facilitates the practice of keeping methods small (by making it easy to add them). The code is also easier to maintain since you never have to update method declarations.
The only downside I see is that methods defined in the class declaration are implicitly inlined, which is not usually desirable because of the increase in the size of the object code.
My questions are:
Do I have this right? Are there other downsides to this practice that I'm missing?
Is the implicit inlining something to worry about? Is the compiler smart enough to reject my implicit request to inline methods that shouldn't be inlined?
Is it possible (via compiler extensions or otherwise) to declare that a method defined in the class definition not be inlined?
The simple answer is that you should not care. Member functions defined within the class definition are implicitly inline, but that does not mean that they are inlined (i.e. the code need not be inlined at the place of call).
Compiler implementors have dedicated quite a bit of time and resources to come up with heuristics that determine whether actual inlining should be done or not, based on the size of the function, the complexity and whether it can be inlined at all or not (a recursive function cannot be inlined[*]). The compiler has more information on the generated code and the architecture in which it will run than most of us have. Trust it, then if you feel that there might be an issue, profile, and if profiling indicates that you should change the code, do it, but make an informed decision after the fact.
If you want to verify whether the function has actually been inlined or not, you can look at the assembly and check whether there are calls to the function or the code was really inlined.
[*] If the compiler can transform the recursion into iteration, as is the case with tail recursion, then the transformed function could theoretically be inlined. But then, functions with loops have a lower probability of being inlined anyway...
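As an aside, tail recursion is the easy case for that transformation. A sketch (the name factorial is just for illustration):

// Nothing happens after the recursive call, so the optimiser can
// rewrite this as a loop; once it is a loop, inlining becomes feasible
// (though, as noted above, loops lower the odds of inlining anyway).
inline int factorial(int n, int acc = 1) {
    return n <= 1 ? acc : factorial(n - 1, acc * n);
}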
I don't know of a portable way of preventing inlining, but with GCC you can use an attribute. For example:
struct X::Impl {
__attribute__((noinline)) void DoSomething() {
...
}
};
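Other compilers have their own spellings, none of which is fully portable. GCC and Clang also accept the C++11 attribute syntax, and MSVC uses __declspec. A sketch:

struct X { struct Impl; }; // context for the example

struct X::Impl {
    [[gnu::noinline]] void DoSomething() { // GCC/Clang C++11 spelling
        // ...
    }
    // MSVC equivalent: __declspec(noinline) void DoSomething() { ... }
};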
Add up all the CPU time that will ever be saved by your inlining choices. Spend less than that of your own time on them. Linear path-length worries are for paths that now handle trillions of executions a day. Four billion is near enough a second out of L1; 250 times that isn't five minutes. If nobody's complaining about performance, your and your compiler's choices are at least playing in the right league.
Opinions on readability differ widely. It's your audience's opinion that matters.
Generally, the compiler will inline whatever it pleases, and it won't pay too much attention to any inline keywords given by the programmer. As far as code generation is concerned, inline is usually treated more or less the same as static.
For example, a recursive function can usually not be completely inlined (the function body you'd inline would again contain a call that should be inlined). To test the behavior of your specific compiler, you could for example define a class like this:
class Fib {
public:
int calc(int i);
};
int Fib::calc(int i) {
if (i > 1)
return calc(i-1) + calc(i-2);
return 1;
}
With this definition, a sample program like the following could not be compiled in reasonable time or space if the compiler felt obligated to inline all calls to calc:
#include "tst.hpp"
#include <iostream>
int main() {
Fib f;
std::cout << f.calc(1000) << std::endl;
}
So you can test your compiler's behavior by compiling this program. If compilation succeeds, the compiler didn't inline all calls to Fib::calc. (Don't bother running it: the naive recursion would take astronomically long for an argument of 1000, and would overflow int anyway; the test here is only whether it compiles.)
I am looking a tool able to detect ordered function call pairs in a nested fashion as shown below:
f()         // depth 0
    f()     // depth 1
    g()
g()
At each depth, a call of f() must be matched by a call of g(), forming a function call pair. This is particularly important for critical section entry and exit.
In C++, one option is to wrap the calls to f() and g() in the constructor and destructor of a class and only call those functions by instantiating an instance of that class. For example,
struct FAndGCaller
{
FAndGCaller() { f(); }
~FAndGCaller() { g(); }
};
This can then be used in any scope block like so:
{
FAndGCaller call_f_then_later_g; // calls f()
} // calls g()
Obviously in real code you'd want to name things more appropriately, and often you'll simply want to have the contents of f() and g() in the constructor and destructor bodies, rather than in separate functions.
This idiom of Scope-Bound Resource Management (SBRM), more commonly referred to as Resource Acquisition Is Initialization (RAII), is quite common.
You may abuse a for-loop for this.
#define SAVETHEDAY for (bool seen = ((void)f(), true); seen; seen = ((void)g(), false))
The comma operator always lets your function f be executed before the dependent statement and g afterwards. E.g.:
SAVETHEDAY {
SAVETHEDAY {
}
}
Pros:
- Makes nesting levels clear.
- Works for C++ and C99.
- The pseudo for-loop will be optimized away by any decent compiler.

Cons:
- You might have surprises with break, return and continue inside the blocks, so g might not be called in such a situation.
- For C++, this is not safe against a throw inside; again, g might not be called.
- Will be frowned upon by many people, since it is in some sense extending the language(s).
- Will be frowned upon by many people, especially for C++, since such macros that hide code are generally thought to be evil.
The problem with continue can be repaired by doing things a bit more cleverly.
The first two cons can be circumvented in C++ by using a dummy type as the for-variable that just has f and g in its constructor and destructor, as sketched below.
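A sketch of that fix (the names FgGuard and once are made up; f() and g() are assumed to be declared as in the question):

// The guard's destructor runs when the for-scope is left for any
// reason, so g() is called even on break, return, continue or a throw.
struct FgGuard {
    FgGuard()  { f(); }
    ~FgGuard() { g(); }
};

#define SAVETHEDAY for (FgGuard guard, *once = &guard; once; once = 0)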
Scan through the code (that's the hard part) and every time you see an invocation of f(), increment a counter. Every time you see an invocation of g(), decrement the counter. At the end, the counter should be back to zero. If it ever goes negative, that's a problem as well (you had a call to g() that wasn't preceded by a matching call to f()).
Scanning the code accurately is the hard part though -- with C and (especially) C++, writing code to understand source code is extremely difficult. Offhand, I don't know of an existing tool for this particular job. You could undoubtedly get clang (for one example) to do it, but while it'll be a lot easier than doing it entirely on your own, it still won't be trivial.
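As a deliberately naive illustration of the counting idea (a purely textual scan: comments, strings, macros, intra-line ordering and identifiers like if( will all fool it, which is exactly why accurate scanning is the hard part; the file name is made up):

#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream in("input.c");
    std::string line;
    long depth = 0;
    while (std::getline(in, line)) {
        std::string::size_type pos = 0;
        // Count textual occurrences of "f(" and "g(" per line.
        while ((pos = line.find("f(", pos)) != std::string::npos) { ++depth; ++pos; }
        pos = 0;
        while ((pos = line.find("g(", pos)) != std::string::npos) {
            if (--depth < 0) std::cerr << "g() without a matching f()\n";
            ++pos;
        }
    }
    if (depth != 0) std::cerr << depth << " call(s) to f() never matched by g()\n";
}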
The Coccinelle tool for semantic searching and patching of C code is designed for this sort of task (see also this LWN article on the tool).
Can anyone tell me the main difference between an inline function and recursive function?
These are unrelated concepts.
A function may be declared inline, which signals to the compiler that any call to the function should be replaced by an implementation of the function directly at the point the call is made. It's vaguely like implementing some piece of logic as a macro, but it retains the clean semantics of a normal function call.
A recursive function is simply one that calls itself.
Note that the inline keyword is just a suggestion. The compiler is free to ignore it whenever it wants.
Also note that a recursive function can also be declared inline. The compiler may, in principle, be able to inline a recursive function by transforming it into an iterative algorithm inside the calling function. However, recursion is usually one of the things that will make a compiler give up on inlining.
These are two very different concepts.
99% of programming languages allow recursive functions. A recursive function calls itself to get something done. Most recursive functions can be rewritten as loops.
E.g., a simple recursive function:
int Factorial(int f)
{
if(f > 1)
return f * Factorial(f-1);
else
return 1;
}
An inlined function is a hint to the compiler that you don't want the processor to jump to this function; instead, just include the op-codes for the function wherever it is used. This produces faster code for some calls on some architectures.
Be aware that most modern compilers targeting non-embedded processors will choose to ignore your "inline" hints, and will choose what to inline itself.
Hope this helps; apologies if badly formatted, typed on my iPhone.
A recursive function is a function that calls itself.
An inlined function is a function that is "inserted into another function"; i.e., if you have an inlined function add(a,b) and you call it from a function func, the compiler may be able to integrate the function body of add into the body of func, so the arguments don't need to be pushed onto the stack.
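For instance, a sketch of what that permits (nothing here is guaranteed; it is entirely up to the compiler):

inline int add(int a, int b) { return a + b; }

int func() {
    // If the compiler inlines add, this can compile down to the
    // equivalent of `return 3;`, with no call and no argument passing.
    return add(1, 2);
}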
The question: Is there benefit to passing an integral type by const reference as opposed to simply by value.
ie.
void foo(const int& n); // case #1
vs
void foo(int n); // case #2
The answer is clear for user-defined types: case #1 avoids needless copying while ensuring the constness of the object. However, in the above case the reference and the integer (at least on my system) are the same size, so I can't imagine there being a whole lot of difference in how long the function call takes (due to copying). My question is really about the compiler inlining the function:
For very small inline functions, will the compiler have to make a copy of the integer in case #2? By letting the compiler know we won't change the reference can it inline the function call without needless copying of the integer?
Any advice is welcome.
Passing a built-in int type by const ref will actually be a minor de-optimization (generally), at least for a non-inline function. The compiler may have to actually pass a pointer that has to be dereferenced to get the value. You might think it could always optimize this away, but aliasing rules and the need to support separate compilation might force the compiler's hand.
However, for your secondary question:
For very small inline functions, will the compiler have to make a copy of the integer in case #2? By letting the compiler know we won't change the reference can it inline the function call without needless copying of the integer?
The compiler should be able to optimize away the copy or the dereference if semantics allow it, since in that situation it has full knowledge of the state at the call site and of the function implementation. It'll likely just load the value into a register, have its way with it, and reuse the register for something else when it's done with the parameter. Of course, all this is very dependent on the actual implementation of the function.
I actually find it irritating when somebody uses const references like this for the basic datatypes. I can't see any benefit of doing this, although it may be argued that for datatypes bigger than sizeof(pointer) it may be more efficient. Although, I really don't care about such minute 'optimizations'.
It depends on the compiler, but I'd expect that any reasonable optimizer would give you the same results either way.
I tested with gcc, and the results were indeed the same. Here's the code I tested:
inline int foo(const int& n) {
return n * 2;
}
int bar(int x) {
int y = foo(x);
return y;
}
(with and without const & on foo's n parameter)
I then compiled with gcc 4.0.1 with the following command line:
g++ -O3 -S -o foo.s foo.cc
The outputs of the two compiles were identical.
It's usually not worth it. Even for an inline function, the compiler won't be stupid. The only time I would say it's appropriate is if you had a template; it might not be worth the extra effort to specialize for builtins just to take a copy instead of a reference.
You can use boost::call_traits<your type>::param_type for optimal parameter passing. It defaults to simple by-value passing for primitive types and passing by const reference for structs and classes.
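For example (a sketch; twice and Heavy are made-up names for illustration):

#include <boost/call_traits.hpp>

// param_type resolves to plain int for int, but to const Heavy& for a
// class type, so each parameter is passed the cheaper way automatically.
template <typename T>
T twice(typename boost::call_traits<T>::param_type value) {
    return value + value;
}

// Note that T cannot be deduced through call_traits, so call sites
// must spell it out: twice<int>(21);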
A lot of people are saying there's no difference between the two. I can see one (perhaps contrived) case in which the difference would matter...
int i = 0;
void f(const int &j)
{
i++;
if (j == 0)
{
// Do something.
}
}
void g()
{
f(i);
}
But, as others have mentioned, integers and pointers are likely to be of similar size. For something as small as an integer, a reference will decrease your performance. It probably won't be noticeable unless your method is called a lot, but it will be there. On the other hand, under some circumstances the compiler may optimize it out.
When writing or using templates, you may end up with (const int &) because the template writer can't know what the type actually is. If the object is heavyweight, passing a reference is the right thing to do; if it's an int or something, the compiler may be able to optimize it away.
In the absence of some kind of external requirement, there is generally no reason to do something like this for a one-off function -- it's just extra typing, plus throwing around references actually tends to inhibit optimization. Copying small data in registers is much cheaper than reloading it from memory in case it's changed!
I can't think of any benefit. I've even seen it recommended that, when writing templates, you use meta-programming to pass integral types by value and use const references only for non-integral types.
The cost of a reference is typically the same as that of an integral type, but the reference involves an indirection that has to take place, because the reference to some memory has to be resolved into a value.
Just copy by value, stick to an immutable convention for built-in types.
Don't do this. int is the same size as a pointer/reference on common 32-bit platforms, and smaller on 64-bit ones, so you could get a performance disadvantage instead of a benefit. I mean, all function arguments are pushed onto the stack in order so that the function can read them, and what is pushed will be either your int, or its address in the case of a reference. Another disadvantage is that the callee will either access your n through an indirection (dereferencing an address), or it will make a copy on its stack as an optimization.
If you make some changes to an int passed by value, it might be written either back to the place on the stack where it was passed, or to a new stack position. The second case naturally isn't advantageous, but shouldn't happen. By consting you bar yourself from making such a change, but this would work the same with const int.
In the properly inlined case it doesn't matter, naturally, but keep in mind that not everything you mark inline will actually be inlined.
Please read Want Speed? Pass by Value by Dave Abrahams.
It's not only performance.
A true story: this week I noticed that a colleague tried to improve upon Numerical Recipes and replaced the macro
#define SHFT(a,b,c,d) do { (a)=(b); (b)=(c); (c)=(d); } while (0)
by this function
inline void Rotate(double& dFirst, double& dSecond, double& dThird, const double dNewValue)
{
dFirst = dSecond;
dSecond = dThird;
dThird = dNewValue;
} // Function Rotate
This would have worked if he had passed the last parameter by reference, but as it is, this code
Rotate(dum,*fb,*fa,dum);
which was supposed to swap *fa and *fb, no longer works.
Passing it by non-const reference is not possible, as in other places non-l-values are passed as the last parameter.
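For reference, a sketch of the fix hinted at above: taking the last parameter by const reference re-reads dNewValue after dFirst has been updated, matching the macro's behaviour when the same variable is passed twice, and a const reference still binds to the non-l-values passed elsewhere.

inline void Rotate(double& dFirst, double& dSecond, double& dThird,
                   const double& dNewValue)
{
    dFirst = dSecond;   // when dNewValue aliases dFirst, this updates both
    dSecond = dThird;
    dThird = dNewValue; // re-reads through the reference, as the macro did
} // Function Rotate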