What is the performance overhead of calling a std::mem_fn vs. a raw call to the wrapped member function? I would hope that with an optimizing compiler it is zero, but I'm not sure: there may be polymorphic mechanics involved, as there are in std::function, and I don't really know the internals.
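For concreteness, this is the kind of comparison I have in mind (Widget and value are just illustrative names):

#include <functional>

struct Widget {
    int v = 42;
    int value() const { return v; }
};

int main() {
    Widget w;

    int a = w.value();                     // raw call to the member function

    auto f = std::mem_fn(&Widget::value); // wrap it in a mem_fn
    int b = f(w);                          // call through the wrapper

    return a - b;                          // 0 if both paths agree
}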
I was wondering whether that sort of thing falls under the compiler's optimization purview. From what I've gathered from this talk, even std::unique_ptr is not truly "zero-cost", in part due to the implicit indirection of the this pointer, which plays a role in passing the actual underlying pointer to the unique pointer's member functions.
Is there always a reference involved in passing member fields to methods of a class, or can the compiler see that, say, a method does not use certain fields and does not modify the rest, so it will just pass them by value?
There seems to be some misunderstanding of what the talk discusses. There is a discussion of why unique_ptr is not "zero-cost". However, that discussion focuses on a specific case: the transfer of ownership. On the one hand, since there is a situation where there are costs, it is true that unique_ptr is not zero-cost. On the other hand, that conclusion is misleading, as it sounds just as all-encompassing as saying that it is zero-cost. A more accurate description combines the two views: unique_ptr can be a zero-cost replacement for a raw pointer, but not always.
A unique_ptr can be zero-cost. The first question from the Q&A session at the end of the talk addresses this (starting at 36:36). Most member functions of unique_ptr are simple enough to be inlined by any C++ compiler that understands template syntax. There is no overhead associated with the this pointer and member functions. If you never transfer ownership, go on thinking of unique_ptr as zero-cost.
The extra cost comes when ownership is transferred. The talk specifically focused on passing a unique_ptr as a parameter to a function. This unambiguously gives the called function ownership of whatever the pointer points to. It also entails an additional run-time cost (two additional costs if the raw pointer version lacked exception safety).
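A sketch of the kind of signatures being compared (Resource is a placeholder type of my own; the talk's actual examples may differ):

#include <memory>

struct Resource { int data = 0; };

// Raw-pointer version: T* is a primitive type, so it can be passed
// directly in a register; ownership transfer is only a convention.
void consume_raw(Resource* r) { delete r; }

// unique_ptr version: ownership transfer is explicit and exception-safe,
// but the unique_ptr is a class instance, which changes how it is passed.
void consume_owning(std::unique_ptr<Resource> r) { /* freed on return */ }

int main() {
    consume_raw(new Resource);                     // implicit ownership transfer
    consume_owning(std::make_unique<Resource>());  // explicit ownership transfer
}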
The extra cost is not intrinsic to the C++ language, but rather comes from a commonly-used ABI (application binary interface). The ABI defines at a low level (think assembly) how parameters are passed to functions. According to this convention, there is an important difference between T* and unique_ptr<T>: the former is a primitive type, while the latter is an instance of a class. If I understood this part of the talk correctly, the ABI calls for primitive types to be placed directly on the call stack (potentially simply stored in a register), whereas class instances must exist in main memory, and it is a pointer to the instance that goes on the stack (or in a register). Yes, even if the object is passed by value. Why? Because that's what the convention calls for. (There are better reasons, but they are tangential to the current subject.) In order for things like dynamic libraries to work, there needs to be a convention, and this ABI is what we have.
The upshot is that primitives receive preferential treatment, making them faster. When you switch a function's parameter from a pointer to a class instance, there is a runtime cost (of unknown size – it might be insignificant). This is the cost that prevents unique_ptr from being zero-cost in all cases. Zero-cost in many common cases, but not when a function takes a unique_ptr argument.
I have seen many instances where people have advised against using std::function<> because it is a heavyweight mechanism. Could someone please explain why that is so?
std::function is a type erasure class.
It takes whatever it is constructed from, and erases everything except the following (a minimal sketch of this erasure follows the list):
Invoke with the signature in question (with possible implicit casting)
Destroy
Copy
Cast back to exact original type
and possibly
Move
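Here is a minimal sketch of what that erasure can look like under the hood. This is heavily simplified (one fixed signature, no copy, no small-object optimization, and the names my_function/callable_base are mine); a real std::function is more involved:

#include <memory>
#include <utility>

// The erased interface: all that survives is "invoke" and "destroy".
struct callable_base {
    virtual int invoke(int) = 0;
    virtual ~callable_base() = default;
};

// One concrete wrapper per original callable type F.
template <class F>
struct callable_impl : callable_base {
    F f;
    explicit callable_impl(F fn) : f(std::move(fn)) {}
    int invoke(int x) override { return f(x); }
};

// A toy std::function<int(int)> without SOO: always heap-allocates.
struct my_function {
    std::unique_ptr<callable_base> impl;

    template <class F>
    my_function(F f) : impl(std::make_unique<callable_impl<F>>(std::move(f))) {}

    int operator()(int x) const { return impl->invoke(x); }
};

int main() {
    my_function g = [](int x) { return x + 1; };
    return g(41);  // invoked through a virtual call
}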
This involves some overhead. A typical decent-quality std::function will have small object optimization (like small string optimization), avoiding a heap allocation when the amount of memory used is small.
A function pointer will fit in there.
However, there is still overhead. If you initialize a std::function with a compatible function pointer, instead of directly calling the function pointer in question, you do a virtual function table lookup, or invoke some other function, which then invokes the function pointer.
With a vtable implementation, that is a possible data cache miss, then an instruction cache miss, then another instruction cache miss. With a raw function pointer, the pointer is probably stored locally and called directly, resulting in one possible instruction cache miss.
On top of this, in practice compilers understand function pointers better than std::function: a number of compilers can figure out that the pointer is a constant value during inlining or whole-program optimization. I have never seen one that pulls that off with std::function.
For larger objects (say, larger than sizeof(std::string) in one implementation), std::function also performs a heap allocation. This is another cost. For function pointers and reference wrappers, the small-object optimization is guaranteed by the standard.
Directly storing the lambda, without wrapping it in a std::function, is even better than a function pointer: in that case the code to run is implicit in the lambda's type. That makes it trivial for the compiler to work out what is going to happen when it is called, and makes inlining easy.
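To illustrate (my own example), compare these two ways of holding the same lambda:

#include <functional>

int direct() {
    auto sq = [](int x) { return x * x; };  // exact type known to the compiler
    return sq(3) + sq(4);                   // trivially inlined
}

int erased() {
    std::function<int(int)> sq = [](int x) { return x * x; };
    return sq(3) + sq(4);                   // indirect calls; inlining is hard
}

int main() { return direct() == erased() ? 0 : 1; }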
Only do type erasure when you need to.
Under the hood, std::function typically uses type erasure (one simplified explanation for how it may be implemented is here). The cost of storing your function object inside the std::function object may involve a heap allocation. The cost of invoking your function object is typically an indirection through a pointer plus a virtual function call. Also, while compilers are getting better at this, the virtual function call usually inhibits inlining of your function.
That being said, I recommend using std::function unless you know via measurements that the cost is too high (typically when you cannot afford heap allocations, your function will be called many times in a place that requires very low latency, etc.), as it is better to write straightforward code than to prematurely optimize.
Depending on the implementation, std::function will add some overhead due to its use of type erasure. There have been other implementations, such as Don Clugston's fast delegate, with a C++11 implementation here. Please note that it uses UB to make the fastest possible delegate, but it is still extremely portable.
If you want type erasure, it's the right tool for the job; it is almost certainly not your bottleneck, and not something you could write faster anyway.
However, sometimes it can be all too tempting to use type erasure when it really isn't required. That's where to draw the line. For example, if all you want to do is keep hold of a lambda locally, then it's probably not the right tool and you should just use:
auto l = [](){};
Likewise for function pointers you don't plan to type erase - just use a function pointer type.
You also don't need type erasure for templates from <algorithm>, or your own equivalents, because there's simply no need for heterogeneous functor types to coexist.
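For example, a comparator passed to std::sort keeps its exact type as a template argument, so nothing has to be erased:

#include <algorithm>
#include <vector>

void sort_descending(std::vector<int>& v) {
    // The lambda's exact type instantiates std::sort directly,
    // so the comparison can be inlined; no std::function involved.
    std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
}

int main() {
    std::vector<int> v{1, 3, 2};
    sort_descending(v);  // v is now {3, 2, 1}
}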
It's not so.
To put it simply: it's not too heavyweight unless you have profiled your program and shown that it is too heavyweight. Since evidently you have not (otherwise you would know the answer to this question), we can safely conclude that it is in fact not too heavyweight at all.
You should always profile, before concluding that it's too slow.
Considering that a pointer to a function (possibly returning yet another function pointer) is the mechanism used in C to introduce runtime polymorphism/callbacks, what is the equivalent way to implement this in C++, while improving locality and lowering the cost of pointers and indirections?
For example, this syntactic sugar can help, but I'm not really interested in that; although it's a nice way to do things in a C++ way instead of with a more C-ish typedef, I'm more interested in improving locality while trying to reduce the use of explicit pointers at runtime.
The real reason people use function pointers in C to emulate polymorphism is not performance but the fact that C supports neither real polymorphism nor templates. Those are the two alternatives you have in C++. All three approaches are compared in this thread.
Note that even though calling a function pointer does not require the additional vtable lookup that virtual function calls do, calling virtual functions and function pointers both suffer from the same major performance problem: Branch prediction in both cases is not as reliable and you tend to end up with more pipeline flushes.
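A sketch of the two C++ alternatives mentioned above (the names Callback, notify, etc. are illustrative):

#include <cstdio>

// 1. Real polymorphism: the target is chosen at run time via the vtable.
struct Callback {
    virtual void run() = 0;
    virtual ~Callback() = default;
};

struct PrintCallback : Callback {
    void run() override { std::puts("virtual dispatch"); }
};

void notify(Callback& cb) { cb.run(); }  // polymorphic call

// 2. Templates: the target is resolved at compile time and easily inlined,
//    which is what helps locality and removes the indirection.
template <class F>
void notify_static(F&& f) { f(); }

int main() {
    PrintCallback cb;
    notify(cb);
    notify_static([] { std::puts("static dispatch"); });
}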
I think you can use virtual functions to meet part of the requirement.
I didn't find information about performance issues with auto_ptr and shared_ptr (I use the TR1 implementation). Since shared_ptr is more complicated than auto_ptr, does that make auto_ptr faster? In general, the question is: what can you say about the performance of shared_ptr compared with auto_ptr?
PS: I know that unique_ptr is now preferable to auto_ptr, but not all compilers support it, so the question is about auto_ptr and shared_ptr.
When dereferencing there are no performance differences.
If you allocate your shared_ptrs with make_shared you don't waste an extra heap allocation.
When copying the pointer (not the object), a shared_ptr is a little bit slower because it needs to increment its reference counter.
However, this probably does not matter, so stick with shared_ptr combined with make_shared.
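A sketch of the allocation difference (Widget is an illustrative type):

#include <memory>

struct Widget { int x = 0; };

int main() {
    // Two heap allocations: one for the Widget, one for the control block.
    std::shared_ptr<Widget> a(new Widget);

    // One heap allocation holding both the Widget and its control block.
    auto b = std::make_shared<Widget>();

    auto c = b;  // copying only bumps the (atomic) reference count
}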
auto_ptr is a deprecated C++98 construct. As such, any performance benefit it might have is far outweighed by the fact that you shouldn't use it anymore in newly-written C++11 code, and instead badger the compiler / library maintainers to implement support for unique_ptr.
That being said, use what semantically fits your use case, unless you have profiled that you have a problem with *_ptr...
As with all performance issues: you need to measure it for yourself in your particular setup.
In general, though, you can expect some more overhead for a shared_ptr<> than for auto_ptr<> as it has to do some more work behind the scene to ensure proper shared behavior for the enclosed pointer.
On the other hand, the two are not really comparable: auto_ptr<> supports only one copy (ownership is transferred on copy), while shared_ptr<> supports multiple copies. So unless you only ever hold a single copy of the pointer, you cannot make a meaningful comparison. If you do use a single-copy pointer and you are sure you will not need more copies, use the specialized auto_ptr<>.
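To illustrate the semantic difference (a sketch; note that auto_ptr needs a pre-C++17 compiler):

#include <cassert>
#include <memory>

int main() {
    std::auto_ptr<int> a(new int(1));
    std::auto_ptr<int> b = a;     // "copying" transfers ownership
    assert(a.get() == 0);         // a has been emptied

    std::shared_ptr<int> c(new int(2));
    std::shared_ptr<int> d = c;   // genuine shared ownership
    assert(c.get() == d.get());   // both point to the same object
}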
In any case, whether the performance differences are significant depends on your particular project.
In general the only good answer is to measure.
But in this case you know that shared_ptr, unless you're using make_shared, involves a dynamic allocation of a counter block, and unless you have a really good small-object allocator, it's bound to be really slow (we're talking orders of magnitude) to create the first shared pointer to an object.
But then, perhaps with Boost and the standard library that allocation is really optimized. And maybe (probably!) you can use make_shared to perform just one allocation for both the object and its control block. So: measure.
By the way, auto_ptr is deprecated.
With any modern compiler use unique_ptr instead.
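The migration is mechanical; the only visible change is that the transfer of ownership must be spelled out (a minimal sketch):

#include <memory>
#include <utility>

int main() {
    std::unique_ptr<int> p(new int(5));
    // std::unique_ptr<int> q = p;          // error: copying is forbidden
    std::unique_ptr<int> q = std::move(p);  // the transfer must be explicit
}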
Is it correct that a class member function takes more time than a simple function? What if inheritance and virtual functions are used?
I have tried to collect my functions into a simple interface class (only member functions, no data members), and it appears that I lose time. Is there a way to fix it?
P.S. I'm checking with gcc and icc compilers and using -O3 option.
Premature optimization is the root of all evil
A nonstatic member function takes an additional argument: the object (a pointer or reference to it) on which the function is called. This is one overhead. If the function is virtual, then there is also one small indirection in the case of a polymorphic call, that is, the addition of the function index to the virtual table base offset. Both of these "overheads" are so negligible that you shouldn't worry about them unless the profiler says this is your bottleneck. Which it most likely is not.
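Conceptually, a member call is just a free call with one extra argument (a sketch of the equivalence, not real compiler output):

struct Counter {
    int n;
    int bump(int by) { return n += by; }
};

// Roughly what the compiler generates for Counter::bump:
int Counter_bump(Counter* self, int by) { return self->n += by; }

int main() {
    Counter c{0};
    c.bump(5);            // effectively Counter_bump(&c, 5)
    Counter_bump(&c, 5);  // the hand-written equivalent
    return c.n;           // 10
}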
Premature optimization is the root of all evil
Member functions, if they're not virtual, are the same as free functions. There is no overhead in invoking them.
However, in the case of virtual member functions there is overhead, as a call involves indirection; and even then it is only slower when you call the virtual function through a pointer or reference (a polymorphic call). Otherwise, there is no difference.
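For example (illustrative types), only the call through a base pointer pays the table lookup:

struct Base {
    virtual int f() { return 1; }
    virtual ~Base() = default;
};

struct Derived : Base {
    int f() override { return 2; }
};

int main() {
    Derived d;
    int a = d.f();    // static type is known: no vtable lookup required
    Base* p = &d;
    int b = p->f();   // polymorphic call: dispatched through the vtable
    return a + b;     // 3
}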
There is no additional time penalty involved for member functions. Virtual functions are slightly slower but not by much. Unless you're running an incredibly tight loop, even virtual function overhead is negligible.
For normal functions, a simple "jump" to them is enough, which is very fast. The same goes for non-virtual member functions. For virtual functions, on the other hand, the address to jump to has to be fetched from a table, which of course involves more machine instructions and is therefore slower. However, the difference is negligible and will hardly even be measurable.
In other words, don't worry about it. If you have slowdowns, then it's most likely (like 99.999%) something else. Use a profiler to find out where.