Overhead with std::function

Overhead with std::function - c++

I have seen many instances where people have advised against using std::function<> because it is a heavyweight mechanism. Could someone please explain why that is so?

std::function is a type erasure class.
It takes whatever it is constructed from, and erases everything except:
Invoke with the signature in question (with possible implicit casting)
Destroy
Copy
Cast back to exact original type
and possibly
Move
This involves some overhead. A typical decent-quality std::function will have small object optimization (like small string optimization), avoiding a heap allocation when the amount of memory used is small.
A function pointer will fit in there.
However, there is still overhead. If you initialize a std::function with a compatible function pointer, instead of directly calling the function pointer in question, you do a virtual function table lookup, or invoke some other function, which then invokes the function pointer.
With a vtable implementation, that is a possible cache miss, an instruction cache miss, then another instruction cache miss. With a function pointer, the pointer is probably stored locally, and it is called directly, resulting on one possible instruction cache miss.
On top of this, in practice compilers understand function pointers better than std::functions: a number of compilers can figure out that the pointer is constant value during inlining or whole program optimization. I have never seen one that pulls that off with std::function.
For larger objects (say larger than sizeof(std::string) in one implementation), a heap allocation is also done by the std::function. This is another cost. For function pointers and reference wrappers, SOO is guaranteed by the standard.
Directly storing the lambda without storing it in a std::function is even better than a function pointer: in that case, the code being run is implicit in the type of the lambda. This makes it trivial for code to work out what is going to happen when it is called, and inlining easy for the compiler.
Only do type erasure when you need to.

Under the hood, std::function typically uses type erasure (one simplified explanation for how it may be implemented is here). The cost of storing your function object inside the std::function object may involve a heap allocation. The cost of invoking your function object is typically an indirection through a pointer plus a virtual function call. Also, while compilers are getting better at this, the virtual function call usually inhibits inlining of your function.
That being said, I recommend using std::function unless you know via measurements that the cost is too high (typically when you cannot afford heap allocations, your function will be called many times in a place that requires very low latency, etc.), as it is better to write straightforward code than to prematurely optimize.

Depending of the implementation, std::function will add some overhead due to the use of type easure. They have been some other implementation such as Don Clugston's fast delegate, with a C++11 implementation here. Please note that it uses UB to make the fastest possible delegate, but is still extremely portable.

If you want type erasure it's the right tool for the job and almost certainly not your bottleneck and not something you could write faster anyway.
However sometimes it can be all to tempting to use type erasure when it really isn't required. That's where to draw the line. For example if all you want to do is keep hold of a lambda locally then it's probably not the right tool and you should just use:
auto l = [](){};
Likewise for function pointers you don't plan to type erase - just use a function pointer type.
You also don't need type erasure for templates from <algorithm> or your own equivalents because there's simply no need for heterogenous functor types to coexist.

It's not so.
To put it simply, it's not too heavyweight unless you profiled your program and showed that it is too heavyweight. Since evidently you did not (otherwise you would know the answer to this question), we can safely conclude that it is in fact not too heavyweight at all.
You should always profile, before concluding that it's too slow.

Related

std::function internal memory organization and copies; passing reference vs value

When a std::function is copied, are the code instructions it references copied as well?
An std::function is initialized via some form of callable, that points to executable code in some way (like a function pointer typically does). Now, when a function-object is copied, is this executable code runtime copied or internally referenced?
To rephrase the question: If one instance of std::function is copied, are there then multiple copies of the same compiled code instructions in memory?
Is std::function an object that actually stores the function code or is it more an abstraction for a function pointer?
The former would seem wasteful and I don't suspect it, but everything I found so far on the subject is either too vague, lacking or too specific for me to say for me for sure. For example
When the target is a function pointer or a std::reference_wrapper, small object optimization is guaranteed, that is, these targets are always directly stored inside the std::function object, no dynamic allocation takes place. Other large objects may be constructed in dynamic allocated storage and accessed by the std::function object through a pointer. - cppreference
gives some hints about how it's done but seems still too vague and maybe is not related at all to this question, because of further abstractions inside of std::function.
For context: I am trying to refactor some bad C-ish code that maps input-events (keystrokes, mouse input and the like) to a certain behavior, which is executed upon a target data structure which can be interpreted by the program as more specific input that have semantic context other than than keystrokes (, aka keybindings). One can suspect that requirements of behaviours varies drastically.
This was previously implemented with lists of defines and numbers specifying input-event-ids, and hard-coded behavior, which was selected by switch-case. We quickly approach the border of where this intial way of doing it becomes unwieldly.
To get out of the defined lists to an expandable, declarative, object oriented and flexible design I consider higher order functions.
Especially since some behavior is quite simple and repeatedly needed (like for example the toggle of one value in the output data structure) other behaviors are more complex with multiple conditions attached, I'd like to declare some of the behavior statically, but still would like to be open to just assign some special lambda in some cases. Since I need to store behavior per input (key, mousebutton, mouse-axis, etc.) and potentially many copies of one certain behaviour type can be instantiated in one time for different sets of keybindings, I wonder if this behavior should be referenced, rather than stored by value. In the former case, fresh lambdas would need to be owned by the behavior structures, but statically declared behavior does not, which pragmatically would lead to some shared_ptr shenanigans. In the latter case, by value, this would not be an issue, but I wouldn't want multiple copies of for example the toggle behavior to cause too much redundant overhead instead.

(Note: the whole discussion below is a little simplified. AFAIK, none of it is wrong, but I did omit some details and edge cases and definitions and implementation stuff.)
The std::function does not copy any executable code. The executable code is always merely pointed to, by std::function. And when the std::function gets copied, the pointer gets duplicated (which is completely fine, because executable code is never freed either.) So far, there is no difference between a plain old function pointer and a std::function.
But that's not the whole story.
Contrary to function pointers, instances of std::function can carry around "state" as well as a pointer to the executable code, and the whole hubbub about std::function having to allocate/deallocate and copy/move data around is about this extra state, not the function pointer.
Suppose that you have code like this:
(And note that although I've used a lambda here, the following explanation would have been equally applicable for "functors" and "function objects" and "bind results" and other forms of callable things in C++, all except plain old function pointers.)
int x = 42, y = 17;
std::function<int()> f = [x, y] {return x + y;};
Here, f not only stores the pointer to the executable code for return x + y;, but it also has to remember the value of x and y. Since the amount of state that you can "capture" in this way is not limited, then - by definition - the std::function must allocate memory from the heap upon construction, and deallocate it, copy it and move it at appropriate times. Again, it is this extra "state" that gets copied, not the code.
Let's review: each std::function needs to be able to store at least a pointer to executable code, and 0 or more bytes of extra captured state. If there is no captured state, a std::function is essentially the same as a function pointer (although in practice, std::functions are usually implemented polymorphically and have other stuff in there.)
Some (most) implementations of std::function that I'm aware of employ an optimization that is called "Small Object Optimization". In these implementations, in addition to the space for the pointer to code, the std::function object has some more (fixed amount of) space inside its instance (i.e. as a member of its class, as opposed to somewhere else on the heap) and will use that area if the total number of bytes of the captured state would fit in there. This eliminates the heap allocation, which is important in some use cases and would balance out the additional memory used (when there is no or little state to capture.)

I think the information in regarding the exceptions share some light:
Does not throw if other's target is a function pointer or a std::reference_wrapper, otherwise may throw std::bad_alloc or any exception thrown by the constructor used to copy or move the stored callable object. CppReference
This seems to imply that every copy of the std::function copies the contained callable as well. For example, in case your function contains a lambda with a vector, that lambda and by result vector gets copied. The actual machine code that is linked to it, stays in the read-only part of your executable and won't be copied.
An update from the c++20 standard draft: 20.14.16.2.1 Constructors and destructor[func.wrap.func.con]
function(const function& f);
Postconditions: !*this if !f; otherwise, *this targets a copy off.target().
Throws: Nothing iff’s target is a specialization ofreference_wrapperor
a function pointer. Otherwise, may throwbad_allocor any exception
thrown by the copy constructor of the stored callable object.
[Note:
Implementations should avoid the use of dynamically allocated memory
for small callable objects for example, where f’s target is an object holding only a pointer or reference to an object and a member function pointer. — end note]

It seems that std::function does only manage one callable.
If copied, what happens to code is specified by the callable itself.
In a function pointer case, only a function pointer needs to be copied.
In a lambda or custom callable case this would be determined by the implementation of the copy of lambdas or any custom callable class.
These latter 2 typically can hold members of their own, outside of the reference to code. Therefore some space must be allocated by std::function to accomodate these cases. This is however misleading as it could seem std::function as allocating space for code. The management of instruction code seems to be done by the callable however this is done internally there.
In this context default behavior of typically used callables (like lambdas) when copied seems far more interesting for the intended question, but does seem to strech the posed question too far out of the bounds of the context of std::function.
I therefore would consider this question as solved as posed and deepen my knowledge about how lamdas are implemented especially in regards to how they are compiled and the compiled code referenced.

Does using a method always imply an indirect access to member fields through "this" pointer?

I was wondering if that type of thing falls under compiler's optimizations purview. From what I've gathered from this talk even std::unique_ptr is not truly "zero-cost" in part due to the implicit indirection of this pointer which plays a role in passing the actual underlying pointer to the unique pointer's member function.
Is there always a reference involved in passing member fields to methods of a class or can the compiler see that, say this method does not use certain fields and doesn't modify the rest so it will just pass them by value?

There seems to be some misunderstanding of what the talk discusses. There is a discussion of why unique_ptr is not "zero-cost". However, that discussion focuses on a specific case, the transferal of ownership. On the one hand, since there is a situation where there are costs, it is true that unique_ptr is not zero-cost. On the other hand, that conclusion is misleading, as it sounds just as all-encompassing as saying that it is zero-cost. A more accurate description would combine the two views: unique_ptr can be a zero-cost replacement for a raw pointer, but not always.
A unique_ptr can be zero-cost. The first question from the Q&A session at the end of the talk addresses this (starting at 36:36). Most member functions of smart_ptr are simple enough to be inlined by any C++ compiler that understands template syntax. There is no overhead associated with the this pointer and member functions. If you never transfer ownership, go on thinking of unique_ptr as zero-cost.
The extra cost comes when ownership is transferred. The talk specifically focused on passing a unique_ptr as a parameter to a function. This unambiguously gives the called function ownership of whatever the pointer points to. It also entails an additional run-time cost (two additional costs if the raw pointer version lacked exception safety).
The extra cost is not intrinsic to the C++ language, but rather comes from a commonly-used ABI (application binary interface). The ABI defines at a low level (think assembly) how parameters are passed to functions. According to this convention, there is an important difference between T* and unique_ptr<T> – the former is a primitive type, while the latter is an instance of a class. If I understood this part of the talk correctly, the ABI calls for primitive types to be placed directly in the call stack (potentially simply stored in a register), whereas class instances must exist in main memory and a pointer to the instance is placed directly in the stack / potentially in a register. Yes, even if the object is passed by value. Why? Because that's what the convention calls for. (There are better reasons, but they are tangential to the current subject.) In order for things like dynamic libraries to work, there needs to be a convention, and this ABI is what we have.
The upshot is that primitives receive preferential treatment, making them faster. When you switch a function's parameter from a pointer to a class instance, there is a runtime cost (of unknown size – it might be insignificant). This is the cost that prevents unique_ptr from being zero-cost in all cases. Zero-cost in many common cases, but not when a function takes a unique_ptr argument.

What is the performance overhead of std::mem_fn?

What is the performance overhead of calling a std::mem_fn vs. a raw call to the wrapped member function? I expect that with an optimizing compiler it would be zero since there's probably polymorphic mechanics as there are in std::function, but I'm not sure since I don't really know the internals.

std::function lambda optimization

std::function is known to have performance issues because it may do heap allocations. Admitted, if you are being 100% honest, one heap allocation should hardly be a problem in most cases... but let's just assume doing a heap allocation is undesirable or forbidding in a particular scenario. Maybe we're doing a few million callbacks and don't want a few million heap allocations for that, whatever.
So... we want to avoid that heap allocation.
The Dr. Dobbs article Efficient Use of Lambda Expressions and std::function gives a recommendation on optimizing the use of std::function by taking advantage of the small object optimization that is recommended by the standard and implemented in every mainstream standard library.
The article goes into length explaining how the standard library must copy the functor since the std::function object might outlive the original functor (though you can use std::ref if you are sure it doesn't), which would be bad mojo. Also, captures need to be copied, and here is the problem: The exact type of closure (or its size) is not known beforehand as it could be any type of closure with any number of captures, so some compromise must be made. Up to a certain size, the captures will be saved in a store inside the function object, and beyond that, it will be dynamically allocated. The store is small, anywhere from 12 to 16 bytes, so assuming a 64-bit build, a maximum of two pointers (not counting the actual function pointer).
Dr. Dobbs thus recommends (and several other sites pick up that advice, seemingly without much of an objection) capturing a reference to a struct that holds references to what you actually want to capture. That way, you only capture one reference, which is just perfect, since it will always fit into the small object store.
How does that work? The assumption which made copying stuff around necessary in the first place was that the function object may outlive the scope of the original closure. Which means, of course, that it also outlives the structure that it holds a reference to, as well as anything referenced from inside that struct.
How is this supposed to work? And since I can't see how it could possibly work, is there a better well-known recipe to address this? (one that doesn't reference invalid objects)

I don't think it's supposed to work if the function object does outlive its calling function (and you're capturing references to objects that are on the stack).
In many practical cases the function object is used locally and will not outlive its caller and then you can avoid the heap allocation (but then again, the compiler might be able to optimize the references and the entire struct technique is probably not necessary).
Here's a simple test which compiles but crashes (tested on clang in C++14 mode.)

I'm sorry that the article wasn't clearly enough. (I'm the author.)
The advised technique is indeed not supposed to work when the std::function object outlives the scope of the original closure in which case you must not use std::ref and must pay the price of copying and, potentially, making a heap allocation.
The point of the article is this: when there's no lifetime issue (a case which, as nimrodm pointed out is quite common), a user can pass this information to std::function for its constructor to somehow take the closure object by reference instead of by value. Obviously, the user cannot magically and punctually change the signature of std::function's constructor for one particular call. That's where std::reference_wrapper and std::ref come in. The client passes a std::reference_wrapper (created by std::ref) object to std::function's constructor. Then, what gets copied is this object which is small and should fit in the small-object-optimisation buffer and acts as a "reference" to the original closure object.
In here you can see impact on the performance of std::function construction (sure this is just one point of consideration amongst many others). In this example the closure object contains 3 doubles and using std::ref makes the construction 7.5 times faster (YMMV):
One can inspect the generated assembly by clicking on the "Assembly" tab of the link above. A notable difference between the two versions is that the slower one contains this line:
callq 404430 <operator new(unsigned long)#plt>
This confirms that there's a call to operator new which is absent from the faster alternative.

Should I use std::function or a function pointer in C++?

When implementing a callback function in C++, should I still use the C-style function pointer:
void (*callbackFunc)(int);
Or should I make use of std::function:
std::function< void(int) > callbackFunc;

In short, use std::function unless you have a reason not to.
Function pointers have the disadvantage of not being able to capture some context. You won't be able to for example pass a lambda function as a callback which captures some context variables (but it will work if it doesn't capture any). Calling a member variable of an object (i.e. non-static) is thus also not possible, since the object (this-pointer) needs to be captured.(1)
std::function (since C++11) is primarily to store a function (passing it around doesn't require it to be stored). Hence if you want to store the callback for example in a member variable, it's probably your best choice. But also if you don't store it, it's a good "first choice" although it has the disadvantage of introducing some (very small) overhead when being called (so in a very performance-critical situation it might be a problem but in most it should not). It is very "universal": if you care a lot about consistent and readable code as well as don't want to think about every choice you make (i.e. want to keep it simple), use std::function for every function you pass around.
Think about a third option: If you're about to implement a small function which then reports something via the provided callback function, consider a template parameter, which can then be any callable object, i.e. a function pointer, a functor, a lambda, a std::function, ... Drawback here is that your (outer) function becomes a template and hence needs to be implemented in the header. On the other hand you get the advantage that the call to the callback can be inlined, as the client code of your (outer) function "sees" the call to the callback will the exact type information being available.
Example for the version with the template parameter (write & instead of && for pre-C++11):
template <typename CallbackFunction>
void myFunction(..., CallbackFunction && callback) {
...
callback(...);
...
}
As you can see in the following table, all of them have their advantages and disadvantages:
function ptr
std::function
template param
can capture context variables
no1
yes
yes
no call overhead (see comments)
yes
no
yes
can be inlined (see comments)
no
no
yes
can be stored in a class member
yes
yes
no2
can be implemented outside of header
yes
yes
no
supported without C++11 standard
yes
no3
yes
nicely readable (my opinion)
no
yes
(yes)
(1) Workarounds exist to overcome this limitation, for example passing the additional data as further parameters to your (outer) function: myFunction(..., callback, data) will call callback(data). That's the C-style "callback with arguments", which is possible in C++ (and by the way heavily used in the WIN32 API) but should be avoided because we have better options in C++.
(2) Unless we're talking about a class template, i.e. the class in which you store the function is a template. But that would mean that on the client side the type of the function decides the type of the object which stores the callback, which is almost never an option for actual use cases.
(3) For pre-C++11, use boost::function

void (*callbackFunc)(int); may be a C style callback function, but it is a horribly unusable one of poor design.
A well designed C style callback looks like void (*callbackFunc)(void*, int); -- it has a void* to allow the code that does the callback to maintain state beyond the function. Not doing this forces the caller to store state globally, which is impolite.
std::function< int(int) > ends up being slightly more expensive than int(*)(void*, int) invocation in most implementations. It is however harder for some compilers to inline. There are std::function clone implementations that rival function pointer invocation overheads (see 'fastest possible delegates' etc) that may make their way into libraries.
Now, clients of a callback system often need to set up resources and dispose of them when the callback is created and removed, and to be aware of the lifetime of the callback. void(*callback)(void*, int) does not provide this.
Sometimes this is available via code structure (the callback has limited lifetime) or through other mechanisms (unregister callbacks and the like).
std::function provides a means for limited lifetime management (the last copy of the object goes away when it is forgotten).
In general, I'd use a std::function unless performance concerns manifest. If they did, I'd first look for structural changes (instead of a per-pixel callback, how about generating a scanline processor based off of the lambda you pass me? which should be enough to reduce function-call overhead to trivial levels.). Then, if it persists, I'd write a delegate based off fastest possible delegates, and see if the performance problem goes away.
I would mostly only use function pointers for legacy APIs, or for creating C interfaces for communicating between different compilers generated code. I have also used them as internal implementation details when I am implementing jump tables, type erasure, etc: when I am both producing and consuming it, and am not exposing it externally for any client code to use, and function pointers do all I need.
Note that you can write wrappers that turn a std::function<int(int)> into a int(void*,int) style callback, assuming there are proper callback lifetime management infrastructure. So as a smoke test for any C-style callback lifetime management system, I'd make sure that wrapping a std::function works reasonably well.

Use std::function to store arbitrary callable objects. It allows the user to provide whatever context is needed for the callback; a plain function pointer does not.
If you do need to use plain function pointers for some reason (perhaps because you want a C-compatible API), then you should add a void * user_context argument so it's at least possible (albeit inconvenient) for it to access state that's not directly passed to the function.

The only reason to avoid std::function is support of legacy compilers that lack support for this template, which has been introduced in C++11.
If supporting pre-C++11 language is not a requirement, using std::function gives your callers more choice in implementing the callback, making it a better option compared to "plain" function pointers. It offers the users of your API more choice, while abstracting out the specifics of their implementation for your code that performs the callback.

std::function may bring VMT to the code in some cases, which has some impact on performance.

The other answers answer based on technical merits. I'll give you an answer based on experience.
As a very heavy X-Windows developer who always worked with function pointer callbacks with void* pvUserData arguments, I started using std::function with some trepidation.
But I find out that combined with the power of lambdas and the like, it has freed up my work considerably to be able to, at a whim, throw multiple arguments in, re-order them, ignore parameters the caller wants to supply but I don't need, etc. It really makes development feel looser and more responsive, saves me time, and adds clarity.
On this basis I'd recommend anyone to try using std::function any time they'd normally have a callback. Try it everywhere, for like six months, and you may find you hate the idea of going back.
Yes there's some slight performance penalty, but I write high-performance code and I'm willing to pay the price. As an exercise, time it yourself and try to figure out whether the performance difference would ever matter, with your computers, compilers and application space.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Overhead with std::function - c++

I have seen many instances where people have advised against using std::function<> because it is a heavyweight mechanism. Could someone please explain why that is so?

Related

std::function internal memory organization and copies; passing reference vs value

Does using a method always imply an indirect access to member fields through "this" pointer?

What is the performance overhead of std::mem_fn?

std::function lambda optimization

Should I use std::function or a function pointer in C++?

Categories

Resources