std::function lambda optimization - c++

std::function is known to have performance issues because it may do heap allocations. Admittedly, if you are being 100% honest, one heap allocation should hardly be a problem in most cases... but let's just assume that a heap allocation is undesirable or forbidden in a particular scenario. Maybe we're doing a few million callbacks and don't want a few million heap allocations for that, whatever.
So... we want to avoid that heap allocation.
The Dr. Dobbs article Efficient Use of Lambda Expressions and std::function gives a recommendation on optimizing the use of std::function by taking advantage of the small object optimization that is recommended by the standard and implemented in every mainstream standard library.
The article explains at length how the standard library must copy the functor, since the std::function object might outlive the original functor (though you can use std::ref if you are sure it doesn't), which would be bad mojo. Also, captures need to be copied, and here is the problem: the exact type of the closure (or its size) is not known beforehand, as it could be any type of closure with any number of captures, so some compromise must be made. Up to a certain size, the captures will be saved in a store inside the function object; beyond that, they will be dynamically allocated. The store is small, anywhere from 12 to 16 bytes, so assuming a 64-bit build, a maximum of two pointers (not counting the actual function pointer).
Dr. Dobbs thus recommends (and several other sites pick up that advice, seemingly without much of an objection) capturing a reference to a struct that holds references to what you actually want to capture. That way, you only capture one reference, which is just perfect, since it will always fit into the small object store.
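As far as I understand it, the recommended pattern looks roughly like this (my own sketch, not the article's exact code):

#include <functional>
#include <string>

void example()
{
    int counter = 0;
    std::string name = "x";

    struct Refs { int& counter; std::string& name; };
    Refs refs{counter, name};

    // Only one reference is captured, so the closure always fits in the small object store.
    std::function<void()> cb = [&refs] { ++refs.counter; refs.name += "!"; };
    cb();
}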
How does that work? The assumption which made copying stuff around necessary in the first place was that the function object may outlive the scope of the original closure. Which means, of course, that it also outlives the structure that it holds a reference to, as well as anything referenced from inside that struct.
How is this supposed to work? And since I can't see how it could possibly work, is there a better well-known recipe to address this? (one that doesn't reference invalid objects)

I don't think it's supposed to work if the function object does outlive its calling function (and you're capturing references to objects that are on the stack).
In many practical cases the function object is used locally and will not outlive its caller, and then you can avoid the heap allocation (but then again, the compiler might be able to optimize away the references, so the entire struct technique is probably not necessary).
Here's a simple test which compiles but crashes (tested on clang in C++14 mode).
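A sketch of such a test (not necessarily the original snippet, but the same idea of capturing references to stack objects through a struct):

#include <functional>
#include <iostream>

std::function<void()> make_callback()
{
    int value = 123;
    struct Refs { int& value; };
    Refs refs{value};
    return [&refs] { std::cout << refs.value << "\n"; };  // refs and value die when we return
}

int main()
{
    auto cb = make_callback();
    cb();  // undefined behaviour: the captured references dangle (crash or garbage)
}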

I'm sorry that the article wasn't clear enough. (I'm the author.)
The advised technique is indeed not supposed to work when the std::function object outlives the scope of the original closure, in which case you must not use std::ref and must pay the price of copying and, potentially, a heap allocation.
The point of the article is this: when there's no lifetime issue (a case which, as nimrodm pointed out, is quite common), the user can pass this information to std::function so that its constructor somehow takes the closure object by reference instead of by value. Obviously, the user cannot magically change the signature of std::function's constructor just for one particular call. That's where std::reference_wrapper and std::ref come in. The client passes a std::reference_wrapper object (created by std::ref) to std::function's constructor. Then what gets copied is this wrapper, which is small, should fit in the small-object-optimisation buffer, and acts as a "reference" to the original closure object.
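A minimal sketch of that usage (assuming the closure provably outlives the std::function):

#include <functional>

void example()
{
    double a = 1.0, b = 2.0, c = 3.0;
    auto closure = [a, b, c] { return a + b + c; };  // 3 doubles of captured state

    // Copies the closure; 24 bytes of state may not fit in the small-object buffer.
    std::function<double()> by_value(closure);

    // Copies only a std::reference_wrapper, which is pointer-sized and fits in the buffer,
    // but 'closure' must stay alive for as long as by_ref is used.
    std::function<double()> by_ref(std::ref(closure));

    by_value();
    by_ref();
}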
Here you can see the impact on the performance of std::function construction (sure, this is just one point of consideration amongst many others). In this example the closure object contains 3 doubles, and using std::ref makes the construction 7.5 times faster (YMMV):
One can inspect the generated assembly by clicking on the "Assembly" tab of the link above. A notable difference between the two versions is that the slower one contains this line:
callq 404430 <operator new(unsigned long)@plt>
This confirms that there's a call to operator new which is absent from the faster alternative.

Related

std::function internal memory organization and copies; passing reference vs value

When a std::function is copied, are the code instructions it references copied as well?
A std::function is initialized from some form of callable that points to executable code in some way (as a function pointer typically does). Now, when the function object is copied, is this executable code copied at runtime or internally referenced?
To rephrase the question: If one instance of std::function is copied, are there then multiple copies of the same compiled code instructions in memory?
Is std::function an object that actually stores the function code or is it more an abstraction for a function pointer?
The former would seem wasteful and I don't suspect it, but everything I found so far on the subject is either too vague, lacking, or too specific for me to say for sure. For example
When the target is a function pointer or a std::reference_wrapper, small object optimization is guaranteed, that is, these targets are always directly stored inside the std::function object, no dynamic allocation takes place. Other large objects may be constructed in dynamic allocated storage and accessed by the std::function object through a pointer. - cppreference
gives some hints about how it's done, but still seems too vague and maybe not related to this question at all, because of further abstractions inside std::function.
For context: I am trying to refactor some bad C-ish code that maps input events (keystrokes, mouse input and the like) to a certain behavior, which is executed upon a target data structure that the program can interpret as more specific input with semantic context beyond raw keystrokes (i.e., keybindings). One can suspect that the requirements of these behaviors vary drastically.
This was previously implemented with lists of defines and numbers specifying input-event IDs, and hard-coded behavior selected by switch-case. We are quickly approaching the point where this initial way of doing it becomes unwieldy.
To move away from the lists of defines toward an expandable, declarative, object-oriented and flexible design, I am considering higher-order functions.
Some behaviors are quite simple and repeatedly needed (for example, toggling one value in the output data structure), while others are more complex with multiple conditions attached, so I'd like to declare some of the behavior statically but still be able to assign a special lambda in some cases. Since I need to store behavior per input (key, mouse button, mouse axis, etc.), and potentially many copies of one behavior type can be instantiated at one time for different sets of keybindings, I wonder whether this behavior should be referenced rather than stored by value. In the former case, fresh lambdas would need to be owned by the behavior structures, but statically declared behavior would not, which pragmatically would lead to some shared_ptr shenanigans. In the latter case, by value, this would not be an issue, but I wouldn't want multiple copies of, for example, the toggle behavior to cause too much redundant overhead.
(Note: the whole discussion below is a little simplified. AFAIK, none of it is wrong, but I did omit some details and edge cases and definitions and implementation stuff.)
The std::function does not copy any executable code. The executable code is always merely pointed to by std::function. And when the std::function gets copied, the pointer gets duplicated (which is completely fine, because executable code is never freed either). So far, there is no difference between a plain old function pointer and a std::function.
But that's not the whole story.
Contrary to function pointers, instances of std::function can carry around "state" as well as a pointer to the executable code, and the whole hubbub about std::function having to allocate/deallocate and copy/move data around is about this extra state, not the function pointer.
Suppose that you have code like this:
(And note that although I've used a lambda here, the following explanation would have been equally applicable for "functors" and "function objects" and "bind results" and other forms of callable things in C++, all except plain old function pointers.)
int x = 42, y = 17;
std::function<int()> f = [x, y] {return x + y;};
Here, f not only stores the pointer to the executable code for return x + y;, but it also has to remember the values of x and y. Since the amount of state that you can "capture" in this way is not limited, the std::function must, in general, allocate memory from the heap upon construction, and deallocate it, copy it and move it at the appropriate times. Again, it is this extra "state" that gets copied, not the code.
Let's review: each std::function needs to be able to store at least a pointer to executable code, and 0 or more bytes of extra captured state. If there is no captured state, a std::function is essentially the same as a function pointer (although in practice, std::functions are usually implemented polymorphically and have other stuff in there.)
Some (most) implementations of std::function that I'm aware of employ an optimization that is called "Small Object Optimization". In these implementations, in addition to the space for the pointer to code, the std::function object has some more (fixed amount of) space inside its instance (i.e. as a member of its class, as opposed to somewhere else on the heap) and will use that area if the total number of bytes of the captured state would fit in there. This eliminates the heap allocation, which is important in some use cases and would balance out the additional memory used (when there is no or little state to capture.)
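A small sketch of how one might observe this; the buffer size is implementation-defined, so the exact output varies between standard libraries:

#include <array>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <new>

static std::size_t g_heap_allocs = 0;

// Count every heap allocation made through operator new.
void* operator new(std::size_t n)
{
    ++g_heap_allocs;
    if (void* p = std::malloc(n)) return p;
    throw std::bad_alloc{};
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

int main()
{
    int x = 42;
    g_heap_allocs = 0;
    std::function<int()> small = [x] { return x; };  // tiny state: likely fits inside the object
    std::printf("small capture: %zu allocation(s)\n", g_heap_allocs);

    std::array<char, 256> big{};
    g_heap_allocs = 0;
    std::function<int()> large = [big] { return big[0]; };  // 256 bytes: almost certainly heap-allocated
    std::printf("large capture: %zu allocation(s)\n", g_heap_allocs);
}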
I think the information regarding the exceptions sheds some light:
Does not throw if other's target is a function pointer or a std::reference_wrapper, otherwise may throw std::bad_alloc or any exception thrown by the constructor used to copy or move the stored callable object. CppReference
This seems to imply that every copy of the std::function copies the contained callable as well. For example, if your function contains a lambda that captured a vector, the lambda and, as a result, the vector get copied. The actual machine code that is linked to it stays in the read-only part of your executable and won't be copied.
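A small sketch of that scenario:

#include <functional>
#include <vector>

void example()
{
    std::vector<int> v(1000, 7);
    std::function<int()> f = [v] { return v.front(); };  // the lambda (and its copy of v) is stored in f

    std::function<int()> g = f;  // copies the stored lambda, and therefore the vector, again
    // The machine code for the lambda body is shared; only the captured state is duplicated.
}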
An update from the C++20 standard draft, 20.14.16.2.1 Constructors and destructor [func.wrap.func.con]:
function(const function& f);
Postconditions: !*this if !f; otherwise, *this targets a copy of f.target().
Throws: Nothing if f's target is a specialization of reference_wrapper or a function pointer. Otherwise, may throw bad_alloc or any exception thrown by the copy constructor of the stored callable object.
[Note: Implementations should avoid the use of dynamically allocated memory for small callable objects, for example, where f's target is an object holding only a pointer or reference to an object and a member function pointer. — end note]
It seems that std::function only manages one callable.
If copied, what happens to the code is determined by the callable itself.
In the function-pointer case, only the function pointer needs to be copied.
In the case of a lambda or a custom callable class, this is determined by how that lambda or class implements copying.
These latter two can typically hold members of their own, besides the reference to the code. Therefore std::function must allocate some space to accommodate these cases. This can, however, be misleading, as it might look as if std::function were allocating space for code. The management of the instruction code is handled by the callable, and it does that internally.
In this context, the default copy behavior of typically used callables (like lambdas) seems far more interesting for the intended question, but that would stretch the question too far beyond the context of std::function.
I therefore consider the question solved as posed, and will deepen my knowledge of how lambdas are implemented, especially with regard to how they are compiled and how the compiled code is referenced.

Does using a method always imply an indirect access to member fields through "this" pointer?

I was wondering if that type of thing falls under the compiler's optimization purview. From what I've gathered from this talk, even std::unique_ptr is not truly "zero-cost", in part due to the implicit indirection of the this pointer, which plays a role in passing the actual underlying pointer to the unique pointer's member functions.
Is there always a reference involved in passing member fields to methods of a class, or can the compiler see that, say, a method does not use certain fields and doesn't modify the rest, so it will just pass them by value?
There seems to be some misunderstanding of what the talk discusses. There is a discussion of why unique_ptr is not "zero-cost". However, that discussion focuses on a specific case, the transferal of ownership. On the one hand, since there is a situation where there are costs, it is true that unique_ptr is not zero-cost. On the other hand, that conclusion is misleading, as it sounds just as all-encompassing as saying that it is zero-cost. A more accurate description would combine the two views: unique_ptr can be a zero-cost replacement for a raw pointer, but not always.
A unique_ptr can be zero-cost. The first question from the Q&A session at the end of the talk addresses this (starting at 36:36). Most member functions of unique_ptr are simple enough to be inlined by any C++ compiler that understands template syntax. There is no overhead associated with the this pointer and member functions. If you never transfer ownership, go on thinking of unique_ptr as zero-cost.
The extra cost comes when ownership is transferred. The talk specifically focused on passing a unique_ptr as a parameter to a function. This unambiguously gives the called function ownership of whatever the pointer points to. It also entails an additional run-time cost (two additional costs if the raw pointer version lacked exception safety).
The extra cost is not intrinsic to the C++ language, but rather comes from a commonly-used ABI (application binary interface). The ABI defines at a low level (think assembly) how parameters are passed to functions. According to this convention, there is an important difference between T* and unique_ptr<T> – the former is a primitive type, while the latter is an instance of a class. If I understood this part of the talk correctly, the ABI calls for primitive types to be placed directly in the call stack (potentially simply stored in a register), whereas class instances must exist in main memory and a pointer to the instance is placed directly in the stack / potentially in a register. Yes, even if the object is passed by value. Why? Because that's what the convention calls for. (There are better reasons, but they are tangential to the current subject.) In order for things like dynamic libraries to work, there needs to be a convention, and this ABI is what we have.
The upshot is that primitives receive preferential treatment, making them faster. When you switch a function's parameter from a pointer to a class instance, there is a runtime cost (of unknown size – it might be insignificant). This is the cost that prevents unique_ptr from being zero-cost in all cases. Zero-cost in many common cases, but not when a function takes a unique_ptr argument.
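To illustrate the two situations (Widget, read and consume are made-up names):

#include <memory>

struct Widget { int value = 0; };

// No ownership transfer: the raw pointer travels in a register and get()/operator*
// are trivially inlined, so this is the zero-cost case.
int read(const Widget* w) { return w->value; }

// Ownership transfer: under the discussed ABI the unique_ptr, being a class type,
// is passed in memory rather than in a register, which is where the extra cost appears.
void consume(std::unique_ptr<Widget> w) { /* owns and eventually destroys *w */ }

void demo()
{
    auto w = std::make_unique<Widget>();
    read(w.get());          // as cheap as using a raw pointer
    consume(std::move(w));  // pays the (usually small) cost of passing a class by value
}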

Overhead with std::function

I have seen many instances where people have advised against using std::function<> because it is a heavyweight mechanism. Could someone please explain why that is so?
std::function is a type erasure class.
It takes whatever it is constructed from, and erases everything except:
Invoke with the signature in question (with possible implicit casting)
Destroy
Copy
Cast back to exact original type
and possibly
Move
This involves some overhead. A typical decent-quality std::function will have small object optimization (like small string optimization), avoiding a heap allocation when the amount of memory used is small.
A function pointer will fit in there.
However, there is still overhead. If you initialize a std::function with a compatible function pointer, instead of directly calling the function pointer in question, you do a virtual function table lookup, or invoke some other function, which then invokes the function pointer.
With a vtable implementation, that is a possible cache miss, an instruction cache miss, then another instruction cache miss. With a function pointer, the pointer is probably stored locally, and it is called directly, resulting in one possible instruction cache miss.
On top of this, in practice compilers understand function pointers better than std::functions: a number of compilers can figure out that the pointer is a constant value during inlining or whole-program optimization. I have never seen one that pulls that off with std::function.
For larger objects (say, larger than sizeof(std::string) in one implementation), a heap allocation is also done by the std::function. This is another cost. For function pointers and reference wrappers, the standard recommends (and mainstream implementations provide) in-place storage, so no heap allocation is needed.
Directly storing the lambda without storing it in a std::function is even better than a function pointer: in that case, the code being run is implicit in the type of the lambda. This makes it trivial for code to work out what is going to happen when it is called, and inlining easy for the compiler.
Only do type erasure when you need to.
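For comparison, a short sketch of the three storage options discussed above:

#include <functional>

int add(int a, int b) { return a + b; }

void demo()
{
    // Direct storage: the exact type is known, so the call is trivially inlinable.
    auto direct = [](int a, int b) { return a + b; };

    // Function pointer: one indirect call through a plain pointer.
    int (*fp)(int, int) = add;

    // Type-erased: the call goes through std::function's internal dispatch
    // (typically a virtual call or equivalent), which usually defeats inlining.
    std::function<int(int, int)> erased = add;

    int r = direct(1, 2) + fp(3, 4) + erased(5, 6);
    (void)r;
}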
Under the hood, std::function typically uses type erasure (one simplified explanation for how it may be implemented is here). The cost of storing your function object inside the std::function object may involve a heap allocation. The cost of invoking your function object is typically an indirection through a pointer plus a virtual function call. Also, while compilers are getting better at this, the virtual function call usually inhibits inlining of your function.
That being said, I recommend using std::function unless you know via measurements that the cost is too high (typically when you cannot afford heap allocations, your function will be called many times in a place that requires very low latency, etc.), as it is better to write straightforward code than to prematurely optimize.
Depending on the implementation, std::function will add some overhead due to the use of type erasure. There have been other implementations, such as Don Clugston's fast delegate, with a C++11 implementation here. Please note that it uses UB to make the fastest possible delegate, but is still extremely portable.
If you want type erasure it's the right tool for the job and almost certainly not your bottleneck and not something you could write faster anyway.
However sometimes it can be all to tempting to use type erasure when it really isn't required. That's where to draw the line. For example if all you want to do is keep hold of a lambda locally then it's probably not the right tool and you should just use:
auto l = [](){};
Likewise for function pointers you don't plan to type erase - just use a function pointer type.
You also don't need type erasure for templates from <algorithm> or your own equivalents, because there's simply no need for heterogeneous functor types to coexist.
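For example, a comparator passed to a standard algorithm keeps its concrete type, so no type erasure is involved:

#include <algorithm>
#include <vector>

void sort_descending(std::vector<int>& v)
{
    // std::sort is a template: it is instantiated for this exact lambda type,
    // so the comparator can be inlined and no std::function is needed.
    std::sort(v.begin(), v.end(), [](int a, int b) { return a > b; });
}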
It's not so.
To put it simply, it's not too heavyweight unless you profiled your program and showed that it is too heavyweight. Since evidently you did not (otherwise you would know the answer to this question), we can safely conclude that it is in fact not too heavyweight at all.
You should always profile, before concluding that it's too slow.

Should I now be passing by value?

In this talk (sorry about the sound) Chandler Carruth suggests not passing by reference, even const reference, in the vast majority of cases, due to the way in which it limits the back-end's ability to perform optimisation.
He claims that in most cases the copy is negligible - which I am happy to believe, most data structures/classes etc. have a very small part allocated on the stack - especially when compared with the back-end having to assume pointer aliasing and all the nasty things that could be done to a reference type.
Let's say that we have a large object on the stack - say ~4kB - and a function that does something to an instance of this object (assume a free-standing function).
Classically I would write:
void DoSomething(ExpensiveType* inOut);
ExpensiveType data;
...
DoSomething(&data);
He's suggesting:
ExpensiveType DoSomething(ExpensiveType in);
ExpensiveType data;
...
data = DoSomething(data);
According to what I got from the talk, the second would tend to optimise better. Is there a limit to how big I make something like this though, or is the back-end copy-elision stuff just going to prefer the values in almost all cases?
EDIT: To clarify I'm interested in the whole system, since I feel that this would be a major change to the way I write code, I've had use of refs over values drilled into me for anything larger than integral types for a long time now.
EDIT2: I tested it too, results and code here. No competition really, as we've been taught for a long time, the pointer is a far faster method of doing things. What intrigues me now is why it was suggested during that talk that we move to pass by value, but as the numbers don't support it, it's not something I'm going to do.
I have now watched parts of Chandler's talk. I think the general discussion along the lines "should I now always pass by value" does not do his talk justice. Edit: And actually his talk has been discussed before, here value semantics vs output params with large data structures and in a blog from Eric Niebler, http://ericniebler.com/2013/10/13/out-parameters-vs-move-semantics/.
Back to Chandler. In the keynote he specifically (around the 4x-5x minute mark mentioned elsewhere) makes the following points:
If the optimizer cannot see the code of the called function you have much bigger problems than passing refs or values. It pretty much prevents optimization. (There is a follow-up question at that point about link time optimization which may be discussed later, I don't know.)
He recommends the "new classical" way of returning values using move semantics. Instead of the old-school way of passing a reference to an existing object as an in-out parameter, the value should be constructed locally and moved out (a small sketch of this follows these points). The big advantage is that the optimizer can be sure that no part of the object is aliased, since only the function has access to it.
He mentions threads, storing a variable's value in globals, and observable behaviour like output as examples for unknowns which prevent optimization when only refs/pointers are passed. I think an abstract description could be "the local code can not assume that local value changes are undetected elsewhere, and it cannot assume that a value which is not changed locally has not changed at all". With local copies these assumptions could be made.
Obviously, when passing (and possibly, if objects cannot be moved, when returning) by value, there is a trade-off between the copy cost and the optimization benefits. Size and other things making copying costly will tip the balance towards reference strategies, while lots of optimizable work on the object in the function tips it towards value passing. (His examples involved pointers to ints, not to 4k sized objects.)
Based on the parts I watched I do not think Chandler promoted passing by value as a one-fits-all strategy. I think he dissed passing by reference mostly in the context of passing an out parameter instead of returning a new object. His example was not about a function which modified an existing object.
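For illustration, a minimal sketch of the two styles (my own example, not Chandler's):

#include <string>
#include <vector>

// Old school: the callee writes through a reference, so the optimizer must
// assume the vector may be aliased elsewhere.
void build_names(std::vector<std::string>& out)
{
    out.push_back("alpha");
    out.push_back("beta");
}

// "New classical": the result is constructed locally, so nothing else can alias it,
// and it is returned via NRVO or a cheap move rather than a copy.
std::vector<std::string> build_names()
{
    std::vector<std::string> names;
    names.push_back("alpha");
    names.push_back("beta");
    return names;
}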
On a general note:
A program should express the programmer's intent. If you need a copy, by all means copy! If you want to modify an existing object, by all means use references or pointers. Only if side effects or run-time behavior become unbearable, really only then, should you try to do something smart.
One should also be aware that compiler optimizations are sometimes surprising. Other platforms, compilers, compiling options, class libraries or even just small changes in your own code may all prevent the compiler from coming to the rescue. The run-time cost of the change would in many cases come totally unexpected.
Perhaps you took that part of the talk out of context, or something. For large objects, typically it depends on whether the function needs a copy of the object or not. For example:
void DoSomething(ExpensiveType in)
{
cout << in.member;
}
you wasted a lot of resources copying the object unnecessarily, when you could have passed by const reference instead.
But if the function is:
void DoSomething(ExpensiveType in)
{
in.member = 5;
do_something_else(in);
}
and we did not want to modify the calling function's object, then this code is likely to be more efficient than:
void DoSomething(ExpensiveType const &inr)
{
ExpensiveType in = inr;
in.member = 5;
do_something_else(in);
}
The difference comes when invoked with an rvalue (e.g. DoSomething( ExpensiveType(6) );). The latter creates a temporary, makes a copy, then destroys both; whereas the former will create a temporary and use that to move-construct in. (I think this can even undergo copy elision.)
NB. Don't use pointers as a hack to implement pass-by-reference. C++ has native pass by reference.

What is preferred way of passing pointer/reference to existing object in a constructor?

I'll start with an example. There is a nice "tokenizer" class in boost. It takes a string to be tokenized as a parameter in its constructor:
std::string string_to_tokenize("a bb ccc ddd 0");
boost::tokenizer<boost::char_separator<char> > my_tok(string_to_tokenize);
/* do something with my_tok */
The string isn't modified by the tokenizer, so it is passed by const reference. Therefore I can pass a temporary object there:
boost::tokenizer<boost::char_separator<char> > my_tok(std::string("a bb ccc ddd 0"));
/* do something with my_tok */
Everything looks fine, but if I try to use the tokenizer, a disaster occurs. After a short investigation I realized that the tokenizer class stores the reference I gave it and uses it later. Of course that cannot work well with a reference to a temporary object.
The documentation doesn't say explicitly that the object passed to the constructor will be used later, but OK, it is also not stated that it won't be :) So I cannot assume this; my mistake.
It is a bit confusing, however. In the general case, when one object takes another by const reference, it suggests that a temporary object can be given there. What do you think? Is this a bad convention? Maybe a pointer to the object (rather than a reference) should be used in such cases? Or, going even further, wouldn't it be useful to have some special keyword on an argument that allows/disallows passing a temporary object as the parameter?
EDIT: The documentation (version 1.49) is rather minimalistic and the only part that may suggest such a problem is:
Note: No parsing is actually done upon construction. Parsing is done on demand as the tokens are accessed via the iterator provided by begin.
But it doesn't state explicitly that the same object that was given will be used.
However, the point of this question is rather a discussion about coding style in such cases; this is only an example that inspired me.
If some function (such as a constructor) takes an argument as reference-to-const then it should either
Document clearly that the lifetime of the referenced object must satisfy certain requirements (as in "Is not destroyed before this and that happens")
or
Create copies internally if it needs to make use of the given object at a later point.
In this particular case (the boost::tokenizer class) I'd assume that the latter isn't done for performance reasons and/or to make the class usable with container types which aren't even copyable in the first place. For this reason, I'd consider this a documentation bug.
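For illustration, a sketch of the second option (SafeTokenizer is a made-up name, not boost's class):

#include <string>

class SafeTokenizer {
public:
    // Copies the text, so passing a temporary is perfectly safe.
    explicit SafeTokenizer(const std::string& text) : text_(text) {}
private:
    std::string text_;  // owned copy, not a stored reference
};

// Fine even with a temporary, because the constructor took its own copy:
// SafeTokenizer tok(std::string("a bb ccc ddd 0"));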
Personally I think it's a bad idea, and it would be better to write the constructor either to copy the string or to take a const std::string* instead. It's only one extra character for the caller to type, but that character stops them accidentally using a temporary.
As a rule: don't create responsibilities on people to maintain objects without making it very obvious that they have that responsibility.
I think a special keyword wouldn't be a complete enough solution to justify changing the language. It's not actually temporaries that are the problem, it's any object that lives for less time than the object being constructed. In some circumstances a temporary would be fine (for example if the tokenizer object itself were also a temporary in the same full-expression). I don't really want to mess about with the language for the sake of half a fix, and there are fuller fixes available (for example take a shared_ptr, although that has its own issues).
"So I cannot assume this, my mistake"
I don't think it really is your mistake, I agree with Frerich that as well as being against my personal style guide to do this at all, if you do it and don't document then that's a documentation bug in any reasonable style guide.
It's absolutely essential that the required lifetime of by-reference function parameters is documented, if it's anything other than "at least as long as the function call". It's something that docs are often lax about, and needs to be done properly to avoid errors.
Even in garbage-collected languages, where lifetime itself is automatically handled and so tends to get neglected, it matters whether or not you can change or re-use your object without changing the behavior of some other object whose method you passed it to some time in the past. So functions should document whether they retain an alias to their arguments in any language that lacks referential transparency, and especially so in C++, where object lifetime is the caller's problem.
Unfortunately the only mechanism to actually ensure that your function cannot retain a reference is to pass by value, which has a performance cost. If you can invent a language that allows aliasing normally, but also has a C-style restrict property that is enforced at compile-time, const-style, to prevent functions from squirreling away references to their arguments, then good luck and sign me up.
As others said, the boost::tokenizer example is the result of either a bug in the tokenizer or a warning missing from the documentation.
To generally answer the question, I found the following priority list useful. If you can't choose an option for some reason, you go to the next item.
Pass by value (copyable at an acceptable cost and don't need to change original object)
Pass by const reference (don't need to change original object)
Pass by reference (need to change original object)
Pass by shared_ptr (the lifetime of the object is managed by something else, this also clearly shows the intention to keep the reference)
Pass by raw pointer (you got an address to cast to, or you can't use a smart pointer for some reason)
Also, if your reasoning for choosing the next item from the list is "performance", then sit down and measure the difference. In my experience, most people (especially those with Java or C# backgrounds) tend to over-estimate the cost of passing an object by value (and under-estimate the cost of dereferencing). Passing by value is the safest option (it will not cause any surprises outside the object or function, not even in another thread); don't give up that huge advantage easily.
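To make the list concrete, the corresponding signatures might look roughly like this (Config and the function names are purely illustrative):

#include <memory>

struct Config { int verbosity = 0; };

void use_copy(Config c);                 // 1. by value: cheap to copy, original untouched
void inspect(const Config& c);           // 2. by const reference: read-only access
void update(Config& c);                  // 3. by reference: the original will be changed
void retain(std::shared_ptr<Config> c);  // 4. shared_ptr: the callee keeps the object alive
void legacy(Config* c);                  // 5. raw pointer: C API, or may be null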
A lot of the time it will depend on context. For example, if it's a functor that will be called in a for_each or similar, then you will often store, within your functor, a reference or pointer to an object you expect to outlive the functor.
If it is a general use class then you have to consider how people are going to use it.
If you are writing a tokenizer, you need to consider that copying what you are tokenizing might be expensive; however, you also need to consider that if you are writing a boost library, you are writing it for the general public, who will use it in many different ways.
Storing a const char * would be better than a std::string const& here. If the user has a std::string then the const char * will remain valid as long as they don't modify their string, and they probably won't. If they have a const char * or something else that holds an array of chars and pass it in, a temporary std::string will be created anyway to bind the std::string const &, and you are in great danger that it won't live past your constructor.
Of course, with a const char * you can't use all the lovely std::basic_string functions in your implementation.
There is the option of taking a std::string& (not a const reference) as the parameter, which guarantees (with a compliant compiler) that nobody will pass in a temporary; you can then document that you don't actually change it, along with the rationale behind your seemingly non-const-correct code. Note: I have used this trick once in my own code too. And you can happily use string's find functions (as well as, if you wish, taking basic_string rather than string so you can tokenize wide-character strings too).
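A sketch of that trick (RefTokenizer is an invented name):

#include <string>

class RefTokenizer {
public:
    // Deliberately non-const: a temporary std::string cannot bind to std::string&,
    // so the caller is forced to keep the string alive. Documented: text is never modified.
    explicit RefTokenizer(std::string& text) : text_(text) {}
private:
    std::string& text_;
};

// RefTokenizer bad(std::string("a bb ccc"));       // error: cannot bind a temporary to std::string&
// std::string s("a bb ccc"); RefTokenizer ok(s);   // fine: the caller owns s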