Why isn't RVO / NRVO always applied? - c++

A brief (and possibly dated and over-simplified) summary of the return value optimization mechanics reads like this:
an implementation may create a hidden object in the caller's stack frame, and pass the address of this object to the function. The function's return value is then copied into the hidden object (...) Around 1991, Walter Bright invented a technique to minimize copying, effectively replacing the hidden object and the named object inside the function with the object used for holding the result [1]
Since it's a topic greatly discussed on SO, I'll only link the most complete QA I found.
My question is, why isn't the return value optimization always applied? More specifically (based on the definition in [1]) why doesn't this replacement always happen per function call, since function return types (hence size on stack) are always known at compile time and this seems to be a very useful feature.

Obviously, when an lvalue is returned by value, there is no way to not do a copy. So, let's consider only local variables. A simple reason applying to local variables is that often it is unclear which object is to be returned. Consider code like this:
T f(Args... args) {
T v1{some_init(args)};
T v2{some_other(args)};
bool rc = determine_result(v1, v2);
return rc? v1: v2;
}
At the point the local variable v1 and v2 are created the compiler has no way to tell which one is going to be returned so it can be created in place.
Another reason is that copy/move construction and destruction can have deliberate side-effects. Thus, it is desirable to have ways to inhibit copy-elision. At the time copy-elision was introduced there was already a lot of C++ code around which may depend on certain copies to be made, i.e., only few situations were made eligible to copy elision.

Requiring that the implementation do this could be a de-optimization in certain circumstances, such as if the return value were thrown away. If you start adding exceptions it starts becoming difficult to prove that an implementation is correct.
Instead they take the easy way and let the implementation decide when to do the optimization, and when it would be counter-productive to do it.

Related

Get rid of output parameters when returning a container

When returning a container, I always had to determine if I should use return value or use output parameter. If the performance matter, I chose the second option, otherwise I always chose the first option because it is more intuitive.
Frankly, I personally have been strongly objected about output parameters, possibly because of my mathematical background, but it was kind of OK to use them when I had no other options.
However, the things have been completely changed when it comes to generic programming. There are situations I encountered where a function may not know whether or not the object it returns is a huge container or just a simple value.
Consistently using output parameters may be the solution that I want to know if I can avoid. It is just so awkward if I have to do
int a;
f(a, other_arguments);
compared to
auto a = f(other_arguments);
Furthermore, sometimes the return type of f() has no default constructor. If output parameters are used, there is no graceful way to deal with that case.
I wonder if it is possible to return a "modifier object", a functor taking output parameters to modify them appropriately. (Perhaps this is a kind of lazy evaluation?) Well, returning such objects is not a problem, but the problem is I can't insert an appropriate overload of the assignment operator (or constructor) that takes such an object and triggers it to do its job, when the return type belongs to a library that I can't touch, e.g., std::vector. Of course, conversion operators are not helpful as they have no access to existing resources prepared for the target object.
Some people might ask why not use assign(); define a "generator object" which has begin() & end(), and pass those iterators to std::vector::assign. It is not a solution. For the first reason, the "generator object" does not have the full access to the target object and this may limit what could be done. For the second and more important reason, the call site of my function f() may also be a template which does not know the exact return type of f(), so it cannot determine which of the assignment operator or the assign() member function should be used.
I think that the "modifier object" approach to modify containers should have been already discussed in the past as it is not at all a new idea.
To sum up,
Is it possible to use return values to simulate what would happen when output parameters are used instead, in particular, when outputs are containers?
If not, was adding those supports to the standard discussed before? If it was, what were the issues? Is it a terrible idea?
Edit
The code example I've put above is misleading. The function f() may be used to initialize a local variable, but it may be also used to modify existing variables defined elsewhere. For the first case, as Rakete1111 mentioned, there is no problem with return by value as copy elision comes into play. But for the second case, there may be unnecessary resource releasing/acquiring.
I don't think your "modifier object" was ever proposed (AFAIK). And it will never go into the standard. Why? Because we already have a way to get rid of the expensive copy, and that is return by value (plus compiler optimizations).
Before C++17, compilers were allowed to do basically almost the same thing. This optimization is known as (N)RVO, which optimizes away the copy when returning a (named) temporary from a function.
auto a = f(other_arguments);
Will not return a temporary, then copy it into a. The compiler will optimize the copy away entirely, it is not needed. In theory, you cannot assume that your compiler supports this, but the three major ones do (clang, gcc, MSVC) so no need to worry - I don't know about ICC and the others, so I can't say.
So, as there is no copy (or move) involved, there is no performance penalty of using return values instead of output parameters (most probably, if for some reason your compiler doesn't support it, you'll get a move most of the time). You should always use return parameters if possible, and only use output parameters or some other technique if you measure that you get significantly better performance otherwise.
(Edited, based on comments)
You are right you should avoid output parameters if possible, because the code using them is harder to read and to debug.
Since C++11 we have a feature called move constructors (see reference). You can use std::move on all primitive and STL containers types. It is efficient (question: Efficiency difference between copy and move constructor), because you don't actually copy the values of the variables. Only the pointers are swapped. For your own complex types, you can write your own move constructor.
On the other hand, the only thing you can do is to return a reference to a temporary, which is of undefined behavior, for example:
#include <iostream>
int &&add(int initial, int howMany) {
return std::move(initial + howMany);
}
int main() {
std::cout << add(5, 2) << std::endl;
return 0;
}
Its output could be:
7
You could avoid the problem of temporary variables using global or static ones, but it is bad either. #Rakete is right, there is no good way of achieving it.

Should I now be passing by value?

In this talk (sorry about the sound) Chandler Carruth suggests not passing by reference, even const reference, in the vast majority of cases due to the way in which it limits the back-end to perform optimisation.
He claims that in most cases the copy is negligible - which I am happy to believe, most data structures/classes etc. have a very small part allocated on the stack - especially when compared with the back-end having to assume pointer aliasing and all the nasty things that could be done to a reference type.
Let's say that we have large object on the stack - say ~4kB and a function that does something to an instance of this object (assume free-standing function).
Classically I would write:
void DoSomething(ExpensiveType* inOut);
ExpensiveType data;
...
DoSomething(&data);
He's suggesting:
ExpensiveType DoSomething(ExpensiveType in);
ExpensiveType data;
...
data = DoSomething(data);
According to what I got from the talk, the second would tend to optimise better. Is there a limit to how big I make something like this though, or is the back-end copy-elision stuff just going to prefer the values in almost all cases?
EDIT: To clarify I'm interested in the whole system, since I feel that this would be a major change to the way I write code, I've had use of refs over values drilled into me for anything larger than integral types for a long time now.
EDIT2: I tested it too, results and code here. No competition really, as we've been taught for a long time, the pointer is a far faster method of doing things. What intrigues me now is why it was suggested during that talk that we move to pass by value, but as the numbers don't support it, it's not something I'm going to do.
I have now watched parts of Chandler's talk. I think the general discussion along the lines "should I now always pass by value" does not do his talk justice. Edit: And actually his talk has been discussed before, here value semantics vs output params with large data structures and in a blog from Eric Niebler, http://ericniebler.com/2013/10/13/out-parameters-vs-move-semantics/.
Back to Chandler. In the key note he specifically (around the 4x-5x minute mark mentioned elsewhere) mentions the following points:
If the optimizer cannot see the code of the called function you have much bigger problems than passing refs or values. It pretty much prevents optimization. (There is a follow-up question at that point about link time optimization which may be discussed later, I don't know.)
He recommends the "new classical" way of returning values using move semantics. Instead of the old school way of passing a reference to an existing object as an in-out parameter the value should be constructed locally and moved out. The big advantage is that the optimizer can be sure that no part of the object is alisased since only the function has access to it.
He mentions threads, storing a variable's value in globals, and observable behaviour like output as examples for unknowns which prevent optimization when only refs/pointers are passed. I think an abstract description could be "the local code can not assume that local value changes are undetected elsewhere, and it cannot assume that a value which is not changed locally has not changed at all". With local copies these assumptions could be made.
Obviously, when passing (and possibly, if objects cannot be moved, when returning) by value, there is a trade-off between the copy cost and the optimization benefits. Size and other things making copying costly will tip the balance towards reference strategies, while lots of optimizable work on the object in the function tips it towards value passing. (His examples involved pointers to ints, not to 4k sized objects.)
Based on the parts I watched I do not think Chandler promoted passing by value as a one-fits-all strategy. I think he dissed passing by reference mostly in the context of passing an out parameter instead of returning a new object. His example was not about a function which modified an existing object.
On a general note:
A program should express the programmer's intent. If you need a copy, by all means do copy! If you want to modify an existing object, by all means use references or pointers. Only if side effects or run time behavior become unbearable; really only then try do do something smart.
One should also be aware that compiler optimizations are sometimes surprising. Other platforms, compilers, compiling options, class libraries or even just small changes in your own code may all prevent the compiler from coming to the rescue. The run-time cost of the change would in many cases come totally unexpected.
Perhaps you took that part of the talk out of context, or something. For large objects, typically it depends on whether the function needs a copy of the object or not. For example:
ExpensiveType DoSomething(ExpensiveType in)
{
cout << in.member;
}
you wasted a lot of resource copying the object unnecessarily, when you could have passed by const reference instead.
But if the function is:
ExpensiveType DoSomething(ExpensiveType in)
{
in.member = 5;
do_something_else(in);
}
and we did not want to modify the calling function's object, then this code is likely to be more efficient than:
ExpensiveType DoSomething(ExpensiveType const &inr)
{
ExpensiveType in = inr;
in.member = 5;
do_something_else(in);
}
The difference comes when invoked with an rvalue (e.g. DoSomething( ExpensiveType(6) ); The latter creates a temporary , makes a copy, then destroys both; whereas the former will create a temporary and use that to move-construct in. (I think this can even undergo copy elision).
NB. Don't use pointers as a hack to implement pass-by-reference. C++ has native pass by reference.

How to avoid copy-constructing a return value

I'm a newcomer to C++ and I ran into a problem recently returning a reference to a local variable. I solved it by changing the return value from std::string& to an std::string. However, to my understanding this can be very inefficient. Consider the following code:
string hello()
{
string result = "hello";
return result;
}
int main()
{
string greeting = hello();
}
To my understanding, what happens is:
hello() is called.
Local variable result is assigned a value of "hello".
The value of result is copied into the variable greeting.
This probably doesn't matter that much for std::string, but it can definitely get expensive if you have, for example, a hash table with hundreds of entries.
How do you avoid copy-constructing a returned temporary, and instead return a copy of the pointer to the object (essentially, a copy of the local variable)?
Sidenote: I've heard that the compiler will sometimes perform return-value optimization to avoid calling the copy constructor, but I think it's best not to rely on compiler optimizations to make your code run efficiently.)
The description in your question is pretty much correct. But it is important to understand that this is behavior of the abstract C++ machine. In fact, the canonical description of abstract return behavior is even less optimal
result is copied into a nameless intermediate temporary object of type std::string. That temporary persists after the function's return.
That nameless intermediate temporary object is then copied to greeting after function returns.
Most compilers have always been smart enough to eliminate that intermediate temporary in full accordance with the classic copy elision rules. But even without that intermediate temporary the behavior has always been seen as grossly suboptimal. Which is why a lot of freedom was given to compilers in order to provide them with optimization opportunities in return-by-value contexts. Originally it was Return Value Optimization (RVO). Later Named Return Value Optimization was added to it (NRVO). And finally, in C++11, move semantics became an additional way to optimize the return behavior in such cases.
Note that under NRVO in your example the initialization of result with "hello" actually places that "hello" directly into greeting from the very beginning.
So in modern C++ the best advice is: leave it as is and don't avoid it. Return it by value. (And prefer to use immediate initialization at the point of declaration whenever you can, instead of opting for default initialization followed by assignment.)
Firstly, the compiler's RVO/NRVO capabilities can (and will) eliminate the copying. In any self-respecting compiler RVO/NRVO is not something obscure or secondary. It is something compiler writers do actively strive to implement and implement properly.
Secondly, there's always move semantics as a fallback solution if RVO/NRVO somehow fails or is not applicable. Moving is naturally applicable in return-by-value contexts and it is much less expensive than full-blown copying for non-trivial objects. And std::string is a movable type.
I disagree with the sentence "I think it's best not to rely on compiler optimizations to make your code run efficiently." That's basically the compiler's whole job. Your job is to write clear, correct, and maintainable source code. For every performance issue I've ever had to fix, I've had to fix a hundred or more issues caused by a developer trying to be clever instead of doing something simple, correct, and maintainable.
Let's take a look at some of the things you could do to try to "help" the compiler and see how they affect the maintainability of the source code.
You could return the data via reference
For example:
void hello(std::string& outString)
Returning data using a reference makes the code at the call-site hard to read. It's nearly impossible to tell what function calls mutate state as a side effect and which don't. Even if you're really careful with const qualifying the references it's going to be hard to read at the call site. Consider the following example:
void hello(std::string& outString); //<-This one could modify outString
void out(const std::string& toWrite); //<-This one definitely doesn't.
. . .
std::string myString;
hello(myString); //<-This one maybe mutates myString - hard to tell.
out(myString); //<-This one certainly doesn't, but it looks identical to the one above
Even the declaration of hello isn't clear. Does it modify outString, or was the author just sloppy and forgot to const qualify the reference? Code that is written in a functional style is easier to read and understand and harder to accidentally break.
Avoid returning the data via reference
You could return a pointer to the object instead of returning the object.
Returning a pointer to the object makes it hard to be sure your code is even correct. Unless you use a unique_ptr you have to trust that anybody using your method is thorough and makes sure to delete the pointer when they're done with it, but that isn't very RAII. std::string is already a type of RAII wrapper for a char* that abstracts away the data lifetime issues associated with returning a pointer. Returning a pointer to a std::string just re-introduces the problems that std::string was designed to solve. Relying on a human being to be diligent and carefully read the documentation for your function and know when to delete the pointer and when not to delete the pointer is unlikely to have a positive outcome.
Avoid returning a pointer to the object instead of returning the object
That brings us to move constructors.
A move constructor will just transfer ownership of the pointed-to data from 'result' to its final destination. Afterwards, accessing the 'result' object is invalid but that doesn't matter - your method ended and the 'result' object went out of scope. No copy, just a transfer of ownership of the pointer with clear semantics.
Normally the compiler will call the move constructor for you. If you're really paranoid (or have specific knowledge that the compiler isn't going to help you) you can use std::move.
Use move constructors if at all possible
Finally modern compilers are amazing. With a modern C++ compiler, 99% of the time the compiler is going to do some sort of optimization to eliminate the copy. The other 1% of the time it's probably not going to matter for performance. In specific circumstances the compiler can re-write a method like std::string GetString(); to void GetString(std::string& outVar); automatically. The code is still easy to read, but in the final assembly you get all of the real or imagined speed benefits of returning by reference. Don't sacrifice readability and maintainability for performance unless you have specific knowledge that the solution doesn't meet your business requirements.
There are plenty of ways to achieve that:
1) Return some data by the reference
void SomeFunc(std::string& sResult)
{
sResult = "Hello world!";
}
2) Return pointer to the object
CSomeHugeClass* SomeFunc()
{
CSomeHugeClass* pPtr = new CSomeHugeClass();
//...
return(pPtr);
}
3) C++ 11 could utilize a move constructor in such cases. See this this and this for the additional info.

Functions returning a collection of objects in C++

In my current project I need to implement quite a few functions/methods that take some parameters and generate a collection of results (rather large). So in order to return this collection without copying, I can either create a new collection and return a smart pointer:
boost::shared_ptr<std::vector<Stuff> > generate();
or take a reference to a vector which will be populated:
void generate(std::vector<Stuff> &output);
Both approaches have benefits. The first clearly shows that the vector is the output of the function, it is trivial to use in a parallelized scenario, etc. The second might be more efficient when called in a loop (because we don't allocate memory every time), but then it is not that obvious that the parameter is the output, and someone needs to clean the old data from the vector...
Which would be more customary in real life (i.e. what is the best practise)? In C#/java I would argue that the first one, what is the case in C++?
Also, is it possible to effectively return a vector by value using C++11? What would the pitfalls be?
do correctness first, then optimize if necessary
with both move semantics and Return Value Optimization conspiring to make an ordinary function result non-copying, you would probably have to work at it to make it sufficiently inefficient to be worth optimization work
so, just return the collection as a function result, then MEASURE if you feel that it's too slow
You should return by value.
is it possible to effectively return a vector by value using C++11?
Yes, C++11 supports move semantics. You return a value, but the compiler knows it's a temporary, and therefore can invoke a special constructor (move constructor) that is especially designed to simply "steal the guts" of the returned object. After all, you won't use that temporary object anymore, so why copying it when you can just move its content?
Apart from this, it may be worth mentioning that most C++ compilers, even pre-C++11, implement (Named) Return Value Optimization, which would elide the copy anyway, incurring in no overhead. Thus, you may want to actually measure the performance penalty you (possibly) get before optimizing.
I think you should pass by reference, or return a shared pointer, only when you need reference semantics. This does not seem to be your case.
There is an alternative approach. If you can make your functions template, make them take an output iterator (whose type is a template argument) as argument:
tempalte<class OutputIterator>
void your_algorithm(OutputIterator out) {
for(/*condition*/) {
++out = /* calculation */;
}
}
This has the advantage that the caller can decide in what kind of collection he wants to store the result (the output iterator could for instance write directly to a file, or store the result in a std::vector, or filter it, etc.).
The best practise will probably be surprising to you. I would recommend returning by value in both C++03 and C++11.
In C++03, if you create a std::vector local to generate and return it, the copy may be elided by the compiler (and almost certainly will be). See C++03 §12.8/15:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object with the same cv-unqualified type as the function return type, the copy operation can be omitted by constructing the automatic object directly into the function's return value
In C++11, if you create a std::vector local to generate and return it, the copy will first be considered as a move first (which will already be very fast) and then that may be elided (and almost certainly will be). See C++11 §12.8/31:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value
And §12.8/32:
When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue.
So return by value!
Believe it or not, I'm going to suggest that instead of either of those approaches just take the obvious implementation and return by value! Compilers are very often able to optimize away the notional copy that would be induced, removing it completely. By writing the code in the most obvious manner you make it very clear to future maintainers what the intent is.
But let's say you try return by value and your program runs too slow and let's further suppose that your profiler shows that the return by value is in fact your bottleneck. In this case I would allocate the container on the heap and return as an auto_ptr in C++03 or a unique_ptr in C++11 to clearly indicate that ownership is being transferred and that the generate isn't keeping a copy of that shared_ptr for its own purposes later.
Finally, the series at http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/ provides a great perspective on almost the exact same question.

How to minimize the times that list constructor (either copy-constructor ) is called?

I have the following compare method.
The method compare and return diff result.
I want to minimize the times that the result list is copied (to temporal and to assignment).
One way to do this is to add additional reference argument for the result, but i love that utils function are closed (they dont change values), so i prefer to avoid this.
One copy can be avoid by using the const& in the assignment
const& list<uint32> diff = getDiffNewElements (...)
, can be a way to also avoid the local copy to temporal ?
The diff method:
list<uint32> getDiffNewElements(const list<Row>& src ,const list<Row>& dst) {
list<uint32> result;
... Do Some compare
return result;
}
There are potentially two copies in the code that you present. One inside the function, from the variable result to the returned object. NRVO will take care of that if the complexity of the function allows for it. If as it seems, you have a single return statement, then the compiler will elide that copy (assuming that you are not disabling it with compiler flags and that you have some optimization level enabled).
The second potential copy is in the caller, from the returned value to the final storage. That copy is almost always elided by the compiler, and it is even simpler to do here than in the NRVO case: the calling convention (all calling conventions I know of) determines that the caller reserves the space for the returned object, and that it passes a hidden pointer to that location to the function. The function in turn uses that pointer as the destination of the first copy.
A function T f() is transformed into void f( uninitialized<T>* __ret ) (there is no such thing as uninitialized<>, but bear with me) and the implementation of the function uses that pointer as the destination when copying in the return statement. On the caller site, if you have T a = f(); the compiler will transform it into T a/*no construction here*/; f(&a);
There is an interesting bit of code in the question that seems to indicate that you have been mislead in the past: const list<uint32>& diff = getDiffNewElements(...). Using a reference rather than storing the value directly (as in list<uint32> diff = getDiffNewElements(...)) has no impact at all in the number of copies that are made. The getDiffNewElements still needs to copy (or elide the copy) to the returned object and that object lives in the scope of the caller. By taking a const reference you are telling the compiler that you don't want to directly name that object, but rather keep it as an unnamed object in the scope and that you only want to use a reference to it. Semantically it can be transformed into:
T __unnamed = getDiffNewElements(...); // this copy can be elided
T const& diff = __unnamed;
The compiler is free, and will probably, optimize the reference away, using the identifier diff as an alias to __unnamed without requiring extra space, so in general it will not be worse than the alternative, but the code is slightly more complex and there is no advantage at all.
Long time ago, when I had time, I started a blog and wrote a couple of articles on value semantics, (N)RVO and copy elision. You might want to take a look.
NRVO
Copy elision
You are looking in the wrong direction. Your getDiffNewElements method does not create unnecessary copies, because modern compilers do RVO (which was pointed in the comments, although if your compiler doesn't do RVO, there is nothing you can do about it, your const& won't help). The way you can optimize this function is to return vector instead of list, since you can call reserve on vector and avoid memory allocations every time you push_back a new element. list does not preallocate memory with default allocator and it has no reserve method to do that.