One commonly known compiler optimisation is is the so-called return value optimisation. This optimisation basically allows the compiler to not copy a local variable that is being returned from a function, but instead moving it.
However, I was wondering if the same is also possible for passing arguments to a function by value if it is known that the return value of the function will overwrite the original argument.
Here is an example. Let's assume we have the following function:
std::vector<Foo> modify(std::vector<Foo> data) {
/* Do some funny things to data */
return data;
}
This function is then used in the following way:
std::vector<Foo> bigData = /* big data */;
bigData = modify(bigData); // Here copying the data into the function could be omitted
Now, in this case it can be clearly determined that the return value of the function call will override the argument that is passed into the function per value. My question is whether current compilers are able to optimise this code in a way so that the argument data is not copied when passed to the function, or if this might even be a part of the so-called return value optimisation.
Update
Let's take C++11 into account. I wonder if the following understanding is correct: If the value passed to a function parameter by value is an r-value, and the type of the parameter has a move-constructor, the move constructor will be used instead of the copy constructor.
For example:
std::vector<Foo> bigData = /* big data */;
bigData = modify(std::move(bigData));
If this is assumption is correct, this eliminates the copy operation when passing the value. From the answers already given it seems that the optimisation I referred to earlier is not commonly undertaken. Looking at this manual approach I don't really understand why, as appears to be pretty straightforward to apply.
It's hard to say for sure because in principle compilers can optimize many things, as long as they are certain it has the same behavior. However, in my experience, this optimization will not occur without inlining. Consider the following code:
__attribute__((noinline)) std::vector<double> modify(std::vector<double> data) {
std::sort(data.begin(), data.end());
return data;
}
std::vector<double> blah(std::vector<double> v) {
v = modify(v);
return v;
}
You can look at the assembly generated for this for various compilers; here I have clang 4.0 with O3 optimization: https://godbolt.org/g/xa2Dhf. If you look at the assembly carefully, you'll see a call to operator new in blah. This proves that blah is indeed performing a copy in order to call modify.
Of course, if inlining occurs, it should be pretty trivial for the compiler to remove the copy.
In C++11 the compiler could determine that bigData is reassigned after use in the function and pass it as rvalue, but there is no guarantee for that, unlike for the RVO (from c++17).
For std::vector at least you can make sure this happens by calling the function as modify(std::move(bigData)), which will construct the value in modify from the rvalue reference, which it cannot optimize with the RVO afaik, because it is the function parameter, which is explicitly excluded from this optimization (3rd point here). However the compiler should understand that the return value is an r-value, and move it into big-data again.
Whether some compilers elide a move from an object into a function and out of the function back into the object I don't know for sure, but I know nothing that explicitly allows it, and since the move-constructor could have observable side-effects, that probably means, that it is not allowed (cf. the Notes section in above link).
That is really compiler specific and depends on how you perform operations(whether we are modifying the data or not) with the data. Mostly you shouldn't expect the compiler to do such kind of optimizations unless you really benchmark it. I did some tests with VS2012 compiler that performs copy operations though we don't modify it.
Please have a look at this post(Does the compiler optimize the function parameters passed by value?), that may give you a better idea I hope.
Related
I've seen it recommended (see here, for example) that if you want to create a copy of a parameter to a function, then you should pass it by value so that the copy is done automatically in the initialization list.
I.e., instead of writing
void some_function_ref(Some_class const& input)
{
Some_class copy_of_input(input);
// Do something with copy_of_input
}
, we should write
void some_function_val(Some_class input)
{
// Do something with input
}
. In C++11, there's an elegant reason for this: the second case will choose between calling the copy constructor or the move constructor depending on whether it is passed an lvalue or an rvalue, whereas the first case always calls the copy constructor.
For example, if we call
some_function_ref(function_that_returns_some_class())
, then the program will create a temporary to store the return value of function_that_returns_some_class, and then make a copy of this temporary inside the function. Whereas if we call
some_function_val(function_that_returns_some_class())
, then the program will create a temporary as before, but will now move the temporary into the function and use that value internally. We have avoided the needless copy.
However, some_function_ref has advantages of its own: if we want to make our code easy to debug, then it might be helpful to see the value of input that was passed into the function. If some_function has been passed an lvalue, then we can see this value by inspecting the value of the variable in the enclosing scope. However, if it has been passed an rvalue, we might have no way of seeing what it is.
Of course, there is a reason for this: if the function has been called with an rvalue, then some_function_val will use the move constructor on the temporary, which means that it the old value no longer exists! From an optimization point of view, this is great, since we've avoided an extra copy. From a debugging point of view, it's not so good.
What would be really nice is if we could see the original value of input in an unoptimized debug build of the code, but have the value moved in an optimized release build. For that reason, I would be tempted to use the first version (some_function_ref), under the assumption that the compiler will optimize away the copy and call the move constructor instead if I've turned optimization on, effectively converting some_function_ref into some_function_val.
Is this a reasonable assumption to make? Is the compiler (think gcc, clang etc.) smart enough to realize that input is only used to make a copy and optimize some_function_ref to do the same thing as some_function_val?
When returning a container, I always had to determine if I should use return value or use output parameter. If the performance matter, I chose the second option, otherwise I always chose the first option because it is more intuitive.
Frankly, I personally have been strongly objected about output parameters, possibly because of my mathematical background, but it was kind of OK to use them when I had no other options.
However, the things have been completely changed when it comes to generic programming. There are situations I encountered where a function may not know whether or not the object it returns is a huge container or just a simple value.
Consistently using output parameters may be the solution that I want to know if I can avoid. It is just so awkward if I have to do
int a;
f(a, other_arguments);
compared to
auto a = f(other_arguments);
Furthermore, sometimes the return type of f() has no default constructor. If output parameters are used, there is no graceful way to deal with that case.
I wonder if it is possible to return a "modifier object", a functor taking output parameters to modify them appropriately. (Perhaps this is a kind of lazy evaluation?) Well, returning such objects is not a problem, but the problem is I can't insert an appropriate overload of the assignment operator (or constructor) that takes such an object and triggers it to do its job, when the return type belongs to a library that I can't touch, e.g., std::vector. Of course, conversion operators are not helpful as they have no access to existing resources prepared for the target object.
Some people might ask why not use assign(); define a "generator object" which has begin() & end(), and pass those iterators to std::vector::assign. It is not a solution. For the first reason, the "generator object" does not have the full access to the target object and this may limit what could be done. For the second and more important reason, the call site of my function f() may also be a template which does not know the exact return type of f(), so it cannot determine which of the assignment operator or the assign() member function should be used.
I think that the "modifier object" approach to modify containers should have been already discussed in the past as it is not at all a new idea.
To sum up,
Is it possible to use return values to simulate what would happen when output parameters are used instead, in particular, when outputs are containers?
If not, was adding those supports to the standard discussed before? If it was, what were the issues? Is it a terrible idea?
Edit
The code example I've put above is misleading. The function f() may be used to initialize a local variable, but it may be also used to modify existing variables defined elsewhere. For the first case, as Rakete1111 mentioned, there is no problem with return by value as copy elision comes into play. But for the second case, there may be unnecessary resource releasing/acquiring.
I don't think your "modifier object" was ever proposed (AFAIK). And it will never go into the standard. Why? Because we already have a way to get rid of the expensive copy, and that is return by value (plus compiler optimizations).
Before C++17, compilers were allowed to do basically almost the same thing. This optimization is known as (N)RVO, which optimizes away the copy when returning a (named) temporary from a function.
auto a = f(other_arguments);
Will not return a temporary, then copy it into a. The compiler will optimize the copy away entirely, it is not needed. In theory, you cannot assume that your compiler supports this, but the three major ones do (clang, gcc, MSVC) so no need to worry - I don't know about ICC and the others, so I can't say.
So, as there is no copy (or move) involved, there is no performance penalty of using return values instead of output parameters (most probably, if for some reason your compiler doesn't support it, you'll get a move most of the time). You should always use return parameters if possible, and only use output parameters or some other technique if you measure that you get significantly better performance otherwise.
(Edited, based on comments)
You are right you should avoid output parameters if possible, because the code using them is harder to read and to debug.
Since C++11 we have a feature called move constructors (see reference). You can use std::move on all primitive and STL containers types. It is efficient (question: Efficiency difference between copy and move constructor), because you don't actually copy the values of the variables. Only the pointers are swapped. For your own complex types, you can write your own move constructor.
On the other hand, the only thing you can do is to return a reference to a temporary, which is of undefined behavior, for example:
#include <iostream>
int &&add(int initial, int howMany) {
return std::move(initial + howMany);
}
int main() {
std::cout << add(5, 2) << std::endl;
return 0;
}
Its output could be:
7
You could avoid the problem of temporary variables using global or static ones, but it is bad either. #Rakete is right, there is no good way of achieving it.
I want to construct an object with another using rvalue.
class BigDataClass{
public:
BigDataClass(); //some default BigData
BigDataClass(BigDataClass&& anotherBigData);
private:
BigDataClass(BigDataClass& anotherBigData);
BigDataPtr m_data;
};
So now I want to do something like:
BigDataClass someData;
BigDataClass anotherData(std::move(someData));
So now anotherData gets rValue. It's an eXpiring Value in fact, so as http://en.cppreference.com/w/cpp/utility/move states compiler now
has an oppourtunity to optimize the initialization of anotherData with moving
someData to another.
In my opinion we can in fact get 2 different things:
Optimized approach: data moved. It's optimized, fast and we're happy
Nonoptimized approach: data not moved. We have to copy data from object to another AND delete data from the first one(as far as I know after changing object to rvalue once we cannot use it, because it has got no ownership of data, that it held). In fact it can be even slower than initialization with lvalue referrence due to deletion operation.
Can we really get so unoptimized way of data initialization?
You said:
So now anotherData gets rValue. It's an eXpiring Value in fact, so as http://en.cppreference.com/w/cpp/utility/move states compiler now has an oppourtunity to optimize the initialization of anotherData with moving someData to another.
Actually, what it stated was:
Code that receives such an xvalue has the opportunity to optimize away unnecessary overhead by moving data out of the argument, leaving it in a valid but unspecified state.
That is, it's the code that's responsible for optimization here, not the compiler. All std::move(someData) does is cast its argument to an rvalue reference. Given that BigDataClass has a constructor that takes an rvalue reference, that constructor is preferred, and that constructor will be the one that is called. There isn't any room for change here from the compiler's point of view. Thus the code will do whatever the BigDataClass(BigDataClass&&) constructor does.
Looks like you are confused with what is optimization and what is not. Using move constructor (when available) is not an optimization, it is mandated by standard. It is not that the compiler has this opportunity, it has to do this.
On the other hand, copy elision is an optimization which compiler has an opportunity to perform. How relibale it is, depends on your compiler (though they are applying it pretty uniformely) and the actual function code.
You think about what the optimizer can do with move semantics. Simply nothing itself! You, the coder, has to implement the code which is the optimization compared against the constructor with a const ref.
The question can go to the opposite:
If the compiler already knows that you have a rvalue which is passed as const ref to a constructor, the compiler is able to do the construction as if the value is generated in the constructor itself. Copy eliding is done very often by up to date compilers. The question here is ( for me ) how many effort I should spend to write some rvalue reference constructions to get the same result as the compiler already builds on the fly for me.
OK, in c++11 you have a lot of opportunities to handle code for forwarding and moving by your algorithms. And yes, some benefit can be generated. But I see the benefit only for templated code where I have the need to move/forward some of the parameters to (meta)template functions.
And on the opposite: Handling rvalue references must taken with care and the meaning of a valid but undefined state rise some questions on every user who use your interface implementation. See also: What can I do with a moved-from object?
What kind of optimizations does rvalue guarantee
Simply nothing. You have to implement it!
This is probably a simple question, but this came across my mind. It is regarding the difference between the two functions below:
T func_one(T obj) { //for the purpose of this question,
return obj + obj; //T is a large object and has an overloaded '+' operator
}
T func_two(T obj) {
T output = obj + obj;
return output;
}
In func_one(), rather than creating an object T, assigning it a value and then returning the object, I just return the value itself without creating a new object. If T was a large object, would func_one() be more efficient than func_two() or does func_one() make an object T anyways when returning the sum of the two objects?
The compiler would optimize away fund_two into something similar to func_one which would then be optimized to something else, long story short, you need not to worry about this, unless you really do need to worry about this, then in that case you can look at the asm output.
Short answer: We can't know
Long answer: it depends highly on how T works and your compilers support for return value optimization.
Any function which returns by value can have RVO or NRVO optimization applied to it.
This means that it will construct the return value directly into the calling function, eliminating the copy constructor. As this is the problem with returning large objects by value, this will mean a substantial gain in performance.
The difference between func_one and func_two is that func_one returns an anonymous temporary value, an r-value; this means RVO can trivially be used. func_two returns a named value, an l-value, so NRVO, a much harder optimization, will be used. However, func_two is trivial, so it will almost certainly have NRVO applied, and both functions will be basically identical.
This is assuming you have a modern or even semi-modern compiler; if not, it will depend highly on how you implemented T.
If T has move semantics, your compiler will instead be able to move rather than copy. This should apply to both functions, as temporaries exist in both; however, as func_two returns a named value, it may not be capable of using move semantics. It's up to the compiler, and if the compiler isn't doing RVO or NRVO, I doubt it's doing move.
Finally, it depends on how + operator and = operator are implemented. If, for example, they were implemented as expression templates, then fun_two still requires an assignment, which will slow it down, where as func_one will simply return a highly optimized temporary.
In Summary
In almost all practical contexts, these are identical. In the vanishingly small window where your compiler is acting very strange, func_one is almost universally faster.
Modern compilers can transform the version with the extra variable to the one without (named return value optimization, this is quite a frequent source of questions here on SO, Why isn't the copy-constructor called when returning LOCAL variable for example). So this is not the overhead you should worry about.
The overhead you should worry about, is the function call overhead. An addition takes a modern CPU at most a single cycle. A function call takes between 10 and 20 cycles, depending on the amount of arguments.
I am a bit unsure what you mean with T in your question (is it a template parameter? is it a class? is it a placeholder for a type that you didn't want to disclose in your question?). However, the question whether you have a function call overhead problem depends on that type. And it depends on whether your compiler can inline your function.
Obviously, if it's inlined, you're fine, there's no function call overhead.
If T is a complex type with an expensive operator+() overload, then you are fine as well.
However, if T is int, for instance, and your function is not inlined, then you have roughly 90% overhead in your function.
In my current project I need to implement quite a few functions/methods that take some parameters and generate a collection of results (rather large). So in order to return this collection without copying, I can either create a new collection and return a smart pointer:
boost::shared_ptr<std::vector<Stuff> > generate();
or take a reference to a vector which will be populated:
void generate(std::vector<Stuff> &output);
Both approaches have benefits. The first clearly shows that the vector is the output of the function, it is trivial to use in a parallelized scenario, etc. The second might be more efficient when called in a loop (because we don't allocate memory every time), but then it is not that obvious that the parameter is the output, and someone needs to clean the old data from the vector...
Which would be more customary in real life (i.e. what is the best practise)? In C#/java I would argue that the first one, what is the case in C++?
Also, is it possible to effectively return a vector by value using C++11? What would the pitfalls be?
do correctness first, then optimize if necessary
with both move semantics and Return Value Optimization conspiring to make an ordinary function result non-copying, you would probably have to work at it to make it sufficiently inefficient to be worth optimization work
so, just return the collection as a function result, then MEASURE if you feel that it's too slow
You should return by value.
is it possible to effectively return a vector by value using C++11?
Yes, C++11 supports move semantics. You return a value, but the compiler knows it's a temporary, and therefore can invoke a special constructor (move constructor) that is especially designed to simply "steal the guts" of the returned object. After all, you won't use that temporary object anymore, so why copying it when you can just move its content?
Apart from this, it may be worth mentioning that most C++ compilers, even pre-C++11, implement (Named) Return Value Optimization, which would elide the copy anyway, incurring in no overhead. Thus, you may want to actually measure the performance penalty you (possibly) get before optimizing.
I think you should pass by reference, or return a shared pointer, only when you need reference semantics. This does not seem to be your case.
There is an alternative approach. If you can make your functions template, make them take an output iterator (whose type is a template argument) as argument:
tempalte<class OutputIterator>
void your_algorithm(OutputIterator out) {
for(/*condition*/) {
++out = /* calculation */;
}
}
This has the advantage that the caller can decide in what kind of collection he wants to store the result (the output iterator could for instance write directly to a file, or store the result in a std::vector, or filter it, etc.).
The best practise will probably be surprising to you. I would recommend returning by value in both C++03 and C++11.
In C++03, if you create a std::vector local to generate and return it, the copy may be elided by the compiler (and almost certainly will be). See C++03 §12.8/15:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object with the same cv-unqualified type as the function return type, the copy operation can be omitted by constructing the automatic object directly into the function's return value
In C++11, if you create a std::vector local to generate and return it, the copy will first be considered as a move first (which will already be very fast) and then that may be elided (and almost certainly will be). See C++11 §12.8/31:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value
And §12.8/32:
When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue.
So return by value!
Believe it or not, I'm going to suggest that instead of either of those approaches just take the obvious implementation and return by value! Compilers are very often able to optimize away the notional copy that would be induced, removing it completely. By writing the code in the most obvious manner you make it very clear to future maintainers what the intent is.
But let's say you try return by value and your program runs too slow and let's further suppose that your profiler shows that the return by value is in fact your bottleneck. In this case I would allocate the container on the heap and return as an auto_ptr in C++03 or a unique_ptr in C++11 to clearly indicate that ownership is being transferred and that the generate isn't keeping a copy of that shared_ptr for its own purposes later.
Finally, the series at http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/ provides a great perspective on almost the exact same question.