Is return value optimization dependable in C++?

As this wiki page says (code excerpted below), return value optimization is allowed by the C++ standard, but whether it happens still depends on the implementation. To reduce the cost of copying, is it recommended to optimize manually (bind the function's return value to a reference, like const C& obj = f();) or to leave such optimization to the compiler in practice?
#include <iostream>
struct C {
C() {}
C(const C&) { std::cout << "A copy was made.\n"; }
};
C f() {
return C();
}
int main() {
std::cout << "Hello World!\n";
C obj = f();
}
EDIT: Updated the question to use a const reference.

You can't (portably) use the temporary return value to initialise a non-const reference, so that's certainly not recommended.
Using it to initialise a const reference has no effect on whether the copy/move of the return expression's value is elided. It does eliminate the notional copy/move that would initialise a variable from the returned value, whether or not that one would have been elided anyway. Of course, a const reference is not the same as a (non-reference) variable, since you can't modify the object through it.
In practice, any compiler with a decent optimiser will elide copies and moves wherever it's allowed to. If you're not using a decent optimiser, then you can't expect decent performance anyway.

To manually make sure you don't get any redundant copies of objects, you need to do a bit more than what you did so far. Earlier I answered that it wouldn't be possible, but I was wrong. Also, your use of const& may disallow certain operations on the returned value that you do want to allow. Here's what I would do if you need to do the optimisations manually:
#include <iostream>
struct S {
S()
{ std::cout << "default constructor\n"; }
S(const S &)
{ std::cout << "copy constructor\n"; }
S(S &&)
{ std::cout << "move constructor\n"; }
~S()
{ std::cout << "destructor\n"; }
};
S f() { return {}; }
int main() {
auto&&s = f();
std::cout << "main\n";
}
This prints "default constructor", followed by "main", and then "destructor". This is the output regardless of whether any copy elision takes place. Inside main, s is a named reference, so it is an lvalue, and it is not const-qualified. You can do everything with it that you otherwise could.
Given that it turns out fairly easy to avoid relying on copy elision in cases such as these, as long as you take care to pay attention to it from the start, it may be worth your efforts if you have to worry about other compilers not performing copy elision. Most compilers are capable of that, and there is a fair chance that if a compiler doesn't, it will have other, bigger, problems anyway, so there is a good argument for not worrying about it.
However, at the same time, copy elision is somewhat unreliable: even current optimising compilers do not always perform it, simply because there may be corner cases where copy elision would make sense, but is not permitted by the standard or not possible for that particular implementation. Forcing yourself to write code that doesn't rely on copy elision means you cannot get stuck in that situation.
That said, there are still some cases where copies can only realistically be eliminated by optimising compilers, so you may have no choice but to rely on copy elision:
Suppose we add void m(); to S's definition. Suppose we now edit f to
S f() {
S s;
s.m();
return s;
}
This is more difficult to rewrite into a form that guarantees no redundant copies. Yet at the same time, copies are unnecessary, as can easily be determined from the fact that with GCC (and probably other compilers too), by default, no copies are made.
My final conclusion is that it's probably not worth optimising for compilers that don't perform RVO, but it is worth thinking carefully about what exactly makes it work, and writing code in such a way that RVO remains not only possible, but becomes something a compiler is very likely to do.

Related

How to force the call to move constructor and why should I do that?

I tested this code to see that the compiler automatically transfers temporary object to variables without needing a move constructor.
#include <iostream>
using namespace std;
class A
{
public:
A()
{
cout<<"Hi from default\n";
}
A(A && obj)
{
cout<<"Hi from move\n";
}
};
A getA()
{
A obj;
cout<<"from getA\n";
return obj;
}
int main()
{
A b(getA());
return 0;
}
This code prints "Hi from default" and then "from getA", not the expected "Hi from move".
In optimization terms, it's great. But, how to force the call to the move constructor without adding a copy? (if I wanted a specific behavior for my temporary objects)
Complementary question: I thought that if I did not write a move constructor there would be a copy each time I assigned an rvalue to an lvalue (as in the line A b(getA());). Since that is not the case and the compiler seems to do well, when is it really useful to implement move semantics?
In optimisation terms, it's great. But, how to force the call to the move constructor without adding a copy ? (if I wanted a specific behavior for my temporary objects)
Normally this requires disabling an optimization in order to get this behavior. For gcc and clang you can use -fno-elide-constructors to turn off copy elision. MSVC will not elide in debug mode (optimizations turned off), but I'm not sure whether it has a specific flag for this.
You can also call std::move in the return statement which will disable the elision and force the compiler to generate the temporary and move from it.
return std::move(obj);
That said, RVO/NRVO is something that you should want. You shouldn't want to even have the temporary created if you can help it, as less work done means more work you can do in the same time. To that end, C++17 introduces guaranteed copy elision, which stops those temporaries from even existing when an object is initialized from a prvalue (NRVO remains optional).
This doesn't mean you should not write a move constructor if you can (well if you follow the rule of zero then you would not write one and just use the compiler provided one). There may be times where the compiler cannot elide a temporary or you want to move an lvalue so it is still a useful thing to have.

How to write move so that it can potentially be optimized away?

Given the following code:
struct obj {
int i;
obj() : i(1) {}
obj(obj &&other) : i(other.i) {}
};
void f() {
obj o2(obj(obj(obj{})));
}
I expect release builds to only really create one object and never call a move constructor because the result is the same as if my code was executed. Most code is not that simple though, I can think of a few hard to predict side effects that could stop the optimizer from proving the "as if":
changes to global or "outside" things in either the move constructor or destructor.
potential exceptions in the move constructor or destructor (probably bad design anyway)
internal counting or caching mechanisms changing.
Since I don't use any of these often can I expect most of my moves in and out of functions which are later inlined to be optimized away or am I forgetting something?
P.S. I know that just because an optimization is possible does not mean it will be made by any given compiler.
This doesn't really have anything to do with the as-if rule. The compiler is allowed to elide moves and copies even if they have some side effect. It is the single optimization that a compiler is allowed to do that might change the result of your program. From §12.8/31:
When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the copy/move constructor and/or destructor for the object have side effects.
So the compiler doesn't have to bother inspecting what happens inside your move constructor, it will likely get rid of any moves here anyway. To demonstrate this, consider the following example:
#include <iostream>
struct bad_mover
{
static int move_count;
bad_mover() = default;
bad_mover(bad_mover&& other) { move_count++; }
};
int bad_mover::move_count = 0;
int main(int argc, const char* argv[])
{
bad_mover b{bad_mover(bad_mover(bad_mover()))};
std::cout << "Move count: " << bad_mover::move_count << std::endl;
return 0;
}
Compiled with g++ -std=c++0x:
Move count: 0
Compiled with g++ -std=c++0x -fno-elide-constructors:
Move count: 3
However, I would question any reason you have for providing a move constructor that has additional side effects. The idea in allowing this optimization regardless of side effects is that a copy or move constructor shouldn't do anything other than copy or move. The program with the copy or move should be exactly the same as without.
Nonetheless, your calls to std::move are unnecessary. std::move is used to change an lvalue expression to an rvalue expression, but an expression that creates a temporary object is already an rvalue expression.
Using std::move( tmp(...) ) is completely pointless, the temporary tmp is already an rvalue, you don't need to use std::move to cast it to an rvalue.
Read this series of articles: Want Speed? Pass By Value
You'll learn more and understand better than you will by asking a question on Stack Overflow.

copy elision method

From the standard definition of copy elision method:
In C++ computer programming, copy elision refers to a compiler optimization technique that eliminates unnecessary copying of objects.
Let us consider following code:
#include <cstdlib>
#include <iostream>
using namespace std;
int n=0;
struct C
{
C (int) {}
C(const C&) {++n;}
};
int main(int argc, char *argv[])
{
C c1(42);
C c2=42;
return n;
}
The statement "return n" returns either 0 or 1, depending on whether the copy was elided.
Also consider this code:
#include <iostream>
struct C {
C() {}
C(const C&) { std::cout << "Hello World!\n"; }
};
void f() {
C c;
throw c; // copying the named object c into the exception object.
} // It is unclear whether this copy may be elided.
int main() {
try {
f();
}
catch(C c) {
}
}
It says that
// copying the exception object into the temporary in the exception declaration.
//It is also unclear whether this copy may be elided.
So my question is: how useful is it to implement such an optimization if the results are sometimes undefined? And, in general, how often is it used?
The important bit is that the standard explicitly allows for this, and that means that you cannot assume that the side effects of a copy constructor will be executed, as the copies might be elided. The standard requires that the implementation of a copy constructor has copy-constructor semantics, that is, its whole purpose is to generate a second object that is semantically equivalent, in your domain, to the original object. If your program complies with that, then the optimization will not affect the program.
It is true, on the other hand, that this is the only situation I can think of where the standard allows for different visible outcomes from the same program depending on what the compiler does, but you have been advised that you should not have side effects in your copy constructor (or rather, you cannot depend on the exact number of copies performed).
As to whether it is worth it, yes it is. In many cases copies are quite expensive (I am intentionally ignoring move constructors in C++11 from the discussion). Consider a function that returns a vector<int>: if the copy is not elided, then another dynamic allocation is required, followed by a copy of all of the vector contents and then the release of the original block of memory. All three operations can be expensive.
Alternatively, you could force users to change their code to create an empty object and pass it by reference, but that will make code harder to read.

compiler optimization

So I have a question for you. :)
Can you tell me the output the following code should produce?
#include <iostream>
struct Optimized
{
Optimized() { std::cout << "ctor" << std::endl; }
~Optimized() { std::cout << "dtor" << std::endl; }
Optimized(const Optimized& copy) { std::cout << "copy ctor" << std::endl; }
Optimized(Optimized&& move) { std::cout << "move ctor" << std::endl; }
const Optimized& operator=(const Optimized& rhs) { std::cout << "assignment operator" << std::endl; return *this; }
Optimized& operator=(Optimized&& lhs) { std::cout << "move assignment operator" << std::endl; return *this; }
};
Optimized TestFunction()
{
Optimized a;
Optimized b = a;
return b;
}
int main(int argc, char* argv[])
{
Optimized test = TestFunction();
return 0;
}
My first response would be:
ctor
copy ctor
move ctor
dtor
dtor
dtor
and it IS true, but only if compiler optimization is turned off. When optimization is turned ON then the output is entirely different. With optimization turned on, the output is:
ctor
copy ctor
dtor
dtor
With compiler optimization, the test variable is the return variable.
My question is, what conditions would cause this to not be optimized this way?
I have always been taught that returning a struct/class by value results in extra copy-constructor calls, and that it is better to pass the object in by reference, but the compiler is doing that optimization for me. So is returning a structure still considered bad form?
This is known as Copy Elision and is a special handling instead of copying/moving.
The optimization is specifically allowed by the Standard, as long as it would be possible to copy/move (i.e., the constructor is declared and accessible).
The implementation in a compiler is generally referred to, in this case, as Return Value Optimization. There are two variations:
RVO: when you return a temporary (return "aa" + someString;)
NRVO: N for Named, when you return an object that has a name
Both are implemented by major compilers, but the latter may kick in only at higher optimization levels as it is more difficult to detect.
Therefore, to answer your question about returning structs: I would recommend it. Consider:
// Bad
Foo foo;
bar(foo);
// foo can be modified here
// Good
Foo const foo = bar();
The latter is not only clearer, it also allows const enforcement!
Both outputs are permissible. The C++03 language standard says, in clause 12.8/15:
When certain criteria are met, an implementation is allowed to omit the copy construction of a class object,
even if the copy constructor and/or destructor for the object have side effects. In such cases, the implementation
treats the source and target of the omitted copy operation as simply two different ways of referring to
the same object, and the destruction of that object occurs at the later of the times when the two objects
would have been destroyed without the optimization. This elision of copy operations is permitted in the
following circumstances (which may be combined to eliminate multiple copies):
in a return statement in a function with a class return type, when the expression is the name of a
non-volatile automatic object with the same cv-unqualified type as the function return type, the copy
operation can be omitted by constructing the automatic object directly into the function’s return value
when a temporary class object that has not been bound to a reference (12.2) would be copied to a class
object with the same cv-unqualified type, the copy operation can be omitted by constructing the temporary
object directly into the target of the omitted copy
The output this code will produce is unpredictable, since the language specification explicitly allows optional elimination (elision) of "unnecessary" temporary copies of class objects even if their copy constructors have side effects.
Whether this will happen or not might depend on many factors, including the compiler optimization settings.
In my opinion calling the above copy elision an "optimization" is not entirely correct (although the desire to use this term here is perfectly understandable and it is widely used for this purpose). I'd say that the term optimization should be reserved to situations when the compiler deviates from the behavior of the abstract C++ machine while preserving the observable behavior of the program. In other words, true optimization implies violation of the abstract requirements of the language specification. Since in this case there's no violation (the copy elision is explicitly allowed by the standard), there's no real "optimization". What we observe here is just how the C++ language works at its abstract level. No need to involve the concept of "optimization" at all.
Even when returning by value, the compiler can optimise the extra copy away using Return Value Optimisation; see http://en.wikipedia.org/wiki/Return_value_optimization

How to check for C++ copy elision

I ran across this article on copy elision in C++ and I've seen comments about it in the Boost library. This is appealing, as I prefer my functions to look like
verylargereturntype DoSomething(...)
rather than
void DoSomething(..., verylargereturntype& retval)
So, I have two questions about this
Google turns up virtually no documentation on this. How real is it?
How can I check that this optimization is actually occurring? I assume it involves looking at the assembly, but let's just say that isn't my strong suit. If anyone can give a very basic example of what successful elision looks like, that would be very useful.
I won't be using copy elision just to prettify things, but if I can be guaranteed that it works, it sounds pretty useful.
I think this is a very commonly applied optimization because:
it's not difficult for the compiler to do
it can be a huge gain
it's an area of C++ that was commonly critiqued before the optimization became common
If you're just curious, put a debug printf() in your copy constructor:
#include <cstdio>
class foo {
public:
foo(): x(0) {};
foo(int x_) : x( x_) {};
foo( foo const& other) : x( other.x) {
printf( "copied a foo\n");
};
static foo foobar() {
foo tmp( 2);
return tmp;
}
private:
int x;
};
int main()
{
foo myFoo;
myFoo = foo::foobar();
return 0;
}
Prints out "copied a foo" when I run an unoptimized build, but nothing when I build optimized.
From your cited article:
Although copy elision is never required by the standard, recent versions of every compiler I’ve tested do perform these optimizations today. But even if you don’t feel comfortable returning heavyweight objects by value, copy elision should still change the way you write code.
It is better known as Return Value Optimization.
The only way to know for sure is to look at the assembly, but you're asking the wrong question. You don't need to know if the compiler is eliding the copy unless it matters to the program timing. A profiler should easily tell you if you're spending too much time in the copy constructor.
The poor man's way to figure it out is to put a static counter in the copy constructor and try both forms of your function. If the counts are the same, you've successfully avoided the copy.
Google "Named Return Value Optimization" and "Return Value Optimization" instead. Modern compilers will in fact not perform the copy in many cases.
You can check if it's occurring by returning a type with side effects -- such as printing a message. Wikipedia has some good examples of where program output changes when RVO and/or NRVO is in effect.
Example of what it looks like:
#include <iostream>
struct Foo {
int a;
Foo(int a) : a(a) {}
Foo(const Foo &rhs) : a(rhs.a) { std::cout << "copying\n"; }
};
int main() {
Foo f = Foo(1);
}
If you see no output, then copy elision has taken place. That's elision of a copy from an initializer. The other legal case of copy elision is a return value, and is tested by:
Foo getFoo() {
return Foo(1);
}
int main() {
Foo f = getFoo();
}
or more excitingly for a named return value:
Foo getFoo() {
Foo f(1);
return f;
}
int main() {
Foo f = getFoo();
}
g++ performs all those elisions for me with no optimisation flags, but you can't really know whether more complex code will outwit the compiler.
Note that copy elision doesn't help with assignment, so the following will always call operator=, which you can observe if that operator prints anything:
Foo f(1);
f = getFoo();
Returning by value therefore can still result in "a copy", even if copy constructor elision is performed. So for chunking great classes it's still a performance consideration at the design stage. You don't want to write your code such that fixing it later will be a big deal if it turns out your app spends a significant proportion of its time in copying that could have been avoided.
To answer question 2, you could write a demo program where you write a class DemoReturnType; which has instrumented constructors and destructors which just write to cout when they are called. This should give you enough information about what your compiler is capable of.
Rvalue references solve this problem in C++0x. Whether or not you can obtain an rvalue-enabled compiler is another question - last time I checked only Visual Studio 2010 supports it.