Why is "partial RVO" not performed? - c++

Please take a look at this silly function, which should only illustrate the problem and a simplification of the real code:
struct A;
A create(bool first){
A f(21), s(42);
if(first)
return f;
else
return s;
}
I understand, that because it is not clear which object will be returned during the compilation, we cannot expect return value optimization (RVO) to be always performed.
However, one maybe could expect RVO to be performed in 50% of the cases (assuming uniform distribution for true/false due to lack of further information): just decide for which case RVO (first==true or first==false) should be performed and apply it for this parameter-value, accepting that in the other case the copy constructor must be called.
Yet this "partial RVO" is not the case for all compilers I can get my hands on (see live with gcc, clang and MSVC) - in both cases (i.e. first==true or first==false) the copy-constructor is used and not omitted.
Is there something, that renders the "partial RVO" in the above case invalid or is this an unlikely case of missed optimization by all compilers?
Complete program:
#include <iostream>
struct A{
int val;
A(int val_):val(val_){}
A(const A&o):val(o.val){
std::cout<<"copying: "<<val<<"\n";
}
};
A create(bool first){
A f(21), s(42);
if(first)
return f;
else
return s;
}
int main(){
std::cout<<"With true: ";
create(true);
std::cout<<"With false: ";
create(false);
}

Let's consider what happens if RVO is done for f, meaning it is constructed directly in the return value. If first==true and f gets returned, great, no copy is needed. But if first==false then s gets returned instead, so the program will copy construct s over the top of f before the destructor for f has run. Then after that, the destructor for f will run, and now the return value is an invalid object that has already been destroyed!
If RVO is done for s instead the same argument applies, except that now the problem happens when first==true.
Whichever one you choose, you avoid a copy in 50% of cases and get undefined behaviour in the other 50% of cases! That's not a desirable optimization!
In order to make this work the order of destruction of the local variables would have to be altered so that f is destroyed before copying s into that memory location (or vice versa), and that's a very risky thing to mess with. The order of destruction is a fundamental property of C++ that should not be fiddled with, or you'll break RAII and who knows how many other assumptions.

My take on this, apart from reading Jonathan Wakely's answer with interest, is that one can always define a move constructor for the object being returned. This will then be favoured over the copy constructor if RVO cannot be applied for whatever reason and seems to me to be a good solution.
Things like std::vector define such a constructor, so you get that for free.

Related

How to force the call to move constructor and why should I do that?

I tested this code to see that the compiler automatically transfers temporary object to variables without needing a move constructor.
#include <iostream>
using namespace std;
class A
{
public:
A()
{
cout<<"Hi from default\n";
}
A(A && obj)
{
cout<<"Hi from move\n";
}
};
A getA()
{
A obj;
cout<<"from getA\n";
return obj;
}
int main()
{
A b(getA());
return 0;
}
This code prints "Hi from default from getA" and not the presumed "Hi from move"
In optimization terms, it's great. But, how to force the call to the move constructor without adding a copy? (if I wanted a specific behavior for my temporary objects)
Complementary question: I though that if I did not write a move constructor there would be a copy each time I would assign a rvalue to a lvalue (like in this code at the line A b(getA());). Since it's not the case and the compiler seems to do well, when is it really useful to implement the move semantics?
In optimisation terms, it's great. But, how to force the call to the move constructor without adding a copy ? (if I wanted a specific behavior for my temporary objects)
Normally this requires disabling an optimization flag in order to get this behavior. For gcc and clang you can use -fno-elide-constructors to turn off copy elison. For MSVS it will not do it in debug mode(optimizations turned off) but I'm not sure if they have a specific flag for this
You can also call std::move in the return statement which will disable the elision and force the compiler to generate the temporary and move from it.
return std::move(obj);
That said, RVO/NRVO is something that you should want. You shouldn't want to even have the temporary created if you can help it as less work done means more work you can do in the same time. To that end C++17 introduces guaranteed copy elision which stops those temporaries from even existing.
This doesn't mean you should not write a move constructor if you can (well if you follow the rule of zero then you would not write one and just use the compiler provided one). There may be times where the compiler cannot elide a temporary or you want to move an lvalue so it is still a useful thing to have.

What's happening in this return statement?

I'm reading on copy elision (and how it's supposed to be guaranteed in C++17) and this got me a bit confused (I'm not sure I know things I thought I knew before). So here's a minimal test case:
std::string nameof(int param)
{
switch (param)
{
case 1:
return "1"; // A
case 2:
return "2" // B
}
return std::string(); // C
}
The way I see it, cases A and B perform a direct construction on the return value so copy elision has no meaning here, while case C cannot perform copy elision because there are multiple return paths. Are these assumptions correct?
Also, I'd like to know if
there's a better way of writing the above (e.g. have a std::string retval; and always return that one or write cases A and B as return string("1") etc)
there's any move happening, for example "1" is a temporary but I'm assuming it's being used as a parameter for the constructor of std::string
there are optimization concerns I ommited (e.g. I believe C could be written as return{}, would that be a better choice?)
To make it NRVO-friendly, you should always return the same object. The value of the object might be different, but the object should be the same.
However, following above rule makes program harder to read, and often one should opt for readability over unnoticeable performance improvement. Since std::string has a move constructor defined, the difference between moving a pointer and a length and not doing so would be so tiny that I see no way of actually noticing this in the application.
As for your last question, return std::string() and return {} would be exactly the same.
There are also some incorrect statements in your question. For example, "1" is not a temporary. It's a string literal. Temporary is created from this literal.
Last, but not least, mandatory C++17 copy elision does not apply here. It is reserved for cases like
std::string x = "X";
which before the mandatory requirement could generate code to create a temporary std::string and initialize x with copy (or move) constructor.
In all cases, the copy might or might not be elided. Consider:
std::string j = nameof(whatever);
This could be implemented one of two ways:
Only one std::string object is ever constructed, j. (The copy is elided.)
A temporary std::string object is constructed, its value is copied to j, then the temporary is destroyed. (The function returns a temporary that is copied.)

How to write move so that it can potentially be optimized away?

Given the following code:
struct obj {
int i;
obj() : i(1) {}
obj(obj &&other) : i(other.i) {}
};
void f() {
obj o2(obj(obj(obj{})));
}
I expect release builds to only really create one object and never call a move constructor because the result is the same as if my code was executed. Most code is not that simple though, I can think of a few hard to predict side effects that could stop the optimizer from proving the "as if":
changes to global or "outside" things in either the move constructor or destructor.
potential exceptions in the move constructor or destructor (probably bad design anyway)
internal counting or caching mechanisms changing.
Since I don't use any of these often can I expect most of my moves in and out of functions which are later inlined to be optimized away or am I forgetting something?
P.S. I know that just because an optimization is possible does not mean it will be made by any given compiler.
This doesn't really have anything to do with the as-if rule. The compiler is allowed to elide moves and copies even if they have some side effect. It is the single optimization that a compiler is allowed to do that might change the result of your program. From §12.8/31:
When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the copy/move constructor and/or destructor for the object have side effects.
So the compiler doesn't have to bother inspecting what happens inside your move constructor, it will likely get rid of any moves here anyway. To demonstrate this, consider the following example:
#include <iostream>
struct bad_mover
{
static int move_count;
bad_mover() = default;
bad_mover(bad_mover&& other) { move_count++; }
};
int bad_mover::move_count = 0;
int main(int argc, const char* argv[])
{
bad_mover b{bad_mover(bad_mover(bad_mover()))};
std::cout << "Move count: " << bad_mover::move_count << std::endl;
return 0;
}
Compiled with g++ -std=c++0x:
Move count: 0
Compiled with g++ -std=c++0x -fno-elide-constructors:
Move count: 3
However, I would question any reason you have for providing a move constructor that has additional side effects. The idea in allowing this optimization regardless of side effects is that a copy or move constructor shouldn't do anything other than copy or move. The program with the copy or move should be exactly the same as without.
Nonetheless, your calls to std::move are unnecessary. std::move is used to change an lvalue expression to an rvalue expression, but an expression that creates a temporary object is already an rvalue expression.
Using std::move( tmp(...) ) is completely pointless, the temporary tmp is already an rvalue, you don't need to use std::move to cast it to an rvalue.
Read this series of articles: Want Speed? Pass By Value
You'll learn more and understand better than you will by asking a question on Stackoverflow

Is RVO (Return Value Optimization) applicable for all objects?

Is RVO (Return Value Optimization) guaranteed or applicable for all objects and situations in C++ compilers (specially GCC)?
If answer is "no", what are the conditions of this optimization for a class/object? How can I force or encourage the compiler to do a RVO on a specific returned value?
Return Value Optimization can always be applied, what cannot be universally applied is Named Return Value Optimization. Basically, for the optimization to take place, the compiler must know what object is going to be returned at the place where the object is constructed.
In the case of RVO (where a temporary is returned) that condition is trivially met: the object is constructed in the return statement, and well, it is returned.
In the case of NRVO, you would have to analyze the code to understand whether the compiler can know or not that information. If the analysis of the function is simple, chances are that the compiler will optimize it (single return statement that does not contain a conditional, for example; multiple return statements of the same object; multiple return statements like T f() { if (condition) { T r; return r; } else { T r2; return r2; } } where the compiler knows that r or r2 will be returned...)
Note that you can only assume the optimization in simple cases, specifically, the example in wikipedia could actually be optimized by a smart enough compiler:
std::string f( bool x ) {
std::string a("a"), b("b");
if ( x ) return a;
else return b;
}
Can be rewritten by the compiler into:
std::string f( bool x ) {
if ( x ) {
std::string a("a"), b("b");
return a;
} else {
std::string a("a"), b("b");
return b;
}
}
And the compiler can know at this time that in the first branch a is to be constructed in place of the returned object, and in the second branch the same applies to b. But I would not count on that. If the code is complex, assume that the compiler will not be able to produce the optimization.
EDIT: There is one case that I have not mentioned explicitly, the compiler is not allowed (in most cases even if it was allowed, it could not possibly do it) to optimize away the copy from an argument to the function to the return statement:
T f( T value ) { return value; } // Cannot be optimized away --but can be converted into
// a move operation if available.
Is RVO (Return Value Optimization) guaranteed for all objects in gcc compilers?
No optimisation is ever guaranteed (though RVO is fairly dependable, there do exist some cases that throw it off).
If answer is "no", what is the conditions of this optimization for a class/object?
An implementation detail that's quite deliberately abstracted from you.
Neither know nor care about this, please.
To Jesper: if the object to be constructed is big, avoiding the copy might be necessary (or at the very least highly desirable).
If RVO happens, the copy is avoided and you need not write any more lines of code.
If it doesn't, you'll have to do it manually, writing extra scaffolding yourself. And this will probably involve designating a buffer in advance, forcing you to write a constructor for this empty (probably invalid, you can see how this is not clean) object and a method to ‘construct’ this invalid object.
So ‘It can reduce my lines of code if it's guaranteed. Isn't it?’ does not mean that Masoud is a moron. Unfortunately for him however, RVO is not guaranteed. You have to test if it happens and if it doesn't, write the scaffolding and pollute your design. It can't be herped.
Move semantics (new feature of C++11) is a solution to your problem, which allows you to use Type(Type &&r); (the move constructor) explicitly, instead of Type(const Type &r) (the copy constructor).
For example:
class String {
public:
char *buffer;
String(const char *s) {
int n = strlen(s) + 1;
buffer = new char[n];
memcpy(buffer, s, n);
}
~String() { delete [] buffer; }
String(const String &r) {
// traditional copy ...
}
String(String &&r) {
buffer = r.buffer; // O(1), No copying, saves time.
r.buffer = 0;
}
};
String hello(bool world) {
if (world) {
return String("Hello, world.");
} else {
return String("Hello.");
}
}
int main() {
String foo = hello();
std::cout <<foo.buffer <<std::endl;
}
And this will not trigger the copy constructor.
I don't have a yes or no answer, but you say that you can write fewer lines of code if the optimization you're looking for is guaranteed.
If you write the code you need to write, the program will always work, and if the optimization is there, it will work faster. If there is indeed a case where the optimization "fills in the blank" in the logic rather than the mechanics of the code and makes it work, or outright changes the logic, that seems like a bug that I'd want fixed rather than an implementation detail I'd want to rely on or exploit.

Strange behavior of copy-initialization, doesn't call the copy-constructor!

I was reading the difference between direct-initialization and copy-initialization (§8.5/12):
T x(a); //direct-initialization
T y = a; //copy-initialization
What I understand from reading about copy-initialization is that it needs accessible & non-explicit copy-constructor, or else the program wouldn't compile. I verified it by writing the following code:
struct A
{
int i;
A(int i) : i(i) { std::cout << " A(int i)" << std::endl; }
private:
A(const A &a) { std::cout << " A(const A &)" << std::endl; }
};
int main() {
A a = 10; //error - copy-ctor is private!
}
GCC gives an error (ideone) saying:
prog.cpp:8: error: ‘A::A(const A&)’ is private
So far everything is fine, reaffirming what Herb Sutter says,
Copy initialization means the object is initialized using the copy constructor, after first calling a user-defined conversion if necessary, and is equivalent to the form "T t = u;":
After that I made the copy-ctor accessible by commenting the private keyword. Now, naturally I would expect the following to get printed:
A(const A&)
But to my surprise, it prints this instead (ideone):
A(int i)
Why?
Alright, I understand that first a temporary object of type A is created out of 10 which is int type, by using A(int i), applying the conversion rule as its needed here (§8.5/14), and then it was supposed to call copy-ctor to initialize a. But it didn't. Why?
If an implementation is permitted to eliminate the need to call copy-constructor (§8.5/14), then why is it not accepting the code when the copy-constructor is declared private? After all, its not calling it. Its like a spoiled kid who first irritatingly asks for a specific toy, and when you give him one, the specific one, he throws it away, behind your back. :|
Could this behavior be dangerous? I mean, I might do some other useful thing in the copy-ctor, but if it doesn't call it, then does it not alter the behavior of the program?
Are you asking why the compiler does the access check? 12.8/14 in C++03:
A program is ill-formed if the copy
constructor or the copy assignment
operator for an object is implicitly
used and the special member function
is not accessible
When the implementation "omits the copy construction" (permitted by 12.8/15), I don't believe this means that the copy ctor is no longer "implicitly used", it just isn't executed.
Or are you asking why the standard says that? If copy elision were an exception to this rule about the access check, your program would be well-formed in implementations that successfully perform the elision, but ill-formed in implementations that don't.
I'm pretty sure the authors would consider this a Bad Thing. Certainly it's easier to write portable code this way -- the compiler tells you if you write code that attempts to copy a non-copyable object, even if the copy happens to be elided in your implementation. I suspect that it could also inconvenience implementers to figure out whether the optimization will be successful before checking access (or to defer the access check until after the optimization is attempted), although I have no idea whether that warranted consideration.
Could this behavior be dangerous? I
mean, I might do some other useful
thing in the copy-ctor, but if it
doesn't call it, then does it not
alter the behavior of the program?
Of course it could be dangerous - side-effects in copy constructors occur if and only if the object is actually copied, and you should design them accordingly: the standard says copies can be elided, so don't put code in a copy constructor unless you're happy for it to be elided under the conditions defined in 12.8/15:
MyObject(const MyObject &other) {
std::cout << "copy " << (void*)(&other) << " to " << (void*)this << "\n"; // OK
std::cout << "object returned from function\n"; // dangerous: if the copy is
// elided then an object will be returned but you won't see the message.
}
C++ explicitly allows several optimizations involving the copy constructor that actually change the semantics of the program. (This is in contrast with most optimizations, which do not affect the semantics of the program). In particular, there are several cases where the compiler is allowed to re-use an existing object, rather than copying one, if it knows that the existing object will become unreachable. This (copy construction) is one such case; another similar case is the "return value optimization" (RVO), where if you declare the variable that holds the return value of a function, then C++ can choose to allocate that on the frame of the caller, so that it doesn't need to copy it back to the caller when the function completes.
In general, in C++, you are playing with fire if you define a copy constructor that has side effects or does anything other than just copying.
In any compiler, syntax [and semantic] analysis process are done prior to the code optimization process.
The code must be syntactically valid otherwise it won't even compile. Its only in the later phase (i.e code optimization) that the compiler decides to elide the temporary that it creates.
So you need an accessible copy c-tor.
Here you can find this (with your comment ;)):
[the standard] also says that the temporary copy
can be elided, but the semantic
constraints (eg. accessibility) of the
copy constructor still have to be
checked.
RVO and NRVO, buddy. Perfectly good case of copy ellision.
This is an optimization by the compiler.
In evaluating: A a = 10; instead of:
first constructing a temporary object through A(int);
constructing a through the copy constructor and passing in the temporary;
the compiler will simply construct a using A(int).