Passing rvalue references vs non-const lvalue references - c++

If I have no use for a variable after I pass it to a function, does it matter whether I pass it as a non-const lvalue reference or use std::move to pass it as an rvalue reference? The assumption is that there are two different overloads. The only difference in the two cases is the lifetime of the passed object, which ends earlier if I pass by rvalue reference. Are there other factors to consider?
If I have a function foo overloaded like:
void foo(X& x);
void foo(X&& x);
X x;
foo(std::move(x)); // Does it matter if I called foo(x) instead?
// ... no further accesses to x
// end-of-scope

The lifetime of an object does not end when it is passed by rvalue reference. The rvalue reference merely gives foo permission to take ownership of its argument and potentially change its value to nonsense. This might involve deallocating its members, which is a kind of end of lifetime, but the argument itself lives to the end of the scope of its declaration.
Using std::move on the last access is idiomatic. There is no potential downside. Presumably if there are two overloads, the rvalue reference one has the same semantics but higher efficiency. Of course, they could do completely different things, just for the sake of insane sadism.
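To make the lifetime point concrete, here is a minimal sketch. The destructor counter and the helper alive_after_move() are illustrative names, not part of the question. It shows that std::move only selects the rvalue overload; the destructor of x still runs at the end of its scope, not when foo returns.

```cpp
#include <cassert>
#include <utility>

struct X {
    static int destroyed;   // counts destructor calls (illustrative only)
    ~X() { ++destroyed; }
};
int X::destroyed = 0;

// Each overload reports which one was selected.
int foo(X&)  { return 1; }  // lvalue overload
int foo(X&&) { return 2; }  // rvalue overload

// Returns true if x is still alive after being passed via std::move.
bool alive_after_move() {
    X::destroyed = 0;
    X x;
    int chosen_for_lvalue = foo(x);             // picks foo(X&)
    int chosen_for_rvalue = foo(std::move(x));  // picks foo(X&&)
    assert(chosen_for_lvalue == 1 && chosen_for_rvalue == 2);
    return X::destroyed == 0;   // no destructor has run yet
}
```

After alive_after_move() returns, X::destroyed is 1, because x was destroyed exactly once, at the end of its scope.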

It depends on what you do in foo():
Inside foo(), if you store the argument in some internal storage, then yes, it does matter from a readability point of view: std::move makes it explicit at the call site that this particular argument is being moved and should not be used there after the function call returns.
If you simply read or write its value, then it doesn't matter. Note that even if you pass by T&, the argument can still be moved into some internal storage, but that is a less preferred approach; in fact, it should be considered dangerous.
Also note that std::move does NOT actually move the object. It only casts its argument to an rvalue reference, which makes the object eligible to be moved from. An object is actually moved only when a move constructor or move assignment operator is invoked:
void f(X && x) { return; }
void g(X x) { return; }
X x1,x2;
f(std::move(x1)); //x1 is NOT actually moved (no move constructor invocation).
g(std::move(x2)); //x2 is actually moved (by the move-constructor).
//here it is safe to use x1
//here it is unsafe to use x2
All right, it is more complex than this. Consider another example:
void f(X && x) { vec_storage.push_back(std::move(x)); return; }
void g(X x) { return; }
X x1,x2;
f(std::move(x1)); //x1 is actually moved (move-constructor invocation in push_back)
g(std::move(x2)); //x2 is actually moved (move-constructor invocation when passing argument by copy).
//here it is unsafe to use x1 and x2 both.
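The two examples above can be checked by counting move-constructor calls. This is only a sketch: the counter X::moves and the function names f_ignore, f_store and the moves_for_* helpers are made up for the illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct X {
    static int moves;               // counts move-constructor calls
    X() = default;
    X(const X&) = default;
    X(X&&) noexcept { ++moves; }
};
int X::moves = 0;

std::vector<X> vec_storage;

void f_ignore(X&&) {}                                         // only binds a reference
void f_store(X&& x) { vec_storage.push_back(std::move(x)); }  // moves into storage
void g(X) {}                                                  // parameter is move-constructed

int moves_for_reference_only() {
    X x;
    X::moves = 0;
    f_ignore(std::move(x));
    return X::moves;   // 0: std::move alone moved nothing
}

int moves_for_by_value() {
    X x;
    X::moves = 0;
    g(std::move(x));
    return X::moves;   // 1: the by-value parameter was move-constructed
}

int moves_for_store() {
    vec_storage.clear();
    vec_storage.reserve(1);   // avoid extra moves caused by reallocation
    X x;
    X::moves = 0;
    f_store(std::move(x));
    return X::moves;   // 1: push_back move-constructed the new element
}
```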
Hope that helps.

Related

Passing an argument value from constructor to a function that takes a reference [duplicate]

Why is it not allowed to get non-const reference to a temporary object,
which function getx() returns? Clearly, this is prohibited by C++ Standard
but I am interested in the purpose of such restriction, not a reference to the standard.
struct X
{
X& ref() { return *this; }
};
X getx() { return X();}
void g(X & x) {}
int f()
{
const X& x = getx(); // OK
X& x = getx(); // error
X& x = getx().ref(); // OK
g(getx()); //error
g(getx().ref()); //OK
return 0;
}
It is clear that the lifetime of the object cannot be the cause, because
constant reference to an object is not prohibited by C++ Standard.
It is clear that the temporary object is not constant in the sample above, because calls to non-constant functions are permitted. For instance, ref() could modify the temporary object.
In addition, ref() allows you to fool the compiler and get a link to this temporary object and that solves our problem.
In addition:
They say "assigning a temporary object to the const reference extends the lifetime of this object" and " Nothing is said about non-const references though".
My additional question. Does following assignment extend the lifetime of temporary object?
X& x = getx().ref(); // OK
From this Visual C++ blog article about rvalue references:
... C++ doesn't want you to accidentally
modify temporaries, but directly
calling a non-const member function on
a modifiable rvalue is explicit, so
it's allowed ...
Basically, you shouldn't try to modify temporaries for the very reason that they are temporary objects and will die any moment now. The reason you are allowed to call non-const methods is that, well, you are welcome to do some "stupid" things as long as you know what you are doing and you are explicit about it (like, using reinterpret_cast). But if you bind a temporary to a non-const reference, you can keep passing it around "forever" just to have your manipulation of the object disappear, because somewhere along the way you completely forgot this was a temporary.
If I were you, I would rethink the design of my functions. Why is g() accepting reference, does it modify the parameter? If no, make it const reference, if yes, why do you try to pass temporary to it, don't you care it's a temporary you are modifying? Why is getx() returning temporary anyway? If you share with us your real scenario and what you are trying to accomplish, you may get some good suggestions on how to do it.
Going against the language and fooling the compiler rarely solves problems - usually it creates problems.
Edit: Addressing questions in comment:
1) `X& x = getx().ref(); // OK when will x die?` - I don't know and I don't care, because this is exactly what I mean by "going against the language". The language says "temporaries die at the end of the statement, unless they are bound to const reference, in which case they die when the reference goes out of scope". Applying that rule, it seems x is already dead at the beginning of the next statement, since it's not bound to const reference (the compiler doesn't know what ref() returns). This is just a guess however.
I stated the purpose clearly: you are not allowed to modify temporaries, because it just does not make sense (ignoring C++0x rvalue references). The question "then why am I allowed to call non-const members?" is a good one, but I don't have better answer than the one I already stated above.
Well, if I'm right about x in X& x = getx().ref(); dying at the end of the statement, the problems are obvious.
Anyway, based on your question and comments I don't think even these extra answers will satisfy you. Here is a final attempt/summary: The C++ committee decided it doesn't make sense to modify temporaries, therefore, they disallowed binding to non-const references. Maybe some compiler implementation or historic issues were also involved, I don't know. Then, some specific case emerged, and it was decided that against all odds, they will still allow direct modification through calling a non-const method. But that's an exception - you are generally not allowed to modify temporaries. Yes, C++ is often that weird.
In your code getx() returns a temporary object, a so-called "rvalue". You can copy rvalues into objects (aka. variables) or bind them to const references (which will extend their lifetime until the end of the reference's life). You cannot bind rvalues to non-const references.
This was a deliberate design decision in order to prevent users from accidentally modifying an object that is going to die at the end of the expression:
g(getx()); // g() would modify an object without anyone being able to observe
If you want to do this, you will have to either make a local copy of the object first or bind it to a const reference:
X x1 = getx();
const X& x2 = getx(); // extend lifetime of temporary to lifetime of const reference
g(x1); // fine
g(x2); // can't bind a const reference to a non-const reference
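The binding rules described here can be verified at compile time. The sketch below assumes a C++17 compiler (for std::is_invocable_v); the function names take_ref, take_cref and take_rref are placeholders, and the last group anticipates the rvalue references of C++11.

```cpp
#include <type_traits>

struct X {};

void take_ref(X&) {}         // non-const lvalue reference
void take_cref(const X&) {}  // const lvalue reference
void take_rref(X&&) {}       // rvalue reference (C++11)

// In std::is_invocable_v, an argument type X stands for an rvalue
// and X& stands for an lvalue.
static_assert( std::is_invocable_v<decltype(&take_ref),  X&>,       "lvalue binds to X&");
static_assert(!std::is_invocable_v<decltype(&take_ref),  X>,        "rvalue does not bind to X&");
static_assert(!std::is_invocable_v<decltype(&take_ref),  const X&>, "const lvalue does not bind to X&");

static_assert( std::is_invocable_v<decltype(&take_cref), X&>,       "lvalue binds to const X&");
static_assert( std::is_invocable_v<decltype(&take_cref), X>,        "rvalue binds to const X&");

static_assert( std::is_invocable_v<decltype(&take_rref), X>,        "rvalue binds to X&&");
static_assert(!std::is_invocable_v<decltype(&take_rref), X&>,       "lvalue does not bind to X&&");
```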
Note that the next C++ standard will include rvalue references. What you know as references will therefore come to be called "lvalue references". You will be allowed to bind rvalues to rvalue references, and you can overload functions on "rvalue-ness":
void g(X&); // #1, takes an ordinary (lvalue) reference
void g(X&&); // #2, takes an rvalue reference
X x;
g(x); // calls #1
g(getx()); // calls #2
g(X()); // calls #2, too
The idea behind rvalue references is that, since these objects are going to die anyway, you can take advantage of that knowledge and implement what's called "move semantics", a certain kind of optimization:
class X {
public: // the constructor must be accessible for X x(getx()) to compile
X(X&& rhs)
: pimpl( rhs.pimpl ) // steal rhs' data...
{
rhs.pimpl = nullptr; // ...and leave it empty, but destructible
}
private:
data* pimpl; // you would use a smart ptr, of course
};
X x(getx()); // x will steal the rvalue's data, leaving the temporary object empty
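A compilable version of this idea might look like the following. It is a sketch only: the Data type, the int constructor, and the empty()/value() accessors are additions for illustration, and a raw owning pointer is kept to mirror the snippet above even though a smart pointer would normally be preferable.

```cpp
#include <cassert>
#include <utility>

struct Data { int value; };   // hypothetical payload type

class X {
public:
    explicit X(int v) : pimpl(new Data{v}) {}

    // Move constructor: steal rhs' data and leave rhs empty but destructible.
    X(X&& rhs) noexcept : pimpl(rhs.pimpl) { rhs.pimpl = nullptr; }

    // Copying is disabled to keep ownership of the raw pointer unambiguous.
    X(const X&) = delete;
    X& operator=(const X&) = delete;

    ~X() { delete pimpl; }    // deleting a null pointer is a no-op

    bool empty() const { return pimpl == nullptr; }
    int value() const { return pimpl->value; }   // precondition: !empty()

private:
    Data* pimpl;
};

X getx() { return X(42); }
```

With this in place, X a(getx()); leaves a holding 42, and X b(std::move(a)); transfers the data to b, after which a.empty() is true.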
What you are showing is that operator chaining is allowed.
X& x = getx().ref(); // OK
The expression is 'getx().ref();' and this is executed to completion before assignment to 'x'.
Note that getx() does not return a reference but a fully formed object into the local context. The object is temporary but it is not const, thus allowing you to call other methods to compute a value or have other side effects happen.
// It would allow things like this.
getPipeline().procInstr(1).procInstr(2).procInstr(3);
// or more commonly
std::cout << getManiplator() << 5;
Look at the end of this answer for a better example of this
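As a small, self-contained illustration of chaining on a temporary, consider the sketch below. The Pipeline class and its members are hypothetical; only the names getPipeline and procInstr come from the snippet above. The temporary stays alive until the end of the full-expression, so every call in the chain operates on a valid object.

```cpp
#include <cassert>
#include <vector>

class Pipeline {
public:
    // Returns *this so that calls can be chained.
    Pipeline& procInstr(int instr) {
        log_.push_back(instr);
        return *this;
    }
    int count() const { return static_cast<int>(log_.size()); }

private:
    std::vector<int> log_;
};

Pipeline getPipeline() { return Pipeline(); }

int processed_count() {
    // The temporary returned by getPipeline() lives until the end of this
    // full-expression, so the result is read before it is destroyed.
    return getPipeline().procInstr(1).procInstr(2).procInstr(3).count();
}
```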
You cannot bind a temporary to a non-const reference because doing so would generate a reference to an object that will be destroyed at the end of the expression, thus leaving you with a dangling reference (which is untidy and the standard does not like untidy).
The value returned by ref() is a valid reference but the method does not pay any attention to the lifespan of the object it is returning (because it can not have that information within its context). You have basically just done the equivalent of:
X& x = const_cast<X&>(static_cast<const X&>(getx()));
The reason it is OK to do this with a const reference to a temporary object is that the standard extends the lifespan of the temporary to the lifespan of the reference, so the temporary object's lifespan is extended beyond the end of the statement.
So the only remaining question is why does the standard not want to allow reference to temporaries to extend the life of the object beyond the end of the statement?
I believe it is because doing so would make it very hard for compilers to handle temporary objects correctly. It was done for const references to temporaries because this has limited usage: it forces you to make a copy of the object to do anything useful, but it does provide some limited functionality.
Think of this situation:
int getI() { return 5;}
int& x = getI(); // hypothetical: this does not compile
x++; // Note x is an alias to a variable. What variable are you updating?
Extending the lifespan of this temporary object is going to be very confusing.
While the following:
int const& y = getI();
Will give you code that is intuitive to use and understand.
If you want to modify the value, you should be returning the value to a variable. If you are trying to avoid the cost of copying the object back from the function (as it seems that the object is copy constructed back, and technically it is), then don't bother: the compiler is very good at 'Return Value Optimization'.
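One way to see return value optimization at work is to count copy and move constructor calls. This is a sketch with made-up names (Counted, make, copies_and_moves_on_return). Since C++17 the elision in this particular pattern is guaranteed; with older standards it is permitted and, in my experience, performed by mainstream compilers unless explicitly disabled.

```cpp
#include <cassert>

struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

Counted make() { return Counted(); }   // returns a prvalue

int copies_and_moves_on_return() {
    Counted::copies = 0;
    Counted::moves = 0;
    Counted c = make();   // the object is constructed directly in c
    (void)c;              // silence unused-variable warnings
    return Counted::copies + Counted::moves;
}
```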
Why is discussed in the C++ FAQ (boldfacing mine):
In C++, non-const references can bind to lvalues and const references can bind to lvalues or rvalues, but there is nothing that can bind to a non-const rvalue. That's to protect people from changing the values of temporaries that are destroyed before their new value can be used. For example:
void incr(int& a) { ++a; }
int i = 0;
incr(i); // i becomes 1
incr(0); // error: 0 is not an lvalue
If that incr(0) were allowed either some temporary that nobody ever saw would be incremented or - far worse - the value of 0 would become 1. The latter sounds silly, but there was actually a bug like that in early Fortran compilers that set aside a memory location to hold the value 0.
The main issue is that
g(getx()); //error
is a logical error: g is modifying the result of getx() but you don't have any chance to examine the modified object. If g didn't need to modify its parameter then it wouldn't have required an lvalue reference, it could have taken the parameter by value or by const reference.
const X& x = getx(); // OK
is valid because you sometimes need to reuse the result of an expression, and it's pretty clear that you're dealing with a temporary object.
However it is not possible to make
X& x = getx(); // error
valid without making g(getx()) valid, which is what the language designers were trying to avoid in the first place.
g(getx().ref()); //OK
is valid because methods only know about the const-ness of the this, they don't know if they are called on an lvalue or on an rvalue.
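That statement describes C++ before C++11. Since C++11, a member function can carry a ref-qualifier and thereby restrict itself to lvalues or rvalues. The sketch below (the detection helper can_call_ref is a hypothetical name, and std::void_t requires C++17) makes ref() callable on lvalues only, so that getx().ref() would no longer compile.

```cpp
#include <type_traits>
#include <utility>

struct X {
    // The trailing & restricts ref() to lvalue objects.
    X& ref() & { return *this; }
};

// Detects whether calling .ref() on an expression of type T is well-formed.
// std::declval<X&>() is an lvalue, std::declval<X>() is an rvalue.
template <class T, class = void>
struct can_call_ref : std::false_type {};

template <class T>
struct can_call_ref<T, std::void_t<decltype(std::declval<T>().ref())>>
    : std::true_type {};

static_assert( can_call_ref<X&>::value, "ref() is callable on an lvalue");
static_assert(!can_call_ref<X>::value,  "ref() is not callable on a temporary");
```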
As always in C++, you have a workaround for this rule but you have to signal the compiler that you know what you're doing by being explicit:
g(const_cast<X&>(static_cast<const X&>(getx())));
Seems like the original question as to why this is not allowed has been answered clearly: "because it is most likely an error".
FWIW, I thought I'd show how it could be done, even though I don't think it's a good technique.
The reason I sometimes want to pass a temporary to a method taking a non-const reference is to intentionally throw away a value returned by-reference that the calling method doesn't care about. Something like this:
// Assuming: void Person::GetNameAndAddr(std::string &name, std::string &addr);
string name;
person.GetNameAndAddr(name, string()); // don't care about addr
As explained in previous answers, that doesn't compile. But this compiles and works correctly (with my compiler):
person.GetNameAndAddr(name,
const_cast<string &>(static_cast<const string &>(string())));
This just shows that you can use casting to lie to the compiler. Obviously, it would be much cleaner to declare and pass an unused automatic variable:
string name;
string unused;
person.GetNameAndAddr(name, unused); // don't care about addr
This technique does introduce an unneeded local variable into the method's scope. If for some reason you want to prevent it from being used later in the method, e.g., to avoid confusion or error, you can hide it in a local block:
string name;
{
string unused;
person.GetNameAndAddr(name, unused); // don't care about addr
}
-- Chris
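With C++11 rvalue references there is another option that avoids both the cast and the dummy variable: add an overload that accepts a temporary for the parameter you don't care about. The sketch below uses a hypothetical stand-in for Person with made-up return values; it is not the class from the answer above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the Person class discussed above.
class Person {
public:
    void GetNameAndAddr(std::string& name, std::string& addr) const {
        name = "Ada";
        addr = "12 Example Street";
    }

    // Extra overload that accepts a temporary for addr. Inside the body the
    // named parameter addr is an lvalue, so this call selects the overload
    // above rather than recursing.
    void GetNameAndAddr(std::string& name, std::string&& addr) const {
        GetNameAndAddr(name, addr);
    }
};

std::string name_only(const Person& person) {
    std::string name;
    person.GetNameAndAddr(name, std::string());  // addr result is discarded
    return name;
}
```

The temporary std::string lives until the end of the full-expression containing the call, which is long enough for the function to write into it.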
Why would you ever want X& x = getx();? Just use X x = getx(); and rely on RVO.
The evil workaround involves the 'mutable' keyword. Actually being evil is left as an exercise for the reader. Or see here: http://www.ddj.com/cpp/184403758
Excellent question, and here's my attempt at a more concise answer (since a lot of useful info is in comments and hard to dig out in the noise.)
Any reference bound directly to a temporary will extend its life [12.2.5]. On the other hand, a reference initialized with another reference will not (even if it's ultimately the same temporary). That makes sense (the compiler doesn't know what that reference ultimately refers to).
But this whole idea is extremely confusing. E.g. const X &x = X(); will make the temporary last as long as the x reference, but const X &x = X().ref(); will NOT (who knows what ref() actually returned). In the latter case, the destructor for X gets called at the end of this line. (This is observable with a non-trivial destructor.)
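The difference can be observed with a counter of live objects. This is a sketch with invented names (Tracked, alive, and the two helper functions). Note that in the second function the reference dangles, so the code deliberately never reads through it, and some compilers may warn about it.

```cpp
#include <cassert>

struct Tracked {
    static int alive;   // number of Tracked objects currently alive
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
    Tracked& ref() { return *this; }
};
int Tracked::alive = 0;

int alive_after_direct_binding() {
    // Bound directly to the temporary: its lifetime is extended to match r.
    [[maybe_unused]] const Tracked& r = Tracked();
    return Tracked::alive;   // 1
}

int alive_after_indirect_binding() {
    // Bound to the reference returned by ref(): no lifetime extension, so the
    // temporary is destroyed at the end of the declaration and r dangles.
    [[maybe_unused]] const Tracked& r = Tracked().ref();
    return Tracked::alive;   // 0
}
```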
So it seems generally confusing and dangerous (why complicate the rules about object lifetimes?), but presumably there was a need at least for const references, so the standard does set this behavior for them.
[From sbi comment]: Note that the fact that binding it to a const reference enhances a
temporary's lifetimes is an exception that's been added deliberately
(TTBOMK in order to allow manual optimizations). There wasn't an
exception added for non-const references, because binding a temporary
to a non-const reference was seen to most likely be a programmer
error.
All temporaries do persist until the end of the full-expression. To make use of them, however, you need a trick like you have with ref(). That's legal. There doesn't seem to be a good reason for the extra hoop to jump through, except to remind the programmer that something unusual is going on (namely, a reference parameter whose modifications will be quickly lost).
[Another sbi comment] The reason Stroustrup gives (in D&E) for disallowing the binding of
rvalues to non-const references is that, if Alexey's g() would modify
the object (which you'd expect from a function taking a non-const
reference), it would modify an object that's going to die, so nobody
could get at the modified value anyway. He says that this, most
likely, is an error.
"It is clear that the temporary object is not constant in the sample above, because calls
to non-constant functions are permitted. For instance, ref() could modify the temporary
object."
In your example getx() does not return a const X, so you are able to call ref() in much the same way as you could call X().ref(). The returned object is non-const, so you can call non-const methods on it; what you can't do is bind it to a non-const reference.
Along with SadSidos comment this makes your three points incorrect.
I have a scenario I would like to share where I wish I could do what Alexey is asking. In a Maya C++ plugin, I have to do the following shenanigan in order to get a value into a node attribute:
MFnDoubleArrayData myArrayData;
MObject myArrayObj = myArrayData.create(myArray);
MPlug myPlug = myNode.findPlug(attributeName);
myPlug.setValue(myArrayObj);
This is tedious to write, so I wrote the following helper functions:
MPlug operator | (MFnDependencyNode& node, MObject& attribute){
MStatus status;
MPlug returnValue = node.findPlug(attribute, &status);
return returnValue;
}
void operator << (MPlug& plug, MDoubleArray& doubleArray){
MStatus status;
MFnDoubleArrayData doubleArrayData;
MObject doubleArrayObject = doubleArrayData.create(doubleArray, &status);
status = plug.setValue(doubleArrayObject);
}
And now I can write the code from the beginning of the post as:
(myNode | attributeName) << myArray;
The problem is it doesn't compile outside of Visual C++, because it's trying to bind the temporary variable returned from the | operator to the MPlug reference of the << operator. I would like it to be a reference because this code is called many times and I'd rather not have MPlug being copied so much. I only need the temporary object to live until the end of the second function.
Well, this is my scenario. Just thought I'd show an example where one would like to do what Alexey describes. I welcome all critiques and suggestions!
Thanks.
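One portable way to get this chaining to compile, assuming C++11 is available, is to let the second operator take the plug by rvalue reference, so the temporary returned by the first operator binds directly without a copy. The sketch below uses hypothetical stand-in types (Plug, Node) rather than the Maya classes, whose exact signatures are not assumed here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; these are NOT the Maya API types.
struct Plug {
    std::vector<double>* target;   // where setValue writes
    void setValue(const std::vector<double>& values) { *target = values; }
};

struct Node {
    std::vector<double> attribute;
    Plug findPlug() { return Plug{&attribute}; }
};

// Returns a temporary Plug, like the operator| in the post above.
// The int stands in for an attribute handle and is unused in this sketch.
Plug operator|(Node& node, int /*attribute*/) { return node.findPlug(); }

// Taking the plug by rvalue reference lets the temporary bind directly.
// The array is only read, so it is taken by const reference.
void operator<<(Plug&& plug, const std::vector<double>& values) {
    plug.setValue(values);
}
```

With these definitions, (node | 0) << values; compiles on conforming compilers. If named Plug objects must also be accepted, an additional overload taking Plug& could be added alongside.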

What is the difference between these two versions of the same Template Class [duplicate]

Why is it not allowed to get non-const reference to a temporary object,
which function getx() returns? Clearly, this is prohibited by C++ Standard
but I am interested in the purpose of such restriction, not a reference to the standard.
struct X
{
X& ref() { return *this; }
};
X getx() { return X();}
void g(X & x) {}
int f()
{
const X& x = getx(); // OK
X& x = getx(); // error
X& x = getx().ref(); // OK
g(getx()); //error
g(getx().ref()); //OK
return 0;
}
It is clear that the lifetime of the object cannot be the cause, because
constant reference to an object is not prohibited by C++ Standard.
It is clear that the temporary object is not constant in the sample above, because calls to non-constant functions are permitted. For instance, ref() could modify the temporary object.
In addition, ref() allows you to fool the compiler and get a link to this temporary object and that solves our problem.
In addition:
They say "assigning a temporary object to the const reference extends the lifetime of this object" and " Nothing is said about non-const references though".
My additional question. Does following assignment extend the lifetime of temporary object?
X& x = getx().ref(); // OK
From this Visual C++ blog article about rvalue references:
... C++ doesn't want you to accidentally
modify temporaries, but directly
calling a non-const member function on
a modifiable rvalue is explicit, so
it's allowed ...
Basically, you shouldn't try to modify temporaries for the very reason that they are temporary objects and will die any moment now. The reason you are allowed to call non-const methods is that, well, you are welcome to do some "stupid" things as long as you know what you are doing and you are explicit about it (like, using reinterpret_cast). But if you bind a temporary to a non-const reference, you can keep passing it around "forever" just to have your manipulation of the object disappear, because somewhere along the way you completely forgot this was a temporary.
If I were you, I would rethink the design of my functions. Why is g() accepting reference, does it modify the parameter? If no, make it const reference, if yes, why do you try to pass temporary to it, don't you care it's a temporary you are modifying? Why is getx() returning temporary anyway? If you share with us your real scenario and what you are trying to accomplish, you may get some good suggestions on how to do it.
Going against the language and fooling the compiler rarely solves problems - usually it creates problems.
Edit: Addressing questions in comment:
1) `X& x = getx().ref(); // OK when will x die?` - I don't know and I don't care, because this is exactly what I mean by "going against the language". The language says "temporaries die at the end of the statement, unless they are bound to const reference, in which case they die when the reference goes out of scope". Applying that rule, it seems x is already dead at the beginning of the next statement, since it's not bound to const reference (the compiler doesn't know what ref() returns). This is just a guess however.
I stated the purpose clearly: you are not allowed to modify temporaries, because it just does not make sense (ignoring C++0x rvalue references). The question "then why am I allowed to call non-const members?" is a good one, but I don't have better answer than the one I already stated above.
Well, if I'm right about x in X& x = getx().ref(); dying at the end of the statement, the problems are obvious.
Anyway, based on your question and comments I don't think even these extra answers will satisfy you. Here is a final attempt/summary: The C++ committee decided it doesn't make sense to modify temporaries, therefore, they disallowed binding to non-const references. May be some compiler implementation or historic issues were also involved, I don't know. Then, some specific case emerged, and it was decided that against all odds, they will still allow direct modification through calling non-const method. But that's an exception - you are generally not allowed to modify temporaries. Yes, C++ is often that weird.
In your code getx() returns a temporary object, a so-called "rvalue". You can copy rvalues into objects (aka. variables) or bind them to to const references (which will extend their life-time until the end of the reference's life). You cannot bind rvalues to non-const references.
This was a deliberate design decision in order to prevent users from accidentally modifying an object that is going to die at the end of the expression:
g(getx()); // g() would modify an object without anyone being able to observe
If you want to do this, you will have to either make a local copy or of the object first or bind it to a const reference:
X x1 = getx();
const X& x2 = getx(); // extend lifetime of temporary to lifetime of const reference
g(x1); // fine
g(x2); // can't bind a const reference to a non-const reference
Note that the next C++ standard will include rvalue references. What you know as references is therefore becoming to be called "lvalue references". You will be allowed to bind rvalues to rvalue references and you can overload functions on "rvalue-ness":
void g(X&); // #1, takes an ordinary (lvalue) reference
void g(X&&); // #2, takes an rvalue reference
X x;
g(x); // calls #1
g(getx()); // calls #2
g(X()); // calls #2, too
The idea behind rvalue references is that, since these objects are going to die anyway, you can take advantage of that knowledge and implement what's called "move semantics", a certain kind of optimization:
class X {
X(X&& rhs)
: pimpl( rhs.pimpl ) // steal rhs' data...
{
rhs.pimpl = NULL; // ...and leave it empty, but deconstructible
}
data* pimpl; // you would use a smart ptr, of course
};
X x(getx()); // x will steal the rvalue's data, leaving the temporary object empty
What you are showing is that operator chaining is allowed.
X& x = getx().ref(); // OK
The expression is 'getx().ref();' and this is executed to completion before assignment to 'x'.
Note that getx() does not return a reference but a fully formed object into the local context. The object is temporary but it is not const, thus allowing you to call other methods to compute a value or have other side effects happen.
// It would allow things like this.
getPipeline().procInstr(1).procInstr(2).procInstr(3);
// or more commonly
std::cout << getManiplator() << 5;
Look at the end of this answer for a better example of this
You can not bind a temporary to a reference because doing so will generate a reference to an object that will be destroyed at the end of the expression thus leaving you with a dangling reference (which is untidy and the standard does not like untidy).
The value returned by ref() is a valid reference but the method does not pay any attention to the lifespan of the object it is returning (because it can not have that information within its context). You have basically just done the equivalent of:
x& = const_cast<x&>(getX());
The reason it is OK to do this with a const reference to a temporary object is that the standard extends the lifespan of the temporary to the lifespan of the reference so the temporary objects lifespan is extended beyond the end of the statement.
So the only remaining question is why does the standard not want to allow reference to temporaries to extend the life of the object beyond the end of the statement?
I believe it is because doing so would make the compiler very hard to get correct for temporary objects. It was done for const references to temporaries as this has limited usage and thus forced you to make a copy of the object to do anything useful but does provide some limited functionality.
Think of this situation:
int getI() { return 5;}
int x& = getI();
x++; // Note x is an alias to a variable. What variable are you updating.
Extending the lifespan of this temporary object is going to be very confusing.
While the following:
int const& y = getI();
Will give you code that it is intuitive to use and understand.
If you want to modify the value you should be returning the value to a variable. If you are trying to avoid the cost of copying the obejct back from the function (as it seems that the object is copy constructed back (technically it is)). Then don't bother the compiler is very good at 'Return Value Optimization'
Why is discussed in the C++ FAQ (boldfacing mine):
In C++, non-const references can bind to lvalues and const references can bind to lvalues or rvalues, but there is nothing that can bind to a non-const rvalue. That's to protect people from changing the values of temporaries that are destroyed before their new value can be used. For example:
void incr(int& a) { ++a; }
int i = 0;
incr(i); // i becomes 1
incr(0); // error: 0 is not an lvalue
If that incr(0) were allowed either some temporary that nobody ever saw would be incremented or - far worse - the value of 0 would become 1. The latter sounds silly, but there was actually a bug like that in early Fortran compilers that set aside a memory location to hold the value 0.
The main issue is that
g(getx()); //error
is a logical error: g is modifying the result of getx() but you don't have any chance to examine the modified object. If g didn't need to modify its parameter then it wouldn't have required an lvalue reference, it could have taken the parameter by value or by const reference.
const X& x = getx(); // OK
is valid because you sometimes need to reuse the result of an expression, and it's pretty clear that you're dealing with a temporary object.
However it is not possible to make
X& x = getx(); // error
valid without making g(getx()) valid, which is what the language designers were trying to avoid in the first place.
g(getx().ref()); //OK
is valid because methods only know about the const-ness of the this, they don't know if they are called on an lvalue or on an rvalue.
As always in C++, you have a workaround for this rule but you have to signal the compiler that you know what you're doing by being explicit:
g(const_cast<x&>(getX()));
Seems like the original question as to why this is not allowed has been answered clearly: "because it is most likely an error".
FWIW, I thought I'd show how to it could be done, even though I don't think it's a good technique.
The reason I sometimes want to pass a temporary to a method taking a non-const reference is to intentionally throw away a value returned by-reference that the calling method doesn't care about. Something like this:
// Assuming: void Person::GetNameAndAddr(std::string &name, std::string &addr);
string name;
person.GetNameAndAddr(name, string()); // don't care about addr
As explained in previous answers, that doesn't compile. But this compiles and works correctly (with my compiler):
person.GetNameAndAddr(name,
const_cast<string &>(static_cast<const string &>(string())));
This just shows that you can use casting to lie to the compiler. Obviously, it would be much cleaner to declare and pass an unused automatic variable:
string name;
string unused;
person.GetNameAndAddr(name, unused); // don't care about addr
This technique does introduce an unneeded local variable into the method's scope. If for some reason you want to prevent it from being used later in the method, e.g., to avoid confusion or error, you can hide it in a local block:
string name;
{
string unused;
person.GetNameAndAddr(name, unused); // don't care about addr
}
-- Chris
Why would you ever want X& x = getx();? Just use X x = getx(); and rely on RVO.
The evil workaround involves the 'mutable' keyword. Actually being evil is left as an exercise for the reader. Or see here: http://www.ddj.com/cpp/184403758
Excellent question, and here's my attempt at a more concise answer (since a lot of useful info is in comments and hard to dig out in the noise.)
Any reference bound directly to a temporary will extend its life [12.2.5]. On the other hand, a reference initialized with another reference will not (even if it's ultimately the same temporary). That makes sense (the compiler doesn't know what that reference ultimately refers to).
But this whole idea is extremely confusing. E.g. const X &x = X(); will make the temporary last as long as the x reference, but const X &x = X().ref(); will NOT (who knows what ref() actually returned). In the latter case, the destructor for X gets called at the end of this line. (This is observable with a non-trivial destructor.)
So it seems generally confusing and dangerous (why complicate the rules about object lifetimes?), but presumably there was a need at least for const references, so the standard does set this behavior for them.
[From sbi comment]: Note that the fact that binding it to a const reference enhances a
temporary's lifetimes is an exception that's been added deliberately
(TTBOMK in order to allow manual optimizations). There wasn't an
exception added for non-const references, because binding a temporary
to a non-const reference was seen to most likely be a programmer
error.
All temporaries do persist until the end of the full-expression. To make use of them, however, you need a trick like you have with ref(). That's legal. There doesn't seem to be a good reason for the extra hoop to jump through, except to remind the programmer that something unusual is going on (namely, a reference parameter whose modifications will be quickly lost).
[Another sbi comment] The reason Stroustrup gives (in D&E) for disallowing the binding of rvalues to non-const references is that, if Alexey's g() would modify the object (which you'd expect from a function taking a non-const reference), it would modify an object that's going to die, so nobody could get at the modified value anyway. He says that this, most likely, is an error.
"It is clear that the temporary object is not constant in the sample above, because calls
to non-constant functions are permitted. For instance, ref() could modify the temporary
object."
In your example getX() does not return a const X so you are able to call ref() in much the same way as you could call X().ref(). You are returning a non const ref and so can call non const methods, what you can't do is assign the ref to a non const reference.
Along with SadSido's comment, this makes your three points incorrect.
I have a scenario I would like to share where I wish I could do what Alexey is asking. In a Maya C++ plugin, I have to do the following shenanigan in order to get a value into a node attribute:
MFnDoubleArrayData myArrayData;
MObject myArrayObj = myArrayData.create(myArray);
MPlug myPlug = myNode.findPlug(attributeName);
myPlug.setValue(myArrayObj);
This is tedious to write, so I wrote the following helper functions:
MPlug operator | (MFnDependencyNode& node, MObject& attribute){
MStatus status;
MPlug returnValue = node.findPlug(attribute, &status);
return returnValue;
}
void operator << (MPlug& plug, MDoubleArray& doubleArray){
MStatus status;
MFnDoubleArrayData doubleArrayData;
MObject doubleArrayObject = doubleArrayData.create(doubleArray, &status);
status = plug.setValue(doubleArrayObject);
}
And now I can write the code from the beginning of the post as:
(myNode | attributeName) << myArray;
The problem is it doesn't compile outside of Visual C++, because it's trying to bind the temporary variable returned from the | operator to the MPlug reference of the << operator. I would like it to be a reference because this code is called many times and I'd rather not have MPlug being copied so much. I only need the temporary object to live until the end of the second function.
Well, this is my scenario. Just thought I'd show an example where one would like to do what Alexey describes. I welcome all critiques and suggestions!
Thanks.
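One possible way around the binding problem, assuming C++11 is available, is to add an overload that takes the plug by rvalue reference, so the temporary returned by | is never copied and never needs to bind to a non-const lvalue reference. The sketch below uses hypothetical stand-in types (Node, Plug, an int attribute id) rather than the real Maya classes, whose signatures I am not reproducing here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for MFnDependencyNode / MPlug; not the Maya API.
struct Node { std::vector<double> storage; };
struct Plug {
    Node* owner;
    void setValue(const std::vector<double>& v) { owner->storage = v; }
};

// Returns a temporary Plug, like the | helper in the post.
Plug operator|(Node& node, int /*attribute*/) { return Plug{&node}; }

// Overload for named (lvalue) plugs.
void operator<<(Plug& plug, const std::vector<double>& values) { plug.setValue(values); }

// Overload for temporaries such as the result of (node | attribute).
void operator<<(Plug&& plug, const std::vector<double>& values) { plug.setValue(values); }
```

With both overloads in place, (node | attribute) << values selects the Plug&& version and a named plug selects the Plug& version. Taking the array by const reference also lets callers pass temporaries on the right-hand side.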

Returning named rvalue reference [duplicate]

If I have a class A and functions
A f(A &&a)
{
doSomething(a);
return a;
}
A g(A a)
{
doSomething(a);
return a;
}
the copy constructor is called when returning a from f, but the move constructor is used when returning from g. However, from what I understand, f can only be passed an object that it is safe to move (either a temporary or an object marked as moveable, e.g., using std::move). Is there any example when it would not be safe to use the move constructor when returning from f? Why do we require a to have automatic storage duration?
I read the answers here, but the top answer only shows that the spec should not allow moving when passing a to other functions in the function body; it does not explain why moving when returning is safe for g but not for f. Once we get to the return statement, we will not need a anymore inside f.
Update 0
So I understand that temporaries are accessible until the end of the full expression. However, the behavior when returning from f still seems to go against the semantics ingrained into the language that it is safe to move a temporary or an xvalue. For example, if you call g(A()), the temporary is moved into the argument for g even though there could be references to the temporary stored somewhere. The same happens if we call g with an xvalue. Since only temporaries and xvalues bind to rvalue references, it seems like to be consistent about the semantics we should still move a when returning from f, since we know a was passed either a temporary or an xvalue.
Second attempt. Hopefully this is more succinct and clear.
I am going to ignore RVO almost entirely for this discussion. It makes it really confusing as to what should happen sans optimizations - this is just about move vs copy semantics.
To assist with this, a reference on the value categories in C++11 is going to be very helpful here.
When to move?
lvalue
These are never moved. They refer to variables or storage locations that are potentially being referred to elsewhere, and as such should not have their contents transferred to another instance.
prvalue
The above defines them as "expressions that do not have identity". Clearly nothing else can refer to a nameless value so these can be moved.
rvalue
The general case of "right-hand" value, and the only thing that's certain is they can be moved from. They may or may not have a named reference, but if they do it is the last such usage.
xvalue
These are sort of a mix of both - they have identity (they refer to an object) and they can be moved from. They need not have a named variable. The reason? They are eXpiring values, about to be destroyed. Consider them the 'final reference'. An xvalue is produced by an expression such as a call to a function that returns an rvalue reference, which is how std::move converts lvalues to xvalues (through the result of a function call).
glvalue
Another mutant type with its rvalue cousin, it can be either an xvalue or an lvalue - it has identity but it's unclear if this is the last reference to the variable / storage or not, hence it is unclear if it can or cannot be moved from.
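The categories above can be checked at compile time. This is a minimal sketch using decltype((expr)), which yields T& for an lvalue, T&& for an xvalue and plain T for a prvalue; the names A, make_a and named are placeholders of my own.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct A {};
A make_a();   // hypothetical factory, declared only and never called
A named;      // a named object

// decltype((expr)): T& for an lvalue, T&& for an xvalue, T for a prvalue.
constexpr bool name_is_lvalue  = std::is_lvalue_reference<decltype((named))>::value;
constexpr bool moved_is_xvalue = std::is_rvalue_reference<decltype((std::move(named)))>::value;
constexpr bool call_is_prvalue = !std::is_reference<decltype((make_a()))>::value;
```

All three constants should be true: the name is an lvalue, std::move of it is an xvalue, and a call returning A by value is a prvalue.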
Resolution Order
Where overloads exist that accept a const lvalue ref and an rvalue ref, an rvalue argument binds to the rvalue ref overload; otherwise the lvalue ref version is used (move for rvalues, copy otherwise).
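As a small illustration of that resolution order, here is a sketch with two hypothetical overloads named pick that simply report which one was chosen.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct A {};

// Hypothetical overload pair; each returns a label naming itself.
std::string pick(const A&) { return "const lvalue ref"; }
std::string pick(A&&)      { return "rvalue ref"; }
```

A named object selects the const A& overload, while a temporary or std::move(obj) selects A&&. A const object passed through std::move still lands on const A&, because a const rvalue cannot bind to A&&.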
Where it potentially happens
(assume all types are A where not mentioned)
It only occurs where an object is "initialized from an xvalue of the same type". xvalues bind to rvalues but are not as restricted as pure expressions. In other words, movable things are more than unnamed references, they can also be the 'last' reference to an object with respect to the compiler's awareness.
initialization
A a = std::move(b); // move-construct (initialization syntax; no assignment operator is involved)
A a( std::move(b) ); // construct-move
function argument passing
void f( A a );
f( std::move(b) );
function return
A f() {
// A a exists, will discuss shortly
return a;
}
Why it will not happen in f
Consider this variation on f:
void action1(A & a) {
// alter a somehow
}
void action2(A & a) {
// alter a somehow
}
A f(A && a) {
action1( a );
action2( a );
return a;
}
It is not illegal to treat a as an lvalue within f. Because it is an lvalue it must be a reference, whether explicit or not. Every plain-old variable is technically a reference to itself.
That's where we trip up. Because a is an lvalue for the purposes of f, we are in fact returning an lvalue.
To explicitly generate an rvalue, we must use std::move (or generate an A&& result some other way).
Why it will happen in g
With that under our belts, consider g
A g(A a) {
action1( a ); // as above
action2( a ); // as above
return a;
}
Yes, a is an lvalue for the purposes of action1 and action2. However, because all references to a only exist within g (it's a copy or moved-into copy), it can be considered an xvalue in the return.
But why not in f?
There is no specific magic to &&. Really, you should think of it as a reference first and foremost. The fact that we are demanding an rvalue reference in f as opposed to an lvalue reference with A& does not alter the fact that, being a reference, it must be an lvalue, because the storage location of a is external to f and that's as far as any compiler will be concerned.
The same does not apply in g, where it's clear that a's storage is temporary and exists only when g is called and at no other time. In this case it is clearly an xvalue and can be moved.
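To see the two cases side by side, here is a sketch with a hypothetical tracer type that counts constructor calls. f spells the move out with std::move; g relies on the implicit move of a by-value parameter. Exact move counts depend on how much elision the compiler performs, so only "no copies, at least one move" is relied on.

```cpp
#include <cassert>
#include <utility>

// Hypothetical tracer type; the counters record which constructor ran.
struct A {
    static int copies, moves;
    A() {}
    A(const A&) { ++copies; }
    A(A&&) noexcept { ++moves; }
};
int A::copies = 0;
int A::moves = 0;

// f only holds a reference; the named parameter is an lvalue, so the move is explicit.
A f(A&& a) { return std::move(a); }

// g owns its parameter; `return a;` is first tried as if a were an rvalue.
A g(A a) { return a; }
```

Writing plain return a; in f would select the copy constructor under the C++11 to C++17 rules discussed in the question; later standards relaxed this, so results there depend on the language mode.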
rvalue ref vs lvalue ref and safety of reference passing
Suppose we overload a function to accept both types of references. What would happen?
void v( A & lref );
void v( A && rref );
void v( A&& ) will only be used in the situations described above ("Where it potentially happens"); otherwise void v( A& ) is chosen. That is, an rvalue will always attempt to bind to an rvalue ref signature before an lvalue ref overload is attempted. An lvalue should never bind to the rvalue ref except in the case where it can be treated as an xvalue (guaranteed to be destroyed in the current scope whether we want it to or not).
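A short sketch of that overload pair follows. The int return values and the two forwarding helpers are my own additions so that the chosen overload can be observed; they also show that a named A&& parameter is itself an lvalue.

```cpp
#include <cassert>
#include <utility>

struct A {};

int v(A&)  { return 1; }  // lvalue overload
int v(A&&) { return 2; }  // rvalue overload

// Inside these functions `r` has a name, so the expression `r` is an lvalue.
int forward_named(A&& r) { return v(r); }             // picks v(A&)
int forward_moved(A&& r) { return v(std::move(r)); }  // picks v(A&&)
```

This is the same effect seen in f from the question: having been passed an rvalue does not make the parameter's name an rvalue.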
It is tempting to say that in the rvalue case we know for sure that the object being passed is temporary. That is not the case. It is a signature intended for binding references to what appears to be a temporary object.
For analogy, it's like doing int * x = 23; - it may be wrong, but you could (eventually) force it to compile with bad results if you run it. The compiler can't say for sure if you're being serious about that or pulling its leg.
With respect to safety one must consider functions that do this (and why not to do this - if it still compiles at all):
A & make_A(void) {
A new_a;
return new_a;
}
While there is nothing ostensibly wrong with the language aspect - the types work and we will get a reference to somewhere back - because new_a's storage location is inside a function, the memory will be reclaimed / invalid when the function returns. Therefore anything that uses the result of this function will be dealing with freed memory.
Similarly, A f( A && a ) is intended to accept prvalues or xvalues, but is not limited to them if we really want to force something else through. That's where std::move comes in, and lets us do just that.
The reason is that it differs from A f( A & a ) only with respect to the contexts in which it will be preferred over the lvalue overload. In all other respects it is identical in how a is treated by the compiler.
The fact that we know that A&& is a signature reserved for moves is a moot point; it is used to determine which version of "reference to A -type parameter" we want to bind to, the sort where we should take ownership (rvalue) or the sort where we should not take ownership (lvalue) of the underlying data (that is, move it elsewhere and wipe the instance / reference we're given). In both cases, what we are working with is a reference to memory that is not controlled by f.
Whether we do or not is not something the compiler can tell; it falls into the 'common sense' area of programming, such as not to use memory locations that don't make sense to use but are otherwise valid memory locations.
What the compiler knows about A f( A && a ) is to not create new storage for a, since we're going to be given an address (reference) to work with. We can choose to leave the source address untouched, but the whole idea here is that by declaring A&& we're telling the compiler "hey! give me references to objects that are about to disappear so I might be able to do something with it before that happens". The key word here is might, and again also the fact that we can explicitly target this function signature incorrectly.
Consider if we had a version of A that, when move-constructing, did not erase the old instance's data, and for some reason we did this by design (let's say we had our own memory allocation functions and knew exactly how our memory model would keep data beyond the lifetime of objects).
The compiler cannot know this, because it would take code analysis to determine what happens to the objects when they're handled in rvalue bindings - it's a human judgement issue at that point. At best the compiler sees 'a reference, yay, no allocating extra memory here' and follows rules of reference passing.
It's safe to assume the compiler is thinking: "it's a reference, I don't need to deal with its memory lifetime inside f, it being a temporary will be removed after f is finished".
In that case, when a temporary is passed to f, the storage of that temporary will disappear as soon as we leave f, and then we're potentially in the same situation as A & make_A(void) - a very bad one.
An issue of semantics...
std::move
The very purpose of std::move is to create rvalue references. By and large what it does (if nothing else) is force the resulting value to bind to rvalue reference parameters as opposed to lvalue reference ones. The reason for this is that, before rvalue references were available, a signature using A& was ambiguous about intent for things like operator overloads (and surely other uses).
Operators - an example
class A {
// ...
public:
A & operator= (A & rhs); // what is the lifetime of rhs? move or copy intended?
A & operator+ (A & rhs); // ditto
// ...
};
int main() {
A result = A() + A(); // won't compile!
}
Note that this will not accept temporary objects for either operator! Nor does it make sense to take a non-const reference for copy operations: why would we need to modify the original object when all we want is a copy that we can modify later? This is the reason we declare const A & parameters for copy operators, and in any situation where a copy is to be taken of the referenced object, as a guarantee that we are not altering the original.
Naturally this is an issue with moves, where we must modify the original object to avoid the new container's data being freed prematurely. (hence "move" operation).
To solve this mess, along come T&& declarations, which replace the above example code and specifically target references to objects in the situations where the above won't compile. But we wouldn't need to modify operator+ to be a move operation, and you'd be hard pressed to find a reason for doing so (though you could, I think), again because of the assumption that addition should not modify the original object, only the left-operand object in the expression. So we can do this:
class A {
// ...
public:
A & operator= (const A & rhs); // copy-assign
A & operator= (A && rhs); // move-assign
A & operator+ (const A & rhs); // don't modify rhs operand
// ...
};
int main() {
A result = A() + A(); // const A& in addition, and A&& for assign
A result2 = A().operator+(A()); // literally the same thing
}
What you should take note of here is that despite the fact that A() returns a temporary, it not only is able to bind to const A& but it should because of the expected semantics of addition (that it does not modify its right operand). The second version of the assignment is clearer why only one of the arguments should be expected to be modified.
It's also clear that a move will occur on the assignment, and no move will occur with rhs in operator+.
Separation of return value semantics and argument binding semantics
The reason that there is only one move above is clear from the function (well, operator) definitions. What's important is we are indeed binding what is clearly an xvalue / rvalue, to what is unmistakably an lvalue in operator+.
I have to stress this point: there is no effective difference in this example in the way that operator+ and operator= refer to their argument. As far as the compiler is concerned, within either's function body the argument is effectively const A& for + and A& for =. The difference is purely in constness. The only way in which A& and A&& differ is to distinguish signatures, not types.
With different signatures come different semantics, it's the compiler's toolkit for distinguishing certain cases where there otherwise is no clear distinction from the code. The behavior of the functions themselves - the code body - may not be able to tell the cases apart either!
Another example of this is operator++(void) vs operator++(int). The former is the prefix form and is expected to return the value after the increment, while the latter is the postfix form and returns the value from before it. There is no meaningful int being passed; it's just there so the compiler has two signatures to work with - there is no other way to declare two functions with the same name and otherwise identical parameters, and as you may or may not know, it is illegal to overload a function on just the return type for similar reasons of ambiguity.
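For reference, here is a minimal sketch of the two increment signatures on a hypothetical Counter type. By convention operator++() is the prefix form and returns the incremented object, while operator++(int) is the postfix form and returns the prior value; the int is a dummy used only to tell the signatures apart.

```cpp
#include <cassert>

// Hypothetical counter type used only to illustrate the two signatures.
struct Counter {
    int value = 0;

    // Prefix: increment, then return the object itself.
    Counter& operator++() { ++value; return *this; }

    // Postfix: remember the old state, increment, return the old state.
    Counter operator++(int) { Counter old = *this; ++value; return old; }
};
```

The function bodies differ, but at the call site only the signature tells the compiler which one ++c or c++ refers to.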
rvalue variables and other odd situations - an exhaustive test
To understand unambiguously what is happening in f I've put together a smorgasbord of things one "should not attempt but look like they'd work" that forces the compiler's hand on the matter almost exhaustively:
void bad (int && x, int && y) {
x += y;
}
int & worse (int && z) {
return z++, z + 1, 1 + z;
}
int && justno (int & no) {
return worse( no );
}
int num () {
return 1;
}
int main () {
int && a = num();
++a = 0;
a++ = 0;
bad( a, a );
int && b = worse( a );
int && c = justno( b );
++c = (int) 'y';
c++ = (int) 'y';
return 0;
}
g++ -std=gnu++11 -O0 -Wall -c -fmessage-length=0 -o "src\\basictest.o" "..\\src\\basictest.cpp"
..\src\basictest.cpp: In function 'int& worse(int&&)':
..\src\basictest.cpp:5:17: warning: right operand of comma operator has no effect [-Wunused-value]
return z++, z + 1, 1 + z;
^
..\src\basictest.cpp:5:26: error: invalid initialization of non-const reference of type 'int&' from an rvalue of type 'int'
return z++, z + 1, 1 + z;
^
..\src\basictest.cpp: In function 'int&& justno(int&)':
..\src\basictest.cpp:8:20: error: cannot bind 'int' lvalue to 'int&&'
return worse( no );
^
..\src\basictest.cpp:4:7: error: initializing argument 1 of 'int& worse(int&&)'
int & worse (int && z) {
^
..\src\basictest.cpp: In function 'int main()':
..\src\basictest.cpp:16:13: error: cannot bind 'int' lvalue to 'int&&'
bad( a, a );
^
..\src\basictest.cpp:1:6: error: initializing argument 1 of 'void bad(int&&, int&&)'
void bad (int && x, int && y) {
^
..\src\basictest.cpp:17:23: error: cannot bind 'int' lvalue to 'int&&'
int && b = worse( a );
^
..\src\basictest.cpp:4:7: error: initializing argument 1 of 'int& worse(int&&)'
int & worse (int && z) {
^
..\src\basictest.cpp:21:7: error: lvalue required as left operand of assignment
c++ = (int) 'y';
^
..\src\basictest.cpp: In function 'int& worse(int&&)':
..\src\basictest.cpp:6:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
..\src\basictest.cpp: In function 'int&& justno(int&)':
..\src\basictest.cpp:9:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
01:31:46 Build Finished (took 72ms)
This is the unaltered output sans build header which you don't need to see :) I will leave it as an exercise to understand the errors found but re-reading my own explanations (particularly in what follows) it should be apparent what each error was caused by and why, imo anyway.
Conclusion - What can we learn from this?
First, note that the compiler treats function bodies as individual code units. This is basically the key here. Whatever the compiler does with a function body, it cannot make assumptions about the behavior of the function that would require the function body to be altered. To deal with those cases there are templates but that's beyond the scope of this discussion - just note that templates generate multiple function bodies to handle different cases, while otherwise the same function body must be re-usable in every case the function could be used.
Second, rvalue types were predominantly envisioned for move operations - a very specific circumstance that was expected to occur in assignment and construction of objects. Other semantics using rvalue reference bindings are beyond the scope of any compiler to deal with. In other words, it's better to think of rvalue references as syntax sugar than actual code. The signature differs in A&& vs A& but the argument type for the purposes of the function body does not, it is always treated as A& with the intention that the object being passed should be modified in some way because const A&, while correct syntactically, would not allow the desired behavior.
I can be very sure at this point when I say that the compiler will generate the code body for f as if it were declared f(A&). Per above, A&& assists the compiler in choosing when to allow binding a mutable reference to f but otherwise the compiler doesn't consider the semantics of f(A&) and f(A&&) to be different with respect to what f returns.
It's a long way of saying: the return method of f does not depend on the type of argument it receives.
The confusion is elision. In reality there are two copies in the returning of a value. First a copy is created as a temporary, then this temporary is assigned to something (or it isn't and remains purely temporary). The second copy is very likely elided via return optimization. The first copy can be moved in g and cannot in f. I expect in a situation where f cannot be elided, there will be a copy then a move from f in the original code.
To override this the temporary must be explicitly constructed using std::move, that is, in the return statement in f. However in g we're returning something that is known to be temporary to the function body of g, hence it is either moved twice, or moved once then elided.
I would suggest compiling the original code with all optimizations disabled and adding in diagnostic messages to copy and move constructors to keep tabs on when and where the values are moved or copied before elision becomes a factor. Even if I'm mistaken, an un-optimized trace of the constructors / operations used would paint an unambiguous picture of what the compiler has done, hopefully it will be apparent why it did what it did as well...
Short story: it only depends on doSomething.
Medium story: if doSomething never changes a, then f is safe. It receives an rvalue reference and returns a new temporary built from it.
Long story: things will go bad as soon as doSomething uses a in a move operation, because a may be in an unspecified, moved-from state before it is used in the return statement - it would be the same in g, but at least the conversion to an rvalue reference should be explicit
TL/DR: both f and g are safe as long as there is no move operation inside doSomething. The difference is that the move is performed silently when returning from g, while returning from f requires an explicit conversion to an rvalue reference (e.g. with std::move) to get a move.
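The hazard described above can be sketched with std::unique_ptr standing in for A, since a moved-from unique_ptr is guaranteed to be null. The sink() helper and this particular doSomething are hypothetical; they exist only to show a callee that steals from an lvalue reference.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for the question's A; a moved-from unique_ptr is guaranteed to be null.
using A = std::unique_ptr<int>;

// Hypothetical internal storage that doSomething moves its argument into.
std::vector<A>& sink() { static std::vector<A> s; return s; }

// Takes an lvalue reference but still moves from it.
void doSomething(A& a) { sink().push_back(std::move(a)); }

A g(A a) {
    doSomething(a);
    return a;   // implicit move of an already moved-from (null) pointer
}
```

The value returned by g is null while the original payload sits in the sink, so the return statement itself is fine; it is the move inside doSomething that changes what comes back.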
Third attempt. The second became very long in the process of explaining every nook and cranny of the situation. But hey, I learned a lot too in the process, which I suppose is the point, no? :) Anyway. I'll re-address the question anew, keeping my longer answer as it in itself is a useful reference but falls short of a 'clear explanation'.
What are we dealing with here?
f and g are not trivial situations. They take time to understand and appreciate the first few times you encounter them. The issues at play are the lifetime of objects, Return Value Optimization, confusion of returning object values, and confusion with overloads of reference types. I'll address each and explain their relevance.
References
First things first. What's a reference? Aren't they just pointers without the syntax?
They are, but in an important way they're much more than that. Pointers are literally that, they refer to memory locations in general. There are few if any guarantees about the values located at wherever the pointer is set to. References on the other hand are bound to addresses of real values - values that guarantee to exist for the duration they can be accessed, but may not have a name for them available to be accessed in any other way (such as temporaries).
As a rule of thumb, if you can 'take its address' then you're dealing with a reference, a rather special one known as an lvalue. You can assign to an lvalue. This is why *pointer = 3 works, the operator * creates a reference to the address being pointed to.
This doesn't make the reference any more or less valid than the address it points to, however, references you naturally find in C++ do have this guarantee (as would well-written C++ code) - that they are referring to real values in a way where we don't need to know about its lifetime for the duration of our interactions with them.
Lifetime of Objects
We all should know by now when the c'tors and d'tors will be called for something like this:
{
A temp;
temp.property = value;
}
temp's scope is set. We know exactly when it's created and destroyed. One way we can be sure it's destroyed is because this is impossible:
A & ref_to_temp = temp; // nope: temp is out of scope once the block above has closed
A * ptr_to_temp = &temp; // double nope, for the same reason
The compiler stops us from doing that because very clearly we should not expect that object to still exist. This can arise subtly whenever using references, which is why sometimes people can be found suggesting avoidance of references until you know what you're doing with them (or entirely if they've given up understanding them and just want to move on with their lives).
Scope of Expressions
On the other hand we also have to be mindful that temporaries exist until the outer-most expression they're found in has completed. That means up to the semicolon. An expression existing in the LHS of a comma operator, for example, doesn't get destroyed until the semicolon. Ie:
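struct scopetester {
static int counter; // defined below; a non-const static member cannot be initialized in-class
scopetester(){++counter;}
~scopetester(){--counter;}
};
int scopetester::counter = 0;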
scopetester(), std::cout << scopetester::counter; // prints 1
scopetester(), scopetester(), std::cout << scopetester::counter; // prints 2
This still does not avoid issues of sequencing of execution, you still have to deal with ++i++ and other things - operator precedence and the dreaded undefined behavior that can result when forcing ambiguous cases (eg i++ = ++i). What is important is that all temporaries created exist until the semicolon and no longer.
There are two exceptions - elision / in-place-construction (aka RVO) and reference-assignment-from-temporary.
Returning by value and Elision
What is elision? Why use RVO and similar things? All of these come down under a single term that's far easier to appreciate - "in-place construction". Suppose we were using the result of a function call to initialize or set an object. Eg:
A x (void) {return A();}
A y( x() );
Let's consider the longest possible sequence of events that could happen here.
1. A new A is constructed in x
2. The temporary value returned by x() is a new A, initialized using a reference to the previous
3. A new A - y - is initialized using the temporary value
Where possible, the compiler should re-arrange things so that as few as possible intermediate A's are constructed where it's safe to assume the intermediate is inaccessible or otherwise unnecessary. The question is which of the objects can we do without?
Case #1 is an explicit new object. If we are to avoid this being created, we need to have a reference to an object that already exists. This is the most straightforward one and nothing more needs to be said.
In #2 we cannot avoid constructing some result. After all, we are returning by value. However, there are two important exceptions (not including exceptions themselves which are also affected when thrown): NRVO and RVO. These affect what happens in #3, but there are important consequences and rules regarding #2...
This is due to an interesting quirk of elision:
Notes
Copy elision is the only allowed form of optimization that can change the observable side-effects. Because some compilers do not perform copy elision in every situation where it is allowed (e.g., in debug mode), programs that rely on the side-effects of copy/move constructors and destructors are not portable.
Even when copy elision takes place and the copy-/move-constructor is not called, it must be present and accessible (as if no optimization happened at all), otherwise the program is ill-formed.
(Since C++11)
In a return statement or a throw-expression, if the compiler cannot perform copy elision but the conditions for copy elision are met or would be met, except that the source is a function parameter, the compiler will attempt to use the move constructor even if the object is designated by an lvalue; see return statement for details.
And more on that in the return statement notes:
Notes
Returning by value may involve construction and copy/move of a temporary object, unless copy elision is used.
(Since C++11)
If expression is an lvalue expression and the conditions for copy elision are met, or would be met, except that expression names a function parameter, then overload resolution to select the constructor to use for initialization of the returned value is performed twice: first as if expression were an rvalue expression (thus it may select the move constructor or a copy constructor taking reference to const), and if no suitable conversion is available, overload resolution is performed the second time, with lvalue expression (so it may select the copy constructor taking a reference to non-const).
The above rule applies even if the function return type is different from the type of expression (copy elision requires same type)
The compiler is allowed to even chain together multiple elisions. All it means is that two sides of a move / copy that would involve an intermediate object, could potentially be made to refer directly to each-other or even be made to be the same object. We don't know and shouldn't need to know when the compiler chooses to do this - it's an optimization, for one, but importantly you should think of move and copy constructors et al as a "last resort" usage.
We can agree the goal is to reduce the number of unnecessary operations in any optimization, provided the observable behavior is the same. Move and copy constructors are used wherever moves and copy operations happen, so what about when the compiler sees fit to remove a move/copy operation itself as an optimization? Should the functionally unnecessary intermediate objects exist in the final program just for the purposes of their side effects? The way the standard is right now, and compilers, seems to be: no - the move and copy constructors satisfy the how of those operations, not the when or why.
The short version: you have fewer temporary objects, which you ought not to care about to begin with, so why should you miss them? If you do miss them, it may just be that your code relies on intermediate copies and moves to do things beyond their stated purpose and contexts.
Lastly, you need to be aware that the elided object is always stored (and constructed) in the receiving location, not the location of its inception.
Quoting this reference -
Named Return Value Optimization
If a function returns a class type by value, and the return statement's expression is the name of a non-volatile object with automatic storage duration, which isn't the function parameter, or a catch clause parameter, and which has the same type (ignoring top-level cv-qualification) as the return type of the function, then copy/move is omitted. When that local object is constructed, it is constructed directly in the storage where the function's return value would otherwise be moved or copied to. This variant of copy elision is known as NRVO, "named return value optimization".
Return Value Optimization
When a nameless temporary, not bound to any references, would be moved or copied into an object of the same type (ignoring top-level cv-qualification), the copy/move is omitted. When that temporary is constructed, it is constructed directly in the storage where it would otherwise be moved or copied to. When the nameless temporary is the argument of a return statement, this variant of copy elision is known as RVO, "return value optimization".
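A quick way to watch this happen is to count constructor calls. The sketch below uses a hypothetical tracer type A and a factory make(); the helper returns how many copy or move constructions were needed to initialize y from the returned temporary.

```cpp
#include <cassert>

// Hypothetical tracer type; the counters record copy and move constructions.
struct A {
    static int copies, moves;
    A() {}
    A(const A&) { ++copies; }
    A(A&&) noexcept { ++moves; }
};
int A::copies = 0;
int A::moves = 0;

A make() { return A(); }   // returns a nameless temporary (an RVO candidate)

// Total number of copy/move constructor calls used to initialize y.
int constructions_for_direct_init() {
    A::copies = A::moves = 0;
    A y(make());
    (void)y;
    return A::copies + A::moves;
}
```

With elision in effect the helper should return zero, meaning y was constructed in place; since C++17 that outcome is required for this pattern, while earlier standards merely permit it. Without elision the worst case is two moves and still no copies.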
Lifetime of References
One thing we should not do, is this:
A & func() {
A result;
return result;
}
While tempting because it would avoid implicit copying of anything (we're just passing an address right?) it's also a short-sighted approach. Remember the compiler above preventing something looking like this with temp? Same thing here - result is gone once we're done with func, it could be reclaimed and could be anything now.
The reason we cannot is because we cannot pass an address to result out of func - whether as reference or as pointer - and consider it valid memory. We would get no further passing A* out.
In this situation it is best to use an object-copy return type and rely on moves, elision or both to occur as the compiler finds suitable. Always think of copy and move constructors as 'measures of last resort' - you should not rely on the compiler to use them because the compiler can find ways to avoid copy and move operations entirely, and is allowed to do so even if it means the side effects of those constructors wouldn't happen any more.
There is however a special case, alluded to earlier.
Recall that references are guarantees to real values. This implies that the first occurrence of the reference initializes the object and the last (as far as known at compile time) destroys it when going out of scope.
Broadly this covers two situations: when we return a temporary from a function, and when we assign from a function result. The first, returning a temporary, is basically what elision does, but you can in effect elide explicitly with reference passing, like passing a pointer in a call chain. It constructs the object at the time of return, but what changes is that the object is no longer destroyed after leaving scope (the return statement). On the other end the second kind happens: the variable storing the result of the function call now has the honor of destroying the value when it goes out of scope.
The important point here is that elision and reference passing are related concepts. You can emulate elision by using pointers to uninitialized variables' storage location (of known type), for example, as you can with reference passing semantics (basically what they're for).
Overloads of Reference Types
References allow us to treat non-local variables as if they are local variables - to take their address, write to that address, read from that address, and importantly, be able to destroy the object at the right time - when the address can no longer be reached by anything.
Regular variables when they leave scope, have their only reference to them disappear, and are promptly destroyed at that time. Reference variables can refer to regular variables, but except for elision / RVO circumstances they do not affect the scope of the original object - not even if the object they referred to goes out of scope early, which can happen if you make references to dynamic memory and are not careful to manage those references yourself.
This means you can capture the results of an expression explicitly by reference. How? Well, this may seem odd at first but if you read the above it will make sense why this works:
class A {
/* assume rule-of-5 (inc const-overloads) has been followed but unless
* otherwise noted the members are private */
public:
A (void) { /* ... */ }
A operator+ ( const A & rhs ) {
A res;
// do something with `res`
return res;
}
};
A x = A() + A(); // doesn't compile
A & y = A() + A(); // doesn't compile
A && z = A() + A(); // compiles
Why? What's going on?
A x = ... - we can't because the constructors and assignment operators are private.
A & y = ... - we can't because we're returning a value, not a reference to a value whose scope is greater than or equal to our current scope.
A && z = ... - we can because we're able to refer to xvalues. As a consequence of this assignment the lifetime of the temporary value is extended to this capturing lvalue because it in effect has become an lvalue reference. Sound familiar? It's explicit elision if I were to call it anything. This is more apparent when you consider this syntax must involve a new value and must involve assigning that value to a reference.
In all three cases when all constructors and assignment operators are made public, there are always only three objects constructed, with the address of res always matching the variable storing the result (on my compiler anyway, optimizations disabled, -std=gnu++11, g++ 4.9.3).
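The lifetime extension described for z can be checked by counting live objects. This is a sketch under assumptions: Counted, make_counted and the two helper functions are hypothetical names used only for illustration.

```cpp
// Hypothetical type that tracks how many instances are currently alive.
struct Counted {
    static int alive;
    Counted() { ++alive; }
    Counted(const Counted&) { ++alive; }
    Counted(Counted&&) noexcept { ++alive; }
    ~Counted() { --alive; }
};
int Counted::alive = 0;

Counted make_counted() { return Counted(); }

// Reports the number of live objects while a returned temporary is
// bound to an rvalue reference.
int alive_while_bound() {
    Counted&& ref = make_counted();  // lifetime extended to ref's scope
    (void)ref;
    return Counted::alive;           // the temporary still exists here
}

// Reports the number of live objects after an unbound temporary's
// full expression has finished.
int alive_after_discard() {
    make_counted();                  // destroyed at the semicolon
    return Counted::alive;
}
```

The first function sees one live object because the reference keeps the temporary alive until the closing brace; the second sees none.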
Which means the differences really do come down to just the storage duration of function arguments themselves. Elision and move operations cannot happen on anything but pure expressions, expiring values, or explicit targeting of the "expiring values" reference overload Type&&.
Re-examining f and g
I've annotated the situation in both functions to get things rolling, a shortlist of assumptions the compiler would note when generating (reusable) code for each.
A f( A && a ) {
// has storage duration exceeding f's scope.
// already constructed.
return a;
// can be elided.
// must be copy-constructed, a exceeds f's scope.
}
A g( A a ) {
// has storage duration limited to this function's scope.
// was just constructed somehow, whether by elision, move or copy.
return a;
// elision may occur.
// can move-construct if can't elide.
// can copy-construct if can't move.
}
What we can say for sure about f's a is that it's expecting to capture moved or expression-type values. Because f can accept either expression-references (prvalues) or lvalue-references about to disappear (xvalues) or moved lvalue-references (converted to xvalues via std::move), and because f must be homogeneous in the treatment of a for all three cases, a is seen as a reference first and foremost to an area of memory whose lifetime exists for longer than a call to f. That is, it is not possible to distinguish which of the three cases we called f with from within f, so the compiler assumes the longest storage duration it needs for any of the cases, and finds it safest not to assume anything about the storage duration of a's data.
Unlike the situation in g. Here, a - however it happens upon its value - will cease to be accessible beyond a call to g. As such returning it is tantamount to moving it, since it's seen as an xvalue in that case. We could still copy it or more probably even elide it, it can depend on which is allowed / defined for A at the time.
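A rough way to see the difference between f and g is to count constructor calls. The sketch below assumes a tracer version of A with public counters (hypothetical, not the asker's class) and adds f_explicit, which is not in the original, to show the std::move variant. One caveat: under C++11 to C++17 rules `return a;` in f copies, while C++20 treats a returned rvalue-reference parameter as an implicit move, so the copy count for f depends on the language standard used.

```cpp
#include <utility>

// Hypothetical tracer standing in for the question's class A.
struct A {
    static int copies;
    static int moves;
    A() {}
    A(const A&) { ++copies; }
    A(A&&) noexcept { ++moves; }
};
int A::copies = 0;
int A::moves = 0;

// a is a named rvalue reference, so the expression `a` is an lvalue.
A f(A&& a) { return a; }

// Same parameter, but the move is requested explicitly.
A f_explicit(A&& a) { return std::move(a); }

// a is a by-value parameter local to g, so `return a;` may move from it.
A g(A a) { return a; }

int copies_from_f() {
    A::copies = 0;
    A r = f(A());
    (void)r;
    return A::copies;
}

int copies_from_f_explicit() {
    A::copies = 0;
    A r = f_explicit(A());
    (void)r;
    return A::copies;
}

int copies_from_g() {
    A::copies = 0;
    A r = g(A());
    (void)r;
    return A::copies;
}
```

With a pre-C++20 compiler mode copies_from_f() is expected to report one copy; the other two report none in every mode.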
The issues with f
// we can't tell these apart.
// `f` when compiled cannot assume either will always happen.
// case-by-case optimizations can only happen if `f` is
// inlined into the final code and then re-arranged, or if `f`
// is made into a template to specifically behave differently
// against differing types.
A case_1() {
// prvalues
return f( A() + A() );
}
A make_case_2() {
// xvalues
A temp;
return temp;
}
A case_2 = f( make_case_2() );
A case_3(A & other) {
// lvalues
return f( std::move( other ) );
}
Because of the ambiguity of usage the compiler and standards are designed to make f usable consistently in all cases. There can be no assumptions that A&& will always be a new expression or that you will only use it with std::move for its argument etc. Once f is made external to your code, leaving only its call signature, that cannot be the excuse anymore. The function signature - which reference overload to target - is a clue to what the function should be doing with it and how much (or little) it can assume about the context.
rvalue references are not a panacea for targeting only "moved values", they can target a good deal more things and even be targeted incorrectly or unexpectedly if you assume that's all they do. A reference to anything in general should be expected to and be made to exist for longer than the reference does, with the one exception being rvalue reference variables.
rvalue reference variables are in essence, elision operators. Wherever they exist there is in-place construction going on of some description.
As regular variables, they extend the scope of any xvalue or rvalue they receive - they hold the result of the expression as it's constructed rather than by move or copy, and from thereon are equivalent to regular reference variables in usage.
As function variables they can also elide and construct objects in-place, but there is a very important difference between this:
A c = f( A() );
and this:
A && r = f( A() );
The difference is there is no guarantee that c will be move-constructed vs elided, but r definitely will be elided / constructed in-place at some point, owing to the nature of what we're binding to. For this reason we can only assign to r in situations where there will be a new temporary value created.
But why is A&&a not destroyed if it is captured?
Consider this:
void bad_free(A && a) {
A && clever = std::move( a );
// 'clever' should be the last reference to a?
}
This won't work. The reason is subtle. a's scope is longer, and rvalue reference assignments can only extend the lifetime, not control it. clever exists for less time than a, and therefore is not an xvalue itself (unless using std::move again, but then you're back to the same situation, and it continues forth etc).
lifetime extension
Remember that what makes lvalues different to rvalues is that they cannot be bound to objects that have less lifetime than themselves. All lvalue references are either the original variable or a reference that has less lifetime than the original.
rvalues allow binding to reference variables that have longer lifetime than the original value - that's half the point. Consider:
A r = f( A() ); // v1
A && s = f( A() ); // v2
What happens? In both cases f is given a temporary value that outlives the call, and a result object (because f returns by value) is constructed somehow (it will not matter as you shall see). In v1 we are constructing a new object r using the temporary result - we can do this in three ways: move, copy, elide. In v2 we are not constructing a new object, we are extending the lifetime of the result of f to the scope of s, alternatively saying the same: s is constructed in-place using f and therefore the temporary returned by f has its lifetime extended rather than being moved or copied.
The main distinction is v1 requires move and copy constructors (at least one) to be defined even if the process is elided. For v2 you are not invoking constructors and are explicitly saying you want to reference and/or extend the lifetime of a temporary value, and because you don't invoke move or copy constructors the compiler can only elide / construct in-place!
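The distinction between v1 and v2 can be made concrete with a type that has no copy or move constructor at all. This is a sketch with hypothetical names (Pinned, make_pinned, read_through_reference). It relies on returning a braced-init-list, which initializes the return object directly and so works even in C++11.

```cpp
// Hypothetical type with copy and move operations deleted.
struct Pinned {
    static int alive;
    int value;
    Pinned(int v) : value(v) { ++alive; }
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
    ~Pinned() { --alive; }
};
int Pinned::alive = 0;

// The braced-init-list initializes the returned object in place,
// so no copy or move constructor is required.
Pinned make_pinned() { return {42}; }

int read_through_reference() {
    // v2 style: bind the returned temporary. No constructor of Pinned
    // runs on this line; the temporary lives as long as s does.
    Pinned&& s = make_pinned();

    // v1 style would be:  Pinned r = make_pinned();
    // Before C++17 that line is rejected because it formally needs a
    // copy or move constructor; C++17's guaranteed elision accepts it.
    return s.value;
}
```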
Remember that this has nothing to do with the argument given to f. It works identically with g:
A r = g( A() ); // v1
A && s = g( A() ); // v2
g will create a temporary for its argument and move-construct it using A() for both cases. It like f also constructs a temporary for its return value, but it can use an xvalue because the result is constructed using a temporary (temporary to g). Again, this will not matter because in v1 we have a new object that could be copy-constructed or move-constructed (either is required but not both) while in v2 we are demanding reference to something that's constructed but will disappear if we don't catch it.
Explicit xvalue capture
Example to show this is possible in theory (but useless):
A && x (void) {
A temp;
// return temp; // even though xvalue, can't do this
return std::move(temp);
}
A && y = x(); // y now refers to temp, which is destroyed
Which object does y refer to? We have left the compiler no choice: y must refer to the result of some function or expression, and we've given it temp which works based on type. But no move has occurred, and temp will be deallocated by the time we use it via y.
Why didn't lifetime extension kick in for temp like it did for a in g / f? Because of what we're returning: we can't specify a function to construct things in-place, we can specify a variable to be constructed in place. It also goes to show that the compiler does not look across function / call boundaries to determine lifetime, it will just look at which variables are on the calling side or local, how they're assigned to and how they're initialized if local.
If you want to clear all doubts, try passing this as an rvalue reference: std::move(*(new A)) - what should happen is that nothing should ever destroy it, because it isn't on the stack and because rvalue references do not alter the lifetime of anything but temporary objects (ie, intermediates / expressions). xvalues are candidates for move construction / move assignment and can't be elided (already constructed) but all other move / copy operations can in theory be elided on the whim of the compiler; when using rvalue references the compiler has no choice but to elide or pass on the address.
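The heap experiment suggested above can be sketched as follows. Heap, take and destroyed_after_take are hypothetical names. The point is only that binding to an rvalue reference neither moves nor destroys anything by itself.

```cpp
#include <utility>

// Hypothetical type that counts destructor calls.
struct Heap {
    static int destroyed;
    ~Heap() { ++destroyed; }
};
int Heap::destroyed = 0;

// Accepts an rvalue reference but neither moves from it nor destroys it.
void take(Heap&&) {}

// Reports how many destructor calls were seen right after the call to
// take, before the owner deletes the object.
int destroyed_after_take() {
    Heap::destroyed = 0;
    Heap* p = new Heap;
    take(std::move(*p));         // just a cast plus a reference binding
    int seen = Heap::destroyed;  // nothing has ended the object's lifetime
    delete p;                    // cleanup is still the owner's job
    return seen;
}
```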

Returning an argument passed by rvalue reference

If I have a class A and functions
A f(A &&a)
{
doSomething(a);
return a;
}
A g(A a)
{
doSomething(a);
return a;
}
the copy constructor is called when returning a from f, but the move constructor is used when returning from g. However, from what I understand, f can only be passed an object that it is safe to move (either a temporary or an object marked as moveable, e.g., using std::move). Is there any example when it would not be safe to use the move constructor when returning from f? Why do we require a to have automatic storage duration?
I read the answers here, but the top answer only shows that the spec should not allow moving when passing a to other functions in the function body; it does not explain why moving when returning is safe for g but not for f. Once we get to the return statement, we will not need a anymore inside f.
Update 0
So I understand that temporaries are accessible until the end of the full expression. However, the behavior when returning from f still seems to go against the semantics ingrained into the language that it is safe to move a temporary or an xvalue. For example, if you call g(A()), the temporary is moved into the argument for g even though there could be references to the temporary stored somewhere. The same happens if we call g with an xvalue. Since only temporaries and xvalues bind to rvalue references, it seems like to be consistent about the semantics we should still move a when returning from f, since we know a was passed either a temporary or an xvalue.
Second attempt. Hopefully this is more succinct and clear.
I am going to ignore RVO almost entirely for this discussion. It makes it really confusing as to what should happen sans optimizations - this is just about move vs copy semantics.
To assist with this, a reference on the value categories in C++11 is going to be very helpful here.
When to move?
lvalue
These are never moved. They refer to variables or storage locations that are potentially being referred to elsewhere, and as such should not have their contents transferred to another instance.
prvalue
The above defines them as "expressions that do not have identity". Clearly nothing else can refer to a nameless value so these can be moved.
rvalue
The general case of "right-hand" value, and the only thing that's certain is they can be moved from. They may or may not have a named reference, but if they do it is the last such usage.
xvalue
These are sort of a mix of both - they have identity (are a reference) and they can be moved from. They need not have a named variable. The reason? They are eXpiring values, about to be destroyed. Consider them the 'final reference'. xvalues can only be generated from rvalues which is why/how std::move works in converting lvalues to xvalues (through the result of a function call).
glvalue
Another mutant type with its rvalue cousin, it can be either an xvalue or an lvalue - it has identity but it's unclear if this is the last reference to the variable / storage or not, hence it is unclear if it can or cannot be moved from.
Resolution Order
Where overloads exist that accept a const lvalue ref and an rvalue ref, and an rvalue is passed, the rvalue ref overload is chosen; otherwise the lvalue version is used (move for rvalues, copy otherwise).
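A minimal sketch of that resolution order, using a hypothetical overloaded function which and an empty struct A as a stand-in:

```cpp
#include <string>
#include <utility>

struct A {};

// Two overloads that report which one was selected.
std::string which(const A&) { return "const lvalue ref"; }
std::string which(A&&)      { return "rvalue ref"; }

// A named variable is an lvalue.
std::string call_with_lvalue() {
    A a;
    return which(a);
}

// A() is a prvalue.
std::string call_with_temporary() {
    return which(A());
}

// std::move(a) is an xvalue.
std::string call_with_moved() {
    A a;
    return which(std::move(a));
}
```

If the A&& overload were removed, all three calls would still compile and bind to const A&, which is why a class with only a copy constructor can still be "moved" by copying.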
Where it potentially happens
(assume all types are A where not mentioned)
It only occurs where an object is "initialized from an xvalue of the same type". xvalues bind to rvalues but are not as restricted as pure expressions. In other words, movable things are more than unnamed references, they can also be the 'last' reference to an object with respect to the compiler's awareness.
initialization
A a = std::move(b); // move-construct (this is initialization, not assignment)
A a( std::move(b) ); // construct-move
function argument passing
void f( A a );
f( std::move(b) );
function return
A f() {
// A a exists, will discuss shortly
return a;
}
Why it will not happen in f
Consider this variation on f:
void action1(A & a) {
// alter a somehow
}
void action2(A & a) {
// alter a somehow
}
A f(A && a) {
action1( a );
action2( a );
return a;
}
It is not illegal to treat a as an lvalue within f. Because it is an lvalue it must be a reference, whether explicit or not. Every plain-old variable is technically a reference to itself.
That's where we trip up. Because a is an lvalue for the purposes of f, we are in fact returning an lvalue.
To explicitly generate an rvalue, we must use std::move (or generate an A&& result some other way).
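The point that a named rvalue reference is an lvalue inside the function body can be checked with overloads. The names which, category_of_named_param and category_after_move are hypothetical, and A is an empty stand-in type.

```cpp
#include <string>
#include <utility>

struct A {};

std::string which(A&)  { return "lvalue"; }
std::string which(A&&) { return "rvalue"; }

// The parameter has a name, so the expression `a` is an lvalue even
// though its declared type is A&&.
std::string category_of_named_param(A&& a) {
    return which(a);
}

// std::move casts the same object back to an rvalue expression.
std::string category_after_move(A&& a) {
    return which(std::move(a));
}
```

Calling category_of_named_param(A()) selects the A& overload, while category_after_move(A()) selects the A&& overload.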
Why it will happen in g
With that under our belts, consider g
A g(A a) {
action1( a ); // as above
action2( a ); // as above
return a;
}
Yes, a is an lvalue for the purposes of action1 and action2. However, because all references to a only exist within g (it's a copy or moved-into copy), it can be considered an xvalue in the return.
But why not in f?
There is no specific magic to &&. Really, you should think of it as a reference first and foremost. The fact that we are demanding an rvalue reference in f as opposed to an lvalue reference with A& does not alter the fact that, being a reference, it must be an lvalue, because the storage location of a is external to f and that's as far as any compiler will be concerned.
The same does not apply in g, where it's clear that a's storage is temporary and exists only when g is called and at no other time. In this case it is clearly an xvalue and can be moved.
rvalue ref vs lvalue ref and safety of reference passing
Suppose we overload a function to accept both types of references. What would happen?
void v( A & lref );
void v( A && rref );
void v( A&& ) will be used only in the cases listed above ("Where it potentially happens"); otherwise void v( A& ) is used. That is, an rvalue will always attempt to bind to the rvalue ref signature before the lvalue ref overload is attempted. An lvalue should not ever bind to the rvalue ref overload except in the case where it can be treated as an xvalue (guaranteed to be destroyed in the current scope whether we want it to or not).
It is tempting to say that in the rvalue case we know for sure that the object being passed is temporary. That is not the case. It is a signature intended for binding references to what appears to be a temporary object.
For analogy, it's like doing int * x = 23; - it may be wrong, but you could (eventually) force it to compile with bad results if you run it. The compiler can't say for sure if you're being serious about that or pulling its leg.
With respect to safety one must consider functions that do this (and why not to do this - if it still compiles at all):
A & make_A(void) {
A new_a;
return new_a;
}
While there is nothing ostensibly wrong with the language aspect - the types work and we will get a reference to somewhere back - because new_a's storage location is inside a function, the memory will be reclaimed / invalid when the function returns. Therefore anything that uses the result of this function will be dealing with freed memory.
Similarly, A f( A && a ) is intended to but is not limited to accepting prvalues or xvalues if we really want to force something else through. That's where std::move comes in, and lets us do just that.
The reason this is the case is because it differs from A f( A & a ) only with respect to which contexts it will be preferred, over the rvalue overload. In all other respects it is identical in how a is treated by the compiler.
The fact that we know that A&& is a signature reserved for moves is a moot point; it is used to determine which version of "reference to A -type parameter" we want to bind to, the sort where we should take ownership (rvalue) or the sort where we should not take ownership (lvalue) of the underlying data (that is, move it elsewhere and wipe the instance / reference we're given). In both cases, what we are working with is a reference to memory that is not controlled by f.
Whether we do or not is not something the compiler can tell; it falls into the 'common sense' area of programming, such as not to use memory locations that don't make sense to use but are otherwise valid memory locations.
What the compiler knows about A f( A && a ) is to not create new storage for a, since we're going to be given an address (reference) to work with. We can choose to leave the source address untouched, but the whole idea here is that by declaring A&& we're telling the compiler "hey! give me references to objects that are about to disappear so I might be able to do something with it before that happens". The key word here is might, and again also the fact that we can explicitly target this function signature incorrectly.
Consider if we had a version of A that, when move-constructing, did not erase the old instance's data, and for some reason we did this by design (let's say we had our own memory allocation functions and knew exactly how our memory model would keep data beyond the lifetime of objects).
The compiler cannot know this, because it would take code analysis to determine what happens to the objects when they're handled in rvalue bindings - it's a human judgement issue at that point. At best the compiler sees 'a reference, yay, no allocating extra memory here' and follows rules of reference passing.
It's safe to assume the compiler is thinking: "it's a reference, I don't need to deal with its memory lifetime inside f, it being a temporary will be removed after f is finished".
In that case, when a temporary is passed to f, the storage of that temporary will disappear as soon as we leave f, and then we're potentially in the same situation as A & make_A(void) - a very bad one.
An issue of semantics...
std::move
The very purpose of std::move is to create rvalue references. By and large what it does (if nothing else) is force the resulting value to bind to rvalues as opposed to lvalues. The reason for this is a return signature of A& prior to rvalue references being available, was ambiguous for things like operator overloads (and other uses surely).
Operators - an example
class A {
// ...
public:
A & operator= (A & rhs); // what is the lifetime of rhs? move or copy intended?
A & operator+ (A & rhs); // ditto
// ...
};
int main() {
A result = A() + A(); // won't compile!
}
Note that this will not accept temporary objects for either operator! Nor does it make sense to do this in the case of object copy operations: why would we need to modify an original object that we are copying, probably in order to have a copy we can modify later? This is the reason we have to declare const A & parameters for copy operators and any situation where a copy is to be taken of the reference, as a guarantee that we are not altering the original object.
Naturally this is an issue with moves, where we must modify the original object to avoid the new container's data being freed prematurely. (hence "move" operation).
To solve this mess along comes T&& declarations, which are a replacement to the above example code, and specifically target references to objects in the situations where the above won't compile. But, we wouldn't need to modify operator+ to be a move operation, and you'd be hard pressed to find a reason for doing so (though you could I think). Again, because of the assumption that addition should not modify the original object, only the left-operand object in the expression. So we can do this:
class A {
// ...
public:
A & operator= (const A & rhs); // copy-assign
A & operator= (A && rhs); // move-assign
A & operator+ (const A & rhs); // don't modify rhs operand
// ...
};
int main() {
A result = A() + A(); // const A& in addition, and A&& for assign
A result2 = A().operator+(A()); // literally the same thing
}
What you should take note of here is that despite the fact that A() returns a temporary, it not only is able to bind to const A& but it should because of the expected semantics of addition (that it does not modify its right operand). The second version of the assignment is clearer why only one of the arguments should be expected to be modified.
It's also clear that a move will occur on the assignment, and no move will occur with rhs in operator+.
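To watch the two assignment operators being selected, here is a sketch with counters. It departs from the declarations above in two labeled ways: operator+ returns by value rather than A&, and the test uses real assignment to an existing object rather than initialization, since initialization would call a constructor instead. The int payload and the helper functions are hypothetical.

```cpp
// Hypothetical variant of A with an int payload and assignment counters.
struct A {
    int value;
    static int copy_assigns;
    static int move_assigns;
    explicit A(int v = 0) : value(v) {}
    A(const A&) = default;
    A(A&&) = default;
    A& operator=(const A& rhs) {
        value = rhs.value;
        ++copy_assigns;
        return *this;
    }
    A& operator=(A&& rhs) noexcept {
        value = rhs.value;
        ++move_assigns;
        return *this;
    }
    // Reads rhs through a const reference; assumed here to return a new
    // object by value.
    A operator+(const A& rhs) const { return A(value + rhs.value); }
};
int A::copy_assigns = 0;
int A::move_assigns = 0;

int sum_via_move_assign() {
    A result;
    result = A(1) + A(2);  // the sum is a temporary: operator=(A&&)
    return result.value;
}

int copy_assign_from_named() {
    A source(5);
    A target;
    target = source;       // source is an lvalue: operator=(const A&)
    return target.value;
}
```

Both temporaries A(1) and A(2) are only read by operator+; the single move happens when the resulting temporary is assigned to result.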
Separation of return value semantics and argument binding semantics
The reason that there is only one move above is clear from the function (well, operator) definitions. What's important is we are indeed binding what is clearly an xvalue / rvalue, to what is unmistakably an lvalue in operator+.
I have to stress this point: there is no effective difference in this example in the way that operator+ and operator= refer to their argument. As far as the compiler is concerned, within either's function body the argument is effectively const A& for + and A& for =. The difference is purely in constness. The only way in which A& and A&& differ is to distinguish signatures, not types.
With different signatures come different semantics, it's the compiler's toolkit for distinguishing certain cases where there otherwise is no clear distinction from the code. The behavior of the functions themselves - the code body - may not be able to tell the cases apart either!
Another example of this is operator++(void) vs operator++(int). The former is the prefix form and returns the value after the increment; the latter is the postfix form and returns the value from before the increment. There is no meaningful int being passed, it's just so the compiler has two signatures to work with. There is no other way to specify two functions with the same name and the same parameters, and as you may or may not know, it is illegal to overload a function on just the return type for similar reasons of ambiguity.
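For reference, a minimal sketch of the two increment overloads on a hypothetical Counter type. The prefix form takes no parameter and yields the updated object; the postfix form takes the dummy int and yields the old value.

```cpp
// Hypothetical counter type used to show the two operator++ signatures.
struct Counter {
    int value;
    explicit Counter(int v) : value(v) {}

    // Prefix form, ++c: increments, then returns the updated object.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Postfix form, c++: the unused int parameter only selects this
    // overload. Returns a copy holding the value before the increment.
    Counter operator++(int) {
        Counter old(*this);
        ++value;
        return old;
    }
};

int prefix_result() {
    Counter c(1);
    return (++c).value;
}

int postfix_result() {
    Counter c(1);
    return (c++).value;
}

int value_after_postfix() {
    Counter c(1);
    c++;
    return c.value;
}
```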
rvalue variables and other odd situations - an exhaustive test
To understand unambiguously what is happening in f I've put together a smorgasbord of things one "should not attempt but look like they'd work" that forces the compiler's hand on the matter almost exhaustively:
void bad (int && x, int && y) {
x += y;
}
int & worse (int && z) {
return z++, z + 1, 1 + z;
}
int && justno (int & no) {
return worse( no );
}
int num () {
return 1;
}
int main () {
int && a = num();
++a = 0;
a++ = 0;
bad( a, a );
int && b = worse( a );
int && c = justno( b );
++c = (int) 'y';
c++ = (int) 'y';
return 0;
}
g++ -std=gnu++11 -O0 -Wall -c -fmessage-length=0 -o "src\\basictest.o" "..\\src\\basictest.cpp"
..\src\basictest.cpp: In function 'int& worse(int&&)':
..\src\basictest.cpp:5:17: warning: right operand of comma operator has no effect [-Wunused-value]
return z++, z + 1, 1 + z;
^
..\src\basictest.cpp:5:26: error: invalid initialization of non-const reference of type 'int&' from an rvalue of type 'int'
return z++, z + 1, 1 + z;
^
..\src\basictest.cpp: In function 'int&& justno(int&)':
..\src\basictest.cpp:8:20: error: cannot bind 'int' lvalue to 'int&&'
return worse( no );
^
..\src\basictest.cpp:4:7: error: initializing argument 1 of 'int& worse(int&&)'
int & worse (int && z) {
^
..\src\basictest.cpp: In function 'int main()':
..\src\basictest.cpp:16:13: error: cannot bind 'int' lvalue to 'int&&'
bad( a, a );
^
..\src\basictest.cpp:1:6: error: initializing argument 1 of 'void bad(int&&, int&&)'
void bad (int && x, int && y) {
^
..\src\basictest.cpp:17:23: error: cannot bind 'int' lvalue to 'int&&'
int && b = worse( a );
^
..\src\basictest.cpp:4:7: error: initializing argument 1 of 'int& worse(int&&)'
int & worse (int && z) {
^
..\src\basictest.cpp:21:7: error: lvalue required as left operand of assignment
c++ = (int) 'y';
^
..\src\basictest.cpp: In function 'int& worse(int&&)':
..\src\basictest.cpp:6:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
..\src\basictest.cpp: In function 'int&& justno(int&)':
..\src\basictest.cpp:9:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
01:31:46 Build Finished (took 72ms)
This is the unaltered output sans build header which you don't need to see :) I will leave it as an exercise to understand the errors found but re-reading my own explanations (particularly in what follows) it should be apparent what each error was caused by and why, imo anyway.
Conclusion - What can we learn from this?
First, note that the compiler treats function bodies as individual code units. This is basically the key here. Whatever the compiler does with a function body, it cannot make assumptions about the behavior of the function that would require the function body to be altered. To deal with those cases there are templates but that's beyond the scope of this discussion - just note that templates generate multiple function bodies to handle different cases, while otherwise the same function body must be re-usable in every case the function could be used.
Second, rvalue types were predominantly envisioned for move operations - a very specific circumstance that was expected to occur in assignment and construction of objects. Other semantics using rvalue reference bindings are beyond the scope of any compiler to deal with. In other words, it's better to think of rvalue references as syntax sugar than actual code. The signature differs in A&& vs A& but the argument type for the purposes of the function body does not, it is always treated as A& with the intention that the object being passed should be modified in some way because const A&, while correct syntactically, would not allow the desired behavior.
I can be very sure at this point when I say that the compiler will generate the code body for f as if it were declared f(A&). Per above, A&& assists the compiler in choosing when to allow binding a mutable reference to f but otherwise the compiler doesn't consider the semantics of f(A&) and f(A&&) to be different with respect to what f returns.
It's a long way of saying: the return method of f does not depend on the type of argument it receives.
The confusion is elision. In reality there are two copies in the returning of a value. First a copy is created as a temporary, then this temporary is assigned to something (or it isn't and remains purely temporary). The second copy is very likely elided via return optimization. The first copy can be moved in g and cannot in f. I expect in a situation where f cannot be elided, there will be a copy then a move from f in the original code.
To override this the temporary must be explicitly constructed using std::move, that is, in the return statement in f. However in g we're returning something that is known to be temporary to the function body of g, hence it is either moved twice, or moved once then elided.
I would suggest compiling the original code with all optimizations disabled and adding in diagnostic messages to copy and move constructors to keep tabs on when and where the values are moved or copied before elision becomes a factor. Even if I'm mistaken, an un-optimized trace of the constructors / operations used would paint an unambiguous picture of what the compiler has done, hopefully it will be apparent why it did what it did as well...
Short story: it only depends on doSomething.
Medium story: if doSomething never changes a, then f is safe. It receives an rvalue reference and returns a new temporary moved from there.
Long story: things will go bad as soon as doSomething uses a in a move operation, because a may be in an undefined state before it is used in the return statement - it would be the same in g but at least the conversion to an rvalue reference should be explicit.
TL/DR: both f and g are safe as long as there is no move operation inside doSomething. The difference is that a move will be silently executed in f, while it will require an explicit conversion to an rvalue reference (e.g. with std::move) in g.
Third attempt. The second became very long in the process of explaining every nook and cranny of the situation. But hey, I learned a lot too in the process, which I suppose is the point, no? :) Anyway. I'll re-address the question anew, keeping my longer answer as it in itself is a useful reference but falls short of a 'clear explanation'.
What are we dealing with here?
f and g are not trivial situations. They take time to understand and appreciate the first few times you encounter them. The issues at play are the lifetime of objects, Return Value Optimization, confusion of returning object values, and confusion with overloads of reference types. I'll address each and explain their relevance.
References
First things first. What's a reference? Aren't they just pointers without the syntax?
They are, but in an important way they're much more than that. Pointers are literally that, they refer to memory locations in general. There are few if any guarantees about the values located at wherever the pointer is set to. References on the other hand are bound to addresses of real values - values that guarantee to exist for the duration they can be accessed, but may not have a name for them available to be accessed in any other way (such as temporaries).
As a rule of thumb, if you can 'take its address' then you're dealing with an lvalue, which behaves like a rather special kind of reference. You can assign to a (non-const) lvalue. This is why *pointer = 3 works: the operator * yields an lvalue referring to the object being pointed to.
This doesn't make the reference any more or less valid than the address it points to, however, references you naturally find in C++ do have this guarantee (as would well-written C++ code) - that they are referring to real values in a way where we don't need to know about its lifetime for the duration of our interactions with them.
Lifetime of Objects
We all should know by now when the c'tors and d'tors will be called for something like this:
{
A temp;
temp.property = value;
}
temp's scope is set. We know exactly when it's created and destroyed. One way we can be sure it's destroyed is that this is impossible once the closing brace has been passed:
A & ref_to_temp = temp; // nope: temp is no longer in scope
A * ptr_to_temp = &temp; // double nope
The compiler stops us from doing that because very clearly we should not expect that object to still exist. This can arise subtly whenever using references, which is why sometimes people can be found suggesting avoidance of references until you know what you're doing with them (or entirely if they've given up understanding them and just want to move on with their lives).
Scope of Expressions
On the other hand we also have to be mindful that temporaries exist until the outer-most expression they're found in has completed. That means up to the semicolon. A temporary created on the LHS of a comma operator, for example, doesn't get destroyed until the semicolon. I.e.:
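struct scopetester {
    static int counter; // a non-const static member cannot be initialized in-class
    scopetester(){++counter;}
    ~scopetester(){--counter;}
};
int scopetester::counter = 0;
// inside a function body, with <iostream> included:
scopetester(), std::cout << scopetester::counter; // prints 1
scopetester(), scopetester(), std::cout << scopetester::counter; // prints 2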
This still does not avoid issues of sequencing of execution: you still have to deal with operator precedence and the dreaded undefined behavior that can result from unsequenced modifications of the same variable (e.g. i = i++ + ++i). What is important is that all temporaries created exist until the semicolon and no longer.
There are two exceptions - elision / in-place-construction (aka RVO) and reference-assignment-from-temporary.
Returning by value and Elision
What is elision? Why use RVO and similar things? All of these come down under a single term that's far easier to appreciate - "in-place construction". Suppose we were using the result of a function call to initialize or set an object. Eg:
A x (void) {return A();}
A y( x() );
Let's consider the longest possible sequence of events that could happen here.
1. A new A is constructed in x
2. The temporary value returned by x() is a new A, initialized using a reference to the previous
3. A new A - y - is initialized using the temporary value
Where possible, the compiler should re-arrange things so that as few as possible intermediate A's are constructed where it's safe to assume the intermediate is inaccessible or otherwise unnecessary. The question is which of the objects can we do without?
Case #1 is an explicit new object. If we are to avoid this being created, we need to have a reference to an object that already exists. This is the most straightforward one and nothing more needs to be said.
In #2 we cannot avoid constructing some result. After all, we are returning by value. However, there are two important exceptions (not including exceptions themselves which are also affected when thrown): NRVO and RVO. These affect what happens in #3, but there are important consequences and rules regarding #2...
This is due to an interesting quirk of elision:
Notes
Copy elision is the only allowed form of optimization that can change the observable side-effects. Because some compilers do not perform copy elision in every situation where it is allowed (e.g., in debug mode), programs that rely on the side-effects of copy/move constructors and destructors are not portable.
Even when copy elision takes place and the copy-/move-constructor is not called, it must be present and accessible (as if no optimization happened at all), otherwise the program is ill-formed.
(Since C++11)
In a return statement or a throw-expression, if the compiler cannot perform copy elision but the conditions for copy elision are met or would be met, except that the source is a function parameter, the compiler will attempt to use the move constructor even if the object is designated by an lvalue; see return statement for details.
And more on that in the return statement notes:
Notes
Returning by value may involve construction and copy/move of a temporary object, unless copy elision is used.
(Since C++11)
If expression is an lvalue expression and the conditions for copy elision are met, or would be met, except that expression names a function parameter, then overload resolution to select the constructor to use for initialization of the returned value is performed twice: first as if expression were an rvalue expression (thus it may select the move constructor or a copy constructor taking reference to const), and if no suitable conversion is available, overload resolution is performed the second time, with lvalue expression (so it may select the copy constructor taking a reference to non-const).
The above rule applies even if the function return type is different from the type of expression (copy elision requires same type)
The compiler is allowed to even chain together multiple elisions. All it means is that two sides of a move / copy that would involve an intermediate object, could potentially be made to refer directly to each-other or even be made to be the same object. We don't know and shouldn't need to know when the compiler chooses to do this - it's an optimization, for one, but importantly you should think of move and copy constructors et al as a "last resort" usage.
We can agree the goal is to reduce the number of unnecessary operations in any optimization, provided the observable behavior is the same. Move and copy constructors are used wherever moves and copy operations happen, so what about when the compiler sees fit to remove a move/copy operation itself as an optimization? Should the functionally unnecessary intermediate objects exist in the final program just for the purposes of their side effects? The way the standard is right now, and compilers, seems to be: no - the move and copy constructors satisfy the how of those operations, not the when or why.
The short version: you have fewer temporary objects, which you ought not to care about to begin with, so why should you miss them? If you do miss them, it may just be that your code relies on intermediate copies and moves to do things beyond their stated purpose and contexts.
Lastly, you need to be aware that the elided object is always stored (and constructed) in the receiving location, not the location of its inception.
Quoting this reference -
Named Return Value Optimization
If a function returns a class type by value, and the return statement's expression is the name of a non-volatile object with automatic storage duration, which isn't the function parameter, or a catch clause parameter, and which has the same type (ignoring top-level cv-qualification) as the return type of the function, then copy/move is omitted. When that local object is constructed, it is constructed directly in the storage where the function's return value would otherwise be moved or copied to. This variant of copy elision is known as NRVO, "named return value optimization".
Return Value Optimization
When a nameless temporary, not bound to any references, would be moved or copied into an object of the same type (ignoring top-level cv-qualification), the copy/move is omitted. When that temporary is constructed, it is constructed directly in the storage where it would otherwise be moved or copied to. When the nameless temporary is the argument of a return statement, this variant of copy elision is known as RVO, "return value optimization".
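To make the two quoted definitions concrete, here is a small sketch that counts copy and move constructions for an RVO candidate and an NRVO candidate. Widget, make_unnamed, make_named and the two counting helpers are hypothetical names for this illustration. No exact count is claimed for the general case: C++17 guarantees zero transfers for the unnamed case, while NRVO stays optional in every standard.

```cpp
// Hypothetical class that counts every copy or move construction.
struct Widget {
    static int copies_and_moves;
    int value;
    explicit Widget(int v) : value(v) {}
    Widget(const Widget& other) : value(other.value) { ++copies_and_moves; }
    Widget(Widget&& other) noexcept : value(other.value) { ++copies_and_moves; }
};
int Widget::copies_and_moves = 0;

// RVO candidate: the return expression is a nameless temporary.
Widget make_unnamed() { return Widget(1); }

// NRVO candidate: the return expression names a local of the return type.
Widget make_named() {
    Widget w(2);
    w.value += 1;
    return w;
}

// Number of copy/move constructions needed to initialize y from make_unnamed().
int transfers_for_unnamed() {
    Widget::copies_and_moves = 0;
    Widget y(make_unnamed());
    (void)y;
    return Widget::copies_and_moves;
}

// Number of copy/move constructions needed to initialize y from make_named().
int transfers_for_named() {
    Widget::copies_and_moves = 0;
    Widget y(make_named());
    (void)y;
    return Widget::copies_and_moves;
}
```

Without any elision each helper would report two transfers (local or temporary to return value, return value to y); every transfer the compiler elides lowers that number.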
Lifetime of References
One thing we should not do, is this:
A & func() {
A result;
return result;
}
While tempting because it would avoid implicit copying of anything (we're just passing an address right?) it's also a short-sighted approach. Remember the compiler above preventing something looking like this with temp? Same thing here - result is gone once we're done with func, it could be reclaimed and could be anything now.
The reason we cannot is because we cannot pass an address to result out of func - whether as reference or as pointer - and consider it valid memory. We would get no further passing A* out.
In this situation it is best to use an object-copy return type and rely on moves, elision or both to occur as the compiler finds suitable. Always think of copy and move constructors as 'measures of last resort' - you should not rely on the compiler to use them because the compiler can find ways to avoid copy and move operations entirely, and is allowed to do so even if it means the side effects of those constructors wouldn't happen any more.
There is however a special case, alluded to earlier.
Recall that references are guarantees to real values. This implies that the first occurrence of the reference initializes the object and the last (as far as known at compile time) destroys it when going out of scope.
Broadly this covers two situations: when we return a temporary from a function, and when we assign from a function result. The first, returning a temporary, is basically what elision does, but you can in effect elide explicitly with reference passing - like passing a pointer in a call chain. It constructs the object at the time of return, but what changes is that the object is no longer destroyed after leaving scope (the return statement). On the other end the second kind happens: the variable storing the result of the function call now has the honor of destroying the value when it goes out of scope.
The important point here is that elision and reference passing are related concepts. You can emulate elision by using pointers to uninitialized variables' storage location (of known type), for example, as you can with reference passing semantics (basically what they're for).
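The remark about emulating elision through a pointer to uninitialized storage can be sketched with placement new. Point, make_point_in and sum_of_point_built_in_place are made-up names; this only illustrates the idea of the caller providing the storage and is not a description of what a compiler actually emits.

```cpp
#include <new>

// Hypothetical value type with a non-trivial constructor.
struct Point {
    int x;
    int y;
    Point(int x_, int y_) : x(x_), y(y_) {}
};

// The caller supplies raw storage; the callee constructs the result
// directly in it, so no intermediate Point is ever copied or moved.
Point* make_point_in(void* storage) {
    return new (storage) Point(3, 4);
}

int sum_of_point_built_in_place() {
    // Uninitialized, suitably aligned storage owned by the caller.
    alignas(Point) unsigned char storage[sizeof(Point)];
    Point* p = make_point_in(storage);
    int sum = p->x + p->y;
    p->~Point();  // the caller is also responsible for destruction
    return sum;
}
```

Return value elision has the same shape: the caller decides where the object lives, and the callee merely constructs it there.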
Overloads of Reference Types
References allow us to treat non-local variables as if they are local variables - to take their address, write to that address, read from that address, and importantly, be able to destroy the object at the right time - when the address can no longer be reached by anything.
Regular variables when they leave scope, have their only reference to them disappear, and are promptly destroyed at that time. Reference variables can refer to regular variables, but except for elision / RVO circumstances they do not affect the scope of the original object - not even if the object they referred to goes out of scope early, which can happen if you make references to dynamic memory and are not careful to manage those references yourself.
This means you can capture the results of an expression explicitly by reference. How? Well, this may seem odd at first but if you read the above it will make sense why this works:
class A {
/* assume rule-of-5 (inc const-overloads) has been followed but unless
* otherwise noted the members are private */
public:
A (void) { /* ... */ }
A operator+ ( const A & rhs ) {
A res;
// do something with `res`
return res;
}
};
A x = A() + A(); // doesn't compile
A & y = A() + A(); // doesn't compile
A && z = A() + A(); // compiles
Why? What's going on?
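A x = ... - we can't because the copy and move constructors and the assignment operators are private. (This holds under the C++11 and C++14 rules; since C++17, guaranteed copy elision initializes x directly from the prvalue, so this line no longer needs an accessible copy or move constructor.)
A & y = ... - we can't because we're returning a value, not a reference to a value whose scope is greater than or equal to our current scope; a non-const lvalue reference cannot bind to a temporary.
A && z = ... - we can because an rvalue reference is able to bind to temporaries (here a prvalue; xvalues work too). As a consequence of this binding the lifetime of the temporary value is extended to that of the capturing reference, which, being a named variable, is itself used as an lvalue from then on. Sound familiar? It's explicit elision if I were to call it anything. This is more apparent when you consider this syntax must involve a new value and must involve binding that value to a reference.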
In all three cases, when all constructors and assignment operators are made public, there are always only three objects constructed, with the address of res always matching the variable storing the result (on my compiler anyway, optimizations disabled, -std=gnu++11, g++ 4.9.3).
Which means the differences really do come down to just the storage duration of function arguments themselves. Elision and move operations cannot happen on anything but pure expressions, expiring values, or explicit targeting of the "expiring values" reference overload Type&&.
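The lifetime extension described for A && z can be observed by counting live objects. This is a small sketch with a hypothetical Tracked class and a hypothetical factory make; it only shows that the temporary survives while the reference is in scope and is gone once the reference is.

```cpp
// Hypothetical class that counts how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    Tracked(Tracked&&) noexcept { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

Tracked make() { return Tracked(); }

// The result is discarded, so the temporary dies at the semicolon.
int alive_after_discarded_call() {
    make();
    return Tracked::alive;  // 0
}

// Binding the result to an rvalue reference keeps the temporary alive
// until z goes out of scope at the end of this function.
int alive_while_bound() {
    Tracked&& z = make();
    (void)z;
    return Tracked::alive;  // 1, read before z's temporary is destroyed
}
```

After alive_while_bound returns, Tracked::alive is back at zero: the reference's scope ended, and the temporary went with it.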
Re-examining f and g
I've annotated the situation in both functions to get things rolling, a shortlist of assumptions the compiler would note when generating (reusable) code for each.
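A f( A && a ) {
    // a refers to an object whose storage duration exceeds f's scope.
    // already constructed.
    return a;
    // cannot be elided: a names an object that already exists outside f.
    // copy-constructed under C++11/14/17, because a is an lvalue here;
    // C++20 treats this return as an implicit move.
}
A g( A a ) {
    // has storage duration limited to this function's scope.
    // was just constructed somehow, whether by elision, move or copy.
    return a;
    // this return itself cannot be elided, because a is a function parameter.
    // move-constructs the result if A has a usable move constructor.
    // copy-constructs it otherwise.
}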
What we can say for sure about f's a is that it's expecting to capture moved or expression-type values. Because f can accept either expression-references (prvalues) or lvalue-references about to disappear (xvalues) or moved lvalue-references (converted to xvalues via std::move), and because f must be homogeneous in the treatment of a for all three cases, a is seen as a reference first and foremost to an area of memory whose lifetime lasts longer than a call to f. That is, it is not possible to distinguish which of the three cases we called f with from within f, so the compiler assumes the longest storage duration it needs for any of the cases, and finds it safest not to assume anything about the storage duration of a's data.
The situation in g is different. Here, a - however it happens upon its value - will cease to be accessible beyond a call to g. As such, returning it is tantamount to moving it, since it's treated as an xvalue in that case. It is move-constructed if A has a usable move constructor and copy-constructed otherwise; elision can still apply on the caller's side, where the returned value initializes the receiving object.
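The difference in who owns a can be made visible from the caller's side. In this sketch Token is a hypothetical stand-in for A that remembers whether it was moved from, and f_explicit is f with std::move written out in the return statement, so the result does not depend on the language version (a plain return a; in f copies under C++11 to C++17 and moves under C++20).

```cpp
#include <utility>

// Hypothetical stand-in for A: records whether it has been moved from.
struct Token {
    bool moved_from;
    Token() : moved_from(false) {}
    Token(const Token&) : moved_from(false) {}
    Token(Token&& other) noexcept : moved_from(false) { other.moved_from = true; }
};

// f's parameter is a reference to the caller's own object.
Token f_explicit(Token&& a) { return std::move(a); }

// g's parameter is a separate object that lives only for the call.
Token g(Token a) { return a; }

bool caller_object_moved_from_after_f() {
    Token x;
    Token r = f_explicit(std::move(x));
    (void)r;
    return x.moved_from;  // true: the return statement moved out of x itself
}

bool caller_object_moved_from_after_g_with_copy() {
    Token x;
    Token r = g(x);  // x is copied into the parameter
    (void)r;
    return x.moved_from;  // false: only g's private parameter was moved from
}

bool caller_object_moved_from_after_g_with_move() {
    Token x;
    Token r = g(std::move(x));  // x is moved into the parameter
    (void)r;
    return x.moved_from;  // true: the move happened while building the parameter
}
```

So with g the caller decides at the call site whether its object is given up, while with f the decision is made by whatever f's body does with the reference.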
The issues with f
// we can't tell these apart.
// `f` when compiled cannot assume either will always happen.
// case-by-case optimizations can only happen if `f` is
// inlined into the final code and then re-arranged, or if `f`
// is made into a template to specifically behave differently
// against differing types.
A case_1() {
// prvalues
return f( A() + A() );
}
A make_case_2() {
// xvalues
A temp;
return temp;
}
A case_2 = f( make_case_2() );
A case_3(A & other) {
// lvalues
return f( std::move( other ) );
}
Because of the ambiguity of usage the compiler and standards are designed to make f usable consistently in all cases. There can be no assumptions that A&& will always be a new expression or that you will only use it with std::move for its argument etc. Once f is made external to your code, leaving only its call signature, that cannot be the excuse anymore. The function signature - which reference overload to target - is a clue to what the function should be doing with it and how much (or little) it can assume about the context.
rvalue references are not a panacea for targeting only "moved values", they can target a good deal more things and even be targeted incorrectly or unexpectedly if you assume that's all they do. A reference to anything in general should be expected to and be made to exist for longer than the reference does, with the one exception being rvalue reference variables.
rvalue reference variables are in essence, elision operators. Wherever they exist there is in-place construction going on of some description.
As regular variables, they extend the scope of any xvalue or rvalue they receive - they hold the result of the expression as it's constructed rather than by move or copy, and from thereon are equivalent to regular reference variables in usage.
As function variables they can also elide and construct objects in-place, but there is a very important difference between this:
A c = f( A() );
and this:
A && r = f( A() );
The difference is there is no guarantee that c will be move-constructed vs elided, but r definitely will be elided / constructed in-place at some point, owing to the nature of what we're binding to. For this reason we can only assign to r in situations where there will be a new temporary value created.
But why is A&&a not destroyed if it is captured?
Consider this:
void bad_free(A && a) {
A && clever = std::move( a );
// 'clever' should be the last reference to a?
}
This won't work. The reason is subtle. a's scope is longer, and rvalue reference assignments can only extend the lifetime, not control it. clever exists for less time than a, and therefore is not an xvalue itself (unless using std::move again, but then you're back to the same situation, and it continues forth etc).
lifetime extension
Remember that what makes lvalues different to rvalues is that they cannot be bound to objects that have less lifetime than themselves. All lvalue references are either the original variable or a reference that has less lifetime than the original.
rvalues allow binding to reference variables that have longer lifetime than the original value - that's half the point. Consider:
A r = f( A() ); // v1
A && s = f( A() ); // v2
What happens? In both cases f is given a temporary value that outlives the call, and a result object (because f returns by value) is constructed somehow (it will not matter as you shall see). In v1 we are constructing a new object r using the temporary result - we can do this in three ways: move, copy, elide. In v2 we are not constructing a new object, we are extending the lifetime of the result of f to the scope of s, alternatively saying the same: s is constructed in-place using f and therefore the temporary returned by f has its lifetime extended rather than being moved or copied.
The main distinction is v1 requires move and copy constructors (at least one) to be defined even if the process is elided. For v2 you are not invoking constructors and are explicitly saying you want to reference and/or extend the lifetime of a temporary value, and because you don't invoke move or copy constructors the compiler can only elide / construct in-place!
Remember that this has nothing to do with the argument given to f. It works identically with g:
A r = g( A() ); // v1
A && s = g( A() ); // v2
g will create a temporary for its argument and move-construct it using A() for both cases. Like f, it also constructs a temporary for its return value, but it can use an xvalue because the result is constructed using a temporary (temporary to g). Again, this will not matter because in v1 we have a new object that could be copy-constructed or move-constructed (either is required but not both) while in v2 we are demanding a reference to something that's constructed but will disappear if we don't catch it.
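The claim that v2 needs no copy or move constructor at the binding site can be checked with a type that has neither. Pinned and make_pinned are hypothetical names; the factory uses return {}; so that it compiles from C++11 on without a usable copy or move constructor. The v1 form, Pinned r = make_pinned();, would only be accepted from C++17 on.

```cpp
// Hypothetical type that can be neither copied nor moved.
struct Pinned {
    int value;
    Pinned() : value(42) {}
    Pinned(const Pinned&) = delete;
    Pinned& operator=(const Pinned&) = delete;
};

// The braced return initializes the return object in place,
// so no copy or move constructor is required here either.
Pinned make_pinned() { return {}; }

int read_through_reference() {
    // v2 style: bind to the returned temporary and extend its lifetime.
    Pinned&& s = make_pinned();
    return s.value;
}
```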
Explicit xvalue capture
Example to show this is possible in theory (but useless):
A && x (void) {
A temp;
// return temp; // even though xvalue, can't do this
return std::move(temp);
}
A && y = x(); // y now refers to temp, which is destroyed
Which object does y refer to? We have left the compiler no choice: y must refer to the result of some function or expression, and we've given it temp which works based on type. But no move has occurred, and temp will be deallocated by the time we use it via y.
Why didn't lifetime extension kick in for temp like it did for a in g / f? Because of what we're returning: we can't specify a function to construct things in-place, we can specify a variable to be constructed in place. It also goes to show that the compiler does not look across function / call boundaries to determine lifetime, it will just look at which variables are on the calling side or local, how they're assigned to and how they're initialized if local.
If you want to clear all doubts, try passing this as an rvalue reference: std::move(*(new A)) - what should happen is that nothing should ever destroy it, because it isn't on the stack and because rvalue references do not alter the lifetime of anything but temporary objects (ie, intermediates / expressions). xvalues are candidates for move construction / move assignment and can't be elided (already constructed) but all other move / copy operations can in theory be elided on the whim of the compiler; when using rvalue references the compiler has no choice but to elide or pass on the address.
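A sketch of that experiment, together with bad_free from above, assuming a hypothetical Tracked class that counts live instances. In both cases the object is still alive after the call, because binding an rvalue reference to an existing object never destroys it.

```cpp
#include <utility>

// Hypothetical class that counts how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

void bad_free(Tracked&& a) {
    Tracked&& clever = std::move(a);  // just another name for the caller's object
    (void)clever;
}   // nothing is destroyed here

int alive_after_bad_free_on_local() {
    Tracked t;
    bad_free(std::move(t));
    return Tracked::alive;  // 1: t lives until the end of this function
}

int alive_after_bad_free_on_heap_object() {
    Tracked* p = new Tracked;
    bad_free(std::move(*p));
    int still_alive = Tracked::alive;  // 1: only delete ends this lifetime
    delete p;
    return still_alive;
}
```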

What exactly happens when we use rvalue references and how does std::move work?

I am trying to understand rvalue references and move semantics. In the following code, when I pass 10 to the Print function it calls the rvalue reference overload, which is expected. But what exactly happens: where does that 10 get copied to (or what does the reference refer to)? Secondly, what does std::move actually do? Does it extract the value 10 from i and then pass it, or is it an instruction to the compiler to use the rvalue reference overload?
#include <iostream>
#include <utility>

void Print(int& i)
{
    std::cout << "L Value reference " << std::endl;
}
void Print(int&& i)
{
    std::cout << "R Value reference " << std::endl;
}
int main()
{
    int i = 10;
    Print(i); // OK, understandable
    Print(10); // is 10 not copied? If not, where is it stored?
    Print(std::move(i)); // what exactly does move do?
    return 0;
}
Thanks.
In the case of a 10, there will probably be optimisations involved which will change the actual implementation, but conceptually, the following happens:
A temporary int is created and initialised with the value 10.
That temporary int is bound to the r-value reference function parameter.
So conceptually, there's no copying - the reference will refer to the temporary.
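The two steps can be sketched with overloads that report which one was chosen, plus a function that modifies the temporary it is bound to. which_overload, add_one and the small wrappers are made-up names for this illustration and mirror the question's Print only loosely.

```cpp
#include <string>

// Overloads that report which one was selected instead of printing.
std::string which_overload(int&) { return "L Value reference"; }
std::string which_overload(int&&) { return "R Value reference"; }

// The parameter refers to the temporary int, which really is an object:
// it can be modified like any other int for the duration of the call.
int add_one(int&& i) {
    i += 1;
    return i;
}

std::string overload_for_variable() {
    int i = 10;
    return which_overload(i);   // a named variable is an lvalue
}

std::string overload_for_literal() {
    return which_overload(10);  // a temporary int is created and bound
}

int literal_plus_one() {
    return add_one(10);         // modifies the temporary, returns 11
}
```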
As for std::move(): there may be some tricky bits related to references etc., but principally, it's just a cast to r-value reference. std::move() does not actually move anything. It just turns its argument into an r-value, so that it can be moved from.
"Moving" is not really a defined operation, anyway. While it's convenient to think about moving, the important thing is l-value vs. r-value distinction.
"Moving" is normally implemented by move constructors, move assignment operators and functions taking r-value references (such as push_back()). It is their implementation that makes the move an actual move - that is, they are implemented so that they can "steal" the r-value's resources instead of copying them. That's because, being an r-value, it will no longer be accessible (or so you promise the compiler).
That's why std::move() enables "moving" - it turns its argument into an r-value, signalling, "hey, compiler, I will not be using this l-value any more, you can let functions (such as move ctors) treat it as an r-value and steal from it."
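As an illustration of where the actual "stealing" lives, here is a sketch of a hypothetical Buffer class whose move constructor takes over the source's allocation. The class and the two helper functions are assumptions made for this example; the point is only that the work happens in the constructor, not in std::move.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical owning buffer used to show what a move constructor does.
class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    // Copy: allocate new storage and duplicate the contents.
    Buffer(const Buffer& other) : size_(other.size_), data_(new int[other.size_]) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
    }

    // Move: take over the source's storage and leave the source empty.
    Buffer(Buffer&& other) noexcept : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }

    Buffer& operator=(const Buffer&) = delete;
    Buffer& operator=(Buffer&&) = delete;

    ~Buffer() { delete[] data_; }

    std::size_t size() const { return size_; }
    const int* data() const { return data_; }

private:
    std::size_t size_;
    int* data_;
};

// Constructing from std::move(a) selects the move constructor.
bool move_constructor_steals_storage() {
    Buffer a(4);
    const int* original = a.data();
    Buffer b(std::move(a));
    return b.data() == original && a.data() == nullptr && a.size() == 0;
}

// std::move on its own is only a cast: nothing happens to a.
bool cast_alone_changes_nothing() {
    Buffer a(4);
    Buffer&& r = std::move(a);
    (void)r;
    return a.size() == 4 && a.data() != nullptr;
}
```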
But what exactly happens, where that 10 will get copied (or from where it referred)
A temporary value is created, and a reference passed to the function. Temporaries are rvalues, so can be bound to rvalue references; so the second overload is chosen.
Secondly what std::move actually do?
It gives you an rvalue reference to its argument. It's equivalent (by definition) to static_cast<T&&>, where T is the argument's type with any reference stripped.
Despite the name, it doesn't do any movement itself; it just gives you a reference that can be used to move the value.
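To show that nothing more than a cast is involved, here is a hand-written equivalent. It is named my_move to make clear that it is an illustrative re-implementation and not the library's source, though the standard specifies std::move in terms of the same static_cast.

```cpp
#include <type_traits>
#include <utility>

// Illustrative re-implementation: strip any reference from T, then cast to &&.
template <typename T>
constexpr typename std::remove_reference<T>::type&& my_move(T&& t) noexcept {
    return static_cast<typename std::remove_reference<T>::type&&>(t);
}

// Whether the argument is an lvalue or an rvalue, the result type is int&&.
static_assert(std::is_same<decltype(my_move(std::declval<int&>())), int&&>::value,
              "lvalue in, rvalue reference out");
static_assert(std::is_same<decltype(my_move(std::declval<int>())), int&&>::value,
              "rvalue in, rvalue reference out");

// The cast neither copies nor changes the object: r refers to i itself.
bool cast_keeps_value_and_address() {
    int i = 10;
    int&& r = my_move(i);
    return i == 10 && &r == &i;
}
```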
std::move casts the int to int&& via static_cast<int&&>.
If the type is a class or a struct, the move constructor, if it is defined (implicitly or explicitly), will then be selected instead of the copy constructor when a new object is constructed from the result.