Why is the copy constructor called when returning from bar instead of the move constructor?
#include <iostream>
using namespace std;
class Alpha {
public:
Alpha() { cout << "ctor" << endl; }
Alpha(Alpha &) { cout << "copy ctor" << endl; }
Alpha(Alpha &&) { cout << "move ctor" << endl; }
Alpha &operator=(Alpha &) { cout << "copy asgn op" << endl; }
Alpha &operator=(Alpha &&) { cout << "move asgn op" << endl; }
};
Alpha foo(Alpha a) {
return a; // Move ctor is called (expected).
}
Alpha bar(Alpha &&a) {
return a; // Copy ctor is called (unexpected).
}
int main() {
Alpha a, b;
a = foo(a);
a = foo(Alpha());
a = bar(Alpha());
b = a;
return 0;
}
If bar does return move(a) then the behavior is as expected. I do not understand why a call to std::move is necessary given that foo calls the move constructor when returning.
There are 2 things to understand in this situation:
a in bar(Alpha &&a) is a named rvalue reference; therefore, treated as an lvalue.
a is still a reference.
Part 1
Since a in bar(Alpha &&a) is a named rvalue reference, its treated as an lvalue. The motivation behind treating named rvalue references as lvalues is to provide safety. Consider the following,
Alpha bar(Alpha &&a) {
baz(a);
qux(a);
return a;
}
If baz(a) considered a as an rvalue then it is free to call the move constructor and qux(a) may be invalid. The standard avoids this problem by treating named rvalue references as lvalues.
Part 2
Since a is still a reference (and may refer to an object outside of the scope of bar), bar calls the copy constructor when returning. The motivation behind this behavior is to provide safety.
References
SO Q&A - return by rvalue reference
Comment by Kerrek SB
yeah, very confusing. I would like to cite another SO post here implicite move. where I find the following comments a bit convincing,
And therefore, the standards committee decided that you had to be
explicit about the move for any named variable, regardless of its
reference type
Actually "&&" is already indicating let-go and at the time when you do "return", it is safe enough to do move.
probably it is just the choice from standard committee.
item 25 of "effective modern c++" by scott meyers, also summarized this, without giving much explanations.
Alpha foo() {
Alpha a
return a; // RVO by decent compiler
}
Alpha foo(Alpha a) {
return a; // implicit std::move by compiler
}
Alpha bar(Alpha &&a) {
return a; // Copy ctor due to lvalue
}
Alpha bar(Alpha &&a) {
return std:move(a); // has to be explicit by developer
}
This is a very very common mistake to make as people first learn about rvalue references. The basic problem is a confusion between type and value category.
int is a type. int& is a different type. int&& is yet another type. These are all different types.
lvalues and rvalues are things called value categories. Please check out the fantastic chart here: What are rvalues, lvalues, xvalues, glvalues, and prvalues?. You can see that in addition to lvalues and rvalues, we also have prvalues and glvalues and xvalues, and they form a various venn diagram sort of relation.
C++ has rules that say that variables of various types can bind to expressions. An expressions reference type however, is discarded (people often say that expressions do not have reference type). Instead, the expression have a value category, which determines which variables can bind to it.
Put another way: rvalue references and lvalue references are only directly relevant on the left hand of the assignment, the variable being created/bound. On the right side, we are talking about expressions and not variables, and rvalue/lvalue reference-ness is only relevant in the context of determining value category.
A very simple example to start with is simple looking at things of purely type int. A variable of type int as an expression, is an lvalue. However, an expression consisting of evaluating a function that returns an int, is an rvalue. This makes intuitive sense to most people; the key thing though is to separate out the type of an expression (even before references are discarded) and its value category.
What this is leading to, is that even though variables of type int&& can only bind to rvalues, does not mean that all expressions with type int&&, are rvalues. In fact, as the rules at http://en.cppreference.com/w/cpp/language/value_category say, any expression consisting of naming a variable, is always an lvalue, no matter the type.
That's why you need std::move in order to pass along rvalue references into subsequent functions that take by rvalue reference. It's because rvalue references do not bind to other rvalue references. They bind to rvalues. If you want to get the move constructor, you need to give it an rvalue to bind to, and a named rvalue reference is not an rvalue.
std::move is a function that returns an rvalue reference. And what's the value category of such an expression? An rvalue? Nope. It's an xvalue. Which is basically an rvalue, with some additional properties.
In both foo and bar, the expression a is an lvalue. The statement return a; means to initialize the return value object from the initializer a, and return that object.
The difference between the two cases is that overload resolution for this initialization is performed differently depending on whether or not a declared as a non-volatile automatic object within the innermost enclosing block, or a function parameter.
Which it is for foo but not bar. (In bar , a is declared as a reference). So return a; in foo selects the move constructor to initialize the return value, but return a; in bar selects the copy constructor.
The full text is C++14 [class.copy]/32:
When the criteria for elision of a copy/move operation are met, but not for an exception-declaration , and the object to be copied is designated by an lvalue, or when the expression in a return statement is a (possibly parenthesized) id-expression that names an object with automatic storage duration declared in the body or parameter-declaration-clause of the innermost enclosing function or lambda-expression, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. If the first overload resolution fails or was not performed, or if the type of the first parameter of the selected constructor is not an rvalue reference to the object’s type (possibly cv-qualified), overload resolution is performed again, considering the object as an lvalue. [Note: This two-stage overload resolution must be performed regardless of whether copy elision will occur. It determines the constructor to be called if elision is not performed, and the selected constructor must be accessible even if the call is elided. —end note ]
where "criteria for elision of a copy/move operation are met" refers to [class.copy]/31.1:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing
the automatic object directly into the function’s return value
Note, these texts will change for C++17.
Related
My question originates from delving into std::move in return statements, such as in the following example:
struct A
{
A() { std::cout << "Constructed " << this << std::endl; }
A(A&&) noexcept { std::cout << "Moved " << this << std::endl; }
};
A nrvo()
{
A local;
return local;
}
A no_nrvo()
{
A local;
return std::move(local);
}
int main()
{
A a1(nrvo());
A a2(no_nrvo());
}
which prints (MSVC, /std:c++17, release)
Constructed 0000000C0BD4F990
Constructed 0000000C0BD4F991
Moved 0000000C0BD4F992
I am interested in the general initialization rules for return statements in functions that return by-value and which rules apply when returning a local variable with std::move as shown above.
The general case
Regarding return statements you can read
Evaluates the expression, terminates the current function and returns the result of the expression to the caller after implicit conversion to the function return type. [...]
on cppreference.com.
Amongst others Copy initialization happens
when returning from a function that returns by value
like so
return other;
Coming back to my example, according to my current knowledge - and in contrast to the above-named rule - A a1(nrvo()); is a statement that direct-initializes a1 with the prvalue nrvo(). So which object exactly is copy-initialized as described at cppreference.com for return statements?
The std::move case
For this case, I've referred to ipc's answer on Are returned locals automatically xvalues. I want to make sure that the following is correct: std::move(local) has the type A&& but no_nrvo() is declared to return the type A, so here the
returns the result of the expression to the caller after implicit conversion to the function return type
part should come into play. I think this should be an Lvalue to rvalue conversion:
A glvalue of any non-function, non-array type T can be implicitly converted to a prvalue of the same type. [...] For a class type, this conversion [...] converts the glvalue to a prvalue whose result object is copy-initialized by the glvalue.
To convert from A&& to A A's move constructor is used, which is also why NRVO is disabled here. Are those the rules that apply in this case, and did I understand them correctly? Also, again they say copy-initialized by the glvalue but A a2(no_nrvo()); is a direct initialization. So this also touches on the first case.
You have to be careful with cppreference.com when diving into such nitty-gritty, as it's not an authoritative source.
So which object exactly is copy-initialized as described at cppreference.com for return statements?
In this case, none. That's what copy elision is: The copy that would normally happen is skipped. The cppreference (4) clause could be written as "when returning from a function that returns by value, and the copy is not elided", but that's kind of redundant. The standard: [stmt.return] is a lot clearer on the subject.
To convert from A&& to A A's move constructor is used, which is also why NRVO is disabled here. Are those the rules that apply in this case, and did I understand them correctly?
That's not quite right. NRVO only applies to names of non-volatile objects. However, in return std::move(local);, it's not local that is being returned, it's the A&& that is the result of the call to std::move(). This has no name, thus mandatory NRVO does not apply.
I think this should be an Lvalue to rvalue conversion:
The A&& returned by std::move() is decidedly not an Lvalue. It's an xvalue, and thus an rvalue already. There is no Lvalue to rvalue conversion happening here.
but A a2(no_nrvo()); is a direct initialization. So this also touches on the first case.
Not really. Whether a function has to perform copy-initialization of its result as part of a return statement is not impacted in any way by how that function is invoked. Similarly, how a function's return argument is used at the callsite is not impacted by the function's definition.
In both cases, an is direct-initialized by the result of the function. In practice, this means that the compiler will use the same memory location for the an object as for the return value of the function.
In A a1(nrvo());, thanks to NRVO, the memory location assigned to local is the same as the function's result value, which happens to be a1 already. Effectively, local and a1 were the same object all along.
In A a2(no_nrvo()), local has its own storage, and the result of the function, aka a2 is move-constructed from it. Effectively, local is moved into a2.
Which clause in the C++11 Standard supports the move constructor call in the return of the function foo() below?
#include <iostream>
class A
{
public:
A() { std::cout << "Ctor\n"; }
A(const A&) {std::cout << "Copy ctor\n";}
A(A&&) {std::cout << "Move ctor\n";}
};
A foo(A&& ra) { return std::move(ra); }
int main()
{
A a = foo(A());
}
This question was closed I believe yesterday, now it was placed "on hold" and the reason for the closing was that it's too localized. It's difficult for me to understand how a post in SO asking a specific question about the C++11 Standard could be considered "too localized". For me this is a contradiction in terms, as the Standard is "de facto" the final document that every C++ programmer should look for, in case of doubt about the language.
There are lots of clauses about the code. Specifically initialization (back of clause 8), overload resolution (clause 13) and more basically clause 3 and 5 to understand value categories of expressions and reference types.
First the expression A() is a class prvalue resulting from the default construction of a temporary.
It initializes the rvalue reference ra by direct reference binding.
ra initializes the parameter of move by direct reference binding, and move returns an xvalue of type A, again initialized by direct reference binding, which initializes the return value of foo, by overload resolving to the move constructor, moving the first temporary into the return value of foo, also a temporary.
The expression foo(A()) is a class prvalue refering to the second temporary.
This would then normally copy-initialize a by overload resolving the prvalue to the move constructor, and move constructing a from the return value of foo - however due to 12.8/32p3:
when a temporary class object that has not been bound to a reference (12.2) would be copied/moved to a class object with the same cv-unqualified type, the copy/move operation can be omitted by constructing the temporary object directly into the target of the omitted copy/move
Therefore the return value of foo is usually directly constructed it in the storage of a, and this second move construction is elided.
If you compile this program with a C++11 compiler, the vector is not moved out of the function.
#include <vector>
using namespace std;
vector<int> create(bool cond) {
vector<int> a(1);
vector<int> b(2);
return cond ? a : b;
}
int main() {
vector<int> v = create(true);
return 0;
}
If you return the instance like this, it is moved.
if(cond) return a;
else return b;
Here is a demo on ideone.
I tried it with gcc 4.7.0 and MSVC10. Both behave the same way.
My guess why this happens is this:
The ternary operators type is an lvalue because it is evaluated before return statement is executed. At this point a and b are not yet xvalues (soon to expire).
Is this explanation correct?
Is this a defect in the standard?
This is clearly not the intended behaviour and a very common case in my opinion.
Here are the relevant Standard quotes:
12.8 paragraph 32:
Copy elision is permitted in the following circumstances [...]
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function's return value
[when throwing, with conditions]
[when the source is a temporary, with conditions]
[when catching by value, with conditions]
paragraph 33:
When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue. If overload resolution fails, or if the type of the first parameter of the selected constructor is not an rvalue reference to the object's type (possibly cv-qualified), overload resolution is performed again, considering the object as an lvalue. [Note: This two-stage overload resolution must be performed regardless of whether copy elision will occur. It determines the constructor to be called if elision is not performed, and the selected constructor must be accessible even if the call is elided. - end note]
Since the expression in return (cond ? a : b); is not a simple variable name, it's not eligible for copy elision or rvalue treatment. Maybe a bit unfortunate, but it's easy to imagine stretching the example a little bit further at a time until you create a headache of an expectation for compiler implementations.
You can of course get around all this by explicitly saying to std::move the return value when you know it's safe.
This will fix it
return cond ? std::move(a) : std::move(b);
Consider the ternary operator as a function, like your code is
return ternary(cond, a, b);
The parameters won't be moved implicitly, you need to make it explicit.
Following on from a comment I made on this:
passing std::vector to constructor and move semantics
Is the std::move necessary in the following code, to ensure that the returned value is a xvalue?
std::vector<string> buildVector()
{
std::vector<string> local;
// .... build a vector
return std::move(local);
}
It is my understanding that this is required. I have often seen this used when returning a std::unique_ptr from a function, however GManNickG made the following comment:
It is my understanding that in a return statement all local variables are automatically xvalues (expiring values) and will be moved, but I'm unsure if that only applies to the returned object itself. So OP should go ahead and put that in there until I'm more confident it shouldn't have to be. :)
Can anyone clarify if the std::move is necessary?
Is the behaviour compiler dependent?
You're guaranteed that local will be returned as an rvalue in this situation. Usually compilers would perform return-value optimization though before this even becomes an issue, and you probably wouldn't see any actual move at all, since the local object would be constructed directly at the call site.
A relevant Note in 6.6.3 ["The return statement"] (2):
A copy or move operation associated with a return statement may be elided or considered as an rvalue for the purpose of overload resolution in selecting a constructor (12.8).
To clarify, this is to say that the returned object can be move-constructed from the local object (even though in practice RVO will skip this step entirely). The normative part of the standard is 12.8 ["Copying and moving class objects"] (31, 32), on copy elision and rvalues (thanks #Mankarse!).
Here's a silly example:
#include <utility>
struct Foo
{
Foo() = default;
Foo(Foo const &) = delete;
Foo(Foo &&) = default;
};
Foo f(Foo & x)
{
Foo y;
// return x; // error: use of deleted function ‘Foo::Foo(const Foo&)’
return std::move(x); // OK
return std::move(y); // OK
return y; // OK (!!)
}
Contrast this with returning an actual rvalue reference:
Foo && g()
{
Foo y;
// return y; // error: cannot bind ‘Foo’ lvalue to ‘Foo&&’
return std::move(y); // OK type-wise (but undefined behaviour, thanks #GMNG)
}
Altough both, return std::move(local) and return local, do work in sense of that they do compile, their behavior is different. And probably only the latter one was intended.
If you write a function which returns a std::vector<string>, you have to return a std::vector<string> and exactly it. std::move(local) has the typestd::vector<string>&& which is not a std::vector<string> so it has to be converted to it using the move constructor.
The standard says in 6.6.3.2:
The value of the expression is implicitly
converted to the return type of the function in which it appears.
That means, return std::move(local) is equalvalent to
std::vector<std::string> converted(std::move(local); // move constructor
return converted; // not yet a copy constructor call (which will be elided anyway)
whereas return local only is
return local; // not yet a copy constructor call (which will be elided anyway)
This spares you one operation.
To give you a short example of what that means:
struct test {
test() { std::cout << " construct\n"; }
test(const test&) { std::cout << " copy\n"; }
test(test&&) { std::cout << " move\n"; }
};
test f1() { test t; return t; }
test f2() { test t; return std::move(t); }
int main()
{
std::cout << "f1():\n"; test t1 = f1();
std::cout << "f2():\n"; test t2 = f2();
}
This will output
f1():
construct
f2():
construct
move
I think the answer is no. Though officially only a note, §5/6 summarizes what expressions are/aren't xvalues:
An expression is an xvalue if it is:
the result of calling a function, whether implicitly or explicitly, whose return type is an rvalue reference to object type,
a cast to an rvalue reference to object type,
a class member access expression designating a non-static data member of non-reference type in which the object expression is an xvalue, or
a .* pointer-to-member expression in which the first operand is an xvalue and the second operand is a pointer to data member.
In general, the effect of this rule is that named rvalue references are treated as lvalues and unnamed rvalue references to objects are treated as xvalues; rvalue references to functions are treated as lvalues whether named or not.
The first bullet point seems to apply here. Since the function in question returns a value rather than an rvalue reference, the result won't be an xvalue.
An lvalue is a value bound to a definitive region of memory whereas an rvalue is an expression value whose existence is temporary and who does not necessarily refer to a definitive region of memory. Whenever an lvalue is used in a position in which an rvalue is expected, the compiler performs an lvalue-to-rvalue conversion and then proceeds with evaluation.
http://www.eetimes.com/discussion/programming-pointers/4023341/Lvalues-and-Rvalues
Whenever we construct a temporary (anonymous) class object or return a temporary class object from a function, although the object is temporary, it is addressable. However, the object still is a valid rvalue. This means that the object is a) an addressable rvalue or b) is being implicitly converted from an lvalue to an rvalue when the compiler expects an lvalue to be used.
For instance:
class A
{
public:
int x;
A(int a) { x = a; std::cout << "int conversion ctor\n"; }
A(A&) { std::cout << "lvalue copy ctor\n"; }
A(A&&) { std::cout << "rvalue copy ctor\n"; }
};
A ret_a(A a)
{
return a;
}
int main(void)
{
&A(5); // A(5) is an addressable object
A&& rvalue = A(5); // A(5) is also an rvalue
}
We also know that temporary objects returned (in the following case a) by functions are lvalues as this code segment:
int main(void)
{
ret_a(A(5));
}
yields the following output:
int conversion ctor
lvalue copy ctor
Indicating that the call to the function ret_a using actual argument A(5) calls the conversion constructor A::A(int) which constructs the function's formal argument a with the value 5.
When the function completes execution, it then constructs a temporary A object using a as its argument, which invokes A::A(A&). However, if we were to remove A::A(A&) from the list of overloaded constructors, the returned temporary object would still match the rvalue-reference constructor A::A(A&&).
This is what I'm not quite understanding: how can the object a match both an rvalue reference and an lvalue reference? It is clear that A::A(A&) is a better match than A::A(A&&) (and therefore a must be an lvalue). But, because an rvalue reference cannot be initialized to an lvalue, given that the formal argument a is an lvalue, it should not be able to match the call to A::A(A&&). If the compiler is making an lvalue-to-rvalue conversion it would be trivial. The fact that a conversion from 'A' to 'A&' is also trivial, both functions should have identical implicit conversion sequence ranks and therefore, the compiler should not be able to deduce the best-matching function when both A::A(A&) and A::A(A&&) are in the overloaded function candidate set.
Moreover, the question (which I previously asked) is:
How can a given object match both an rvalue reference and an lvalue reference?
For me:
int main(void)
{
ret_a(A(5));
}
Yields:
int conversion ctor
rvalue copy ctor
(i.e. rvalue, not lvalue). This is a bug in your compiler. However it is understandable as the rules for this behavior changed only a few months ago (Nov. 2010). More on this below.
When the function completes execution,
it then constructs a temporary A
object using a as its argument, which
invokes A::A(A&).
Actually no. When the function ret_a completes execution, it then constructs a temporary A object using a as its argument, which invokes A:A(A&&). This is due to [class.copy]/p33]1:
When the criteria for elision of a
copy operation are met or would be met
save for the fact that the source
object is a function parameter, and
the object to be copied is designated
by an lvalue, overload resolution to
select the constructor for the copy is
first performed as if the object were
designated by an rvalue. If overload
resolution fails, or if the type of
the first parameter of the selected
constructor is not an rvalue reference
to the object’s type (possibly
cv-qualified), overload resolution is
performed again, considering the
object as an lvalue. [ Note: This
two-stage overload resolution must be
performed regardless of whether copy
elision will occur. It determines the
constructor to be called if elision is
not performed, and the selected
constructor must be accessible even if
the call is elided. — end note ]
However if you remove the A::A(A&&) constructor, then A::A(&) will be chosen for the return. Although in this case, then the construction of the argument a will fail because you can't construct it using an rvalue. However ignoring that for the moment, I believe your ultimate question is:
How can a given object match both an
rvalue reference and an lvalue
reference?
in referring to the statement:
return a;
And the answer is in the above quoted paragraph from the draft standard: First overload resolution is tried as if a is an rvalue. And if that fails, overload resolution is tried again using a as an lvalue. This two-stage process is tried only in the context wherein copy elision is permissible (such as a return statement).
The C++0x draft has just recently been changed to allow the two-stage overload resolution process when returning arguments that have been passed by value (as in your example). And that is the reason for the varying behavior from different compilers that we are seeing.