I was reading this post. The code in question is the following:
struct S {
void func() &;
void func() &&;
};
S s1;
s1.func(); // OK, calls S::func() &
S().func(); // OK, calls S::func() &&
I think I understand what reference qualifiers are. My question is more basic: what is S()? Is it a (copy) constructor call? Why is it an rvalue? That blog, like other places, seems to take it for granted. What am I missing?
Formally it's an explicit cast (functional notation). It creates a temporary object from the (possibly empty) list of arguments. And yes, it does so by doing overload resolution to pick the correct constructor to call. In this case, the default c'tor (which your compiler produces, on account of no other c'tor being declared).
More formally, the explicit cast is an expression whose result is a prvalue (pure rvalue). So when doing overload resolution to pick a member function to call, the rvalue qualified version is preferable.
I imagine the blog skimmed over it because that existed in C++ since time immemorial. And the post's intent was to introduce a new concept, assuming readers already know this about C++.
S() creates a temporary object, which is an rvalue. The object is constructed using the default constructor. It is destroyed at the end of the full expression in which it appears.
More generally, one way to think about this syntax is: type name + arguments list passed to the constructor of the object.
In this case S is the type name, and the empty parentheses mean there are no arguments for the constructor, so the default constructor is chosen.
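To make the overload selection visible, here is a minimal sketch. It is a hypothetical variant of S in which each overload returns a string naming itself; the helper functions call_on_named and call_on_temporary are not from the blog post and exist only for illustration.

```cpp
#include <string>

// Hypothetical variant of S: each overload reports which one was chosen.
struct S {
    std::string func() &  { return "lvalue"; }  // selected for named objects
    std::string func() && { return "rvalue"; }  // selected for temporaries
};

// Calls func() on a named object; the expression s1 is an lvalue.
std::string call_on_named() {
    S s1;
    return s1.func();
}

// Calls func() on the temporary created by S(); the expression S() is a prvalue.
std::string call_on_temporary() {
    return S().func();
}
```

Calling call_on_named() picks the &-qualified overload, while call_on_temporary() picks the &&-qualified one.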
Why is this way of initialising a std::string:
std::string s = "123";
considered a copy initialisation, when no copy whatsoever actually occurs?
In the above case, there is no ambiguity that the compiler will see that there is a std::string constructor that takes a char const *, and consequently what happens here is construction of a std::string object via implicit conversion of char const * to std::string. This is such a clear-cut scenario. It's simply calling a std::string(const char *) constructor once, plain and simple. So simple there is even nothing to talk about regarding optimisations such as copy elision and move.
Now, the problem is, I never had any confusion about object initialisation via implicit conversion (i.e. Class a = expression) until I started coming across literature declaring that initialisation by = is "copy initialisation". Even the main man, Bjarne Stroustrup himself, refers to this form of initialisation as "copy initialisation".
At this point, I feel that I may be misunderstanding something.
So, why is initialisation by = considered copy-initialisation when clearly this is not the case if implicit conversion is allowed?
The term copy-initialization is simply used for an initializing syntax of the form:
T object = other;
where one of the effects of this initialization is:
If T is a class type, and the cv-unqualified version of the type of other is not T or derived from T, or if T is non-class type, but the type of other is a class type, user-defined conversion sequences that can convert from the type of other to T (or to a type derived from T if T is a class type and a conversion function is available) are examined and the best one is selected through overload resolution.
So for the expression:
std::string s = "123";
the implicit constructor that takes a const char * is used to construct the std::string.
So even though it has the term copy in it, copy-initialization does not mean there is an actual copy involved, it's only called that because the syntax makes it appear like a copy is happening.
The reason it is called copy initialization is because before C++11, that is what literally had to be done, per the rules of the language. When you have
T t = u;
if u is a T, then you call the copy constructor. So calling it copy initialization for that case makes sense
if u is not a T then [dcl.init]/15 bullet 7 came into play (from the C++03 draft) and that has
Otherwise (i.e., for the remaining copy-initialization cases), user-defined conversion sequences that can convert from the source type to the destination type or (when a conversion function is used) to a derived class thereof are enumerated as described in 13.3.1.4, and the best one is chosen through overload resolution (13.3). If the conversion cannot be done or is ambiguous, the initialization is ill-formed. The function selected is called with the initializer expression as its argument; if the function is a constructor, the call initializes a temporary of the cv-unqualified version of the destination type. The temporary is an rvalue. The result of the call (which is the temporary for the constructor case) is then used to direct-initialize, according to the rules above, the object that is the destination of the copy-initialization. In certain cases, an implementation is permitted to eliminate the copying inherent in this direct-initialization by constructing the intermediate result directly into the object being initialized; see 12.2, 12.8.
emphasis mine
which states that a temporary is created, and that temporary is used to initialize the object. Yes, it says this can be avoided in certain cases, but those sections only allow the optimization; it is not mandated to happen.
So, again we are making a copy, so copy initialization makes sense.
With C++17 this is no longer the case and you're guaranteed that no copy will exist, but we're stuck with the name at this point.
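As a rough way to observe this, the sketch below uses a hypothetical string-like type, Text, that counts its constructor calls. The type and the copy_initialize helper are assumptions made up for illustration, not part of std::string. Under C++17 the copy constructor is guaranteed not to run; under earlier standards compilers typically elide it as well, but are not required to.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical string-like type that counts which constructors run.
struct Text {
    static int conversions;  // calls to Text(const char*)
    static int copies;       // calls to the copy constructor
    std::size_t length;

    Text(const char* s) : length(std::strlen(s)) { ++conversions; }
    Text(const Text& other) : length(other.length) { ++copies; }
};
int Text::conversions = 0;
int Text::copies = 0;

// Copy-initialization from a string literal, mirroring std::string s = "123";
std::size_t copy_initialize() {
    Text t = "123";  // the converting constructor builds t
    return t.length;
}
```

After one call to copy_initialize(), Text::conversions is 1, which matches the "one constructor call" intuition from the question.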
Let's say I got a Foo class containing an std::vector constructed from std::unique_ptr objects of another class, Bar.
typedef std::unique_ptr<Bar> UniqueBar;
class Foo {
std::vector<UniqueBar> bars;
public:
void AddBar(UniqueBar&& bar);
};
void Foo::AddBar(UniqueBar&& bar) {
bars.push_back(bar);
}
This one results in a compilation error (in g++ 4.8.1) saying that the copy constructor of std::unique_ptr is deleted, which is reasonable. The question here is, since the bar argument is already an rvalue reference, why is the copy constructor of std::unique_ptr called instead of its move constructor?
If I explicitly call std::move in Foo::AddBar then the compilation issue goes away but I don't get why this is needed. I think it's quite redundant.
So, what am I missing?
Basically, every object which has a name is an lvalue. When you pass an object to a function using an rvalue reference the function actually sees an lvalue: it is named. What the rvalue reference does, however, indicate is that it came from an object which is ready to be transferred.
Put differently, rvalue references are asymmetrical:
they can only receive rvalues, i.e., either temporary objects, objects about to go away, or objects which look as if they are rvalues (e.g., the result of std::move(o))
the rvalue reference itself looks, however, like an lvalue
Confusing as it might seem, an rvalue-reference binds to an rvalue, but used as an expression is an lvalue.
bar is actually an lvalue, so you need to pass it through std::move, so that it is seen as an rvalue in the call to push_back.
The Foo::AddBar(UniqueBar&& bar) overload simply ensures that this overload is picked when an rvalue is passed in a call to Foo::AddBar. But the bar argument itself has a name and is an lvalue.
bar is declared as an rvalue reference, but as an expression its value category is lvalue. This is so because the object has a name. If it has a name, it's an lvalue. Therefore an explicit std::move is necessary: the intention is to get rid of the name and produce an xvalue (an "eXpiring" value).
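Putting the answers together, a minimal sketch of the corrected Foo::AddBar could look like the following. Bar is an empty placeholder because the question does not show it, and size() and add_transfers_ownership() are hypothetical helpers added only to observe the result.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Bar {};  // placeholder; the question does not show Bar
typedef std::unique_ptr<Bar> UniqueBar;

class Foo {
    std::vector<UniqueBar> bars;
public:
    void AddBar(UniqueBar&& bar) {
        // bar has a name, so the expression `bar` is an lvalue;
        // std::move casts it back to an rvalue so push_back can move from it.
        bars.push_back(std::move(bar));
    }
    std::size_t size() const { return bars.size(); }  // hypothetical helper
};

// Returns true if ownership was transferred from p into foo.
bool add_transfers_ownership() {
    Foo foo;
    UniqueBar p(new Bar);
    foo.AddBar(std::move(p));  // the caller must also pass an rvalue
    return p == nullptr && foo.size() == 1;
}
```

Note that std::move appears twice: once at the call site to bind to the && parameter, and once inside AddBar to forward the named parameter as an rvalue.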
I am quite puzzled by the std::move stuff. Assume I have this piece of code:
string foo() {
string t = "xxxx";
return t;
}
string s = foo();
How many times is the string constructor called? Is it 2 or 3?
Is the compiler going to use move for this line?
string s = foo();
If so, in the function I am not even returning an rvalue reference, so how could the compiler invoke the move constructor?
It depends on the compiler. In this case, the standard requires at least one constructor call, namely the construction of t.
But the standard allows the possibility of two others: the move-construction of the value output of foo from t, and the move-construction of s from the value output of foo. Most decent compilers will forgo these constructors by constructing t directly in the memory for s. This optimization is made possible because the standard allows these constructors to not be called if the compiler chooses not to.
This is called copy/move "elision".
If so, in the function I am not even returning rvalue reference, so how could the compiler invoke the move constructor?
You seem to be laboring under the misconception that && means "move", and that if there's no && somewhere, then movement can't happen. Or that move construction requires std::move, which also is not true.
C++ is specified in such a way that certain kinds of expressions in certain places are considered valid to move from. This means that the value or reference will attempt to bind to a && parameter before binding to a & parameter. Temporaries, for example, will preferentially bind to a && parameter before a const& one. That's why temporaries used to construct values of that type will be moved from.
If you have a function which returns a value of some type T, and a return expression is of the form return x, where x is a named variable of type T of automatic storage duration (i.e., a function parameter or stack variable), then the standard requires that x first be treated as an rvalue when the constructor is selected. So, unless the operation is elided altogether, the returned value is move-constructed from x.
The return value of foo is a temporary. The rules of C++ require that temporaries bind to && parameters before const&. So you get move construction into s.
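A small sketch can confirm that no copy happens in this pattern. Tracker is a hypothetical type that counts copy and move constructions, standing in for std::string; make() and copies_after_return() are likewise illustrative names. The number of moves depends on how much the compiler elides, so only the copy count is meaningful here.

```cpp
// Hypothetical type that counts copy and move constructions.
struct Tracker {
    static int copies;
    static int moves;
    Tracker() {}
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) { ++moves; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

Tracker make() {
    Tracker t;
    return t;  // t is treated as an rvalue here, or the move is elided entirely
}

// Returns the number of copy constructions triggered by `Tracker s = make();`.
int copies_after_return() {
    Tracker s = make();
    (void)s;  // silence unused-variable warnings
    return Tracker::copies;
}
```

With full elision Tracker::moves stays at 0; without it, up to two moves may be counted, but never a copy.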
A class must have a valid copy or move constructor for any of this syntax to be legal:
C x = factory();
C y( factory() );
C z{ factory() };
In C++03 it was fairly common to rely on copy elision to prevent the compiler from touching the copy constructor. Every class has a valid copy constructor signature regardless of whether a definition exists.
In C++11 a non-copyable type should define C( C const & ) = delete;, rendering any reference to the function invalid regardless of use (same for non-moveable). (C++11 §8.4.3/2). GCC, for one, will complain when trying to return such an object by value. Copy elision ceases to help.
Fortunately, we also have new syntax to express intent instead of relying on a loophole. The factory function can return a braced-init-list to construct the result temporary in-place:
C factory() {
return { arg1, 2, "arg3" }; // calls C::C( whatever ), no copy
}
Edit: If there's any doubt, this return statement is parsed as follows:
6.6.3/2: "A return statement with a braced-init-list initializes the object or reference to be returned from the function by copy-list-initialization (8.5.4) from the specified initializer list."
8.5.4/1: "list-initialization in a copy-initialization context is called copy-list-initialization." ¶3: "if T is a class type, constructors are considered. The applicable constructors are enumerated and the best one is chosen through overload resolution (13.3, 13.3.1.7)."
Do not be misled by the name copy-list-initialization. 8.5:
13: The form of initialization (using parentheses or =) is generally insignificant, but does matter when the initializer or the entity being initialized has a class type; see below. If the entity being initialized does not have class type, the expression-list in a parenthesized initializer shall be a single expression.
14: The initialization that occurs in the form
T x = a;
as well as in argument passing, function return, throwing an exception (15.1), handling an exception (15.3), and aggregate member initialization (8.5.1) is called copy-initialization.
Both copy-initialization and its alternative, direct-initialization, always defer to list-initialization when the initializer is a braced-init-list. There is no semantic effect in adding the =, which is one reason list-initialization is informally called uniform initialization.
There are differences: direct-initialization may invoke an explicit constructor, unlike copy-initialization. Copy-initialization initializes a temporary and copies it to initialize the object, when converting.
The specification of copy-list-initialization for return { list } statements merely specifies the exact equivalent syntax to be temp T = { list };, where = denotes copy-initialization. It does not immediately imply that a copy constructor is invoked.
-- End edit.
The function result can then be received into an rvalue reference to prevent copying the temporary to a local:
C && x = factory(); // applies to other initialization syntax
The question is, how to initialize a nonstatic member from a factory function returning non-copyable, non-moveable type? The reference trick doesn't work because a reference member doesn't extend the lifetime of a temporary.
Note, I'm not considering aggregate-initialization. This is about defining a constructor.
On your main question:
The question is, how to initialize a nonstatic member from a factory function returning non-copyable, non-moveable type?
You don't.
Your problem is that you are trying to conflate two things: how the return value is generated and how the return value is used at the call site. These two things don't connect to each other. Remember: the definition of a function cannot affect how it is used (in terms of language), since that definition is not necessarily available to the compiler. Therefore, C++ does not allow the way the return value was generated to affect anything (outside of elision, which is an optimization, not a language requirement).
To put it another way, this:
C c = {...};
Is different from this:
C c = [&]() -> C {return {...};}();
You have a function which returns a type by value. It is returning a prvalue expression of type C. If you want to store this value, thus giving it a name, you have exactly two options:
Store it as a const& or &&. This will extend the lifetime of the temporary to the lifetime of the enclosing block. You can't do that with member variables; it can only be done with automatic variables in functions.
Copy/move it into a value. You can do this with a member variable, but it obviously requires the type to be copyable or moveable.
These are the only options C++ makes available to you if you want to store a prvalue expression. So you can either make the type moveable or return a freshly allocated pointer to memory and store that instead of a value.
This limitation is a big part of the reason why moving was created in the first place: to be able to pass things by value and avoid expensive copies. The language couldn't be changed to force elision of return values. So instead, they reduced the cost in many cases.
Issues like this were among the prime motivations for the change in C++17 to allow these initializations (and exclude the copies from the language, not merely as an optimization).
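For illustration, here is a minimal sketch of what C++17 permits. It assumes a C++17 (or later) compiler; the two-int constructor of C, the Holder class, and held_value() are hypothetical and chosen only to keep the example short.

```cpp
// Hypothetical non-copyable, non-moveable type.
struct C {
    int value;
    C(int a, int b) : value(a + b) {}
    C(const C&) = delete;
    C(C&&) = delete;
};

C factory() {
    return {1, 2};  // copy-list-initialization of the result; no copy or move
}

// Assumes C++17: the prvalue returned by factory() initializes the member
// directly, so no copy or move constructor is needed.
struct Holder {
    C member;
    Holder() : member(factory()) {}
};

int held_value() {
    Holder h;
    return h.member.value;
}
```

With a pre-C++17 compiler the Holder constructor is rejected, because initializing member from factory() formally requires an accessible copy or move constructor.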
Assume that the following code is legal code that compiles properly, that T is a type name, and that x is the name of a variable.
Syntax one:
T a(x);
Syntax two:
T a = x;
Do the exact semantics of these two expressions ever differ? If so, under what circumstances?
If these two expressions ever do have different semantics I'm also really curious about which part of the standard talks about this.
Also, if there is a special case when T is the name of a scalar type (e.g., int, long, double, etc.), what are the differences when T is a scalar type vs. a non-scalar type?
Yes. If the type of x is not T, then the second example expands to T a = T(x). Before C++17, this requires that T(T const&) is accessible. The first example doesn't invoke the copy constructor.
After the accessibility has been checked, the copy can be eliminated (as Tony pointed out). However, it cannot be eliminated before checking accessibility.
The difference here is between implicit and explicit construction, and there can be a difference.
Imagine having a type Array with the constructor Array(size_t length), and that somewhere else, you have a function count_elements(const Array& array). The purpose of each is easily understandable, and the code seems readable enough, until you realise it will allow you to call count_elements(2000). This is not only ugly code, but will also allocate an array 2000 elements long in memory for no reason.
In addition, you may have other types that are implicitly convertible to an integer, allowing you to run count_elements() on those too, giving you completely useless results at a high cost to efficiency.
What you want to do here, is declare the Array(size_t length) an explicit constructor. This will disable the implicit conversions, and Array a = 2000 will no longer be legal syntax.
This was only one example. Once you realise what the explicit keyword does, it is easy to dream up others.
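The effect of explicit can be checked at compile time. In the sketch below, ImplicitArray and ExplicitArray are hypothetical stand-ins for the Array type described above (they store only a length and allocate nothing). std::is_convertible reflects whether copy-initialization such as Array a = 2000 is allowed, while std::is_constructible reflects direct-initialization such as Array a(2000).

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical Array types, without and with explicit.
struct ImplicitArray {
    std::size_t length;
    ImplicitArray(std::size_t n) : length(n) {}
};
struct ExplicitArray {
    std::size_t length;
    explicit ExplicitArray(std::size_t n) : length(n) {}
};

// Accepts count_elements(2000) because the constructor is not explicit.
std::size_t count_elements(const ImplicitArray& a) { return a.length; }

// Copy-initialization from a size_t: allowed only without explicit.
constexpr bool implicit_allows_copy_init =
    std::is_convertible<std::size_t, ImplicitArray>::value;
constexpr bool explicit_allows_copy_init =
    std::is_convertible<std::size_t, ExplicitArray>::value;
// Direct-initialization still works with an explicit constructor.
constexpr bool explicit_allows_direct_init =
    std::is_constructible<ExplicitArray, std::size_t>::value;
```

A count_elements overload taking const ExplicitArray& would reject count_elements(2000) at compile time, which is the behaviour the answer recommends.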
From 8.5.14 (emphasis mine):
The function selected is called with the initializer expression as its argument; if the function is a constructor, the call initializes a temporary of the destination type. The result of the call (which is the temporary for the constructor case) is then used to direct-initialize, according to the rules above, the object that is the destination of the copy-initialization. In certain cases, an implementation is permitted to eliminate the copying inherent in this direct-initialization by constructing the intermediate result directly into the object being initialized; see class.temporary, class.copy.
So, whether they're equivalent is left to the implementation.
8.5.11 is also relevant, but only in confirming that there can be a difference:
-11- The form of initialization (using parentheses or =) is generally insignificant, but does matter when the entity being initialized has a class type; see below. A parenthesized initializer can be a list of expressions only when the entity being initialized has a class type.
T a(x) is direct initialization and T a = x is copy initialization.
From the standard:
8.5.11 The form of initialization (using parentheses or =) is generally insignificant, but does matter when the entity being initialized has a class type; see below. A parenthesized initializer can be a list of expressions only when the entity being initialized has a class type.
8.5.12 The initialization that occurs in argument passing, function return, throwing an exception (15.1), handling an exception (15.3), and brace-enclosed initializer lists (8.5.1) is called copy-initialization and is equivalent to the form
T x = a;
The initialization that occurs in new expressions (5.3.4), static_cast expressions (5.2.9), functional notation type conversions (5.2.3), and base and member initializers (12.6.2) is called direct-initialization and is equivalent to the form
T x(a);
The difference is that copy initialization creates a temporary object which is then used to direct-initialize. The compiler is allowed to avoid creating the temporary object:
8.5.14 ... The result of the call (which is the temporary for the constructor case) is then used to direct-initialize, according to the rules above, the object that is the destination of the copy-initialization. In certain cases, an implementation is permitted to eliminate the copying inherent in this direct-initialization by constructing the intermediate result directly into the object being initialized; see 12.2, 12.8.
Copy initialization requires a non-explicit constructor and, before C++17, an accessible copy constructor.
In C++, when you write this:
class A {
public:
A() { ... }
};
The compiler actually generates this, depending on what your code uses:
class A {
public:
A() { ... }
~A() { ... }
A(const A& other) {...}
A& operator=(const A& other) { ... }
};
So now you can see the different semantics of the various constructors.
A a1; // default constructor
A a2(a1); // copy constructor
a2 = a1; // copy assignment operator
The generated copy operations basically copy all the non-static data members. They are only generated if the resulting code is legal and sane: if the compiler sees a member whose type it doesn't know how to copy, the copy constructor won't get generated (since C++11 it is defined as deleted). Similarly, if one of the non-static data members of the class is const or a reference type, the copy assignment operator won't get generated, and code that tries to use it won't build. Templates are expanded at build time, so if the resulting code isn't buildable, it'll fail. And sometimes it fails loudly and very cryptically.
If you declare any constructor in a class, the compiler won't generate a default constructor; likewise, if you declare your own copy constructor or destructor, the compiler won't generate that one. This means you can replace the generated members. You can make them private (they're public by default), you can define them so they do nothing (useful for saving memory and avoiding side-effects), etc.
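Which members the compiler actually generates can be probed with the standard type traits. The following is a small sketch with hypothetical types (WithConst, WithUnique, WithCtor) invented for this purpose; the traits themselves are from <type_traits>.

```cpp
#include <memory>
#include <type_traits>

struct WithConst  { const int id = 0; };         // const data member
struct WithUnique { std::unique_ptr<int> p; };   // member that cannot be copied
struct WithCtor   { WithCtor(int) {} };          // user-declared constructor

// A const member still allows copy construction...
constexpr bool const_member_copy_constructible =
    std::is_copy_constructible<WithConst>::value;
// ...but prevents the generated copy assignment operator.
constexpr bool const_member_copy_assignable =
    std::is_copy_assignable<WithConst>::value;
// A non-copyable member suppresses the generated copy constructor.
constexpr bool unique_member_copy_constructible =
    std::is_copy_constructible<WithUnique>::value;
// Declaring any constructor suppresses the generated default constructor.
constexpr bool ctor_keeps_default =
    std::is_default_constructible<WithCtor>::value;
```

Checks like these can be turned into static_assert statements to get a readable error at the point where an assumption about a type breaks.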