I've been wondering whether temporary objects have identity. I know that the following is valid:
object.temporary_object().modify()
as long as the returned object is non-const, or the function called on it does not modify any immutable members.
According to the value categories definition, prvalues can be moved from but have no identity. Since temporary objects are the result of prvalue expressions, how can they be modified?
The linked document is not normative. In some sense it seems to describe what prvalues ought to be, rather than what they were at the time. In C++17, it became true that prvalues have no identity, but in C++11 and C++14 it was not quite so.
In C++11 and C++14, a prvalue of class type does have an identity, because, as you've observed, it's possible to call a method on it, and there are also ways to observe its address. Similarly, prvalues of array type have identity. Prvalues of scalar type (e.g., integer literals) do not have identity. Binding them to references will cause the materialization of a temporary object, which now has an address but is no longer observable as a prvalue.
In C++17, prvalues have no identity, and are not temporary objects, but are instead expressions that can be used to create temporary (or non-temporary) objects. Moving from a prvalue to an object effectively "invokes" the prvalue. A temporary object is only observable as an xvalue.
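The C++17 behaviour can be observed with a small counting sketch. The type Tracked and the functions make and demo_prvalue_initialization are hypothetical names made up for this illustration. Under C++17 the copy/move count is required to be zero; pre-C++17 compilers usually produce the same result through optional elision, but that is not guaranteed there.

```cpp
#include <cassert>

// Hypothetical type that counts how it gets constructed.
struct Tracked {
    static int constructions;
    static int copies_or_moves;
    Tracked() { ++constructions; }
    Tracked(const Tracked&) { ++copies_or_moves; }
    Tracked(Tracked&&) noexcept { ++copies_or_moves; }
};
int Tracked::constructions = 0;
int Tracked::copies_or_moves = 0;

// The call expression make() is a prvalue of type Tracked.
Tracked make() { return Tracked(); }

int demo_prvalue_initialization() {
    // In C++17 the prvalue initializes t directly: no temporary is
    // materialized and no copy or move constructor runs.
    Tracked t = make();
    (void)t;
    return Tracked::copies_or_moves;
}
```

Calling demo_prvalue_initialization() once should leave exactly one default construction and no copies or moves recorded.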
You can modify the temporary object just like any other. The modifications will just be moot, as they are thrown away when the lifetime of the temporary object ends and it is destroyed.
It's kind of similar to something like this:
SomeClass object;
// Some code...
{
// Entering a new scope, the life-time of variables in here ends when the scope ends
SomeOtherClass temporary_object = object.temporary_object();
temporary_object.modify();
// Now the scope ends, and the life-time of temporary_object with it
}
// Here, "temporary_object" no longer exists
All modifications you made to temporary_object will be lost when the nested scope ends and temporary_object is destroyed.
One important disclaimer: You can design the SomeOtherClass (from the above example) to keep a link (reference or pointer) to object, and the modify() function can use that link to modify object itself. Those modifications will still exist after temporary_object is destructed, as they are modification on object itself instead of temporary_object.
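A minimal sketch of both cases, using made-up types (Counter, Handle and Object are assumptions for illustration, not the asker's classes): the temporary's own state is discarded, while changes made through its link to the owning object persist.

```cpp
#include <cassert>

// Hypothetical types for illustration only.
struct Counter {
    int value = 0;
};

struct Handle {
    Counter* owner;  // link back to the object that produced the handle
    int local;       // state owned by the temporary itself

    explicit Handle(Counter* c) : owner(c), local(0) {}

    Handle& modify() {
        ++local;         // lost when the temporary Handle is destroyed
        ++owner->value;  // survives, because it changes the owner
        return *this;
    }
};

struct Object {
    Counter counter;
    // Returns by value, so the call expression is a prvalue.
    Handle temporary_object() { return Handle(&counter); }
};

int demo_temporary_modification() {
    Object object;
    object.temporary_object().modify();  // modifies a temporary Handle
    object.temporary_object().modify();  // a fresh temporary each time
    return object.counter.value;         // only the linked state remains
}
```

Here demo_temporary_modification() returns 2: each temporary's local counter is gone, but both increments reached object.counter.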
Firstly, I've heard that Guaranteed Copy Elision is a misnomer (as I currently understand it, it's more about redefining the fundamental value categories, r/l-values to l/x/pr-values, which fundamentally changes the meaning and requirements of a copy), but as that's what it's commonly referred to as, I will call it that too.
After reading a bit on this topic, I thought I finally understood it - at least well enough to think that:
my_vector.push_back({arg1, arg2});
is, as of C++17, equivalent to:
my_vector.emplace_back(arg1, arg2);
I recently tried to convince my colleague of this. The only problem was he showed me that I was completely wrong! He wrote some godbolt code (like this) where the assembly shows that the push_back creates a temporary that gets moved into the vector.
So, to complete my question I must first justify that there's some reason for confusion here. I'll quote the well-regarded stackoverflow answer on this topic (emphasis mine):
Guaranteed copy elision redefines the meaning of a prvalue expression.
[...] a prvalue expression is merely something which can materialize a temporary,
but it isn't a temporary yet.
If you use a prvalue to initialize an object of the prvalue's type,
then no temporary is materialized. [...]
The thing to understand is that, since the return value is a prvalue,
it is not an object yet. It is merely an initializer for an object [...]
In my case I would've thought that auto il = {arg1, arg2} would call the constructor for std::initializer_list, but that {arg1, arg2} in push_back({arg1, arg2}) would be a prvalue (as it's unnamed) and so would be an initialiser for the vector element without being initialised itself.
When you do T t = Func();, the prvalue of the return value directly
initializes the object t; there is no "create a temporary and
copy/move" stage. Since Func()'s return value is a prvalue equivalent
to T(), t is directly initialized by T(), exactly as if you had done T
t = T().
If a prvalue is used in any other way, the prvalue will materialize a
temporary object, which will be used in that expression (or discarded
if there is no expression). So if you did const T &rt = Func();, the
prvalue would materialize a temporary (using T() as the initializer),
whose reference would be stored in rt, along with the usual temporary
lifetime extension stuff.
Guaranteed elision also works with direct initialization
Could someone kindly explain to me why Guaranteed Copy Elision doesn't apply to my example the way that I expected?
but that {arg1, arg2} in push_back({arg1, arg2}) would be a prvalue (as it's unnamed) and so would be an initialiser for the vector object without being initialised itself.
I assume that by "vector object" you mean the vector element here: the object that will be stored in the storage managed by the vector, and which push_back/emplace_back is supposed to add to it.
{arg1, arg2} itself is not an expression, it is just a braced-init-list, a different grammatical construct. So it itself doesn't have a value category. However it has rules as to how it acts in overload resolution and how it initializes objects and references.
The overload chosen for push_back will be
void push_back(value_type&&);
where value_type is the element type of the vector. The reference parameter in this overload needs to reference some object of type value_type. So the braced-init-list must be used to construct a (temporary) object of type value_type to bind this reference to. However, this can't be the object that is stored in the vector, because it is a temporary created in the context of the caller. The caller doesn't know where push_back will construct the actual element for the vector. Hence push_back will need to do a move construction from the temporary object bound to the parameter reference to the actual object placed in the vector's storage.
So effectively my_vector.push_back({arg1, arg2}); is the same as my_vector.push_back(value_type{arg1, arg2});, only that value_type{arg1, arg2} is an actual prvalue expression which will be materialized to a temporary object when initializing push_back's reference parameter from it. Because of this almost identical behavior one might sloppily say that {arg1, arg2} "is a prvalue" or "is a temporary", even though that is technically not correct.
push_back doesn't have any overload that doesn't take a reference to a value_type, so this is always unavoidable with it. emplace_back, on the other hand, takes arguments of any type and then just forwards them directly to the construction of the object stored in the vector's storage.
It is also impossible to forward braced-init-lists. There is no syntax to capture them while preserving the type of the individual list elements. You can only initialize an object of a specified type from the whole list as with push_back or initialize an array or std::initializer_list with homogeneous element type (one element from each list element), which is what an initializer-list constructor would do (with the homogeneous type being the vector's element type).
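The difference can be made visible without reading assembly by counting move constructions. This is only a sketch: Pair and the two helper functions are hypothetical names, and reserve is called first so that reallocation cannot add moves of its own.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts copy and move constructions.
struct Pair {
    static int copies;
    static int moves;
    int a, b;
    Pair(int a_, int b_) : a(a_), b(b_) {}
    Pair(const Pair& o) : a(o.a), b(o.b) { ++copies; }
    Pair(Pair&& o) noexcept : a(o.a), b(o.b) { ++moves; }
};
int Pair::copies = 0;
int Pair::moves = 0;

int moves_for_push_back() {
    std::vector<Pair> v;
    v.reserve(1);         // avoid reallocation during the insertion
    Pair::moves = 0;
    v.push_back({1, 2});  // builds a temporary Pair, then moves it in
    return Pair::moves;
}

int moves_for_emplace_back() {
    std::vector<Pair> v;
    v.reserve(1);
    Pair::moves = 0;
    v.emplace_back(1, 2);  // constructs the element in the vector's storage
    return Pair::moves;
}
```

With this setup push_back({1, 2}) records one move and emplace_back(1, 2) records none, which matches the explanation above: the temporary bound to push_back's reference parameter cannot itself be the stored element.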
I'm sorry for the broadness of the question; it's just that all these details are tightly interconnected.
I've been trying to understand the difference between specifically two value categories - xvalues and prvalues, but still I'm confused.
Anyway, the mental model I tried to develop for myself for the notion of 'identity' is that the expression that has it should be guaranteed to reside in the actual program's data memory.
Like for this reason string literals are lvalues, they're guaranteed to reside in memory for the entire program run, while number literals are prvalues and could e.g. hypothetically be stored in straight asm.
The same seems to apply to std::move from prvalue literal, i.e. when calling fun(1) we would get only the parameter lvalue in the callee frame, but when calling fun(std::move(1)) the xvalue 'kind' of glvalue must be kept in the caller frame.
However this mental model doesn't work at least with temporary objects, which, as I understand, should always be created in the actual memory (e.g. if a rvalue-ref-taking func is called like fun(MyClass()) with a prvalue argument). So I guess this mental model is wrong.
So what would be the correct way to think about the 'identity' property of xvalues? I've read that with identity I can compare addresses but if I could compare addresses of 2 MyClass().members (xvalue according to the cppreference), let's say by passing them by rvalue refs into some comparison function, then I don't understand why I can't do the same with 2 MyClass()s (prvalue)?
One more source that's connected to this is the answer here:
What are move semantics?
Note that even though std::move(a) is an rvalue, its evaluation does not create a temporary object. This conundrum forced the committee to introduce a third value category. Something that can be bound to an rvalue reference, even though it is not an rvalue in the traditional sense, is called an xvalue (eXpiring value).
But this seems to have nothing to do with 'can compare addresses' and a) I don't see how this is different from the 'traditional sense' of the rvalue; b) I don't understand why such a reason would require a new value category in the language (well, OK, this allows to provide dynamic typing for objects in OO sense, but xvalues don't only refer to objects).
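On the address-comparison point, a small sketch may help frame the question. The names MyClass, same_object and same_int are assumptions for illustration. Prvalue arguments are materialized into temporary objects when they are bound to the reference parameters, so inside the callee their addresses can be compared just like those of the xvalue members.

```cpp
#include <cassert>

struct MyClass {
    int member = 0;
};

// Inside these functions the parameters are named, so a and b are lvalues
// and taking their addresses is allowed.
bool same_object(MyClass&& a, MyClass&& b) { return &a == &b; }
bool same_int(int&& a, int&& b) { return &a == &b; }

bool demo_distinct_temporaries() {
    // Two prvalues of class type: each is materialized into its own temporary.
    bool classes_differ = !same_object(MyClass(), MyClass());
    // Two xvalues naming members of two different temporaries.
    bool members_differ = !same_int(MyClass().member, MyClass().member);
    // Even integer literals get a temporary when bound to a reference.
    bool literals_differ = !same_int(1, 1);
    return classes_differ && members_differ && literals_differ;
}
```

In each call both temporaries are alive at the same time, so they must be distinct objects with distinct addresses.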
I personally have another mental model which doesn't deal directly with identity and memory and whatnot.
prvalue comes from "pure rvalue" while xvalue comes from "expiring value", and this is the information I use in my mental model:
Pure rvalue refers to an object that is a temporary in the "pure sense": an expression for which the compiler can tell with absolute certainty that its evaluation is an object that is a temporary that has just been created and that is immediately expiring (unless we intervene to prolong its lifetime by reference binding). The object was created during the evaluation of the expression and it will die according to the rules of the "mother expression".
By contrast, an expiring value is an expression that evaluates to a reference to an object that is promised to expire soon. That is, it gives you a promise that you can do whatever you want to this object because it will be destroyed next anyway. But you don't know when this object was created, or when it is supposed to be destroyed. You just know that you "intercepted" it as it is just about to die.
In practice:
struct X;
auto foo() -> X;
X x = foo();
^~~~~
In this example evaluating foo() will result in a prvalue. Just by looking at this expression you know that this object was created as part of the return of foo and will be destroyed at the end of this full expression. Because you know all of these things you can prolong its lifetime:
const X& rx = foo();
Now the object returned by foo has its lifetime prolonged to the lifetime of rx.
auto bar() -> X&&;
X x = bar();
^~~~
In this example evaluating bar() will result in an xvalue. bar promises you that it is giving you an object that is about to expire, but you don't know when this object was created. It may have been created well before the call to bar (as a temporary or not), and bar then gives you an rvalue reference to it. The advantage is that you know you can do whatever you want with it because it won't be used afterwards (e.g. you can move from it). But you don't know when this object is supposed to be destroyed. As such you cannot extend its lifetime, because you don't know what its original lifetime is in the first place:
const X& rx = bar();
this won't extend the lifetime.
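The two bindings can be compared by logging construction and destruction. This is a sketch under assumptions: the global events log and the helper functions are made up for illustration, std::move(foo()) stands in for a function such as bar that returns X&&, and C++17 (or a compiler that elides the copy in foo) is assumed so that only one X is ever created.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> events;  // global log, for illustration only

struct X {
    X() { events.push_back("ctor"); }
    ~X() { events.push_back("dtor"); }
};

X foo() { return X(); }

std::vector<std::string> bind_prvalue() {
    events.clear();
    {
        // Prvalue initializer: the temporary lives as long as rx.
        [[maybe_unused]] const X& rx = foo();
        events.push_back("bound");
    }
    return events;
}

std::vector<std::string> bind_xvalue() {
    events.clear();
    {
        // Xvalue initializer: the temporary dies at the end of this
        // declaration, so rx dangles and must not be used.
        [[maybe_unused]] const X& rx = std::move(foo());
        events.push_back("bound");
    }
    return events;
}
```

The first function logs ctor, bound, dtor; the second logs ctor, dtor, bound, because nothing extends the temporary's lifetime once it is only reachable through an rvalue reference.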
When calling a func(T&& t) the caller is saying "there's a t here" and also "I don't care what you do to it". C++ does not specify the nature of "here".
On a platform where reference parameters are implemented as addresses, this means there must be an object present somewhere. On that platform identity == address. However this is not a requirement of the language, but of the platform calling convention.
A platform could implement references simply by arranging for the objects to be enregistered in a particular manner in both the caller and callee. Here an identity could be "register edi".
rvalue references: what exactly are "temporary" objects, what is their scope, and where are they stored?
Reading some articles, rvalues are always defined as "temporary" objects like Animal(), where Animal is a class, or some literal e.g. 10.
However, what is the formal definition of rvalues/"temporary" objects?
Is new Animal() also considered a "temporary" object? Or is it only values on the stack, like Animal() and literals stored in code?
Also, where are these "temporary" objects stored, what is their scope, and how long are rvalue references to these values valid?
Firstly it is important not to conflate the terms "rvalue" and "temporary object". They have very different meanings.
Temporary objects do not have a storage duration. Instead, they have lifetime rules that are specific to temporary objects. These can be found in section [class.temporary] of the C++ Standard; there is a summary on cppreference, which also includes a list of which expressions create temporary objects.
In practice I'd expect that a compiler would either optimize the object out, or store it in the same location as automatic objects are stored.
Note that "temporary object" only refers to objects of class type. The equivalent for built-in types are called values. (Not "temporary values"). In fact the term "values" includes both values of built-in type, and temporary objects.
A "value" is a completely separate idea to prvalue, xvalue, rvalue. The similarity in spelling is unfortunate.
Values don't have scope. Scope is a property of a name. In many cases the scope of a name coincides with the lifetime of the object or value it names, but not always.
The terms rvalue, lvalue etc. are value categories of an expression. These describe expressions, not values or objects.
Every expression has a value category. Also, every expression has a value, except expressions of void type. These are two different things. (The value of an expression has a non-reference type.)
An expression of value category rvalue may designate a temporary object, or a non-temporary object, or a value of built-in type.
The expressions which create a temporary object all have value category prvalue, however it is then possible to form expressions with category lvalue which designate that same temporary object. For example:
const std::string &v = std::string("hello");
In this case v is an lvalue, but it designates a temporary object. The lifetime of this temporary object matches the lifetime of v, as described in the earlier cppreference link.
An rvalue reference is a reference that can only bind to an expression of value category rvalue. (This includes prvalue and xvalue). The word rvalue in its name refers to what it binds to, not its own value category.
All named references in fact have category lvalue. Once bound, there is no difference in behaviour between an rvalue reference and an lvalue reference.
std::string&& rref = std::string("hello");
rref has value category lvalue, and it designates a temporary object. This example is very similar to the previous one, except the temporary object is non-const this time.
Another example:
std::string s1("hello");
std::string&& rref = std::move(s1);
std::string& lref = s1;
In this case, rref is an lvalue, and it designates a non-temporary object. Further, lref and rref (and even s1!) are all indistinguishable from hereon in, except for specifically the result of decltype.
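This can be checked at compile time. The sketch below relies on the standard rule that decltype applied to a parenthesized expression yields T& for an lvalue, T&& for an xvalue and T for a prvalue; the function name demo_named_references is made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

bool demo_named_references() {
    std::string s1("hello");
    std::string&& rref = std::move(s1);  // std::move only casts; nothing is moved
    std::string& lref = s1;

    // The declared types differ, and plain decltype reports that...
    static_assert(std::is_same<decltype(s1), std::string>::value, "");
    static_assert(std::is_same<decltype(rref), std::string&&>::value, "");
    static_assert(std::is_same<decltype(lref), std::string&>::value, "");

    // ...but used as expressions, all three names are lvalues.
    static_assert(std::is_same<decltype((s1)), std::string&>::value, "");
    static_assert(std::is_same<decltype((rref)), std::string&>::value, "");
    static_assert(std::is_same<decltype((lref)), std::string&>::value, "");

    // The unnamed result of std::move is an xvalue.
    static_assert(std::is_same<decltype((std::move(s1))), std::string&&>::value, "");

    rref += " world";  // modifies s1 through the rvalue reference
    return &rref == &s1 && &lref == &s1 && s1 == "hello world";
}
```

All three names designate the same object, which is why the address comparisons hold and the append through rref is visible in s1.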
There are two different points of view to consider. First of all, there's the language's point of view. Language specifications, such as the C++ standard(s), don't talk about things such as CPU registers, cache coherence, stacks (in the assembly sense), etc. Then, there's a real machine's point of view. Instruction set architectures (ISAs), such as the one(s) defined by Intel manuals, do concern themselves with this stuff. This is, of course, because of portability and abstraction. There's no good reason for C++ to depend on x86-specific details, but a lot of bad ones. I mean, imagine if HelloWorld.cpp would only compile for your specific Core i7 model for no good reason at all! At the same time, you need CPU-specific stuff sometimes. For instance, how would you issue a CLI instruction in a portable way? We have different languages because we need to solve different tasks, and we have different ISAs because we need different means to solve them. There's a good reason explaining why your smartphone doesn't use an Intel CPU, or why the Linux kernel is written in C and not, ahem... Brainfuck.
Now, from the language's point of view, an "rvalue" is a temporary value whose lifetime ends at the expression it is evaluated in.
In practice, rvalues are implemented the same way as automatic variables, that is, by storing their value on the stack, or in a register if the compiler sees fit. In the case of an automatic variable, the compiler can't store it in a register if its address is taken somewhere in the program, because registers have no "address". However, if its address is never taken, and no volatile stuff is involved, then the compiler's optimizer can place that variable into a register for optimization's sake. For rvalues, this is always the case, as you can't take an rvalue's address. From the language's point of view, they don't have one (Note: I'm using oldish C terminology here; see the comments for details, as there are way too many C++11 pitfalls to annotate here). This is necessary for some things to work properly. For instance, cdecl requires that small values be returned in the EAX register. Thus, all function calls must evaluate into an rvalue (consider references as pointers for simplicity's sake), because you can't take a register's address, as they don't have one!
There's also the concept of "lifetime". From the language's perspective, once some object's lifetime "ends", it ceases to be, period. When it "begins" and "ends" depends on how the object is allocated:
For objects with dynamic storage, their lifetimes explicitly start by means of new expressions and explicitly end by means of delete expressions. This mechanism allows them to survive their original scope (e.g. return new int;).
For objects with automatic storage, their lifetimes start when their scope is reached in the program flow, and end when their scope is exited.
For objects with static storage, their lifetimes start before main() is called and end once main() exits.
For objects with thread-local storage, their lifetimes start when their respective thread starts, and end when their respective thread exits.
Construction and destruction are respectively involved in an object's lifetime "start" and "end".
From a real machine's point of view, bits are just bits, period. There are no "objects" but bits and bytes in memory cells and CPU registers. For things like an int, that is, a POD type, "ending its lifetime" translates into doing nothing at all. For non-trivially destructible non-POD types, a destructor must be called at the right moment. However, the memory/register that once contained the "object" is still there. It just happens that it can now be reused by something else.
Is new Animal() also considered a "temporary" object? Or is it only values on the stack, like Animal() and literals stored in code?
new Animal() allocates memory in the heap for an Animal, constructs it, and the whole expression evaluates into an Animal*. Such an expression is an rvalue itself, as you can't say something like &(new Animal()). However, the expression evaluates into a pointer, no? Such a pointer points to an lvalue, as you can say things such as &(*(new Animal())) (will leak, though). I mean, if there's a pointer containing its address, it has an address, no?
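These categories can be verified with decltype on parenthesized expressions, which yields T for a prvalue and T& for an lvalue. This is a sketch with a made-up empty Animal class; the decltype operands are unevaluated, so the static_asserts allocate nothing.

```cpp
#include <cassert>
#include <type_traits>

struct Animal {};  // hypothetical class for illustration

// The new-expression is a prvalue of pointer type...
static_assert(std::is_same<decltype((new Animal())), Animal*>::value, "");
// ...while dereferencing that pointer gives an lvalue naming the heap object.
static_assert(std::is_same<decltype((*new Animal())), Animal&>::value, "");
// Animal() and the literal 10 are prvalues.
static_assert(std::is_same<decltype((Animal())), Animal>::value, "");
static_assert(std::is_same<decltype((10)), int>::value, "");
// A string literal, by contrast, is an lvalue of array type.
static_assert(std::is_same<decltype(("text")), const char (&)[5]>::value, "");

bool demo_new_is_not_temporary() {
    Animal* p = new Animal();  // the object has dynamic storage duration
    Animal& a = *p;            // lvalue designating that same object
    bool same = (&a == p);
    delete p;                  // its lifetime ends only here, explicitly
    return same;
}
```

So the object created by new Animal() is not a temporary: only the pointer value produced by the expression is a prvalue.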
Also, where are these "temporary" objects stored, what is their scope, and how long are rvalue references to these values valid?
As explained above, a "temporary object"'s scope is that of the expression that encloses it. For example, in the expression a(b * c) (assuming a is a function taking an rvalue reference as its single argument), b * c is an rvalue whose scope ends after the expression enclosing it, that is, a(...), is evaluated. After that, all remaining rvalue references to it that the function a may have somehow created out of its parameter are dangling and will cause your program to do funny things. In other words, as long as you don't abuse std::move or do other voodoo with rvalues, rvalue references are valid in the circumstances that you'd expect them to be.
The idea of move semantics is that you can grab everything from another temporary object (referenced by an rvalue reference) and store that "everything" in your object. That helps to avoid deep copying where a single construction is enough: you construct things in an rvalue object and then just move them into your long-lived object.
Why is it that C++ doesn't allow binding lvalue objects to rvalue references? Both allow me to change the referenced object, so there is no difference to me in terms of accessing internals of referenced object.
The only reason I can guess is function overloading ambiguity issues.
But why doesn't C++ allow binding lvalue objects to rvalue references?
Assuming you mean "Why doesn't C++ allow binding rvalue references to lvalue objects": it does. It just isn't automatic, so you have to use std::move to make it explicit.
Why? Because otherwise an innocuous function call can surprisingly destroy something you didn't expect it to:
Class object(much,state,many,members,wow);
looks_safe_to_me(object);
// oh no, it destructively copied my object!
vs.
Class object(much,state,many,members,wow);
obviously_destructive(std::move(object));
// same result, but now the destruction is explicit and expected
A note on destructive copying: when I say destructively and destruction above, I don't mean that the object's destructor ends its lifetime, just that its internal state has been moved to a new instance. It's still a valid object, but it no longer holds the same expensive state it used to.
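A concrete sketch of the explicit version, using std::unique_ptr because its moved-from state is specified (it becomes null). The function body and the string contents are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sink that takes over the state of its argument.
std::size_t obviously_destructive(std::unique_ptr<std::string>&& p) {
    std::unique_ptr<std::string> taken = std::move(p);  // steals the state
    return taken->size();
}

bool demo_explicit_move() {
    std::unique_ptr<std::string> object(new std::string("expensive state"));

    // obviously_destructive(object);  // would not compile: an lvalue
    //                                 // cannot bind to an rvalue reference

    std::size_t n = obviously_destructive(std::move(object));

    // object is still a valid unique_ptr whose lifetime has not ended,
    // but it no longer owns the string.
    return n == 15 && object == nullptr;
}
```

The commented-out call is the "innocuous" form the language rejects; the std::move at the call site is what makes the loss of state visible to the reader.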
A note on terminology: let's see if we can clear up the imprecise use of lvalue, rvalue etc. above.
Quoting from cppreference for posterity:
an lvalue is
an expression that has identity and cannot be moved from.
So, there's no such thing as an lvalue object, but there is an object which is locally named (or referred to) by an lvalue expression
an rvalue is
an expression that is either a prvalue or an xvalue. It can be moved from. It may or may not have identity.
a prvalue (pure rvalue) is roughly an expression referring to an un-named temporary object: we can't convert our lvalue expression to one of these IIUC.
an xvalue (expiring value) is
an expression that has identity and can be moved from.
which explicitly includes the result of std::move
So what actually happens:
an object exists
the object is identified locally by an lvalue expression, which cannot be moved from (to protect us from unexpected side-effects)
std::move yields an xvalue expression (which can be moved from) referring to the same object as the lvalue expression
this means objects such as variables (which are named by lvalue expressions) cannot be implicitly moved from, and must instead be explicitly moved from via an explicit xvalue expression such as std::move.
anonymous temporaries are probably already referred to by prvalue expressions, and can be moved implicitly
Essentially, a mechanism is needed to distinguish between values that can be moved from, and those that cannot be moved from (i.e. a copy would be needed).
Allowing both rvalues and lvalues to be bound to an lvalue reference makes that impossible.
Hence, values bound to an rvalue reference can be moved from (not necessarily always going to be moved from, but it is allowed), and lvalues can be bound to lvalue references and can't be moved from.
std::move is there to allow for the casting between the value categories (to an rvalue) to allow the move to happen.
Note: const lvalue references (const T&) can be bound to both rvalues (temporaries) and lvalues, since the referred-to object can't change (it is marked as const, so there can't be any moving from it anyway).
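A minimal sketch of how overload resolution uses this distinction; the overload set named which is hypothetical, and neither overload actually moves from its argument.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overload set: one overload may only copy, the other may move.
std::string which(const std::string&) { return "const lvalue reference"; }
std::string which(std::string&&) { return "rvalue reference"; }

bool demo_overload_selection() {
    std::string named = "named";
    const std::string constant = "const";

    return which(named) == "const lvalue reference"                    // lvalue
        && which(std::move(named)) == "rvalue reference"               // xvalue
        && which(std::string("temporary")) == "rvalue reference"       // prvalue
        && which(std::move(constant)) == "const lvalue reference";     // const rvalue
}
```

The last case shows why const objects are never moved from: std::move(constant) has type const std::string&&, which cannot bind to std::string&&, so the copying overload is chosen.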
There is some history (going back to the early days of C++) as to why temporary objects could not be bound to non-const lvalue references to begin with. The detail is blurry, but there was some reasoning that modifying a temporary didn't make sense, since it would be destroyed at the end of the current statement anyway. Additionally, you could be lulled into thinking that you were modifying an lvalue when you were in fact not, so the semantics of the code could be wrong and buggy. There are further reasons tied to addresses, literals, etc. This was before moving and its semantics solidified, and it is some of the motivation for moving and its semantics.
I have come across some code that is intended to replace an object in-place without reallocation of memory:
static void move(void* const* src, void** dest) {
(*reinterpret_cast<T**>(dest))->~T();
**reinterpret_cast<T**>(dest) = **reinterpret_cast<T* const*>(src);
}
This looks like UB to me, since the object is destroyed and then assigned to without being constructed, i.e. it needs to either just copy-assign (the second line only) or explicitly destruct (the first line) followed by placement-new copy construction instead of the assignment.
I only ask because although this seems like a glaring bug to me, it has existed for some time in both boost::spirit::hold_any and the original cdiggins::any on which it is based. (I have asked about it on the Boost developers mailing list, but while awaiting responses wish to fix this locally if it is indeed incorrect.)
Assuming the reinterpret_casts are well-defined (that is, dest really is a pointer to pointer to T), the standard defines the end of an object's lifetime as:
The lifetime of an object of type T ends when:
if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or
the storage which the object occupies is reused or released.
It then gives some restrictions over what can be done with the glvalue **reinterpret_cast<T**>(dest):
Similarly, [...] after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. [...] The program has undefined behavior if:
an lvalue-to-rvalue conversion (4.1) is applied to such a glvalue,
the glvalue is used to access a non-static data member or call a non-static member function of the object, or
the glvalue is implicitly converted (4.10) to a reference to a base class type, or
the glvalue is used as the operand of a static_cast (5.2.9) except when the conversion is ultimately to cv char& or cv unsigned char&, or
the glvalue is used as the operand of a dynamic_cast (5.2.7) or as the operand of typeid.
Emphasis added.
If the object doesn't end up in this after-life state because it has a trivial destructor, there is no problem. However, for any T which is a class type with non-trivial destructor, we know that the assignment operator is considered a member function operator= of that class. Calling a non-static member function of the object through this glvalue results in undefined behaviour.
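For comparison, the two well-defined alternatives mentioned in the question could be sketched as follows. The function names are made up, and this is not the Boost code: it only illustrates the destroy-then-placement-new pattern and plain copy assignment under the assumption that T is copy-constructible and has no const or reference members.

```cpp
#include <cassert>
#include <new>
#include <string>

// Option 1: end the old object's lifetime, then start a new object's
// lifetime in the same storage with placement new.
template <class T>
void replace_in_place(T* dest, const T& src) {
    if (dest == &src) return;                 // destroying src first would be fatal
    dest->~T();                               // lifetime of *dest ends here
    ::new (static_cast<void*>(dest)) T(src);  // a new T now lives at dest
    // Caveat: if T's copy constructor throws, no live object is left at dest.
}

// Option 2: keep the object alive and simply copy-assign to it.
template <class T>
void assign_in_place(T* dest, const T& src) {
    *dest = src;
}

bool demo_replace_in_place() {
    std::string a = "old";
    const std::string b = "new";

    replace_in_place(&a, b);
    bool replaced = (a == "new");

    assign_in_place(&a, std::string("newer"));
    return replaced && a == "newer";
}
```

Neither variant calls a member function on an object whose destructor has already run, which is the step that makes the original snippet undefined for types with non-trivial destructors.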
This looks like UB to me, since the object is destroyed and then
assigned to without being constructed, i.e. it needs to either just
copy-assign (the second line only) or explicitly destruct (the first
line) followed by placement-new copy construction instead of the
assignment.
There is no need to fix anything, although this code is certainly not safe without further qualification (it's certain to be safe in the context where it's used though).
The object at dest is destroyed, and then the memory backing the object at src is copied over to where the object at dest used to live. End result: you have destroyed one object and placed a shallow clone of another object where the first one used to live.
If you only do the copy assignment the first object will not have been destructed, resulting in resource leaks.
Using placement new to populate the memory at dest would be an option, but it has very different semantics than the existing code (creates a brand new object instead of making a shallow clone of an existing one). Placement new and using the copy constructor also has different semantics: the object needs to have an accessible copy constructor, and you are no longer in control of what the result will be (the copy constructor does whatever it wants).