I've read that lvalues are "things with a defined storage location".
And also that literals and temporary variables are not lvalues, but no reason is given for this statement.
Is it because literals and temporary variables do not have a defined storage location? If yes, then where do they reside if not in memory?
I suppose there is some significance to "defined" in "defined storage location"; if there is (or is not), please let me know.
And also that literals and temporary variables are not lvalues, but no reason is given for this statement.
This is true for all temporaries and literals except for string literals. Those are actually lvalues (which is explained below).
Is it because literals and temporary variables do not have a defined storage location? If yes, then where do they reside if not in memory?
Yes. The literal 2 doesn't actually exist as an object; it is just a value in the source code. Since it's a value, not an object, it doesn't have to have any memory associated with it. It can be hard-coded into the assembly that the compiler creates, or it could be put somewhere, but since it doesn't have to be, all you can do is treat it as a pure value, not an object.
There is an exception, though, and that is string literals. Those actually have storage, since a string literal is an array of type const char[N]. You can take the address of a string literal, and a string literal can decay into a pointer, so it is an lvalue even though it doesn't have a name.
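The string-literal exception can be checked at compile time. This is a small sketch: the literal "hi" and the variable names are illustrative. decltype applied to the literal yields an lvalue reference to const char[3], and & applies to it directly.

```cpp
#include <type_traits>

// "hi" has type const char[3]: two characters plus the terminating '\0'.
// decltype yields an lvalue reference type, so the literal is an lvalue.
static_assert(std::is_same_v<decltype(("hi")), const char (&)[3]>);

// Because it is an lvalue, & may be applied to the whole array...
const char (*whole_array)[3] = &"hi";

// ...and it also decays to a pointer to its first element.
const char* first = "hi";

// A non-string literal is a prvalue, so this line would not compile:
// const int* p = &5;
```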
Temporaries are also rvalues. Even if they exist as objects, their storage location is ephemeral. They only last until the end of the full expression they are in. You are not allowed to take their address and they also do not have a name. They might not even exist: for instance, in
Foo a = Foo();
The temporary Foo() can be elided, and the code semantically transformed to
Foo a(); // you can't actually do this since it declares a function with that signature.
so now there isn't even a temporary object in the optimized code.
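A minimal sketch of the same point follows. It assumes a C++17 compiler, where copy elision in this situation is guaranteed rather than merely permitted. Foo here is a hypothetical type that counts how often it is copied or moved. It also shows Foo b{}; as the spelling that value-initializes an object without being parsed as a function declaration.

```cpp
#include <cassert>

// Hypothetical Foo that counts how often it is copied or moved.
struct Foo {
    static inline int copies_or_moves = 0;  // C++17 inline variable
    Foo() = default;
    Foo(const Foo&) { ++copies_or_moves; }
    Foo(Foo&&) noexcept { ++copies_or_moves; }
};

void check_no_temporary() {
    Foo::copies_or_moves = 0;
    Foo a = Foo();  // C++17: the prvalue Foo() initializes a directly, no temporary
    Foo b{};        // braces value-initialize an object; Foo b(); would declare a function
    (void)a;
    (void)b;
    assert(Foo::copies_or_moves == 0);  // neither copy nor move constructor ran
}
```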
Why are literals and temporary variables not lvalues?
I have two answers: because it wouldn't make sense (1) and because the Standard says so (2). Let's focus on (1).
Is it because literals and temporary variables do not have a defined storage location?
This is a simplification that doesn't fit here. One that would: literals and temporaries are not lvalues because it wouldn't make sense to modify them (see footnote 1).
What is the meaning of 5++? What is the meaning of rand() = 0? The Standard says that temporaries and literals are not lvalues so those examples are invalid. And every compiler developer is happier.
1) You can define and use user-defined types in a way where the modification of a temporary makes sense. Such a temporary would live until the end of the full-expression. François Andrieux makes a nice analogy between calling f(MyType{}.mutate()) on one hand and f(my_int + 1) on the other. I think the simplification still holds: MyType{}.mutate() can be seen as another temporary, just as MyType{} was, in the same way that my_int + 1 can be seen as another int, just as my_int was. This is all semantics and opinion-based. The real answer is: (2) because the Standard says so.
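The analogy in the footnote can be sketched as follows. MyType, mutate() and both f overloads are hypothetical names taken from or modeled on the footnote. The temporary MyType{} really is modified and then read, and it is destroyed at the end of the full-expression. The int version computes a new value instead and leaves my_int unchanged.

```cpp
#include <cassert>

// Hypothetical type from the footnote's analogy.
struct MyType {
    int value = 0;
    // Modifies *this and returns it, so the call can be made on a temporary.
    MyType& mutate() { ++value; return *this; }
};

int f(const MyType& m) { return m.value; }
int f(int i) { return i; }

void compare_with_int() {
    int my_int = 0;
    // MyType{} is modified, read by f, then destroyed at the end of the full-expression.
    assert(f(MyType{}.mutate()) == 1);
    // The built-in analogue produces a new int instead of modifying one.
    assert(f(my_int + 1) == 1);
    assert(my_int == 0);
}
```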
There are a lot of common misconceptions in the question and in the other answers; my answer hopes to address that.
The terms lvalue and rvalue are expression categories. They are terms that apply to expressions, not to objects. (A bit confusingly, the official term for expression categories is "value categories"!)
The term temporary object refers to objects. This includes objects of class type, as well as objects of built-in type. The term temporary (used as a noun) is short for temporary object. Sometimes the standalone term value is used to refer to a temporary object of built-in type. These terms apply to objects, not to expressions.
The C++17 standard is more consistent in object terminology than past standards, e.g. see [conv.rval]/1. It now tries to avoid saying value other than in the context of "value of an expression".
Now, why are there different expression categories? A C++ program is made up of a collection of expressions, joined to each other with operators to make larger expressions; and fitting within a framework of declarative constructs. These expressions create, destroy, and do other manipulations on objects. Programming in C++ could be described as using expressions to perform operations with objects.
The reason that expression categories exist is to provide a framework for using expressions to express operations that the programmer intends. For example, way back in the C days (and probably earlier), the language designers figured that 3 = 5; did not make any sense as part of a program, so it was decided to limit what sort of expression can appear on the left-hand side of =, and to have the compiler report an error if this restriction wasn't followed.
The term lvalue originated in those days, although now, with the development of C++, there is a vast range of expressions and contexts where expression categories are useful, not just the left-hand side of an assignment operator.
Here is some valid C++ code: std::string("3") = std::string("5");. This is conceptually no different from 3 = 5;, however it is allowed. The effect is that a temporary object of type std::string and content "3" is created, and then that temporary object is modified to have content "5", and then the temporary object is destroyed. The language could have been designed so that the code 3 = 5; specifies a similar series of events (but it wasn't).
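The difference between the two cases can be observed with the standard trait std::is_assignable_v<T, U>. It asks whether std::declval<T>() = std::declval<U>() is well-formed. With a non-reference T, the left operand is an rvalue, which mirrors assigning to a temporary. This is a compile-time sketch only.

```cpp
#include <string>
#include <type_traits>

// Built-in assignment requires a modifiable lvalue on the left,
// so the analogue of 3 = 5; is rejected:
static_assert(!std::is_assignable_v<int, int>);
static_assert(std::is_assignable_v<int&, int>);  // an int lvalue is fine

// std::string's operator= is not ref-qualified, so it accepts an rvalue
// object on the left. This is why std::string("3") = std::string("5"); compiles:
static_assert(std::is_assignable_v<std::string, std::string>);
```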
Why is the string example legal but the int example not?
Every expression has to have a category. The category of an expression might not seem to have an obvious reason at first, but the designers of the language have given each expression a category according to what they think is a useful concept to express and what isn't.
It's been decided that the sequence of events in 3 = 5; as described above is not something anyone would want to do, and if someone did write such a thing then they probably made a mistake and meant something else, so the compiler should help out by giving an error message.
Now, the same logic might conclude that std::string("3") = std::string("5") is not something anyone would ever want to do either. However another argument is that for some other class type, T(foo) = x; might actually be a worthwhile operation, e.g. because T might have a destructor that does something. It was decided that banning this usage could be more harmful to a programmer's intentions than good. (Whether that was a good decision or not is debatable; see this question for discussion).
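Here is a sketch of the kind of T(foo) = x; that might be worthwhile. Message and published are hypothetical names. Message's destructor has an observable side effect, so assigning to a temporary Message changes what gets published when that temporary dies at the end of the full-expression.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log standing in for wherever a destructor might publish to.
std::vector<std::string> published;

// Hypothetical type whose destructor has an observable side effect.
struct Message {
    std::string text;
    ~Message() { published.push_back(text); }
};

void assign_to_temporary() {
    published.clear();
    // Both temporaries are destroyed at the end of this full-expression;
    // the left-hand one publishes the text it was assigned, not "draft".
    Message{"draft"} = Message{"final"};
    assert(published.size() == 2);
    assert(published[0] == "final" && published[1] == "final");
}
```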
Now we are getting closer to finally addressing your question :)
Whether or not there is memory or a storage location associated is not the rationale for expression categories any more. In the abstract machine (more explanation of this below), every temporary object (this includes the one created by 3 in x = 3;) exists in memory.
As described earlier in my answer, a program consists of expressions that manipulate objects. Each expression is said to designate or refer to an object.
It's very common for other answers or articles on this topic to make the incorrect claim that an rvalue can only designate a temporary object, or, even worse, that an rvalue is a temporary object, or that a temporary object is an rvalue. An expression is not an object; it is something that occurs in source code for manipulating objects!
In fact a temporary object can be designated by an lvalue or an rvalue expression; and a non-temporary object can be designated by an lvalue or an rvalue expression. They are separate concepts.
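All four combinations can be checked at compile time. This sketch relies on one fact: decltype((e)) is an lvalue reference type exactly when e is an lvalue expression. The IS_LVALUE macro and the variable names are hypothetical helpers for illustration.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper: true exactly when e is an lvalue expression.
#define IS_LVALUE(e) std::is_lvalue_reference_v<decltype((e))>

std::string named = "non-temporary";                  // a non-temporary object
const std::string& bound = std::string("temporary");  // a lifetime-extended temporary

static_assert(IS_LVALUE(named));              // non-temporary object, lvalue expression
static_assert(!IS_LVALUE(std::move(named)));  // non-temporary object, rvalue (xvalue) expression
static_assert(!IS_LVALUE(std::string("x")));  // temporary object, rvalue (prvalue) expression
static_assert(IS_LVALUE(bound));              // temporary object, lvalue expression
```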
Now, there's an expression category rule that you can't apply & to an expression of the rvalue category. The purpose of this rule and these categories is to avoid errors where a temporary object is used after it is destroyed. For example:
int *p = &5; // not allowed due to category rules
*p = 6; // oops, dangling pointer
But you could get around this:
template<typename T> auto f(T&&t) -> T& { return t; }
// ...
int *p = &f(5); // Allowed: f(5) is an lvalue
*p = 6; // Oops, dangling pointer, no compiler error message.
In this latter code, f(5) and *p are both lvalues that designate a temporary object. This is a good example of why the expression category rules exist: if we follow the rules without a tricky workaround, we get an error for the code that tries to write through a dangling pointer.
Note that you can also use this f to find the memory address of a temporary object, e.g. std::cout << &f(5);
In summary, the questions you actually ask all mistakenly conflate expressions with objects. So they are non-questions in that sense. Temporaries are not lvalues, because objects are not expressions.
A valid but related question would be: "Why is the expression that creates a temporary object an rvalue (as opposed to being an lvalue?)"
To which the answer is as was discussed above: having it be an lvalue would increase the risk of creating dangling pointers or dangling references; and as in 3 = 5;, would increase the risk of specifying redundant operations that the programmer probably didn't intend.
I repeat again that the expression categories are a design decision to help with programmer expressiveness; not anything to do with memory or storage locations.
Finally, to the abstract machine and the as-if rule. C++ is defined in terms of an abstract machine, in which temporary objects have storage and addresses too. I gave an example earlier of how to print the address of a temporary object.
The as-if rule says that the output of the actual executable the compiler produces must only match the output that the abstract machine would. The executable doesn't actually have to work in the same way as the abstract machine, it just has to produce the same result.
So for code like x = 5; , even though a temporary object of value 5 has a memory location in the abstract machine; the compiler doesn't have to allocate physical storage on the real machine. It only has to ensure that x ends up having 5 stored in it and there are much easier ways to do this that don't involve extra storage being created.
The as-if rule applies to everything in the program, even though my example here only refers to temporary objects. A non-temporary object could equally well be optimized out: e.g. int x; int y = 5; x = y; followed by other code that doesn't use y could be changed to int x = 5;.
The same applies for class types without side-effects that would alter the program output. E.g. std::string x = "foo"; std::cout << x; can be optimized to std::cout << "foo"; even though the lvalue x denoted an object with storage in the abstract machine.
lvalue stands for locator value and represents an object that occupies some identifiable location in memory.
The term locator value is also used here:
C
The C programming language followed a similar taxonomy, except that the role of assignment was no longer significant: C expressions are categorized between "lvalue expressions" and others (functions and non-object values), where "lvalue" means an expression that identifies an object, a "locator value"[4].
Everything that is not an lvalue is, by exclusion, an rvalue. Every expression is either an lvalue or an rvalue.
Originally, the term lvalue was used in C to indicate expressions that can appear on the left side of the assignment operator. However, the const keyword changed this: not all lvalues can be assigned to. Those that can are called modifiable lvalues.
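The distinction between lvalues and modifiable lvalues carries over to C++. The sketch below uses decltype((name)) to get the reference type of each expression. counter and limit are hypothetical names. Both are lvalues, but only one can be assigned to.

```cpp
#include <type_traits>

int counter = 0;       // a modifiable lvalue
const int limit = 10;  // also an lvalue, but not modifiable

static_assert(std::is_lvalue_reference_v<decltype((limit))>);   // limit is an lvalue...
static_assert(!std::is_assignable_v<decltype((limit)), int>);   // ...that cannot be assigned to
static_assert(std::is_assignable_v<decltype((counter)), int>);  // counter can be
```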
And also that literals and temporary variables are not lvalues, but no reason is given for this statement.
According to this answer, literals can be lvalues in some cases.
Literals of scalar types are rvalues because they are of known size and are very likely to be embedded directly into the machine instructions on the given hardware architecture. What would be the memory location of 5?
On the contrary, strangely enough, string literals are lvalues since they have unpredictable size and there is no other way to represent them apart from as objects in memory.
An lvalue can be converted to an rvalue. For example in the following instructions
int a =5;
int b = 3;
int c = a+b;
the operator + takes two rvalues, so a and b are converted to rvalues before being summed. Another example, where applying an operator to an lvalue yields an rvalue:
int c = 6;
&c = 4; //ERROR: &c is an rvalue
Conversely, you cannot convert an rvalue to an lvalue.
However you can produce a valid lvalue from an rvalue for example:
int arr[] = {1, 2};
int* p = &arr[0];
*(p + 1) = 10; // OK: p + 1 is an rvalue, but *(p + 1) is an lvalue
In C++11, rvalue references are related to the move constructor and the move assignment operator.
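As a rough sketch of that relationship, here is a hypothetical owning Buffer class (not from any library). Its move constructor and move assignment operator take Buffer&&. Because those parameters only bind to rvalues, the class can safely take over the pointer of an object the caller is done with instead of copying it. Copying is deleted to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning buffer; copying is deleted to keep the sketch short.
class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    ~Buffer() { delete[] data_; }
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    // Move constructor: binds to rvalues and takes over the pointer.
    Buffer(Buffer&& other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}

    // Move assignment: releases the current storage, then takes over other's.
    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            delete[] data_;
            data_ = std::exchange(other.data_, nullptr);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }

    std::size_t size() const { return size_; }
    const int* data() const { return data_; }

private:
    int* data_;
    std::size_t size_;
};

void move_demo() {
    Buffer a(4);
    const int* original = a.data();
    Buffer b = std::move(a);  // std::move(a) is an rvalue: the move constructor is chosen
    assert(b.data() == original && b.size() == 4);
    assert(a.data() == nullptr && a.size() == 0);
    a = Buffer(2);            // Buffer(2) is a prvalue: move assignment is chosen
    assert(a.size() == 2);
}
```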
You can find more details in this clear and well-explained post.
Where do they reside if not in memory?
Of course they reside in memory*; there's no way around it. The question is whether your program can determine where exactly in memory they reside. In other words, is your program allowed to take the address of the thing in question?
In a simple example such as a = 5, the value five, or an instruction representing an assignment of the value five, is somewhere in memory. However, you cannot take the address of five, because int *p = &5 is illegal.
Note that string literals are an exception from the "not an lvalue" rule, because const char *p = "hello" produces an address of a string literal.
* However, it may not necessarily be data memory. In fact, they may not even be represented as a constant in the program memory: for example, the assignment short a; a = 0xFF00 could be represented as an assignment of 0xFF to the upper octet and a clearing of the lower octet in memory.
Related
I'm trying to understand what exactly rvalue references are.
Everywhere I look they have examples to something called Perfect forwarding which sounds too complicated.
I just want a clear basic example of what Rvalue References are and see if they get used outside of the topics like Perfect forwarding.
I have read:
An lvalue is an expression that refers to a memory location and allows us to take the address of that memory location via the & operator. An rvalue is an expression that is not an lvalue
So is it correct to say:
An Rvalue Reference is a Reference to something that is not in the memory? Maybe it's in a register?
For example:
#include <iostream>
int main()
{
int&& a = 50;
int b = 60;
return 0;
}
50 is technically not in memory yet, but a is a reference to it? Is b any different from a?
Where else (which is easily explainable) can this be used?
First, whether or not something resides in the memory or in a register is not defined by the standard. You don't need to think about it. For you, all objects work as if they were in memory. Under the as-if rule, your compiler can put some of them into the registers, as long as it doesn't affect how your program works.
Yes, you can obtain addresses of lvalues (except bitfields) and not rvalues, but that's a language design choice rather than a technical limitation (most of the time it would make no sense, so it's not allowed). Value categories (lvalues vs rvalues) have nothing to do with how things are stored in memory.
I'm trying to understand what exactly Rvalue References are
Rvalue references are similar to lvalue references. The primary difference is what you can initialize them with. Rvalue references must be initialized with rvalues, and lvalue references must be initialized with lvalues (as an exception, const lvalue references are allowed to bind to rvalues).
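Overload resolution makes these binding rules visible. The functions which and accepts_anything below are hypothetical. The overload chosen for each argument shows which kind of reference can bind to it. The const lvalue reference case is the exception mentioned above.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical overloads: the one chosen shows what each reference kind binds to.
std::string which(int&)  { return "lvalue reference"; }
std::string which(int&&) { return "rvalue reference"; }

// A const lvalue reference accepts rvalues too.
std::string accepts_anything(const int&) { return "const lvalue reference"; }

void binding_demo() {
    int x = 1;
    assert(which(x) == "lvalue reference");             // x is an lvalue
    assert(which(50) == "rvalue reference");            // 50 is a prvalue
    assert(which(std::move(x)) == "rvalue reference");  // std::move(x) is an xvalue
    assert(accepts_anything(50) == "const lvalue reference");
    // int&& bad = x;  // would not compile: an rvalue reference cannot bind to an lvalue
}
```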
int&& a = 50; works because 50 is an int and a temporary object. (Usually "a temporary object" is a synonym for "an rvalue", but not always.)
This case is a bit tricky, because temporaries are normally destroyed at the end of the full-expression (roughly, when the line of code they were created in finishes executing). But here, the lifetime of the temporary holding 50 is extended because it's bound to a reference.
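A temporary of class type with a destructor makes the lifetime extension observable. Tracer and destroyed below are hypothetical names. An unbound temporary dies at the end of its full-expression. One bound to a reference lives until the reference goes out of scope.

```cpp
#include <cassert>

int destroyed = 0;

// Hypothetical type that records when it is destroyed.
struct Tracer {
    ~Tracer() { ++destroyed; }
};

void lifetime_demo() {
    destroyed = 0;
    Tracer{};                   // not bound: destroyed at the end of this full-expression
    assert(destroyed == 1);
    {
        Tracer&& t = Tracer{};  // bound: lifetime extended to the scope of t
        (void)t;
        assert(destroyed == 1); // still alive here
    }
    assert(destroyed == 2);     // destroyed when t went out of scope
}
```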
is b any different than a?
int&& a = 50;
int b = 60;
Because of the lifetime extension - no, they're equivalent. Maybe there's some obscure difference between them, but I can't think of any.
rvalue references: what exactly are "temporary" objects, what is their scope, and where are they stored?
Reading some articles, rvalues are always defined as "temporary" objects like Animal(), where Animal is a class, or some literal e.g. 10.
However, what is the formal definition of rvalues/"temporary" objects?
Is new Animal() also considered a "temporary" object? Or is it only values on the stack, like Animal() and literals stored in code?
Also, where are these "temporary" objects stored, what is their scope, and how long are rvalue references to these values valid?
Firstly it is important not to conflate the terms "rvalue" and "temporary object". They have very different meanings.
Temporary objects do not have a storage duration. Instead, they have lifetime rules that are specific to temporary objects. These can be found in section [class.temporary] of the C++ Standard; there is a summary on cppreference, which also includes a list of which expressions create temporary objects.
In practice I'd expect that a compiler would either optimize the object out, or store it in the same location as automatic objects are stored.
Note that "temporary object" only refers to objects of class type. The equivalent for built-in types are called values. (Not "temporary values"). In fact the term "values" includes both values of built-in type, and temporary objects.
A "value" is a completely separate idea to prvalue, xvalue, rvalue. The similarity in spelling is unfortunate.
Values don't have scope. Scope is a property of a name. In many cases the scope of a name coincides with the lifetime of the object or value it names, but not always.
The terms rvalue, lvalue etc. are value categories of an expression. These describe expressions, not values or objects.
Every expression has a value category. Also, every expression has a value, except expressions of void type. These are two different things. (The value of an expression has a non-reference type.)
An expression of value category rvalue may designate a temporary object, or a non-temporary object, or a value of built-in type.
The expressions which create a temporary object all have value category prvalue, however it is then possible to form expressions with category lvalue which designate that same temporary object. For example:
const std::string &v = std::string("hello");
In this case v is an lvalue, but it designates a temporary object. The lifetime of this temporary object matches the lifetime of v, as described in the earlier cppreference link.
Link to further reading about value categories
An rvalue reference is a reference that can only bind to an expression of value category rvalue. (This includes prvalue and xvalue). The word rvalue in its name refers to what it binds to, not its own value category.
All named references in fact have category lvalue. Once bound, there is no difference in behaviour between an rvalue reference and an lvalue reference.
std::string&& rref = std::string("hello");
rref has value category lvalue, and it designates a temporary object. This example is very similar to the previous one, except that the temporary object is non-const this time.
Another example:
std::string s1("hello");
std::string&& rref = std::move(s1);
std::string& lref = s1;
In this case, rref is an lvalue, and it designates a non-temporary object. Further, lref and rref (and even s1!) are all indistinguishable from here on, except specifically for the result of decltype.
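The decltype exception can be spelled out as follows, reusing the three names from the example. decltype(name) reports each declared type, which is where the three differ. decltype((name)) reports the expression's type and category, and there all three are lvalues of type std::string.

```cpp
#include <string>
#include <type_traits>
#include <utility>

std::string s1("hello");
std::string&& rref = std::move(s1);
std::string& lref = s1;

// The declared types differ, and decltype(name) reports them...
static_assert(std::is_same_v<decltype(rref), std::string&&>);
static_assert(std::is_same_v<decltype(lref), std::string&>);

// ...but as expressions all three are lvalues of type std::string.
static_assert(std::is_same_v<decltype((rref)), std::string&>);
static_assert(std::is_same_v<decltype((lref)), std::string&>);
static_assert(std::is_same_v<decltype((s1)), std::string&>);
```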
There are two different points of view to consider. First, there's the language's point of view. Language specifications, such as the C++ standard(s), don't talk about things such as CPU registers, cache coherence, stacks (in the assembly sense), etc. Then, there's a real machine's point of view. Instruction set architectures (ISAs), such as the one(s) defined by the Intel manuals, do deal with this stuff. This is, of course, because of portability and abstraction. There's no good reason for C++ to depend on x86-specific details, but a lot of bad ones. I mean, imagine if HelloWorld.cpp would only compile for your specific Core i7 model for no good reason at all! At the same time, you need CPU-specific stuff sometimes. For instance, how would you issue a CLI instruction in a portable way? We have different languages because we need to solve different tasks, and we have different ISAs because we need different means to solve them. There's a good reason why your smartphone doesn't use an Intel CPU, or why the Linux kernel is written in C and not, ahem... Brainfuck.
Now, from the language's point of view, an "rvalue" is a temporary value whose lifetime ends with the expression it is evaluated in.
In practice, rvalues are implemented the same way as automatic variables, that is, by storing their value on the stack, or in a register if the compiler sees fit. In the case of an automatic variable, the compiler can't store it in a register if its address is taken somewhere in the program, because registers have no "address". However, if its address is never taken, and no volatile stuff is involved, then the compiler's optimizer can place that variable into a register for optimization's sake. For rvalues, this is always the case, as you can't take an rvalue's address. From the language's point of view, they don't have one (Note: I'm using oldish C terminology here; see the comments for details, as there are way too many C++11 pitfalls to annotate here). This is necessary for some things to work properly. For instance, cdecl requires that small values be returned in the EAX register. Thus, all function calls must evaluate to an rvalue (consider references as pointers for simplicity's sake), because you can't take a register's address, as registers don't have one!
There's also the concept of "lifetime". From the language's perspective, once an object's lifetime "ends", it ceases to be, period. When it "begins" and "ends" depends on the object's allocation means:
For objects with dynamic storage, their lifetimes explicitly start by means of new expressions and explicitly end by means of delete expressions. This mechanism allows them to survive their original scope (e.g. return new int;).
For objects with automatic storage, their lifetimes start when their scope is reached in the program flow, and end when their scope is exited.
For objects with static storage, their lifetimes start before main() is called and end once main() exits.
For objects with thread-local storage, their lifetimes start when their respective thread starts, and end when their respective thread exits.
Construction and destruction are respectively involved in an object's lifetime "start" and "end".
From a real machine's point of view, bits are just bits, period. There are no "objects" but bits and bytes in memory cells and CPU registers. For things like an int, that is, a POD type, "ending its lifetime" translates into doing nothing at all. For non-trivially destructible non-POD types, a destructor must be called at the right moment. However, the memory/register that once contained the "object" is still there. It just happens that it can now be reused by something else.
Is new Animal() also considered a "temporary" object? Or is it only values on the stack, like Animal() and literals stored in code?
new Animal() allocates memory on the heap for an Animal, constructs it, and the whole expression evaluates to an Animal*. Such an expression is an rvalue itself, as you can't say something like &(new Animal()). However, the expression evaluates to a pointer, no? Dereferencing that pointer gives an lvalue, as you can say things such as &(*(new Animal())) (which will leak, though). I mean, if there's a pointer containing its address, it has an address, no?
Also, where are these "temporary" objects stored, what is their scope, and how long are rvalue references to these values valid?
As explained above, a "temporary object"'s scope is that of the expression that encloses it. For example, in the expression a(b * c) (assuming a is a function taking an rvalue reference as its single argument), b * c is an rvalue whose scope ends after the expression enclosing it, that is, a(...), is evaluated. After that, any remaining rvalue references to it that the function a may have somehow created out of its parameter are dangling and will cause your program to do funny things. In other words, as long as you don't abuse std::move or do other voodoo with rvalues, rvalue references are valid in the circumstances where you'd expect them to be.
I was reading Thomas Becker's article on rvalue references and their use. In it, he defines what he calls the if-it-has-a-name rule:
Things that are declared as rvalue reference can be lvalues or rvalues. The distinguishing criterion is: if it has a name, then it is an lvalue. Otherwise, it is an rvalue.
This sounds very reasonable to me. It also clearly identifies the rvalueness of an rvalue reference.
My questions are:
Do you agree with this rule? If not, can you give an example where this rule can be violated?
If there are no violations of this rule, can we use this rule to define the rvalueness/lvalueness of an expression?
This is one of the most common "rules of thumb" used to explain what is the difference between lvalues and rvalues.
The situation in C++ is much more complex than that, so this can't be anything but a rule of thumb. I'll try to summarize a couple of concepts and make it clear why this issue is so complex in the C++ world. First, let's recap a bit of what happened once upon a time.
At the beginning there was C
First, what "lvalue" and "rvalue" used to mean originally, in the world of programming languages in general?
In a simpler language like C or Pascal, the terms were used to refer to what could be placed on the left or on the right of an assignment operator.
In a language like Pascal where the assignment is not an expression but only a statement, the difference is pretty clear and it's defined in grammatical terms. An lvalue is a name of a variable, or a subscript of an array.
That's because only these two things could stand at the left of an assignment:
i := 42; (* ok *)
a[i] := 42; (* ok *)
42 := 42; (* no sense *)
In C, the same difference applies, and it is still pretty much grammatical in the sense that you could look at a line of code and tell if an expression would produce an lvalue or an rvalue.
i = 42; // ok, a variable
*p = 42; // ok, a pointer dereference
a[i] = 42; // ok, a subscript (which is a pointer dereference anyway)
s->var = 42; // ok, a struct member access
So what changed in C++?
Little languages grow up
In C++ things become much more complex and the difference is not grammatical anymore but involves the type checking process, for two reasons:
Anything can appear on the left of an assignment, as long as its type has a suitable overload of operator=
References
So this means that in C++ you can't say if an expression will produce an lvalue only by looking at its grammatical structure. For example:
f() = g();
is a statement that would have no sense in C but can be perfectly legal in C++ if, for example, f() returns a reference. That's how expressions like v[i] = j work for std::vector: the operator[] returns a reference to the element so you can assign to it.
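A minimal sketch of that situation follows. f, g and the global vector v are hypothetical stand-ins chosen so that f() returns a reference. Because of that, the call f() is an lvalue and can appear on the left of =, just like v[i] with std::vector's operator[].

```cpp
#include <cassert>
#include <vector>

std::vector<int> v{1, 2, 3};

// Hypothetical functions matching the f() = g() example.
int& f() { return v[0]; }  // returns an lvalue reference, so f() is an lvalue
int g() { return 42; }     // returns by value, so g() is a prvalue

void assign_through_calls() {
    f() = g();  // legal: assigns to v[0] through the returned reference
    v[1] = 7;   // std::vector::operator[] also returns a reference
    assert(v[0] == 42 && v[1] == 7);
}
```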
So what's the point of having a distinction between lvalues and rvalues anymore? The distinction is still relevant for basic types of course, but also to decide what can be bound to a non-const reference.
That's because you don't want to have legal code like:
int &x = 42;
x = 0; // Have we changed the meaning of a natural number??
So the language specifies carefully what is an lvalue and what isn't, and then says that only lvalues can be bound to non-const references. So the above code is not legal because an integer literal is not an lvalue so a non-const reference cannot be bound to it.
Note that const references are different, since they can bind to literals and temporaries (and local references even extend the lifetime of those temporaries):
int const&x = 42; // It's ok
So far, we've only touched on what already happened in C++98. The rules were already more complex than "if it has a name, it's an lvalue", since you have to consider references. So an expression returning a non-const reference is still considered an lvalue.
Also, other rules of thumb mentioned here don't work in all cases either. For example, "if you can take its address, it's an lvalue". If by "taking the address" you mean "applying operator&", then it might work, but don't trick yourself into thinking that you can never get hold of the address of a temporary: the this pointer inside a temporary's member function, for example, will point to it.
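The point about this can be shown with a hypothetical Probe type. It records its own address when it is constructed. &Probe{} would not compile, yet a member function called on the temporary sees a this pointer equal to that recorded address.

```cpp
#include <cassert>

// Hypothetical type that remembers the address it was constructed at.
struct Probe {
    const Probe* constructed_at = this;  // address recorded during construction
    bool still_at_same_address() const { return constructed_at == this; }
};

void temporary_has_address() {
    // &Probe{} would not compile, but inside a member function called on
    // the temporary, `this` is the temporary's address.
    assert(Probe{}.still_at_same_address());
}
```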
What changed in C++11
C++11 puts more complexity into the bin by adding the concept of an rvalue reference, that is, a reference that can be bound to an rvalue even if non-const. The fact that it can only be bound to an rvalue makes it both safe and useful. I don't think it's necessary to explain why rvalue references are useful, so let's move on.
The point here is that now we have many more cases to consider. So what is an rvalue now? The Standard actually distinguishes between different kinds of rvalues to be able to correctly state the behavior of rvalue references, overload resolution and template argument deduction in the presence of rvalue references. So we have terms like xvalue, prvalue and so on, which make things more complex.
What about our rules of thumb?
So "everything that has a name is an lvalue" can still be true, but it certainly isn't true that every lvalue has a name. A call to a function returning a non-const lvalue reference is an lvalue. A call to a function returning something by value creates a temporary and is an rvalue, and so is a call to a function returning an rvalue reference.
What about "temporaries are rvalues"? It's true, but non-temporaries can also be made into rvalues simply by casting the type (as std::move does).
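The cases in the two paragraphs above can be checked with decltype((e)), which is an lvalue reference type exactly when e is an lvalue. The IS_LVALUE macro, the variable global and the three functions are hypothetical, one per return style.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper: true exactly when e is an lvalue expression.
#define IS_LVALUE(e) std::is_lvalue_reference_v<decltype((e))>

// Hypothetical functions, one per return style.
std::string global = "g";
std::string& by_lvalue_ref() { return global; }
std::string by_value() { return global; }
std::string&& by_rvalue_ref() { return std::move(global); }

static_assert(IS_LVALUE(by_lvalue_ref()));     // no name, yet an lvalue
static_assert(!IS_LVALUE(by_value()));         // prvalue: creates a temporary
static_assert(!IS_LVALUE(by_rvalue_ref()));    // xvalue
static_assert(!IS_LVALUE(std::move(global)));  // a non-temporary made into an rvalue by a cast
```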
So I think that all these rules are useful if we keep in mind what they are: rules of thumb.
They'll always have some corner case where they don't apply, because to specify exactly what an rvalue is and what isn't, we can't avoid using the exact terms and rules used in the Standard. That's what they were written for!
While the rule covers the majority of cases, I can't agree with it in general:
The dereferencing of an anonymous pointer does not have a name, yet it is an lvalue:
foo(*new X); // Not allowed if foo expects an rvalue reference (example of the article)
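To observe this without leaking, this sketch dereferences the unnamed std::unique_ptr returned by std::make_unique instead of using new directly. The foo overload pair is hypothetical. The dereference has no name, yet it selects the lvalue overload.

```cpp
#include <cassert>
#include <memory>

struct X {};

// Hypothetical overload pair that reports the category of its argument.
int foo(X&)  { return 1; }  // chosen for lvalues
int foo(X&&) { return 2; }  // chosen for rvalues

void check_categories() {
    // The unique_ptr is an unnamed temporary, and so is its dereference,
    // but *ptr is an lvalue, so the X& overload is chosen.
    assert(foo(*std::make_unique<X>()) == 1);
    assert(foo(X{}) == 2);  // an unnamed prvalue
}
```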
Based on the Standard, and taking into account the special case of temporary objects being rvalues, I'd suggest updating the second sentence of the rule:
" ... The criterion is: if it designates a function or an object which is not of temporary nature, then it's an lvalue. ... ".
Question 1: That rule is strictly referring to classifying expressions of rvalue reference type, not expressions in general. I almost agree with it in this context ('almost' because there's a bit more to it, see the quote below). The precise wording is in a note in the Standard [Clause 5 paragraph 7]:
In general, the effect of this rule is that named rvalue references are treated as lvalues and unnamed rvalue references to objects are treated as xvalues; rvalue references to functions are treated as lvalues whether named or not.
(emphases mine, for obvious reasons)
Question 2: As you can see from the other answers and comments (some nice examples in there), there are issues with general, concise statements about the value category of an expression. Here's the way I think about it.
We need to look at the problem from the other side: instead of trying to specify what expressions are lvalues, list the kinds that are rvalues; lvalues are everything else.
First, a couple of definitions to keep things clear:
An object means a region of storage for data, not a function and not a reference (it's the definition in the Standard).
When I say an expression generates something, I mean it doesn't just name it or refer to it, but actually constructs and returns it as the result of a combination of operators, function calls (possibly constructor calls) or casts (possibly implicit casts).
Now, based primarily on [3.10] (but also quite a few other places in the Standard), an expression is an rvalue if and only if it is one of the following:
1. a value that is not associated with an object (like this, or literals like 7, not string ones);
2. an expression that generates an object by value, a.k.a. a temporary object;
3. an expression that generates an rvalue reference to an object;
4. recursively, one of the following expressions using an rvalue:
   - x.y, where x is an rvalue and y is a non-static member object;
   - x.*y, where x is an rvalue and y is a pointer to a member object;
   - x[y], where either x or y is an rvalue of array type (using the built-in [] operator).
That's it.
Well, technically, the following special cases are also rvalues, but I don't think they're relevant in practice:
5. a function call returning void, a cast to void, or a throw (obviously not lvalues; I'm not sure why I'd ever be interested in their value category in practice);
6. one of obj.mf, ptr->mf, obj.*pmf, or ptr->*pmf (mf is a non-static member function, pmf is a pointer to member function); here we're talking strictly about these forms, not the function call expressions that can be built with them, and you really can't do anything with these but make a function call, which is a different expression altogether (to which we need to apply the rules above).
And that's really it. Everything else is an lvalue. I find it easy enough to reason about expressions this way, as all categories above are easily recognizable. For example, it's easy to look at an expression, rule out the cases above, and decide it's an lvalue. Even for category 4, which has a longer description, the expressions are easily recognizable (I tried hard to make it a one-liner, but ultimately failed).
Expressions involving operators can be lvalues or rvalues depending on the exact operator being used. Built-in operators specify what happens in each case, but user-defined operator functions can change the rules. When determining the value category of an expression, both the structure of the expression and the types involved matter.
Notes:
Regarding category 1:
this in the example refers to the pointer value this, not *this.
String literals are lvalues because they're arrays of static storage duration, so they don't fit in category 1 (they're associated with objects).
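A compile-time sketch of the category 1 notes, using the `decltype((e))` rule (lvalue gives `T&`, prvalue gives plain `T`). `Widget` is a hypothetical type used only to get a `this` pointer.

```cpp
#include <type_traits>

// A string literal designates an array object with static storage duration,
// so it is an lvalue and its address can be taken.
const char (*literal_addr)[6] = &"hello";
static_assert(std::is_same<decltype(("hello")), const char (&)[6]>::value,
              "string literal: lvalue of type const char[6]");

// An integer literal is a prvalue: decltype((7)) is plain int.
static_assert(std::is_same<decltype((7)), int>::value, "7 is a prvalue");
// const int* bad = &7;  // ill-formed: the built-in & needs an lvalue

struct Widget {
    // Inside a member function, `this` is a prvalue of pointer type,
    // while *this is an lvalue.
    void check() {
        static_assert(std::is_same<decltype((this)), Widget*>::value, "this: prvalue");
        static_assert(std::is_same<decltype((*this)), Widget&>::value, "*this: lvalue");
    }
};
```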
Some examples related to categories 2 and 3:
Given the declaration int& f(int), the expression f(7) doesn't generate an object by value, so it doesn't fit in category 2; it does generate a reference, but it's not an rvalue reference, so category 3 doesn't apply either; the expression is an lvalue.
Given the declaration int&& f(int), the expression f(7) generates an rvalue reference; category 3 applies here, so the expression is an rvalue.
Given the declaration int f(int), the expression f(7) generates an object by value; category 2 applies here, the expression is an rvalue.
For casts, we can apply the same reasoning as for the three bullets above.
Given the declaration int&& a, using the expression a doesn't generate an rvalue reference; it just uses an identifier of reference type. Category 3 doesn't apply, the expression is an lvalue.
Lambda expressions generate closure objects by value - they are in category 2.
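The function-call and cast examples above can be verified at compile time. `f_lref`, `f_rref` and `f_val` are hypothetical declarations mirroring the three `f` declarations in the text; the check uses `decltype((e))`, which yields `T&` for lvalues, `T&&` for xvalues and `T` for prvalues.

```cpp
#include <type_traits>

// Hypothetical declarations mirroring the three cases in the text.
int&  f_lref(int);   // returns an lvalue reference
int&& f_rref(int);   // returns an rvalue reference
int   f_val(int);    // returns by value

int obj = 0;
int&& a = static_cast<int&&>(obj);   // a named rvalue reference

static_assert(std::is_same<decltype((f_lref(7))), int&>::value,  "lvalue");
static_assert(std::is_same<decltype((f_rref(7))), int&&>::value, "category 3: xvalue");
static_assert(std::is_same<decltype((f_val(7))),  int>::value,   "category 2: prvalue");
static_assert(std::is_same<decltype((a)),         int&>::value,  "named: lvalue");

// Casts follow the same pattern as the function calls.
static_assert(std::is_same<decltype((static_cast<int&>(obj))),  int&>::value,  "");
static_assert(std::is_same<decltype((static_cast<int&&>(obj))), int&&>::value, "");
static_assert(std::is_same<decltype((static_cast<int>(obj))),   int>::value,   "");
```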
Some examples related to category 4:
x->y is translated to (*x).y. *x is an lvalue (it doesn't fit in any of the categories above). So, if y is a non-static member object, x->y is an lvalue (it doesn't fit in category 4 because of *x and it doesn't fit in 6 because that one only talks about member functions).
In x.y, if y is a static member, then category 4 doesn't apply. Such an expression is always an lvalue, even if x is an rvalue (6 doesn't apply either, because it talks about non-static member functions).
In x.y, if y is of type T& or T&&, then it's not a member object (remember, objects, not references, not functions), so category 4 doesn't apply. Such an expression is always an lvalue, even if x is an rvalue and even if y is an rvalue reference.
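A sketch of the category 4 examples, assuming a C++17 compiler (where a member access on a prvalue class object is unambiguously an xvalue). `S`, `make_s` and `get_ptr` are hypothetical and only inspected through `decltype`, never called.

```cpp
#include <type_traits>

struct S {
    int m;          // non-static member object
    static int s;   // static member
    int& r;         // reference member: not a member object
};

S  make_s();    // hypothetical: returns a prvalue S
S* get_ptr();   // hypothetical: returns a pointer

// x.y with x an rvalue and y a non-static member object: xvalue.
static_assert(std::is_same<decltype((make_s().m)), int&&>::value, "");
// Static member: always an lvalue.
static_assert(std::is_same<decltype((make_s().s)), int&>::value, "");
// Reference member: always an lvalue.
static_assert(std::is_same<decltype((make_s().r)), int&>::value, "");
// x->y is (*x).y, and *x is an lvalue, so the result is an lvalue.
static_assert(std::is_same<decltype((get_ptr()->m)), int&>::value, "");
```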
Category 4 used to be a bit different in C++11, but I believe this wording is correct for C++14. (If you insist on knowing, the result of subscripting into an rvalue array used to be an lvalue in C++11, but is an xvalue in C++14 - issue 1213.)
Further separating rvalues into xvalues and prvalues is relatively straightforward for C++14: categories 1, 2, 5 and 6 are prvalues, 3 and 4 are xvalues. Things were slightly different for C++11: category 4 was split between prvalues, xvalues and lvalues (changed as noted above, and also as part of the resolution of issue 616). This can be important, as it can affect the type you get back from decltype, for example.
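Because the category affects what `decltype` returns, `decltype` can also be turned around to report the category. A minimal sketch; the `category` trait and the `VALUE_CATEGORY` macro are hypothetical helpers, not standard facilities. Note the double parentheses: `decltype(x)` on a bare name yields the declared type, not the category.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Maps the type produced by decltype((e)) back to the value category of e.
template <class T> struct category      { static const char* name() { return "prvalue"; } };
template <class T> struct category<T&>  { static const char* name() { return "lvalue"; } };
template <class T> struct category<T&&> { static const char* name() { return "xvalue"; } };

// Hypothetical macro; the extra parentheses make decltype reflect the category.
#define VALUE_CATEGORY(...) std::string(category<decltype((__VA_ARGS__))>::name())

int n = 0;
int&& rref = std::move(n);

// Without the extra parentheses, decltype reports the declared type instead.
static_assert(std::is_same<decltype(rref), int&&>::value, "declared type");
static_assert(std::is_same<decltype((rref)), int&>::value, "category: lvalue");
```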
All references are to N4140, the last C++14 draft before publication.
I first found the last two special rvalue cases here (everything's also in the Standard, of course, but harder to find). Note that not everything on that page is accurate for C++14. It also contains a very nice summary on the rationale behind the primary value categories (at the top).
Why don't rvalues have a memory address? Are they not loaded into RAM when the program executes, or does it refer to the values stored in processor registers?
Your question ("Why don't rvalues have a memory address?") is a bit confused. An rvalue is a kind of expression. Expressions don't have addresses: objects have addresses. It would be more correct to ask "why can one not apply the address-of operator to an rvalue expression?"
The answer to that is rather simple: you can only take the address of an object and not all rvalue expressions refer to objects (for example, the expression 42 has a value but does not refer to an object).
Some rvalue expressions do refer to objects, but such objects lack persistence. An object referred to by an rvalue expression is a temporary object and is destroyed at the end of the expression in which it is created. Such objects do indeed have addresses (you can easily discover this by calling a member function on a temporary object; the this pointer must point to the temporary object and thus the temporary object must have an address).
This is the fundamental difference between lvalue expressions and rvalue expressions. Lvalue expressions refer to objects that have persistence: the object to which an lvalue expression refers persists beyond a single expression.
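The member-function argument above can be sketched directly. `Probe` is a hypothetical type whose member function returns `this`; calling it on a temporary shows the temporary has an address for as long as the full-expression lasts.

```cpp
#include <cassert>

// Hypothetical type whose member functions expose the object they run on.
struct Probe {
    int value = 7;
    const Probe* self() const { return this; }
    int read_through_this() const { return this->value; }
};

// Inside the call, `this` points at the temporary Probe{}, which lives
// until the end of the full-expression.
bool temporary_has_address() {
    return Probe{}.self() != nullptr;
}
```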
Think of an rvalue as the value of an expression. The value itself doesn't have an address, but the objects involved in the expression do. You can take the address of an object, even if it is a temporary object.
Consider this:
const int & i = 10; //ok
Here, 10 is an rvalue, so it might appear that &i is the address of 10. It is not: &i is the address of a temporary object of type int, which is created from the expression 10. And since an rvalue cannot be bound to a non-const reference, I use const. That means the following is an error:
int & i = 10; //error
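A runnable version of the example above, wrapped in a hypothetical function `read_back`: binding a const reference to the rvalue 10 materializes a temporary `int` whose lifetime is extended to that of the reference, so its address stays usable inside the function.

```cpp
#include <cassert>

int read_back() {
    const int& i = 10;     // ok: a temporary int is created from 10
    const int* p = &i;     // the address of that temporary, not of "10"
    // int& j = 10;        // error: a non-const lvalue reference cannot bind to an rvalue
    return *p;             // still valid: the temporary lives as long as i
}
```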
The question mixes two different aspects: the "specification" and the "implementation".
The "specification" defines abstract rules describing how the language behaves with respect to an abstract machine.
Adapting that "abstract machine" to the "real one" underneath is the job of the compiler (not the language).
What the specification enforces is that, from the language's standpoint, "storage" (a piece of memory with a proper address) is given only to objects that have a name (for as long as the scope of that name lasts) or that are dynamically allocated with an explicit request (new).
Everything else is "temporary": it can be assigned, copied and moved like an object, but it is not required to exist in a well-defined and stable place. At least, not for the purposes of the language.
Of course, it has to be physically stored somewhere, so with appropriate casting or conversion you can try to guess a memory address. But the language specification does not guarantee any consistent behavior if you try to actively use it.
That means different compilers can behave differently and optimize as best they can for the real machine they target.
What do you mean? Rvalues do have an address. Ever tried:
Type const& value = rvalue;
Type const* address = &value;
Simply take this case:
int a = 1 + 2;
1+2 gets resolved to 3.
Ask yourself:
Is 3 an object?
Where would 3 be located in memory?
When you need the address of an object, you use &.
If rvalues were addressable, you could declare a pointer to wherever your computer decided to store 3:
int* a = &3;
Does that seem right? :)
I know that the code below is illegal:
#include <string>
void doSomething(std::string *s){}
int main()
{
doSomething(&std::string("Hello World")); // error: taking the address of a temporary
return 0;
}
The reason is that we are not allowed to take the address of a temporary object. But my question is WHY?
Let us consider the following code:
class empty{};
int main()
{
empty x = empty(); //most compilers would elide the temporary
return 0;
}
The accepted answer here mentions
"usually the compiler consider the temporary and the copy constructed as two objects that are located in the exact same location of memory and avoid the copy."
According to this statement, it can be concluded that the temporary was present in some memory location (hence its address could have been taken) and that the compiler decided to eliminate the temporary by creating an in-place object at the same location where the temporary was.
Does this contradict the fact that the address of a temporary cannot be taken?
I would also like to know how return value optimization is implemented. Can someone provide a link or an article about RVO implementation?
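For reference, under C++17 rules (an assumption about the compiler in use) the `empty x = empty();` pattern involves no temporary at all: initializing an object from a prvalue of the same type constructs it directly in place, and the same holds for a function returning a prvalue. Common ABIs typically implement the return case by having the caller pass the address of the destination storage as a hidden argument. `NoMove` and `make_no_move` are hypothetical; the type can be neither copied nor moved, so this only compiles because no temporary is ever copied or moved.

```cpp
#include <cassert>
#include <type_traits>

// A hypothetical type that can be neither copied nor moved.
struct NoMove {
    int tag = 0;
    NoMove() = default;
    NoMove(const NoMove&) = delete;
    NoMove(NoMove&&) = delete;
};

// Returning a prvalue: the caller's object is initialized directly.
NoMove make_no_move() { return NoMove{}; }

static_assert(!std::is_copy_constructible<NoMove>::value, "not copyable");
static_assert(!std::is_move_constructible<NoMove>::value, "not movable");

int elided_tag() {
    NoMove x = NoMove();          // same shape as `empty x = empty();`
    NoMove y = make_no_move();    // no temporary, no copy, no move
    return x.tag + y.tag;
}
```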
&std::string("Hello World")
The problem with this isn't that std::string("Hello World") yields a temporary object. The problem is that the expression std::string("Hello World") is an rvalue expression that refers to a temporary object.
You cannot take the address of an rvalue because not all rvalues have addresses (and not all rvalues are objects). Consider the following:
42
This is an integer literal, which is a primary expression and an rvalue. It is not an object, and it (likely) does not have an address. &42 is nonsensical.
Yes, an rvalue may refer to an object, as is the case in your first example. The problem is that not all rvalues refer to objects.
Long answer:
[...] it can be concluded that the temporary was present in some memory location
By definition:
"temporary" stands for: temporary object
an object occupies a region of storage
all objects have an address
So it doesn't take a very elaborate proof to show that a temporary has an address. This is by definition.
OTOH, you are not just fetching the address; you are using the built-in address-of operator. The specification of the built-in address-of operator says that you must have an lvalue:
&std::string() is ill-formed because std::string() is an rvalue. At runtime, evaluating this expression creates a temporary object as a side effect, and the expression yields an rvalue that refers to the object created.
&(std::string() = "Hello World") is well-formed because std::string() = "Hello World" is an lvalue. By definition, an lvalue refers to an object. The object this lvalue refers to is the exact same temporary.
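A runnable sketch of the well-formed variant. `length_of` is a hypothetical stand-in for the question's `doSomething`. `std::string::operator=` returns `std::string&`, an lvalue referring to the temporary, so the built-in `&` is accepted; the pointer is only valid until the end of the full-expression, which is why it is used inside the same call.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for doSomething from the question.
std::size_t length_of(std::string* s) { return s->size(); }

std::size_t assigned_temporary_length() {
    // std::string() is a prvalue, but the assignment yields an lvalue that
    // refers to the same temporary, so & can be applied to it.
    return length_of(&(std::string() = "Hello World"));
}
```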
Short answer:
This is the rule. It doesn't need the (incorrect, unsound) justifications some people are making up.
$5.3.1/2 - "The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified-id."
Expressions such as
99
A() // where A is a user defined class with an accessible
// and unambiguous default constructor
are all rvalues.
$3.10/2 - "An lvalue refers to an object or function. Some rvalue expressions—those of class or cv-qualified class type—also refer to objects."
And this is my guess: even though rvalues may occupy storage (e.g. in the case of objects), the C++ standard does not allow taking their address, to maintain uniformity with the built-in types.
Here's something interesting though:
#include <iostream>
void f(const double &dbl){
std::cout << &dbl;
}
int main(){
f(42);
}
The expression '42' is an rvalue that is bound to a 'reference to const double', and hence a temporary object of type double is created. The address of this temporary can be taken inside the function 'f'. But note that inside 'f' it is not really a temporary or an rvalue any more: the moment it is given a name such as 'dbl', it is treated as an lvalue expression inside 'f'.
Here's something on NRVO (similar)
A temporary is an example of a C++ "rvalue." It is supposed to purely represent a value within its type. For example, if you write 42 in two different places in your program, the instances of 42 are indistinguishable despite probably being in different locations at different times. The reason you can't take the address is that you need to do something to specify that there should be an address, because otherwise the concept of an address is semantically unclean and unintuitive.
The language requirement that you "do something" is somewhat arbitrary, but it makes C++ programs cleaner. It would suck if people made a habit of taking addresses of temporaries. The notion of an address is intimately bound with the notion of a lifetime, so it makes sense to make "instantaneous" values lack addresses. Still, if you are careful, you can acquire an address and use it within the lifetime that the standard does allow.
There are some fallacies in the other answers here:
"You cannot take the address of an rvalue because not all rvalues have addresses." — Not all lvalues have addresses either. A typical local variable of type int which participates in a simple loop and is subsequently unused will likely be assigned a register but no stack location. No memory location means no address. The compiler will assign it a memory location if you take its address, though. The same is true of rvalues, which may be bound to const references. The "address of 42" may be acquired as such:
int const *fortytwo_p = & static_cast<int const &>( 42 );
Of course, the address is invalid after the ; because temporaries are temporary, and this is likely to generate extra instructions as the machine may pointlessly store 42 onto the stack.
It's worth mentioning that C++0x cleans up the concepts by defining the prvalue to be the value of the expression, independent of storage, and the glvalue to be the storage location independent of its contents. This was probably the intent of the C++03 standard in the first place.
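The "address of 42" trick can be used safely as long as the pointer does not outlive the full-expression. A minimal sketch, with `deref` as a hypothetical helper: `static_cast<int const&>(42)` binds a const reference to a materialized temporary, the cast result is an lvalue, and the temporary survives until the end of the statement, so the call below reads valid memory.

```cpp
// Hypothetical helper that reads through a pointer.
int deref(const int* p) { return *p; }

int address_of_forty_two() {
    // The temporary created for 42 lives until the end of this full-expression,
    // so passing its address to deref is safe here (but not after the ';').
    return deref(&static_cast<int const&>(42));
}
```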
"Then you could modify the temporary, which is pointless." — Actually temporaries with side effects are useful to modify. Consider this:
if ( istringstream( "42" ) >> my_int )
This is a nice idiom for converting a number and checking that the conversion succeeded. It involves creating a temporary, calling a mutating function on it, and then destroying it. Far from pointless.
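A self-contained version of the idiom, wrapped in a hypothetical function `parse_int`. The temporary `std::istringstream` is created, mutated by `operator>>`, tested through its conversion to `bool`, and destroyed, all in one full-expression.

```cpp
#include <sstream>
#include <string>

// Returns true if text starts with an int, storing it in out.
bool parse_int(const std::string& text, int& out) {
    return static_cast<bool>(std::istringstream(text) >> out);
}
```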
It can be taken, but once the temporary ceases to exist, you have a dangling pointer left.
EDIT
For the downvoters:
const std::string &s = std::string("h");
&s;
is legal. s is a reference to a temporary. Hence, a temporary object's address can be taken.
EDIT2
Bound references are aliases to what they are bound to. Hence, a reference to a temporary is another name for that temporary. Hence, the second statement in the paragraph above holds.
OP's question is about temporaries (in terms of the words he uses), and his example is about rvalues. These are two distinct concepts.
One reason is that your example would give the method write access to the temporary, which is pointless.
The citation you provided isn't about this situation, it is a specific optimization that is permitted in declarators with initializers.
Why is taking the address of a temporary illegal?
The scope of temporary variables is limited to some particular method or block; as soon as the method call returns, the temporary variables are removed from memory, so returning the address of a variable that no longer exists in memory does not make sense. The pointer still holds the address, but that memory may now contain some garbage value.