I have been trying to learn the associativity of operators in C++, and I came across this code segment:
int a = 10;
int C = a++ + ++a + ++a +a;
I have also read that ++a is right-to-left associative and a++ is left-to-right associative, and that + is left-to-right associative. But I don't understand how to apply this knowledge to this problem.
I am confused about how my compiler will parse this statement.
I am also puzzled: since spaces usually don't matter much, why does removing them, as in:
int C = a+++++a+++a+a; //error: lvalue required as increment operand
generate an error?
Please help me understand this concept.
Thanks!
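First of all, spaces do matter: they separate tokens, which helps the compiler resolve ambiguity.
Operator precedence and associativity only determine how the expression is grouped: a++ + ++a + ++a + a is parsed as ((a++ + ++a) + ++a) + a. They do not determine the order in which the operands are evaluated, or when the side effects of ++ are applied; that order is unspecified. Because a is modified several times in this expression with no sequencing between those modifications, the behavior is undefined. One compiler might, for example, apply all the pre-increments first and apply the post-increment's side effect at the end of the statement; the explanation below describes that possibility, not a guaranteed result.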
Explanation
++a increments a and yields an lvalue referring to a, so any later read of a sees the incremented value.
In your case there are two ++a, so a is incremented to 12. If the compiler applies both of them before reading any operand, every a in the expression holds 12, giving C = 48.
a++ first returns an rvalue whose value is a, that is the old value, and then increments a at an unspecified time before the next full expression.
In your case, if you read a after the statement, it will be 13: the expression contains two ++a and one a++, so a is incremented three times in total.
For example:
int a = 10;
int C = a++ + ++a + ++a +a; // undefined behavior; a compiler following the order above reads every a as 12
int B = a + a; // here a == 13, after all three increments have been applied
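Because the original statement is undefined, no particular result can be relied on. As a minimal sketch (the helper name sequenced_sum is hypothetical), one way to make the computation predictable is to give each increment its own statement, which fixes one explicit left-to-right order:

```cpp
#include <cassert>

// Hypothetical helper: computes a++ + ++a + ++a + a with one increment
// per statement, so every step is sequenced left to right.
// Returns the sum; `a` is updated through the reference.
int sequenced_sum(int& a) {
    int t1 = a++;  // t1 gets the old value of a, then a is incremented
    int t2 = ++a;  // a is incremented, then t2 gets the new value
    int t3 = ++a;  // same again
    return t1 + t2 + t3 + a;  // a now holds its final value
}
```

Starting from a = 10, this computes 10 + 12 + 13 + 13 and leaves a at 13; choosing a different order would give a different sum.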
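Regarding the error
Without spaces, the compiler's tokenizer greedily takes the longest token it can (the "maximal munch" rule), so a+++++a is split into a ++ ++ + a, that is (a++)++ + a. The second ++ is applied to the result of a++, which is not an lvalue, hence "lvalue required as increment operand".
PS: an lvalue is an expression that designates an object; a modifiable lvalue can be the target of an assignment.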
In C/C++ the pre-increment (decrement) and post-increment (decrement) operators require an lvalue expression as their operand. Providing an rvalue or a const-qualified variable results in a compilation error.
Putting aside the fact that it would result in UB (there are no sequence points between the multiple increments of the same variable),
a+++++a+++a+a
is tokenized greedily (the lexer always takes the longest possible token) and therefore parsed as
((a++)++) + (a++) + a + a
and (a++)++ is illegal when a is of a built-in type such as int.
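The greedy tokenization can be observed without undefined behavior by using two different variables. This is a small sketch; greedy_plus is a hypothetical name:

```cpp
#include <cassert>

// The lexer takes the longest token it can ("maximal munch"),
// so `a+++b` is split into a, ++, +, b and parsed as (a++) + b.
int greedy_plus(int& a, int b) {
    return a+++b;  // same as (a++) + b; a is incremented as a side effect
}
```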
Related
This piece of code:
int scores[] {1,2,3,4};
int *score_ptr {scores};
//let's say that initial value of score_ptr is 1000
std::cout<<*score_ptr++;
produces the output:
1
Since * and ++ have the same precedence and associate right to left, shouldn't we apply the ++ operator first, that is, increment the pointer first and then * (dereference) it?
So score_ptr would be increased to 1004 (with a 4-byte int), and dereferencing it would give the second element of scores, which is 2.
How and why does this give me output of 1 instead of 2?
As * and ++ have same precedence
No, postfix operator++ has higher precedence than operator*, so *score_ptr++ is equivalent to *(score_ptr++). Note that postfix operator++ increments the operand and returns the original value, so *(score_ptr++) dereferences the original pointer and gives the value 1.
The result is a prvalue copy of the original value of the operand.
On the other hand, prefix operator++ returns the incremented value. If you change the code to *++score_ptr (which is equivalent to *(++score_ptr)), the result will be 2 (which might be what you expected).
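Both forms can be checked side by side. In this sketch the helper names deref_post and deref_pre are made up, and each takes the pointer by reference so the caller can see where it ends up:

```cpp
#include <cassert>

// *p++ parses as *(p++): dereferences the original pointer; p has moved afterwards.
int deref_post(const int*& p) { return *p++; }

// *++p parses as *(++p): advances p first, then dereferences the new position.
int deref_pre(const int*& p) { return *++p; }
```

In both cases the pointer ends up one element further along; only the element that gets read differs.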
The ++ binds first because it has higher precedence, so the expression is equivalent to *(score_ptr++). But it is a post-increment: its value is the original pointer, and that original pointer is what gets dereferenced. The pointer itself is guaranteed to have been incremented by the end of the full expression (the ;).
If you use
std::cout << *++score_ptr;
Then you have a pre-increment: the pointer is incremented before its value is used, and the output will be 2. This is equivalent to *(++score_ptr).
Note that it's always a good idea to use parentheses: they make the code clearer and avoid misinterpretations.
Take these three snippets of C code:
1) a = b + a++
2) a = b + a; a++
3) a = b + a, a++
Everyone knows that example 1 is a Very Bad Thing, and clearly invokes undefined behavior. Example 2 has no problems. My question is regarding example 3. Does the comma operator work like a semicolon in this kind of expression? Are 2 and 3 equivalent or is 3 just as undefined as 1?
Specifically, I was considering this regarding something like free(foo), foo = bar. This is basically the same problem as above. Can I be sure that foo is freed before it's reassigned, or is this a clear sequence point problem?
I am aware that both examples are largely pointless and it makes far more sense to just use a semicolon and be done with it. I'm just asking out of curiosity.
Case 3 is well defined.
First, let's look at how the expression is parsed:
a = b + a, a++
The comma operator , has the lowest precedence, followed by the assignment operator =, the addition operator + and the postincrement operator ++. So with the implicit parentheses, it is parsed as:
(a = (b + a)), (a++)
From here, section 6.5.17 (paragraph 2) of the C standard, regarding the comma operator, says the following:
The left operand of a comma operator is evaluated as a void expression; there is a sequence point between its evaluation and that of the right operand. Then the right operand is evaluated; the result has its type and value.
Section 5.14 p1 of the C++11 standard has similar language:
A pair of expressions separated by a comma is evaluated left-to-right; the left expression is a discarded-value expression. Every value computation and side effect associated with the left expression is sequenced before every value computation and side effect associated with the right expression. The type and value of the result are the type and value of the right operand; the result is of the same value category as its right operand, and is a bit-field if its right operand is a glvalue and a bit-field.
Because of the sequence point, a = b + a is guaranteed to be fully evaluated before a++ in the expression a = b + a, a++.
Regarding free(foo), foo = bar, this also guarantees that foo is freed before a new value is assigned.
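A minimal C++ sketch of both cases (the comma operator is sequenced the same way in C). The function names comma_update and replace_buffer are hypothetical:

```cpp
#include <cassert>
#include <cstdlib>

// (a = (b + a)), (a++): the comma operator sequences its left operand
// completely before its right operand, so this is well-defined.
int comma_update(int a, int b) {
    a = b + a, a++;
    return a;
}

// free(foo), foo = bar: the old block is released before foo is overwritten.
// Returns the new pointer value.
int* replace_buffer(int* foo, int* bar) {
    std::free(foo), foo = bar;
    return foo;
}
```

With a = 1 and b = 2, comma_update first stores 3 and then increments it to 4.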
a = b + a, a++; is well-defined, but a = (b + a, a++); can be undefined.
First of all, the operator precedence makes the expression equivalent to (a = (b+a)), a++;, where + has the highest precedence, followed by =, followed by ,. The comma operator includes a sequence point between the evaluation of its left and right operand. So the code is, uninterestingly, completely equivalent to:
a = b + a;
a++;
Which is of course well-defined.
Had we instead written a = (b + a, a++);, the sequence point in the comma operator wouldn't save the day, because the expression would then be equivalent to
(void)(b + a);
a = a++;
In C, and in C++14 or older, a = a++ is unsequenced (see C11 6.5.16/3), which makes it undefined behavior (per C11 6.5/2). Note that the C++11 and C++14 wording on this was badly formulated and ambiguous.
In C++17 or later, the operands of the = operator are sequenced right to left, so this is well-defined.
All of this assumes that no C++ operator overloading takes place. If it does, the arguments to the overloaded operator function are evaluated, a sequence point takes place before the function is called, and what happens from there depends on the internals of that function.
I have been fooling around with some code and saw something that I don't understand the "why" of.
int i = 6;
int j;
int *ptr = &i;
int *ptr1 = &j;
j = i++;
//now j == 6 and i == 7. Straightforward.
What if you put the operator on the left side of the equals sign?
++ptr = ptr1;
is equivalent to
(ptr = ptr + 1) = ptr1;
whereas
ptr++ = ptr1;
is equivalent to
ptr = ptr + 1 = ptr1;
The postfix version produces a compilation error, and I get it: you've got a constant "ptr + 1" on the left side of an assignment operator. Fair enough.
The prefix one compiles and WORKS in C++. Yes, I understand it's messy and you're dealing with unallocated memory, but it works and compiles. In C this does not compile, returning the same error as the postfix "lvalue required as left operand of assignment". This happens no matter how it's written, expanded out with two "=" operators or with the "++ptr" syntax.
What is the difference between how C handles such an assignment and how C++ handles it?
In both C and C++, the result of x++ is an rvalue, so you can't assign to it.
In C, ++x is equivalent to x += 1 (C standard §6.5.3.1/p2; all C standard cites are to WG14 N1570). In C++, ++x is equivalent to x += 1 if x is not a bool (C++ standard §5.3.2 [expr.pre.incr]/p1; all C++ standard cites are to WG21 N3936).
In C, the result of an assignment expression is an rvalue (C standard §6.5.16/p3):
An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment, but is not an lvalue.
Because it's not an lvalue, you can't assign to it: (C standard §6.5.16/p2 - note that this is a constraint)
An assignment operator shall have a modifiable lvalue as its left operand.
In C++, the result of an assignment expression is an lvalue (C++ standard §5.17 [expr.ass]/p1):
The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue referring to the left operand.
So ++ptr = ptr1; is a diagnosable constraint violation in C, but does not violate any diagnosable rule in C++.
However, pre-C++11, ++ptr = ptr1; has undefined behavior, as it modifies ptr twice between two adjacent sequence points.
In C++11, the behavior of ++ptr = ptr1 becomes well defined. It's clearer if we rewrite it as
(ptr += 1) = ptr1;
Since C++11, the C++ standard provides that (§5.17 [expr.ass]/p1)
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation.
So the assignment performed by the = is sequenced after the value computation of ptr += 1 and ptr1. The assignment performed by the += is sequenced before the value computation of ptr += 1, and all value computations required by the += are necessarily sequenced before that assignment. Thus, the sequencing here is well-defined and there is no undefined behavior.
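To try this without stepping outside valid memory, the sketch below points ptr into an array; the function name assign_through_prefix is hypothetical. It assumes a C++11 or later compiler, and as explained above it would not compile as C:

```cpp
#include <cassert>

// In C++, ++ptr behaves like ptr += 1 and yields an lvalue, so it may
// appear on the left of =. Since C++11 the increment is sequenced before
// the assignment, so ptr simply ends up equal to ptr1.
int* assign_through_prefix(int* ptr, int* ptr1) {
    ++ptr = ptr1;  // invalid in C: there the result of ++ptr is not an lvalue
    return ptr;
}
```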
In C, the results of pre- and post-increment are rvalues, and we cannot assign to an rvalue; we need an lvalue (also see: Understanding lvalues and rvalues in C and C++). We can see this by going to the draft C11 standard section 6.5.2.4 Postfix increment and decrement operators, which says (emphasis mine going forward):
The result of the postfix ++ operator is the value of the operand. [...] See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. [...]
So the result of post-increment is a value, which is synonymous with rvalue, and we can confirm this by going to section 6.5.16 Assignment operators, which the paragraph above points us to for further understanding of constraints and results. It says:
[...] An assignment expression has the value of the left operand after the assignment, but is not an lvalue.[...]
which further confirms the result of post-increment is not an lvalue.
For pre-increment we can see from section 6.5.3.1 Prefix increment and decrement operators which says:
[...]See the discussions of additive operators and compound assignment for information on constraints, types, side effects, and conversions and the effects of operations on pointers.
It also points back to 6.5.16, like post-increment does, and therefore the result of pre-increment in C is also not an lvalue.
In C++, post-increment is also an rvalue, more specifically a prvalue. We can confirm this by going to section 5.2.6 Increment and decrement, which says:
[...]The result is a prvalue. The type of the result is the cv-unqualified version of the type of the operand[...]
With respect to pre-increment, C and C++ differ. In C the result is an rvalue, while in C++ the result is an lvalue, which explains why ++ptr = ptr1; works in C++ but not in C.
For C++ this is covered in section 5.3.2 Increment and decrement which says:
[...]The result is the updated operand; it is an lvalue, and it is a bit-field if the operand is a bit-field.[...]
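One way to see the difference in value category directly is decltype, which for an expression other than a plain name yields T& for an lvalue and T for a prvalue. This is a sketch; the namespace demo is made up for illustration. The operand of decltype is never evaluated, so i is not actually incremented:

```cpp
#include <cassert>
#include <type_traits>

namespace demo {
int i = 0;

// decltype inspects the expression's type and value category without evaluating it.
static_assert(std::is_same<decltype(++i), int&>::value,
              "prefix ++ yields an lvalue referring to i");
static_assert(std::is_same<decltype(i++), int>::value,
              "postfix ++ yields a prvalue holding the old value");
}  // namespace demo
```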
To understand whether:
++ptr = ptr1;
is well-defined or not in C++, we need two different approaches: one for pre-C++11 and one for C++11.
Pre-C++11, this expression invokes undefined behavior, since it modifies the same object more than once between two adjacent sequence points. We can see this by going to section 5 Expressions of a pre-C++11 draft standard, which says:
Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified. Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined. [ Example:
i = v[i++]; // the behavior is undefined
i = 7, i++, i++; // i becomes 9
i = ++i + 1; // the behavior is undefined
i = i + 1; // the value of i is incremented
—end example ]
We are incrementing ptr and then assigning to it, which is two modifications, and in this case the next sequence point is at the end of the full expression, after the ;.
For C++11, we should go to defect report 637: Sequencing rules and example disagree, which was the defect report that resulted in:
i = ++i + 1;
becoming well-defined behavior in C++11, whereas prior to C++11 it was undefined behavior. The explanation in this report is one of the best I have ever seen, and reading it many times was enlightening and helped me understand many concepts in a new light.
The logic that led to this expression becoming well-defined behavior goes as follows:
The assignment side-effect is required to be sequenced after the value computations of both its LHS and RHS (5.17 [expr.ass] paragraph 1).
The LHS (i) is an lvalue, so its value computation involves computing the address of i.
In order to value-compute the RHS (++i + 1), it is necessary to first value-compute the lvalue expression ++i and then do an lvalue-to-rvalue conversion on the result. This guarantees that the incrementation side-effect is sequenced before the computation of the addition operation, which in turn is sequenced before the assignment side effect. In other words, it yields a well-defined order and final value for this expression.
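A small sketch of the expression from the defect report, assuming a C++11 or later compiler (some compilers may still emit a sequencing warning here). The function name increment_then_add is hypothetical:

```cpp
#include <cassert>

// Following the reasoning above: ++i (including its side effect) completes
// before its value is read for the addition, and the addition completes
// before the assignment to i. Well-defined since C++11, undefined before.
int increment_then_add(int i) {
    i = ++i + 1;
    return i;
}
```

Starting from 5, ++i gives 6, the addition gives 7, and 7 is stored back into i.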
The logic is somewhat similar for:
++ptr = ptr1;
The value computations of the LHS and RHS are sequenced before the assignment side-effect.
The RHS ptr1 is value-computed by reading the pointer stored in ptr1.
The LHS ++ptr is equivalent to ptr += 1, and the assignment performed by += is sequenced before the value computation of ptr += 1 (the same paragraph of 5.17 [expr.ass]). The LHS is an lvalue, so its value computation only determines which object is assigned to (ptr itself). This guarantees that the incrementation side effect is sequenced before the assignment side effect of =. In other words, it yields a well-defined order and final value for this expression.
Note
The OP said:
Yes, I understand it's messy and you're dealing with unallocated memory, but it works and compiles.
Pointers to non-array objects are treated as arrays of size one by the additive operators. I am going to quote the draft C++ standard, but C11 has almost exactly the same text. From section 5.7 Additive operators:
For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
It further tells us that pointing one past the end of an array is valid, as long as you don't dereference the pointer:
[...]If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
So after
++ptr;
ptr is still a valid pointer; it just must not be dereferenced.
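The rule can be illustrated with a single int treated as an array of length one. This is a sketch; steps_past_object is a hypothetical name, and the pointer is only compared and subtracted, never dereferenced:

```cpp
#include <cassert>
#include <cstddef>

// A pointer to a non-array object behaves like a pointer to the first
// element of an array of length one, so ++ptr yields a valid
// one-past-the-end pointer.
std::ptrdiff_t steps_past_object() {
    int x = 42;
    int* ptr = &x;
    ++ptr;             // one past the end of the "array" {x}: still valid
    return ptr - &x;   // pointer subtraction is allowed; *ptr would be UB
}
```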
Consider the following code snippet:
int a,i;
a = 5;
(i++) = a;
(++i) = a;
cout<<i<<endl;
The line (++i) = a compiles and gives 5 as output.
But (i++) = a gives the compilation error "error: lvalue required as left operand of assignment".
I am not able to find the reason for this difference in behavior. I would be grateful if someone could explain it.
The expression i++ evaluates to the value of i prior to the increment operation. That value is a temporary (which is an rvalue) and you cannot assign to it.
++i works because that expression evaluates to i after it has been incremented, and i can be assigned to (it's an lvalue).
More on lvalues and rvalues on Wikipedia.
According to the C++ standard, prefix ++ yields an lvalue (which is different from C), but postfix ++ does not. More generally, C++ takes the point of view that anything which changes an lvalue parameter, and has as its value the value of that parameter, results in an lvalue. So ++i is an lvalue (since the resulting value is the new value of i), but i++ is not (since the resulting value is not the new value, but the old).
All of this, of course, applies to the built-in ++ operators. If you overload, it depends on the signatures of your overloads (but a correctly designed overloaded ++ will behave like the built-in ones).
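For a user-defined type, following the built-in convention usually means the prefix overload returns a reference to *this, while the postfix overload (the one with the dummy int parameter) returns a copy of the old value. A sketch with a hypothetical Counter type:

```cpp
#include <cassert>

// Hypothetical Counter type whose overloads mirror the built-in ++.
struct Counter {
    int value = 0;

    // Prefix: increment, then return the object itself (an lvalue reference).
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Postfix: save a copy, increment, then return the saved copy by value.
    Counter operator++(int) {
        Counter old = *this;
        ++value;
        return old;
    }
};
```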
Of course, neither (++i) = a; nor (i++) = a; in your example is legal: both use the value of an uninitialized variable (i), which is undefined behavior, and, before C++11, both modify i twice without an intervening sequence point.
I think everyone here knows that --i is an lvalue expression while i-- is an rvalue expression. But I read the assembly code of the two expressions and found that they are compiled to the same assembly code:
mov eax,dword ptr [i]
sub eax,1
mov dword ptr [i],eax
In the C99 language standard, an lvalue is defined as an expression with an object type or an incomplete type other than void.
So I conclude that --i returns a value whose type is other than void, while i-- returns a value which is void, or maybe a temporary variable.
However, when I write an assignment such as i--=5, the compiler gives me an error saying i-- is not an lvalue. I do not know why it is not, or why the return value is a temporary variable. How does the compiler make such a judgement? Can anybody give me an explanation at the assembly language level? Thanks!
Left value? Right value?
If you are talking about lvalues and rvalues, then the property of being an lvalue or rvalue applies to the result of an expression, meaning that you have to consider the results of --i and i--. And in the C language both --i and i-- are rvalues. So, your question is based on an incorrect premise in the realm of the C language. --i is not an lvalue in C. I don't know what point you are trying to make by referring to the C99 standard, since it clearly states that neither is an lvalue. Also, it is not clear what you mean by i-- returning a void. No, the built-in postfix -- never returns void.
The lvalue vs. rvalue distinction in case of --i and i-- exists in C++ only.
Anyway, if you are looking at mere --i; and i--; expression statements, you are not using the results of these expressions. You are discarding them. The only point to use standalone --i and i-- is their side-effects (decrement of i). But since their side-effects are identical, it is completely expected that the generated code is the same.
If you want to see the difference between --i and i-- expressions, you have to use their results. For example
int a = --i;
int b = i--;
will generate different code for each initialization.
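A sketch of that idea (use_results and Results are hypothetical names); starting both variables from the same value makes the difference between the two results visible:

```cpp
#include <cassert>

struct Results { int a; int b; };

// Both variables start from the same value so the results can be compared.
Results use_results(int start) {
    int i = start;
    int j = start;
    int a = --i;  // i is decremented, then a receives the new value
    int b = j--;  // b receives the old value of j, then j is decremented
    return {a, b};
}
```

Starting from 5, a is 4 and b is 5.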
This example has nothing to do with lvalueness or rvalueness of their results though. If you want to observe the difference from that side (which only exists in C++, as I said above), you can try this
int *a = &--i;
int *b = &i--;
The first initialization will compile in C++ (since the result is an lvalue) while the second won't compile (since the result is an rvalue and you cannot apply the built-in unary & to an rvalue).
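The same check can be written as a small C++ function; prefix_result_is_i is a hypothetical name, and the &i-- line is left as a comment because it would not compile:

```cpp
#include <cassert>

// In C++ the built-in --i yields an lvalue referring to i itself,
// so taking its address gives the address of i.
bool prefix_result_is_i() {
    int i = 10;
    int* a = &--i;     // OK in C++ (not in C): --i is an lvalue
    // int* b = &i--;  // error: i-- is a prvalue, and built-in & needs an lvalue
    return a == &i && *a == 9;
}
```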
The rationale behind this specification is rather obvious. Since --i evaluates to the new value of i, it is perfectly possible to make this operator return a reference to i itself as its result (and the C++ language, as opposed to C, prefers to return lvalues whenever possible). Meanwhile, i-- is required to return the old value of i. Since by the time we get to analyze the result of i-- the variable i itself is likely to hold the new value, we cannot return a reference to i. We have to save (or recreate) the old value of i in some auxiliary temporary location and return it as the result of i--. That temporary value is just a value, not an object. It does not need to reside in memory, which is why it cannot be an lvalue.
[Note: I'm answering this from a C++ perspective.]
Assuming i is a built-in type, if you just write --i; or i--; rather than, say, j = ++i; or j = i++;, then it's unsurprising that the compiler turns them into the same assembly code: they're doing the same thing, which is decrementing i. The difference only becomes apparent at the assembly level when you do something with the result; otherwise they effectively have the same semantics.
(Note that if we were thinking about overloaded pre- and post-decrement operators for a user-defined type, the code generated would not be the same.)
When you write something like i-- = 5;, the compiler quite rightly complains, because the semantics of post-decrement are essentially to decrement the thing in question but return its old value for further use. The thing returned is a temporary, which is why i-- yields an rvalue.
The terms “lvalue” and “rvalue” originate from the assignment expression E1 = E2, in which the left operand E1 is used to identify the object to be modified, and the right operand E2 identifies the value to be used. (See C 1999 6.3.2.1, note 53.)
Thus, an expression which still has some object associated with it can be used to locate that object and to write to it. This is an lvalue. If an expression is not an lvalue, it might be called an rvalue.
For example, if you have i, the name of some object, it is an lvalue, because we can find where i is, and we can assign to it, as in i = 3.
On the other hand, if we have the expression i+1, then we have taken the value of i and added 1, and we now have a value, but it is not associated with a particular object. This new value is not in i. It is just a temporary value and does not have a particular location. (To be sure, the compiler must put it somewhere, unless optimization removes the expression completely. But it might be in registers and never in memory. Even if it is in memory for some reason, the C language does not provide a way for you to find out where.) So i+1 is not an lvalue, because you cannot use it on the left side of an assignment.
--i and i++ are both expressions that result from taking the value of i and performing some arithmetic. (These expressions also change i, but that is a side effect of the operator, not part of the result it returns.) The “left” and “right” of lvalues and rvalues have nothing to do with whether the -- or ++ operator is on the left side or the right side of a name; they have to do with the left side or the right side of an assignment. As other answers explain, in C++, when these operators are on the left side of their operand (the prefix forms), they return an lvalue. However, this is coincidental; this definition of the operators in C++ came many years after the creation of the term “lvalue”.