Is increment stackable? I.e x++++; or (x++)++; - c++

When me and my friend were preparing for exam, my friend said that x+++; is the same as x+=3;
It is not true but is x++++; same as x+=1; or is (x++)++;? Could I generalize it? I.e. x++++++++++++++; or ((((((x++)++)++)++)++)++)++; is equivalent to x+=7;
Maybe it's completely wrong and it is true for ++++++x; or ++(++(++x)); equivalent to x+=3;
Also it should generalize to --x; and x--;

The behavior of your program can be understood using the following rules from the standard.
From lex.pptoken#3.3:
Otherwise, the next preprocessing token is the longest sequence of characters that could constitute a preprocessing token, even if that would cause further lexical analysis to fail, except that a header-name is only formed within a #include directive.
And from lex.pptoken#5:
[ Example: The program fragment x+++++y is parsed as x ++ ++ + y, which, if x and y have integral types, violates a constraint on increment operators, even though the parse x ++ + ++ y might yield a correct expression.  — end example ]
is x++++; same as x+=1;
Using the statement quoted above, x++++ will be parsed as x++ ++.
But note that from increment/decrement operator's documentation:
The operand expr of a built-in postfix increment or decrement operator must be a modifiable (non-const) lvalue of non-boolean (since C++17) arithmetic type or pointer to completely-defined object type. The result is prvalue copy of the original value of the operand.
That means the result of x++ will a prvalue. Thus the next postfix increment ++ cannot be applied on that prvalue since it requires an lvalue. Hence, x++++ will not compile.
Similarly, you can use the above quoted statements to understand the behavior of other examples in your snippet.

Related

Does the modification of the LHS always happen strictly after the RHS has been evaluated during assignment for built-in type? [duplicate]

map<int, int> mp;
printf("%d ", mp.size());
mp[10]=mp.size();
printf("%d\n", mp[10]);
This code yields an answer that is not very intuitive:
0 1
I understand why it happens - the left side of the assignment returns reference to mp[10]'s underlying value and at the same time creates aforementioned value, and only then is the right side evaluated, using the newly computed size() of the map.
Is this behaviour stated anywhere in C++ standard? Or is the order of evaluation undefined?
Result was obtained using g++ 5.2.1.
Yes, this is covered by the standard and it is unspecified behavior. This particular case is covered in a recent C++ standards proposal: N4228: Refining Expression Evaluation Order for Idiomatic C++ which seeks to refine the order of evaluation rules to make it well specified for certain cases.
It describes this problem as follows:
Expression evaluation order is a recurring discussion topic in the C++
community. In a nutshell, given an expression such as f(a, b,
c), the order in which the sub-expressions f, a, b, c are evaluated is left unspecified by the standard. If any two of these sub-expressions happen to modify the same object without intervening sequence points, the behavior of the program is undefined. For instance, the expression f(i++, i) where i is an
integer variable leads to undefined behavior , as does v[i]
= i++. Even when the behavior is not undefined, the result of evaluating an expression can still be anybody’s guess. Consider
the following program fragment:
#include <map>
int main() {
std::map<int, int> m;
m[0] = m.size(); // #1
}
What should the map object m look like after evaluation of the
statement marked #1? { {0, 0 } } or {{0, 1 } } ?
We know that unless specified the evaluations of sub-expressions are unsequenced, this is from the draft C++11 standard section 1.9 Program execution which says:
Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced.[...]
and all the section 5.17 Assignment and compound assignment operators [expr.ass] says is:
[...]In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.[...]
So this section does not nail down the order of evaluation but we know this is not undefined behavior since both operator [] and size() are function calls and section 1.9 tells us(emphasis mine):
[...]When calling a function (whether or not the function is inline), every value computation and side effect
associated with any argument expression, or with the postfix expression designating the called function, is
sequenced before execution of every expression or statement in the body of the called function. [ Note: Value
computations and side effects associated with different argument expressions are unsequenced. —end note ]
Every evaluation in the calling function (including other function calls) that is not otherwise specifically
sequenced before or after the execution of the body of the called function is indeterminately sequenced with
respect to the execution of the called function.9[...]
Note, I cover the second interesting example from the N4228 proposal in the question Does this code from “The C++ Programming Language” 4th edition section 36.3.6 have well-defined behavior?.
Update
It seems like a revised version of N4228 was accepted by the Evolution Working Group at the last WG21 meeting but the paper(P0145R0) is not yet available. So this could possibly no longer be unspecified in C++17.
Update 2
Revision 3 of p0145 made this specified and update [expr.ass]p1:
The assignment operator (=) and the compound assignment operators all group right-to-left.
All require a modifiable lvalue as their left operand; their result is an lvalue referring to the left operand.
The result in all cases is a bit-field if the left operand is a bit-field.
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. The right operand is sequenced before the left operand. ...
From the C++11 standard (emphasis mine):
5.17 Assignment and compound assignment operators
1 The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue referring to the left operand. The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
Whether the left operand is evaluated first or the right operand is evaluated first is not specified by the language. A compiler is free to choose to evaluate either operand first. Since the final result of your code depends on the order of evaluation of the operands, I would say it is unspecified behavior rather than undefined behavior.
1.3.25 unspecified behavior
behavior, for a well-formed program construct and correct data, that depends on the implementation
I'm sure that the standard does not specify for an expression x = y; which order x or y is evaluated in the C++ standard (this is the reason why you can't do *p++ = *p++ for example, because p++ is not done in a defined order).
In other words, to guarantee order x = y; in a defined order, you need to do break it up into two sequence points.
T tmp = y;
x = tmp;
(Of course, in this particular case, one might presume the compiler prefers to do operator[] before size() because it can then store the value directly into the result of operator[] instead of keeping it in a temporary place, to store it later after operator[] has been evaluated - but I'm pretty sure the compiler doesn't NEED to do it in that order)
Let's take a look at what your code breaks down to:
mp.operator[](10).operator=(mp.size());
which pretty much tells the story that in the first part an entry to 10 is created and in the second part the size of the container is assigned to the integer reference in position of 10.
But now you get into the order of evaluation problem which is unspecified. Here is a much simpler example .
When should map::size() get called, before or after map::operator(int const &); ?
Nobody really knows.

Does writing the new value form part of the "value computation" of a pre-increment expression, or is it a "side effect"?

Consider the following expression (with declaration for exposition):
int n = 42;
--n &= 0x01;
Does this fall foul of sequencing rules?
In my opinion, the pre-increment is needed as part of the "value computation" of the left-hand operand. If this is true, there's no UB here since C++11 (and, since C++17, both value computations and side effects are sequenced relative to the assignment).
If it were a post-increment, then the modification of n would be merely a side-effect and we'd not have good sequencing (until C++17).
I suppose you are right, here's what standard says:
8.5.18 Assignment and compound assignment operators
All require a modifiable lvalue as their left operand; their result is
an lvalue referring to the left operand. [...]
In all cases, the assignment is sequenced after the value computation
of the right and left operands, and before the value computation of
the assignment expression.
So from above it seems assignment is value expression and both left and right of assignment are evaluated before assignment.
From standard about the preincrement:
8.5.2.2 Increment and decrement
The result is the updated operand; it is an lvalue, and it is a
bit-field if the operand is a bit-field. The expression
++x is equivalent to x+=1.
Which means that even before C++17 its side effect is sequenced before value computation.
As far as I can tell the wording in C++11 doesn't mention the "value computation" of preincrement and predecrement in relation to the update:
[expr.pre.incr]
1 The operand of prefix ++ is modified by adding 1, or set to true if it
is bool (this use is deprecated). The operand shall be a modifiable
lvalue. The type of the operand shall be an arithmetic type or a
pointer to a completely-defined object type. The result is the updated
operand; it is an lvalue, and it is a bit-field if the operand is a
bit-field. If x is not of type bool, the expression ++x is equivalent
to x+=1.
There is nothing in the above paragraph from which I'd conclude the modification has to come first. An implementation may very well compute the updated value (and use it) prior to writing it into the object by the next sequence point.
In which case we will have ourselves a side-effect that is indeterminately sequenced with another modification. So I'd say since the standard doesn't specify if there is a side-effect, nor how such potential side-effect is to be sequenced, the whole thing is undefined by omission.
With C++17, we of course get well defined sequencing with or without this potential side-effect.

Why are multiple pre-increments allowed in C++ but not in C? [duplicate]

This question already has answers here:
Why are multiple increments/decrements valid in C++ but not in C?
(4 answers)
Closed 5 years ago.
Why is
int main()
{
int i = 0;
++++i;
}
valid C++ but not valid C?
C and C++ say different things about the result of prefix ++. In C++:
[expr.pre.incr]
The operand of prefix ++ is modified by adding 1. The operand shall be
a modifiable lvalue. The type of the operand shall be an arithmetic
type other than cv bool, or a pointer to a completely-defined object
type. The result is the updated operand; it is an lvalue, and it is a
bit-field if the operand is a bit-field. The expression ++x is
equivalent to x+=1.
So ++ can be applied on the result again, because the result is basically just the object being incremented and is an lvalue. In C however:
6.5.3 Unary operators
The operand of the prefix increment or decrement operator shall have atomic, qualified, or unqualified real or pointer type, and shall be a modifiable lvalue.
The value of the operand of the prefix ++ operator is incremented. The
result is the new value of the operand after incrementation.
The result is not an lvalue; it's just the pure value of the incrementation. So you can't apply any operator that requires an lvalue on it, including ++.
If you are ever told the C++ and C are superset or subset of each other, know that it is not the case. There are many differences that make that assertion false.
In C, it's always been that way. Possibly because pre-incremented ++ can be optimised to a single machine code instruction on many CPUs, including ones from the 1970s which was when the ++ concept developed.
In C++ though there's the symmetry with operator overloading to consider. To match C, the canonical pre-increment ++ would need to return const &, unless you had different behaviour for user-defined and built-in types (which would be a smell). Restricting the return to const & is a contrivance. So the return of ++ gets relaxed from the C rules, at the expense of increased compiler complexity in order to exploit any CPU optimisations for built-in types.
I assume you understand why it's fine in C++ so I'm not going to elaborate on that.
For whatever it's worth, here's my test result:
t.c:6:2: error: lvalue required as increment operand
++ ++c;
^
Regarding CppReference:
Non-lvalue object expressions
Colloquially known as rvalues, non-lvalue object expressions are the expressions of object types that do not designate objects, but rather values that have no object identity or storage location. The address of a non-lvalue object expression cannot be taken.
The following expressions are non-lvalue object expressions:
all operators not specified to return lvalues, including
increment and decrement operators (note: pre- forms are lvalues in C++)
And Section 6.5.3.1 from n1570:
The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation.
So in C, the result of prefix increment and prefix decrement operators are not required to be lvalue, thus not incrementable again. In fact, such word can be understood as "required to be rvalue".
The other answers explain the way that the standards diverge in what they require. This answer provides a motivating example in the area of difference.
In C++, you can have a function like int& foo(int&);, which has no analog in C. It is useful (and not onerous) for C++ to have the option of foo(foo(x));.
Imagine for a moment that operations on basic types were defined somewhere, e.g. int& operator++(int&);. ++++x itself is not a motivating example, but it fits the pattern of foo above.

In C++11, does `i += ++i + 1` exhibit undefined behavior?

This question came up while I was reading (the answers to) So why is i = ++i + 1 well-defined in C++11?
I gather that the subtle explanation is that (1) the expression ++i returns an lvalue but + takes prvalues as operands, so a conversion from lvalue to prvalue must be performed; this involves obtaining the current value of that lvalue (rather than one more than the old value of i) and must therefore be sequenced after the side effect from the increment (i.e., updating i) (2) the LHS of the assignment is also an lvalue, so its value evaluation does not involve fetching the current value of i; while this value computation is unsequenced w.r.t. the value computation of the RHS, this poses no problem (3) the value computation of the assignment itself involves updating i (again), but is sequenced after the value computation of its RHS, and hence after the prvious update to i; no problem.
Fine, so there is no UB there. Now my question is what if one changed the assigment operator from = to += (or a similar operator).
Does the evaluation of the expression i += ++i + 1 lead to undefined behavior?
As I see it, the standard seems to contradict itself here. Since the LHS of += is still an lvalue (and its RHS still a prvalue), the same reasoning as above applies as far as (1) and (2) are concerned; there is no undefined behavior in the evalutation of the operands on +=. As for (3), the operation of the compound assignment += (more precisely the side effect of that operation; its value computation, if needed, is in any case sequenced after its side effect) now must both fetch the current value of i, and then (obviously sequenced after it, even if the standard does not say so explicitly, or otherwise the evaluation of such operators would always invoke undefined behavior) add the RHS and store the result back into i. Both these operations would have given undefined behavior if they were unsequenced w.r.t. the side effect of the ++, but as argued above (the side effect of the ++ is sequenced before the value computation of + giving the RHS of the += operator, which value computation is sequenced before the operation of that compound assignment), that is not the case.
But on the other hand the standard also says that E += F is equivalent to E = E + F, except that (the lvalue) E is evaluated only once. Now in our example the value computation of i (which is what E is here) as lvalue does not involve anything that needs to be sequenced w.r.t. other actions, so doing it once or twice makes no difference; our expression should be strictly equivalent to E = E + F. But here's the problem; it is pretty obvious that evaluating i = i + (++i + 1) would give undefined behaviour! What gives? Or is this a defect of the standard?
Added. I have slightly modified my discussion above, to do more justice to the proper distinction between side effects and value computations, and using "evaluation" (as does the standard) of an expression to encompass both. I think my main interrogation is not just about whether behavior is defined or not in this example, but how one must read the standard in order to decide this. Notably, should one take the equivalence of E op= F to E = E op F as the ultimate authority for the semantics of the compound assignment operation (in which case the example clearly has UB), or merely as an indication of what mathematical operation is involved in determining the value to be assigned (namely the one identified by op, with the lvalue-to-rvalue converted LHS of the compound assignment operator as left operand and its RHS as right operand). The latter option makes it much harder to argue for UB in this example, as I have tried to explain. I admit that it is tempting to make the equivalence authoritative (so that compound assignments become a kind of second-class primitives, whose meaning is given by rewriting in term of first-class primitives; thus the language definition would be simplified), but there are rather strong arguments against this:
The equivalence is not absolute, because of the "E is evaluated only once" exception. Note that this exception is essential to avoid making any use where the evaluation of E involves a side effect undefined behavior, for instance in the fairly common a[i++] += b; usage. If fact I think no absolutely equivalent rewriting to eliminate compound assignments is possible; using a fictive ||| operator to designate unsequenced evaluations, one might try to define E op= F; (with int operands for simplicity) as equivalent to { int& L=E ||| int R=F; L = L + R; }, but then the example no longer has UB. In any case the standard gives us no rewriitng recipe.
The standard does not treat compound assignments as second-class primitives for which no separate definition of semantics is necessary. For instance in 5.17 (emphasis mine)
The assignment operator (=) and the compound assignment operators all group right-to-left. [...] In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression. With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation.
If the intention were to let compound assignments be mere shorthands for simple assignments, there would be no reason to include them explicitly in this description. The final phrase even directly contradicts what would be the case if the equivalence was taken to be authoritative.
If one admits that compound assignments have a semantics of their own, then the point arises that their evaluation involves (apart from the mathematical operation) more than just a side effect (the assignment) and a value evaluation (sequenced after the assignment), but also an unnamed operation of fetching the (previous) value of the LHS. This would normally be dealt with under the heading of "lvalue-to-rvalue conversion", but doing so here is hard to justify, since there is no operator present that takes the LHS as an rvalue operand (though there is one in the expanded "equivalent" form). It is precisely this unnamed operation whose potential unsequenced relation with the side effect of ++ would cause UB, but this unsequenced relation is nowhere explicitly stated in the standard, because the unnamed operation is not. It is hard to justify UB using an operation whose very existence is only implicit in the standard.
About the description of i = ++i + 1
I gather that the subtle explanation is that
(1) the expression ++i returns an lvalue but + takes prvalues as operands, so a conversion from lvalue to prvalue must be performed;
Probably, see CWG active issue 1642.
this involves obtaining the
current value of that lvalue (rather than one more than the old value
of i) and must therefore be sequenced after the side effect from the
increment (i.e., updating i)
The sequencing here is defined for the increment (indirectly, via +=, see (a)):
The side effect of ++ (the modification of i) is sequenced before the value computation of the whole expression ++i. The latter refers to computing the result of ++i, not to loading the value of i.
(2) the LHS of the assignment is also an
lvalue, so its value evaluation does not involve fetching the current
value of i;
while this value computation is unsequenced w.r.t. the
value computation of the RHS, this poses no problem
I don't think that's properly defined in the Standard, but I'd agree.
(3) the value
computation of the assignment itself involves updating i (again),
The value computation of i = expr is only required when you use the result, e.g. int x = (i = expr); or (i = expr) = 42;. The value computation itself does not modify i.
The modification of i in the expression i = expr that happens because of the = is called the side effect of =. This side effect is sequenced before value computation of i = expr -- or rather the value computation of i = expr is sequenced after the side effect of the assignment in i = expr.
In general, the value computation of the operands of an expression are sequenced before the side effect of that expression, of course.
but is sequenced after the value computation of its RHS, and hence after
the previous update to i; no problem.
The side effect of the assignment i = expr is sequenced after the value computation of the operands i (A) and expr of the assignment.
The expr in this case is a +-expression: expr1 + 1. The value computation of this expression is sequenced after the value computations of its operands expr1 and 1.
The expr1 here is ++i. The value computation of ++i is sequenced after the side effect of ++i (the modification of i) (B)
That's why i = ++i + 1 is safe: There's a chain of sequenced before between the value computation in (A) and the side effect on the same variable in (B).
(a) The Standard defines ++expr in terms of expr += 1, which is defined as expr = expr + 1 with expr being evaluated only once.
For this expr = expr + 1, we therefore have only one value computation of expr. The side effect of = is sequenced before the value computation of the whole expr = expr + 1, and it's sequenced after the value computation of the operands expr (LHS) and expr + 1 (RHS).
This corresponds to my claim that for ++expr, the side effect is sequenced before the value computation of ++expr.
About i += ++i + 1
Does the value computation of i += ++i + 1 involve undefined behavior?
Since the
LHS of += is still an lvalue (and its RHS still a prvalue), the same
reasoning as above applies as far as (1) and (2) are concerned;
as for
(3) the value computation of the += operator now must both fetch the
current value of i, and then (obviously sequenced after it, even if
the standard does not say so explicitly, or otherwise the execution of
such operators would always invoke undefined behavior) perform the
addition of the RHS and store the result back into i.
I think here's the problem: The addition of i in the LHS of i += to the result of ++i + 1 requires knowing the value of i - a value computation (which can mean loading the value of i). This value computation is unsequenced with respect to the modification performed by ++i. This is essentially what you say in your alternative description, following the rewrite mandated by the Standard i += expr -> i = i + expr. Here, the value computation of i within i + expr is unsequenced with respect to the value computation of expr. That's where you get UB.
Please note that a value computation can have two results: The "address" of an object, or the value of an object. In an expression i = 42, the value computation of the lhs "produces the address" of i; that is, the compiler needs to figure out where to store the rhs (under the rules of observable behaviour of the abstract machine). In an expression i + 42, the value computation of i produces the value. In the above paragraph, I was referring to the second kind, hence [intro.execution]p15 applies:
If a side effect on a scalar object is unsequenced relative to either
another side effect on the same scalar object or a value computation
using the value of the same scalar object, the behavior is undefined.
Another approach for i += ++i + 1
the value computation of the += operator now must both fetch the
current value of i, and then [...] perform the addition of the RHS
The RHS being ++i + 1. Computing the result of this expression (the value computation) is unsequenced with respect to the value computation of i from the LHS. So the word then in this sentence is misleading: Of course, it must first load i and then add the result of the RHS to it. But there's no order between the side-effect of the RHS and the value computation to get the value of the LHS. For example, you could get for the LHS either the old or the new value of i, as modified by the RHS.
In general a store and a "concurrent" load is a data race, which leads to Undefined Behaviour.
Addressing the addendum
using a fictive ||| operator to designate unsequenced evaluations, one might try to define E op= F; (with int operands for simplicity) as equivalent to { int& L=E ||| int R=F; L = L + R; }, but then the example no longer has UB.
Let E be i and F be ++i (we don't need the + 1). Then, for i = ++i
int* lhs_address;
int lhs_value;
int* rhs_address;
int rhs_value;
( lhs_address = &i)
||| (i = i+1, rhs_address = &i, rhs_value = *rhs_address);
*lhs_address = rhs_value;
On the other hand, for i += ++i
( lhs_address = &i, lhs_value = *lhs_address)
||| (i = i+1, rhs_address = &i, rhs_value = *rhs_address);
int total_value = lhs_value + rhs_value;
*lhs_address = total_value;
This is intended to represent my understanding of the sequencing guarantees. Note that the , operator sequences all value computations and side effects of the LHS before those of the RHS. Parentheses do not affect sequencing. In the second case, i += ++i, we have a modification of i unsequenced wrt an lvalue-to-rvalue conversion of i => UB.
The standard does not treat compound assignments as second-class primitives for which no separate definition of semantics is necessary.
I would say that's a redundancy. The rewrite from E1 op = E2 to E1 = E1 op E2 also includes which expression types and value categories are required (on the rhs, 5.17/1 says something about the lhs), what happens to pointer types, the required conversions etc. The sad thing is that the sentence about "With respect to an.." in 5.17/1 is not in 5.17/7 as an exception of that equivalence.
In any way, I think we should compare the guarantees and requirements for compound assignment vs. simple assignment plus the operator, and see if there's any contradiction.
Once we put that "With respect to an.." also in the list of exceptions in 5.17/7, I don't think there's a contradiction.
As it turns out, as you can see in the discussion of Marc van Leeuwen's answer, this sentence leads to the following interesting observation:
int i; // global
int& f() { return ++i; }
int main() {
i = i + f(); // (A)
i += f(); // (B)
}
It seems that (A) has an two possible outcomes, since the evaluation of the body of f is indeterminately sequenced with the value computation of the i in i + f().
In (B), on the other hand, the evaluation of the body of f() is sequenced before the value computation of i, since += must be seen as a single operation, and f() certainly needs to be evaluated before the assignment of +=.
The expression:
i += ++i + 1
does invoke undefined behavior. The language lawyer method requires us to go back to the defect report that results in:
i = ++i + 1 ;
becoming well defined in C++11, which is defect report 637. Sequencing rules and example disagree , it starts out saying:
In 1.9 [intro.execution] paragraph 16, the following expression is
still listed as an example of undefined behavior:
i = ++i + 1;
However, it appears that the new sequencing rules make this expression
well-defined
The logic used in the report is as follows:
The assignment side-effect is required to be sequenced after the value computations of both its LHS and RHS (5.17 [expr.ass] paragraph 1).
The LHS (i) is an lvalue, so its value computation involves computing the address of i.
In order to value-compute the RHS (++i + 1), it is necessary to first value-compute the lvalue expression ++i and then do an lvalue-to-rvalue conversion on the result. This guarantees that the incrementation side-effect is sequenced before the computation of the addition operation, which in turn is sequenced before the assignment side effect. In other words, it yields a well-defined order and final value for this expression.
So in this question our problem changes the RHS which goes from:
++i + 1
to:
i + ++i + 1
due to draft C++11 standard section 5.17 Assignment and compound assignment operators which says:
The behavior of an expression of the form E1 op = E2 is equivalent to
E1 = E1 op E2 except that E1 is evaluated only once. [...]
So now we have a situation where the computation of i in the RHS is not sequenced relative to ++i and so we then have undefined behavior. This follows from section 1.9 paragraph 15 which says:
Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced. [
Note: In an expression that is evaluated more than once during the
execution of a program, unsequenced and indeterminately sequenced
evaluations of its subexpressions need not be performed consistently
in different evaluations. —end note ] The value computations of the
operands of an operator are sequenced before the value computation of
the result of the operator. If a side effect on a scalar object is
unsequenced relative to either another side effect on the same scalar
object or a value computation using the value of the same scalar
object, the behavior is undefined.
The pragmatic way to show this would be to use clang to test the code, which generates the following warning (see it live):
warning: unsequenced modification and access to 'i' [-Wunsequenced]
i += ++i + 1 ;
~~ ^
for this code:
int main()
{
int i = 0 ;
i += ++i + 1 ;
}
This is further bolstered by this explicit test example in clang's test suite for -Wunsequenced:
a += ++a;
Yes, it is UB!
The evaluation of your expression
i += ++i + 1
proceeds in the following steps:
5.17p1 (C++11) states (emphases mine):
The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue referring to the left operand. The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
What does "value computation" mean?
1.9p12 gives the answer:
Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects.
Since your code uses a compound assignment operator, 5.17p7 tells us, how this operator behaves:
The behavior of an expression of the form E1 op= E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once.
Hence the evaluation of the expression E1 ( == i) involves both, determining the identity of the object designated by i and an lvalue-to-rvalue conversion to fetch the value stored in that object. But the evaluation of the two operands E1 and E2 are not sequenced with respect to each other. Thus we get undefined behavior since the evaluation of E2 ( == ++i + 1) initiates a side effect (updating i).
1.9p15:
... If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
The following statements in your question/comments seem to be the root of your misunderstanding:
(2) the LHS of the assignment is also an lvalue, so its value evaluation does not involve fetching the current value of i
fetching a value can be part of a prvalue evaluation. But in E += F the only prvalue is F so fetching the value of E is not part of the evaluation of the (lvalue) subexpression E
If an expression is an lvalue or rvalue doesn't tell anything about how this expression is to be evaluated. Some operators require lvalues as their operands some others require rvalues.
Clause 5p8:
Whenever a glvalue expression appears as an operand of an operator that expects a prvalue for that operand, the lvalue-to-rvalue (4.1), array-to-pointer (4.2), or function-to-pointer (4.3) standard conversions are applied to convert the expression to a prvalue.
In a simple assignment the evaluation of of the LHS only requires determining the identity of the object. But in a compound assignment such as += the LHS must be a modifiable lvalue, but the evaluation of the LHS in this case consists of determining the identity of the object and an lvalue-to-rvalue conversion. It is the result of this conversion (which is a prvalue) that is added to the result (also a prvalue) of the evaluation of the RHS.
"But in E += F the only prvalue is F so fetching the value of E is not part of the evaluation of the (lvalue) subexpression E"
That's not true as I explained above. In your example F is a prvalue expression, but F may as well be an lvalue expression. In that case, the lvalue-to-rvalue conversion is also applied to F. 5.17p7 as cited above tells us, what the semantics of the compound assignment operators are. The standard states that the behavior of E += F is the same as of E = E + F but E is only evaluated once. Here, the evaluation of E includes the lvalue-to-rvalue conversion, because the binary operator + requires it operands to be rvalues.
There is no clear case for Undefined Behavior here
Sure, an argument leading to UB can be given, as I indicated in the question, and which has been repeated in the answers given so far. However this involves a strict reading of 5.17:7 that is both self-contradictory and in contradiction with explicit statements in 5.17:1 about compound assignment. With a weaker reading of 5.17:7 the contradictions disappear, as does the argument for UB. Whence my conclusion is neither that there is UB here, nor that there is clearly defined behaviour, but the the text of the standard is inconsistent, and should be modified to make clear which reading prevails (and I suppose this means a defect report should be written). Of course one might invoke here the fall-back clause in the standard (the note in 1.3.24) that evaluations for which the standard fails to define the behavior [unambiguously and self-consistently] are Undefined Behavior, but that would make any use of compound assignments (including prefix increment/decrement operators) into UB, something that might appeal to certain implementors, but certainly not to programmers.
Instead of arguing for the given problem, let me present a slightly modified example that brings out the inconsistency more clearly. Assume one has defined
int& f (int& a) { return a; }
a function that does nothing and returns its (lvalue) argument. Now modify the example to
n += f(++n) + 1;
Note that while some extra conditions about sequencing of function calls are given in the standard, this would at first glance not seem to effect the example, since there are no side effect at all from the function call (not even locally inside the function), as the incrementation happens in the argument expression for f, whose evaluation is not subject to those extra conditions. Indeed, let us apply the Crucial Argument for Undefined Behavior (CAUB), namely 5.17:7 which says that the behavior of such a compound assignment is equivalent to that of (in this case)
n = n + f(++n) + 1;
except that n is evaluated only once (an exception that makes no difference here). The evaluation of the statement I just wrote clearly has UB (the value computation of the first (prvalue) n in the RHS is unsequenced w.r.t. the side effect of the ++ operation, which involves the same scalar object (1.9:15) and you're dead).
So the evaluation of n += f(++n) + 1 has undefined behavior, right? Wrong! Read in 5.17:1 that
With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single compound assignment operator. — end note ]
This language is far from as precise as I would like it to be, but I don't think it is a stretch to assume that "indeterminately-sequenced" should mean "with respect to that operation of a compound assignment". The (non normative, I know) note makes it clear that the lvalue-to-rvalue conversion is part of the operation of the compound assignment. Now is the call of f indeterminately-sequenced with respect to the operation of the compound assignment of +=? I'm unsure, because the 'sequenced' relation is defined for individual value computations and side effects, not complete evaluations of operators, which may involve both. In fact the evaluation of a compound assignment operator involves three items: the lvalue-to-rvalue conversion of its left operand, the side effect (the assignment proper), and the value computation of the compound assignment (which is sequenced after the side effect, and returns the original left operand as lvalue). Note that the existence of the lvalue-to-rvalue conversion is never explicitly mentioned in the standard except in the note cited above; in particular, the standard makes no (other) statement at all regarding its sequencing relative to other evaluations. It is pretty clear that in the example the call of f is sequenced before the side effect and value computation of += (since the call occurs in the value computation of the right operand to +=), but it might be indeterminately-sequenced with respect to the lvalue-to-rvalue conversion part. I recall from my question that since the left operand of += is an lvalue (and necessarily so), one cannot construe the lvalue-to-rvalue conversion to have occurred as part of the value computation of the left operand.
However, by the principle of the excluded middle, the call to f must either be indeterminately-sequenced with respect to the operation of the compound assignment of +=, or not indeterminately-sequenced with respect to it; in the latter case it must be sequenced before it because it cannot possibly be sequenced after it (the call of f being sequenced before the side effect of +=, and the relation being anti-symmetric). So first assume it is indeterminately-sequenced with respect to the operation. Then the cited clause says that w.r.t. the call of f the evaluation of += is a single operation, and the note explains that it means the call should not intervene between the lvalue-to-rvalue conversion and the side effect associated with +=; it should either be sequenced before both, or after both. But being sequenced after the side effect is not possible, so it should be before both. This makes (by transitivity) the side effect of ++ sequenced before the lvalue-to-rvalue conversion, exit UB. Next assume the call of f is sequenced before the operation of +=. Then it is in particular sequenced before the lvalue-to-rvalue conversion, and again by transitivity so is the side effect of ++; no UB in this branch either.
Conclusion: 5.17:1 contradicts 5.17:7 if the latter is taken (CAUB) to be normative for questions of UB resulting from unsequenced evaluations by 1.9:15. As I said CAUB is self-contradictory as well (by arguments indicated in the question), but this answer is getting to long, so I'll leave it at this for now.
Three problems, and two proposals for resolving them
Trying to understand what the standard writes about these matters, I distinguish three aspects in which the text is hard to interpret; they all are of a nature that the text is insufficiently clear about what model its statements are referring to. (I cite the texts at the end of the numbered items, since I do not know the markup to resume a numbered item after a quote)
The text of 5.17:7 is of an apparent simplicity that, although the intention is easy to grasp, gives us little hold when applied to difficult situations. It makes a sweeping claim (equivalent behavior, apparently in all aspects) but whose application is thwarted by the exception clause. What if the behavior of E1 = E1 op E2 is undefined? Well then that of E1 op = E2 should be as well. But what if the UB was due to E1 being evaluated twice in E1 = E1 op E2? Then evaluating E1 op = E2 should presumably not be UB, but if so, then defined as what? This is like saying "the youth of the second twin was exactly like that of the first, except that he did not die at childbirth." Frankly, I think this text, which has little evolved since the C version "A compound assignment of the the form E1 op = E2 differs from the simple assignment expression E1 = E1 op E2 only in that the lvalue E1 is evaluated only once." might be adapted to match the changes in the standard.
(5.17) 7 The behavior of an expression of the form E1 op = E2 is equivalent to
E1 = E1 op E2 except that E1 is evaluated only once.[...]
It is not so clear what precisely the actions (evaluations) are between which the 'sequenced' relation is defined. It is said (1.9:12) that evaluation of an expression includes value computations and initiation of side effects. Though this appears to say that an evaluation may have multiple (atomic) components, the sequenced relation is actually mostly defined (e.g. in 1.9:14,15) for individual components, so that it might be better to read this as that the notion of "evaluation" encompasses both value computations and (initiation of) side effects. However in some cases the 'sequenced' relation is defined for the (entire) execution of an expression of statement (1.9:15) or for a function call (5.17:1), even though a passage in 1.9:15 avoids the latter by referring directly to executions in the body of a called function.
(1.9) 12 Evaluation of an expression (or a sub-expression) in general includes
both value computations (...) and initiation of side effects. [...] 13 Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread [...] 14 Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated. [...] 15 When calling a function (whether or not the function is inline), every value computation and side effect
associated with any argument expression, or with the postfix expression designating the called function, is
sequenced before execution of every expression or statement in the body of the called function. [...] Every evaluation in the calling function (including other function calls) ... is indeterminately sequenced with
respect to the execution of the called function [...] (5.2.6, 5.17) 1 ... With respect to an indeterminately-sequenced function call, ...
The text should more clearly acknowledge that a compound assignment involves, in contrast to a simple assignment, the action of fetching the value previously assigned to its left operand; this action is like lvalue-to-rvalue conversion, but does not happen as part of the value computation of that left operand, since it is not a prvalue; indeed it is a problem that 1.9:12 only acknowledges such action for prvalue evaluation. In particular the text should be more clear about which 'sequenced' relations are given for that action, if any.
(1.9) 12 Evaluation of an expression... includes... value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation)
The second point is the least directly related to our concrete question, and I think it can be solved simply by choosing a clear point of view and reformulating pasages that seem to indicate a different point of view. Given that one of the main purposes of the old sequence points, and now the 'sequenced' relation, was to make clear that the side effect of postfix-increment operators is unsequenced w.r.t. to actions sequenced after the value computation of that operator (thus giving e.g. i = i++ UB), the point of view must be that individual value computations and (initiation of) individual side effects are "evaluations" for which "sequenced before" may be defined. For pragmatic reasons I would also include two more kinds of (trivial) "evaluations": function entry (so that the language of 1.9:15 may be simplified to: "When calling a function..., every value computation and side effect associated with any of its argument expressions, or with the postfix expression designating the called function, is sequenced before entry of that function") and function exit (so that any action in the function body gets by transitivity sequenced before anything that requires the function value; this used to be guaranteed by a sequence point, but the C++11 standard seems to have lost such guarantee; this might make calling a function ending with return i++; potentially UB where this is not intended, and used to be safe). Then one can also be clear about the "indeterminately sequenced" relation of functions calls: for every function call, and every evaluation that is not (directly or indirectly) part of evaluating that call, that evaluation shall be sequenced (either before or after) w.r.t. both entry and exit of that function call, and it shall have the same relation in both cases (so that in particular such external actions cannot be sequenced after function entry but before function exit, as is clearly desirable within a single thread).
Now to resolve points 1. and 3., I can see two paths (each affecting both points), which have different consequences for the defined or not behavior of our example:
Compound assignments with two operands, and three evaluations
Compound operations have thier two usual operands, an lvalue left operand and a prvalue right operand. To settle the unclarity of 3., it is included in 1.9:12 that fetching the value previously assigned to an object also may occur in compound assignments (rather than only for prvalue evaluation). The semantics of compount assignments are defined by changing 5.17:7 to
In a compound assignment op=, the value previously assigned to the object referred to by the left operand is fetched, the operator op is applied with this value as left operand and the right operand of op= as right operand, and the resulting value replaces that of the object referred to by the left operand.
(That gives two evaluations, the fetch and the side effect; a third evaluation is the trivial value computation of the compound operator, sequenced after both other evaluations.)
For clarity, state clearly in 1.9:15 that value computations in operands are sequenced before all value computations associated with the operator (rather than just those for the result of the operator), which ensures that evaluating the lvalue left operand is sequenced before fetching its value (one can hardly imagine otherwise), and also sequences the value computation of the right operand before that fetch, thus excluding UB in our example. While at it, I see no reason not to also sequence value computations in operands before any side effects associated with the operator (as they clearly must); this would make mentioning this explicitly for (compound) assignments in 5.17:1 superfluous. On the other hand do mention there that the value fetching in a compound assignment is sequenced before its side effect.
Compound assignments with three operands, and two evaluations
In order to obtain that the fetch in a compount assignment will be unsequenced with respect to the value computation of the right operand, making our example UB, the clearest way seems to be to give compound operators an implicit third (middle) operand, a prvalue, not represented by a separate expression, but obtained by lvalue-to-rvalue conversion from the left operand (this three-operand nature corresponds to the expanded form of compound assignments, but by obtaining the middle operand from the left operand, it is ensured that the value is fetched from the same object to which the result will be stored, a crucial guarantee that is only vaguely and implicitly given in the current formulation through the "except that E1 is evaluated only once" clause). The difference with the previous solution is that the fetch is now a genuine lvalue-to-rvalue conversion (since the middle operand is a prvalue) and is performed as part of the value computation of the operands to the compound assignment, which makes it naturally unsequenced with the value computation of the right operand. It should be stated somewhere (in a new clause that describes this implicit operand) that the value computation of the left operand is sequenced before this lvalue-to-rvalue conversion (it clearly must). Now 1.9:12 can be left as it is, and in place of 5.17:7 I propose
In a compound assignment op= with left operand a (an lvalue), and midlle and right operands brespectively c (both prvalues), the operator op is applied with b as left operand and c as right operand, and the resulting value replaces that of the object referred to by a.
(That gives one evaluation, the side effect, with as second evaluation the trivial value computation of the compound operator, sequenced after it.)
The still applicable changes to 1.9:15 and 5.17:1 suggested in the previous solution could still apply, but would not give our original example defined behavior. However the modified example at the top of this answer would still have defined behavior, unless the part 5.17:1 "compound assignment is a single operation" is scrapped or modified (there is a similar passage in 5.2.6 for postfix increment/decrement). The existence of those passages would suggest that detaching the fecth and store operations within a single compound assignement or postfix increment/decrement was not the intention of those who wrote the current standard (and by extension making our example UB), but this of course is mere guesswork.
From the compiler writer's perspective, they don't care about "i += ++i + 1", because whatever the compiler does, the programmer may not get the correct result, but they surely get what they deserve. And nobody writes code like that. What the compiler writer cares about is
*p += ++(*q) + 1;
The code must read *p and *q, increase *q by 1, and increase *p by some amount that is calculated. Here the compiler writer cares about restrictions on the order of read and write operations. Obviously if p and q point to different objects, the order makes no difference, but if p == q then it will make a difference. Again, p will be different from q unless the programmer writing the code is insane.
By making the code undefined, the language allows the compiler to produce the fastest possible code without caring for insane programmers. By making the code defined, the language forces the compiler to produce code that conforms to the standard even in insane cases, which may make it run slower. Both compiler writers and sane programmers don't like that.
So even if the behaviour is defined in C++11, it would be very dangerous to use it, because (a) a compiler might not be changed from C++03 behaviour, and (b) it might be undefined behaviour in C++14, for the reasons above.

L-Value, Pointer arithmetic [duplicate]

This question already has answers here:
Why ++i++ gives "L-value required error" in C? [duplicate]
(5 answers)
Closed 9 years ago.
I am looking for an explanation of how lines L1 and L2 in the code snippet below differ w.r.t l-values, i.e, Why am I getting the: C2105 error in L1, but not in L2?
*s = 'a';
printf("%c\n", *s );
//printf("%c\n", ++(*s)++ ); //L1 //error C2105: '++' needs l-value
printf("%c\n", (++(*s))++); //L2
printf("%c\n", (*s) );
Note: I got the above result when the code was compiled as a .cpp file. Now, on compilation as .c file, I get the same error C2105 on both lines L1 and L2. Why does L2 compile in C++, and not in C is another mystery :(.
If its of any help, I'm using Visual C++ Express Edition.
Compiler see ++(*s)++ as ++((*s)++), as post-increment has higher precedence than pre-increment. After post-incrementation, (*s)++ becomes an r-value and it can't be further pre-incremented (here).
And yes it is not a case of UB (at least in C).
And also read this answer.
For L2 in C++ not giving error because
C++11: 5.3.2 Increment and decrement says:
The operand of prefix ++ is modified by adding 1, or set to true if it is bool (this use is deprecated). The
operand shall be a modifiable lvalue. The type of the operand shall be an arithmetic type or a pointer to a completely-defined object type. The result is the updated operand; it is an lvalue, and it is a bit-field if the operand is a bit-field. If x is not of type bool, the expression ++x is equivalent to x+=1.
C++11:5.2.6 Increment and decrement says:
The value of a postfix ++ expression is the value of its operand. [ Note: the value obtained is a copy of the original value —end note ] The operand shall be a modifiable lvalue. The type of the operand shall be an arithmetic type or a pointer to a complete object type. The value of the operand object is modified by adding 1 to it, unless the object is of type bool, in which case it is set to true. The value computation of the ++ expression is sequenced before the modification of the operand object. With respect to an indeterminately-sequenced function call, the operation of postfix
++ is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single postfix ++ operator. —end note ] The result is a
prvalue. The type of the result is the cv-unqualified version of the type of the operand.
and also on MSDN site it is stated that:
The operands to postfix increment and postfix decrement operators must be modifiable (not const) l-values of arithmetic or pointer type. The type of the result is the same as that of the postfix-expression, but it is no longer an l-value.
For completeness and quotes from C++ documentation:
Looking at L2, the prefix-increment/decrement returns an l-value. Hence, no error when performing the post-increment. From the C++ documentation for prefix-increment/decrement:
The result is an l-value of the same type as the operand.
Looking at L1, it becomes:
++( ( *s )++ )
... after operand precedence. The post-increment operator by definition evaluates the expression (returning an r-value) and then mutates it. From C++ documentation for post-increment/decrement:
The type of the result is the same as that of the postfix-expression, but it is no longer an l-value.
... and you cannot prefix-increment/decrement an r-value, hence error.
References:
Postfix Operand Doc.
Prefix Operand Doc.