Post-increment operator behaviour w.r.t comma operator? - c++

In the following code:
int main() {
int i, j;
j = 10;
i = (j++, j+100, 999+j);
cout << i;
return 0;
}
The output is 1010.
However shouldn't it be 1009, as ++ should be done after the whole expression is used?

The comma operator is a sequence point: as it says in the C++17 standard for example,
Every value computation and side effect associated with the left expression is sequenced
before every value computation and side effect associated with the right expression.
Thus, the effect of the ++ operator is guaranteed to occur before 999+j is evaluated.

++ should be done after the whole expression is used?
No. The postfix operator evaluates to the value of the old j and has the side effect of incrementing j.
Comma operator evaluates the second operand after the first operand is evaluated and its side-effects are evaluated.
A pair of expressions separated by a comma is evaluated left-to-right;
the left expression is a discarded- value expression (Clause 5)83.
Every value computation and side effect associated with the left
expression is sequenced before every value computation and side effect
associated with the right expression.
https://stackoverflow.com/a/7784819/2805305

Associativity of the comma operator is left to right.
So starting from j++, this will be evaluated first (j becomes 11)
Then j + 100 is evaluated (no use)
Then 999 + j is evaluated which is equal to 1010
This rightmost value is assigned to i
Thus, the output is 1010
Long Answer:
Built-in comma operator
The comma operator expressions have the form
E1 , E2
In a comma expression E1, E2, the expression E1 is evaluated, its
result is discarded (although if it has class type, it won't be
destroyed until the end of the containing full expression), and its
side effects are completed before evaluation of the expression E2
begins (note that a user-defined operator, cannot guarantee
sequencing) (until C++17).
This already answers your question, but I'll walk through it with reference to your code:
Start with something simple like
int value = (1 + 2, 2 + 3, 4 + 5); // value is assigned 9
Because ...the expression E1 is evaluated, its result is discarded... Here, since we have more than 2 operands, the associativity of the comma operator also comes into play.
However shouldn't it be 1009, as '++" should be done after the whole
expression is used?
Now see:
int j = 0;
int i = (j++, 9 + j);
Here, the value of i is 10 because ...and its side effects are completed before evaluation of the expression E2 begins... Hence, the incrementation of j has its effect before the evaluation of 9 + j.
I think now you can clearly understand why your
j = 10;
i = (j++, j+100, 999+j);
i is assigned a value of 1010.

Related

Do the multiple postfix-expression(subscripting) evaluations result in UB

#include <iostream>
int main(){
int arr[7] = {0,1,2,3,4,3,2};
arr[0]++[arr]++[arr]++[arr]++[arr]++[arr]++[arr] = 5; //#1
for(auto i = 0;i<7;i++){
std::cout<<i<<" : "<< arr[i]<<std::endl;
}
}
Consider the above code, Is this evaluation at #1 would result in UB? This is a example I have saw in twitter.
According to the evaluation sequence for postfix ++:
expr.post.incr#1
The value computation of the ++ expression is sequenced before the modification of the operand object.
That means, such a example would result in UB
int arr[2] = {0};
(*(arr[0]++ + arr))++
Because, the side effect caused by expression arr[0]++ and (*(arr[0]++) + arr))++ are unsequenced and applied to the same memory location.
However, for the first example, It's difference. My argument is:
expr.sub#1
The expression E1[E2] is identical (by definition) to *((E1)+(E2)),..., The expression E1 is sequenced before the expression E2.
That means, every value computation and side effect associated with E1 are both sequenced before every value computation and side effect associated with E2.
To simplify the expression at #1, according to the grammar of expression, such a expression should conform to:
expr.ass
logical-or-expression assignment-operator initializer-clause
And expr.post#1
Where the logical-or-expression here is a postfix-expression. That is,
postfix-expression [ arr ] = 5;
Where the postfix-expression has the form postfix-expression ++, which in turn, the postfix-expression has the form postfix-expression[arr]. In simple, the left operand of assignment consists of two kinds of postfix-expressions, they alternate combination with each other.
Postfix expressions group left-to-right
So, let the subscript operation has the form E1[E2] and the postfix++ expression has the form PE++, then for the first example, it will give a following decomposition like this:
E1': arr[0]++
E2': arr
E1'[E2']: arr[0]++[arr]
PE'++ : E1'[E2']++
E1'': PE'++
E2'': arr
E1''[E2'']: PE'++[arr]
PE''++: E1''[E2''] ++
and so on...
That means, in order to caculate PE'++, E1'[E2'] should be calculated prior PE'++, which is identical to *((E1')+E2'), per the rule E1' is sequenced before E2', hence the side effect caused by E1' is sequenced before value computation for E2'.
In other words, each side effect caused by postfix++ expression must be evaluated prior to that expression combined with the subsequent [arr].
So, in this way, I think such a code at #1 should have well-defined behavior rather than UB. Is there anything I misunderstand? Whether the code is UB or not? If it's not UB, what's the correct result the code will give?
I believe your understanding is fine and the code is fine in C++ from C++17.
Is there anything I misunderstand?
No.
Whether the code is UB or not?
No.
If it's not UB, what's the result?
arr[0]++[arr]++[arr]++[arr]++[arr]++[arr]++[arr] = 5;
Side effect: arr[0] := 0 + 1 = 1
0[arr]++[arr]++[arr]++[arr]++[arr]++[arr] = 5;
Side effect: arr[0] := 1 + 1 = 2
1[arr]++[arr]++[arr]++[arr]++[arr] = 5;
Side effect: arr[1] := 1 + 1 = 2
1[arr]++[arr]++[arr]++[arr] = 5;
Side effect: arr[1] := 2 + 1 = 3
2[arr]++[arr]++[arr] = 5;
Side effect: arr[2] := 2 + 1 = 3
2[arr]++[arr] = 5;
Side effect: arr[2] := 3 + 1 = 4
3[arr] = 5;
Side effect: arr[3] := 5
I see the output would be:
0 : 2
1 : 3
2 : 4
3 : 5
4 : 4
5 : 3
6 : 2
Note that the part The expression E1 is sequenced before the expression E2 was added in C++17.
The code is undefined before C++17 and is undefined in C (the tweet was about C code), because in arr[0]++[arr]++ both side effects on arr[0] from ++ are unsequenced to each other.
The multiply postfix incremented expression is not UB by itself because the standard mandates that:
The value computation of the ++ expression is sequenced before
the modification of the operand object.
So in each x++[arr], the postfix incrementation is defered after the value computation and will end into arr[0][arr][arr][arr][arr][arr][arr] and finaly (as arr[0] is initialy 0) into arr[0].
Then you will have a number of incrementation to apply here to the same arr[0] element, and as demonstrated by KalilCuk, all will be fine on that part alone at least in C++17. As arr[0] is 0 all those post incrementation will apply on arr[0], still without UB in that specific user case even before C++17.
The problem is that the assignment is not a sequence point. So a compiler can choose to first apply the assignment and then increment the result (which would give 11), or to first increment the value (which will become 6) and then process the assignment with a final result of 5.
So you are invoking the same UB as the simpler:
i = 0;
i++ = 1; // 1 or 2 ?
Probably not the one you expected, but still UB...

In C and C++, is an expression using the comma operator like "a = b, ++a;" undefined?

Take these three snippets of C code:
1) a = b + a++
2) a = b + a; a++
3) a = b + a, a++
Everyone knows that example 1 is a Very Bad Thing, and clearly invokes undefined behavior. Example 2 has no problems. My question is regarding example 3. Does the comma operator work like a semicolon in this kind of expression? Are 2 and 3 equivalent or is 3 just as undefined as 1?
Specifically I was considering this regarding something like free(foo), foo = bar. This is basically the same problem as above. Can I be sure that foo is freed before it's reassigned, or is this a clear sequence point problem?
I am aware that both examples are largely pointless and it makes far more sense to just use a semicolon and be done with it. I'm just asking out of curiosity.
Case 3 is well defined.
First, let's look at how the expression is parsed:
a = b + a, a++
The comma operator , has the lowest precedence, followed by the assignment operator =, the addition operator + and the postincrement operator ++. So with the implicit parenthesis it is parsed as:
(a = (b + a)), (a++)
From here, section 6.5.17 of the C standard regarding the comma operator , says the following:
2 The left operand of a comma operator is evaluated as a void expression; there is a sequence point between its
evaluation and that of the right operand. Then the right
operand is evaluated; the result has its type and value
Section 5.14 p1 of the C++11 standard has similar language:
A pair of expressions separated by a comma is evaluated left-to-right;
the left expression is a discarded- value expression.
Every value computation and side effect associated with the left
expression is sequenced before every value computation and side effect
associated with the right expression. The type and value of the result
are the type and value of the right operand; the result is of the same
value category as its right operand, and is a bit-field if its right
operand is a glvalue and a bit-field.
Because of the sequence point, a = b + a is guaranteed to be fully evaluated before a++ in the expression a = b + a, a++.
Regarding free(foo), foo = bar, this also guarantees that foo is free'ed before a new value is assigned.
a = b + a, a++; is well-defined, but a = (b + a, a++); can be undefined.
First of all, the operator precedence makes the expression equivalent to (a = (b+a)), a++;, where + has the highest precedence, followed by =, followed by ,. The comma operator includes a sequence point between the evaluation of its left and right operand. So the code is, uninterestingly, completely equivalent to:
a = b + a;
a++;
Which is of course well-defined.
Had we instead written a = (b + a, a++);, then the sequence point in the comma operator wouldn't save the day. Because then the expression would have been equivalent to
(void)(b + a);
a = a++;
In C and C++14 or older, a = a++ is unsequenced , (see C11 6.5.16/3). Meaning this is undefined behavior (Per C11 6.5/2). Note that C++11 and C++14 were badly formulated and ambiguous.
In C++17 or later, the operands of the = operator are sequenced right to left and this is still well-defined.
All of this assuming no C++ operator overloading takes place. In that case, the parameters to the overloaded operator function will be evaluated, a sequence point takes place before the function is called, and what happens from there depends on the internals of that function.

What is the difference between a sequence point and operator precedence?

Consider the classical sequence point example:
i = i++;
The C and C++ standards state that the behavior of the above expression is undefined because the = operator is not associated with a sequence point.
What confuses me is that ++ has a higher precedence than = and so, the above expression, based on precedence, must evaluate i++ first and then do the assignment. Thus, if we start with i = 0, we should always end up with i = 0 (or i = 1, if the expression was i = ++i) and not undefined behavior. What am I missing?
All operators produce a result. In addition, some operators, such as assignment operator = and compound assignment operators (+=, ++, >>=, etc.) produce side effects. The distinction between results and side effects is at the heart of this question.
Operator precedence governs the order in which operators are applied to produce their results. For instance, precedence rules require that * goes before +, + goes before &, and so on.
However, operator precedence says nothing about applying side effects. This is where sequence points (sequenced before, sequenced after, etc.) come into play. They say that in order for an expression to be well-defined, the application of side effects to the same location in memory must be separated by a sequence point.
This rule is broken by i = i++, because both ++ and = apply their side effects to the same variable i. First, ++ goes, because it has higher precedence. It computes its value by taking i's original value prior to the increment. Then = goes, because it has lower precedence. Its result is also the original value of i.
The crucial thing that is missing here is a sequence points separating side effects of the two operators. This is what makes behavior undefined.
Operator precedence (and associativity) state the order in which an expression is parsed and executed. However, this says nothing about the order of evaluation of the operands, which is a different term. Example:
a() + b() * c()
Operator precedence dictates that the result of b() and the result of c() must be multiplied before added together with the result of a().
However, it says nothing about the order in which these functions should be executed. The order of evaluation of each operator specifies this. Most often, the order of evaluation is unspecified (unspecified behavior), meaning that the standard lets the compiler do it in any order it likes. The compiler need not document this order nor does it need to behave consistently. The reason for this is to give compilers more freedom in expression parsing, meaning faster compilation and possibly also faster code.
In the above example, I wrote a simple test program and my compiler executed the above functions in the order a(), b(), c(). The fact that the program needs to execute both b() and c() before it can multiply the results, doesn't mean that it must evaluate those operands in any given order.
This is where sequence points come in. It is a given point in the program where all previous evaluations (and operations) must be done. So sequence points are mostly related to order of evaluation and not so much operator precedence.
In the example above, the three operands are unsequenced in relation to each other, meaning that no sequence point dictates the order of evaluation.
Therefore it turns problematic when side effects are introduced in such unsequenced expressions. If we write i++ + i++ * i++, then we still don't know the order in which these operands are evaluated, so we can't determine what the result will be. This is because both + and * have unspecified/unsequenced order of evaluation.
Had we written i++ || i++ && i++, then the behavior would be well-defined, because the && and || specifies the order of evaluation to be left-to-right and there is a sequence point between the evaluation of the left and the right operand. Thus if(i++ || i++ && i++) is perfectly portable and safe (although unreadable) code.
As for the expression i = i++;, the problem here is that the = is defined as (6.5.16):
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
This expression is actually close to be well-defined, because the text actually says that the left operand should not be updated before the right operand is computed. The problem is the very last sentence: the order of evaluation of the operands is unspecified/unsequenced.
And since the expression contains the side effect of i++, it invokes undefined behavior, since we can't know if the operand i or the operand i++ is evaluated first.
(There's more to it, since the standard also says that an operand should not be used twice in an expression for unrelated purposes, but that's another story.)
Operator precedence and order of evaluation are two different things. Let's have a look at them one by one:
Operator precedence rule: In an expression operands bound tighter to the operators having higher precedence.
For example
int a = 5;
int b = 10;
int c = 2;
int d;
d = a + b * c;
In the expression a + b * c, precedence of * is higher than that of + and therefore, b and c will bind to * and expression will be parsed as a + (b * c).
Order of evaluation rule: It describes how operands will be evaluated in an expression. In the statement
d = a>5 ? a : ++a;
a is guaranteed to be evaluated before evaluation of ++b or c.
But for the expression a + (b * c), though * has higher precedence than that of +, it is not guaranteed that a will be evaluated either before or after b or c and not even b and c ordered for their evaluation. Even a, b and c can evaluate in any order.
The simple rule is that: operator precedence is independent from order of evaluation and vice versa.
In the expression i = i++, higher precedence of ++ just tells the compiler to bind i with ++ operator and that's it. It says nothing about order of evaluation of the operands or which side effect (the one by = operator or one by ++) should take place first. Compiler is free to do anything.
Let's rename the i at left of assignment be il and at the right of assignment (in the expression i++) be ir, then the expression be like
il = ir++ // Note that suffix l and r are used for the sake of clarity.
// Both il and ir represents the same object.
Now compiler is free to evaluate the expression il = ir++ either as
temp = ir; // i = 0
ir = ir + 1; // i = 1 side effect by ++ before assignment
il = temp; // i = 0 result is 0
or
temp = ir; // i = 0
il = temp; // i = 0 side effect by assignment before ++
ir = ir + 1; // i = 1 result is 1
resulting in two different results 0 and 1 which depends on the sequence of side effects by assignment and ++ and hence invokes UB.

Double assignment of the same variable in one expression in C++11

The C++11 standard (5.17, expr.ass) states that
In all cases, the assignment is sequenced after the value computation
of the right and left operands, and before the value computation of
the assignment expression. With respect to an
indeterminately-sequenced function call, the operation of a compound
assignment is a single evaluation
As I understand it, all expressions which are a part of the given assignment will be evaluated before the assignment itself. This rule should work even if I modify the same variable twice in the same assignment, which, I am fairly certain, was undefined behavior before.
Will the given code:
int a = 0;
a = (a+=1) = 10;
if ( a == 10 ) {
printf("this is defined");
} else {
printf("undefined");
}
always evaluate to a==10?
Yes, there was a change between C++98 and C++11. I believe your example to be well-defined under C++11 rules, while exhibiting undefined behavior under C++98 rules.
As a simpler example, x = ++x; is undefined in C++98 but is well-defined in C++11. Note that x = x++; is still undefined (side effect of post-increment is unsequenced with the evaluation of the expression, while side effect of pre-increment is sequenced before the same).
Let's rewrite your code as
E1 = (E2 = E3)
where E1 is the expression a, E2 is the expression a += 1 and E3 is the expression 10. Here we ussed, that the assignment operator groups right-to-left (§5.17/1 in C++11 Standard).
§5.17/1 moreover states:
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
Applying this to our expression means that we first must evaluate the subexpressions E1 and E2 = E3. Note that there is no "sequenced-before" relationship between these two evaluations, but that causes no problems.
The evaluation of the id-expression E1 is trivial (the result is a itself). The evaluation of the assignment-expression E2 = E3 proceeds as follows:
First both subexpressions have to be evaluated. The evaluation of the literal E3is again trivial (gives a prvalue of value 10).
The evaluation of the (compound) assignment-expression E2 is done in the following steps:
1) The behavior of a += 1is equivalent to a = a + 1 but a is only evaluated once (§5.17/7). After evaluating the subexpressions a and 1 (in an arbitrary order), an lvalue-to-rvalue conversion is applied to a in order to read the value stored in a.
2) The values of a (which is 0) and of 1 are added (a + 1) and the result of this addition is a prvalue of value 1.
3) Before we can compute the result of the assignment a = a + 1 the value of the object the left operand refers to is replaced by the value of the right operand (§5.17/2). The result of E2 is then an lvalue refereing to the new value 1. Note that the side effect (updating the value of the left operand) is sequenced before the value computation of the assignment expression. This is §5.17/1 cited above.
Now that we have evaluated the subexpressions E2and E3, the value of the expression E2refers to is replaced by the value of E3, which is 10. Hence the result of E2 = E3 is an lvalue of value 10.
Finally, the value expression E1 refers to is replaced by the value of the expression E2 = E3, which we computed to be 10. Thus, the variable aends up to contain the value 10.
Since all these steps are well-defined, the whole expression yields a well-defined value.
After doing a little research, I am convinced your codes behaviour is well defined in C++11.
$1.9/15 states:
The value computations of the operands of an operator are sequenced before
the value computation of the result of the operator.
$5.17/1 states:
The assignment operator (=) and the compound assignment operators all group
right-to-left.
If I understand correctly, in your example
a = (a+=1) = 10;
this implies that the value computations of (a+=1) and 10 have to be made before the value computation of (a+=1) = 10 and the value computation of this expression has to be finished before a = (a+=1) = 10; is evaluated.
$5.17/1 states:
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
This implies that the assignment must happen before the value computation, and therefore, due to transitivity, the evaluation of (a+=1) = 10 can only begin after the assignment a+=1 (Because its value may only be computed after the side effect).
The same is true for the second and third assignment.
See also this excellent answer, which explains the sequenced-before relation in much more detail and way better than I could.

How does this piece of code work?

1.)
int i;
for(i=1;i<5,i<8;i++){
}
printf("%d",i);
2.)
int i;
for(i=1;i<18,i<6;i++){
}
printf("%d",i);
output for 1.) is 8 while for 2.) is 6
I am not getting how the code works, Help will be highly appreciated.
The , operator evaluates to its last operand.
i < 18, i < 6 becomes false when i is 6.
Comma operator ( , )
The comma operator (,) is used to separate two or more expressions that are included where only one expression is expected. When the set of expressions has to be evaluated for a value, only the rightmost expression is considered.
Hence:
for(i=1;i<5,i<8;i++)
is equivalent to:
for(i=1;i<8;i++)
Which evaluates value of i to 8
And
for(i=1;i<18,i<6;i++)
is equivalent to:
for(i=1;i<6;i++)
Which evaluates value of i to 6
Standerdese Reference:
C++11 Standard §5.18:
The comma operator groups left-to-right.
expression:
assignment-expression
expression , assignment-expression
A pair of expressions separated by a comma is evaluated left-to-right; the left expression is a discarded- value expression (Clause 5)83. Every value computation and side effect associated with the left expression is sequenced before every value computation and side effect associated with the right expression. The type and value of the result are the type and value of the right operand; the result is of the same value category as its right operand, and is a bit-field if its right operand is a glvalue and a bit-field.