Order of evaluation and undefined behaviour - c++

Speaking in the context of the C++11 standard (which, as you know, no longer has a concept of sequence points), I want to understand how the two simplest examples are defined:
int i = 0;
i = i++; // #0
i = ++i; // #1
There are two topics on SO which explain those examples in the C++11 context. Here it was said that #0 invokes UB and #1 is well-defined. Here it was said that both examples are undefined. This ambiguity confuses me a lot. I've read this well-structured reference three times already, but the topic seems to be far too complicated for me.
Let's analyze the example #0: i = i++;.
Corresponding quotes are:
The value computation of the built-in postincrement and postdecrement
operators is sequenced before its side-effect.
The side effect (modification of the left argument) of the built-in
assignment operator and of all built-in compound assignment operators
is sequenced after the value computation (but not the side effects) of
both left and right arguments, and is sequenced before the value
computation of the assignment expression (that is, before returning
the reference to the modified object)
If a side effect on a scalar object is unsequenced relative to another
side effect on the same scalar object, the behavior is undefined.
As I understand it, the side effect of the assignment operator is not sequenced with the side effects of its left and right arguments. Thus the side effect of the assignment operator is not sequenced with the side effect of i++. So #0 invokes UB.
Let's analyze the example #1: i = ++i;.
Corresponding quotes are:
The side effect of the built-in preincrement and predecrement
operators is sequenced before its value computation (implicit rule due
to definition as compound assignment)
The side effect (modification of the left argument) of the built-in
assignment operator and of all built-in compound assignment operators
is sequenced after the value computation (but not the side effects) of
both left and right arguments, and is sequenced before the value
computation of the assignment expression (that is, before returning
the reference to the modified object)
If a side effect on a scalar object is unsequenced relative to another
side effect on the same scalar object, the behavior is undefined.
I cannot see how this example is different from #0. It seems to be UB for the very same reason as #0: the side effect of the assignment is not sequenced with the side effect of ++i. The topic linked above says it is well-defined. Why?
Question: how can I apply the quoted rules to determine whether the examples have UB? An explanation that is as simple as possible would be greatly appreciated. Thank you!

Since your quotes are not directly from the standard, I will try to give a detailed answer quoting the relevant parts of the standard. The definitions of "side effects" and "evaluation" are found in paragraph 1.9/12:
Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a sub-expression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects.
The next relevant part is paragraph 1.9/15:
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [...] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
Now let's see, how to apply this to the two examples.
i = i++;
This is the postfix form of increment and you find its definition in paragraph 5.2.6. The most relevant sentence reads:
The value computation of the ++ expression is sequenced before the modification
of the operand object.
For the assignment expression see paragraph 5.17. The relevant part states:
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
Using all the information above, one possible evaluation order of the whole expression is (this order is not guaranteed by the standard!):
value computation of i++ (right hand side)
value computation of i (left hand side)
modification of i (side effect of ++)
modification of i (side effect of =)
All the standard guarantees is that the value computations of the two operands are sequenced before the value computation of the assignment expression. But the value computation of the right-hand side only reads the value of i; it does not modify i. The two modifications (side effects) are therefore not sequenced with respect to each other, and we get undefined behavior.
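Because `i = i++;` is undefined under these rules, the usual fix is to say what was meant with separate statements, so that each full-expression completes before the next one starts. The sketch below shows two plausible intents; the function names are hypothetical and only illustrate the idea.

```cpp
#include <cassert>

// Hypothetical helpers: each spells out, with separate statements,
// one thing a programmer writing `i = i++;` might have meant.

// Intent 1: "just increment i". A single modification, so no UB.
int increment_only(int i) {
    ++i;
    return i;
}

// Intent 2: "store the old value back into i". The increment and the
// assignment are in separate full-expressions, so they cannot overlap.
int store_old_value(int i) {
    int old = i++;  // old gets the original value; i is incremented
    i = old;        // i is overwritten with the original value
    return i;
}
```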
What about the second example?
i = ++i;
The situation is quite different here. You find the definition of prefix increment in paragraph 5.3.2. The relevant part is:
If x is not of type bool, the expression ++x is equivalent to x+=1.
Substituting that, our expression is equivalent to
i = (i += 1)
Looking up the compound assignment operator += in 5.17/7 we get that i += 1 is equivalent to i = i + 1 except that i is only evaluated once. Hence, the expression in question finally becomes
i = ( i = (i + 1))
But we already know from above that the value computation of the = is sequenced after the value computation of the operands and the side effects are sequenced before the value computations of =. So we get a well-defined order of evaluation:
compute value of i + 1 (and i - left hand side of inner expression)(#1)
initiate side effect of inner =, i.e. modify "inner" i
compute value of (i = i + 1), which is the "new" value of i
initiate side effect of outer =, i.e. modify "outer" i
compute value of full expression.
(#1): Here, i is only evaluated once, since i += 1 is equivalent to i = i + 1 except that i is only evaluated once (5.17/7).
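The derivation above can be checked with a small sketch. It assumes a compiler in C++11 mode or later (where, per this answer, `i = ++i;` is well-defined) and compares the expression with the nested assignment it reduces to. Compilers may still emit a sequencing warning for the first form.

```cpp
#include <cassert>

// Performs `i = ++i;` on a copy of `start`. ++i behaves like i += 1, whose
// side effect is sequenced before its value computation, which in turn is
// sequenced before the side effect of the outer '='. Result: start + 1.
int prefix_then_assign(int start) {
    int i = start;
    i = ++i;  // well-defined since C++11; undefined before C++11 and in C
    return i;
}

// The same computation written as the nested assignment derived above.
int nested_assignment(int start) {
    int i = start;
    i = (i = i + 1);
    return i;
}
```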

The key difference is that ++i is defined as i += 1, so
i = ++i;
is the same as:
i = (i += 1);
Since the side effects of the += operator are sequenced before the value computation of the operator, the actual modification of i in ++i is sequenced before the outer assignment. This follows directly from the sections you quote: "The side effect (modification of the left argument) of the built-in assignment operator and of all built-in compound assignment operators is sequenced after the value computation (but not the side effects) of both left and right arguments, and is sequenced before the value computation of the assignment expression (that is, before returning the reference to the modified object)"
This is due to the nested assignment operator; the (outer) assignment operator only imposes "sequenced before" on the value computation of its operands, not on their side effects. (But of course, it doesn't undo sequencing imposed otherwise.)
And as you indirectly point out, this is new to C++11; previously, both were undefined. The older versions of C++ used sequence points, rather than sequenced before, and there was no sequence point in any of the assignment operators. (I have the impression that the intent was that operators which result in an lvalue have a value which is sequenced after any side effects. In earlier C++, the expression *&++i was undefined behavior; in C++11, it is guaranteed to be the same as ++i.)
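A minimal sketch of the last point, assuming C++11 or later: the built-in prefix ++ yields an lvalue referring to i itself, already incremented, so taking its address and dereferencing it gives back the same object. The function name is hypothetical.

```cpp
#include <cassert>

// Returns true if *&++i refers to i and observes the incremented value.
bool deref_addr_of_prefix_increment() {
    int i = 5;
    int& r = *&++i;  // in C++11 this is the same as ++i
    return &r == &i && r == 6 && i == 6;
}
```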

Related

Evaluation order of side effects for assignment operator in C++11

I would very much appreciate it if someone could give a clarification on the sequencing of side effects for assignment statements in C++11. E.g., point me to the relevant standard text that deals with it.
The page on evaluation order on cppreference.com states the following regarding assignments:
8) The side effect (modification of the left argument) of the built-in assignment operator and of all built-in compound assignment operators is sequenced after the value computation (but not the side effects) of both left and right arguments, and is sequenced before the value computation of the assignment expression (that is, before returning the reference to the modified object)
What is meant by "(but not the side effects)"? Are the side effects unsequenced, indeterminately sequenced, or sequenced after the modification of the left argument (or perhaps even sequenced after the returning of the reference)?
As an example, when do the post-increment operations take place in:
while (*tgt++= *src++);
It seems clear from the evaluation order rules that the value computations are performed first, so *tgt and *src are computed first. But is it known when the post-increment side effects occur?
Edit #1:
To the best of my understanding, Undefined behavior and sequence points does not answer my question. In fact, it was the start of my descent into the "rabbit hole" that in the end led me to cppreference.com. What I specifically want to know is the definition of the sequencing of side effects for the assignment operator in C++11. The question answered in Undefined behavior and sequence points is the relation between sequencing and the concepts of undefined, unspecified, and implementation-defined behaviour. Which, by the way, it answers very well.
End of Edit #1
Best regards
What is meant by "(but not the side effects)"?
This remark emphasises the fact that the sentence makes no claims about sequencing of side effects.
Are the side effects unsequenced, indeterminately sequenced, or sequenced after the modification of the left argument (or perhaps even sequenced after the returning of the reference)?
This is determined in paragraphs that discuss each specific side effect. For example, the side effect of the postfix increment operator is sequenced after its value computation, and it is stated that an indeterminately-sequenced function call cannot intervene. There are no other claims made about sequencing of this operator that I can find. If there are indeed none, one must conclude it is unsequenced w.r.t. the assignment.
First of all, note that C++17 introduced quite a few changes to expression evaluation order.
Let's first see what the current standard draft has to say. I think the relevant parts here are [intro.execution]/7
[…] Evaluation of an expression (or a subexpression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects. […]
and [intro.execution]/10
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. […] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. […]
and finally [expr.ass]/1
[…] In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
The right operand is sequenced before the left operand. […]
Based on this, I would conclude that in
while (*tgt++ = *src++);
the evaluation of *src is sequenced before the evaluation of *tgt while the side-effects of each increment as well as the assignment are all unsequenced with respect to each other. Since the condition in a while loop is a full-expression, all evaluations and side effects occurring in one iteration of the loop are sequenced before the evaluations and side effects of the next iteration.
As far as I can see, in C++11, the evaluations of *src and *tgt were unsequenced with respect to each other but sequenced before the side effect of assignment. The side effects of the increments and the assignment were also unsequenced with respect to each other.
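To make the loop concrete, here is a strcpy-style sketch built around it. The name `copy_cstr` is hypothetical, and it assumes `tgt` points to a buffer large enough for the string and that the two buffers do not overlap. The unsequenced side effects are harmless here because they modify three different scalars (the pointer `tgt`, the pointer `src`, and the character `*tgt`), and the loop condition is a full-expression, so each iteration completes before the next begins.

```cpp
#include <cassert>
#include <cstring>

// Copies the NUL-terminated string src into tgt, including the terminator.
void copy_cstr(char* tgt, const char* src) {
    // Extra parentheses silence the "assignment used as condition" warning.
    while ((*tgt++ = *src++)) {
        // empty body: the copy happens in the condition
    }
}
```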

Function parameters: indeterminately sequenced or unsequenced? [duplicate]

This question already has an answer here: What are the evaluation order guarantees introduced by C++17?
On cppreference I see the following text:
In a function call, value computations and side effects of the initialization of every parameter are indeterminately sequenced with respect to value computations and side effects of any other parameter.
However, I wasn't able to find any confirmation of this in the C++17 standard.
Function parameters, as subexpressions, should comply with [intro.execution]/17:
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [ Note: In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations. — end note ] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a memory location (4.4) is unsequenced relative to either another side effect on the same memory location or a value computation using the value of any object in the same memory location, and they are not potentially concurrent (4.7), the behavior is undefined. [ Note: The next section imposes similar, but more complex restrictions on potentially concurrent computations. — end note ]
This means that the evaluation of function arguments should be unsequenced unless something else in the standard says otherwise. I searched the standard text for the substring "indeterminately", and none of the 10 occurrences looks relevant to function call arguments.
So, the question is: are the function parameters unsequenced or indeterminately sequenced in C++17?
[expr.call]/5 The postfix-expression is sequenced before each expression in the expression-list and any default argument. The initialization of a parameter, including every associated value computation and side effect, is indeterminately sequenced with respect to that of any other parameter.
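A small sketch of what "indeterminately sequenced" means in practice: both arguments call a function that modifies shared state, so one call runs completely before the other, but the standard does not say which. The result is one of two values, never undefined behaviour. The names `demo::counter`, `demo::next`, and `demo::encode` are hypothetical.

```cpp
#include <cassert>

namespace demo {
int counter = 0;

// Hypothetical helper with a side effect on shared state.
int next() { return ++counter; }

// Encodes its two arguments as a two-digit number, e.g. (1, 2) -> 12.
int encode(int a, int b) { return a * 10 + b; }
}  // namespace demo

// Returns 12 if the first argument was evaluated first, 21 otherwise.
int call_with_two_side_effects() {
    demo::counter = 0;
    return demo::encode(demo::next(), demo::next());
}
```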
Emphasis mine.

Interaction between post-decrement and logical operators

For example, in the following expression
i-- && expr
Will i already be decremented when expr is evaluated? Language lawyers would be adept here.
If the && operator is the built-in operator, then yes. From [expr.log.and]/2:
If the second expression is evaluated, every value computation and side effect associated with the first expression is sequenced before every value computation and side effect associated with the second expression.
If the operator is overloaded, it is a normal function call, and the order of evaluation of the function call arguments is unspecified.
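A minimal sketch for the built-in case: the post-decrement's value computation and side effect are both sequenced before the right operand, so the right operand observes the decremented value. If the post-decrement yields 0, the right operand is not evaluated at all. The function name is hypothetical.

```cpp
#include <cassert>

// Returns whether `i-- && (right operand)` sees the decremented i,
// starting from i == start.
bool rhs_sees_decrement(int start) {
    int i = start;
    return i-- && i == start - 1;  // built-in &&: left side fully done first
}
```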

Do parentheses force order of evaluation and make an undefined expression defined?

I was just going through my textbook when I came across this question:
What would be the value of a after the following expression? Assume the initial value of a = 5. Mention the steps.
a+=(a++)+(++a)
At first I thought this was undefined behaviour because a is modified more than once. Then I reread the question, and since it says "Mention the steps", I thought the question might be well-formed after all.
Does applying parentheses make an undefined behaviour defined?
Is a sequence point created after evaluating a parentheses expression?
If it is defined, how do the parentheses matter, since ++ and () have the same precedence?
No, applying parentheses doesn't make it a defined behaviour. It's still undefined. The C99 standard §6.5 ¶2 says
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be read only to determine the value
to be stored.
Putting a sub-expression in parentheses controls how the sub-expressions are grouped, but it does not create a sequence point. Therefore, it does not guarantee when the side effects of the sub-expressions, if they produce any, will take place. Quoting the C99 standard again, §5.1.2.3 ¶2:
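Since the parentheses do not help, the only way to get a defined result is to split the expression into separate statements. The sketch below assumes the textbook intended a strict left-to-right reading of `a+=(a++)+(++a)`; that is an assumption, because the expression as written has no defined meaning. The function name is hypothetical.

```cpp
#include <cassert>

// One possible (assumed) left-to-right reading, made well-defined by
// putting each modification of a in its own full-expression.
int textbook_left_to_right(int a) {
    int lhs = a++;   // for a == 5: lhs = 5, a becomes 6
    int rhs = ++a;   // a becomes 7, rhs = 7
    a += lhs + rhs;  // a = 7 + 12 = 19
    return a;
}
```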
Evaluation of an expression may produce side effects. At certain
specified points in the execution sequence called sequence points, all
side effects of previous evaluations shall be complete and no side
effects of subsequent evaluations shall have taken place.
For the sake of completeness, following are sequence points laid down by the C99 standard in Annex C.
The call to a function, after the arguments have been evaluated.
The end of the first operand of the following operators: logical AND &&; logical OR ||; conditional ?; comma ,.
The end of a full declarator.
The end of a full expression: the expression in an expression statement; the controlling expression of a selection statement (if or switch); the controlling expression of a while or do statement; each of the expressions of a for statement; the expression in a return statement.
Immediately before a library function returns.
After the actions associated with each formatted input/output function conversion specifier.
Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any movement of the objects passed as arguments to that call.
Adding parentheses does not create a sequence point, and in the more modern standards it does not create a sequenced-before relationship with respect to side effects, which is the problem with your expression. Unless noted otherwise, the rest of this answer is with respect to C++11. Parentheses form a primary expression, covered in section 5.1 Primary expressions, which has the following grammar (emphasis mine going forward):
primary-expression:
literal
this
( expression )
[...]
and in paragraph 6 it says:
A parenthesized expression is a primary expression whose type and value are identical to those of the enclosed expression. The presence of parentheses does not affect whether the expression is an lvalue. The parenthesized expression can be used in exactly the same contexts as those where the enclosed expression can be used, and with the same meaning, except as otherwise indicated.
The postfix ++ is problematic since we cannot determine when the side effect of updating a will happen; before C++11, and in C, this applies to both the postfix ++ and prefix ++ operations. With respect to how undefined behavior changed for prefix ++ in C++11, see Assignment operator sequencing in C11 expressions.
The += operation is problematic since:
[...]E1 op = E2 is equivalent to E1 = E1 op E2 except that E1 is
evaluated only once[...]
So in C++11 the following went from undefined to defined:
a = ++a + 1 ;
but this remains undefined:
a = a++ + 1 ;
and both of the above are undefined pre C++11 and in both C99 and C11.
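A minimal sketch of the expression that became defined, assuming a compiler in C++11 mode or later: ++a acts as a += 1, whose side effect is sequenced before its value computation; that value is an operand of +, and the value computations of the operands of = are sequenced before its side effect. The function name is hypothetical, and compilers may still warn about it.

```cpp
#include <cassert>

// Evaluates a = ++a + 1 on a copy of the argument; defined since C++11.
int prefix_then_add_one(int a) {
    a = ++a + 1;  // for a == 5: ++a yields 6, then a = 6 + 1 = 7
    return a;
}
```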
From the draft C++11 standard section 1.9 Program execution paragraph 15 says:
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [ Note: In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations. —end note ] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

Sequence points and partial order

A few days back there was a discussion here about whether the expression
i = ++i + 1
invokes UB (Undefined Behavior) or not.
The conclusion was that it invokes UB, as the value of 'i' changes more than once between two sequence points.
I was involved in a discussion with Johannes Schaub in that same thread. According to him
i=(i,i++,i)+1 ------ (1) /* invokes UB as well */
I said (1) does not invoke UB because the side effects of the previous subexpressions are cleared by the comma operator ',' between i and i++ and between i++ and i.
Then he gave the following explanation:
"Yes, the sequence point after i++ completes all side effects before it, but there is nothing that stops the assignment side effect overlapping with the side effect of i++. The underlying problem is that the side effect of an assignment is not specified to happen after or before the evaluation of both operands of the assignment, and so sequence points cannot do anything with regard to protecting this: Sequence points induce a partial order: Just because there is a sequence point after and before i++ doesn't mean all side effects are sequenced with regard to i.
Also, notice that merely a sequence point means nothing: The order of evaluations isn't dictated by the form of code. It's dictated by semantic rules. In this case, there is no semantic rule saying when the assignment side effect happens with regard to evaluating both of its operands or subexpressions of those operands".
The statement written in "bold" confused me. As far as I know:
"At certain specified points in the execution sequence called sequence points,all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place."
Since comma operators also specify execution order, the side effects of i++ have been completed by the time we reach the last i. He (Johannes) would have been right had the order of evaluation not been specified (but in the case of the comma operator it is well specified).
So I just want to know: does (1) invoke UB or not? Can someone give another valid explanation?
Thanks!
The C standard says this about assignment operators (C90 6.3.16 or C99 6.5.16 Assignment operators):
The side effect of updating the stored value of the left operand shall occur between the previous and the next sequence point.
It seems to me that in the statement:
i=(i,i++,i)+1;
the sequence point 'previous' to the assignment operator would be the second comma operator and the 'next' sequence point would be the end of the expression. So I'd say that the expression doesn't invoke undefined behavior.
However, this expression:
*(some_ptr + i) = (i,i++,i)+1;
would have undefined behavior because the order of evaluation of the 2 operands of the assignment operator is unspecified. In this case, instead of the problem being when the assignment operator's side effect takes place, the problem is that you don't know whether the value of i used in the left-hand operand will be evaluated before or after the right-hand side. This order of evaluation problem doesn't occur in the first example because in that expression the value of i isn't actually used in the left-hand side; all that the assignment operator is interested in is the "lvalue-ness" of i.
But I also think that all this is sketchy enough (and my understanding of the nuances involved is sketchy enough) that I wouldn't be surprised if someone can convince me otherwise (on either count).
i=(i,i++,i)+1 ------ (1) /* invokes UB as well */
It does not invoke undefined behaviour. The side effect of i++ will take place before the next sequence point, which is denoted by the comma following it, and therefore also before the assignment.
Nice language sudoku, though. :-)
edit: There's a more elaborate explanation here.
I believe that the following expression definitely has undefined behaviour.
i + ((i, i++, i) + 1)
The reason is that the comma operator specifies sequence points between the subexpressions in parentheses but does not specify where in that sequence the evaluation of the left-hand operand of + occurs. One possibility is between the sequence points surrounding i++, and this violates 5/4: i is written to between two sequence points but is also read twice between the same sequence points, and not just to determine the value to be stored but also to determine the value of the first operand to the + operator.
This also has undefined behaviour.
i += (i, i++, i) + 1;
Now, I am not so sure about this statement.
i = (i, i++, i) + 1;
Although the same principles apply, i must be "evaluated" as a modifiable lvalue, and this can happen at any time, but I'm not convinced that its value is ever read as part of this. (Or is there another restriction that the expression violates to cause UB?)
The sub-expression (i, i++, i) happens as part of determining the value to be stored and that sub-expression contains a sequence point after the storage of a value to i. I don't see any way that this wouldn't require the side effect of i++ to be complete before the determination of the value to be stored and hence the earliest possible point that the assignment side effect could occur.
After this sequence point, i's value is read at most once and only to determine the value that will be stored back to i, so this last part is fine.
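For comparison, here is a sketch of expression (1) evaluated under the C++11 sequenced-before rules quoted earlier on this page, rather than the C99 sequence-point wording this thread argues about. Assuming those rules: the built-in comma operator sequences every value computation and side effect of its left operand before its right operand, so the side effect of i++ completes before the last i is read; that read precedes the + 1, which precedes the side effect of the assignment. The function name is hypothetical, and compilers typically warn that the leftmost i has no effect.

```cpp
#include <cassert>

// Evaluates i = (i, i++, i) + 1 on a copy of `start`.
int comma_then_assign(int start) {
    int i = start;
    i = (i, i++, i) + 1;  // for start == 5: i++ makes i 6, last i reads 6, i = 7
    return i;
}
```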