Consider the classical sequence point example:
i = i++;
The C and C++ standards state that the behavior of the above expression is undefined because the = operator is not associated with a sequence point.
What confuses me is that ++ has a higher precedence than = and so, the above expression, based on precedence, must evaluate i++ first and then do the assignment. Thus, if we start with i = 0, we should always end up with i = 0 (or i = 1, if the expression was i = ++i) and not undefined behavior. What am I missing?
All operators produce a result. In addition, some operators, such as the assignment operator =, the compound assignment operators (+=, >>=, etc.) and the increment and decrement operators (++, --), produce side effects. The distinction between results and side effects is at the heart of this question.
Operator precedence governs the order in which operators are applied to produce their results. For instance, precedence rules require that * goes before +, + goes before &, and so on.
However, operator precedence says nothing about applying side effects. This is where sequence points (sequenced before, sequenced after, etc.) come into play. They say that in order for an expression to be well-defined, the application of side effects to the same location in memory must be separated by a sequence point.
This rule is broken by i = i++, because both ++ and = apply their side effects to the same variable i. First, ++ goes, because it has higher precedence. It computes its value by taking i's original value prior to the increment. Then = goes, because it has lower precedence. Its result is also the original value of i.
The crucial thing that is missing here is a sequence point separating the side effects of the two operators. This is what makes the behavior undefined.
Operator precedence (and associativity) determines how an expression is parsed, that is, which operands bind to which operators. However, it says nothing about the order of evaluation of the operands, which is a different thing. Example:
a() + b() * c()
Operator precedence dictates that the result of b() and the result of c() must be multiplied before added together with the result of a().
However, it says nothing about the order in which these functions should be executed. The order of evaluation of each operator specifies this. Most often, the order of evaluation is unspecified (unspecified behavior), meaning that the standard lets the compiler do it in any order it likes. The compiler need not document this order nor does it need to behave consistently. The reason for this is to give compilers more freedom in expression parsing, meaning faster compilation and possibly also faster code.
In the above example, I wrote a simple test program and my compiler executed the above functions in the order a(), b(), c(). The fact that the program needs to execute both b() and c() before it can multiply the results, doesn't mean that it must evaluate those operands in any given order.
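A minimal sketch of such a test program, assuming three hypothetical functions that each append their name to a log string (call_log and evaluate are illustrative names, not from the original test). The value of the expression is fixed by precedence; the recorded call order is whatever the compiler happened to choose.

```cpp
#include <cassert>
#include <string>

// Hypothetical log of the order in which the operands were evaluated.
static std::string call_log;

int a() { call_log += 'a'; return 1; }
int b() { call_log += 'b'; return 2; }
int c() { call_log += 'c'; return 3; }

// Clears the log, evaluates the expression and returns its value.
int evaluate() {
    call_log.clear();
    int result = a() + b() * c();   // parsed as a() + (b() * c())
    assert(call_log.size() == 3);   // each function ran once, in some order
    return result;
}
```

evaluate() always returns 7, while call_log ends up holding some ordering of the letters a, b and c that the standard does not pin down.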
This is where sequence points come in. A sequence point is a point in the program where all previous evaluations (and their side effects) must be complete. So sequence points are mostly related to order of evaluation and not so much to operator precedence.
In the example above, the three operands are unsequenced in relation to each other, meaning that no sequence point dictates the order of evaluation.
Therefore it becomes problematic when side effects are introduced in such unsequenced expressions. If we write i++ + i++ * i++, then we still don't know the order in which these operands are evaluated, so we can't determine what the result will be. This is because neither + nor * sequences the evaluation of its operands, and since i is modified more than once without any sequencing, the behavior is undefined.
Had we written i++ || i++ && i++, then the behavior would be well-defined, because && and || specify the order of evaluation to be left-to-right and there is a sequence point between the evaluation of the left and the right operand. Thus if(i++ || i++ && i++) is perfectly portable and safe (although unreadable) code.
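As a rough illustration of that left-to-right guarantee, here is a small sketch (Outcome and logical_chain are hypothetical names). The extra parentheses only make the existing grouping explicit and quiet compiler warnings; they do not change the meaning.

```cpp
// Result of the expression together with the final value of i.
struct Outcome {
    bool value;
    int i;
};

// Evaluates i++ || i++ && i++ for a given starting value of i.
Outcome logical_chain(int i) {
    // && binds tighter than ||, so this is i++ || (i++ && i++).
    // Each operand is fully evaluated, side effect included, before
    // the next one starts, and || short-circuits when its left
    // operand is nonzero.
    bool value = i++ || (i++ && i++);
    return {value, i};
}
```

Starting from 0, all three increments run and i ends at 3. Starting from 1, the left operand of || is already true, so only one increment runs and i ends at 2.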
As for the expression i = i++;, the problem here is that the = is defined as (6.5.16):
The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
This expression is actually close to being well-defined, because the text says that the left operand should not be updated before the right operand is computed. The problem is the very last sentence: the order of evaluation of the operands is unspecified/unsequenced.
And since the expression contains the side effect of i++, it invokes undefined behavior, since we can't know if the operand i or the operand i++ is evaluated first.
(There's more to it, since the standard also says that an operand should not be used twice in an expression for unrelated purposes, but that's another story.)
Operator precedence and order of evaluation are two different things. Let's have a look at them one by one:
Operator precedence rule: In an expression, operands bind more tightly to the operators that have higher precedence.
For example
int a = 5;
int b = 10;
int c = 2;
int d;
d = a + b * c;
In the expression a + b * c, precedence of * is higher than that of + and therefore, b and c will bind to * and expression will be parsed as a + (b * c).
Order of evaluation rule: It describes how operands will be evaluated in an expression. In the statement
d = a>5 ? a : ++a;
the condition a>5 is guaranteed to be evaluated before either a or ++a.
But for the expression a + (b * c), although * has higher precedence than +, it is not guaranteed that a will be evaluated either before or after b or c, and b and c are not ordered relative to each other either. a, b and c can be evaluated in any order.
The simple rule is that: operator precedence is independent from order of evaluation and vice versa.
In the expression i = i++, the higher precedence of ++ just tells the compiler to bind i to the ++ operator, and that's it. It says nothing about the order of evaluation of the operands or about which side effect (the one from the = operator or the one from ++) should take place first. The compiler is free to do anything.
Let's call the i on the left of the assignment il and the i on the right (in the expression i++) ir. The expression then becomes
il = ir++ // Note that the suffixes l and r are used for the sake of clarity.
// Both il and ir represent the same object.
Now the compiler is free to evaluate the expression il = ir++ either as
temp = ir; // i = 0
ir = ir + 1; // i = 1 side effect by ++ before assignment
il = temp; // i = 0 result is 0
or
temp = ir; // i = 0
il = temp; // i = 0 side effect by assignment before ++
ir = ir + 1; // i = 1 result is 1
This gives two different results, 0 and 1, depending on the order in which the side effects of the assignment and of ++ are applied, and hence the expression invokes UB.
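The two orderings can be written out as ordinary, fully sequenced code. This is only a sketch of what a compiler might do, with hypothetical function names; it does not contain the undefined expression itself.

```cpp
// Ordering 1: the increment is applied before the assignment.
int increment_then_assign(int i) {
    int temp = i;   // value of i++ is the old value of i
    i = i + 1;      // side effect of ++
    i = temp;       // side effect of =
    return i;
}

// Ordering 2: the assignment is applied before the increment.
int assign_then_increment(int i) {
    int temp = i;   // value of i++ is the old value of i
    i = temp;       // side effect of =
    i = i + 1;      // side effect of ++
    return i;
}
```

With i starting at 0, the first function returns 0 and the second returns 1, which matches the two outcomes traced above.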
I think the assignment operators like =, +=... don't guarantee the order of evaluation of their operands, so it is usually undefined behavior to modify the same object in an expression whose operator doesn't guarantee the order of evaluation of its operands.
My question is why, then, this is used in my programs and in many examples:
int x = 0;
x = x + 1;
So is it UB in the assignment expression above?
Contrary to what some other people have said here, a single side effect (modification) can result in UB if it's unsequenced relative to a value computation.
However, there's a rule that states "The side effect (modification of the left argument) of the built-in assignment operator and of all built-in compound assignment operators is sequenced after the value computation (but not the side effects) of both left and right arguments" (Source: cppreference.com) which is why x = x + 1 isn't UB.
Take these three snippets of C code:
1) a = b + a++
2) a = b + a; a++
3) a = b + a, a++
Everyone knows that example 1 is a Very Bad Thing, and clearly invokes undefined behavior. Example 2 has no problems. My question is regarding example 3. Does the comma operator work like a semicolon in this kind of expression? Are 2 and 3 equivalent or is 3 just as undefined as 1?
Specifically I was considering this regarding something like free(foo), foo = bar. This is basically the same problem as above. Can I be sure that foo is freed before it's reassigned, or is this a clear sequence point problem?
I am aware that both examples are largely pointless and it makes far more sense to just use a semicolon and be done with it. I'm just asking out of curiosity.
Case 3 is well defined.
First, let's look at how the expression is parsed:
a = b + a, a++
The comma operator , has the lowest precedence, followed by the assignment operator =, the addition operator + and the postincrement operator ++. So with the implicit parentheses it is parsed as:
(a = (b + a)), (a++)
From here, section 6.5.17 of the C standard regarding the comma operator , says the following:
2 The left operand of a comma operator is evaluated as a void expression; there is a sequence point between its evaluation and that of the right operand. Then the right operand is evaluated; the result has its type and value.
Section 5.14 p1 of the C++11 standard has similar language:
A pair of expressions separated by a comma is evaluated left-to-right;
the left expression is a discarded-value expression.
Every value computation and side effect associated with the left
expression is sequenced before every value computation and side effect
associated with the right expression. The type and value of the result
are the type and value of the right operand; the result is of the same
value category as its right operand, and is a bit-field if its right
operand is a glvalue and a bit-field.
Because of the sequence point, a = b + a is guaranteed to be fully evaluated before a++ in the expression a = b + a, a++.
Regarding free(foo), foo = bar, this also guarantees that foo is free'ed before a new value is assigned.
a = b + a, a++; is well-defined, but a = (b + a, a++); can be undefined.
First of all, the operator precedence makes the expression equivalent to (a = (b+a)), a++;, where + has the highest precedence, followed by =, followed by ,. The comma operator includes a sequence point between the evaluation of its left and right operand. So the code is, uninterestingly, completely equivalent to:
a = b + a;
a++;
Which is of course well-defined.
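The equivalence can be checked with a small sketch (comma_form and two_statements are hypothetical names used only for this comparison):

```cpp
// The comma form: parsed as (a = (b + a)), (a++).
int comma_form(int a, int b) {
    a = b + a, a++;
    return a;
}

// The same work split into two statements.
int two_statements(int a, int b) {
    a = b + a;
    a++;
    return a;
}
```

For a = 1 and b = 2 both functions return 4: the assignment stores 3, and only then does the increment run.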
Had we instead written a = (b + a, a++);, then the sequence point in the comma operator wouldn't save the day. Because then the expression would have been equivalent to
(void)(b + a);
a = a++;
In C and C++14 or older, a = a++ is unsequenced (see C11 6.5.16/3), meaning this is undefined behavior (per C11 6.5/2). Note that C++11 and C++14 were badly formulated and ambiguous.
In C++17 or later, the operands of the = operator are sequenced right to left, so this is well-defined.
All of this assuming no C++ operator overloading takes place. In that case, the parameters to the overloaded operator function will be evaluated, a sequence point takes place before the function is called, and what happens from there depends on the internals of that function.
In the following code excerpt from a larger piece of code presented
void func(int* usedNum, int wher) {
*usedNum = *usedNum + 1 > wher ? ++(*usedNum) : wher + 1;
}
int main(void) {
int a = 11, b = 2;
func(&a, b);
}
a warning is emitted
warning: operation on '* usedNum' may be undefined [-Wsequence-point]
*usedNum = *usedNum + 1 > wher ? ++(*usedNum) : wher + 1;
Is there a problem with the code?
My source of doubt was this and the part where it says
The sequence points in the logical expressions such as && and || and ternary operator ?: and the comma operator mean that the left hand side operand is evaluated before the right hand side operand. These few operands are the only operands in C++ that introduce sequence points.
tl;dr
For those who find it torturous to read through the comments: The initial question was not properly posed and it would be unfair to create misconceptions. My view on the topic had two sides
The ternary operator does not mess up (in an unexpected way) the sequence points (which holds, the two branches are sequenced in every version of C,C++ - see the link provided)
Is x = ++x the problem? As seen in the coliru link, we compile for c++14. There the operation is well defined (references on the comments), but older versions of c++ and c view this as undefined. So why is there a warning?
Answers focus on both C and C++; this is a good link. Lastly, the C tag was there initially (my bad) and can't be removed because existing upvoted answers refer to it
When the condition is true, it is the equivalent of saying x = ++x. In C, and versions of C++ prior to C++11, this constitutes a modification and a read of x without an intervening sequence point and therefore is undefined behaviour if the truthy branch is followed. From C++11 onwards, x = ++x is sequenced and well defined.
Edit To clarify some issues from comments.
1) this would be well defined in all C and C++ standards:
x = (++x, x); // RHS evaluates to x after increment
because the expression in the parentheses involves the comma operator, which introduces a sequence point between the evaluation of its operands. So the whole expression on the RHS evaluates to x after an increment. But the code in your question does not involve the comma operator.
2) The ternary operator introduces a sequence point
It is a sequence point between the condition and the two branches. But this doesn't introduce a sequence point between either branch and the assignment.
The warning you are getting is probably due to the fact that you are compiling your code in c++03 mode or older. In C99 and C++03, expression
x = ++x;
invokes undefined behavior. The reason is that between two sequence points an object can't be modified more than once.
This rule is changed in C11 and C++11. According to C11, the rule is as follows:
C11:6.5 Expressions:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
When *usedNum + 1 > wher is true, then
*usedNum = *usedNum + 1 > wher ? ++(*usedNum) : wher + 1;
would be equivalent to
*usedNum = ++(*usedNum);
and according to the new rule this is well defined in C++11, because the side effect of the pre ++ is sequenced before the side effect of the = operator. Read this answer for a more detailed explanation.
But the same expression *usedNum = ++(*usedNum); invokes undefined behavior in C11. The reason is that there is no guarantee that side effect by = operator is sequenced after the side effect of pre ++ operator.
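If the goal is simply to get rid of the warning in every language version, one possible rewrite is sketched below. It assumes the intent was to store the incremented value when it exceeds wher; func_portable and apply_func are hypothetical names, not part of the original code.

```cpp
// Same result as the original func, but *usedNum is modified exactly once.
void func_portable(int* usedNum, int wher) {
    const int next = *usedNum + 1;              // plain read, no side effect
    *usedNum = next > wher ? next : wher + 1;   // the only modification
}

// Small helper so the effect can be observed as a return value.
int apply_func(int start, int wher) {
    func_portable(&start, wher);
    return start;
}
```

With the values from the question (a = 11, b = 2) this stores 12, the same value the original expression produces where it is well defined.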
Note: In the expression
a = x++ ? x++ : 0;
there is a sequence point after the first x++ and hence the behavior is well defined. The same is true for
x = (++x, x);
because there is a sequence point between the evaluation of left and right operand and hence side effect is sequenced.
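Both well-defined cases from the note can be tried out with the following sketch (the function and struct names are hypothetical):

```cpp
// x = (++x, x): the increment is complete before the right operand
// of the comma is read, and the assignment happens last.
int comma_increment(int x) {
    x = (++x, x);
    return x;
}

// Holds the assigned value a and the final value of x.
struct Pair {
    int a;
    int x;
};

// a = x++ ? x++ : 0: the first x++ is complete before either branch runs.
Pair conditional_increment(int x) {
    int a = x++ ? x++ : 0;
    return {a, x};
}
```

Starting from x = 1, the condition sees 1 and leaves x at 2, the chosen branch yields 2 and leaves x at 3. Starting from x = 0, the condition is false, so a is 0 and x is incremented only once.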
The C++11 standard (5.17, expr.ass) states that
In all cases, the assignment is sequenced after the value computation
of the right and left operands, and before the value computation of
the assignment expression. With respect to an
indeterminately-sequenced function call, the operation of a compound
assignment is a single evaluation
Does this mean, that the expression:
int a = 1, b = 10;
int c = (a+=1) + (b+=1);
if ( c == 10+1+1+1 ) {
printf("this is guaranteed");
} else {
printf("not guaranteed");
}
will always evaluate to c==13?
The expression
int c = (a+=1) + (b+=1);
(edit: added the missing brackets, I think this is what you intended)
has the following subexpressions
(1) a+=1
(2) b+=1
(3) (1)+(2)
(4) c = (3)
The order in which (1) and (2) are evaluated is unspecified, the compiler is free to choose any order it likes.
Both (1) and (2) must be evaluated before the compiler can evaluate (3).
(3) must be evaluated before the compiler can evaluate (4).
Now as the order of evaluation of (1) and (2) does not matter, the overall result is well defined, your code will always yield 13 and print "this is guaranteed". Note that it has always been this way, this is not new with C++11.
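A short sketch of the same computation, returning all three values so they can be inspected (Sums and add_compound are hypothetical names):

```cpp
// Final values of a, b and c after the expression has been evaluated.
struct Sums {
    int a;
    int b;
    int c;
};

Sums add_compound() {
    int a = 1, b = 10;
    // a and b are distinct objects, so the two compound assignments
    // cannot conflict, whichever one the compiler evaluates first.
    int c = (a += 1) + (b += 1);
    return {a, b, c};
}
```

Whatever order is chosen, a ends at 2, b at 11 and c at 13.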
This has always been guaranteed, and the sequenced before rules
(or the sequence point rules in pre-C++11) aren't needed to
determine this. In C++, each (sub-)expression has two important
effects in the generated code: it has a value (unless it is of
type void), and it may have side effects. The sequenced
before/sequence point rules affect when the side effects are
guaranteed to have taken place; they have no effect on the value
of the sub-expressions. In your case, for example, the value
of (a += 1) is the value a will have after the assignment,
regardless of when the actual assignment takes place.
In C++11, the actual modification of a is guaranteed to take
place before the modification of c; in pre C++11, there was no
guarantee concerning the order. In this case, however, there is
no way a conforming program could see this difference, so it
doesn't matter. (It would matter in cases like c = (c += 1),
which would be undefined behavior in pre-C++11.)
In your example the compiler shall issue an error because the priority of the addition operator is higher than priority of the assignment operator. So at first 1 + b will be calculated and then there will be an attempt to assign 1 to expression ( 1 + b ) but ( 1 + b ) is not an lvalue.
With the new college year upon us.
We have started to receive the standard why does ++ i ++ not work as expected questions.
After just answering one of these type of questions I was told that the new C++11 standard has changed and this is no longer undefined behavior. I have heard that sequence points have been replaced by sequenced before and sequenced after but have not read deep (or at all) into the subject.
So the question I was just answering had:
int i = 12;
k = ++ (++ i);
So the question is:
How have sequence points changed in C++11, and how does this affect questions like the above? Is it still undefined behavior or is this now well defined?
The UB in those cases is based on [intro.execution]/15
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [...] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
For ++(++i): [expr.pre.incr]/1 states that ++i is defined as i+=1. This leads to [expr.ass]/1, which says
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
Therefore, for ++(++i), equivalent to (i+=1)+=1, the inner assignment is sequenced before the outer assignment, and we have no UB.
[intro.execution]/15 has an example of UB:
i = i++ + 1; // the behavior is undefined
The case here is a bit different (thanks to Oktalist for pointing out a pre/postfix mistake here). [expr.post.incr]/1 describes the effects of postfix increment. It states:
The value computation of the ++ expression is sequenced before the modification of the operand object.
However, there is no requirement on the sequencing of the side effect (the modification of i). Such a requirement could also be imposed by the assignment-expression. But the assignment-expression only requires the value computations (but not the side effects) of the operands to be sequenced before the assignment. Therefore, the two modifications via i = .. and i++ are unsequenced, and we get undefined behaviour.
N.B. i = (i = 1); does not have the same problem: The inner assignment guarantees the side effect of i = 1 is sequenced before the value computation of the same expression. And the value is required for the outer assignment, which guarantees that it (the value computation of the right operand (i = 1)) is sequenced before the side effect of the outer assignment. Similarly, i = ++i + 1; (equivalent to i = (i+=1) + 1;) has defined behaviour.
The comma operator is an example where the side effects are sequenced; [expr.comma]/1
Every value computation and side effect associated with the left expression is sequenced before every value computation and side effect associated with the right expression.
[intro.execution]/15 includes the example i = 7, i++, i++; (read: (i=7), i++, i++;), which is defined behaviour (i becomes 9).
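The two well-defined cases discussed above can be sketched as follows, assuming a compiler in C++11 mode or later (the function names are hypothetical). The undefined example i = i++ + 1 is deliberately left out.

```cpp
// ++(++i) is (i += 1) += 1: the inner assignment is sequenced
// before the outer one under the C++11 rules.
int double_pre_increment(int i) {
    int k = ++(++i);
    return k;
}

// (i = 7), i++, i++: each comma sequences everything on its left
// before everything on its right.
int comma_chain() {
    int i;
    i = 7, i++, i++;
    return i;
}
```

Starting from i = 12 as in the question, k ends up as 14, and the comma chain leaves i at 9 as stated above.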
I don't think sequencing is relevant to your situation. The expression ++i++ is grouped as ++(i++), so:
If i is a built-in type, then this is invalid, since i++ is an rvalue.
If i is of user-defined type and the operators are overloaded, this is a nested function call, such as operator++(operator++(i, 0)), where the inner call is the postfix increment, and function arguments are evaluated before the function call is evaluated.
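To make the user-defined case concrete, here is a sketch with a hypothetical Counter type that overloads both operators as members. Everything in it (Counter, Result, pre_of_post) is made up for illustration.

```cpp
struct Counter {
    int value;

    // Prefix ++: increments in place and returns the object itself.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Postfix ++: increments in place and returns a copy of the old state.
    Counter operator++(int) {
        Counter old = *this;
        ++value;
        return old;
    }
};

// Value of the whole expression and the final value of i.
struct Result {
    int expr;
    int i;
};

Result pre_of_post(int start) {
    Counter i{start};
    // Grouped as ++(i++), i.e. i.operator++(0).operator++():
    // the postfix call finishes before the prefix call starts.
    Counter r = ++i++;
    return {r.value, i.value};
}
```

Starting from 5, the postfix call bumps i to 6 and hands back a temporary holding 5; the prefix call then increments that temporary to 6. So i is incremented only once, which is probably not what someone writing ++i++ expects, but the behavior is defined.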