#include <iostream>
int main(){
    int arr[7] = {0,1,2,3,4,3,2};
    arr[0]++[arr]++[arr]++[arr]++[arr]++[arr]++[arr] = 5; //#1
    for(auto i = 0;i<7;i++){
        std::cout<<i<<" : "<< arr[i]<<std::endl;
    }
}
Consider the code above. Would the evaluation at #1 result in UB? This is an example I saw on Twitter.
According to the evaluation sequence for postfix ++:
expr.post.incr#1
The value computation of the ++ expression is sequenced before the modification of the operand object.
That means an example like the following would result in UB:
int arr[2] = {0};
(*(arr[0]++ + arr))++;
That is because the side effects of the inner arr[0]++ and of the outer (*(arr[0]++ + arr))++ are unsequenced relative to each other and apply to the same memory location.
However, the first example is different. My argument is:
expr.sub#1
The expression E1[E2] is identical (by definition) to *((E1)+(E2)),..., The expression E1 is sequenced before the expression E2.
That means, every value computation and side effect associated with E1 are both sequenced before every value computation and side effect associated with E2.
To break down the expression at #1: according to the expression grammar, it must have the form:
expr.ass
logical-or-expression assignment-operator initializer-clause
And expr.post#1
Here the logical-or-expression is a postfix-expression. That is,
postfix-expression [ arr ] = 5;
Here the postfix-expression has the form postfix-expression ++, whose inner postfix-expression in turn has the form postfix-expression [ arr ]. In short, the left operand of the assignment consists of two kinds of postfix-expressions that alternate with each other.
Postfix expressions group left-to-right
So, let the subscript operation have the form E1[E2] and the postfix increment have the form PE++. The first example then decomposes as follows:
E1': arr[0]++
E2': arr
E1'[E2']: arr[0]++[arr]
PE'++ : E1'[E2']++
E1'': PE'++
E2'': arr
E1''[E2'']: PE'++[arr]
PE''++: E1''[E2''] ++
and so on...
That means that, in order to calculate PE'++, E1'[E2'] must be calculated first. E1'[E2'] is identical to *((E1')+(E2')), and by the rule above E1' is sequenced before E2', so the side effect caused by E1' is sequenced before the value computation of E2'.
In other words, the side effect of each postfix ++ expression must be complete before that expression is combined with the subsequent [arr].
So I think the code at #1 has well-defined behavior rather than UB. Is there anything I misunderstand? Is the code UB or not? If it is not UB, what result will the code give?
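The identity quoted from expr.sub, on which the decomposition relies, can be checked directly. This is only a small sketch: the function name subscript_forms_agree and the sample array are assumptions made for illustration and are not part of the question.

```cpp
#include <cassert>

// Hypothetical helper (not from the question): checks that arr[idx],
// idx[arr] and *(arr + idx) all designate the same array element.
bool subscript_forms_agree() {
    int arr[3] = {10, 20, 30};
    int idx = 2;
    // E1[E2] is defined as *((E1)+(E2)), so the integer operand may
    // appear on either side of the brackets.
    assert(idx[arr] == 30);
    return &arr[idx] == &idx[arr] && &idx[arr] == &*(arr + idx);
}
```

This is why a fragment such as 0[arr] in the expression at #1 is simply another spelling of arr[0].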
I believe your understanding is fine, and the code is fine in C++ since C++17.
Is there anything I misunderstand?
No.
Is the code UB or not?
No, it is not UB (since C++17).
If it's not UB, what's the result?
arr[0]++[arr]++[arr]++[arr]++[arr]++[arr]++[arr] = 5;
Side effect: arr[0] := 0 + 1 = 1
0[arr]++[arr]++[arr]++[arr]++[arr]++[arr] = 5;
Side effect: arr[0] := 1 + 1 = 2
1[arr]++[arr]++[arr]++[arr]++[arr] = 5;
Side effect: arr[1] := 1 + 1 = 2
1[arr]++[arr]++[arr]++[arr] = 5;
Side effect: arr[1] := 2 + 1 = 3
2[arr]++[arr]++[arr] = 5;
Side effect: arr[2] := 2 + 1 = 3
2[arr]++[arr] = 5;
Side effect: arr[2] := 3 + 1 = 4
3[arr] = 5;
Side effect: arr[3] := 5
So the output would be:
0 : 2
1 : 3
2 : 4
3 : 5
4 : 4
5 : 3
6 : 2
Note that the part "The expression E1 is sequenced before the expression E2" was added in C++17.
The code is undefined before C++17 and is undefined in C (the tweet was about C code), because in arr[0]++[arr]++ the two side effects that ++ has on arr[0] are unsequenced relative to each other.
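The walkthrough above can be replayed with one statement per step, which makes the order explicit and does not depend on the C++17 sequencing rule at all. This is a sketch under that reading of the answer; replay_chain and idx are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: replays arr[0]++[arr]++[arr]++[arr]++[arr]++[arr]++[arr] = 5
// one step at a time, in the left-to-right order described in the walkthrough.
std::array<int, 7> replay_chain() {
    std::array<int, 7> arr = {0, 1, 2, 3, 4, 3, 2};
    int idx = arr[0]++;              // yields 0, arr[0] becomes 1
    for (int step = 0; step < 5; ++step) {
        // Each x[arr]++ yields the old arr[x] and then increments arr[x].
        idx = arr[idx]++;
    }
    assert(idx == 3);                // the final subscript is 3[arr]
    arr[idx] = 5;                    // the assignment at the end of the chain
    return arr;
}
```

The returned array holds 2, 3, 4, 5, 4, 3, 2, matching the output listed in the answer.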
The expression with multiple postfix increments is not UB by itself, because the standard mandates that:
The value computation of the ++ expression is sequenced before
the modification of the operand object.
So in each x++[arr], the postfix increment is deferred until after the value computation, and the expression ends up as arr[0][arr][arr][arr][arr][arr][arr] and finally (as arr[0] is initially 0) as arr[0].
Then you have a number of increments to apply to the same arr[0] element and, as demonstrated by KalilCuk, all will be fine on that part alone, at least in C++17. As arr[0] is 0, all those post-increments apply to arr[0], still without UB in that specific use case even before C++17.
The problem is that the assignment is not a sequence point. So a compiler can choose to first apply the assignment and then increment the result (which would give 11), or to first increment the value (which will become 6) and then process the assignment with a final result of 5.
So you are invoking the same UB as the simpler:
i = 0;
i++ = 1; // 1 or 2 ?
Probably not the one you expected, but still UB...
In the following code:
#include <iostream>
int main() {
    int i, j;
    j = 10;
    i = (j++, j+100, 999+j);
    std::cout << i;
    return 0;
}
The output is 1010.
However shouldn't it be 1009, as ++ should be done after the whole expression is used?
The comma operator introduces a sequence point: as the C++17 standard, for example, says,
Every value computation and side effect associated with the left expression is sequenced
before every value computation and side effect associated with the right expression.
Thus, the effect of the ++ operator is guaranteed to occur before 999+j is evaluated.
++ should be done after the whole expression is used?
No. The postfix operator evaluates to the value of the old j and has the side effect of incrementing j.
The comma operator evaluates the second operand after the first operand has been evaluated and its side effects are complete.
A pair of expressions separated by a comma is evaluated left-to-right;
the left expression is a discarded-value expression (Clause 5).
Every value computation and side effect associated with the left
expression is sequenced before every value computation and side effect
associated with the right expression.
https://stackoverflow.com/a/7784819/2805305
Associativity of the comma operator is left to right.
So starting from j++, this will be evaluated first (j becomes 11)
Then j + 100 is evaluated (its result is discarded)
Then 999 + j is evaluated which is equal to 1010
This rightmost value is assigned to i
Thus, the output is 1010
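The steps above can be sketched as two small functions, one using the expression from the question and one with a separate statement per comma operand. The function names are assumptions made for illustration, and the (void) cast on the middle operand is added only to silence an unused-value warning; it does not change the evaluation.

```cpp
#include <cassert>

// The expression from the question; (void) only silences a compiler warning.
int comma_as_written() {
    int j = 10;
    int i = (j++, (void)(j + 100), 999 + j);
    return i;
}

// Hypothetical expansion: one statement per operand of the comma operator.
int comma_expanded() {
    int j = 10;
    j++;                   // left operand: j becomes 11 before anything else
    (void)(j + 100);       // middle operand: 111 is computed and discarded
    int i = 999 + j;       // right operand: 999 + 11
    assert(j == 11);
    return i;
}
```

Both functions return 1010, because the increment of j is complete before 999 + j is evaluated.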
Long Answer:
Built-in comma operator
The comma operator expressions have the form
E1 , E2
In a comma expression E1, E2, the expression E1 is evaluated, its
result is discarded (although if it has class type, it won't be
destroyed until the end of the containing full expression), and its
side effects are completed before evaluation of the expression E2
begins (note that a user-defined operator, cannot guarantee
sequencing) (until C++17).
This already answers your question, but I'll walk through it with reference to your code:
Start with something simple like
int value = (1 + 2, 2 + 3, 4 + 5); // value is assigned 9
Because ...the expression E1 is evaluated, its result is discarded... Here, since we have more than 2 operands, the associativity of the comma operator also comes into play.
However shouldn't it be 1009, as ++ should be done after the whole
expression is used?
Now see:
int j = 0;
int i = (j++, 9 + j);
Here, the value of i is 10 because ...and its side effects are completed before evaluation of the expression E2 begins... Hence, the incrementation of j has its effect before the evaluation of 9 + j.
I think you can now see clearly why, in your code
j = 10;
i = (j++, j+100, 999+j);
i is assigned the value 1010.
After answering this question, there was a long discussion over whether the code in question was undefined behaviour or not. Here's the code:
std::map<string, size_t> word_count;
word_count["a"] = word_count.count("a") == 0 ? 1 : 2;
First of all, it was well-established that this was at least unspecified. The result differs based on which side of the assignment is evaluated first. In my answer, I followed through each of the four resulting cases, with factors of which side is evaluated first and whether the element exists prior to this.
There was a short form that came up as well:
(x = 0) = (x == 0) ? 1 : 2; //started as
(x = 0) = (y == "a") ? 1 : 2; //changed to
I claimed it was more like this:
(x = 0, x) = (x == 0) ? 1 : 2; //comma sequences x, like [] should
Eventually, I found an example that seemed to work for me:
i = (++i,i++,i); //well-defined per SO:Undefined Behaviour and Sequence Points
Back to the original, I broke it down into relevant function calls to make it easier to follow:
operator=(word_count.operator[]("a"), word_count.count("a") == 0 ? 1 : 2);
^         ^ inserts element           ^ reads same element
|
assigns to element
If word_count["a"] does not exist, it was argued that it would be assigned to twice without a sequencing in between. I personally didn't see how that could happen if two things I thought were true actually were:
1. When a side is picked to be evaluated, the whole side has to be evaluated before the other side can start.
2. Constructs such as word_count["a"] = 1 exhibit well-defined behaviour, even in the case that an element is inserted and then assigned to.
Are these two statements true? Ultimately, is that actually undefined behaviour, and if it is, why does the second statement work (assuming it does)? If the second is false, I believe all the myMap[i]++;s in the world would be ill-formed.
Helpful Link: Undefined behavior and sequence points
The behavior is unspecified, but not undefined.
Notice, that in the expression:
word_count["a"] = word_count.count("a") == 0 ? 1 : 2;
// ^
The assignment operator marked with ^ is the built-in assignment operator, because std::map's operator [] returns a size_t&.
Per Paragraph 5.17/1 of the C++11 Standard on the built-in assignment operator(s):
The assignment operator (=) and the compound assignment operators all group right-to-left. [..] In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.
With respect to an indeterminately-sequenced function call, the operation of a compound assignment is
a single evaluation.
This means that in a built-in assignment such as:
a = b
First the operands are evaluated (in unspecified order), then the assignment is performed, and finally the value computation of the whole assignment expression is performed.
Considering the original expression:
word_count["a"] = word_count.count("a") == 0 ? 1 : 2;
// ^
Because of the paragraph quoted above, in no case are there two unsequenced assignments to the same object: the assignment marked with ^ will always be sequenced after the assignment performed by operator [] (as part of the evaluation of the left-hand side expression) in case the key "a" is not present in the map.
However, the expression will have a different outcome based on which side of the assignment is evaluated first. Thus, the behavior is unspecified, but not undefined.
It is unspecified, but not undefined.
word_count.operator[]("a") and word_count.count("a") are function calls. Function executions are guaranteed by the standard not to interleave: either the first is fully sequenced before the second or the other way around.
The specific wording can vary by standard; in C++11 the relevant clause is 1.9/15:
Every evaluation in the calling function (including other function
calls) that is not otherwise specifically sequenced before or after
the execution of the body of the called function is indeterminately
sequenced with respect to the execution of the called function.9
9) In other words, function executions do not interleave with each other.
indeterminately sequenced is defined in 1.9/13:
Evaluations A and B are indeterminately sequenced when either A is
sequenced before B or B is sequenced before A , but it is unspecified
which.
For example, evaluation of:
word_count["a"] = word_count.count("a");
consists of three parts:
1. execution of word_count.operator[]("a")
2. execution of word_count.count("a")
3. assignment
Let < mean 'is sequenced before'. The quoted part of the standard guarantees that either 1 < 2 or 2 < 1. The part quoted in Andy Prowl's answer also shows that both 1 < 3 and 2 < 3. So there are only two options:
1 < 2 < 3
2 < 1 < 3
In both cases everything is fully sequenced and there is no chance for UB.
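The two orderings can be written out by hand for word_count["a"] = word_count.count("a"), starting from an empty map. This is a sketch only; the function names and the local variables slot and count are assumptions introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Ordering 1 < 2 < 3: operator[] runs first and inserts the key.
std::size_t left_operand_first() {
    std::map<std::string, std::size_t> word_count;
    std::size_t& slot = word_count["a"];        // part 1: inserts {"a", 0}
    std::size_t count = word_count.count("a");  // part 2: the key now exists
    slot = count;                               // part 3: assignment
    assert(word_count.size() == 1);
    return word_count.at("a");
}

// Ordering 2 < 1 < 3: count() runs first, while the map is still empty.
std::size_t right_operand_first() {
    std::map<std::string, std::size_t> word_count;
    std::size_t count = word_count.count("a");  // part 2: the key is absent
    std::size_t& slot = word_count["a"];        // part 1: inserts {"a", 0}
    slot = count;                               // part 3: assignment
    assert(word_count.size() == 1);
    return word_count.at("a");
}
```

The first function returns 1 and the second returns 0, so the outcome depends on the ordering, but each ordering is fully sequenced. As far as I know, C++17 additionally sequences the right operand of = before the left one, which would select the second ordering.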
Please explain why this code is or is not correct.
In my opinion, the line ++*p1 = *p2++ has undefined behaviour, because p1 is dereferenced first and then incremented.
#include <iostream>
int main()
{
    char a[] = "Hello";
    char b[] = "World";
    char* p1 = a;
    char* p2 = b;
    //*++p1 = *p2++; // is this OK?
    ++*p1 = *p2++; // is this OK? Or this is UB?
    std::cout << a << "\n" << b;
    return 0;
}
The first is OK:
*++p1 = *p2++ // p1++; *p1 = *p2; p2++;
The second is UB in C++ because you are modifying what p1 points to twice (once by the increment and once by the assignment) and there is no sequence point separating the two side effects.
With C++0x rules, things are different and more complex to explain and to understand. If you intentionally write expressions like the second one, it's not for a code golf competition, and you are working for me, then consider yourself fired (even if that is legal in C++0x).
I don't know whether it is legal in C++0x and I don't want to know. I have too few neurons to waste them this way.
In modern C++ (at least C++ 2011 and later) neither is undefined behavior. Nor is either implementation-defined or unspecified. (All three terms are different things.)
These two lines are both well defined (but they do different things).
When you have pointers p1 and p2 to scalar types then
*++p1 = *p2++;
is equivalent to
p1 = p1 + 1;
*p1 = *p2;
p2 = p2 + 1;
(^^^this is also true for C++ 1998/2003)
and
++*p1 = *p2++;
is equivalent to
*p1 = *p1 + 1;
*p1 = *p2;
p2 = p2 + 1;
(^^^maybe also in C++ 1998/2003 or maybe not - as explained below)
Obviously in case 2 incrementing value and then assigning to it (thus overwriting just incremented value) is pointless - but there may be similar examples that make sense (e.g. += instead of =).
BUT, as many people point out, just don't write code that looks ambiguous or unreasonably complex. Write code that is clear to you and will be clear to its readers.
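The two equivalences above can be tried on the arrays from the question. This is a minimal sketch assuming the C++11-or-later rules described in this answer; the function names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// *++p1 = *p2++; advances p1 first, then stores into the new position.
std::string prefix_on_pointer() {
    char a[] = "Hello";
    char b[] = "World";
    char* p1 = a;
    char* p2 = b;
    *++p1 = *p2++;                       // a[1] receives 'W'
    assert(p1 == a + 1 && p2 == b + 1);  // both pointers moved by one
    return a;
}

// ++*p1 = *p2++; increments the pointee, then overwrites it.
std::string prefix_on_pointee() {
    char a[] = "Hello";
    char b[] = "World";
    char* p1 = a;
    char* p2 = b;
    ++*p1 = *p2++;                       // 'H' becomes 'I', then is replaced by 'W'
    assert(p1 == a && p2 == b + 1);      // p1 itself does not move
    return a;
}
```

The first function returns "HWllo" and the second returns "Wello", which shows that the increment in the second form is overwritten and therefore pointless here.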
The old C++ 1998/2003 case for the second expression is a strange matter:
At first after reading the description of prefix increment operator:
ISO/IEC 14882-2003 5.3.2:
The operand of prefix ++ is modified by adding 1, or set to true if it
is bool (this use is deprecated). The operand shall be a modifiable
lvalue. The type of the operand shall be an arithmetic type or a
pointer to a completely-defined object type. The value is the new
value of the operand; it is an lvalue. If x is not of type bool, the
expression ++x is equivalent to x+=1.
I personally have a strong feeling that everything is perfectly defined and obvious and the same as above for C++ 2011 and later.
At least in the sense that every reasonable C++ implementation will behave in exact same well defined way (including old ones).
Why should it be otherwise, when we always intuitively rely on the general rule that, for any simple operator within a complex expression, we evaluate its operands first and then apply the operator to the values of those operands? Breaking this intuitive expectation would be extremely stupid for any programming language.
So for the full expression ++*p1 = *p2++; we have two operands: (1) ++*p1, evaluated as an already-incremented lvalue (as defined in the above quote from C++ 2003), and (2) *p2++, an rvalue holding the value stored at pointer p2 before its increment. It doesn't look ambiguous at all. Of course, in this case there is no reason to increment a value you are overwriting anyway, BUT if there were a double increment instead, ++(++*p1);, or another kind of assignment such as +=, -=, &=, *=, etc. instead of simple assignment, THAT would not be unreasonable at all.
Unfortunately all the intuition and logic is messed up by this:
ISO/IEC 14882-2003 - 5 Expressions:
Except where noted, the order of evaluation of operands
of individual operators and subexpressions of individual
expressions, and the order in which side effects
take place, is unspecified. Between the previous
and next sequence point a scalar object shall have its
stored value modified at most once by the evaluation
of an expression. Furthermore, the prior value shall be
accessed only to determine the value to be stored.
The requirements of this paragraph shall be met for each
allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined.
[Example:
i = v[i++]; // the behavior is unspecified
i = 7, i++, i++; // i becomes 9
i = ++i + 1; // the behavior is unspecified
i = i + 1; // the value of i is incremented
—end example]
So this wording, if interpreted in a paranoid way, seems to imply that modifying a value stored in a specific location more than once without an intervening sequence point is explicitly forbidden by this rule, and the last sentence declares that failing to comply with every requirement is Undefined Behavior. AND our expression seems to modify the same location more than once (?) with no sequence point until the full expression has been evaluated. (This arbitrary and unreasonable limitation is reinforced further by example 3, i = ++i + 1;, though it says // the behavior is unspecified, not undefined as in the wording before, which only adds more confusion.)
BUT on the other hand... what if we ignore example 3? (Maybe i = ++i + 1; is a typo and there should have been a postfix increment instead: i = i++ + 1;? Who knows... Anyway, examples are not part of the formal specification.) If we interpret this wording in the most permissive way, we can see that in each allowed order of evaluation of the subexpressions of the whole expression, the preincrement ++*p1 must be evaluated to an LVALUE (which is something that allows further modification) BEFORE the assignment operator is applied, so the only valid final value at that location is the one stored by the assignment operator. ALSO NOTE that a conforming C++ implementation has no obligation to actually modify that location more than once and may instead store only the final result. That is both a reasonable optimization allowed by the standard and possibly what this clause actually demands.
Which one of those interpretations is correct? Paranoid or permissive? Universally applicable logic or some suspicious and ambiguous words in a document almost nobody really ever read? Blue pill or Red pill?
Who knows... It looks like a gray area that requires less ambiguous explanation.
If we interpret the quote from C++ 2003 standard above in a paranoid way then it looks like this code may be Undefined Behavior:
#include <iostream>
#define INC(x) (++(x))
int main()
{
int a = 5;
INC(INC(a));
std::cout << a;
return 0;
}
while this code is perfectly legitimate and well defined:
#include <iostream>
template<class T> T& INC(T& x) // sequence point after evaluation of the arguments
{ // and before execution of the function body
return ++x;
}
int main()
{
int a = 5;
INC(INC(a));
std::cout << a;
return 0;
}
Really?
All this looks very much like a defect of the old C++ standard.
Fortunately this has been addressed in newer C++ standards (starting with C++ 2011) as there is no such concept as sequence point anymore. Instead there is a relation - something sequenced before something. And of course the natural guarantee that evaluation of the argument expressions of any operator is sequenced before evaluation of the result of the operator is there.
ISO/IEC 14882-2011 - 1.9 Program execution
Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread (1.10), which induces
a partial order among those evaluations. Given any two evaluations A
and B, if A is sequenced before B, then the execution of A shall
precede the execution of B. If A is not sequenced before B and B is
not sequenced before A, then A and B are unsequenced. [ Note: The
execution of unsequenced evaluations can overlap. — end note ]
Evaluations A and B are indeterminately sequenced when either A is
sequenced before B or B is sequenced before A, but it is unspecified
which. [ Note: Indeterminately sequenced evaluations cannot overlap,
but either could be executed first. — end note ]
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side
effect associated with the next full-expression to be evaluated.
Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced. [
Note: In an expression that is evaluated more than once during the
execution of a program, unsequenced and indeterminately sequenced
evaluations of its subexpressions need not be performed consistently
in different evaluations. — end note ] The value computations of the
operands of an operator are sequenced before the value computation of
the result of the operator. If a side effect on a scalar object is
unsequenced relative to either another side effect on the same scalar
object or a value computation using the value of the same scalar
object, the behavior is undefined.
[ Example:
void f(int, int);
void g(int i, int* v) {
i = v[i++]; // the behavior is undefined
i = 7, i++, i++; // i becomes 9
i = i++ + 1; // the behavior is undefined
i = i + 1; // the value of i is incremented
f(i = -1, i = -1); // the behavior is undefined
}
— end example ]
(Also NOTE how C++ 2003 prefix increment example i = ++i + 1; is replaced by postfix increment example i = i++ + 1; in this C++ 2011 quote. :) )
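Only two lines of the quoted example have defined behaviour, so only those can be run meaningfully. The sketch below reproduces just those two lines; the function name comma_chain_example is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical helper: runs the two well-defined lines of the quoted example.
int comma_chain_example() {
    int i = 0;
    i = 7, i++, i++;   // parsed as ((i = 7), i++), i++; each comma sequences its operands
    assert(i == 9);    // "i becomes 9", as the example states
    i = i + 1;         // "the value of i is incremented"
    return i;
}
```

The function returns 10. The other lines of the example are deliberately left out, since the standard text quoted above calls their behaviour undefined.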
Why does the following compile in C++?
int phew = 53;
++++++++++phew ;
The same code fails in C, why?
Note: The two defect reports DR#637 and DR#222 are important for understanding the rationale behind the behavior described below.
For explanation, in C++0x there are value computations and side effects. A side effect, for example, is an assignment, and a value computation is determining what an lvalue refers to or reading the value out of an lvalue. Note that C++0x has no sequence points anymore and this stuff is worded in terms of "sequenced before" / "sequenced after". And it is stated that
If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
++v is equivalent to v += 1, which is equivalent to v = v + 1 (except that v is only evaluated once). This turns ++ ++v into ++ (v = v + 1), which I will write as inc = inc + 1, where inc refers to the lvalue result of v = v + 1.
In C++0x ++ ++v is not undefined behavior because for a = b the assignment is sequenced after value computation of b and a, but before value computation of the assignment expression. It follows that the assignment in v = v + 1 is sequenced before value computation of inc. And the assignment in inc = inc + 1 is sequenced after value computation of inc. In the end, both assignments will thus be sequenced, and there is no undefined behavior.
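The rewriting of ++v as v = v + 1 can be sketched in code, assuming the C++0x/C++11 rules described above. The names inc1 to inc5 are hypothetical and follow the inc notation of this answer; each one is bound to the lvalue result of the previous assignment.

```cpp
#include <cassert>

// The expression from the question: five prefix increments.
int chained_as_written() {
    int phew = 53;
    ++++++++++phew;
    return phew;
}

// Hypothetical expansion: each ++ becomes an assignment whose lvalue
// result (a reference to the same object) feeds the next one.
int chained_expanded() {
    int v = 53;
    int& inc1 = (v = v + 1);
    int& inc2 = (inc1 = inc1 + 1);
    int& inc3 = (inc2 = inc2 + 1);
    int& inc4 = (inc3 = inc3 + 1);
    int& inc5 = (inc4 = inc4 + 1);
    assert(&inc5 == &v);   // every step still refers to v itself
    return v;
}
```

Both functions return 58. In the expanded form each assignment completes before the next one reads its result, which is the ordering the paragraph above derives for the nested form.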
That is because in C++ the pre-increment operator returns an lvalue and it requires its operand to be an lvalue.
++++++++++phew ; is interpreted as ++(++(++(++(++phew))))
However, under the C++03 sequence-point rules your code invokes Undefined Behaviour, because you are trying to modify the value of phew more than once between two sequence points.
In C, the pre-increment operator returns an rvalue and requires its operand to be an lvalue. So your code doesn't compile in C mode.
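The value-category claim for C++ can be checked at compile time. This is a sketch using standard type traits; the function names are assumptions made for illustration, and the checks say nothing about C, where the expression does not compile.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// decltype of an lvalue expression is a reference type, so these two
// checks distinguish the prefix and postfix forms for int in C++.
constexpr bool prefix_yields_lvalue() {
    return std::is_same<decltype(++std::declval<int&>()), int&>::value;
}
constexpr bool postfix_yields_lvalue() {
    return std::is_lvalue_reference<decltype(std::declval<int&>()++)>::value;
}

// Hypothetical runtime illustration: taking the address of ++x is
// allowed only because ++x is an lvalue naming x.
int address_of_prefix_result() {
    int x = 1;
    int* p = &++x;
    assert(p == &x);
    return *p;
}
```

Here prefix_yields_lvalue() is true and postfix_yields_lvalue() is false, which is why ++(++phew) is accepted in C++ while (phew++)++ is not.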