This question already has answers here:
Undefined behavior and sequence points
(5 answers)
Closed 8 years ago.
My teacher provided me with this code and it returns 31,40, but I am unable to figure out why. What is the reason for it returning what it does?
void main() {
int *ptr;
int arr[5] = { 10, 20, 30, 40, 50 };
ptr = &arr[3];
cout << ++*ptr-- << ", " << *ptr;
}
cout << ++*ptr-- << ", " << *ptr;
is
operator <<(cout.operator <<(++*ptr--), ", ").operator <<(*ptr);
The problem can be reduced to:
f(f(ptr--), ptr)
whereas order of evaluation between f(ptr--) and ptr is unspecified (and more specificaly between ptr-- and ptr).
So you got undefined behavior for the given code.
The C++ standard states
Section 1.9/15 [intro.execution] : Except where noted, evaluations of operands of individual operators and of subexpressions of individual
expressions are unsequenced. (...) If a side effect on a scalar object is unsequenced relative to either another side effect on the
same scalar object or a value computation using the value of the same
scalar object, the behavior is undefined.
++*ptr-- and *ptr are unsequenced subexpressions of the same expression using the same object: nothing guarantees that they are evaluated from left to right. So according to the standard, this results in undefined behaviour. Your result tend to show that your compiler chooses to evaluate first *ptr and then ++*ptr--.
Edit: ++*ptr-- is ++(*ptr--)). Here the operand of operator ++ also uses object ptr on which -- does a side effect. So this is undefined behaviour as well. It appears that in your case, the compiler first evaluates *ptr-- which results in 40 and a decremented ptr, and then applies ++ on the dereferenced decremented pointer (i.e. 30 incremented by 1).
Related
Originally, I presented a more complicated example, this one was proposed by #n. 'pronouns' m. in a now-deleted answer. But the question became too long, see edit history if you are interested.
Has the following program well-defined behaviour in C++17?
int main()
{
int a=5;
(a += 1) += a;
return a;
}
I believe this expression is well-defined and evaluated like this:
The right side a is evaluated to 5.
There are no side-effects of the right side.
The left side is evaluated to a reference to a, a += 1 is well-defined for sure.
The left-side side-effect is executed, making a==6.
The assignment is evaluted, adding 5 to the current value of a, making it 11.
The relevant sections of the standard:
[intro.execution]/8:
An expression X is said to be sequenced before an expression Y if
every value computation and every side effect associated with the
expression X is sequenced before every value computation and every
side effect associated with the expression Y.
[expr.ass]/1 (emphasis mine):
The assignment operator (=) and the compound assignment operators all
group right-to-left. All require a modifiable lvalue as their left
operand; their result is an lvalue referring to the left operand. The
result in all cases is a bit-field if the left operand is a bit-field.
In all cases, the assignment is sequenced after the value computation
of the right and left operands, and before the value computation of
the assignment expression. The right operand is sequenced before the
left operand. With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation.
The wording originally comes from the accepted paper P0145R3.
Now, I feel there is some ambiguity, even contradiction, in this second section.
The right operand is sequenced before the left operand.
Together with the definition of sequenced before strongly implies the ordering of side-effects, yet the previous sentence:
In all cases, the assignment is sequenced after the value computation
of the right and left operands, and before the value computation of
the assignment expression
only explicitly sequences the assignment after value computation, not their side-effects. Thus allowing this behaviour:
The right side a is evaluated to 5.
The left side is evaluated to a reference of a, a += 1 is well-defined for sure.
The assignment is evaluted, adding 5 to the current value of a, making it 10.
The left-side side-effect is executed, making a==11 or maybe even 6 if the old values was used even for the side-effect.
But this ordering clearly violates the definition of sequenced before since the side-effects of the left operand happened after the value computation of the right operand. Thus left operand was not sequenced after the right operand which violets the above mentioned sentence. No I done goofed. This is allowed behaviour, right? I.e. the assignment can interleave the right-left evaluation. Or it can be done after both full evaluations.
Running the code gcc outputs 12, clang 11. Furthermore, gcc warns about
<source>: In function 'int main()':
<source>:4:8: warning: operation on 'a' may be undefined [-Wsequence-point]
4 | (a += 1) += a;
| ~~~^~~~~
I am terrible at reading assembly, maybe someone can at least rewrite how gcc got to 12? (a += 1), a+=a works but that seems extra wrong.
Well, thinking more about it, the right side also does evaluate to a reference to a, not just to a value 5. So Gcc could still be right, in that case clang could be wrong.
In order to follow better what is actually performed, let's try to mimic the same with our own type and add some printouts:
class Number {
int num = 0;
public:
Number(int n): num(n) {}
Number operator+=(int i) {
std::cout << "+=(int) for *this = " << num
<< " and int = " << i << std::endl;
num += i;
return *this;
}
Number& operator+=(Number n) {
std::cout << "+=(Number) for *this = " << num
<< " and Number = " << n << std::endl;
num += n.num;
return *this;
}
operator int() const {
return num;
}
};
Then when we run:
Number a {5};
(a += 1) += a;
std::cout << "result: " << a << std::endl;
We get different results with gcc and clang (and without any warning!).
gcc:
+=(int) for *this = 5 and int = 1
+=(Number) for *this = 6 and Number = 6
result: 12
clang:
+=(int) for *this = 5 and int = 1
+=(Number) for *this = 6 and Number = 5
result: 11
Which is the same result as for ints in the question. Even though it is not the same exact story: built-in assignment has its own sequencing rules, as opposed to overloaded operator which is a function call, still the similarity is interesting.
It seems that while gcc keeps the right side as a reference and turns it to a value on the call to +=, clang on the other hand turns the right side to a value first.
The next step would be to add a copy constructor to our Number class, to follow exactly when the reference is turned into a value. Doing that results with calling the copy constructor as the first operation, both by clang and gcc, and the result is the same for both: 11.
It seems that gcc delays the reference to value conversion (both in the built-in assignment as well as with user defined type without a user defined copy constructor). Is it coherent with C++17 defined sequencing? To me it seems as a gcc bug, at least for the built-in assignment as in the question, as it sounds that the conversion from reference to value is part of the "value computation" that shall be sequenced before the assignment.
As for a strange behavior of clang reported in previous version of the original post - returning different results in assert and when printing:
constexpr int foo() {
int res = 0;
(res = 5) |= (res *= 2);
return res;
}
int main() {
std::cout << foo() << std::endl; // prints 5
assert(foo() == 5); // fails in clang 11.0 - constexpr foo() is 10
// fixed in clang 11.x - correct value is 5
}
This relates to a bug in clang. The failure of the assert is wrong and is due to wrong evaluation order of this expression in clang, during constant evaluation in compile time. The value should be 5. This bug is already fixed in clang trunk.
Hello I have a simple question: is modifying an object more than once in the same expression; once through its identifier (name) and second through a reference to it or a pointer that points at it Undefined Behavior?
int i = 1;
std::cout << i << ", " << ++i << std::endl; //1- Error. Undefined Behavior
int& refI = i;
std::cout << i << ", " << ++refI << std::endl; //2- Is this OK?
int* ptrI = &refI; // ptrI point to the referred-to object (&i)
std::cout << i << ", " << ++*ptrI << std::endl; // 3 is this also OK?
In the second it seems to work fine but I am confused about it because from what I've learned; a Reference is just an alias name for an already existing object. and any change to it will affect the reffered-to object. Thus what I see here is that i and refI are the same so modifying the same object (i) more than once here in the same expression.
But Why all the compilers treat statement 2 as a well-defined behavior?
What about statement 3 (ptrI)?
All of them have undefined behavior before C++17 and all have well-defined behavior since C++17.
Note that you are not modifying i more than once in either example. You are modifying it only with the increment.
However, it is also undefined behavior to have a side effect on one scalar (here the increment of i) be unsequenced with a value computation (here the left-hand use of i). Whether the side effect is produced by directly acting on the variable or through a reference or pointer does not matter.
Before C++17, the << operator did not imply any sequencing of its operands, so the behavior is undefined in all your examples.
Since C++17, the << operator is guaranteed to evaluate its operands from left-to-right. C++17 also extended the sequencing rules for operators to overloaded operators when called with the operator notation. So in all your examples the behavior is well-defined and the left-hand use of i is evaluated first, before i's value is incremented.
Note however, that some compilers didn't implement these changes to the evaluation rules very timely, so even if you use the -std=c++17 flag, it might still unfortunately violate the expected behavior with older and current compiler versions.
In addition, at least in the case of GCC, the -Wsequence-point warning is explicitly documented to warn even for behavior that became well-defined in C++17, to help the user to avoid writing code that would have undefined behavior in C and earlier C++ versions, see GCC documentation.
The compiler is not required (and not able to) diagnose all cases of undefined behavior. In some simple situations it will be able to give you a warning (which you can turn into an error using -Werror or similar), but in more complex cases it will not. Still, your program will loose any guarantee on its behavior if you have undefined behavior, whether diagnosed or not.
The order of evaluation rules are defined in terms of objects, not references or pointers, or whatever method you take to obtain the object.
That said, your three examples are exactly equivalent in terms of the order of evaluation rules (if we are only considering the object defined as i).
Hence let's just look at your first example:
std::cout << i << ", " << ++i << std::endl;
For simplicity we can ignore the ", " and std::endl, hence:
std::cout << i << ++i;
VC(X) = Value computation of X
SE(X) = Side effects of X
Exec(X) = The execution of the function body of X
X <--- Y = X is sequenced after Y
Since c++ 11 and until c++ 17, this is undefined behavior because the side effects of D (see the graph) is unsequenced relative to the value calculation of C. However, both involves object i. This is undefined behavior.
Since c++ 17, there is an extra guarantee (on the << and >> expressions) that both the value computation and side effects of C will be sequenced before the value computation and side effects of D (marked by the dotted lines), therefore the code becomes well-defined.
All of these result in Undefined Behaviour. Just because a compiler doesn't give a warning doesn't make it not so. Anything can happen with UB: it works, it works sometimes, it crashes, it blows up your computer, &c.
This question already has answers here:
New Sequence Points in C++11
(2 answers)
So why is i = ++i + 1 well-defined in C++11?
(3 answers)
Closed 4 years ago.
I've read something about order of evaluation and I understand some mistakes caused by order of evaluation.
My basic rule comes from a text and example:
Order of operand evaluation is independent of precedence and associativity.
In most cases, the order is largely unspecified.
So for expression like this: int i = f1() * f2();
f1 and f2 must be called before the multiplication can be done. After all, it is their results that are multiplied. However, we have no way of knowing whether f1 will be called before f2 or vice versa.
Undefined behavior examples:
int i = 0;
cout << i << " " << ++i << endl;
How I understand: I treat i and ++i as a function. I don't know which evaluates first, so the first i could be 0 or 1 and (the rule) makes sense.
while(beg != s.end())
*beg = toupper(*beg++); //Just another example.
I think the key to understand this is to treat each operand as a "evaluation unit", and one don't know order of evaluation within these units but can know order in every single unit.
But for i = ++i + 2 reference here, Why is it wrong? I can't explain with my own conclusion.
The left i is used as a lvalue and not a pointer. ++i simply rewrite the original value and doesn't change the storage address. What could be wrong if it evaluates first or latter? My rule fails here.
Quite long but try to provide enough background info, thank for your patience.
I don't know sequence point which is mentioned frequently in answers.. So I think I need to read something about it first. Btw, the debate is not very helpful, for a newbie simply want to know why is it considered wrong, like before C++11?
I find this answer Undefined behavior and sequence points explain well why i = ++i + 2 is undefined behaviour before C++11
C++11 has new sequencing rules. In particular, for a pre-increment (++i) the side effect (writing the new value) is sequenced-before the further use of the new, incremented value. Since the assignment i= is sequenced-after the evaluation of its right-hand side, this means the write ++i is transitively sequenced-before the write i=(++i + 2)
It would be another matter for i=(i++ + 2). For post-increment, the side effect is sequenced-after, which means the two assignments are no longer sequenced relatively to each other. That IS undefined behavior.
Your two "subfunctions" in i = ++i + 2 are an explicit assignment made by the = operator and an implicit assignment done by the ++ operator.
The preincrement operator is defined to return an incremented value of the variable, and this definitely shall be used for addition (performed by the + operator). However, it was not defined, when the incremented value should be stored back into i.
As a result, it was undefined whether the final value of i is old_i incremented plus 2 or just old_i incremented.
This question already has answers here:
Undefined behavior and sequence points
(5 answers)
Closed 9 years ago.
I do not understand why the output of following program is 63:
#include <iostream>
int main() {
int a = 20;
a += a + ++a;
std::cout << a;
}
I was expecting it to be 61. What exactly a += a + ++a; does?
Standard says: "Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression" (5 Expressions, §4), i.e. the following:
a += a + ++a
yields undefined behavior just like:
a = ++a;
does already. It also says: "the prior value shall be accessed only to determine the value to be stored", i.e. if you want to change a, you can use a in the same expression just to retrieve the previous value:
a = a + 1; // OK
... "otherwise the behavior is undefined."
You're triggering undefined behavior and there is no 'correct' answer. Your compiler can chose what order to evaluate the arguments of the plus operator.
it looks like ++a is evaluating before the rest of the expression, so it's as though a is 21` in a statement like
a += a + a;
at any rate, don't use ++a inside of an arithmetic expression like that anyway. It's confusing for people, and is probably undefined behavior
Please, explain why this code is correct or why not:
In my opinion, line ++*p1 = *p2++ has undefined behaviour, because p1 is dereferenced first and then incrementing.
int main()
{
char a[] = "Hello";
char b[] = "World";
char* p1 = a;
char* p2 = b;
//*++p1 = *p2++; // is this OK?
++*p1 = *p2++; // is this OK? Or this is UB?
std::cout << a << "\n" << b;
return 0;
}
The first is ok
*++p1 = *p2++ // p1++; *p1 = *p2; p2++;
the second is UB with C++ because you are modifying what is pointed by p1 twice (once because of increment and once because of assignment) and there are no sequence points separating the two side effects.
With C++0x rules things are different and more complex to explain and to understand. If you write intentionally expressions like the second one, if it's not for a code golf competition and if you are working for me then consider yourself fired (even if that is legal in C++0x).
I don't know if it is legal in C++0x and I don't want to know. I've too few neurons to waste them this way.
In modern C++ (at least C++ 2011 and later) neither is undefined behavior. And even neither is implementation defined or unspecified. (All three terms are different things.)
These two lines are both well defined (but they do different things).
When you have pointers p1 and p2 to scalar types then
*++p1 = *p2++;
is equivalent to
p1 = p1 + 1;
*p1 = *p2;
p2 = p2 + 1;
(^^^this is also true for C++ 1998/2003)
and
++*p1 = *p2++;
is equivalent to
*p1 = *p1 + 1;
*p1 = *p2;
p2 = p2 + 1;
(^^^maybe also in C++ 1998/2003 or maybe not - as explained below)
Obviously in case 2 incrementing value and then assigning to it (thus overwriting just incremented value) is pointless - but there may be similar examples that make sense (e.g. += instead of =).
BUT like many people point out - just don't write the code that looks ambiguous or unreasonably complex. Write the code that is clear to you and supposed to be clear to the readers.
Old C++ 1998/2003 case for second expression is a strange matter:
At first after reading the description of prefix increment operator:
ISO/IEC 14882-2003 5.3.2:
The operand of prefix ++ is modified by adding 1, or set to true if it
is bool (this use is deprecated). The operand shall be a modifiable
lvalue. The type of the operand shall be an arithmetic type or a
pointer to a completely-defined object type. The value is the new
value of the operand; it is an lvalue. If x is not of type bool, the
expression ++x is equivalent to x+=1.
I personally have a strong feeling that everything is perfectly defined and obvious and the same as above for C++ 2011 and later.
At least in the sense that every reasonable C++ implementation will behave in exact same well defined way (including old ones).
Why it should be otherwise if we always intuitively rely on a general rule that in any simple operator evaluation within a complex expression we evaluate its operands first and after that apply the operator to the values of those operands. Right? Breaking this intuitive expectation would be extremely stupid for any programming language.
So for the full expression ++*p1 = *p2++; we have operands: 1 - ++*p1 evaluated as already incremented lvalue (as defined in the above quote from C++ 2003) and 2 - *p2++ that is an rvalue stored at pointer p2 before its increment. It doesn't look ambiguous at all. Of course in this case - no reason to increment a value you are overwriting anyway BUT if there was double increment instead - ++(++*p1); OR other kind of assignment like +=/-=/&=/*=/etc instead of simple assignment THAT would not be unreasonable at all.
Unfortunately all the intuition and logic is messed up by this:
ISO/IEC 14882-2003 - 5 Expressions:
Except where noted, the order of evaluation of operands
of individual operators and subexpressions of individual
expressions, and the order in which side effects
take place, is unspecified. Between the previous
and next sequence point a scalar object shall have its
stored value modified at most once by the evaluation
of an expression. Furthermore, the prior value shall be
accessed only to determine the value to be stored.
The requirements of this paragraph shall be met for each
allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined.
[Example:
i = v[i++]; // the behavior is unspecified
i = 7, i++, i++; // i becomes 9
i = ++i + 1; // the behavior is unspecified
i = i + 1; // the value of i is incremented
—end example]
So this wording if interpreted in a paranoid way seems to imply that modification of a value stored in a specific location more than once without intervening sequence point is explicitly forbidden by this rule and the last sentence declares that failing to comply with every requirement is Undefined Behavior. AND our expression seems to modify the same location more that once (?) with no sequence point until the full expression evaluated. (This arbitrary and unreasonable limitation is reinforced further by example 3 - i = ++i + 1; though it says // the behavior is unspecified - not undefined as in the wording before - which only adds more confusion.)
BUT on the other hand... If we ignore the example 3. (Maybe i = ++i + 1; is a typo and there should have been postfix increment instead - i = i++ + 1;? Who knows... Anyway examples are not part of formal specification.) If we interpret this wording in the most permissive way - we can see that in each allowed order of evaluation of subexpressions of the whole expression - preincrement ++*p1 must be evaluated to an LVALUE (which is something that allows further modification) BEFORE applying assignment operator so the only valid final value at that location is the one that is stored with assignment operator. ALSO NOTE that conforming C++ implementation have no obligation to actually modify that location more than once and may instead store only final result - that is both reasonable optimization allowed by the standard and may be actual demand of this article.
Which one of those interpretations is correct? Paranoid or permissive? Universally applicable logic or some suspicious and ambiguous words in a document almost nobody really ever read? Blue pill or Red pill?
Who knows... It looks like a gray area that requires less ambiguous explanation.
If we interpret the quote from C++ 2003 standard above in a paranoid way then it looks like this code may be Undefined Behavior:
#include <iostream>
#define INC(x) (++(x))
int main()
{
int a = 5;
INC(INC(a));
std::cout << a;
return 0;
}
while this code is perfectly legitimate and well defined:
#include <iostream>
template<class T> T& INC(T& x) // sequence point after evaluation of the arguments
{ // and before execution of the function body
return ++x;
}
int main()
{
int a = 5;
INC(INC(a));
std::cout << a;
return 0;
}
Really?
All this looks very much like a defect of the old C++ standard.
Fortunately this has been addressed in newer C++ standards (starting with C++ 2011) as there is no such concept as sequence point anymore. Instead there is a relation - something sequenced before something. And of course the natural guarantee that evaluation of the argument expressions of any operator is sequenced before evaluation of the result of the operator is there.
ISO/IEC 14882-2011 - 1.9 Program execution
Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread (1.10), which induces
a partial order among those evaluations. Given any two evaluations A
and B, if A is sequenced before B, then the execution of A shall
precede the execution of B. If A is not sequenced before B and B is
not sequenced before A, then A and B are unsequenced. [ Note: The
execution of unsequenced evaluations can overlap. — end note ]
Evaluations A and B are indeterminately sequenced when either A is
sequenced before B or B is sequenced before A, but it is unspecified
which. [ Note: Indeterminately sequenced evaluations cannot overlap,
but either could be executed first. — end note ]
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side
effect associated with the next full-expression to be evaluated.
Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced. [
Note: In an expression that is evaluated more than once during the
execution of a program, unsequenced and indeterminately sequenced
evaluations of its subexpressions need not be performed consistently
in different evaluations. — end note ] The value computations of the
operands of an operator are sequenced before the value computation of
the result of the operator. If a side effect on a scalar object is
unsequenced relative to either anotherside effect on the same scalar
object or a value computation using the value of the same scalar
object, the behavior is undefined.
[ Example:
void f(int, int);
void g(int i, int* v) {
i = v[i++]; // the behavior is undefined
i = 7, i++, i++; // i becomes 9
i = i++ + 1; // the behavior is undefined
i = i + 1; // the value of i is incremented
f(i = -1, i = -1); // the behavior is undefined
}
— end example ]
(Also NOTE how C++ 2003 prefix increment example i = ++i + 1; is replaced by postfix increment example i = i++ + 1; in this C++ 2011 quote. :) )