c++11 Order of evaluation (undefined behavior) - c++

vec[ival++] <= vec[ival]
This expression has undefined behavior, because the order of evaluation of the <= operator's operands is undefined.
How can we rewrite that expression to avoid the undefined behavior?
I've found an answer that appears to work:
vec[ival] <= vec[ival + 1]
If that is the right way to avoid the undefined behavior, why does rewriting it that way avoid the undefined behavior?
Adding any reference about how to fix that expression would be great.

Yes, your first example has undefined behavior because we have an unsequenced modification of, and access to, the same memory location. This is covered in the draft C++ standard, [intro.execution]p10 (emphasis mine):
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [ Note: In an expression that is evaluated more than once during the execution of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations. — end note ] The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a memory location ([intro.memory]) is unsequenced relative to either another side effect on the same memory location or a value computation using the value of any object in the same memory location, and they are not potentially concurrent ([intro.multithread]), the behavior is undefined. [ Note: The next subclause imposes similar, but more complex restrictions on potentially concurrent computations. — end note ] [ Example:

void g(int i) {
  i = 7, i++, i++;  // i becomes 9
  i = i++ + 1;      // the value of i is incremented
  i = i++ + i;      // the behavior is undefined
  i = i + 1;        // the value of i is incremented
}

— end example ]
If we check the section on relational operators, which covers <= ([expr.rel]), it does not specify an order of evaluation, so we fall back on [intro.execution] and thus have undefined behavior.
Note that having unspecified order of evaluation is not by itself sufficient for undefined behavior, as the example in Order of evaluation of assignment statement in C++ demonstrates.
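To make that distinction concrete, here is a minimal hedged sketch of my own (not taken from the linked question): the first statement only has unspecified evaluation order, while the commented-out one is genuinely undefined.

#include <iostream>

int f() { std::cout << "f "; return 1; }
int g() { std::cout << "g "; return 2; }

int main() {
    int x = f() + g(); // The order of the two calls is unspecified, so the
                       // output may be "f g " or "g f ", but either way the
                       // behavior is well defined: unspecified, not undefined.
    std::cout << x << '\n';

    int i = 0;
    // i = i++ + i;    // By contrast, this would be undefined behavior: the
                       // side effect of i++ is unsequenced relative to the
                       // read of i in the other operand.
    (void)i;           // avoid an unused-variable warning in this sketch
}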
Your second example avoids the undefined behavior since you are not introducing a side effect on ival; you are just reading the memory location twice.
I believe that is a reasonable way to solve the problem: it is readable and not surprising. An alternative would be to introduce a second variable, e.g. index and prev_index. It is hard to come up with a hard-and-fast rule given such a small code snippet.
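For illustration, here is a hedged sketch of how the rewritten comparison might sit in a loop, with the increment kept in the loop header where it is fully sequenced (the helper name and the non-decreasing check are my own assumptions, since the original loop isn't shown):

#include <cstddef>
#include <vector>

// Hypothetical helper: true if every element is <= its successor.
bool is_non_decreasing(const std::vector<int>& vec) {
    for (std::size_t ival = 0; ival + 1 < vec.size(); ++ival) {
        if (!(vec[ival] <= vec[ival + 1]))  // no side effect on ival here
            return false;
    }
    return true;
}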

It avoids undefined behavior because you are not changing the value of ival. The issue you're seeing in the first sample is that we can't determine what the values of ival are at the times that they're used. In the second sample, there's no confusion.

Let's start with the worst problem first, and that is the Undefined Behavior. C++ uses sequencing rules. Statements are executed in sequence. Usually that's top-to-bottom, but if statements, for statements, function calls and the like can change that.
Now within a statement there still might be a further sequence of execution, but I'm very intentionally writing might. By default, the various parts of a single statement are not sequenced relative to each other. That's why you can get the varying order of execution. But worse, if you both modify and use a single object without sequencing, you have Undefined Behavior. That is bad - anything might happen.
The proposed solution (ival + 1) doesn't modify ival at all; it just computes a new value. That is entirely safe. Note, however, that it leaves ival unchanged, so the increment has to happen somewhere else.
You may want to check on std::adjacent_find(). Chances are that the loop you're trying to write is already in the Standard Library.
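For example, if the loop is meant to check that the sequence is non-decreasing (an assumption on my part), a sketch using std::adjacent_find could look like this:

#include <algorithm>
#include <functional>
#include <vector>

bool sorted_non_decreasing(const std::vector<int>& vec) {
    // adjacent_find returns the first position where the predicate holds for
    // a pair of neighbours; if no element is greater than its successor,
    // the sequence is non-decreasing.
    return std::adjacent_find(vec.begin(), vec.end(),
                              std::greater<int>()) == vec.end();
}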

The first problem is that, since the initial code exhibits undefined behavior, under the C++ standard there is no "fix". The behavior of that line of code is not specified by the C++ standard; to know what it is supposed to do you have to have another source of information.
Formally, that expression can be rewritten as system("format c:"), as the C++ standard does not mandate any behavior from a program that exhibits undefined behavior.
But in practice, when you run into something like that, you have to read the original programmer's mind.
Well, you can solve anything with lambdas.
[&]{ bool ret = vec[ival] <= vec[ival+1]; ++ival; return ret; }()
Second,
vec[ival] <= vec[ival+1]
is not the same, because it lacks the side effect of ival being 1 greater after the expression is evaluated.
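If that side effect is actually wanted, a hedged non-lambda alternative is simply to split the work into two statements, which supplies the sequencing the single expression lacked (vec and ival are the names from the question; the helper name is mine):

#include <cstddef>
#include <vector>

bool compare_and_advance(const std::vector<int>& vec, std::size_t& ival) {
    bool ret = vec[ival] <= vec[ival + 1]; // read only; no side effect yet
    ++ival;                                // the increment gets its own statement
    return ret;
}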

Related

Why is *ptr = (*ptr)++ Undefined Behavior

I am trying to learn how to explain the cause (if any) of undefined behavior in the following cases (given below).
int i = 0, *ptr = &i;
i = ++i; // is this UB? If yes, then why, according to C++11?
*ptr = (*ptr)++; // I think this is UB but I am unable to explain exactly why
*ptr = ++(*ptr); // I think this is not UB but can't explain why
I have looked at many SO posts describing UB for different pointer cases similar to the ones above, but I am still unable to explain exactly why they result in UB (i.e., which points from the standard prove that they do).
I am looking for explanations according to C++11 (or C++14), but not C++17 and not pre-C++11.
Undefined behavior stems from this:
C++11 [intro.execution]/15 Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced... If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
C++17 [intro.execution]/17 Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced... If a side effect on a memory location (4.4) is unsequenced relative to either another side effect on the same memory location or a value computation using the value of any object in the same memory location, and they are not potentially concurrent (4.7), the behavior is undefined.
This text is similar. The main difference lies in "except where noted" part; in C++17, the order of evaluation of operands is specified for more operators than in C++11. Thus:
C++17 [expr.ass]/1 In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. The right operand is sequenced before the left operand.
C++11 lacks the last sentence ("The right operand is sequenced before the left operand"). That sentence is what makes i = i++ well-defined in C++17, but undefined in C++11. That's because for postfix increment, the side effect is not part of the value computation of the expression:
C++11 and C++17 [expr.post.incr]/1 The value computation of the ++ expression is sequenced before the modification of the operand object.
So "the assignment is sequenced after the value computation of the right and left operands" is not by itself sufficient: the assignment is sequenced after the value computation of i++, and the side effect is also sequenced after that same value computation, but nothing says how they are sequenced relative to each other. Therefore, they are unsequenced, and they are both modifying the same object (here, i). This exhibits undefined behavior.
The addition of "the right operand is sequenced before the left operand" in C++17 means that the side effect of i++ is sequenced before the value computation of i, and both are sequenced before the assignment.
On the other hand, for pre-increment the side effect is necessarily part of the evaluation of the expression:
C++11 and C++17 [expr.pre.incr]/1 ... The result is the updated operand; it is an lvalue ...
So the value computation of ++i involves incrementing i first, and then applying an lvalue-to-rvalue conversion to obtain the updated value. This value computation is sequenced before the assignment in both C++11 and C++17, and so the two side effects on i are sequenced relative to each other; no undefined behavior.
Nothing changes in this analysis if i is replaced with (*ptr). That's just another way to refer to the same object or memory location.
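As a concrete recap of the analysis above, here is a hedged sketch of my own (with C++11 in mind; the comments restate the sequencing argument rather than promise any particular runtime outcome):

#include <iostream>

int main() {
    int i = 0;
    int* ptr = &i;

    i = ++i;         // OK in C++11: the side effect of ++i is part of its
                     // value computation, which is sequenced before the
                     // assignment's side effect.

    // i = i++;      // UB in C++11: the side effect of i++ and the assignment
                     // to i are unsequenced. (Well-defined only from C++17.)

    *ptr = ++(*ptr); // same analysis as i = ++i: not UB

    // *ptr = (*ptr)++; // same analysis as i = i++: UB in C++11

    std::cout << i << '\n';
}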
The C++ Standard is based upon the C Standard, whose authors didn't need any particular "reason" to say that implementations may process a construct in whatever fashion would be most useful to their customers [which is what they intended the phrase "Undefined Behavior" to mean]. Many platforms can cheaply guarantee, for small primitive types, that race conditions involving a read and a conflicting write to the same object will always yield either the old or the new data, and that race conditions involving conflicting writes will result in every individual subsequent read seeing one of the written values. Rather than trying to identify all of the cases where implementations should or should not be expected to uphold such guarantees, the Standard allows implementations to, at their leisure, process code "in a documented manner characteristic of the environment". Because it's not practical for all implementations to offer such guarantees in all possible scenarios, and because the range of scenarios where such guarantees would be practical differs between platforms, the authors of the Standard allowed implementations to weigh the pros and cons of offering various behavioral guarantees on their particular target platforms, rather than trying to write precise rules that would be appropriate for all possible implementations.
Note also that if one were to do something like:
*p = (*q)++;
return q[0] + q[i]; // where 'i' is some object of type `int`.
when p and q are equal and i is zero, a compiler might quite plausibly generate code where the assignment would undo the effect of the increment, but which would return the sum of the old value of *q, plus 1, plus the actual stored value of *q (which would be the old value, rather than the incremented value). Although this would be a logical consequence of the specified race-condition semantics, trying to specify it precisely would have been sufficiently awkward that the Standard simply allows implementations to specify the behavior as tightly or loosely as they see fit.

Strange behavior of multiplying a pre-increment variable to itself in c++

Sorry if my question is a duplicate or not worth answering.
I have the following code, which produces a result that I do not understand.
int x=5;
int y;
y = ++x * ++x;
cout << x << endl;
cout << y;
According to my little understanding of programming, the value of y should be 42, but the output on the computer is 49. Kindly help me understand what the value of y will be.
I am executing the code in Dev-C++.
Thanks in advance.
The short answer is that you have an undefined behavior in there.
The exact answer depends on which standard you use.
Pre C++11, we had the notion of sequence points. Intuitively, they are points in which all the values have been properly stored, such as at the end of statement (at the semicolon). The standard says that
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
This means that between two sequence points (for the sake of simplicity, read: between two semicolons), the value of a variable cannot be changed more than once. Your code changes the value twice, using the increment operator.
C++11 replaced the notion of sequence points with the relationships sequenced before, sequenced after and unsequenced, referring to the order in which expressions are evaluated.
In arithmetic expressions,
If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or [...] the behaviour is undefined.
and there is no sequencing between the operands of an arithmetic expression. Therefore, it is still a case of undefined behavior, just for a different reason.
This means in practice that the compiler can choose what to do, and, in your case, it produces the result you observe. You should try to avoid undefined behaviour in your programs as much as possible (a well-defined rewrite is sketched after the references). Below are some references that expand on the subject:
https://en.cppreference.com/w/cpp/language/eval_order
Undefined behavior and sequence points
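For completeness, here is a hedged sketch of a well-defined rewrite, assuming the intent was to multiply the two successive incremented values (only the original author can confirm that intent):

#include <iostream>

int main() {
    int x = 5;
    ++x;                // x == 6
    int first = x;
    ++x;                // x == 7
    int y = first * x;  // 6 * 7 == 42; each increment is a separate,
                        // fully sequenced statement
    std::cout << x << std::endl;
    std::cout << y << std::endl;
}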

being sure about "unknown evaluation order"

Since version 1.80, Cppcheck tells me that
Expression 'msg[ipos++]=checksum(&msg[1],ipos-1)' depends on order of evaluation of side effects
in this code sequence (simplified, data is a variable)
BYTE msg[MAX_MSG_SIZE]; // msg can be smaller, depending on data encoded
int ipos = 0;
msg[ipos++] = MSG_START;
ipos += encode(&msg[ipos], data);
msg[ipos++] = checksum(&msg[1], ipos-1); // <---- Undefined Behaviour?
msg[ipos++] = MSG_END; // increment ipos to the actual size of msg
and treats this as an error, not a portability issue.
It's C code (incorporated into a C++-dominated project), compiled with a C++98-compliant compiler, and it has run as expected for decades. Cppcheck is run with C++03, C89, and auto-detected language.
I admit that the code had better be rewritten. But before doing this, I am trying to figure out: does it really depend on evaluation order? As I understand it, the right operand is evaluated first (it needs to be, before the call), then the assignment takes place (to msg[ipos]), with the increment of ipos done last.
Am I wrong with this assumption, or is it just a false positive?
This code does indeed depend on evaluation order in a way which is not well defined:
msg[ipos++] = checksum(&msg[1], ipos-1);
Specifically, it is not specified whether ipos++ will increment before or after ipos-1 is evaluated. This is because there is no "sequence point" at the =, only at the end of the full expression (the ;).
The function call is a sequence point. But that only guarantees that ipos-1 happens before the function call. It does not guarantee that ipos++ happens after.
It appears the code should be rewritten this way:
msg[ipos] = checksum(&msg[1], ipos-1);
ipos++; // or ++ipos
The order of evaluation of the operands of = is unspecified. So to begin with, the code relies on unspecified behavior.
What makes the case even worse is that ipos is used twice in the same expression with no sequence point in between, for unrelated purposes - which leads to undefined behavior.
C99 6.5
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
The very same text applies to C90, C99, C++98 and C++03. In C11 and C++11 the wording has changed, but the meaning is the same: it is undefined behavior.
The compiler is not required to give a diagnostic for undefined behavior. You were lucky that it did. It is not a false positive - your code contains a severe bug, all the way from the original C code.
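For reference, here is a hedged sketch of how the original sequence could be rewritten so that no expression both modifies ipos and reads it without sequencing (BYTE, MAX_MSG_SIZE, MSG_START, MSG_END, encode, checksum and data are taken from the question and assumed to exist as declared there):

BYTE msg[MAX_MSG_SIZE];
int ipos = 0;
msg[ipos++] = MSG_START;                 /* fine: ipos appears only once */
ipos += encode(&msg[ipos], data);
msg[ipos] = checksum(&msg[1], ipos - 1); /* only reads ipos */
ipos++;                                  /* increment in its own statement */
msg[ipos++] = MSG_END;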

C + C++ = undefined behavior? [duplicate]

Possible Duplicate:
How do we explain the result of the expression (++x)+(++x)+(++x)?
Undefined Behavior and Sequence Points
I have a problem where the code
U = C + C++;
runs in a different way for standard types and for my own types.
I have an example, http://ideone.com/4S1uA, where I get different values for int and for my class Int, which is supposed to behave the way the real int does.
Is it possible to make my class behave the same way as the standard int does? Does this code have undefined behavior?
WHY is it undefined behavior? C++ has operator priorities, so the C++ should be evaluated first, as it changes the value of a; so the addition should be passed the new value of a as its first argument and the old value as its second. And it works this way for class Int, but not for the standard int.
Has this code undefined behavior?
Yes. The order in which the operands are evaluated, with respect to the side effect, is undefined.
Section 6.5(2) of the standard says:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
Since int is a scalar type, and since the side effect here is unsequenced, the behavior is undefined.
You should write your code like this:
U = 2*C;
C++;
Yes, that is undefined behavior. You can't read a variable in an expression that also modifies it (other than to determine the value to be stored), because the order in which the expression C and the expression C++ are evaluated is not defined.
The concept involved here is one of sequence points. To quote the opening sentence from the Wikipedia article:
A sequence point in imperative programming defines any point in a computer program's execution at which it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been performed.
In C, and for built-in types in C++, the + operator does not create a sequence point, so the order of the side effects is not defined. However, in C++ an overloaded operator + is a function call, which does create sequence points. This creates different behavior with respect to side effects. Note that while the order in which function arguments are evaluated is not specified, all side effects of an argument are completed before the function is entered. So if C + C++ involves an overloaded + operator, then the C++ side effect will have been applied to C before the body of operator + executes. This is unlike the case for int values, where the read of the left operand may or may not happen before the side effect of the right operand is complete.
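Here is a minimal hedged sketch of my own (not the actual class from the ideone link) that illustrates that point about function calls and side effects:

#include <iostream>

// Minimal stand-in for the question's Int class (my own sketch).
struct Int {
    int v;
    Int(int x) : v(x) {}
    Int operator+(const Int& rhs) const { return Int(v + rhs.v); } // values read here
    Int operator++(int) { Int old(*this); ++v; return old; }       // postfix increment
};

int main() {
    Int C(1);
    Int U = C + C++;   // C.operator+(C.operator++(0)): the operator++ call,
                       // side effect included, finishes before the body of
                       // operator+ reads C.v, so this is not undefined.
    std::cout << U.v << ' ' << C.v << '\n';

    int c = 1;
    // int u = c + c++; // With the built-in int, the read of c in the left
                        // operand and the side effect of c++ are unsequenced,
                        // so this would be undefined behavior.
    (void)c;
}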

Why is `i = ++i + 1` unspecified behavior?

Consider the following C++ Standard ISO/IEC 14882:2003(E) citation (section 5, paragraph 4):
Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified. 53) Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined. [Example:

i = v[i++]; // the behavior is unspecified
i = 7, i++, i++; // i becomes 9
i = ++i + 1; // the behavior is unspecified
i = i + 1; // the value of i is incremented

—end example]
I was surprised that i = ++i + 1 gives an undefined value of i.
Does anybody know of a compiler implementation which does not give 2 for the following case?
int i = 0;
i = ++i + 1;
std::cout << i << std::endl;
The thing is that operator= has two arguments. The first one is always a reference to i.
The order of evaluation does not matter in this case.
I do not see any problem except the C++ Standard's taboo.
Please do not consider cases where the order of the arguments matters to the evaluation. For example, ++i + i is obviously undefined. Please consider only my case,
i = ++i + 1.
Why does the C++ Standard prohibit such expressions?
You make the mistake of thinking of operator= as a two-argument function, where the side effects of the arguments must be completely evaluated before the function begins. If that were the case, then the expression i = ++i + 1 would have multiple sequence points, and ++i would be fully evaluated before the assignment began. That's not the case, though. What's being used here is the intrinsic assignment operator, not a user-defined operator. There's only one sequence point in that expression.
The result of ++i is evaluated before the assignment (and before the addition operator), but the side effect is not necessarily applied right away. The result of ++i + 1 is always the same as i + 2, so that's the value that gets assigned to i as part of the assignment operator. The result of ++i is always i + 1, so that's what gets assigned to i as part of the increment operator. There is no sequence point to control which value should get assigned first.
Since the code is violating the rule that "between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression," the behavior is undefined. Practically, though, it's likely that either i + 1 or i + 2 will be assigned first, then the other value will be assigned, and finally the program will continue running as usual — no nasal demons or exploding toilets, and no i + 3, either.
It's undefined behaviour, not (just) unspecified behaviour because there are two writes to i without an intervening sequence point. It is this way by definition as far as the standard specifies.
The standard allows compilers to generate code that delays writes back to storage - or from another view point, to resequence the instructions implementing side effects - any way it chooses so long as it complies with the requirements of sequence points.
The issue with this statement expression is that it implies two writes to i without an intervening sequence point:
i = ++i + 1;
One write is for the value of the original value of i "plus one" and the other is for that value "plus one" again. These writes could happen in any order or blow up completely as far as the standard allows. Theoretically this even gives implementations the freedom to perform writebacks in parallel without bothering to check for simultaneous access errors.
C and C++ define a concept called sequence points, which refer to points in execution at which it is guaranteed that all side effects of previous evaluations have already been performed. i = ++i + 1 is problematic because it increments i and also assigns to i within the same expression, with no sequence point between the two modifications. Therefore it is not specified which happens first, and the behavior is undefined.
Update for C++11 (09/30/2011)
Stop, this is well defined in C++11. It was undefined only in C++03, but C++11 is more flexible.
int i = 0;
i = ++i + 1;
After that line, i will be 2. The reason for this change was ... because it already works in practice and it would have been more work to make it be undefined than to just leave it defined in the rules of C++11 (actually, that this works now is more of an accident than a deliberate change, so please don't do it in your code!).
Straight from the horse's mouth
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#637
Given two choices: defined or undefined, which choice would you have made?
The authors of the standard had two choices: define the behavior or specify it as undefined.
Given the clearly unwise nature of writing such code in the first place, it doesn't make any sense to specify a result for it. One would want to discourage code like that and not encourage it. It's not useful or necessary for anything.
Furthermore, standards committees do not have any way to force compiler writers to do anything. Had they required a specific behavior it is likely that the requirement would have been ignored.
There are practical reasons as well, but I suspect they were subordinate to the above general consideration. But for the record, any sort of required behavior for this kind of expression and related kinds will restrict the compiler's ability to generate code, to factor out common subexpressions, to move objects between registers and memory, etc. C was already handicapped by weak visibility restrictions. Languages like Fortran long ago realized that aliased parameters and globals were an optimization-killer and I believe they simply prohibited them.
I know you were interested in a specific expression, but the exact nature of any given construct doesn't matter very much. It's not going to be easy to predict what a complex code generator will do and the language attempts to not require those predictions in silly cases.
The important part of the standard is:
its stored value modified at most once by the evaluation of an expression
You modify the value twice: once with the ++ operator and once with the assignment.
Please note that your copy of the standard is outdated and contains a known (and since fixed) error in the 1st and 3rd code lines of your example; see:
C++ Standard Core Language Issue Table of Contents, Revision 67, #351
and
Andrew Koenig: Sequence point error: unspecified or undefined?
The topic is not easy to get just reading the standard (which is pretty obscure :( in this case).
For example, whether a given case is well-defined, undefined, or unspecified in general actually depends not only on the structure of the statement, but also on the memory contents (to be specific, the variable values) at the moment of execution. Another example:
++i, ++i; //ok
(++i, ++j) + (++i, ++j); //ub, see the first reference below (12.1 - 12.3)
Please have a look at (it has it all clear and precise):
JTC1/SC22/WG14 N926 "Sequence Point Analysis"
Also, Angelika Langer has an article on the topic (though not as clear as the previous one):
"Sequence Points and Expression Evaluation in C++"
There was also a discussion in Russian (though with some apparently erroneous statements in the comments and in the post itself):
"Точки следования (sequence points)"
The following code demonstrates how you could get the wrong(unexpected) result:
#include <iostream>
using namespace std;

int main()
{
    int i = 0;
    __asm { // here is a standard-conformant implementation of i = ++i + 1
        mov eax, i;
        inc eax;
        mov ecx, 1;
        add ecx, eax;
        mov i, ecx;
        mov i, eax; // delayed write
    };
    cout << i << endl;
}
It will print 1 as a result.
Assuming you are asking "Why is the language designed this way?".
You say that i = ++i + i is "obviously undefined" but i = ++i + 1 should leave i with a defined value? Frankly, that would not be very consistent. I prefer to have either everything perfectly defined, or everything consistently unspecified. In C++ I have the latter. It's not a terribly bad choice per se - for one thing, it prevents you from writing evil code which makes five or six modifications in the same "statement".
Argument by analogy: If you think of operators as types of functions, then it kind of makes sense. If you had a class with an overloaded operator=, your assignment statement would be equivalent to something like this:
operator=(i, ++i+1)
(The first parameter is actually passed in implicitly via the this pointer, but this is just for illustration.)
For a plain function call written this way, the result clearly depends on the order of evaluation: the value of the first argument depends on when the second argument is evaluated. However, with primitive types you get away with it because the original value of i is simply overwritten; its value doesn't matter. But if you were doing some other magic in your own operator=, then the difference could surface.
Simply put: all operators act like functions, and should therefore behave according to the same notions. If i + ++i is undefined, then i = ++i should be undefined as well.
How about we all just agree to never, never, write code like this? If the compiler doesn't know what you want to do, how do you expect the poor sap who is following on behind you to understand what you wanted to do? Putting i++; on its own line will not kill you.
The underlying reason is the way the compiler handles reading and writing of values. The compiler is allowed to store an intermediate value in memory and only actually commit the value at the end of the expression. We read the expression ++i as "increase i by one and return it", but a compiler might see it as "load the value of i, add one, return it, and then commit it back to memory before someone uses it again". The compiler is encouraged to avoid reading/writing the actual memory location as much as possible, because that would slow the program down.
In the specific case of i = ++i + 1, the restriction exists largely because of the need for consistent behavioral rules. Many compilers will do the 'right thing' in such a situation, but what if one of the uses of i were actually through a pointer pointing to i? Without this rule, the compiler would have to be very careful to make sure it performed the loads and stores in the right order. This rule serves to allow for more optimization opportunities.
A similar case is that of the so-called strict-aliasing rule. You can't assign a value (say, an int) through a value of an unrelated type (say, a float) with only a few exceptions. This keeps the compiler from having to worry that some float * being used will change the value of an int, and greatly improves optimization potential.
The problem here is that the standard allows a compiler to completely reorder the operations within a statement while it is executing. It is not, however, allowed to reorder statements when such a reordering would change the program's observable behavior. Therefore, the expression i = ++i + 1; may be evaluated in a few different ways:
++i;
i = i + 1;   // net effect: i becomes 2
or
i = i + 1;
++i;         // net effect: i becomes 2
or
i = i + 1; ++i;   // (running in parallel using, say, an SSE instruction) i = 1
This gets even worse when you have user defined types thrown in the mix, where the ++ operator can have whatever effect on the type the author of the type wants, in which case the order used in evaluation matters significantly.
i = v[i++]; // the behavior is unspecified
i = ++i + 1; // the behavior is unspecified
All the above expressions invoke Undefined Behavior.
i = 7, i++, i++; // i becomes 9
This is fine.
Read Steve Summit's C-FAQs.
From ++i alone, i would be assigned 1, but with i = ++i + 1 it must also be assigned the value 2. Since there is no intervening sequence point, the compiler may assume that the same variable is not written twice, so these two writes can be done in any order. So yes, the compiler would be correct even if the final value were 1.