I've read something about order of evaluation and I understand some mistakes caused by order of evaluation.
My basic rule comes from a text and example:
Order of operand evaluation is independent of precedence and associativity.
In most cases, the order is largely unspecified.
So for an expression like this: int i = f1() * f2();
f1 and f2 must be called before the multiplication can be done. After all, it is their results that are multiplied. However, we have no way of knowing whether f1 will be called before f2 or vice versa.
Undefined behavior examples:
int i = 0;
cout << i << " " << ++i << endl;
How I understand it: I treat i and ++i as functions. I don't know which is evaluated first, so the first i could be 0 or 1, and the rule makes sense.
while(beg != s.end())
*beg = toupper(*beg++); //Just another example.
I think the key to understanding this is to treat each operand as an "evaluation unit": one doesn't know the order of evaluation among these units, but does know the order within every single unit.
But for i = ++i + 2 (referenced here), why is it wrong? I can't explain it with my own conclusion.
The left i is used as an lvalue and not a pointer. ++i simply rewrites the original value and doesn't change the storage address. What could go wrong whether it is evaluated first or later? My rule fails here.
This is quite long, but I tried to provide enough background info. Thanks for your patience.
I don't know about sequence points, which are mentioned frequently in the answers, so I think I need to read about them first. By the way, the debate is not very helpful for a newbie who simply wants to know why it is considered wrong, for example before C++11.
I found that this answer, Undefined behavior and sequence points, explains well why i = ++i + 2 is undefined behaviour before C++11.
C++11 has new sequencing rules. In particular, for a pre-increment (++i) the side effect (writing the new value) is sequenced-before the further use of the new, incremented value. Since the assignment i= is sequenced-after the evaluation of its right-hand side, this means the write ++i is transitively sequenced-before the write i=(++i + 2).
It would be another matter for i=(i++ + 2). For post-increment, the side effect is sequenced-after the value computation, which means the two writes are no longer sequenced relative to each other. That IS undefined behavior.
Your two "subfunctions" in i = ++i + 2 are an explicit assignment made by the = operator and an implicit assignment done by the ++ operator.
The preincrement operator is defined to return the incremented value of the variable, and this value is definitely the one used for the addition (performed by the + operator). However, it was not defined when the incremented value had to be stored back into i.
As a result, it was undefined whether the final value of i is old_i incremented plus 2 or just old_i incremented.
From the C++ (C++11) standard, §1.9.15 which discusses ordering of evaluation, is the following code example:
void g(int i, int* v) {
i = v[i++]; // the behavior is undefined
}
As noted in the code sample, the behavior is undefined.
(Note: The answer to another question with the slightly different construct i + i++, Why is a = i + i++ undefined and not unspecified behaviour, might apply here: The answer is essentially that the behavior is undefined for historical reasons, and not out of necessity. However, the standard seems to imply some justification for this being undefined - see quote immediately below. Also, that linked question indicates agreement that the behavior should be unspecified, whereas in this question I am asking why the behavior is not well-specified.)
The reasoning given by the standard for the undefined behavior is as follows:
If a side effect on a scalar object is unsequenced relative to either
another side effect on the same scalar object or a value computation
using the value of the same scalar object, the behavior is undefined.
In this example I would think that the subexpression i++ would be completely evaluated before the subexpression v[...] is evaluated, and that the result of evaluation of the subexpression is i (before the increment), but that the value of i is the incremented value after that subexpression has been completely evaluated. I would think that at that point (after the subexpression i++ has been completely evaluated), the evaluation v[...] takes place, followed by the assignment i = ....
Therefore, although the incrementing of i is pointless, I would nonetheless think that this should be defined.
Why is this undefined behavior?
I would think that the subexpression i++ would be completely evaluated before the subexpression v[...] is evaluated
But why would you think that?
One historical reason for this code being UB is to allow compiler optimizations to move side-effects around anywhere between sequence points. The fewer sequence points, the more potential opportunities to optimize but the more confused programmers. If the code says:
a = v[i++];
The intention of the standard is that the code emitted can be:
a = v[i];
++i;
which might be two instructions where:
tmp = i;
++i;
a = v[tmp];
would be more than two.
The "optimized code" breaks when a is i, but the standard permits the optimization anyway, by saying that behavior of the original code is undefined when a is i.
The standard easily could say that i++ must be evaluated before the assignment as you suggest. Then the behavior would be fully defined and the optimization would be forbidden. But that's not how C and C++ do business.
Also beware that many examples raised in these discussions make it easier to tell that there's UB around than it is in general. This leads to people saying that it's "obvious" the behavior should be defined and the optimization forbidden. But consider:
void g(int *i, int* v, int *dst) {
*dst = v[(*i)++];
}
The behavior of this function is defined when i != dst, and in that case you'd want all the optimization you can get (which is why C99 introduces restrict, to allow more optimizations than C89 or C++ do). In order to give you the optimization, behavior is undefined when i == dst. The C and C++ standards tread a fine line when it comes to aliasing, between undefined behavior that's not expected by the programmer, and forbidding desirable optimizations that fail in certain cases. The number of questions about it on SO suggests that the questioners would prefer a bit less optimization and a bit more defined behavior, but it's still not simple to draw the line.
Aside from whether the behavior is fully defined is the issue of whether it should be UB, or merely unspecified order of execution of certain well-defined operations corresponding to the sub-expressions. The reason C goes for UB is all to do with the idea of sequence points, and the fact that the compiler need not actually have a notion of the value of a modified object, until the next sequence point. So rather than constrain the optimizer by saying that "the" value changes at some unspecified point, the standard just says (to paraphrase): (1) any code that relies on the value of a modified object prior to the next sequence point, has UB; (2) any code that modifies a modified object has UB. Where a "modified object" is any object that would have been modified since the last sequence point in one or more of the legal orders of evaluation of the subexpressions.
Other languages (e.g. Java) go the whole way and completely define the order of expression side-effects, so there's definitely a case against C's approach. C++ just doesn't accept that case.
I'm going to design a pathological computer [1]. It is a multi-core, high-latency, single-thread system with in-thread joins that operates with byte-level instructions. So you make a request for something to happen, then the computer runs (in its own "thread" or "task") a byte-level set of instructions, and a certain number of cycles later the operation is complete.
Meanwhile, the main thread of execution continues:
void foo(int v[], int i){
i = v[i++];
}
becomes in pseudo-code:
input variable i // = 0x00000000
input variable v // = &[0xBAADF00D, 0xABABABAB, 0x10101010]
task get_i_value: GET_VAR_VALUE<int>(i)
reg indx = WAIT(get_i_value)
task write_i++_back: WRITE(i, INC(indx))
task get_v_value: GET_VAR_VALUE<int*>(v)
reg arr = WAIT(get_v_value)
task get_v[i]_value = CALC(arr + sizeof(int)*indx)
reg pval = WAIT(get_v[i]_value)
task read_v[i]_value = LOAD_VALUE<int>(pval)
reg got_value = WAIT(read_v[i]_value)
task write_i_value_again = WRITE(i, got_value)
(discard, discard) = WAIT(write_i++_back, write_i_value_again)
So you'll notice that I didn't wait on write_i++_back until the very end, the same time as I was waiting on write_i_value_again (which value I loaded from v[]). And, in fact, those writes are the only writes back to memory.
Imagine that writes to memory are the really slow part of this computer design, and they get batched up into a queue of things that get processed by a parallel memory-modifying unit that does things on a per-byte basis.
So the write(i, 0x00000001) and write(i, 0xBAADF00D) execute unordered and in parallel. Each gets turned into byte-level writes, and they are randomly ordered.
We end up writing 0x00 then 0xBA to the high byte, then 0xAD and 0x00 to the next byte, then 0xF0 0x00 to the next byte, and finally 0x0D 0x01 to the low byte. The resulting value in i is 0xBA000001, which few would expect, yet would be a valid result to your undefined operation.
Now, all I did there was result in an unspecified value. We haven't crashed the system. But the compiler would be free to make it completely undefined -- maybe sending two such requests to the memory controller for the same address in the same batch of instructions actually crashes the system. That would still be a "valid" way to compile C++, and a "valid" execution environment.
Remember, this is a language where restricting the size of pointers to 8 bits is still a valid execution environment. C++ allows for compiling to rather wonky targets.
[1] As noted in @SteveJessop's comment below, the joke is that this pathological computer behaves a lot like a modern desktop computer, until you get down to the byte-level operations. Non-atomic int writing by a CPU isn't all that rare on some hardware (such as when the int isn't aligned the way the CPU wants it to be aligned).
The reason is not just historical. Example:
int f(int& i0, int& i1) {
return i0 + i1++;
}
Now, what happens with this call:
int i = 3;
int j = f(i, i);
It's certainly possible to put requirements on the code in f so that the result of this call is well defined (Java does this), but C and C++ don't impose constraints; this gives more freedom to optimizers.
You specifically refer to the C++11 standard so I'm going to answer with the C++11 answer. It is, however, very similar to the C++03 answer, but the definition of sequencing is different.
C++11 defines a sequenced before relation between evaluations on a single thread. It is asymmetric, transitive and pair-wise. If some evaluation A is not sequenced before some evaluation B and B is also not sequenced before A, then the two evaluations are unsequenced.
Evaluating an expression includes both value computations (working out the value of some expression) and side effects. One instance of a side effect is the modification of an object, which is the most important one for answering the question. Other things also count as side effects. If a side effect is unsequenced relative to another side effect or value computation on the same object, then your program has undefined behaviour.
So that's the set up. The first important rule is:
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.
So any full expression is fully evaluated before the next full expression. In your question, we're only dealing with one full expression, namely i = v[i++], so we don't need to worry about this. The next important rule is:
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced.
That means that in a + b, for example, the evaluation of a and b are unsequenced (they may be evaluated in any order). Now for our final important rule:
The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
So for a + b, the sequenced before relationships can be represented by a tree where a directed arrow represents the sequenced before relationship:
a + b (value computation)
^   ^
|   |
a   b (value computation)
If two evaluations occur in separate branches of the tree, they are unsequenced, so this tree shows that the evaluations of a and b are unsequenced relative to each other.
Now, let's do the same thing to your i = v[i++] example. We make use of the fact that v[i++] is defined to be equivalent to *(v + (i++)). We also use some extra knowledge about the sequencing of postfix increment:
The value computation of the ++ expression is sequenced before the modification of the operand object.
So here we go (a node of the tree is a value computation unless specified as a side effect):
i = v[i++]
^   ^
|   |
i★  v[i++] = *(v + (i++))
                 ^
                 |
               v + (i++)
               ^    ^
               |    |
               v    ++ (side effect on i)★
                    ^
                    |
                    i
Here you can see that the side effect on i, i++, is in a separate branch to the usage of i in front of the assignment operator (I marked each of these evaluations with a ★). So we definitely have undefined behaviour! I highly recommend drawing these diagrams if you ever wonder if your sequencing of evaluations is going to cause you trouble.
That brings us to the objection that the value of i on the left of the assignment operator doesn't matter, because we write over it anyway. But actually, in the general case, that's not true. We can overload the assignment operator and make use of the value of the object before the assignment. The standard doesn't care that we don't use that value - the rules are defined such that having any value computation unsequenced with a side effect will be undefined behaviour. No buts. This undefined behaviour is there to allow the compiler to emit more optimized code. If we add sequencing for the assignment operator, this optimization cannot be employed.
In this example I would think that the subexpression i++ would be completely evaluated before the subexpression v[...] is evaluated, and that the result of evaluation of the subexpression is i (before the increment), but that the value of i is the incremented value after that subexpression has been completely evaluated.
The increment in i++ must be evaluated before indexing v and thus before assigning to i, but storing the value of that increment back to memory need not happen before. In the statement i = v[i++] there are two suboperations that modify i (i.e. will end up causing a store from a register into the variable i). The expression i++ is equivalent to x=i+1, i=x, and there is no requirement that both operations need to take place sequentially:
x = i+1;
y = v[i];
i = y;
i = x;
With that expansion, the final value of i is unrelated to the value in v[i]. With a different expansion, the i = x assignment could take place before the i = y assignment, and the result would be i = v[i].
There are two rules.
The first rule is about multiple writes which give rise to a "write-write hazard": the same object cannot be modified more than once between two sequence points.
The second rule is about "read-write hazards". It is this: if an object is modified in an expression, and also accessed, then all accesses to its value must be for the purpose of computing the new value.
Expressions like i++ + i++ and your expression i = v[i++] violate the first rule. They modify an object twice.
An expression like i + i++ violates the second rule. The subexpression i on the left observes the value of a modified object, without being involved in the calculation of its new value.
So, i = v[i++] violates a different rule (bad write-write) from i + i++ (bad read-write).
The rules are too simplistic, which gives rise to classes of puzzling expressions. Consider this:
p = p->next = q
This appears to have a sane data flow dependency that is free of hazards: the assignment p = cannot take place until the new value is known. The new value is the result of p->next = q. The value q should not "race ahead" and get inside p, such that p->next is affected.
Yet, this expression breaks the second rule: p is modified, and also used for a purpose not related to computing its new value, namely determining the storage location where the value of q is placed!
So, perversely, compilers are allowed to partially evaluate p->next = q to determine that the result is q, and store that into p, and then go back and complete the p->next = assignment. Or so it would seem.
A key issue here is, what is the value of an assignment expression? The C standard says that the value of an assignment expression is that of the lvalue, after the assignment. But that is ambiguous: it could be interpreted as meaning "the value which the lvalue will have, once the assignment takes place" or as "the value which can be observed in the lvalue after the assignment has taken place". In C++ this is made clear by the wording "[i]n all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.", so p = p->next = q appears to be valid C++, but dubious C.
I would share your arguments if the example were v[++i], but since i++ modifies i as a side-effect, it is undefined as to when the value is modified. The standard could probably mandate a result one way or the other, but there's no true way of knowing what the value of i should be: (i + 1) or (v[i]).
Think about the sequences of machine operations necessary for each of the following assignment statements, assuming the given declarations are in effect:
extern int *foo(void);
extern int *p;
*p = *foo();
*foo() = *p;
If the evaluation of the address on the left side and the value on the right side are unsequenced, the most efficient ways to process the two assignments would likely be something like:
[For *p = *foo()]
call foo (which yields result in r0 and trashes r1)
load r0 from address held in r0
load r1 from address held in p
store r0 to address held in r1
[For *foo() = *p]
call foo (which yields result in r0 and trashes r1)
load r1 from address held in p
load r1 from address held in r1
store r1 to address held in r0
In either case, if p or *p were read into a register before the call to foo, then unless "foo" promises not to disturb that register, the compiler would need to add an extra step to save its value before calling "foo", and another extra step to restore the value afterward. That extra step might be avoided by using a register that "foo" won't disturb, but that would only help if there were such a register which didn't hold a value needed by the surrounding code.
Letting the compiler read the value of "p" before or after the function call, at its leisure, will allow both patterns above to be handled efficiently. Requiring that the address of the left-hand operand of "=" always be evaluated before the right hand side would likely make the first assignment above less efficient than it otherwise could be, and requiring that the address of the left-hand operand be evaluated after the right-hand side would make the second assignment less efficient.
When I run this code, the output is 11, 10.
Why on earth would that be? Can someone give me an explanation of this that will hopefully enlighten me?
Thanks
#include <iostream>
using namespace std;
void print(int x, int y)
{
cout << x << endl;
cout << y << endl;
}
int main()
{
int x = 10;
print(x, x++);
}
The C++ standard states (A note in section 1.9.16):
Value computations and side effects associated with the different argument expressions are unsequenced.
In other words, the standard does not fix the order in which the arguments are evaluated before their values are passed into the function. So on some compilers (which evaluate the left argument first) that code would output 10, 10 and on others (which evaluate the right argument first) it will output 11, 10. Because x is both read and modified without any sequencing between the two, the behaviour is formally undefined, and in general you should never rely on undefined behaviour.
To help you understand this, imagine that each argument expression is evaluated before the function is called like so (not that this is exactly how it actually works, it's just an easy way to think of it that will help you understand the sequencing):
int arg1 = x; // This line
int arg2 = x++; // And this line can be swapped.
print(arg1, arg2);
The C++ Standard says that the two argument expressions are unsequenced. So, if we write out the argument expressions on separate lines like this, their order should not be significant, because the standard says they can be evaluated in any order. Some compilers might evaluate them in the order above, others might swap them:
int arg2 = x++; // And this line can be swapped.
int arg1 = x; // This line
print(arg1, arg2);
That makes it pretty obvious how arg2 can hold the value 10, while arg1 holds the value 11.
You should always avoid this undefined behaviour in your code.
On the whole, the statement:
print(x, x++);
results in Undefined Behavior. Once a program has Undefined Behavior it ceases to be a valid C++ program and literally any behavior is possible. So it is pointless to look for reasoning in such a program.
Why is this Undefined Behavior?
Let's evaluate the program step by step to the point where we can prove beyond any doubt that it causes Undefined Behavior.
The order of evaluation of arguments to a function is Unspecified[Ref 1].
Unspecified means that an implementation is allowed to implement this particular functionality as it desires and it is not required to document the detail about it.
Applying the above rule to your function call:
print(x, x++);
An implementation might evaluate this as:
Left to Right or
Right to Left or
Any magical order (in case of more than two function arguments)
In short you cannot rely on an implementation to follow any specific order because it is not required to as per the C++ Standard.
In C/C++ you cannot modify a variable more than once, or modify it and separately read it, without an intervening sequence point [Ref 2]. If you do so, it results in Undefined Behavior. Irrespective of which of the arguments gets evaluated first in the said function call, there is no sequence point between them; a sequence point exists only after evaluation of all function arguments [Ref 3].
In this case x is being read and modified without an intervening sequence point, and hence it results in Undefined Behavior.
Simply put, it is best to write code that does not invoke such Undefined Behavior, because once you do so you cannot expect any specific behavior from such a program.
[Ref 1] C++03 Standard §5.2.2.8
Para 8:
[...] The order of evaluation of function arguments is unspecified. [...]
[Ref 2]C++03 5 Expressions [expr]:
Para 4:
....
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined.
[Ref 3]C++03 1.9 Program execution [intro.execution]:
Para 17:
When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all function arguments (if any) which takes place before execution of any expressions or statements in the function body.
x++ is a function argument, and arguments may be evaluated in an unspecified order; because the other argument also reads x, the behavior is undefined and not portable (or legal).
I believe this has to do with the function call stack where the last argument goes in first. So x++ is your y and x is the local x in print().
Late answer. Ignoring the issue of the order of evaluation, note that cppreference explains how post-increment and post-decrement operate: "Post-increment and post-decrement creates a copy of the object, increments or decrements the value of the object and returns the copy from before the increment or decrement."
https://en.cppreference.com/w/cpp/language/operator_incdec
As an example where the difference in outcome is significant, consider std::list::splice, such as:
mylist.splice(where, mylist, iter++);
This will move the node pointed to by iter to just before the node pointed to by where. The sequence will be: make a copy of iter to be passed to splice, increment iter, then call splice using the copy of iter from before it was incremented. After splice returns, iter will point to the node after the one iter originally pointed to, as opposed to the node after the moved node's new location in the list.
According to the C++ standard,
i = 3;
i = i++;
will result in undefined behavior.
We use the term "undefined behavior" if it can lead to more than one result. But here, the final value of i will be 4 no matter what the order of evaluation, so shouldn't this really be called "unspecified behavior"?
The phrase, "…the final value of i will be 4 no matter what the order of evaluation…" is incorrect. The compiler could emit the equivalent of this:
i = 3;
int tmp = i;
++i;
i = tmp;
or this:
i = 3;
++i;
i = i - 1;
or this:
i = 3;
i = i;
++i;
As to the definitions of terms, if the answer was guaranteed to be 4, that wouldn't be unspecified or undefined behavior, it would be defined behavior.
As it stands, it is undefined behaviour according to the standard (Wikipedia), so it's even free to do this:
i = 3;
system("sudo rm -rf /"); // DO NOT TRY THIS AT HOME … OR AT WORK … OR ANYWHERE.
No, we don't use the term "undefined behavior" when it can simply lead to more than one arithmetical result. When the behavior is limited to different arithmetical results (or, more generally, to some set of predictable results), it is typically referred to as unspecified behavior.
Undefined behavior means completely unpredictable and unlimited consequences, like formatting the hard drive on your computer or simply making your program crash. And i = i++ is undefined behavior.
Where you got the idea that i should be 4 in this case is not clear. There's absolutely nothing in C++ language that would let you come to that conclusion.
In C and also in C++, the order of any operation between two sequence points is completely up to the compiler and cannot be depended on. The standard defines a list of things that make up sequence points; from memory, these are:
the semicolon after a statement
the comma operator
evaluation of all function arguments before the call to the function
the && and || operators
Looking at the page on Wikipedia, the list there is more complete and more detailed. Sequence points are an extremely important concept, and if you do not already know what they are, you will benefit greatly by learning about them right away.
1. No, the result will be different depending on the order of evaluation. There is no evaluation boundary between the increment and the assignment, so the increment can be performed before or after the assignment. Consider this behaviour:
load i into CX
copy CX to DX
increase DX
store DX in i
store CX in i
The result is that i contains 3, not 4.
As a comparison, in C# there is an evaluation boundary between the evaluation of the expression and the assignment, so the result will always be 3.
2. Even if the exact behaviour isn't specified, the specification is very clear on what it covers and what it doesn't cover. The behaviour is specified as undefined; it's not unspecified.
Both i= and i++ are side effects that modify i.
i++ does not imply that i is only incremented after the entire statement is evaluated, merely that the current value of i has been read.
As such, the assignment, and the increment, could happen in any order.
This question is old, but still appears to be referenced frequently, so it deserves a new answer in light of changes to the standard, from C++17.
expr.ass Subclause 1 explains
... the assignment is sequenced after the value computation of the right and left operands ...
and
The right operand is sequenced before the left operand.
The implication here is that the side-effects of the right operand are sequenced before the assignment, which means that the expression is not addressed by the provision in [basic.exec] Subclause 10:
If a side effect on a memory location ([intro.memory]) is unsequenced relative to either another side effect on the same memory location or a value computation using the value of any object in the same memory location, and they are not potentially concurrent ([intro.multithread]), the behavior is undefined
The behavior is defined, as explained in the example which immediately follows.
See also: What made i = i++ + 1; legal in C++17?
To answer your questions:
I think "undefined behavior" means that the compiler/language implementer is free to do whatever it thinks best, not that it could lead to more than one result.
Because it's not unspecified. It's clearly specified that its behavior is undefined.
It's not worth it to type i=i++ when you could simply type i++.
I saw such a question in an OCAJP practice test.
IntelliJ's IDEA decompiler turns this
public static int iplus(){
int i=0;
return i=i++;
}
into this
public static int iplus() {
int i = 0;
byte var10000 = i;
int var1 = i + 1;
return var10000;
}
Create JAR from module, then import as library & inspect.
There's been some debate going on in this question about whether the following code is legal C++:
std::list<item*>::iterator i = items.begin();
while (i != items.end())
{
bool isActive = (*i)->update();
if (!isActive)
{
items.erase(i++); // *** Is this undefined behavior? ***
}
else
{
other_code_involving(*i);
++i;
}
}
The problem here is that erase() will invalidate the iterator in question. If that happens before i++ is evaluated, then incrementing i like that is technically undefined behavior, even if it appears to work with a particular compiler. One side of the debate says that all function arguments are fully evaluated before the function is called. The other side says, "the only guarantees are that i++ will happen before the next statement and after i++ is used. Whether that is before erase(i++) is invoked or afterwards is compiler dependent."
I opened this question to hopefully settle that debate.
Quoth the C++ standard 1.9.16:
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. (Note: Value computations and side effects associated with the different argument expressions are unsequenced.)
So it would seem to me that this code:
foo(i++);
is perfectly legal. It will increment i and then call foo with the previous value of i. However, this code:
foo(i++, i++);
yields undefined behavior because paragraph 1.9.16 also says:
If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
To build on Kristo's answer,
foo(i++, i++);
yields undefined behavior because the order in which function arguments are evaluated is not specified (and, more generally, because modifying a variable more than once in an expression without sequencing is undefined). You don't know which argument will be incremented first.
int i = 1;
foo(i++, i++);
might result in a function call of
foo(2, 1);
or
foo(1, 2);
or even
foo(1, 1);
Run the following to see what happens on your platform:
#include <iostream>
using namespace std;
void foo(int a, int b)
{
cout << "a: " << a << endl;
cout << "b: " << b << endl;
}
int main()
{
int i = 1;
foo(i++, i++);
}
On my machine I get
$ ./a.out
a: 2
b: 1
every time, but this code is not portable, so I would expect to see different results with different compilers.
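If the intent is to pass two successive values of i, one way to make that well-defined is to do the increments in separate statements before the call. A minimal sketch, in which call_with_successive_values and the pair-returning foo are hypothetical names used only for illustration:

```cpp
#include <utility>

// Hypothetical stand-in for foo(): just reports the arguments it received.
std::pair<int, int> foo(int a, int b)
{
    return std::make_pair(a, b);
}

// Each increment is a full expression of its own, so the two writes to i
// are sequenced one after the other.
std::pair<int, int> call_with_successive_values(int& i)
{
    int first = i++;   // first argument: old value of i
    int second = i++;  // second argument: value after the first increment
    return foo(first, second);
}
```

Starting from i = 1, this always calls foo(1, 2) and leaves i at 3, whatever the compiler.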
The standard says the side effect happens before the call, so the code is the same as:
std::list<item*>::iterator i_before = i;
++i;
items.erase(i_before);
rather than being:
std::list<item*>::iterator i_before = i;
items.erase(i);
++i; // i was invalidated by the erase
So it is safe in this case, because list.erase() specifically doesn't invalidate any iterators other than the one erased.
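For a std::list specifically, the first sequence can be checked with a small sketch. The function name erase_evens and the int element type are assumptions made for illustration, not part of the original code:

```cpp
#include <list>

// Removes even values from a std::list using the erase(i++) form.
// This relies on std::list::erase invalidating only the erased iterator;
// the same loop would not be safe for std::vector or std::deque.
void erase_evens(std::list<int>& items)
{
    std::list<int>::iterator i = items.begin();
    while (i != items.end())
    {
        if (*i % 2 == 0)
            items.erase(i++);  // i already points to the next node when erase runs
        else
            ++i;
    }
}
```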
That said, it's bad style: erase for the sequence containers (and, since C++11, for the associative containers too) returns the next iterator specifically so you don't have to worry about invalidating iterators due to reallocation, so the idiomatic code:
i = items.erase(i);
will be safe for lists, and will also be safe for vectors, deques and any other sequence container should you want to change your storage.
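A sketch of the idiomatic loop, written as a template so the same code can be tried with a list, vector or deque. The names erase_matching and is_negative are hypothetical and only serve the example:

```cpp
#include <list>
#include <vector>

// Erases every element for which pred returns true, using the iterator
// returned by erase() to continue the loop.
template <typename Container, typename Pred>
void erase_matching(Container& items, Pred pred)
{
    typename Container::iterator i = items.begin();
    while (i != items.end())
    {
        if (pred(*i))
            i = items.erase(i);  // erase returns the element after the erased one
        else
            ++i;
    }
}

// Example predicate for the sketch: treats negative values as "inactive".
inline bool is_negative(int x) { return x < 0; }
```

Because the loop never touches an iterator after it has been erased, switching the container type does not require changing the loop.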
Depending on compiler or lint settings, the original code may also draw a warning about the ignored return value. Having to write
(void)items.erase(i++);
to silence it would be a big clue that you're doing something odd.
It's perfectly OK.
The value passed would be the value of "i" before the increment.
++Kristo!
The C++ standard 1.9.16 makes a lot of sense with respect to how one implements operator++(postfix) for a class. When that operator++(int) method is called, it increments itself and returns a copy of the original value. Exactly as the C++ spec says.
It's nice to see standards improving!
However, I distinctly remember using older (pre-ANSI) C compilers wherein:
foo -> bar(i++) -> charlie(i++);
Did not do what you think! Instead it compiled equivalent to:
foo -> bar(i) -> charlie(i); ++i; ++i;
And this behavior was compiler-implementation dependent. (Making porting fun.)
It's easy enough to test and verify that modern compilers now behave correctly:
#include <iostream>
using namespace std;
#define SHOW(S,X) cout << S << ": " # X " = " << (X) << endl
struct Foo
{
Foo & bar(const char * theString, int theI)
{ SHOW(theString, theI); return *this; }
};
int
main()
{
Foo f;
int i = 0;
f . bar("A",i) . bar("B",i++) . bar("C",i) . bar("D",i);
SHOW("END ",i);
}
Responding to comment in thread...
...And building on pretty much EVERYONE's answers... (Thanks guys!)
I think we need to spell this out a bit better:
Given:
baz(g(),h());
Then we don't know whether g() will be invoked before or after h(). It is "unspecified".
But we do know that both g() and h() will be invoked before baz().
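A small sketch of that guarantee, recording the call order in a log. The bodies of g, h and baz, and the names call_log and run_once, are assumptions made for illustration:

```cpp
#include <string>
#include <vector>

// Records the order in which the functions actually run.
std::vector<std::string> call_log;

int g() { call_log.push_back("g"); return 1; }
int h() { call_log.push_back("h"); return 2; }
int baz(int a, int b) { call_log.push_back("baz"); return a + b; }

// Runs baz(g(), h()) and returns the recorded order.
// The first two entries may be "g","h" or "h","g"; the last is always "baz".
std::vector<std::string> run_once()
{
    call_log.clear();
    baz(g(), h());
    return call_log;
}
```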
Given:
bar(i++,i++);
Again, we don't know which i++ will be evaluated first, and perhaps not even whether i will be incremented once or twice before bar() is called. The results are undefined! (Given i=0, this could be bar(0,0) or bar(1,0) or bar(0,1) or something really weird!)
Given:
foo(i++);
We now know that i will be incremented before foo() is invoked. As Kristo pointed out from the C++ standard section 1.9.16:
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [ Note: Value computations and side effects associated with different argument expressions are unsequenced. -- end note ]
Though I think section 5.2.6 says it better:
The value of a postfix ++ expression is the value of its operand. [ Note: the value obtained is a copy of the original value -- end note ] The operand shall be a modifiable lvalue. The type of the operand shall be an arithmetic type or a pointer to a complete effective object type. The value of the operand object is modified by adding 1 to it, unless the object is of type bool, in which case it is set to true. [ Note: this use is deprecated, see Annex D. -- end note ] The value computation of the ++ expression is sequenced before the modification of the operand object. With respect to an indeterminately-sequenced function call, the operation of postfix ++ is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single postfix ++ operator. -- end note ] The result is an rvalue. The type of the result is the cv-unqualified version of the type of the operand. See also 5.7 and 5.17.
The standard, in section 1.9.16, also lists (as part of its examples):
i = 7, i++, i++; // i becomes 9 (valid)
f(i = -1, i = -1); // the behavior is undefined
And we can trivially demonstrate this with:
#include <iostream>
using namespace std;
#define SHOW(X) cout << # X " = " << (X) << endl
int i = 0; /* Yes, it's global! */
void foo(int theI) { SHOW(theI); SHOW(i); }
int main() { foo(i++); }
So, yes, i is incremented before foo() is invoked.
All this makes a lot of sense from the perspective of:
class Foo
{
public:
    Foo() : value(0) {}
    Foo operator++(int) { Foo old(*this); ++value; return old; } /* Postfix variant */
    int value;
};
void delta(Foo) {}
int main() { Foo f; delta( f++ ); }
Here Foo::operator++(int) must be invoked prior to delta(). And the increment operation must be completed during that invocation.
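To observe this, the object can be made global so that delta() can inspect it while it runs. This is only a sketch: the value member and the seen_in_* variables are hypothetical additions to the snippet above.

```cpp
// Hypothetical class; the names Foo and delta follow the snippet above.
struct Foo
{
    int value;
    Foo() : value(0) {}

    // Postfix variant: modifies *this and returns a copy of the old state.
    Foo operator++(int)
    {
        Foo old = *this;
        ++value;
        return old;
    }
};

Foo f;                      // global so that delta() can observe it
int seen_in_argument = -1;  // value delta() received
int seen_in_global = -1;    // value of f at the time delta() ran

void delta(Foo arg)
{
    seen_in_argument = arg.value;
    seen_in_global = f.value;
}

void demo() { delta(f++); }
```

After demo(), seen_in_argument holds the old value (0) and seen_in_global holds the incremented one (1).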
In my (perhaps overly complex) example:
f . bar("A",i) . bar("B",i++) . bar("C",i) . bar("D",i);
f.bar("A",i) must be executed to obtain the object used for object.bar("B",i++), and so on for "C" and "D".
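So we know that i++ increments i prior to calling bar("B",i++) (even though bar("B",...) is invoked with the old value of i), and therefore i is incremented prior to the calls of bar("C",i) and bar("D",i). Strictly speaking, that fixes when those calls happen, not when their argument i is read; only since C++17 is the expression to the left of the call guaranteed to be evaluated before the arguments.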
Getting back to j_random_hacker's comment:
j_random_hacker writes: +1, but I had to read the standard carefully to convince myself that this was OK. Am I right in thinking that, if bar() was instead a global function returning say int, f was an int, and those invocations were connected by say "^" instead of ".", then any of A, C and D could report "0"?
This question is a lot more complicated than you might think...
Rewriting your question as code...
int bar(const char * theString, int theI) { SHOW(...); return i; }
bar("A",i) ^ bar("B",i++) ^ bar("C",i) ^ bar("D",i);
Now we have only ONE expression. According to the standard (section 1.9, page 8, pdf page 20):
Note: operators can be regrouped according to the usual mathematical rules only where the operators really are associative or commutative.(7) For example, in the following fragment: a=a+32760+b+5; the expression statement behaves exactly the same as: a=(((a+32760)+b)+5); due to the associativity and precedence of these operators. Thus, the result of the sum (a+32760) is next added to b, and that result is then added to 5 which results in the value assigned to a. On a machine in which overflows produce an exception and in which the range of values representable by an int is [-32768,+32767], the implementation cannot rewrite this expression as a=((a+b)+32765); since if the values for a and b were, respectively, -32754 and -15, the sum a+b would produce an exception while the original expression would not; nor can the expression be rewritten either as a=((a+32765)+b); or a=(a+(b+32765)); since the values for a and b might have been, respectively, 4 and -8 or -17 and 12. However on a machine in which overflows do not produce an exception and in which the results of overflows are reversible, the above expression statement can be rewritten by the implementation in any of the above ways because the same result will occur. -- end note ]
So we might think that, due to precedence and associativity, our expression would be grouped the same as:
(
(
( bar("A",i) ^ bar("B",i++)
)
^ bar("C",i)
)
^ bar("D",i)
);
But, because (a^b)^c==a^(b^c) without any possible overflow situations, it could be regrouped in any order...
But, because bar() is being invoked, and could hypothetically involve side effects, this expression cannot be regrouped in just any way. Rules of precedence still apply.
That only determines the grouping, though, not the order of evaluation of the bar()'s. The operands of ^ may be evaluated in either order, so the four calls are indeterminately sequenced with respect to one another.
Now, when does that i+=1 occur? Well it still has to occur before bar("B",...) is invoked. (Even though bar("B",....) is invoked with the old value.)
But nothing sequences it relative to the reads of i in the arguments of bar(A), bar(C) and bar(D).
Answer: YES, in principle. A given compiler may well print "A=0, B=0, C=1, D=1" every time, but the standard does not guarantee it; the write to i is unsequenced relative to those reads, which formally makes the behavior undefined.
But consider another problem:
i = 0;
int & j = i;
R = i ^ i++ ^ j;
What is the value of R?
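If the i+=1 occurred before j, we'd have 0^0^1=1. But if the i+=1 occurred after the whole expression, we'd have 0^0^0=0.
With my compiler, R is zero: the i+=1 does not occur until after the expression has been evaluated. The standard does not promise this, though; i is read and modified without sequencing, so the behavior is undefined.
This, on the other hand: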
i = 7, i++, i++; // i becomes 9 (valid)
Is legal... It has three expressions:
i = 7
i++
i++
And in each case, the value of i is changed at the conclusion of each expression. (Before any subsequent expressions are evaluated.)
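The comma case is easy to check, because the comma operator sequences its left operand before its right one. A minimal sketch (the function name comma_example is made up for the example):

```cpp
// Returns the final value of i after the comma-separated expression from
// the standard's example. Each write to i completes before the next
// sub-expression starts.
int comma_example()
{
    int i = 0;
    i = 7, i++, i++;
    return i;
}
```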
PS: Consider:
int foo(int theI) { SHOW(theI); SHOW(i); return theI; }
i = 0;
int & j = i;
R = i ^ i++ ^ foo(j);
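In this case, with my compiler, i+=1 is evaluated before foo(j): theI is 1, and R is 0^0^1=1. As before, the standard does not guarantee that order, since foo(j) may just as well be evaluated before i++.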
To build on MarkusQ's answer: ;)
Or rather, Bill's comment to it:
(Edit: Aw, the comment is gone again... Oh well)
They're allowed to be evaluated in parallel. Whether or not it happens in practice is technically speaking irrelevant.
You don't need thread parallelism for this to occur though, just evaluate the first step of both (take the value of i) before the second (increment i). Perfectly legal, and some compilers may consider it more efficient than fully evaluating one i++ before starting on the second.
In fact, I'd expect it to be a common optimization. Look at it from an instruction scheduling point of view. You have the following you need to evaluate:
1. Take the value of i for the right argument
2. Increment i in the right argument
3. Take the value of i for the left argument
4. Increment i in the left argument
But there's really no dependency between the left and the right argument. Argument evaluation happens in an unspecified order, and need not be done sequentially either (which is why new in a function argument can leak memory if another argument throws, even when the result is wrapped in a smart pointer)
It's also undefined what happens when you modify the same variable twice in the same expression.
We do have a dependency between 1 and 2, however, and between 3 and 4.
So why would the compiler wait for 2 to complete before computing 3? That introduces added latency, and it'll take even longer than necessary before 4 becomes available.
Assuming there's a 1 cycle latency between each, it'll take 3 cycles from when 1 is complete until the result of 4 is ready and we can call the function.
But if we reorder them and evaluate in the order 1, 3, 2, 4, we can do it in 2 cycles. 1 and 3 can be started in the same cycle (or even merged into one instruction, since it's the same expression), and in the following, 2 and 4 can be evaluated.
Most modern CPUs can execute several instructions per cycle, and a good compiler should try to exploit that.
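On the parenthetical remark about new in function arguments: a common way around it is to let a factory function do the allocation, so it cannot be interleaved with the evaluation of the other arguments. A sketch assuming C++11; Widget, may_throw, use and safe_call are hypothetical names:

```cpp
#include <memory>
#include <stdexcept>

struct Widget { int id; Widget() : id(42) {} };

int may_throw(bool fail)
{
    if (fail) throw std::runtime_error("failed");
    return 1;
}

int use(std::shared_ptr<Widget> w, int n) { return w->id + n; }

// Risky before C++17: the compiler may evaluate `new Widget`, then may_throw(),
// and only then construct the shared_ptr, leaking the Widget on a throw.
//   use(std::shared_ptr<Widget>(new Widget), may_throw(fail));

// Safer: the allocation and the smart pointer construction happen inside one
// function call, so they cannot be interleaved with may_throw().
int safe_call(bool fail)
{
    return use(std::make_shared<Widget>(), may_throw(fail));
}
```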
Sutter's Guru of the Week #55 (and the corresponding piece in "More Exceptional C++") discusses this exact case as an example.
According to him, it is perfectly valid code, and in fact a case where trying to transform the statement into two lines:
items.erase(i);
i++;
does not produce code that is semantically equivalent to the original statement.
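I am not reproducing Sutter's example here, but the general point can be sketched with a plain int and a function that peeks at the variable while it runs. The names a, a_seen_by_f, single_statement and two_statements are made up for illustration:

```cpp
int a = 0;             // global so that f() can observe it
int a_seen_by_f = -1;  // value of a at the time f() ran

void f(int) { a_seen_by_f = a; }

// Form 1: the increment is complete before the body of f() runs.
int single_statement()
{
    a = 0;
    f(a++);
    return a_seen_by_f;
}

// Form 2: the increment happens only after f() has returned.
int two_statements()
{
    a = 0;
    f(a);
    a++;
    return a_seen_by_f;
}
```

In the first form f() sees a already incremented; in the second it does not. With erase() the difference matters more, because in the two-line form the increment is applied to an iterator that has already been invalidated.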
To build on Bill the Lizard's answer:
int i = 1;
foo(i++, i++);
might also result in a function call of
foo(1, 1);
(meaning that the actuals are evaluated in parallel, and then the postops are applied).
-- MarkusQ
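Since foo(i++, i++) itself is undefined, it cannot be used to demonstrate anything reliably. The interleaving described above can, however, be spelled out by hand in well-defined code. This is only an illustration of one possible outcome, not what any compiler is required to do; reads_then_increments is a hypothetical name:

```cpp
#include <utility>

// Reads i for both arguments first, then applies both increments,
// mimicking the "actuals in parallel, postops afterwards" ordering.
std::pair<int, int> reads_then_increments(int& i)
{
    int left = i;    // take the value of i for the left argument
    int right = i;   // take the value of i for the right argument
    ++i;             // increment belonging to the left argument
    ++i;             // increment belonging to the right argument
    return std::make_pair(left, right);
}
```

Starting from i = 1, this yields the pair (1, 1) and leaves i at 3, which corresponds to the foo(1, 1) case.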