This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Pre & post increment operator behavior in C, C++, Java, & C#
Here is a test case:
void foo(int i, int j)
{
printf("%d %d", i, j);
}
...
test = 0;
foo(test++, test);
I would expect to get a "0 1" output, but I get "0 0"
What gives??
This is an example of unspecified behavior. The standard does not say what order arguments should be evaluated in. This is a compiler implementation decision. The compiler is free to evaluate the arguments to the function in any order.
In this case, it looks like actually processes the arguments right to left instead of the expected left to right.
In general, doing side-effects in arguments is bad programming practice.
Instead of foo(test++, test); you should write foo(test, test+1); test++;
It would be semantically equivalent to what you are trying to accomplish.
Edit:
As Anthony correctly points out, it is undefined to both read and modify a single variable without an intervening sequence point. So in this case, the behavior is indeed undefined. So the compiler is free to generate whatever code it wants.
This is not just unspecified behaviour, it is actually undefined behaviour .
Yes, the order of argument evaluation is unspecified, but it is undefined to both read and modify a single variable without an intervening sequence point unless the read is solely for the purpose of computing the new value. There is no sequence point between the evaluations of function arguments, so f(test,test++) is undefined behaviour: test is being read for one argument and modified for the other. If you move the modification into a function then you're fine:
int preincrement(int* p)
{
return ++(*p);
}
int test;
printf("%d %d\n",preincrement(&test),test);
This is because there is a sequence point on entry and exit to preincrement, so the call must be evaluated either before or after the simple read. Now the order is just unspecified.
Note also that the comma operator provides a sequence point, so
int dummy;
dummy=test++,test;
is fine --- the increment happens before the read, so dummy is set to the new value.
Everything I said originally is WRONG! The point in time at which the side-affect is calculated is unspecified. Visual C++ will perform the increment after the call to foo() if test is a local variable, but if test is declared as static or global it will be incremented before the call to foo() and produce different results, although the final value of test will be correct.
The increment should really be done in a separate statement after the call to foo(). Even if the behaviour was specified in the C/C++ standard it would be confusing. You would think that C++ compilers would flag this as a potential error.
Here is a good description of sequence points and unspecified behaviour.
<----START OF WRONG WRONG WRONG---->
The "++" bit of "test++" gets executed after the call to foo. So you pass in (0,0) to foo, not (1,0)
Here is the assembler output from Visual Studio 2002:
mov ecx, DWORD PTR _i$[ebp]
push ecx
mov edx, DWORD PTR tv66[ebp]
push edx
call _foo
add esp, 8
mov eax, DWORD PTR _i$[ebp]
add eax, 1
mov DWORD PTR _i$[ebp], eax
The increment is done AFTER the call to foo(). While this behavior is by design, it is certainly confusing to the casual reader and should probably be avoided. The increment should really be done in a separate statement after the call to foo()
<----END OF WRONG WRONG WRONG ---->
It's "unspecified behavior", but in practice with the way the C call stack is specified it almost always guarantees that you will see it as 0, 0 and never 1, 0.
As someone noted, the assembler output by VC pushes the right most parameter on the stack first. This is how C function calls are implemented in assembler. This is to accommodate C's "endless parameter list" feature. By pushing parameters in a right-to-left order, the first parameter is guaranteed to be the top item on the stack.
Take printf's signature:
int printf(const char *format, ...);
Those ellipses denote an unknown number of parameters. If parameters were pushed left-to-right, the format would be at the bottom of a stack of which we don't know the size.
Knowing that in C (and C++) that parameters are processed left-to-right, we can determine the simplest way of parsing and interpreting a function call. Get to the end of the parameter list, and start pushing, evaluating any complex statements as you go.
However, even this can't save you as most C compilers have an option to parse functions "Pascal style". And all this means is that the function parameters are pushed on the stack in a left-to-right fashion. If, for instance, printf was compiled with the Pascal option, then the output would most likely be 1, 0 (however, since printf uses the ellipse, I don't think it can be compiled Pascal style).
C doesn't guarantee the order of evaluation of parameters in a function call, so with this you might get the results "0 1" or "0 0". The order can change from compiler to compiler, and the same compiler could choose different orders based on optimization parameters.
It's safer to write foo(test, test + 1) and then do ++test in the next line. Anyway, the compiler should optimize it if possible.
The order of evaluation for arguments to a function is undefined. In this case it appears that it did them right-to-left.
(Modifying variables between sequence points basically allows a compiler to do anything it wants.)
Um, now that the OP has been edited for consistency, it is out of sync with the answers. The fundamental answer about order of evaluation is correct. However the specific possible values are different for the foo(++test, test); case.
++test will be incremented before being passed, so the first argument will always be 1. The second argument will be 0, or 1 depending on evaluation order.
According to the C standard, it is undefined behaviour to have more than one references to a variable in a single sequence point (here you can think of that as being a statement, or parameters to a function) where one of more of those references includes a pre/post modification.
So:
foo(f++,f) <--undefined as to when f increments.
And likewise (I see this all the time in user code):
*p = p++ + p;
Typically a compiler will not change its behaviour for this type of thing (except for major revisions).
Avoid it by turning on warnings and paying attention to them.
To repeat what others have said, this is not unspecified behavior, but rather undefined. This program can legally output anything or nothing, leave n at any value, or send insulting email to your boss.
As a matter of practice, compiler writers will usually just do what's easiest for them to write, which generally means that the program will fetch n once or twice, call the function, and increment sometime. This, like any other conceivable behavior, is just fine according to the standard. There is no reason to expect the same behavior between compilers, or versions, or with different compiler options. There is no reason why two different but similar-looking examples in the same program have to be compiled consistently, although that's the way I'd bet.
In short, don't do this. Test it under different circumstances if you're curious, but don't pretend that there is a single correct or even predictable result.
The compiler might not be evaluating the arguments in the order you'd expect.
Related
Okay, I'm aware that the standard dictates that a C++ implementation may choose in which order arguments of a function are evaluated, but are there any implementations that actually 'take advantage' of this in a scenario where it would actually affect the program?
Classic Example:
int i = 0;
foo(i++, i++);
Note: I'm not looking for someone to tell me that the order of evaluation can't be relied on, I'm well aware of that. I'm only interested in whether any compilers actually do evaluate out of a left-to-right order because my guess would be that if they did lots of poorly written code would break (rightly so, but they would still probably complain).
It depends on the argument type, the called function's calling convention, the archtecture and the compiler. On an x86, the Pascal calling convention evaluates arguments left to right whereas in the C calling convention (__cdecl) it is right to left. Most programs which run on multiple platforms do take into account the calling conventions to skip surprises.
There is a nice article on Raymond Chen' blog if you are interested. You may also want to take a look at the Stack and Calling section of the GCC manual.
Edit: So long as we are splitting hairs: My answer treats this not as a language question but as a platform one. The language standard does not gurantee or prefer one over the other and leaves it as unspecified. Note the wording. It does not say this is undefined. Unspecified in this sense means something you cannot count on, non-portable behavior. I don't have the C spec/draft handy but it should be similar to that from my n2798 draft (C++)
Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, order of evaluation of arguments to a function). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine. An instance of the abstract machine can thus have more than one possible execution sequence for a given program and a given input.
I found answer in c++ standards.
Paragraph 5.2.2.8:
The order of evaluation of arguments is unspecified. All side effects of argument expression evaluations take effect before the function is entered. The order of evaluation of the postfix expression and the argument expression list is
unspecified.
In other words, It depends on compiler only.
Read this
It's not an exact copy of your question, but my answer (and a few others) cover your question as well.
There are very good optimization reasons why the compiler might not just choose right-to-left but also interleave them.
The standard doesn't even guarantee a sequential ordering. It only guarantees that when the function gets called, all arguments have been fully evaluated.
And yes, I have seen a few versions of GCC do exactly this. For your example, foo(0,0) would be called, and i would be 2 afterwards. (I can't give you the exact version number of the compiler. It was a while ago - but I wouldn't be surprised to see this behavior pop up again. It's an efficient way to schedule instructions)
All arguments are evaluated. Order not defined (as per standard). But all implementations of C/C++ (that I know of) evaluate function arguments from right to left. EDIT: CLang is an exception (see comment below).
I believe that the right-to-left evaluation order has been very very old (since the first C compilers). Certainly way before C++ was invented, and most implementations of C++ would be keeping the same evaluation order because early C++ implementations simply translated into C.
There are some technical reasons for evaluating function arguments right-to-left. In stack architectures, arguments are typically pushed onto the stack. In C/C++, you can call a function with more arguments than actually specified -- the extra arguments are simiply ignored. If arguments are evaluated left-to-right, and pushed left-to-right, then the stack slot right under the stack pointer will hold the last argument, and there is no way for the function to get at the offset of any particular argument (because the actual number of arguments pushed depends on the caller).
In a right-to-left push order, the stack slot right under the stack pointer will always hold the first argument, and the next slot holds the second argument etc. Argument offsets will always be deterministic for the function (which may be written and compiled elsewhere into a library, separately from where it is called).
Now, right-to-left push order does not mandate right-to-left evaluation order, but in early compilers, memory is scarce. In right-to-left evaluation order, the same stack can be used in-place (essentially, after evaluating the argument -- which may be an expression or a funciton call! -- the return value is already at the right position on the stack). In left-to-right evaluation, the argument values must be stored separately and the pushed back to the stack in reverse order.
Last time I saw differences was between VS2005 and GCC 3.x on an x86 hardware in 2007.
So it's (was?) a very likely situation. So I never rely on evaluation order anymore. Maybe it's better now.
I expect that most modern compilers would attempt to interleave the instructions computing the arguments, given that they are required by the C++ standard to be independent and thus lack any interdependencies. Doing this should help to keep a deeply-pipelined CPU's execution units full and thereby increase throughput. (At least I would expect that a compiler that claims to be an optimising compiler would do so when optimisation flags are given.)
I was reading about order of evaluation violations, and they give an example that puzzles me.
1) If a side effect on a scalar object is un-sequenced relative to another side effect on the same scalar object, the behavior is undefined.
// snip
f(i = -1, i = -1); // undefined behavior
In this context, i is a scalar object, which apparently means
Arithmetic types (3.9.1), enumeration types, pointer types, pointer to member types (3.9.2), std::nullptr_t, and cv-qualified versions of these types (3.9.3) are collectively called scalar types.
I don’t see how the statement is ambiguous in that case. It seems to me that regardless of if the first or second argument is evaluated first, i ends up as -1, and both arguments are also -1.
Can someone please clarify?
UPDATE
I really appreciate all the discussion. So far, I like #harmic’s answer a lot since it exposes the pitfalls and intricacies of defining this statement in spite of how straight forward it looks at first glance. #acheong87 points out some issues that come up when using references, but I think that's orthogonal to the unsequenced side effects aspect of this question.
SUMMARY
Since this question got a ton of attention, I will summarize the main points/answers. First, allow me a small digression to point out that "why" can have closely related yet subtly different meanings, namely "for what cause", "for what reason", and "for what purpose". I will group the answers by which of those meanings of "why" they addressed.
for what cause
The main answer here comes from Paul Draper, with Martin J contributing a similar but not as extensive answer. Paul Draper's answer boils down to
It is undefined behavior because it is not defined what the behavior is.
The answer is overall very good in terms of explaining what the C++ standard says. It also addresses some related cases of UB such as f(++i, ++i); and f(i=1, i=-1);. In the first of the related cases, it's not clear if the first argument should be i+1 and the second i+2 or vice versa; in the second, it's not clear if i should be 1 or -1 after the function call. Both of these cases are UB because they fall under the following rule:
If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.
Therefore, f(i=-1, i=-1) is also UB since it falls under the same rule, despite that the intention of the programmer is (IMHO) obvious and unambiguous.
Paul Draper also makes it explicit in his conclusion that
Could it have been defined behavior? Yes. Was it defined? No.
which brings us to the question of "for what reason/purpose was f(i=-1, i=-1) left as undefined behavior?"
for what reason / purpose
Although there are some oversights (maybe careless) in the C++ standard, many omissions are well-reasoned and serve a specific purpose. Although I am aware that the purpose is often either "make the compiler-writer's job easier", or "faster code", I was mainly interested to know if there is a good reason leave f(i=-1, i=-1) as UB.
harmic and supercat provide the main answers that provide a reason for the UB. Harmic points out that an optimizing compiler that might break up the ostensibly atomic assignment operations into multiple machine instructions, and that it might further interleave those instructions for optimal speed. This could lead to some very surprising results: i ends up as -2 in his scenario! Thus, harmic demonstrates how assigning the same value to a variable more than once can have ill effects if the operations are unsequenced.
supercat provides a related exposition of the pitfalls of trying to get f(i=-1, i=-1) to do what it looks like it ought to do. He points out that on some architectures, there are hard restrictions against multiple simultaneous writes to the same memory address. A compiler could have a hard time catching this if we were dealing with something less trivial than f(i=-1, i=-1).
davidf also provides an example of interleaving instructions very similar to harmic's.
Although each of harmic's, supercat's and davidf' examples are somewhat contrived, taken together they still serve to provide a tangible reason why f(i=-1, i=-1) should be undefined behavior.
I accepted harmic's answer because it did the best job of addressing all meanings of why, even though Paul Draper's answer addressed the "for what cause" portion better.
other answers
JohnB points out that if we consider overloaded assignment operators (instead of just plain scalars), then we can run into trouble as well.
Since the operations are unsequenced, there is nothing to say that the instructions performing the assignment cannot be interleaved. It might be optimal to do so, depending on CPU architecture. The referenced page states this:
If A is not sequenced before B and B is not sequenced before A, then
two possibilities exist:
evaluations of A and B are unsequenced: they may be performed in any order and may overlap (within a single thread of execution, the
compiler may interleave the CPU instructions that comprise A and B)
evaluations of A and B are indeterminately-sequenced: they may be performed in any order but may not overlap: either A will be complete
before B, or B will be complete before A. The order may be the
opposite the next time the same expression is evaluated.
That by itself doesn't seem like it would cause a problem - assuming that the operation being performed is storing the value -1 into a memory location. But there is also nothing to say that the compiler cannot optimize that into a separate set of instructions that has the same effect, but which could fail if the operation was interleaved with another operation on the same memory location.
For example, imagine that it was more efficient to zero the memory, then decrement it, compared with loading the value -1 in. Then this:
f(i=-1, i=-1)
might become:
clear i
clear i
decr i
decr i
Now i is -2.
It is probably a bogus example, but it is possible.
First, "scalar object" means a type like a int, float, or a pointer (see What is a scalar Object in C++?).
Second, it may seem more obvious that
f(++i, ++i);
would have undefined behavior. But
f(i = -1, i = -1);
is less obvious.
A slightly different example:
int i;
f(i = 1, i = -1);
std::cout << i << "\n";
What assignment happened "last", i = 1, or i = -1? It's not defined in the standard. Really, that means i could be 5 (see harmic's answer for a completely plausible explanation for how this chould be the case). Or you program could segfault. Or reformat your hard drive.
But now you ask: "What about my example? I used the same value (-1) for both assignments. What could possibly be unclear about that?"
You are correct...except in the way the C++ standards committee described this.
If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.
They could have made a special exception for your special case, but they didn't. (And why should they? What use would that ever possibly have?) So, i could still be 5. Or your hard drive could be empty. Thus the answer to your question is:
It is undefined behavior because it is not defined what the behavior is.
(This deserves emphasis because many programmers think "undefined" means "random", or "unpredictable". It doesn't; it means not defined by the standard. The behavior could be 100% consistent, and still be undefined.)
Could it have been defined behavior? Yes. Was it defined? No. Hence, it is "undefined".
That said, "undefined" doesn't mean that a compiler will format your hard drive...it means that it could and it would still be a standards-compliant compiler. Realistically, I'm sure g++, Clang, and MSVC will all do what you expected. They just wouldn't "have to".
A different question might be Why did the C++ standards committee choose to make this side-effect unsequenced?. That answer will involve history and opinions of the committee. Or What is good about having this side-effect unsequenced in C++?, which permits any justification, whether or not it was the actual reasoning of the standards committee. You could ask those questions here, or at programmers.stackexchange.com.
A practical reason to not make an exception from the rules just because the two values are the same:
// config.h
#define VALUEA 1
// defaults.h
#define VALUEB 1
// prog.cpp
f(i = VALUEA, i = VALUEB);
Consider the case this was allowed.
Now, some months later, the need arises to change
#define VALUEB 2
Seemingly harmless, isn't it? And yet suddenly prog.cpp wouldn't compile anymore.
Yet, we feel that compilation should not depend on the value of a literal.
Bottom line: there is no exception to the rule because it would make successful compilation depend on the value (rather the type) of a constant.
EDIT
#HeartWare pointed out that constant expressions of the form A DIV B are not allowed in some languages, when B is 0, and cause compilation to fail. Hence changing of a constant could cause compilation errors in some other place. Which is, IMHO, unfortunate. But it is certainly good to restrict such things to the unavoidable.
The confusion is that storing a constant value into a local variable is not one atomic instruction on every architecture the C is designed to be run on. The processor the code runs on matters more than the compiler in this case. For example, on ARM where each instruction can not carry a complete 32 bits constant, storing an int in a variable needs more that one instruction. Example with this pseudo code where you can only store 8 bits at a time and must work in a 32 bits register, i is a int32:
reg = 0xFF; // first instruction
reg |= 0xFF00; // second
reg |= 0xFF0000; // third
reg |= 0xFF000000; // fourth
i = reg; // last
You can imagine that if the compiler wants to optimize it may interleave the same sequence twice, and you don't know what value will get written to i; and let's say that he is not very smart:
reg = 0xFF;
reg |= 0xFF00;
reg |= 0xFF0000;
reg = 0xFF;
reg |= 0xFF000000;
i = reg; // writes 0xFF0000FF == -16776961
reg |= 0xFF00;
reg |= 0xFF0000;
reg |= 0xFF000000;
i = reg; // writes 0xFFFFFFFF == -1
However in my tests gcc is kind enough to recognize that the same value is used twice and generates it once and does nothing weird. I get -1, -1
But my example is still valid as it is important to consider that even a constant may not be as obvious as it seems to be.
Behavior is commonly specified as undefined if there is some conceivable reason why a compiler which was trying to be "helpful" might do something which would cause totally unexpected behavior.
In the case where a variable is written multiple times with nothing to ensure that the writes happen at distinct times, some kinds of hardware might allow multiple "store" operations to be performed simultaneously to different addresses using a dual-port memory. However, some dual-port memories expressly forbid the scenario where two stores hit the same address simultaneously, regardless of whether or not the values written match. If a compiler for such a machine notices two unsequenced attempts to write the same variable, it might either refuse to compile or ensure that the two writes cannot get scheduled simultaneously. But if one or both of the accesses is via a pointer or reference, the compiler might not always be able to tell whether both writes might hit the same storage location. In that case, it might schedule the writes simultaneously, causing a hardware trap on the access attempt.
Of course, the fact that someone might implement a C compiler on such a platform does not suggest that such behavior shouldn't be defined on hardware platforms when using stores of types small enough to be processed atomically. Trying to store two different values in unsequenced fashion could cause weirdness if a compiler isn't aware of it; for example, given:
uint8_t v; // Global
void hey(uint8_t *p)
{
moo(v=5, (*p)=6);
zoo(v);
zoo(v);
}
if the compiler in-lines the call to "moo" and can tell it doesn't modify
"v", it might store a 5 to v, then store a 6 to *p, then pass 5 to "zoo",
and then pass the contents of v to "zoo". If "zoo" doesn't modify "v",
there should be no way the two calls should be passed different values,
but that could easily happen anyway. On the other hand, in cases where
both stores would write the same value, such weirdness could not occur and
there would on most platforms be no sensible reason for an implementation
to do anything weird. Unfortunately, some compiler writers don't need any
excuse for silly behaviors beyond "because the Standard allows it", so even
those cases aren't safe.
C++17 defines stricter evaluation rules. In particular, it sequences function arguments (although in unspecified order).
N5659 §4.6:15
Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A,
but it is unspecified which. [ Note: Indeterminately sequenced evaluations cannot overlap, but either could
be executed first. —end note ]
N5659 § 8.2.2:5
The
initialization of a parameter, including every associated value computation and side effect, is indeterminately
sequenced with respect to that of any other parameter.
It allows some cases which would be UB before:
f(i = -1, i = -1); // value of i is -1
f(i = -1, i = -2); // value of i is either -1 or -2, but not specified which one
The fact that the result would be the same in most implementations in this case is incidental; the order of evaluation is still undefined. Consider f(i = -1, i = -2): here, order matters. The only reason it doesn't matter in your example is the accident that both values are -1.
Given that the expression is specified as one with an undefined behaviour, a maliciously compliant compiler might display an inappropriate image when you evaluate f(i = -1, i = -1) and abort the execution - and still be considered completely correct. Luckily, no compilers I am aware of do so.
It looks to me like the only rule pertaining to sequencing of function argument expression is here:
3) When calling a function (whether or not the function is inline, and whether or not explicit function call syntax is used), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function.
This does not define sequencing between argument expressions, so we end up in this case:
1) If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.
In practice, on most compilers, the example you quoted will run fine (as opposed to "erasing your hard disk" and other theoretical undefined behavior consequences).
It is, however, a liability, as it depends on specific compiler behaviour, even if the two assigned values are the same. Also, obviously, if you tried to assign different values, the results would be "truly" undefined:
void f(int l, int r) {
return l < -1;
}
auto b = f(i = -1, i = -2);
if (b) {
formatDisk();
}
The assignment operator could be overloaded, in which case the order could matter:
struct A {
bool first;
A () : first (false) {
}
const A & operator = (int i) {
first = !first;
return * this;
}
};
void f (A a1, A a2) {
// ...
}
// ...
A i;
f (i = -1, i = -1); // the argument evaluated first has ax.first == true
Actually, there's a reason not to depend on the fact that compiler will check that i is assigned with the same value twice, so that it's possible to replace it with single assignment. What if we have some expressions?
void g(int a, int b, int c, int n) {
int i;
// hey, compiler has to prove Fermat's theorem now!
f(i = 1, i = (ipow(a, n) + ipow(b, n) == ipow(c, n)));
}
This is just answering the "I'm not sure what "scalar object" could mean besides something like an int or a float".
I would interpret the "scalar object" as a abbreviation of "scalar type object", or just "scalar type variable". Then, pointer, enum (constant) are of scalar type.
This is a MSDN article of Scalar Types.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Undefined Behavior and Sequence Points
How the statement x=x++ + y++; executes to the value 3?
I was wondering how printf work in a case like this:
int i = 0;
printf("%4d%4d", i++, i);
Result is 0 1
in another case
int i = 0;
printf("%4d%4d", i, i++);
Result is 1 0
This has nothing to do with printf, and everything to do with the order in which the parameters are evaluated and the way the compiler executes your code. The behavior is undefined, and the results will depend on your compiler, calling convention, and phase of the moon.
In both your examples, the rules of pre/post incrementing are taking precedence. Your particular compiler understands that it must use the value of i before evaluating the increment, and is giving precedence to the parameter that invokes a function call over the one that doesn't. Your second usage of the variable i is causing the compiler to insert an intermediary statement in the process of calling printf,
It's important to note that i++ doesn't mean (as is commonly taught) "increment i after executing this line", it just means "increment i at some point after giving me its value, and before executing the next line". That's a lot of wiggle room for the compiler to do what is formally called "undefined behavior."
As #Als points out in a comment, you've managed to combine both undefined and unspecified behaviors in one line of code.
This is not due to printf it's due to you being in a case of undefined behaviour
Consider the following C++ Standard ISO/IEC 14882:2003(E) citation (section 5, paragraph 4):
Except where noted, the order of
evaluation of operands of individual
operators and subexpressions of individual
expressions, and the order in
which side effects take place, is
unspecified. 53) Between the previous
and next sequence point a scalar
object shall have its stored value
modified at most once by the
evaluation of an expression.
Furthermore, the prior value shall be
accessed only to determine the value
to be stored. The requirements of this
paragraph shall be met for each
allowable ordering of the
subexpressions of a full expression;
otherwise the behavior is undefined.
[Example:
i = v[i++]; // the behavior is unspecified
i = 7, i++, i++; // i becomes 9
i = ++i + 1; // the behavior is unspecified
i = i + 1; // the value of i is incremented
—end example]
I was surprised that i = ++i + 1 gives an undefined value of i.
Does anybody know of a compiler implementation which does not give 2 for the following case?
int i = 0;
i = ++i + 1;
std::cout << i << std::endl;
The thing is that operator= has two args. First one is always i reference.
The order of evaluation does not matter in this case.
I do not see any problem except C++ Standard taboo.
Please, do not consider such cases where the order of arguments is important to evaluation. For example, ++i + i is obviously undefined. Please, consider only my case
i = ++i + 1.
Why does the C++ Standard prohibit such expressions?
You make the mistake of thinking of operator= as a two-argument function, where the side effects of the arguments must be completely evaluated before the function begins. If that were the case, then the expression i = ++i + 1 would have multiple sequence points, and ++i would be fully evaluated before the assignment began. That's not the case, though. What's being evaluated in the intrinsic assignment operator, not a user-defined operator. There's only one sequence point in that expression.
The result of ++i is evaluated before the assignment (and before the addition operator), but the side effect is not necessarily applied right away. The result of ++i + 1 is always the same as i + 2, so that's the value that gets assigned to i as part of the assignment operator. The result of ++i is always i + 1, so that's what gets assigned to i as part of the increment operator. There is no sequence point to control which value should get assigned first.
Since the code is violating the rule that "between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression," the behavior is undefined. Practically, though, it's likely that either i + 1 or i + 2 will be assigned first, then the other value will be assigned, and finally the program will continue running as usual — no nasal demons or exploding toilets, and no i + 3, either.
It's undefined behaviour, not (just) unspecified behaviour because there are two writes to i without an intervening sequence point. It is this way by definition as far as the standard specifies.
The standard allows compilers to generate code that delays writes back to storage - or from another view point, to resequence the instructions implementing side effects - any way it chooses so long as it complies with the requirements of sequence points.
The issue with this statement expression is that it implies two writes to i without an intervening sequence point:
i = i++ + 1;
One write is for the value of the original value of i "plus one" and the other is for that value "plus one" again. These writes could happen in any order or blow up completely as far as the standard allows. Theoretically this even gives implementations the freedom to perform writebacks in parallel without bothering to check for simultaneous access errors.
C/C++ defines a concept called sequence points, which refer to a point in execution where it's guaranteed that all effects of previous evaluations will have already been performed. Saying i = ++i + 1 is undefined because it increments i and also assigns i to itself, neither of which is a defined sequence point alone. Therefore, it is unspecified which will happen first.
Update for C++11 (09/30/2011)
Stop, this is well defined in C++11. It was undefined only in C++03, but C++11 is more flexible.
int i = 0;
i = ++i + 1;
After that line, i will be 2. The reason for this change was ... because it already works in practice and it would have been more work to make it be undefined than to just leave it defined in the rules of C++11 (actually, that this works now is more of an accident than a deliberate change, so please don't do it in your code!).
Straight from the horse's mouth
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#637
Given two choices: defined or undefined, which choice would you have made?
The authors of the standard had two choices: define the behavior or specify it as undefined.
Given the clearly unwise nature of writing such code in the first place, it doesn't make any sense to specify a result for it. One would want to discourage code like that and not encourage it. It's not useful or necessary for anything.
Furthermore, standards committees do not have any way to force compiler writers to do anything. Had they required a specific behavior it is likely that the requirement would have been ignored.
There are practical reasons as well, but I suspect they were subordinate to the above general consideration. But for the record, any sort of required behavior for this kind of expression and related kinds will restrict the compiler's ability to generate code, to factor out common subexpressions, to move objects between registers and memory, etc. C was already handicapped by weak visibility restrictions. Languages like Fortran long ago realized that aliased parameters and globals were an optimization-killer and I believe they simply prohibited them.
I know you were interested in a specific expression, but the exact nature of any given construct doesn't matter very much. It's not going to be easy to predict what a complex code generator will do and the language attempts to not require those predictions in silly cases.
The important part of the standard is:
its stored value modified at most once by the evaluation of an expression
You modify the value twice, once with the ++ operator, once with the assignment
Please note that your copy of the standard is outdated and contains a known (and fixed) error just in 1st and 3rd code lines of your example, see:
C++ Standard Core Language Issue Table of Contents, Revision 67, #351
and
Andrew Koenig: Sequence point error: unspecified or undefined?
The topic is not easy to get just reading the standard (which is pretty obscure :( in this case).
For example, will it be well(or not)-defined, unspecified or else in general case actually depends not only on the statement structure, but also on memory contents (to be specific, variable values) at the moment of execution, another example:
++i, ++i; //ok
(++i, ++j) + (++i, ++j); //ub, see the first reference below (12.1 - 12.3)
Please have a look at (it has it all clear and precise):
JTC1/SC22/WG14 N926 "Sequence Point Analysis"
Also, Angelika Langer has an article on the topic (though not as clear as the previous one):
"Sequence Points and Expression Evaluation in C++"
There was also a discussion in Russian (though with some apparently erroneous statements in the comments and in the post itself):
"Точки следования (sequence points)"
The following code demonstrates how you could get the wrong(unexpected) result:
int main()
{
int i = 0;
__asm { // here standard conformant implementation of i = ++i + 1
mov eax, i;
inc eax;
mov ecx, 1;
add ecx, eax;
mov i, ecx;
mov i, eax; // delayed write
};
cout << i << endl;
}
It will print 1 as a result.
Assuming you are asking "Why is the language designed this way?".
You say that i = ++i + i is "obviously undefined" but i = ++i + 1 should leave i with a defined value? Frankly, that would not be very consistent. I prefer to have either everything perfectly defined, or everything consistently unspecified. In C++ I have the latter. It's not a terribly bad choice per se - for one thing, it prevents you from writing evil code which makes five or six modifications in the same "statement".
Argument by analogy: If you think of operators as types of functions, then it kind of makes sense. If you had a class with an overloaded operator=, your assignment statement would be equivalent to something like this:
operator=(i, ++i+1)
(The first parameter is actually passed in implicitly via the this pointer, but this is just for illustration.)
For a plain function call, this is obviously undefined. The value of the first argument depends on when the second argument is evaluated. However with primitive types you get away with it because the original value of i is simply overwritten; its value doesn't matter. But if you were doing some other magic in your own operator=, then the difference could surface.
Simply put: all operators act like functions, and should therefore behave according to the same notions. If i + ++i is undefined, then i = ++i should be undefined as well.
How about, we just all agree to never, never, write code like this? If the compiler doesn't know what you want to do, how do you expect the poor sap that is following on behind you to understand what you wanted to do? Putting i++; on it's own line will not kill you.
The underlying reason is because of the way the compiler handles reading and writing of values. The compiler is allowed to store an intermediate value in memory and only actually commit the value at the end of the expression. We read the expression ++i as "increase i by one and return it", but a compiler might see it as "load the value of i, add one, return it, and the commit it back to memory before someone uses it again. The compiler is encouraged to avoid reading/writing to the actual memory location as much as possible, because that would slow the program down.
In the specific case of i = ++i + 1, it suffers largely due to the need of consistent behavioral rules. Many compilers will do the 'right thing' in such a situation, but what if one of the is was actually a pointer, pointing to i? Without this rule, the compiler would have to be very careful to make sure it performed the loads and stores in the right order. This rule serves to allow for more optimization opportunities.
A similar case is that of the so-called strict-aliasing rule. You can't assign a value (say, an int) through a value of an unrelated type (say, a float) with only a few exceptions. This keeps the compiler from having to worry that some float * being used will change the value of an int, and greatly improves optimization potential.
The problem here is that the standard allows a compiler to completely reorder a statement while it is executing. It is not, however, allowed to reorder statements (so long as any such reordering results in changed program behavior). Therefore, the expression i = ++i + 1; may be evaluated two ways:
++i; // i = 2
i = i + 1;
or
i = i + 1; // i = 2
++i;
or
i = i + 1; ++i; //(Running in parallel using, say, an SSE instruction) i = 1
This gets even worse when you have user defined types thrown in the mix, where the ++ operator can have whatever effect on the type the author of the type wants, in which case the order used in evaluation matters significantly.
i = v[i++]; // the behavior is unspecified
i = ++i + 1; // the behavior is unspecified
All the above expressions invoke Undefined Behavior.
i = 7, i++, i++; // i becomes 9
This is fine.
Read Steve Summit's C-FAQs.
From ++i, i must assigned "1", but with i = ++i + 1, it must be assigned the value "2". Since there is no intervening sequence point, the compiler can assume that the same variable is not being written twice, so this two operations can be done in any order. so yes, the compiler would be correct if the final value is 1.
Okay, I'm aware that the standard dictates that a C++ implementation may choose in which order arguments of a function are evaluated, but are there any implementations that actually 'take advantage' of this in a scenario where it would actually affect the program?
Classic Example:
int i = 0;
foo(i++, i++);
Note: I'm not looking for someone to tell me that the order of evaluation can't be relied on, I'm well aware of that. I'm only interested in whether any compilers actually do evaluate out of a left-to-right order because my guess would be that if they did lots of poorly written code would break (rightly so, but they would still probably complain).
It depends on the argument type, the called function's calling convention, the archtecture and the compiler. On an x86, the Pascal calling convention evaluates arguments left to right whereas in the C calling convention (__cdecl) it is right to left. Most programs which run on multiple platforms do take into account the calling conventions to skip surprises.
There is a nice article on Raymond Chen' blog if you are interested. You may also want to take a look at the Stack and Calling section of the GCC manual.
Edit: So long as we are splitting hairs: My answer treats this not as a language question but as a platform one. The language standard does not gurantee or prefer one over the other and leaves it as unspecified. Note the wording. It does not say this is undefined. Unspecified in this sense means something you cannot count on, non-portable behavior. I don't have the C spec/draft handy but it should be similar to that from my n2798 draft (C++)
Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, order of evaluation of arguments to a function). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine. An instance of the abstract machine can thus have more than one possible execution sequence for a given program and a given input.
I found answer in c++ standards.
Paragraph 5.2.2.8:
The order of evaluation of arguments is unspecified. All side effects of argument expression evaluations take effect before the function is entered. The order of evaluation of the postfix expression and the argument expression list is
unspecified.
In other words, It depends on compiler only.
Read this
It's not an exact copy of your question, but my answer (and a few others) cover your question as well.
There are very good optimization reasons why the compiler might not just choose right-to-left but also interleave them.
The standard doesn't even guarantee a sequential ordering. It only guarantees that when the function gets called, all arguments have been fully evaluated.
And yes, I have seen a few versions of GCC do exactly this. For your example, foo(0,0) would be called, and i would be 2 afterwards. (I can't give you the exact version number of the compiler. It was a while ago - but I wouldn't be surprised to see this behavior pop up again. It's an efficient way to schedule instructions)
All arguments are evaluated. Order not defined (as per standard). But all implementations of C/C++ (that I know of) evaluate function arguments from right to left. EDIT: CLang is an exception (see comment below).
I believe that the right-to-left evaluation order has been very very old (since the first C compilers). Certainly way before C++ was invented, and most implementations of C++ would be keeping the same evaluation order because early C++ implementations simply translated into C.
There are some technical reasons for evaluating function arguments right-to-left. In stack architectures, arguments are typically pushed onto the stack. In C/C++, you can call a function with more arguments than actually specified -- the extra arguments are simiply ignored. If arguments are evaluated left-to-right, and pushed left-to-right, then the stack slot right under the stack pointer will hold the last argument, and there is no way for the function to get at the offset of any particular argument (because the actual number of arguments pushed depends on the caller).
In a right-to-left push order, the stack slot right under the stack pointer will always hold the first argument, and the next slot holds the second argument etc. Argument offsets will always be deterministic for the function (which may be written and compiled elsewhere into a library, separately from where it is called).
Now, right-to-left push order does not mandate right-to-left evaluation order, but in early compilers, memory is scarce. In right-to-left evaluation order, the same stack can be used in-place (essentially, after evaluating the argument -- which may be an expression or a funciton call! -- the return value is already at the right position on the stack). In left-to-right evaluation, the argument values must be stored separately and the pushed back to the stack in reverse order.
Last time I saw differences was between VS2005 and GCC 3.x on an x86 hardware in 2007.
So it's (was?) a very likely situation. So I never rely on evaluation order anymore. Maybe it's better now.
I expect that most modern compilers would attempt to interleave the instructions computing the arguments, given that they are required by the C++ standard to be independent and thus lack any interdependencies. Doing this should help to keep a deeply-pipelined CPU's execution units full and thereby increase throughput. (At least I would expect that a compiler that claims to be an optimising compiler would do so when optimisation flags are given.)