I -believe- that what I'm trying to do is probably valid because it is separated in both instances by a comma (not a typical assignment), but I have no idea for sure and search isn't bringing up anything about these two specific situations.
In both cases, I was using the variable as an index for two parallel arrays.
int a[3] = {10, 20, 30};
int b[3] = {20, 40, 60};
Situation #1: Initializing a struct for the arrays
struct testStruct {
int t1;
int t2;
};
int i = 0;
testStruct test = {a[++i], b[i]}
Expected result of final line: test = {20, 40}
Situation #2: Passing specific values from the arrays as function args
void testFunc(int t1, int t2) {
// do stuff
}
int i = 0;
test(a[++i], b[i]);
Expected result of final line: test(20, 40)
Is this valid code? And if it is, is it valid in all compilers?
Is the result what I expect? If so, is it because of the arrays or because of the comma?
Thanks!
I would advice against using such "tricks" in your code in the long term this is maintenance nightmare and is hard to reason about. There are almost always alternatives, for example this code:
testStruct test = {a[++i], b[i]}
could be changed to:
++i ;
testStruct test = {a[i], b[i]}
So having said that, neither case uses the comma operator in both functions calls and intialization lists the comma is a grammar elements and nothing else.
Your first situation is well defined although there is some caveats depending on whether this is C++11 or pre C++11.
In both cases there is a sequence point after each comma, although pre C++11 the order of evaluation is not specified. So we can see this for the pre C++11 case by going to defect report 430 which says:
A recent GCC bug report (
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11633) asks about the
validity of
int count = 23; int foo[] = { count++, count++, count++ };
is this undefined or unspecified or something else?
and the answer is (emphasis mine going forward):
I believe the standard is clear that each initializer expression in
the above is a full-expression (1.9 [intro.execution]/12-13; see also
issue 392) and therefore there is a sequence point after each
expression (1.9 [intro.execution]/16). I agree that the standard does
not seem to dictate the order in which the expressions are evaluated,
and perhaps it should. Does anyone know of a compiler that would not
evaluate the expressions left to right?
In C++11 it is baked in the draft C++11 standard in section 8.4.5 paragraph which says:
Within the initializer-list of a braced-init-list, the
initializer-clauses, including any that result from pack expansions
(14.5.3), are evaluated in the order in which they appear. That is,
every value computation and side effect associated with a given
initializer-clause is sequenced before every value computation and
side effect associated with any initializer-clause that follows it in
the comma-separated list of the initializer-list.
I am sticking with C++11 going forward since it does not change the answer for the rest of the content, although the wording on sequencing does vary the conclusion is the same.
The second situation invokes undefined behavior since the order of evaluation of arguments to a function are unspecified and their evaluation is indeterminately sequenced with respect to one another. We can see this undefined behavior from section 1.9 paragraph 15 which says:
Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced. [
Note: In an expression that is evaluated more than once during the
execution of a program, unsequenced and indeterminately sequenced
evaluations of its subexpressions need not be performed consistently
in different evaluations. —end note ] The value computations of the
operands of an operator are sequenced before the value computation of
the result of the operator. If a side effect on a scalar object is
unsequenced relative to either another side effect on the same scalar
object or a value computation using the value of the same scalar
object, the behavior is undefined.
and provides the following example:
f(i = -1, i = -1); // the behavior is undefined
The first code snippet
testStruct test = {a[++i], b[i]};
is valid (if to add a semicolon), because according to the C++ Standard
4 Within the initializer-list of a braced-init-list, the
initializer-clauses, including any that result from pack expansions
(14.5.3), are evaluated in the order in which they appear. That is,
every value computation and side effect associated with a given
initializer-clause is sequenced before every value computation and
side effect associated with any initializer-clause that follows it in
the comma-separated list of the initializer-list.
Take into account that the compiler GCC 4.8.1 has (or at least had) a bug that I described
here
Though the description is written in Russian you could it translate using for example service of goofle "translate"
However the second code snippet
test(a[++i], b[i]);
has undefined behaviour because the order of evaluation of function arguments is unspecified.
You could write instead
test( { a[++i], b[i] } );
In this case the expression would be well-formed.
Related
map<int, int> mp;
printf("%d ", mp.size());
mp[10]=mp.size();
printf("%d\n", mp[10]);
This code yields an answer that is not very intuitive:
0 1
I understand why it happens - the left side of the assignment returns reference to mp[10]'s underlying value and at the same time creates aforementioned value, and only then is the right side evaluated, using the newly computed size() of the map.
Is this behaviour stated anywhere in C++ standard? Or is the order of evaluation undefined?
Result was obtained using g++ 5.2.1.
Yes, this is covered by the standard and it is unspecified behavior. This particular case is covered in a recent C++ standards proposal: N4228: Refining Expression Evaluation Order for Idiomatic C++ which seeks to refine the order of evaluation rules to make it well specified for certain cases.
It describes this problem as follows:
Expression evaluation order is a recurring discussion topic in the C++
community. In a nutshell, given an expression such as f(a, b,
c), the order in which the sub-expressions f, a, b, c are evaluated is left unspecified by the standard. If any two of these sub-expressions happen to modify the same object without intervening sequence points, the behavior of the program is undefined. For instance, the expression f(i++, i) where i is an
integer variable leads to undefined behavior , as does v[i]
= i++. Even when the behavior is not undefined, the result of evaluating an expression can still be anybody’s guess. Consider
the following program fragment:
#include <map>
int main() {
std::map<int, int> m;
m[0] = m.size(); // #1
}
What should the map object m look like after evaluation of the
statement marked #1? { {0, 0 } } or {{0, 1 } } ?
We know that unless specified the evaluations of sub-expressions are unsequenced, this is from the draft C++11 standard section 1.9 Program execution which says:
Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced.[...]
and all the section 5.17 Assignment and compound assignment operators [expr.ass] says is:
[...]In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.[...]
So this section does not nail down the order of evaluation but we know this is not undefined behavior since both operator [] and size() are function calls and section 1.9 tells us(emphasis mine):
[...]When calling a function (whether or not the function is inline), every value computation and side effect
associated with any argument expression, or with the postfix expression designating the called function, is
sequenced before execution of every expression or statement in the body of the called function. [ Note: Value
computations and side effects associated with different argument expressions are unsequenced. —end note ]
Every evaluation in the calling function (including other function calls) that is not otherwise specifically
sequenced before or after the execution of the body of the called function is indeterminately sequenced with
respect to the execution of the called function.9[...]
Note, I cover the second interesting example from the N4228 proposal in the question Does this code from “The C++ Programming Language” 4th edition section 36.3.6 have well-defined behavior?.
Update
It seems like a revised version of N4228 was accepted by the Evolution Working Group at the last WG21 meeting but the paper(P0145R0) is not yet available. So this could possibly no longer be unspecified in C++17.
Update 2
Revision 3 of p0145 made this specified and update [expr.ass]p1:
The assignment operator (=) and the compound assignment operators all group right-to-left.
All require a modifiable lvalue as their left operand; their result is an lvalue referring to the left operand.
The result in all cases is a bit-field if the left operand is a bit-field.
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. The right operand is sequenced before the left operand. ...
From the C++11 standard (emphasis mine):
5.17 Assignment and compound assignment operators
1 The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue referring to the left operand. The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
Whether the left operand is evaluated first or the right operand is evaluated first is not specified by the language. A compiler is free to choose to evaluate either operand first. Since the final result of your code depends on the order of evaluation of the operands, I would say it is unspecified behavior rather than undefined behavior.
1.3.25 unspecified behavior
behavior, for a well-formed program construct and correct data, that depends on the implementation
I'm sure that the standard does not specify for an expression x = y; which order x or y is evaluated in the C++ standard (this is the reason why you can't do *p++ = *p++ for example, because p++ is not done in a defined order).
In other words, to guarantee order x = y; in a defined order, you need to do break it up into two sequence points.
T tmp = y;
x = tmp;
(Of course, in this particular case, one might presume the compiler prefers to do operator[] before size() because it can then store the value directly into the result of operator[] instead of keeping it in a temporary place, to store it later after operator[] has been evaluated - but I'm pretty sure the compiler doesn't NEED to do it in that order)
Let's take a look at what your code breaks down to:
mp.operator[](10).operator=(mp.size());
which pretty much tells the story that in the first part an entry to 10 is created and in the second part the size of the container is assigned to the integer reference in position of 10.
But now you get into the order of evaluation problem which is unspecified. Here is a much simpler example .
When should map::size() get called, before or after map::operator(int const &); ?
Nobody really knows.
I am curious about initializer lists and sequence points. I read a while ago that the order of evaluation in initializer lists is left to right. If that is so, then there must be some kind of sequence point between the points of evaluation, am I wrong? So with that said is the following valid code? Is there anything that causes undefined behavior in it?
int i = 0;
struct S {
S(...) {}
operator int() { return i; }
};
int main() {
i = S{++i, ++i};
}
Any and all responses are appreciated.
Yes, the code is valid and does not have undefined behavior. Expressions in an initizalizer list are evaluated left-to-right and sequenced before the evaluation of the constructor of S. Therefore, your program should consistently assign value 2 to variable i.
Quoting § 8.5.4 of the C++ Standard:
"Within the initializer-list of a braced-init-list, the initializer-clauses, including any that result from pack expansions (14.5.3), are evaluated in the order in which they appear. That is, every value computation and side effect associated with a given initializer-clause is sequenced before every value computation and side effect associated with any initializer-clause that follows it in the comma-separated list of the initializer-list."
Thus, what happens is:
++i gets evaluated, yielding i = 1 (first argument of S's constructor);
++i gets evaluated, yielding i = 2 (second argument of S's constructor);
S's constructor is executed;
S's conversion operator is executed, returning value 2;
value 2 is assigned to i (which already had value 2).
Another relevant paragraph of the Standard is § 1.9/15, which also mentions similar examples that do have undefined behavior:
i = v[i++]; // the behavior is undefined
i = i++ + 1; // the behavior is undefined
However, the same paragraph says:
"Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [...] When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function."
Since 1) the evaluation of the expressions in the initializer list is sequenced left-to-right, 2) the execution of the constructor of S is sequenced after the evaluation of all expressions in the initializer list, and 3) the assignment to i is sequenced after the execution of the constructor of S (and its conversion operator), the behavior is well-defined.
Yes indeed you have case of undefined behavior.
Below are examples of situations causing undefined behavior:
A variable is changed several times within one sequence point. As a
canonical example, the i=i++ expression is often cited where
assignment of the i variable and its increment are performed at the
same time. To learn more about this kind of errors, read the section
"sequence points".
Using a variable before initializing it. Undefined behavior occurs
when trying to use the variable.
Memory allocation using the new [] operator and subsequent release
using the delete operator. For example: T *p = new T[10]; delete p;.
The correct code is: T *p = new T[10]; delete [] p;.
EDITED
Also your code S{++i, ++i}; is not compiled for VS2012. May be you mean S(++i, ++i);?
If you use "()" then undefined behavour is present. In other case your source code is incorrect.
map<int, int> mp;
printf("%d ", mp.size());
mp[10]=mp.size();
printf("%d\n", mp[10]);
This code yields an answer that is not very intuitive:
0 1
I understand why it happens - the left side of the assignment returns reference to mp[10]'s underlying value and at the same time creates aforementioned value, and only then is the right side evaluated, using the newly computed size() of the map.
Is this behaviour stated anywhere in C++ standard? Or is the order of evaluation undefined?
Result was obtained using g++ 5.2.1.
Yes, this is covered by the standard and it is unspecified behavior. This particular case is covered in a recent C++ standards proposal: N4228: Refining Expression Evaluation Order for Idiomatic C++ which seeks to refine the order of evaluation rules to make it well specified for certain cases.
It describes this problem as follows:
Expression evaluation order is a recurring discussion topic in the C++
community. In a nutshell, given an expression such as f(a, b,
c), the order in which the sub-expressions f, a, b, c are evaluated is left unspecified by the standard. If any two of these sub-expressions happen to modify the same object without intervening sequence points, the behavior of the program is undefined. For instance, the expression f(i++, i) where i is an
integer variable leads to undefined behavior , as does v[i]
= i++. Even when the behavior is not undefined, the result of evaluating an expression can still be anybody’s guess. Consider
the following program fragment:
#include <map>
int main() {
std::map<int, int> m;
m[0] = m.size(); // #1
}
What should the map object m look like after evaluation of the
statement marked #1? { {0, 0 } } or {{0, 1 } } ?
We know that unless specified the evaluations of sub-expressions are unsequenced, this is from the draft C++11 standard section 1.9 Program execution which says:
Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced.[...]
and all the section 5.17 Assignment and compound assignment operators [expr.ass] says is:
[...]In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.[...]
So this section does not nail down the order of evaluation but we know this is not undefined behavior since both operator [] and size() are function calls and section 1.9 tells us(emphasis mine):
[...]When calling a function (whether or not the function is inline), every value computation and side effect
associated with any argument expression, or with the postfix expression designating the called function, is
sequenced before execution of every expression or statement in the body of the called function. [ Note: Value
computations and side effects associated with different argument expressions are unsequenced. —end note ]
Every evaluation in the calling function (including other function calls) that is not otherwise specifically
sequenced before or after the execution of the body of the called function is indeterminately sequenced with
respect to the execution of the called function.9[...]
Note, I cover the second interesting example from the N4228 proposal in the question Does this code from “The C++ Programming Language” 4th edition section 36.3.6 have well-defined behavior?.
Update
It seems like a revised version of N4228 was accepted by the Evolution Working Group at the last WG21 meeting but the paper(P0145R0) is not yet available. So this could possibly no longer be unspecified in C++17.
Update 2
Revision 3 of p0145 made this specified and update [expr.ass]p1:
The assignment operator (=) and the compound assignment operators all group right-to-left.
All require a modifiable lvalue as their left operand; their result is an lvalue referring to the left operand.
The result in all cases is a bit-field if the left operand is a bit-field.
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. The right operand is sequenced before the left operand. ...
From the C++11 standard (emphasis mine):
5.17 Assignment and compound assignment operators
1 The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue referring to the left operand. The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
Whether the left operand is evaluated first or the right operand is evaluated first is not specified by the language. A compiler is free to choose to evaluate either operand first. Since the final result of your code depends on the order of evaluation of the operands, I would say it is unspecified behavior rather than undefined behavior.
1.3.25 unspecified behavior
behavior, for a well-formed program construct and correct data, that depends on the implementation
I'm sure that the standard does not specify for an expression x = y; which order x or y is evaluated in the C++ standard (this is the reason why you can't do *p++ = *p++ for example, because p++ is not done in a defined order).
In other words, to guarantee order x = y; in a defined order, you need to do break it up into two sequence points.
T tmp = y;
x = tmp;
(Of course, in this particular case, one might presume the compiler prefers to do operator[] before size() because it can then store the value directly into the result of operator[] instead of keeping it in a temporary place, to store it later after operator[] has been evaluated - but I'm pretty sure the compiler doesn't NEED to do it in that order)
Let's take a look at what your code breaks down to:
mp.operator[](10).operator=(mp.size());
which pretty much tells the story that in the first part an entry to 10 is created and in the second part the size of the container is assigned to the integer reference in position of 10.
But now you get into the order of evaluation problem which is unspecified. Here is a much simpler example .
When should map::size() get called, before or after map::operator(int const &); ?
Nobody really knows.
I've looked at a bunch of questions regarding sequence points, and haven't been able to figure out if the order of evaluation for x*f(x) is guaranteed if f modifies x, and is this different for f(x)*x.
Consider this code:
#include <iostream>
int fx(int &x) {
x = x + 1;
return x;
}
int f1(int &x) {
return fx(x)*x; // Line A
}
int f2(int &x) {
return x*fx(x); // Line B
}
int main(void) {
int a = 6, b = 6;
std::cout << f1(a) << " " << f2(b) << std::endl;
}
This prints 49 42 on g++ 4.8.4 (Ubuntu 14.04).
I'm wondering whether this is guaranteed behavior or unspecified.
Specifically, in this program, fx gets called twice, with x=6 both times, and returns 7 both times. The difference is that Line A computes 7*7 (taking the value of x after fx returns) while Line B computes 6*7 (taking the value of x before fx returns).
Is this guaranteed behavior? If yes, what part of the standard specifies this?
Also: If I change all the functions to use int *x instead of int &x and make corresponding changes to places they're called from, I get C code which has the same issues. Is the answer any different for C?
In terms of evaluation sequence, it is easier to think of x*f(x) as if it was:
operator*(x, f(x));
so that there are no mathematical preconceptions on how multiplication is supposed to work.
As #dan04 helpfully pointed out, the standard says:
Section 1.9.15: “Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced.”
This means that the compiler is free to evaluate these arguments in any order, the sequence point being operator* call. The only guarantee is that before the operator* is called, both arguments have to be evaluated.
In your example, conceptually, you could be certain that at least one of the arguments will be 7, but you cannot be certain that both of them will. To me, this would be enough to label this behaviour as undefined; however, #user2079303 answer explains well why it is not technically the case.
Regardless of whether the behaviour is undefined or indeterminate, you cannot use such an expression in a well-behaved program.
The evaluation order of arguments is not specified by the standard, so the behaviour that you see is not guaranteed.
Since you mention sequence points, I'll consider the c++03 standard which uses that term while the later standards have changed wording and abandoned the term.
ISO/IEC 14882:2003(E) §5 /4:
Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified...
There is also discussion on whether this is undefined behaviour or is the order merely unspecified. The rest of that paragraph sheds some light (or doubt) on that.
ISO/IEC 14882:2003(E) §5 /4:
... Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined.
x is indeed modified in f and it's value is read as an operand in the same expression where f is called. And it's not specified whether x reads the modified or non-modified value. That might scream Undefined Behaviour! to you, but hold your horses, because the standard also states:
ISO/IEC 14882:2003(E) §1.9 /17:
... When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all function arguments (if any) which takes place before execution of any expressions or statements in the function body. There is also a sequence point after the copying of a returned value and before the execution of any expressions outside the function 11) ...
So, if f(x) is evaluated first, then there is a sequence point after copying the returned value. So the above rule about UB does not apply because the read of x is not between the next and previous sequence point. The x operand will have the modified value.
If x is evaluated first, then there is a sequence point after evaluating the arguments of f(x) Again, the rule about UB does not apply. In this case x operand will have the non-modified value.
In summary, the order is unspecified but there is no undefined behaviour. It's a bug, but the outcome is predictable to some degree. The behaviour is the same in the later standards, even though the wording changed. I'll not delve into those since it's already covered well in other good answers.
Since you ask about similar situation in C
C89 (draft) 3.3/3:
Except as indicated by the syntax 27 or otherwise specified later (for the function-call operator () , && , || , ?: , and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified.
The function call exception is already mentioned here. Following is the paragraph that implies the undefined behaviour if there were no sequence points:
C89 (draft) 3.3/2:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.26
And here are the sequence points defined:
C89 (draft) A.2
The following are the sequence points described in 2.1.2.3
The call to a function, after the arguments have been evaluated (3.3.2.2).
...
... the expression in a return statement (3.6.6.4).
The conclusions are the same as in C++.
A quick note on something I don't see covered explicitly by the other answers:
if the order of evaluation for x*f(x) is guaranteed if f modifies x, and is this different for f(x)*x.
Consider, as in Maksim's answer
operator*(x, f(x));
now there are only two ways of evaluating both arguments before the call as required:
auto lhs = x; // or auto rhs = f(x);
auto rhs = f(x); // or auto lhs = x;
return lhs * rhs
So, when you ask
I'm wondering whether this is guaranteed behavior or unspecified.
the standard doesn't specify which of those two behaviours the compiler must choose, but it does specify those are the only valid behaviours.
So, it's neither guaranteed nor entirely unspecified.
Oh, and:
I've looked at a bunch of questions regarding sequence points, and haven't been able to figure out if the order of evaluation ...
sequence points are a used in the C language standard's treatment of this, but not in the C++ standard.
In the expression x * y, the terms x and y are unsequenced. This is one of the three possible sequencing relations, which are:
A sequenced-before B: A must be evaluated, with all side-effects complete, before B begins evaluationg
A and B indeterminately-sequenced: one of the two following cases is true: A is sequenced-before B, or B is sequenced-before A. It is unspecified which of those two cases holds.
A and B unsequenced: There is no sequencing relation defined between A and B.
It is important to note that these are pair-wise relations. We cannot say "x is unsequenced". We can only say that two operations are unsequenced with respect to each other.
Also important is that these relations are transitive; and the latter two relations are symmetric.
unspecified is a technical term which means that the Standard specifies a set number of possible results. This is different to undefined behaviour which means that the Standard does not cover the behaviour at all. See here for further reading.
Moving onto the code x * f(x). This is identical to f(x) * x, because as discussed above, x and f(x) are unsequenced, with respect to each other, in both cases.
Now we come to the point where several people seem to be coming unstuck. Evaluating the expression f(x) is unsequenced with respect to x. However, it does not follow that any statements inside the function body of f are also unsequenced with respect to x. In fact, there are sequencing relations surrounding any function call, and those relations cannot be ignored.
Here is the text from C++14:
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [Note: Value computations and side effects associated with different argument expressions are unsequenced. —end note ]
Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with
respect to the execution of the called function.
with footnote:
In other words, function executions do not interleave with each other.
The bolded text clearly states that for the two expressions:
A: x = x + 1; inside f(x)
B: evaluating the first x in the expression x * f(x)
their relationship is: indeterminately sequenced.
The text regarding undefined behaviour and sequencing is:
If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, and they are not potentially concurrent (1.10), the behavior is undefined.
In this case, the relation is indeterminately sequenced, not unsequenced. So there is no undefined behaviour.
The result is instead unspecified according to whether x is sequenced before x = x + 1 or the other way around. So there are only two possible outcomes, 42 and 49.
In case anyone had qualms about the x in f(x), the following text applies:
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function.
So the evaluation of that x is sequenced before x = x + 1. This is an example of an evlauation that falls under the case of "specifically sequenced before" in the bolded quote above.
Footnote: the behaviour was exactly the same in C++03, but the terminology was different. In C++03 we say that there is a sequence point upon entry and exit of every function call, therefore the write to x inside the function is separated from the read of x outside the function by at least one sequence point.
You need to distinguish:
a) Operator precedence and associativity, which controls the order in which the values of subexpressions are combined by their operators.
b) The sequence of subexpression evaluation. E.g. in the expression f(x)/g(x), the compiler can evaluate g(x) first and f(x) afterwards. Nonetheless, the resulting value must be computed by dividing respective sub-values in the right order, of course.
c) The sequence of side-effects of the subexpressions. Roughly speaking, for example, the compiler might, for sake of optimization, decide to write values to the affected variables only at the end of the expression or any other suitable place.
As a very rough approximation, you can say, that within a single expression, the order of evaluation (not associativity etc.) is more or less unspecified. If you need a specific order of evaluation, break down the expression into series of statements like this:
int a = f(x);
int b = g(x);
return a/b;
instead of
return f(x)/g(x);
For exact rules, see http://en.cppreference.com/w/cpp/language/eval_order
Order of evaluation of the operands of almost all C++ operators is
unspecified. The compiler can evaluate operands in any order, and may
choose another order when the same expression is evaluated again
As the order of evaluation is not always the same hence you may get unexpected results.
Order of evaluation
Today I came across some code that exhibits different behavior on
clang++ (3.7-git), g++ (4.9.2) and Visual Studio 2013. After some reduction
I came up with this snippet which highlights the issue:
#include <iostream>
using namespace std;
int len_ = -1;
char *buffer(int size_)
{
cout << "len_: " << len_ << endl;
return new char[size_];
}
int main(int argc, char *argv[])
{
int len = 10;
buffer(len+1)[len_ = len] = '\0';
cout << "len_: " << len_ << endl;
}
g++ (4.9.2) gives this output:
len_: -1
len_: 10
So g++ evaluates the argument to buffer, then buffer(..) itself and after that it evaluates the index argument to the array operator. Intuitively this makes sense to me.
clang (3.7-git) and Visual Studio 2013 both give:
len_: 10
len_: 10
I suppose clang and VS2013 evaluates everything possible before it decends into buffer(..). This makes less intuitive sense to me.
I guess the gist of my question is whether or not this is a clear case of undefined behavior.
Edit: Thanks for clearing this up, and unspecified behavior is the term I should have used.
This is unspecified behavior, len_ = len is indeterminately sequenced with respect to the execution of the body of buffer(), which means that one will be executed before the other but it is not specified which order but there is an ordering so evaluations can not overlap therefore no undefined behavior. This means gcc, clang and Visual Studio are all correct. On other hand unsequenced evaluations allow for overlapping evaluations which can lead to undefined behavior as noted below.
From the draft C++11 standard section 1.9 [intro.execution]:
[...]Every evaluation in the calling function (including other function calls) that is not otherwise specifically
sequenced before or after the execution of the body of the called function is indeterminately sequenced with
respect to the execution of the called function.9[...]
and indeterminately sequenced is covered a little before this and says:
[...]Evaluations A and B are indeterminately sequenced when either A
is sequenced before B or B is sequenced before A, but it is unspecified which. [ Note: Indeterminately
sequenced evaluations cannot overlap, but either could be executed first. —end note ]
which is different than unsequenced evaluations:
[...]If A is not sequenced before
B and B is not sequenced before A, then A and B are unsequenced. [ Note: The execution of unsequenced
evaluations can overlap. —end note ][...]
which can lead to undefined behavior (emphasis mine):
Except where noted, evaluations of operands of individual operators and of subexpressions of individual
expressions are unsequenced. [ Note: In an expression that is evaluated more than once during the execution
of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be
performed consistently in different evaluations. —end note ] The value computations of the operands of an
operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar
object is unsequenced relative to either another side effect on the same scalar object or a value computation
using the value of the same scalar object, the behavior is undefined[...]
Pre C++11
Pre C++11 the order of evaluation of sub-expressions is also unspecified but it uses sequence points as opposed to ordering. In this case, there is a sequence point at function entry and function exit which ensures there is no undefined behavior. From section 1.9:
[...]The sequence points at function-entry and function-exit
(as described above) are features of the function calls as evaluated, whatever the syntax of the expression that calls the
function might be.
Nailing down order of evaluation
The different choices made by each compiler may seem unintuitive depending on your perspective and expectations. The subject of nailing down order of evaluation is the subject of EWG issue 158: N4228 Refining Expression Evaluation Order for Idiomatic C++, which is being considered for C++17 but seems controversial based on the reactions to a poll on the subject. The paper covers a much more complicated case from "The C++ Programming Language" 4th edition. Which shows even those with deep experience in C++ can get tripped up.
Well, no, it's not a case of undefined behaviour. It is a case of unspecified behaviour.
It is unspecified whether the expression len_ = len will be evaluated before or after buffer(len+1). From the output you have described, g++ evaluates buffer(len+1) first, and clang evaluates len_ = len first.
Both possibilities are correct, since the order of evaluation of those two sub-expressions is unspecified . Both expressions will be evaluated (so the behaviour does not qualify as being undefined) but the standard does not specify the order.