I've just gotten stuck on the following question: should this cause undefined behaviour or not, and why?
std::map<int, int> m;
m[10] += 1;
It compiles and runs perfectly but it doesn't prove anything.
It resembles a common UB example i = ++i + i++; since operator[] does have side effects but on the other hand assuming any order of evaluation (left to right and right to left) brings me to the same final state of the map
P.S. possibly related: http://en.cppreference.com/w/cpp/language/eval_order
edit
Sorry guys I should have written
m[10] = m[10] + 1;
There is nothing undefined about this. The operator[] returns an lvalue reference to the map entry (which it creates if necessary). You are then merely incrementing this lvalue expression, i.e. the underlying entry.
The rules for evaluation order state that for a modifying assign operation, the side effect is sequenced strictly after the evaluation of both the left (i.e. lvalue reference to the map entry) and right (i.e. the constant 1) operands. There is no ambiguity at all in this example.
UPDATE: In your updated example nothing changes. Again, the side effect of modifying m[10] is sequenced strictly after the other operations (i.e. evaluating it as an lvalue on the left, evaluating it on the right, and performing the addition).
The relevant sequencing rule, from cppreference:
8) The side effect (modification of the left argument) of the built-in
assignment operator and of all built-in compound assignment operators
is sequenced after the value computation (but not the side effects) of
both left and right arguments, and is sequenced before the value
computation of the assignment expression (that is, before returning
the reference to the modified object)
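As a minimal sketch of both spellings from the question (the helper names bump_compound, bump_expanded and check_bump are made up for illustration), the following shows that a missing key is first inserted with the value 0 and then incremented, whichever form is used:

```cpp
#include <cassert>
#include <map>

// Hypothetical helper: one call to operator[], then built-in += on the int& it returns.
int bump_compound(std::map<int, int>& m, int key) {
    m[key] += 1;
    return m[key];
}

// Hypothetical helper: two calls to operator[]; whichever runs first inserts 0,
// the other finds the same entry, so the stored result is the same either way.
int bump_expanded(std::map<int, int>& m, int key) {
    m[key] = m[key] + 1;
    return m[key];
}

// Small self-check starting from an empty map.
void check_bump() {
    std::map<int, int> m;
    assert(bump_compound(m, 10) == 1);
    assert(bump_expanded(m, 10) == 2);
    assert(m.size() == 1);
}
```

Calling check_bump() from main should pass silently.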
I am not quite sure what your worry is (and maybe you should clarify your question if this answer isn't sufficient), but m[10] += 1; doesn't get translated to m[10] = m[10] + 1; because m has a user-defined class type, and the compiler never rewrites overloaded operators in terms of one another. For objects a and b of a user-defined class type:
a+=b doesn't mean a = a + b (unless you make it so)
a!=b doesn't mean !(a==b) (unless you make it so)
Also, function calls are never duplicated.
So m[10] += 1; means: call the overloaded operator[] once; its return type is a reference, so the expression is an lvalue; then apply the built-in operator += to that lvalue.
There is no order of evaluation issue. There aren't even multiple possible orders of evaluation!
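To make the "function calls are never duplicated" point concrete, here is a small sketch with a made-up CountingTable type (not part of the standard library) whose operator[] counts how often it is called:

```cpp
#include <cassert>

// Hypothetical one-slot container that counts calls to operator[].
struct CountingTable {
    int slot = 0;
    int calls = 0;
    int& operator[](int /*key*/) {
        ++calls;
        return slot;
    }
};

// t[10] += 1 calls operator[] exactly once.
int calls_for_compound() {
    CountingTable t;
    t[10] += 1;
    return t.calls;
}

// t[10] = t[10] + 1 spells out two calls, so two calls happen.
int calls_for_expanded() {
    CountingTable t;
    t[10] = t[10] + 1;
    return t.calls;
}

void check_calls() {
    assert(calls_for_compound() == 1);
    assert(calls_for_expanded() == 2);
}
```

The compound form is not rewritten into the expanded one; the difference in call counts is observable.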
Also, you need to remember that the std::map<>::operator[] doesn't behave like std::vector<>::operator[] (or std::deque's), because the map is a completely different abstraction: vector and deque are implementations of the Sequence concept (where position matters), but map is an associative container (where "key" matters, not position):
std::vector<>::operator[] takes a numerical index, and doesn't make sense if such index doesn't refer to an element of the vector.
std::map<>::operator[] takes a key (which can be any type satisfying basic constraints) and will create a (key,value) pair if none exists.
Note that for this reason, std::map<>::operator[] is inherently a modifying operation and thus non const, while std::vector<>::operator[] isn't inherently modifying but can allow modification via the returned reference, and is thus "transitively" const: v[i] will be a modifiable lvalue if v is a non-const vector and a const lvalue if v is a const vector.
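The const difference can be sketched as follows. The helper names lookup_or and first_element are invented for this example; std::map::find and the const overload of std::vector::operator[] are standard:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical read-only lookup that never inserts.
// m[key] would not compile here: std::map::operator[] is a non-const member.
int lookup_or(const std::map<int, int>& m, int key, int fallback) {
    auto it = m.find(key);
    return it == m.end() ? fallback : it->second;
}

// std::vector::operator[] has a const overload, so this compiles.
int first_element(const std::vector<int>& v) {
    assert(!v.empty());  // indexing an empty vector would be undefined behaviour
    return v[0];
}

void check_lookup() {
    const std::map<int, int> m{{1, 42}};
    assert(lookup_or(m, 1, -1) == 42);
    assert(lookup_or(m, 2, -1) == -1);
    assert(m.size() == 1);  // the failed lookup did not create an entry
}
```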
So no worry, the code has perfectly well defined behavior.
Related
I know this question is asked often in its version of "i = i++ +1" in which i appears twice, but my question differs in that it is specifically ONLY about the right hand side of this expression, the definedness of which is not obvious to me. I am only referring to:
i++ + 1;
cppreference.com states here that:
2) The value computations (but not the side-effects) of the operands to any operator are sequenced before the value computation of the result of the operator (but not its side-effects).
I understand this to mean that the value computation is sequenced but no statement is made about the side-effect.
[...]
4) The value computation of the built-in post-increment and post-decrement operators is sequenced before its side-effect.
It does not, however, specify that the side-effect of (in this case) the left operand is sequenced in relation to the value computation of the expression.
It further states:
If a side effect on a scalar object is unsequenced relative to a value computation using the value of the same scalar object, the behavior is undefined.
Is this not the case here? The post-inc-operator's side effect on i is unsequenced relative to the value computation of the addition operator, which uses the same i.
Why is this expression not usually said to be undefined?
Is it because the addition operator is thought to incur a function call for which stricter sequencing guarantees are given?
What ensures that the postfix's side-effect occurs after the computation of +?
There is no such assurance. The postfix's side effect may occur either before or after the value computation of + .
The post-inc-operator's side effect on i is unsequenced relative to the value computation of the addition operator, which uses the same i.
No, the value computation of the addition operator uses the result of value computation of its operands. The operands of + are i++ (not i), and 1. As you covered in the question, the read of i is sequenced-before the value computation of i++, and therefore (transitivity) sequenced before the value computation of +.
The following things are guaranteed to happen in the following order:
Read of i.
Value computation of ++ (operand: result-of-step-1)
Value computation of + (operands result-of-step-2 and 1)
And the side-effect of i++ must occur after step 1, but subject to that constraint it could happen at any point.
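A minimal sketch of the three steps above, wrapped in a hypothetical function post_inc_plus_one so the result and the final value of i can both be checked (i must be initialized and must not overflow):

```cpp
#include <cassert>

// Hypothetical wrapper around the expression from the question.
// The result is the old value of i plus 1; the store to i happens at some
// point after the read, and is complete by the time the function has returned.
int post_inc_plus_one(int& i) {
    return i++ + 1;
}

void check_post_inc() {
    int i = 5;
    assert(post_inc_plus_one(i) == 6);
    assert(i == 6);
}
```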
i++ + 1 is not undefined on account of the use of the postfix operator because it perpetrates only one side effect on one object, and that object's value is only referenced in that place. The i++ expression unambiguously produces the prior value of i, and that value is what is added to 1, no matter when i is actually updated.
(We don't know that i++ + 1 is well-defined, because things can go wrong for various other reasons: i being uninitialized or otherwise indeterminate or invalid, or numeric overflow or pointer overrun being perpetrated.)
Undefined behavior occurs if in the same evaluation phase we try to modify the same object twice: i++ + i++. This can be obscured by pointers, because (*p)++ + (*q)++ increments the same object twice only if p and q point to the same location; otherwise it is fine.
Undefined behavior also occurs if in the same evaluation phase, we try to observe the value of an object that is modified elsewhere in the expression, like i++ + i. The right hand side of the + accesses i, but that is not sequenced with regard to the side effect of i++ on the left; the + operator doesn't impose a sequence point. In i++ + 1, the 1 doesn't try to access i, needless to say.
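The pointer case can be sketched like this. The function name sum_and_bump is made up, and the assert merely documents the precondition under which the expression is well-defined:

```cpp
#include <cassert>

// Hypothetical helper: fine when p and q point to different objects.
int sum_and_bump(int* p, int* q) {
    assert(p != q);  // with p == q the two increments would be unsequenced
    return (*p)++ + (*q)++;
}
```

With a = 1 and b = 2, sum_and_bump(&a, &b) yields 3 and leaves a == 2 and b == 3.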
Here's what happens when i++ + 1 is evaluated:
The subexpression i++ is evaluated. It yields the previous value of i.
Evaluating i++ also has the side effect of incrementing the stored value of i -- but note that that incremented value is not used.
The subexpression 1 is evaluated, yielding the obvious value.
The + operator is evaluated, yielding the result of i++ plus the result of 1. This can happen only after the values of the left and right subexpressions are determined (but it can consistently happen before or after the side effect occurs).
The side effect of the ++ operator is only guaranteed to happen some time before the next sequence point. (That's in C99 terms. The C11 standard presents the same rules in a different way.) But since nothing else in the expression depends on that side effect, it doesn't matter when it occurs. There is no conflict, so there's no undefined behavior.
In i++ + i, the evaluation of i on the RHS will yield different results depending on whether the side effect has happened yet or not. And since the ordering is undefined, the standard throws up its hands and says the behavior is undefined. But in i++ + 1, that problem doesn't occur.
"What ensures that the postfix's side-effect occurs after the computation of +?"
Nothing makes that specific guarantee. You must act as if you're using the original value of i, and at some point it needs to perform the side-effect, but as long as everything behaves properly, it doesn't matter how the compiler implements this or in what order. It can (and for certain scenarios, would) implement it as roughly equivalent to either:
auto tmp = i;
i = tmp + 1; // Could be done here, or after the next expression, doesn't matter since i isn't read again
tmp + 1; // produces actual value of i++ + 1
or
auto tmp = i + 1;
i = tmp; // Could be done here, or after the next expression, doesn't matter since tmp isn't changed again
(tmp - 1) + 1; // produces actual value of i++ + 1
or (for primitives or inlined operator overloads where it has enough information) optimize the expression to just:
++i; // Usually the same as i++ + 1 if compiler has enough knowledge
because postfix increment followed by adding one could be treated as prefix increment without adding one after.
Point is, it's up to the compiler to ensure the side-effect occurs sometime, which might be before or after the computation of +; the compiler just needs to make sure it has stored, or can recover, the original value of i.
The various contortions here might seem pointless (clearly ++i is the best if you can swing it, and i + 1; followed by ++i is simplest otherwise), but they're often necessary to work with the hardware atomics on a given architecture; if the architecture offers a fetch_then_add instruction, you'd want to implement it as:
auto tmp = fetch_then_add(i, 1); // Returns original value of i, while atomically adding 1
tmp + 1;
but if it only offers an add_then_fetch instruction, you'd want:
auto tmp = add_then_fetch(i, 1); // Returns incremented value of i
(tmp - 1) + 1;
As with many things, the C++ standard doesn't impose a preferred order because real hardware doesn't always cooperate; if it gets the job done and behaves as documented, it doesn't really matter what order it used.
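In standard C++ the two hypothetical instructions above roughly correspond to operations on std::atomic<int>: fetch_add returns the value held before the addition, while prefix ++ returns the value after it. This is only a sketch of the two shapes, not a claim about what any particular compiler emits:

```cpp
#include <atomic>
#include <cassert>

// "fetch then add": fetch_add returns the original value.
int via_fetch_then_add(std::atomic<int>& i) {
    int old_value = i.fetch_add(1);
    return old_value + 1;
}

// "add then fetch": prefix ++ on std::atomic<int> returns the new value,
// so the original has to be recovered by subtracting 1.
int via_add_then_fetch(std::atomic<int>& i) {
    int new_value = ++i;
    return (new_value - 1) + 1;
}

void check_atomic_shapes() {
    std::atomic<int> i{5};
    assert(via_fetch_then_add(i) == 6);
    assert(via_add_then_fetch(i) == 7);
    assert(i.load() == 7);
}
```

Both functions compute the same value as i++ + 1 would and leave i incremented by one.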
Consider the following code snippet
int a,i;
a = 5;
(i++) = a;
(++i) = a;
cout<<i<<endl;
Line (++i) = a is compiling properly and giving 5 as output.
But (i++) = a is giving compilation error error: lvalue required as left operand of assignment.
I am not able to find the reason for such different behavior. I would be grateful if someone explains this.
The expression i++ evaluates to the value of i prior to the increment operation. That value is a temporary (which is an rvalue) and you cannot assign to it.
++i works because that expression evaluates to i after it has been incremented, and i can be assigned to (it's an lvalue).
More on lvalues and rvalues on Wikipedia.
According to the C++ standard, prefix ++ is an lvalue (which
is different than C), post-fix no. More generally, C++ takes
the point of view that anything which changes an lvalue
parameter, and has as its value the value of that parameter,
results in an lvalue. So ++ i is an lvalue (since the
resulting value is the new value of i), but i ++ is not
(since the resulting value is not the new value, but the old).
All of this, of course, for the built-in ++ operators. If you
overload, it depends on the signatures of your overloads (but
a correctly designed overloaded ++ will behave like the
built-in ones).
Of course, neither (++ i) = a; nor (i ++) = a; in your
example are legal; both use the value of an uninitialized
variable (i), which is undefined behavior, and both modify i
twice without an intervening sequence point.
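The value categories can be checked at compile time. In the sketch below the Counter class and the function assign_through_prefix are invented for illustration, i is initialized, and the assignment is done in a separate statement so that no sequencing question arises:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// decltype of such an expression yields T& for an lvalue and T for a prvalue.
static_assert(std::is_same<decltype(++std::declval<int&>()), int&>::value,
              "built-in prefix ++ yields an lvalue");
static_assert(std::is_same<decltype(std::declval<int&>()++), int>::value,
              "built-in postfix ++ yields a prvalue");

// Hypothetical class whose overloads mirror the built-in behaviour.
struct Counter {
    int value = 0;
    Counter& operator++() { ++value; return *this; }  // prefix: the object itself
    Counter operator++(int) {                         // postfix: copy of the old state
        Counter old = *this;
        ++value;
        return old;
    }
};

// Binds the lvalue produced by ++i, then assigns through it.
int assign_through_prefix(int start, int replacement) {
    int i = start;
    int& ref = ++i;      // int& ref = i++; would not compile
    ref = replacement;
    return i;
}

void check_value_categories() {
    assert(assign_through_prefix(0, 5) == 5);
}
```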
From the C++ (C++11) standard, §1.9.15 which discusses ordering of evaluation, is the following code example:
void g(int i, int* v) {
i = v[i++]; // the behavior is undefined
}
As noted in the code sample, the behavior is undefined.
(Note: The answer to another question with the slightly different construct i + i++, Why is a = i + i++ undefined and not unspecified behaviour, might apply here: The answer is essentially that the behavior is undefined for historical reasons, and not out of necessity. However, the standard seems to imply some justification for this being undefined - see quote immediately below. Also, that linked question indicates agreement that the behavior should be unspecified, whereas in this question I am asking why the behavior is not well-specified.)
The reasoning given by the standard for the undefined behavior is as follows:
If a side effect on a scalar object is unsequenced relative to either
another side effect on the same scalar object or a value computation
using the value of the same scalar object, the behavior is undefined.
In this example I would think that the subexpression i++ would be completely evaluated before the subexpression v[...] is evaluated, and that the result of evaluation of the subexpression is i (before the increment), but that the value of i is the incremented value after that subexpression has been completely evaluated. I would think that at that point (after the subexpression i++ has been completely evaluated), the evaluation v[...] takes place, followed by the assignment i = ....
Therefore, although the incrementing of i is pointless, I would nonetheless think that this should be defined.
Why is this undefined behavior?
I would think that the subexpression i++ would be completely evaluated before the subexpression v[...] is evaluated
But why would you think that?
One historical reason for this code being UB is to allow compiler optimizations to move side-effects around anywhere between sequence points. The fewer sequence points, the more potential opportunities to optimize but the more confused programmers. If the code says:
a = v[i++];
The intention of the standard is that the code emitted can be:
a = v[i];
++i;
which might be two instructions where:
tmp = i;
++i;
a = v[tmp];
would be more than two.
The "optimized code" breaks when a is i, but the standard permits the optimization anyway, by saying that behavior of the original code is undefined when a is i.
The standard easily could say that i++ must be evaluated before the assignment as you suggest. Then the behavior would be fully defined and the optimization would be forbidden. But that's not how C and C++ do business.
Also beware that many examples raised in these discussions make it easier to tell that there's UB around than it is in general. This leads to people saying that it's "obvious" the behavior should be defined and the optimization forbidden. But consider:
void g(int *i, int* v, int *dst) {
*dst = v[(*i)++];
}
The behavior of this function is defined when i != dst, and in that case you'd want all the optimization you can get (which is why C99 introduces restrict, to allow more optimizations than C89 or C++ do). In order to give you the optimization, behavior is undefined when i == dst. The C and C++ standards tread a fine line when it comes to aliasing, between undefined behavior that's not expected by the programmer, and forbidding desirable optimizations that fail in certain cases. The number of questions about it on SO suggests that the questioners would prefer a bit less optimization and a bit more defined behavior, but it's still not simple to draw the line.
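If defined behavior is wanted even when i == dst, one option is to spell out every read and write as its own statement. The sketch below uses a made-up name, load_and_advance, and picks one particular order (the store of the result comes last); that order is a choice made for this example, not something the standard prescribes for the original expression:

```cpp
#include <cassert>

// Hypothetical rewrite of g(): each access is a separate full-expression.
void load_and_advance(int* i, const int* v, int* dst) {
    int index = *i;        // read the index first
    int value = v[index];  // then the element
    *i = index + 1;        // store the increment ...
    *dst = value;          // ... then the result, which wins when dst == i
}

void check_load_and_advance() {
    int v[] = {10, 20, 30};
    int i = 1;
    int d = 0;
    load_and_advance(&i, v, &d);
    assert(i == 2 && d == 20);
}
```

The price is exactly what the answer describes: the compiler has less freedom to reorder the stores.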
Aside from whether the behavior is fully defined is the issue of whether it should be UB, or merely unspecified order of execution of certain well-defined operations corresponding to the sub-expressions. The reason C goes for UB is all to do with the idea of sequence points, and the fact that the compiler need not actually have a notion of the value of a modified object, until the next sequence point. So rather than constrain the optimizer by saying that "the" value changes at some unspecified point, the standard just says (to paraphrase): (1) any code that relies on the value of a modified object prior to the next sequence point, has UB; (2) any code that modifies a modified object has UB. Where a "modified object" is any object that would have been modified since the last sequence point in one or more of the legal orders of evaluation of the subexpressions.
Other languages (e.g. Java) go the whole way and completely define the order of expression side-effects, so there's definitely a case against C's approach. C++ just doesn't accept that case.
I'm going to design a pathological computer [1]. It is a multi-core, high-latency, single-thread system with in-thread joins that operates with byte-level instructions. So you make a request for something to happen, then the computer runs (in its own "thread" or "task") a byte-level set of instructions, and a certain number of cycles later the operation is complete.
Meanwhile, the main thread of execution continues:
void foo(int v[], int i){
i = v[i++];
}
becomes in pseudo-code:
input variable i // = 0x00000000
input variable v // = &[0xBAADF00D, 0xABABABAB, 0x10101010]
task get_i_value: GET_VAR_VALUE<int>(i)
reg indx = WAIT(get_i_value)
task write_i++_back: WRITE(i, INC(indx))
task get_v_value: GET_VAR_VALUE<int*>(v)
reg arr = WAIT(get_v_value)
task get_v[i]_value = CALC(arr + sizeof(int)*indx)
reg pval = WAIT(get_v[i]_value)
task read_v[i]_value = LOAD_VALUE<int>(pval)
reg got_value = WAIT(read_v[i]_value)
task write_i_value_again = WRITE(i, got_value)
(discard, discard) = WAIT(write_i++_back, write_i_value_again)
So you'll notice that I didn't wait on write_i++_back until the very end, the same time as I was waiting on write_i_value_again (which value I loaded from v[]). And, in fact, those writes are the only writes back to memory.
Imagine that writes to memory are the really slow part of this computer design, and they get batched up into a queue of things that get processed by a parallel memory-modifying unit that does things on a per-byte basis.
So the write(i, 0x00000001) and write(i, 0xBAADF00D) execute unordered and in parallel. Each gets turned into byte-level writes, and they are randomly ordered.
We end up writing 0x00 then 0xBA to the high byte, then 0xAD and 0x00 to the next byte, then 0xF0 0x00 to the next byte, and finally 0x0D 0x01 to the low byte. The resulting value in i is 0xBA000001, which few would expect, yet would be a valid result to your undefined operation.
Now, all I did there was result in an unspecified value. We haven't crashed the system. But the compiler would be free to make it completely undefined -- maybe sending two such requests to the memory controller for the same address in the same batch of instructions actually crashes the system. That would still be a "valid" way to compile C++, and a "valid" execution environment.
Remember, this is a language where restricting the size of pointers to 8 bits is still a valid execution environment. C++ allows for compiling to rather wonky targets.
[1]: As noted in @SteveJessop's comment below, the joke is that this pathological computer behaves a lot like a modern desktop computer, until you get down to the byte-level operations. Non-atomic int writing by a CPU isn't all that rare on some hardware (such as when the int isn't aligned the way the CPU wants it to be aligned).
The reason is not just historical. Example:
int f(int& i0, int& i1) {
return i0 + i1++;
}
Now, what happens with this call:
int i = 3;
int j = f(i, i);
It's certainly possible to put requirements on the code in f so that the result of this call is well defined (Java does this), but C and C++ don't impose constraints; this gives more freedom to optimizers.
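For comparison, a version of f that stays well-defined when both references name the same object might look like the sketch below. The name f_ordered and the chosen order (read both operands, then store the increment) are assumptions made for this example:

```cpp
#include <cassert>

// Hypothetical variant of f with every read and write ordered explicitly.
int f_ordered(int& i0, int& i1) {
    int left = i0;    // read both operands before any write
    int right = i1;
    i1 = right + 1;   // the postfix increment as an explicit store
    return left + right;
}

void check_f_ordered() {
    int i = 3;
    assert(f_ordered(i, i) == 6);
    assert(i == 4);
}
```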
You specifically refer to the C++11 standard so I'm going to answer with the C++11 answer. It is, however, very similar to the C++03 answer, but the definition of sequencing is different.
C++11 defines a sequenced before relation between evaluations on a single thread. It is asymmetric, transitive and pair-wise. If some evaluation A is not sequenced before some evaluation B and B is also not sequenced before A, then the two evaluations are unsequenced.
Evaluating an expression includes both value computations (working out the value of some expression) and side effects. One instance of a side effect is the modification of an object, which is the most important one for answering the question. Other things also count as side effects. If a side effect is unsequenced relative to another side effect or value computation on the same object, then your program has undefined behaviour.
So that's the set up. The first important rule is:
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.
So any full expression is fully evaluated before the next full expression. In your question, we're only dealing with one full expression, namely i = v[i++], so we don't need to worry about this. The next important rule is:
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced.
That means that in a + b, for example, the evaluation of a and b are unsequenced (they may be evaluated in any order). Now for our final important rule:
The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
So for a + b, the sequenced before relationships can be represented by a tree where a directed arrow represents the sequenced before relationship:
a + b (value computation)
^ ^
| |
a b (value computation)
If two evaluations occur in separate branches of the tree, they are unsequenced, so this tree shows that the evaluations of a and b are unsequenced relative to each other.
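The tree can be observed with operands that record when they run. In this sketch a, b and sum_with_trace are made-up functions. Because the operands are function calls, they are indeterminately sequenced rather than unsequenced: either may run first, but they do not interleave, and both finish before the addition:

```cpp
#include <cassert>
#include <string>

// Hypothetical operands that log their own evaluation.
int a(std::string& trace) { trace += 'a'; return 1; }
int b(std::string& trace) { trace += 'b'; return 2; }

// The order of the two calls is not specified; the sum is the same either way.
int sum_with_trace(std::string& trace) {
    return a(trace) + b(trace);
}

void check_trace() {
    std::string trace;
    assert(sum_with_trace(trace) == 3);
    assert(trace == "ab" || trace == "ba");
}
```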
Now, let's do the same thing to your i = v[i++] example. We make use of the fact that v[i++] is defined to be equivalent to *(v + (i++)). We also use some extra knowledge about the sequencing of postfix increment:
The value computation of the ++ expression is sequenced before the modification of the operand object.
So here we go (a node of the tree is a value computation unless specified as a side effect):
i = v[i++]
^ ^
| |
i★ v[i++] = *(v + (i++))
^
|
v + (i++)
^ ^
| |
v ++ (side effect on i)★
^
|
i
Here you can see that the side effect on i, i++, is in a separate branch to the usage of i in front of the assignment operator (I marked each of these evaluations with a ★). So we definitely have undefined behaviour! I highly recommend drawing these diagrams if you ever wonder if your sequencing of evaluations is going to cause you trouble.
So now we get the question about the fact that the value of i before the assignment operator doesn't matter, because we write over it anyway. But actually, in the general case, that's not true. We can override the assignment operator and make use of the value of the object before the assignment. The standard doesn't care that we don't use that value - the rules are defined such that having any value computation unsequenced with a side effect will be undefined behaviour. No buts. This undefined behaviour is there to allow the compiler to emit more optimized code. If we add sequencing for the assignment operator, this optimization cannot be employed.
In this example I would think that the subexpression i++ would be completely evaluated before the subexpression v[...] is evaluated, and that the result of evaluation of the subexpression is i (before the increment), but that the value of i is the incremented value after that subexpression has been completely evaluated.
The increment in i++ must be evaluated before indexing v and thus before assigning to i, but storing the value of that increment back to memory need not happen before. In the statement i = v[i++] there are two suboperations that modify i (i.e. will end up causing a store from a register into the variable i). The expression i++ is equivalent to x=i+1, i=x, and there is no requirement that both operations need to take place sequentially:
x = i+1;
y = v[i];
i = y;
i = x;
With that expansion, the result of i is unrelated to the value in v[i]. On a different expansion, the i = x assignment could take place before the i = y assignment, and the result would be i = v[i].
There are two rules.
The first rule is about multiple writes which give rise to a "write-write hazard": the same object cannot be modified more than once between two sequence points.
The second rule is about "read-write hazards". It is this: if an object is modified in an expression, and also accessed, then all accesses to its value must be for the purpose of computing the new value.
Expressions like i++ + i++ and your expression i = v[i++] violate the first rule. They modify an object twice.
An expression like i + i++ violates the second rule. The subexpression i on the left observes the value of a modified object, without being involved in the calculation of its new value.
So, i = v[i++] violates a different rule (bad write-write) from i + i++ (bad read-write).
The rules are too simplistic, which gives rise to classes of puzzling expressions. Consider this:
p = p->next = q
This appears to have a sane data flow dependency that is free of hazards: the assignment p = cannot take place until the new value is known. The new value is the result of p->next = q. The value q should not "race ahead" and get inside p, such that p->next is affected.
Yet, this expression breaks the second rule: p is modified, and also used for a purpose not related to computing its new value, namely determining the storage location where the value of q is placed!
So, perversely, compilers are allowed to partially evaluate p->next = q to determine that the result is q, and store that into p, and then go back and complete the p->next = assignment. Or so it would seem.
A key issue here is, what is the value of an assignment expression? The C standard says that the value of an assignment expression is that of the lvalue, after the assignment. But that is ambiguous: it could be interpreted as meaning "the value which the lvalue will have, once the assignment takes place" or as "the value which can be observed in the lvalue after the assignment has taken place". In C++ this is made clear by the wording "[i]n all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.", so p = p->next = q appears to be valid C++, but dubious C.
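Whatever one concludes about the one-liner, splitting it into two statements sidesteps the question in both languages. A minimal sketch, with a made-up Node type and helper link_and_advance:

```cpp
#include <cassert>

struct Node {
    int value = 0;
    Node* next = nullptr;
};

// Hypothetical helper: links q after *p, then advances p to q.
void link_and_advance(Node*& p, Node* q) {
    p->next = q;  // uses the old p
    p = q;        // only afterwards is p replaced
}

void check_link() {
    Node first;
    Node second;
    Node* p = &first;
    link_and_advance(p, &second);
    assert(first.next == &second);
    assert(p == &second);
}
```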
I would share your arguments if the example were v[++i], but since i++ modifies i as a side-effect, it is undefined as to when the value is modified. The standard could probably mandate a result one way or the other, but there's no true way of knowing what the value of i should be: (i + 1) or (v[i + 1]).
Think about the sequences of machine operations necessary for each of the following assignment statements, assuming the given declarations are in effect:
extern int *foo(void);
extern int *p;
*p = *foo();
*foo() = *p;
If the evaluation of the subscript on the left side and the value on the right side are unsequenced, the most efficient ways to process the two function calls would likely be something like:
[For *p = *foo()]
call foo (which yields result in r0 and trashes r1)
load r0 from address held in r0
load r1 from address held in p
store r0 to address held in r1
[For *foo() = *p]
call foo (which yields result in r0 and trashes r1)
load r1 from address held in p
load r1 from address held in r1
store r1 to address held in r0
In either case, if p or *p were read into a register before the call to foo, then unless "foo" promises not to disturb that register, the compiler would need to add an extra step to save its value before calling "foo", and another extra step to restore the value afterward. That extra step might be avoided by using a register that "foo" won't disturb, but that would only help if there were such a register which didn't hold a value needed by the surrounding code.
Letting the compiler read the value of "p" before or after the function call, at its leisure, will allow both patterns above to be handled efficiently. Requiring that the address of the left-hand operand of "=" always be evaluated before the right hand side would likely make the first assignment above less efficient than it otherwise could be, and requiring that the address of the left-hand operand be evaluated after the right-hand side would make the second assignment less efficient.
What does the following code print to the console?
map<int,int> m;
m[0] = m.size();
printf("%d", m[0]);
Possible answers:
The behavior of the code is not defined since it is not defined which statement m[0] or m.size() is being executed first by the compiler. So it could print 1 as well as 0.
It prints 0 because the right hand side of the assignment operator is executed first.
It prints 1 because the operator[] has the highest priority of the complete statement m[0] = m.size(). Because of this the following sequence of events occurs:
m[0] creates a new element in the map
m.size() gets called which is now 1
m[0] gets assigned the previously returned (by m.size()) 1
The real answer?, which is unknown to me^^
I believe it's unspecified whether 0 or 1 is stored in m[0], but it's not undefined behavior.
The LHS and the RHS can occur in either order, but they're both function calls, so they both have a sequence point at the start and end. There's no danger of the two of them, collectively, accessing the same object without an intervening sequence point.
The assignment is actual int assignment, not a function call with associated sequence points, since operator[] returns T&. That's briefly worrying, but it's not modifying an object that is accessed anywhere else in this statement, so that's safe too. It's accessed within operator[], of course, where it is initialized, but that occurs before the sequence point on return from operator[], so that's OK. If it wasn't, m[0] = 0; would be undefined too!
However, the order of evaluation of the operands of operator= is not specified by the standard, so the actual result of the call to size() might be 0 or 1 depending which order occurs.
The following would be undefined behavior, though. It doesn't make function calls and so there's nothing to prevent size being accessed (on the RHS) and modified (on the LHS) without an intervening sequence point:
int values[1];
int size = 0;
(++size, values[0] = 0) = size;
/* fake m[0] */ /* fake m.size() */
It does print 1, and without raising a warning(!) using gcc. It should raise a warning because it is undefined.
The precedence class of both operator[] and operator. is 2 whereas the precedence class of operator= is 16.
This means that it is well-defined that m[0] and m.size() will be executed before the assignment. However, it is not defined which one executes first.
There is no sequence point between the call to operator [] and the call to size in this statement. Consequently, the behaviour should be undefined.
Given that C++17 is pretty much here, I think it's worth mentioning that this code now exhibits well defined behavior under the new standard. For this case of = being the built-in assignment to an integer:
[expr.ass]/1:
The assignment operator (=) and the compound assignment operators all
group right-to-left. All require a modifiable lvalue as their left
operand and return an lvalue referring to the left operand. The result
in all cases is a bit-field if the left operand is a bit-field. In all
cases, the assignment is sequenced after the value computation of the
right and left operands, and before the value computation of the
assignment expression. The right operand is sequenced before the left
operand. With respect to an indeterminately-sequenced function call,
the operation of a compound assignment is a single evaluation.
Which leaves us with only one option, and that is #2.
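Code that must behave the same under every standard version can make the order explicit instead of relying on the sequencing of =. The two function names below are invented for this sketch; the casts are there because size() returns an unsigned type:

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// The size is read before the element exists, so 0 is stored.
int size_then_insert() {
    std::map<int, int> m;
    const std::size_t n = m.size();
    m[0] = static_cast<int>(n);
    return m[0];
}

// The element is created first, so the size read afterwards is 1.
int insert_then_size() {
    std::map<int, int> m;
    int& slot = m[0];                    // inserts {0, 0}
    slot = static_cast<int>(m.size());
    return m[0];
}

void check_explicit_order() {
    assert(size_then_insert() == 0);
    assert(insert_then_size() == 1);
}
```

The first function matches what the C++17 rule quoted above gives for m[0] = m.size();.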
Sorry for opening this topic again, but thinking about this topic itself has started giving me an Undefined Behavior. Want to move into the zone of well-defined behavior.
Given
int i = 0;
int v[10];
i = ++i; //Expr1
i = i++; //Expr2
++ ++i; //Expr3
i = v[i++]; //Expr4
I think of the above expressions (in that order) as
operator=(i, operator++(i)) ; //Expr1 equivalent
operator=(i, operator++(i, 0)) ; //Expr2 equivalent
operator++(operator++(i)) ; //Expr3 equivalent
operator=(i, operator[](operator++(i, 0))); //Expr4 equivalent
Now coming to behaviors here are the important quotes from C++ 0x.
$1.9/12- "Evaluation of an expression
(or a sub-expression) in general
includes both value computations
(including determining the identity of
an object for lvalue evaluation and
fetching a value previously assigned to
an object for rvalue evaluation) and
initiation of side effects."
$1.9/15- "If a side effect on a scalar
object is unsequenced relative to
either another side effect on the same
scalar object or a value
computation using the value of the
same scalar object, the behavior is
undefined."
[ Note: Value computations and side
effects associated with different
argument expressions are unsequenced.
—end note ]
$3.9/9- "Arithmetic types (3.9.1),
enumeration types, pointer types,
pointer to member types (3.9.2),
std::nullptr_t, and cv-qualified
versions of these types (3.9.3) are
collectively called scalar types."
In Expr1, the evaluation of the expression i (first argument) is unsequenced with respect to the evaluation of the expression operator++(i) (which has a side effect).
Hence Expr1 has undefined behavior.
In Expr2, the evaluation of the expression i (first argument) is unsequenced with respect to the evaluation of the expression operator++(i, 0) (which has a side effect).
Hence Expr2 has undefined behavior.
In Expr3, the evaluation of the lone argument operator++(i) is required to be complete before the outer operator++ is called.
Hence Expr3 has well defined behavior.
In Expr4, the evaluation of the expression i (first argument) is unsequenced with respect to the evaluation of operator[](operator++(i, 0)) (which has a side effect).
Hence Expr4 has undefined behavior.
Is this understanding correct?
P.S. The method of analyzing the expressions as in the OP is not correct. This is because, as @Potatoswatter notes, "clause 13.6 does not apply. See the disclaimer in 13.6/1, "These candidate functions participate in the operator overload resolution process as described in 13.3.1.2 and are used for no other purpose." They are just dummy declarations; no function-call semantics exist with respect to built-in operators."
Native operator expressions are not equivalent to overloaded operator expressions. There is a sequence point at the binding of values to function arguments, which makes the operator++() versions well-defined. But that doesn't exist for the native-type case.
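For contrast, here is what the overloaded case looks like. This is only a sketch with a made-up Counter class; since every operator applied to it is a real function call, each call finishes before the next one starts and the results are fixed. It says nothing about the built-in int versions.

```cpp
// Hypothetical class type: all operators used on it are function calls.
struct Counter {
    int value = 0;

    Counter& operator++() {    // pre-increment
        ++value;
        return *this;
    }
    Counter operator++(int) {  // post-increment, returns the old state
        Counter old = *this;
        ++value;
        return old;
    }
    // The implicitly generated operator= is still a member function.
};

int pre_then_assign() {
    Counter c;
    c = ++c;  // operator++ finishes before operator= is called
    return c.value;
}

int post_then_assign() {
    Counter c;
    c = c++;  // increment completes, then the old copy is assigned back
    return c.value;
}

int double_pre() {
    Counter c;
    ++ ++c;   // inner call returns c, outer call increments it again
    return c.value;
}
```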
In all four cases, i changes twice within the full-expression. Since no comma operator, ||, or && appears in the expressions, that's instant UB.
§5/4:
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
Edit for C++0x (updated)
§1.9/15:
The value computations of the operands of an operator are sequenced before the value computation of the result of the operator. If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
Note however that a value computation and a side effect are two distinct things. If ++i is equivalent to i = i+1, then + is the value computation and = is the side effect. From 1.9/12:
Evaluation of an expression (or a sub-expression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects.
So although the value computations are more strongly sequenced in C++0x than C++03, the side effects are not. Two side effects in the same expression, unless otherwise sequenced, produce UB.
Value computations are ordered by their data dependencies anyway and, side effects absent, their order of evaluation is unobservable, so I'm not sure why C++0x goes to the trouble of saying anything, but that just means I need to read more of the papers Boehm and friends wrote.
Edit #3:
Thanks Johannes for coping with my laziness to type "sequenced" into my PDF reader search bar. I was going to bed and getting up on the last two edits anyway… right ;v) .
§5.17/1 defining the assignment operators says
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
Also §5.3.2/1 on the preincrement operator says
If x is not of type bool, the expression ++x is equivalent to x+=1 [Note: see … addition (5.7) and assignment operators (5.17) …].
By this identity, ++ ++ x is shorthand for (x +=1) +=1. So, let's interpret that.
Evaluate the 1 on the far RHS and descend into the parens.
Evaluate the inner 1 and the value (prvalue) and address (glvalue) of x.
Now we need the value of the += subexpression.
We're done with the value computations for that subexpression.
The assignment side effect must be sequenced before the value of assignment is available!
Assign the new value to x, which is identical to the glvalue and prvalue result of the subexpression.
We're out of the woods now. The whole expression has now been reduced to x +=1.
So, then 1 and 3 are well-defined and 2 and 4 are undefined behavior, which you would expect.
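The identity used in the walkthrough above can be checked directly. A small sketch, assuming a compiler that follows the C++0x/C++11 (or later) sequencing rules; the function names are hypothetical, and some compilers may still emit a sequence-point warning for these lines:

```cpp
// ++ ++x on a built-in int.
int double_preincrement(int x) {
    ++ ++x;
    return x;
}

// The same operation spelled with compound assignment, per 5.3.2/1.
int double_compound_assign(int x) {
    (x += 1) += 1;
    return x;
}
```

Both functions work on their own copy of x, so they can be compared for the same input.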
The only other surprise I found by searching for "sequenced" in N3126 was 5.3.4/16, where the implementation is allowed to call operator new before evaluating constructor arguments. That's cool.
Edit #4: (Oh, what a tangled web we weave)
Johannes notes again that in i == ++i; the glvalue (a.k.a. the address) of i is ambiguously dependent on ++i. The glvalue is certainly a value of i, but I don't think 1.9/15 is intended to include it for the simple reason that the glvalue of a named object is constant, and cannot actually have dependencies.
For an informative strawman, consider
( i % 2? i : j ) = ++ i; // certainly undefined
Here, the glvalue of the LHS of = is dependent on a side-effect on the prvalue of i. The address of i is not in question; the outcome of the ?: is.
Perhaps a good counterexample is
int i = 3, &j = i;
j = ++ i;
Here j has a glvalue distinct from (but identical to) i. Is this well-defined, yet i = ++i is not? This represents a trivial transformation that a compiler could apply to any case.
1.9/15 should say
If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the prvalue of the same scalar object, the behavior is undefined.
In thinking about expressions like those mentioned, I find it useful to imagine a machine where memory has interlocks so that reading a memory location as part of a read-modify-write sequence will cause any attempted read or write, other than the concluding write of the sequence, to be stalled until the sequence completes. Such a machine would hardly be an absurd concept; indeed, such a design could simplify many multi-threaded code scenarios. On the other hand, an expression like "x=y++;" could fail on such a machine if 'x' and 'y' were references to the same variable, and the compiler's generated code did something like read-and-lock reg1=y; reg2=reg1+1; write x=reg1; write-and-unlock y=reg2. That would be a very reasonable code sequence on processors where writing a newly-computed value would impose a pipeline delay, but the write to x would lock up the processor if y were aliased to the same variable.
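When x and y might alias, the hazard described above can be avoided in source code by spelling out each read and write as its own statement. A minimal sketch; assign_post_increment is a hypothetical helper, and the order chosen here (finish the increment, then store the old value) is, as far as I can tell, the one C++17 prescribes for x = y++ on built-in types:

```cpp
// Hypothetical helper: the effect of "x = y++", one access per statement.
void assign_post_increment(int& x, int& y) {
    const int old = y;  // read y exactly once
    y = old + 1;        // complete the increment
    x = old;            // then store the old value; fine even if x aliases y
}
```

With distinct variables, x receives the old value of y and y is incremented. When both references name the same variable, it ends up holding the old value.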