Why is *ptr = (*ptr)++ Undefined Behavior - c++

I am trying to learn how to explain the cause of(if any) of undefined behavior in the following cases(given below).
int i = 0, *ptr = &i;
i = ++i; //is this UB? If yes then why according to C++11
*ptr = (*ptr)++; //i think this is UB but i am unable to explain exactly why is this so
*ptr = ++(*ptr); //i think this is not UB but can't explain why
I have looked at many SO posts describing UB for different pointer cases similar to the cases above, but still i am unable to explain why exactly(like using which point(s) from the standard we can prove that they will result in UB) they result in UB.
I am looking for explanations according to C++11(or C++14) but not C++17 and not Pre-C++11.

Undefined behavior stems from this:
C++11 [intro.execution]/15 Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced... If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
C++17 [intro.execution]/17 Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced... If a side effect on a memory location (4.4) is unsequenced relative to either another side effect on the same memory location or a value computation using the value of any object in the same memory location, and they are not potentially concurrent (4.7), the behavior is undefined.
This text is similar. The main difference lies in "except where noted" part; in C++17, the order of evaluation of operands is specified for more operators than in C++11. Thus:
C++17 [expr.ass]/1 In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression. The right operand is sequenced before the left operand.
C++11 lacks the bolded part. This part is what makes i = i++ well-defined in C++17, but undefined in C++11. That's because for postfix increment, the side effect is not part of a value computation of the expression:
C++11 and C++17 [expr.post.incr]/1 The value computation of the ++ expression is sequenced before the modification of the operand object.
So "the assignment is sequenced after the value computation of the right and left operands" is not by itself sufficient: the assignment is sequenced after the value computation of i++, and the side effect is also sequenced after that same value computation, but nothing says how they are sequenced relative to each other. Therefore, they are unsequenced, and they are both modifying the same object (here, i). This exhibits undefined behavior.
The addition of "the right operand is sequenced before the left operand" in C++17 means that the side effect of i++ is sequenced before the value computation of i, and both are sequenced before the assignment.
On the other hand, for pre-increment the side effect is necessarily part of the evaluation of the expression:
C++11 and C++17 [expr.pre.incr]/1 ... The result is the updated operand; it is an lvalue ...
So the value computation of ++i involves incrementing i first, and then applying an lvalue-to-rvalue conversion to obtain the updated value. This value computation is sequenced before the assignment in both C++11 and C++17, and so the two side effects on i are sequenced relative to each other; no undefined behavior.
Nothing changes in this analysis if i is replaced with (*ptr). That's just another way to refer to the same object or memory location.

The C++ Standard is based upon the C Standard, whose authors didn't need any particular "reason" to say that implementations may process a construct in whatever fashion would be most useful to their customers [which is what they intended the phrase "Undefined Behavior" to mean]. Many platforms can cheaply guarantee, for small primitive types, that race conditions involving a read and conflicting write to the same object will always yield either old or new data, and that race conditions involving conflicting writes will result every individual subsequent read seeing one of the written values. Rather than trying to identify all of the cases where implementations should or should not be expected to uphold guarantee, the Standard allows implementations to, at their leisure, process code "in a documented manner characteristic of the environment". Because it's not practical for all implementations to offer such guarantees in all possible scenarios, and because the range of scenarios where such guarantees would be practical would be different on different platforms, the authors of the Standard allowed implementations to weigh the pros and cons of offering various behavioral guarantees on their particular target platforms, rather than trying to write precise rules that would be appropriate for all possible implementations.
Note also that if one were to do something like:
*p = (*q)++;
return q[0] + q[i]; // where 'i' is some object of type `int`.
when p and q are equal and i is zero, a compiler might quite plausibly generate code where the assignment would undo the effect of the increment, but which would return the sum of the old value of q, plus 1, plus the actual stored value of q (which would be the old value, rather than the incremented value). Although this would be a logical consequence of the specified race-condition semantics, trying to specify it precisely would have been sufficiently awkward that the Standard simply allows implementations to specify the behavior as tightly or loosely as they see fit.

Related

Strange behavior of multiplying a pre-increment variable to itself in c++

sorry if my question is duplicate or not worth answering.
I have been the following code that produces the result that I am not understanding how.
int x=5;
int y;
y = ++x * ++x;
cout<<x <<endl;
cout<<y;
According to my little understanding of programming, the value of y should be 42, but, the output on the computer is 49. Kindly help what will be the output of variable y.
I am executing the code in Dev-C++.
Thanks in advance.
The short answer is that you have an undefined behavior in there.
The exact answer depends on which standard you use.
Pre C++11, we had the notion of sequence points. Intuitively, they are points in which all the values have been properly stored, such as at the end of statement (at the semicolon). The standard says that
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an
expression.
, which means that between 2 sequence points (for the sake of simplicity, read as between 2 semicolons), the value of a variable cannot be changed more than once. You change the value twice, using the increment operator.
C++11 removed the notion of sequence points with relationships of sequenced before, sequenced after or unsequenced, referring to the order in which the expressions are evaluated.
In arithmetic expressions,
If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or [...] the behaviour is undefined.
and there is no sequencing between the operators of an arithmetic expression. Therefore, it is still a case of undefined behavior, just for another reason.
This means in practice that the compiler can choose what to do, and, in your case, it produces the results you observe. You should try to avoid undefined behaviour in your programs as much as possible. Below are some references that expand on the subject:
https://en.cppreference.com/w/cpp/language/eval_order
Undefined behavior and sequence points

c++11 Order of evaluation (undefined behavior)

vec[ival++] <= vec[ival]
This expression has undefined behavior, because the order of evaluation of operator's (<=) operands is undefined.
How can we rewrite that expression to avoid the undefined behavior?
I've found an answer that appears to work:
vec[ival] <= vec[ival + 1]
If that is the right way to avoid the undefined behavior, why does rewriting it that way avoid the undefined behavior?
Adding any reference about how to fix that expression would be great.
Yes, your first example has undefined behavior because we have an unsequenced modification and access of a memory location, which is undefined behavior. This is covered in draft C++ standard [intro.execution]p10 (emphasis mine):
Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced.
[ Note: In an expression that is evaluated more than once during the
execution of a program, unsequenced and indeterminately sequenced
evaluations of its subexpressions need not be performed consistently
in different evaluations. — end note  ] The value computations of the
operands of an operator are sequenced before the value computation of
the result of the operator. If a side effect on a memory location
([intro.memory]) is unsequenced relative to either another side effect
on the same memory location or a value computation using the value of
any object in the same memory location, and they are not potentially
concurrent ([intro.multithread]), the behavior is undefined. [ Note:
The next subclause imposes similar, but more complex restrictions on
potentially concurrent computations. — end note  ] [ Example:
void g(int i) {
i = 7, i++, i++; // i becomes 9
i = i++ + 1; // the value of i is incremented
i = i++ + i; // the behavior is undefined
i = i + 1; // the value of i is incremented
}
— end example  ]
If we check out the section of relational operators which covers <= [expr.rel] it does not specify an order of evaluation, so we are covered by intro.exection and thus we have undefined behavior.
Having unspecified order of evaluation is not sufficient for undefined behavior as the example in Order of evaluation of assignment statement in C++ demonstrates.
Your second example avoids the undefined behavior since you are not introducing a side effect to ival, you are just reading the memory location twice.
I believe that is a reasonable way to solve the problem, it is readable and not surprising. An alternate method could include introducing a second variable, e.g. index and prev_index. It is hard to come with a fast rule given such a small code snippet.
It avoids undefined behavior because you are not changing the value of ival. The issue you're seeing in the first sample is that we can't determine what the values of ival are at the times that they're used. In the second sample, there's no confusion.
Let's start with the worst problem first, and that is the Undefined Behavior. C++ uses sequencing rules. Statements are executed in seqeuence. Usually that's top-to-bottom, but if statements, for statements, function calls and the like can change that.
Now withing a statement there still might be a further sequence of execution, but I'm very intentionally writing might. By defaults, the various parts of a single statement are not sequenced relative to each other. That's why you can get the varying order of execution. But worse, if you change and use a single object without sequencing, you have Undefined Behavior. That is bad - anything might happen.
The proposed solution (iVal + 1) doesn't change iVal anymore, but generates a new value. That is entirely safe. Still, it leaves iVal unchanged.
You may want to check on std::adjacent_find(). Chances are that the loop you're trying to write is already in the Standard Library.
The first problem is that as the initial code exhibited undefined behavior, under the C++ standard there is no "fix". The behavior of that line of code is not specified by the C++ standard; to know what it is supposed to do you have to have another source of information.
Formally, that expression can be rewritten as system("format c:"), as the C++ standard does not mandate any behavior from a program that exhibits undefined behavior.
But in practice, when you run into something like that, you have to read the original programmer's mind.
Well, you can solve anything with lambdas.
[&]{ bool ret = vec[ival] <= vec[ival+1]; ++ival; return ret; }()
Second,
vec[ival] <= vec[ival+1]
is not the same, because it lacks the side effect of ival being 1 greater after the expression is evaluated.

C/C++ unions and undefined behaviour

Is the following undefined behaviour?
union {
int foo;
float bar;
} baz;
baz.foo = 3.14 * baz.bar;
I remember that writing and reading from the same underlying memory between two sequence points is UB, but I am not certain.
I remember that writing and reading from the same underlying memory between two sequence points is UB, but I am not certain.
Reading and writing to the same memory location in the same expression does not invoke undefined behavior until and unless that location is modified more than once between two sequence points or the side effect is unsequenced relative to the value computation using the value at the same location.
C11: 6.5 Expressions:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. [...]
The expression
baz.foo = 3.14 * baz.bar;
has well defined behaviour if bar is initialized before. The reason is that the side effect to baz.foo is sequenced relative to the value computations of the objects baz.foo and baz.bar.
C11: 6.5.16/3 Assignment operators:
[...] The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.
Disclaimer: This answer addresses C++.
You're accessing an object whose lifetime hasn't begun yet - baz.bar - which induces UB by [basic.life]/(6.1).
Assuming bar has been brought to life (e.g. by initializing it), your code is fine; before the assignment, foo need not be alive as no operation is performed that depends on its value, and during it, the active member is changed by reusing the memory and effectively initializing it. The current rules aren't clear about the latter; see CWG #1116. However, the status quo is that such assignments are indeed setting the target member as active (=alive).
Note that the assignment is sequenced (i.e. guaranteed to happen) after the value computation of the operands - see [expr.ass]/1.
Answering for C, not C++
I thought this was Defined Behavior, but then I read the following paragraph from ISO C2x (which I guess is also present in older C standards, but didn't check):
6.5.16.1/3 (Assignment operators::Simple Assignment::Semantics):
If the value being stored in an object is read from another object
that overlaps in any way the storage of the first object, then the
overlap shall be exact and the two objects shall have qualified or
unqualified versions of a compatible type; otherwise, the behavior is
undefined.
So, let's consider the following:
union {
int a;
const int b;
} db;
union {
int a;
float b;
} ub1;
union {
uint32_t a;
int32_t b;
} ub2;
Then, it is Defined Behavior to do:
db.a = db.b + 1;
But it is Undefined Behavior to do:
ub1.a = ub1.b + 1;
or
ub2.a = ub2.b + 1;
The definition of compatible types is in 6.2.7/1 (Compatible type and composite type). See also: __builtin_types_compatible_p().
The Standard uses the phrase "Undefined Behavior", among other things, as a catch-all for situations where many implementations would process a construct in at least somewhat predictable fashion (e.g. yielding a not-necessarily-predictable value without side effects), but where the authors of the Standard thought it impractical to try to anticipate everything that implementations might do. It wasn't intended as an invitation for implementations to behave gratuitously nonsensically, nor as an indication that code was erroneous (the phrase "non-portable or erroneous" was very much intended to include constructs that might fail on some machines, but would be correct on code which was not intended to be suitable for use with those machines).
On some platforms like the 8051, if a compiler were given a construct like someInt16 += *someUnsignedCharPtr << 4; the most efficient way to process it if it didn't have to accommodate the possibility that the pointer might point to the lower byte of someInt16 would be to fetch *someUnsignedCharPtr, shift it left four bits, add it to the LSB of someInt16 (capturing the carry), reload *someUnsignedCharPtr, shift it right four bits, and add it along with the earlier carry to the MSB of someInt16. Loading the value from *someUnsignedCharPtr twice would be faster than loading it, storing its value to a temporary location before doing the shift, and then having to load its value from that temporary location. If, however, someUnsignedCharPtr were to point to the lower byte of someInt16, then the modification of that lower byte before the second load of someUnsignedCharPtr would corrupt the upper bits of that byte which would, after shifing, be added to the upper byte of someInt16.
The Standard would allow a compiler to generate such code, even though character pointers are exempt from aliasing rules, because it does not require that compilers handle all situations where unsequenced reads and writes affect regions of storage that partially overlap. If such accesses were performed usinng a union instead of a character pointer, a compiler might recognize that the character-type access would always overlap the least significant byte of the 16-bit value, but I don't think the authors of the Standard wanted to require that compilers invest the time and effort that might be necessary to handle such obscure cases meaningfully.

Why is f(i = -1, i = -1) undefined behavior?

I was reading about order of evaluation violations, and they give an example that puzzles me.
1) If a side effect on a scalar object is un-sequenced relative to another side effect on the same scalar object, the behavior is undefined.
// snip
f(i = -1, i = -1); // undefined behavior
In this context, i is a scalar object, which apparently means
Arithmetic types (3.9.1), enumeration types, pointer types, pointer to member types (3.9.2), std::nullptr_t, and cv-qualified versions of these types (3.9.3) are collectively called scalar types.
I don’t see how the statement is ambiguous in that case. It seems to me that regardless of if the first or second argument is evaluated first, i ends up as -1, and both arguments are also -1.
Can someone please clarify?
UPDATE
I really appreciate all the discussion. So far, I like #harmic’s answer a lot since it exposes the pitfalls and intricacies of defining this statement in spite of how straight forward it looks at first glance. #acheong87 points out some issues that come up when using references, but I think that's orthogonal to the unsequenced side effects aspect of this question.
SUMMARY
Since this question got a ton of attention, I will summarize the main points/answers. First, allow me a small digression to point out that "why" can have closely related yet subtly different meanings, namely "for what cause", "for what reason", and "for what purpose". I will group the answers by which of those meanings of "why" they addressed.
for what cause
The main answer here comes from Paul Draper, with Martin J contributing a similar but not as extensive answer. Paul Draper's answer boils down to
It is undefined behavior because it is not defined what the behavior is.
The answer is overall very good in terms of explaining what the C++ standard says. It also addresses some related cases of UB such as f(++i, ++i); and f(i=1, i=-1);. In the first of the related cases, it's not clear if the first argument should be i+1 and the second i+2 or vice versa; in the second, it's not clear if i should be 1 or -1 after the function call. Both of these cases are UB because they fall under the following rule:
If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.
Therefore, f(i=-1, i=-1) is also UB since it falls under the same rule, despite that the intention of the programmer is (IMHO) obvious and unambiguous.
Paul Draper also makes it explicit in his conclusion that
Could it have been defined behavior? Yes. Was it defined? No.
which brings us to the question of "for what reason/purpose was f(i=-1, i=-1) left as undefined behavior?"
for what reason / purpose
Although there are some oversights (maybe careless) in the C++ standard, many omissions are well-reasoned and serve a specific purpose. Although I am aware that the purpose is often either "make the compiler-writer's job easier", or "faster code", I was mainly interested to know if there is a good reason leave f(i=-1, i=-1) as UB.
harmic and supercat provide the main answers that provide a reason for the UB. Harmic points out that an optimizing compiler that might break up the ostensibly atomic assignment operations into multiple machine instructions, and that it might further interleave those instructions for optimal speed. This could lead to some very surprising results: i ends up as -2 in his scenario! Thus, harmic demonstrates how assigning the same value to a variable more than once can have ill effects if the operations are unsequenced.
supercat provides a related exposition of the pitfalls of trying to get f(i=-1, i=-1) to do what it looks like it ought to do. He points out that on some architectures, there are hard restrictions against multiple simultaneous writes to the same memory address. A compiler could have a hard time catching this if we were dealing with something less trivial than f(i=-1, i=-1).
davidf also provides an example of interleaving instructions very similar to harmic's.
Although each of harmic's, supercat's and davidf' examples are somewhat contrived, taken together they still serve to provide a tangible reason why f(i=-1, i=-1) should be undefined behavior.
I accepted harmic's answer because it did the best job of addressing all meanings of why, even though Paul Draper's answer addressed the "for what cause" portion better.
other answers
JohnB points out that if we consider overloaded assignment operators (instead of just plain scalars), then we can run into trouble as well.
Since the operations are unsequenced, there is nothing to say that the instructions performing the assignment cannot be interleaved. It might be optimal to do so, depending on CPU architecture. The referenced page states this:
If A is not sequenced before B and B is not sequenced before A, then
two possibilities exist:
evaluations of A and B are unsequenced: they may be performed in any order and may overlap (within a single thread of execution, the
compiler may interleave the CPU instructions that comprise A and B)
evaluations of A and B are indeterminately-sequenced: they may be performed in any order but may not overlap: either A will be complete
before B, or B will be complete before A. The order may be the
opposite the next time the same expression is evaluated.
That by itself doesn't seem like it would cause a problem - assuming that the operation being performed is storing the value -1 into a memory location. But there is also nothing to say that the compiler cannot optimize that into a separate set of instructions that has the same effect, but which could fail if the operation was interleaved with another operation on the same memory location.
For example, imagine that it was more efficient to zero the memory, then decrement it, compared with loading the value -1 in. Then this:
f(i=-1, i=-1)
might become:
clear i
clear i
decr i
decr i
Now i is -2.
It is probably a bogus example, but it is possible.
First, "scalar object" means a type like a int, float, or a pointer (see What is a scalar Object in C++?).
Second, it may seem more obvious that
f(++i, ++i);
would have undefined behavior. But
f(i = -1, i = -1);
is less obvious.
A slightly different example:
int i;
f(i = 1, i = -1);
std::cout << i << "\n";
What assignment happened "last", i = 1, or i = -1? It's not defined in the standard. Really, that means i could be 5 (see harmic's answer for a completely plausible explanation for how this chould be the case). Or you program could segfault. Or reformat your hard drive.
But now you ask: "What about my example? I used the same value (-1) for both assignments. What could possibly be unclear about that?"
You are correct...except in the way the C++ standards committee described this.
If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.
They could have made a special exception for your special case, but they didn't. (And why should they? What use would that ever possibly have?) So, i could still be 5. Or your hard drive could be empty. Thus the answer to your question is:
It is undefined behavior because it is not defined what the behavior is.
(This deserves emphasis because many programmers think "undefined" means "random", or "unpredictable". It doesn't; it means not defined by the standard. The behavior could be 100% consistent, and still be undefined.)
Could it have been defined behavior? Yes. Was it defined? No. Hence, it is "undefined".
That said, "undefined" doesn't mean that a compiler will format your hard drive...it means that it could and it would still be a standards-compliant compiler. Realistically, I'm sure g++, Clang, and MSVC will all do what you expected. They just wouldn't "have to".
A different question might be Why did the C++ standards committee choose to make this side-effect unsequenced?. That answer will involve history and opinions of the committee. Or What is good about having this side-effect unsequenced in C++?, which permits any justification, whether or not it was the actual reasoning of the standards committee. You could ask those questions here, or at programmers.stackexchange.com.
A practical reason to not make an exception from the rules just because the two values are the same:
// config.h
#define VALUEA 1
// defaults.h
#define VALUEB 1
// prog.cpp
f(i = VALUEA, i = VALUEB);
Consider the case this was allowed.
Now, some months later, the need arises to change
#define VALUEB 2
Seemingly harmless, isn't it? And yet suddenly prog.cpp wouldn't compile anymore.
Yet, we feel that compilation should not depend on the value of a literal.
Bottom line: there is no exception to the rule because it would make successful compilation depend on the value (rather the type) of a constant.
EDIT
#HeartWare pointed out that constant expressions of the form A DIV B are not allowed in some languages, when B is 0, and cause compilation to fail. Hence changing of a constant could cause compilation errors in some other place. Which is, IMHO, unfortunate. But it is certainly good to restrict such things to the unavoidable.
The confusion is that storing a constant value into a local variable is not one atomic instruction on every architecture the C is designed to be run on. The processor the code runs on matters more than the compiler in this case. For example, on ARM where each instruction can not carry a complete 32 bits constant, storing an int in a variable needs more that one instruction. Example with this pseudo code where you can only store 8 bits at a time and must work in a 32 bits register, i is a int32:
reg = 0xFF; // first instruction
reg |= 0xFF00; // second
reg |= 0xFF0000; // third
reg |= 0xFF000000; // fourth
i = reg; // last
You can imagine that if the compiler wants to optimize it may interleave the same sequence twice, and you don't know what value will get written to i; and let's say that he is not very smart:
reg = 0xFF;
reg |= 0xFF00;
reg |= 0xFF0000;
reg = 0xFF;
reg |= 0xFF000000;
i = reg; // writes 0xFF0000FF == -16776961
reg |= 0xFF00;
reg |= 0xFF0000;
reg |= 0xFF000000;
i = reg; // writes 0xFFFFFFFF == -1
However in my tests gcc is kind enough to recognize that the same value is used twice and generates it once and does nothing weird. I get -1, -1
But my example is still valid as it is important to consider that even a constant may not be as obvious as it seems to be.
Behavior is commonly specified as undefined if there is some conceivable reason why a compiler which was trying to be "helpful" might do something which would cause totally unexpected behavior.
In the case where a variable is written multiple times with nothing to ensure that the writes happen at distinct times, some kinds of hardware might allow multiple "store" operations to be performed simultaneously to different addresses using a dual-port memory. However, some dual-port memories expressly forbid the scenario where two stores hit the same address simultaneously, regardless of whether or not the values written match. If a compiler for such a machine notices two unsequenced attempts to write the same variable, it might either refuse to compile or ensure that the two writes cannot get scheduled simultaneously. But if one or both of the accesses is via a pointer or reference, the compiler might not always be able to tell whether both writes might hit the same storage location. In that case, it might schedule the writes simultaneously, causing a hardware trap on the access attempt.
Of course, the fact that someone might implement a C compiler on such a platform does not suggest that such behavior shouldn't be defined on hardware platforms when using stores of types small enough to be processed atomically. Trying to store two different values in unsequenced fashion could cause weirdness if a compiler isn't aware of it; for example, given:
uint8_t v; // Global
void hey(uint8_t *p)
{
moo(v=5, (*p)=6);
zoo(v);
zoo(v);
}
if the compiler in-lines the call to "moo" and can tell it doesn't modify
"v", it might store a 5 to v, then store a 6 to *p, then pass 5 to "zoo",
and then pass the contents of v to "zoo". If "zoo" doesn't modify "v",
there should be no way the two calls should be passed different values,
but that could easily happen anyway. On the other hand, in cases where
both stores would write the same value, such weirdness could not occur and
there would on most platforms be no sensible reason for an implementation
to do anything weird. Unfortunately, some compiler writers don't need any
excuse for silly behaviors beyond "because the Standard allows it", so even
those cases aren't safe.
C++17 defines stricter evaluation rules. In particular, it sequences function arguments (although in unspecified order).
N5659 §4.6:15
Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A,
but it is unspecified which. [ Note: Indeterminately sequenced evaluations cannot overlap, but either could
be executed first. —end note ]
N5659 § 8.2.2:5
The
initialization of a parameter, including every associated value computation and side effect, is indeterminately
sequenced with respect to that of any other parameter.
It allows some cases which would be UB before:
f(i = -1, i = -1); // value of i is -1
f(i = -1, i = -2); // value of i is either -1 or -2, but not specified which one
The fact that the result would be the same in most implementations in this case is incidental; the order of evaluation is still undefined. Consider f(i = -1, i = -2): here, order matters. The only reason it doesn't matter in your example is the accident that both values are -1.
Given that the expression is specified as one with an undefined behaviour, a maliciously compliant compiler might display an inappropriate image when you evaluate f(i = -1, i = -1) and abort the execution - and still be considered completely correct. Luckily, no compilers I am aware of do so.
It looks to me like the only rule pertaining to sequencing of function argument expression is here:
3) When calling a function (whether or not the function is inline, and whether or not explicit function call syntax is used), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function.
This does not define sequencing between argument expressions, so we end up in this case:
1) If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.
In practice, on most compilers, the example you quoted will run fine (as opposed to "erasing your hard disk" and other theoretical undefined behavior consequences).
It is, however, a liability, as it depends on specific compiler behaviour, even if the two assigned values are the same. Also, obviously, if you tried to assign different values, the results would be "truly" undefined:
void f(int l, int r) {
return l < -1;
}
auto b = f(i = -1, i = -2);
if (b) {
formatDisk();
}
The assignment operator could be overloaded, in which case the order could matter:
struct A {
bool first;
A () : first (false) {
}
const A & operator = (int i) {
first = !first;
return * this;
}
};
void f (A a1, A a2) {
// ...
}
// ...
A i;
f (i = -1, i = -1); // the argument evaluated first has ax.first == true
Actually, there's a reason not to depend on the fact that compiler will check that i is assigned with the same value twice, so that it's possible to replace it with single assignment. What if we have some expressions?
void g(int a, int b, int c, int n) {
int i;
// hey, compiler has to prove Fermat's theorem now!
f(i = 1, i = (ipow(a, n) + ipow(b, n) == ipow(c, n)));
}
This is just answering the "I'm not sure what "scalar object" could mean besides something like an int or a float".
I would interpret the "scalar object" as a abbreviation of "scalar type object", or just "scalar type variable". Then, pointer, enum (constant) are of scalar type.
This is a MSDN article of Scalar Types.

C + C++ = undefined behavior? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How do we explain the result of the expression (++x)+(++x)+(++x)?
Undefined Behavior and Sequence Points
I have the problem, when the code
U = C + C++;
Runs in different way for standart types and for my own types.
I have an example http://ideone.com/4S1uA where I have different values for int and my class Int, which should represent the way real Int works.
Is it possible to make my class behave the same way, as the standard int works? Has this code undefined behavior?
WHY it is undefiend behaivior? C++ has an operation priorities, so the c++ should be evaluated first, as it change the value of a, so for addition as first argument should be passed new value of a and as the second the old value. And it's works this way for class Int, but not for standart int.
Has this code undefined behavior?
Yes. The order in which the operands are evaluated, with respect to the side effect, is undefined.
Section 6.5(2) of the standard says:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an unsequenced side
effect occurs in any of the orderings.
Since int is a scalar type, and since the side effect here is unsequenced, the behavior is undefined.
You should write your code like this:
U = 2*C;
C++;
Yes, that is undefined behavior. You can't access a variable twice in a statement that also modifies it because the order in which the expression 'C' and the expression 'C++' are evaluated is not defined.
The concept involved here is one of sequence points. To quote the opening sentence from the Wikipedia article:
A sequence point in imperative programming defines any point in a computer program's execution at which it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been performed.
In C, the + operator does not create a sequence point. Therefore the order of side effects is not defined. However, in C++, an overloaded operator + is a function call, which does create a sequence point. This creates different behavior with respect to side effects. Note that while the order in which function arguments are evaluated is not specified, all side effects are completed before the function enters. So if C + C++ involves an overloaded + operator, then the C++ side effect will have been applied to the left argument of + before the + function executes. This is unlike the case for int values, where the left side may or may not be evaluated before the side effect of the right side is complete.