Is this code behavior defined? - c++

What does the following code print to the console?
map<int,int> m;
m[0] = m.size();
printf("%d", m[0]);
Possible answers:
The behavior of the code is not defined since it is not defined which statement m[0] or m.size() is being executed first by the compiler. So it could print 1 as well as 0.
It prints 0 because the right hand side of the assignment operator is executed first.
It prints 1 because the operator[] has the highest priority of the complete statement m[0] = m.size(). Because of this the following sequence of events occurs:
m[0] creates a new element in the map
m.size() gets called which is now 1
m[0] gets assigned the previously returned (by m.size()) 1
The real answer?, which is unknown to me^^

I believe it's unspecified whether 0 or 1 is stored in m[0], but it's not undefined behavior.
The LHS and the RHS can occur in either order, but they're both function calls, so they both have a sequence point at the start and end. There's no danger of the two of them, collectively, accessing the same object without an intervening sequence point.
The assignment is actual int assignment, not a function call with associated sequence points, since operator[] returns T&. That's briefly worrying, but it's not modifying an object that is accessed anywhere else in this statement, so that's safe too. It's accessed within operator[], of course, where it is initialized, but that occurs before the sequence point on return from operator[], so that's OK. If it wasn't, m[0] = 0; would be undefined too!
However, the order of evaluation of the operands of operator= is not specified by the standard, so the actual result of the call to size() might be 0 or 1 depending which order occurs.
The following would be undefined behavior, though. It doesn't make function calls and so there's nothing to prevent size being accessed (on the RHS) and modified (on the LHS) without an intervening sequence point:
int values[1];
int size = 0;
(++size, values[0] = 0) = size;
/* fake m[0] */ /* fake m.size() */

It does print 1, and without raising a warning(!) using gcc. It should raise a warning because it is undefined.
The precedence class of both operator[] and operator. is 2 whereas the precedence class of operator= is 16.
This means that it is well-defined that m[0] and m.size() will be executed before the assignment. However, it is not defined which one executes first.

There is no sequence point between the call to operator [] and the call to clear in this statement. Consequently, the behaviour should be undefined.

Given that C++17 is pretty much here, I think it's worth mentioning that this code now exhibits well defined behavior under the new standard. For this case of = being the built-in assignment to an integer:
[expr.ass]/1:
The assignment operator (=) and the compound assignment operators all
group right-to-left. All require a modifiable lvalue as their left
operand and return an lvalue referring to the left operand. The result
in all cases is a bit-field if the left operand is a bit-field. In all
cases, the assignment is sequenced after the value computation of the
right and left operands, and before the value computation of the
assignment expression. The right operand is sequenced before the left
operand. With respect to an indeterminately-sequenced function call,
the operation of a compound assignment is a single evaluation.
Which leaves us with only one option, and that is #2.

Related

Does the modification of the LHS always happen strictly after the RHS has been evaluated during assignment for built-in type? [duplicate]

map<int, int> mp;
printf("%d ", mp.size());
mp[10]=mp.size();
printf("%d\n", mp[10]);
This code yields an answer that is not very intuitive:
0 1
I understand why it happens - the left side of the assignment returns reference to mp[10]'s underlying value and at the same time creates aforementioned value, and only then is the right side evaluated, using the newly computed size() of the map.
Is this behaviour stated anywhere in C++ standard? Or is the order of evaluation undefined?
Result was obtained using g++ 5.2.1.
Yes, this is covered by the standard and it is unspecified behavior. This particular case is covered in a recent C++ standards proposal: N4228: Refining Expression Evaluation Order for Idiomatic C++ which seeks to refine the order of evaluation rules to make it well specified for certain cases.
It describes this problem as follows:
Expression evaluation order is a recurring discussion topic in the C++
community. In a nutshell, given an expression such as f(a, b,
c), the order in which the sub-expressions f, a, b, c are evaluated is left unspecified by the standard. If any two of these sub-expressions happen to modify the same object without intervening sequence points, the behavior of the program is undefined. For instance, the expression f(i++, i) where i is an
integer variable leads to undefined behavior , as does v[i]
= i++. Even when the behavior is not undefined, the result of evaluating an expression can still be anybody’s guess. Consider
the following program fragment:
#include <map>
int main() {
std::map<int, int> m;
m[0] = m.size(); // #1
}
What should the map object m look like after evaluation of the
statement marked #1? { {0, 0 } } or {{0, 1 } } ?
We know that unless specified the evaluations of sub-expressions are unsequenced, this is from the draft C++11 standard section 1.9 Program execution which says:
Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced.[...]
and all the section 5.17 Assignment and compound assignment operators [expr.ass] says is:
[...]In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.[...]
So this section does not nail down the order of evaluation but we know this is not undefined behavior since both operator [] and size() are function calls and section 1.9 tells us(emphasis mine):
[...]When calling a function (whether or not the function is inline), every value computation and side effect
associated with any argument expression, or with the postfix expression designating the called function, is
sequenced before execution of every expression or statement in the body of the called function. [ Note: Value
computations and side effects associated with different argument expressions are unsequenced. —end note ]
Every evaluation in the calling function (including other function calls) that is not otherwise specifically
sequenced before or after the execution of the body of the called function is indeterminately sequenced with
respect to the execution of the called function.9[...]
Note, I cover the second interesting example from the N4228 proposal in the question Does this code from “The C++ Programming Language” 4th edition section 36.3.6 have well-defined behavior?.
Update
It seems like a revised version of N4228 was accepted by the Evolution Working Group at the last WG21 meeting but the paper(P0145R0) is not yet available. So this could possibly no longer be unspecified in C++17.
Update 2
Revision 3 of p0145 made this specified and update [expr.ass]p1:
The assignment operator (=) and the compound assignment operators all group right-to-left.
All require a modifiable lvalue as their left operand; their result is an lvalue referring to the left operand.
The result in all cases is a bit-field if the left operand is a bit-field.
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. The right operand is sequenced before the left operand. ...
From the C++11 standard (emphasis mine):
5.17 Assignment and compound assignment operators
1 The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue referring to the left operand. The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
Whether the left operand is evaluated first or the right operand is evaluated first is not specified by the language. A compiler is free to choose to evaluate either operand first. Since the final result of your code depends on the order of evaluation of the operands, I would say it is unspecified behavior rather than undefined behavior.
1.3.25 unspecified behavior
behavior, for a well-formed program construct and correct data, that depends on the implementation
I'm sure that the standard does not specify for an expression x = y; which order x or y is evaluated in the C++ standard (this is the reason why you can't do *p++ = *p++ for example, because p++ is not done in a defined order).
In other words, to guarantee order x = y; in a defined order, you need to do break it up into two sequence points.
T tmp = y;
x = tmp;
(Of course, in this particular case, one might presume the compiler prefers to do operator[] before size() because it can then store the value directly into the result of operator[] instead of keeping it in a temporary place, to store it later after operator[] has been evaluated - but I'm pretty sure the compiler doesn't NEED to do it in that order)
Let's take a look at what your code breaks down to:
mp.operator[](10).operator=(mp.size());
which pretty much tells the story that in the first part an entry to 10 is created and in the second part the size of the container is assigned to the integer reference in position of 10.
But now you get into the order of evaluation problem which is unspecified. Here is a much simpler example .
When should map::size() get called, before or after map::operator(int const &); ?
Nobody really knows.

the definition of lvalue is different between C and C++? [duplicate]

I have been fooling around with some code and saw something that I don't understand the "why" of.
int i = 6;
int j;
int *ptr = &i;
int *ptr1 = &j
j = i++;
//now j == 6 and i == 7. Straightforward.
What if you put the operator on the left side of the equals sign?
++ptr = ptr1;
is equivalent to
(ptr = ptr + 1) = ptr1;
whereas
ptr++ = ptr1;
is equivalent to
ptr = ptr + 1 = ptr1;
The postfix runs a compilation error and I get it. You've got a constant "ptr + 1" on the left side of an assignment operator. Fair enough.
The prefix one compiles and WORKS in C++. Yes, I understand it's messy and you're dealing with unallocated memory, but it works and compiles. In C this does not compile, returning the same error as the postfix "lvalue required as left operand of assignment". This happens no matter how it's written, expanded out with two "=" operators or with the "++ptr" syntax.
What is the difference between how C handles such an assignment and how C++ handles it?
In both C and C++, the result of x++ is an rvalue, so you can't assign to it.
In C, ++x is equivalent to x += 1 (C standard §6.5.3.1/p2; all C standard cites are to WG14 N1570). In C++, ++x is equivalent to x += 1 if x is not a bool (C++ standard §5.3.2 [expr.pre.incr]/p1; all C++ standard cites are to WG21 N3936).
In C, the result of an assignment expression is an rvalue (C standard §6.5.16/p3):
An assignment operator stores a value in the object designated by the
left operand. An assignment expression has the value of the left
operand after the assignment, but is not an lvalue.
Because it's not an lvalue, you can't assign to it: (C standard §6.5.16/p2 - note that this is a constraint)
An assignment operator shall have a modifiable lvalue as its left
operand.
In C++, the result of an assignment expression is an lvalue (C++ standard §5.17 [expr.ass]/p1):
The assignment operator (=) and the compound assignment operators all
group right-to-left. All require a modifiable lvalue as their left
operand and return an lvalue referring to the left operand.
So ++ptr = ptr1; is a diagnosable constraint violation in C, but does not violate any diagnosable rule in C++.
However, pre-C++11, ++ptr = ptr1; has undefined behavior, as it modifies ptr twice between two adjacent sequence points.
In C++11, the behavior of ++ptr = ptr1 becomes well defined. It's clearer if we rewrite it as
(ptr += 1) = ptr1;
Since C++11, the C++ standard provides that (§5.17 [expr.ass]/p1)
In all cases, the assignment is sequenced after the value computation
of the right and left operands, and before the value computation of
the assignment expression. With respect to an
indeterminately-sequenced function call, the operation of a compound
assignment is a single evaluation.
So the assignment performed by the = is sequenced after the value computation of ptr += 1 and ptr1. The assignment performed by the += is sequenced before the value computation of ptr += 1, and all value computations required by the += are necessarily sequenced before that assignment. Thus, the sequencing here is well-defined and there is no undefined behavior.
In C the result of pre and post increment are rvalues and we can not assign to an rvalue, we need an lvalue(also see: Understanding lvalues and rvalues in C and C++) . We can see by going to the draft C11 standard section 6.5.2.4 Postfix increment and decrement operators which says (emphasis mine going forward):
The result of the postfix ++ operator is the value of the
operand. [...] See the discussions of additive operators and compound
assignment for information on constraints, types, and conversions and
the effects of operations on pointers. [...]
So the result of post-increment is a value which is synonymous for rvalue and we can confirm this by going to section 6.5.16 Assignment operators which the paragraph above points us to for further understanding of constraints and results, it says:
[...] An assignment expression has the value of the left operand after the
assignment, but is not an lvalue.[...]
which further confirms the result of post-increment is not an lvalue.
For pre-increment we can see from section 6.5.3.1 Prefix increment and decrement operators which says:
[...]See the discussions of additive operators and compound assignment for
information on constraints, types, side effects, and conversions and
the effects of operations on pointers.
also points back to 6.5.16 like post-increment does and therefore the result of pre-increment in C is also not an lvalue.
In C++ post-increment is also an rvalue, more specifically a prvalue we can confirm this by going to section 5.2.6 Increment and decrement which says:
[...]The result is a prvalue. The type of the result is the cv-unqualified
version of the type of the operand[...]
With respect to pre-increment C and C++ differ. In C the result is an rvalue while in C++ the result is a lvalue which explains why ++ptr = ptr1; works in C++ but not C.
For C++ this is covered in section 5.3.2 Increment and decrement which says:
[...]The result is the updated operand; it is an lvalue, and it is a
bit-field if the operand is a bit-field.[...]
To understand whether:
++ptr = ptr1;
is well defined or not in C++ we need two different approaches one for pre C++11 and one for C++11.
Pre C++11 this expression invokes undefined behavior, since it is modifying the object more than once within the same sequence point. We can see this by going to a Pre C++11 draft standard section 5 Expressions which says:
Except where noted, the order of evaluation of operands of individual
operators and subexpressions of individual expressions, and the order
in which side effects take place, is unspecified.57) Between the
previous and next sequence point a scalar object shall have its stored
value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed only to determine the
value to be stored. The requirements of this paragraph shall be met
for each allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined. [ Example:
i = v[i ++]; / / the behavior is undefined
i = 7 , i++ , i ++; / / i becomes 9
i = ++ i + 1; / / the behavior is undefined
i = i + 1; / / the value of i is incremented
—end example ]
We are incrementing ptr and then subsequently assigning to it, which is two modifications and in this case the sequence point occurs at the end of the expression after the ;.
For C+11, we should go to defect report 637: Sequencing rules and example disagree which was the defect report that resulted in:
i = ++i + 1;
becoming well defined behavior in C++11 whereas prior to C++11 this was undefined behavior. The explanation in this report is one of best I have even seen and reading it many times was enlightening and helped me understand many concepts in a new light.
The logic that lead to this expression becoming well defined behavior goes as follows:
The assignment side-effect is required to be sequenced after the value computations of both its LHS and RHS (5.17 [expr.ass] paragraph 1).
The LHS (i) is an lvalue, so its value computation involves computing the address of i.
In order to value-compute the RHS (++i + 1), it is necessary to first value-compute the lvalue expression ++i and then do an lvalue-to-rvalue conversion on the result. This guarantees that the incrementation side-effect is sequenced before the computation of the addition operation, which in turn is sequenced before the assignment side effect. In other words, it yields a well-defined order and final value for this expression.
The logic is somewhat similar for:
++ptr = ptr1;
The value computations of the LHS and RHS are sequenced before the assignment side-effect.
The RHS is an lvalue, so its value computation involves computing the address of ptr1.
In order to value-compute the LHS (++ptr), it is necessary to first value-compute the lvalue expression ++ptr and then do an lvalue-to-rvalue conversion on the result. This guarantees that the incrementation side-effect is sequenced before the assignment side effect. In other words, it yields a well-defined order and final value for this expression.
Note
The OP said:
Yes, I understand it's messy and you're dealing with unallocated
memory, but it works and compiles.
Pointers to non-array objects are considered arrays of size one for additive operators, I am going to quote the draft C++ standard but C11 has almost the exact same text. From section 5.7 Additive operators:
For the purposes of these operators, a pointer to a nonarray object
behaves the same as a pointer to the first element of an array of
length one with the type of the object as its element type.
and further tells us pointing one past the end of an array is valid as long as you don't dereference the pointer:
[...]If both the pointer operand and the result point to elements of
the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
so:
++ptr ;
is still a valid pointer.

std::map operator [] -- undefined behaviour?

I've just stuck myself with the following question: should this cause undefined behaviour or not and why?
std::map<int, int> m;
m[10] += 1;
It compiles and runs perfectly but it doesn't prove anything.
It resembles a common UB example i = ++i + i++; since operator[] does have side effects but on the other hand assuming any order of evaluation (left to right and right to left) brings me to the same final state of the map
P.S. possibly related: http://en.cppreference.com/w/cpp/language/eval_order
edit
Sorry guys I should have written
m[10] = m[10] + 1;
There is nothing undefined about this. The operator[] returns an lvalue reference to the map entry (which it creates if necessary). You are then merely incrementing this lvalue expression, i.e. the underlying entry.
The rules for evaluation order state that for a modifying assign operation, the side effect is sequenced strictly after the evaluation of both the left (i.e. lvalue reference to the map entry) and right (i.e. the constant 1) operands. There is no ambiguity at all in this example.
UPDATE: In your updated example nothing changes. Again the side effect of modifying m[10] is sequenced strictly after the other operations (i.e. evaluating as an lvalue on the right, evaluating it on the right, and performing the addition).
The relevant sequencing rule, from cppreference:
8) The side effect (modification of the left argument) of the built-in
assignment operator and of all built-in compound assignment operators
is sequenced after the value computation (but not the side effects) of
both left and right arguments, and is sequenced before the value
computation of the assignment expression (that is, before returning
the reference to the modified object)
I am not quite sure what your worry is (and maybe you should clarify your question if that answer isn't sufficient), but m[10] += 1; doesn't get translated to m[10] = m[10] + 1; because m is user defined class type and overloaded operators don't get translated by the compiler, ever. For a and b objects with a user defined class type:
a+=b doesn't mean a = a + b (unless you make it so)
a!=b doesn't mean !(a==b) (unless you make it so)
Also, function calls are never duplicated.
So m[10] += 1; means call overloaded operator[] once; return type is a reference, so the expression is a lvalue; then apply the builtin operator += to the lvalue.
There is no order of evaluation issue. There isn't even multiple possible orders of evaluation!
Also, you need to remember that the std::map<>::operator[] doesn't behave like std::vector<>::operator[] (or std::deque's), because the map is a completely different abstraction: vector and deque are implementations of the Sequence concept (where position matters), but map is an associative container (where "key" matters, not position):
std::vector<>::operator[] takes a numerical index, and doesn't make sense if such index doesn't refer to an element of the vector.
std::map<>::operator[] takes a key (which can be any type satisfying basic constraints) and will create a (key,value) pair if none exists.
Note that for this reason, std::map<>::operator[] is inherently a modifying operation and thus non const, while std::vector<>::operator[] isn't inherently modifying but can allow modification via the returned reference, and is thus "transitively" const: v[i] will be a modifiable lvalue if v is a non-const vector and a const lvalue if v is a const vector.
So no worry, the code has perfectly well defined behavior.

Order of evaluation of assignment statement in C++

map<int, int> mp;
printf("%d ", mp.size());
mp[10]=mp.size();
printf("%d\n", mp[10]);
This code yields an answer that is not very intuitive:
0 1
I understand why it happens - the left side of the assignment returns reference to mp[10]'s underlying value and at the same time creates aforementioned value, and only then is the right side evaluated, using the newly computed size() of the map.
Is this behaviour stated anywhere in C++ standard? Or is the order of evaluation undefined?
Result was obtained using g++ 5.2.1.
Yes, this is covered by the standard and it is unspecified behavior. This particular case is covered in a recent C++ standards proposal: N4228: Refining Expression Evaluation Order for Idiomatic C++ which seeks to refine the order of evaluation rules to make it well specified for certain cases.
It describes this problem as follows:
Expression evaluation order is a recurring discussion topic in the C++
community. In a nutshell, given an expression such as f(a, b,
c), the order in which the sub-expressions f, a, b, c are evaluated is left unspecified by the standard. If any two of these sub-expressions happen to modify the same object without intervening sequence points, the behavior of the program is undefined. For instance, the expression f(i++, i) where i is an
integer variable leads to undefined behavior , as does v[i]
= i++. Even when the behavior is not undefined, the result of evaluating an expression can still be anybody’s guess. Consider
the following program fragment:
#include <map>
int main() {
std::map<int, int> m;
m[0] = m.size(); // #1
}
What should the map object m look like after evaluation of the
statement marked #1? { {0, 0 } } or {{0, 1 } } ?
We know that unless specified the evaluations of sub-expressions are unsequenced, this is from the draft C++11 standard section 1.9 Program execution which says:
Except where noted, evaluations of operands of individual operators
and of subexpressions of individual expressions are unsequenced.[...]
and all the section 5.17 Assignment and compound assignment operators [expr.ass] says is:
[...]In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.[...]
So this section does not nail down the order of evaluation but we know this is not undefined behavior since both operator [] and size() are function calls and section 1.9 tells us(emphasis mine):
[...]When calling a function (whether or not the function is inline), every value computation and side effect
associated with any argument expression, or with the postfix expression designating the called function, is
sequenced before execution of every expression or statement in the body of the called function. [ Note: Value
computations and side effects associated with different argument expressions are unsequenced. —end note ]
Every evaluation in the calling function (including other function calls) that is not otherwise specifically
sequenced before or after the execution of the body of the called function is indeterminately sequenced with
respect to the execution of the called function.9[...]
Note, I cover the second interesting example from the N4228 proposal in the question Does this code from “The C++ Programming Language” 4th edition section 36.3.6 have well-defined behavior?.
Update
It seems like a revised version of N4228 was accepted by the Evolution Working Group at the last WG21 meeting but the paper(P0145R0) is not yet available. So this could possibly no longer be unspecified in C++17.
Update 2
Revision 3 of p0145 made this specified and update [expr.ass]p1:
The assignment operator (=) and the compound assignment operators all group right-to-left.
All require a modifiable lvalue as their left operand; their result is an lvalue referring to the left operand.
The result in all cases is a bit-field if the left operand is a bit-field.
In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. The right operand is sequenced before the left operand. ...
From the C++11 standard (emphasis mine):
5.17 Assignment and compound assignment operators
1 The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue referring to the left operand. The result in all cases is a bit-field if the left operand is a bit-field. In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression.
Whether the left operand is evaluated first or the right operand is evaluated first is not specified by the language. A compiler is free to choose to evaluate either operand first. Since the final result of your code depends on the order of evaluation of the operands, I would say it is unspecified behavior rather than undefined behavior.
1.3.25 unspecified behavior
behavior, for a well-formed program construct and correct data, that depends on the implementation
I'm sure that the standard does not specify for an expression x = y; which order x or y is evaluated in the C++ standard (this is the reason why you can't do *p++ = *p++ for example, because p++ is not done in a defined order).
In other words, to guarantee order x = y; in a defined order, you need to do break it up into two sequence points.
T tmp = y;
x = tmp;
(Of course, in this particular case, one might presume the compiler prefers to do operator[] before size() because it can then store the value directly into the result of operator[] instead of keeping it in a temporary place, to store it later after operator[] has been evaluated - but I'm pretty sure the compiler doesn't NEED to do it in that order)
Let's take a look at what your code breaks down to:
mp.operator[](10).operator=(mp.size());
which pretty much tells the story that in the first part an entry to 10 is created and in the second part the size of the container is assigned to the integer reference in position of 10.
But now you get into the order of evaluation problem which is unspecified. Here is a much simpler example .
When should map::size() get called, before or after map::operator(int const &); ?
Nobody really knows.

The difference between C and C++ regarding the ++ operator

I have been fooling around with some code and saw something that I don't understand the "why" of.
int i = 6;
int j;
int *ptr = &i;
int *ptr1 = &j
j = i++;
//now j == 6 and i == 7. Straightforward.
What if you put the operator on the left side of the equals sign?
++ptr = ptr1;
is equivalent to
(ptr = ptr + 1) = ptr1;
whereas
ptr++ = ptr1;
is equivalent to
ptr = ptr + 1 = ptr1;
The postfix runs a compilation error and I get it. You've got a constant "ptr + 1" on the left side of an assignment operator. Fair enough.
The prefix one compiles and WORKS in C++. Yes, I understand it's messy and you're dealing with unallocated memory, but it works and compiles. In C this does not compile, returning the same error as the postfix "lvalue required as left operand of assignment". This happens no matter how it's written, expanded out with two "=" operators or with the "++ptr" syntax.
What is the difference between how C handles such an assignment and how C++ handles it?
In both C and C++, the result of x++ is an rvalue, so you can't assign to it.
In C, ++x is equivalent to x += 1 (C standard §6.5.3.1/p2; all C standard cites are to WG14 N1570). In C++, ++x is equivalent to x += 1 if x is not a bool (C++ standard §5.3.2 [expr.pre.incr]/p1; all C++ standard cites are to WG21 N3936).
In C, the result of an assignment expression is an rvalue (C standard §6.5.16/p3):
An assignment operator stores a value in the object designated by the
left operand. An assignment expression has the value of the left
operand after the assignment, but is not an lvalue.
Because it's not an lvalue, you can't assign to it: (C standard §6.5.16/p2 - note that this is a constraint)
An assignment operator shall have a modifiable lvalue as its left
operand.
In C++, the result of an assignment expression is an lvalue (C++ standard §5.17 [expr.ass]/p1):
The assignment operator (=) and the compound assignment operators all
group right-to-left. All require a modifiable lvalue as their left
operand and return an lvalue referring to the left operand.
So ++ptr = ptr1; is a diagnosable constraint violation in C, but does not violate any diagnosable rule in C++.
However, pre-C++11, ++ptr = ptr1; has undefined behavior, as it modifies ptr twice between two adjacent sequence points.
In C++11, the behavior of ++ptr = ptr1 becomes well defined. It's clearer if we rewrite it as
(ptr += 1) = ptr1;
Since C++11, the C++ standard provides that (§5.17 [expr.ass]/p1)
In all cases, the assignment is sequenced after the value computation
of the right and left operands, and before the value computation of
the assignment expression. With respect to an
indeterminately-sequenced function call, the operation of a compound
assignment is a single evaluation.
So the assignment performed by the = is sequenced after the value computation of ptr += 1 and ptr1. The assignment performed by the += is sequenced before the value computation of ptr += 1, and all value computations required by the += are necessarily sequenced before that assignment. Thus, the sequencing here is well-defined and there is no undefined behavior.
In C the result of pre and post increment are rvalues and we can not assign to an rvalue, we need an lvalue(also see: Understanding lvalues and rvalues in C and C++) . We can see by going to the draft C11 standard section 6.5.2.4 Postfix increment and decrement operators which says (emphasis mine going forward):
The result of the postfix ++ operator is the value of the
operand. [...] See the discussions of additive operators and compound
assignment for information on constraints, types, and conversions and
the effects of operations on pointers. [...]
So the result of post-increment is a value which is synonymous for rvalue and we can confirm this by going to section 6.5.16 Assignment operators which the paragraph above points us to for further understanding of constraints and results, it says:
[...] An assignment expression has the value of the left operand after the
assignment, but is not an lvalue.[...]
which further confirms the result of post-increment is not an lvalue.
For pre-increment we can see from section 6.5.3.1 Prefix increment and decrement operators which says:
[...]See the discussions of additive operators and compound assignment for
information on constraints, types, side effects, and conversions and
the effects of operations on pointers.
also points back to 6.5.16 like post-increment does and therefore the result of pre-increment in C is also not an lvalue.
In C++ post-increment is also an rvalue, more specifically a prvalue we can confirm this by going to section 5.2.6 Increment and decrement which says:
[...]The result is a prvalue. The type of the result is the cv-unqualified
version of the type of the operand[...]
With respect to pre-increment C and C++ differ. In C the result is an rvalue while in C++ the result is a lvalue which explains why ++ptr = ptr1; works in C++ but not C.
For C++ this is covered in section 5.3.2 Increment and decrement which says:
[...]The result is the updated operand; it is an lvalue, and it is a
bit-field if the operand is a bit-field.[...]
To understand whether:
++ptr = ptr1;
is well defined or not in C++ we need two different approaches one for pre C++11 and one for C++11.
Pre C++11 this expression invokes undefined behavior, since it is modifying the object more than once within the same sequence point. We can see this by going to a Pre C++11 draft standard section 5 Expressions which says:
Except where noted, the order of evaluation of operands of individual
operators and subexpressions of individual expressions, and the order
in which side effects take place, is unspecified.57) Between the
previous and next sequence point a scalar object shall have its stored
value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed only to determine the
value to be stored. The requirements of this paragraph shall be met
for each allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined. [ Example:
i = v[i ++]; / / the behavior is undefined
i = 7 , i++ , i ++; / / i becomes 9
i = ++ i + 1; / / the behavior is undefined
i = i + 1; / / the value of i is incremented
—end example ]
We are incrementing ptr and then subsequently assigning to it, which is two modifications and in this case the sequence point occurs at the end of the expression after the ;.
For C+11, we should go to defect report 637: Sequencing rules and example disagree which was the defect report that resulted in:
i = ++i + 1;
becoming well defined behavior in C++11 whereas prior to C++11 this was undefined behavior. The explanation in this report is one of best I have even seen and reading it many times was enlightening and helped me understand many concepts in a new light.
The logic that lead to this expression becoming well defined behavior goes as follows:
The assignment side-effect is required to be sequenced after the value computations of both its LHS and RHS (5.17 [expr.ass] paragraph 1).
The LHS (i) is an lvalue, so its value computation involves computing the address of i.
In order to value-compute the RHS (++i + 1), it is necessary to first value-compute the lvalue expression ++i and then do an lvalue-to-rvalue conversion on the result. This guarantees that the incrementation side-effect is sequenced before the computation of the addition operation, which in turn is sequenced before the assignment side effect. In other words, it yields a well-defined order and final value for this expression.
The logic is somewhat similar for:
++ptr = ptr1;
The value computations of the LHS and RHS are sequenced before the assignment side-effect.
The RHS is an lvalue, so its value computation involves computing the address of ptr1.
In order to value-compute the LHS (++ptr), it is necessary to first value-compute the lvalue expression ++ptr and then do an lvalue-to-rvalue conversion on the result. This guarantees that the incrementation side-effect is sequenced before the assignment side effect. In other words, it yields a well-defined order and final value for this expression.
Note
The OP said:
Yes, I understand it's messy and you're dealing with unallocated
memory, but it works and compiles.
Pointers to non-array objects are considered arrays of size one for additive operators, I am going to quote the draft C++ standard but C11 has almost the exact same text. From section 5.7 Additive operators:
For the purposes of these operators, a pointer to a nonarray object
behaves the same as a pointer to the first element of an array of
length one with the type of the object as its element type.
and further tells us pointing one past the end of an array is valid as long as you don't dereference the pointer:
[...]If both the pointer operand and the result point to elements of
the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
so:
++ptr ;
is still a valid pointer.