"is not required" == undefined behavior? - c++

My question is mainly about terminology and how to interpret the standard.
[expr.rel]#4:
The result of comparing unequal pointers to objects is defined in terms of a partial order consistent with the following rules:
(4.1) If two pointers point to different elements of the same array,
or to subobjects thereof, the pointer to the element with the higher
subscript is required to compare greater.
(4.2) If two pointers point
to different non-static data members of the same object, or to
subobjects of such members, recursively, the pointer to the later
declared member is required to compare greater provided the two
members have the same access control ([class.access]), neither member
is a subobject of zero size, and their class is not a union.
(4.3) Otherwise, neither pointer is required to compare greater than the
other.
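A minimal sketch of the two cases where the order is specified, (4.1) and (4.2). The struct `Pair` and both function names are hypothetical, chosen only for illustration; under those rules both functions must return true:

```cpp
// Hypothetical type for illustration: both members are public, neither is a
// zero-size subobject, and it is not a union, so rule (4.2) applies.
struct Pair {
    int first;
    int second;
};

// (4.1): pointers to different elements of the same array are ordered by subscript.
bool array_elements_ordered() {
    int arr[3] = {0, 0, 0};
    return &arr[0] < &arr[2] && &arr[2] > &arr[1];
}

// (4.2): pointers to members of the same object are ordered by declaration order.
bool members_ordered() {
    Pair p{1, 2};
    return &p.first < &p.second;
}
```

Two unrelated local variables, as in the program below, fall under neither rule.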
I am a little confused about how to interpret (4.3). Does that mean that this
#include <iostream>
int main() {
    int x;
    int y;
    std::cout << (&x < &y);
    std::cout << (&x < &y);
}
is...
valid C++ code and the output is either 11 or 00.
invalid code, because it has undefined behaviour
?
In other words, I know that (4.3) does apply here, but I am not sure about the implications. When the standard says "it can be either A or B" is this the same as saying "it is undefined" ?

The wording has changed in various editions of the C++ standard, and in the recent draft cited in the question. (See my comments on the question for the gory details.)
C++11 says:
Other pointer comparisons are unspecified.
C++17 says:
Otherwise, neither pointer compares greater than the other.
The latest draft, cited in the question, says:
Otherwise, neither pointer is required to compare greater than the other.
That change was made in response to an issue saying that the "compares greater" term is needlessly confusing.
If you look at the surrounding context in the draft standard, it's clear that in the remaining cases the result is unspecified. Quoting from [expr.rel] (text in square brackets is my summary):
The result of comparing unequal pointers to objects is defined in
terms of a partial order consistent with the following rules:
[pointers to elements of the same array]
[pointers to members of the same object]
[remaining cases] Otherwise, neither pointer is required to compare greater than the other.
If two operands p and q compare equal, p<=q and
p>=q both yield true and p<q and p>q both yield
false. Otherwise, if a pointer p compares greater than a pointer q, p>=q, p>q, q<=p, and q<p all
yield true and p<=q, p<q, q>=p, and q>p
all yield false. Otherwise, the result of each of the operators
is unspecified.
So the result of the < operator in such cases is unspecified, but it does not have undefined behavior. It can be either true or false, but I don't believe it's required to be consistent. The program's output could be any of 00, 01, 10, or 11.

For the provided code, this case applies:
(4.3) Otherwise, neither pointer is required to compare greater than the other.
There is no mention of UB, and so a strict reading of "neither is required" suggests that the result of the comparison could be different every time it's evaluated.
This means the program could validly output any of the following results:
00
01
10
11

valid C++ code
Yes.
Nowhere does the standard say that this is UB or ill-formed, and neither is this case lacking a rule describing the behaviour because the quoted 4.3 applies.
and the output is either 11 or 00
I'm not sure that 10 or 01 are technically guaranteed not to be output [1].
Given that neither pointer is required to compare greater than the other, the result of the comparison can be either true or false. There appears to be no explicit requirement for the result to be the same for each evaluation on the same operands in this case.
[1] But I consider this unlikely in practice. I also think that leaving such a possibility open is not intentional. Rather, the intention is to allow for a deterministic, but not necessarily total, order.
P.S.
std::less<> comp;
std::cout << comp(&x, &y);
std::cout << comp(&x, &y);
would be guaranteed to be either 11 or 00 because std::less (like its friends) is guaranteed to impose a strict total order for pointers.
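A small self-contained sketch of that guarantee, using the pointer specialization `std::less<int*>` (its total-order guarantee has been in the standard since C++98); the function names are illustrative only. The built-in `<` would be unspecified for `x` and `y`, but the comparator must answer the same way every time:

```cpp
#include <functional>

// For two unrelated objects the built-in < on their addresses is unspecified,
// but std::less<T*> must impose a strict total order over pointers,
// so repeated calls with the same arguments agree.
bool less_is_consistent() {
    int x = 0;
    int y = 0;
    std::less<int*> comp;
    bool first = comp(&x, &y);
    bool second = comp(&x, &y);
    // Strict total order on distinct pointers: exactly one direction holds.
    return first == second && comp(&x, &y) != comp(&y, &x);
}

// Where the built-in < is specified (elements of one array), std::less agrees with it.
bool less_agrees_with_builtin() {
    int arr[2] = {0, 0};
    return std::less<int*>{}(&arr[0], &arr[1]) == (&arr[0] < &arr[1]);
}
```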

x and y are not part of the same array, per (4.1). And they are not members of the same object, per (4.2). So you fall into (4.3), which means that if you compare them to each other, the result of the comparison is unspecified: it could be true or false. If it were undefined behavior instead, the standard would likely state that explicitly.

Related

Do all transient allocations have unique address?

While reading the comments on a C++ Weekly video about constexpr new support in C++20, I found a comment alleging that C++20 allows UB in a constexpr context.
At first I was convinced that the comment was right, but the more I thought about it, the more I began to suspect that the C++20 wording contains some clever language that makes this defined behavior.
Either all transient allocations return unique addresses, or maybe there is some more general notion in C++ that makes two distinct allocation pointers always (even in a non-constexpr context) compare unequal, even if at runtime the allocator could in reality give you back the same address (since you deleted the first allocation).
As a bonus weirdness: you can only use == for comparison; < and > fail...
Here is the program with alleged UB in constexpr:
#include <iostream>
static constexpr bool f()
{
    auto p = new int(1);
    delete p;
    auto q = new int(2);
    delete q;
    return p == q;
}
int main()
{
    constexpr bool res1 = f();
    std::cout << res1 << std::endl; // May output 0 or 1
}
The result here is implementation-defined. res1 could be false, true, or ill-formed, based on how the implementation wants to define it. And this is just as true for equality comparison as it is for relational comparison.
Both [expr.eq] (for equality) and [expr.rel] (for relational) start by doing an lvalue-to-rvalue conversion on the pointers (because we have to actually read what the value is to do a comparison). [conv.lval]/3 says that the result of that conversion is:
Otherwise, if the object to which the glvalue refers contains an invalid pointer value ([basic.stc.dynamic.deallocation], [basic.stc.dynamic.safety]), the behavior is implementation-defined.
That is the case here: both pointers contain an invalid pointer value, as per [basic.stc.general]/4:
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
with a footnote reading:
Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault.
So the value we get out of the lvalue-to-rvalue conversion is... implementation-defined. It could be implementation-defined in a way that causes those two pointers to compare equal. It could be implementation-defined in a way that causes those two pointers to compare not equal (as apparently all implementations do). Or it could even be implementation-defined in a way that causes the comparison between those two pointers to be unspecified or undefined behavior.
Notably, [expr.const]/5 (the main rule governing constant expressions), despite rejecting undefined behavior and explicitly rejecting any comparison whose result is unspecified ([expr.const]/5.23), says nothing about a comparison whose result is implementation-defined.
There's no undefined behavior here. Anything goes. Which is admittedly very weird during constant evaluation, where we'd expect to see a stricter set of rules.
Notably, with p < q, it appears that gcc and clang reject the comparison as being not a constant expression (which is... an allowed result) while msvc considers both p < q and p > q to be constant expressions whose value is false (which is... also an allowed result).

Why does C++11 contain an odd clause about comparing void pointers?

While checking the references for another question, I noticed an odd clause in C++11, at [expr.rel] ¶3:
Pointers to void (after pointer conversions) can be compared, with a result defined as follows: If both
pointers represent the same address or are both the null pointer value, the result is true if the operator is
<= or >= and false otherwise; otherwise the result is unspecified.
This seems to mean that, once two pointers have been cast to void *, their ordering relation is no longer guaranteed; for example, this:
int foo[] = {1, 2, 3, 4, 5};
void *a = &foo[0];
void *b = &foo[1];
std::cout<<(a < b);
would seem to be unspecified.
Interestingly, this clause wasn't there in C++03 and disappeared in C++14, so if we take the example above and apply the C++14 wording to it, I'd say that ¶3.1
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
would apply, as a and b point to elements of the same array, even though they have been cast to void *. Notice that the wording of ¶3.1 was pretty much the same in C++11, but seemed to be overridden by the void * clause.
Am I right in my understanding? What was the point of that oddball clause added in C++11 and immediately removed? Or maybe it's still there, but moved to/implied by some other part of the standard?
TL;DR:
in C++98/03 the clause was not present, and the standard did not specify relational operators for void pointers (core issue 879, see end of this post);
the odd clause about comparing void pointers was added in C++11 to resolve it, but this in turn gave rise to two other core issues 583 & 1512 (see below);
the resolution of these issues required the clause to be removed and be replaced with the wording found in C++14 standard, which allows for "normal" void * comparison.
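Assuming a C++14 or later compiler, the example from the question can be checked directly. Under the current wording the `void *` conversion no longer hides the array ordering, so this hypothetical helper should return true:

```cpp
// Assumes C++14 or later: the array-element rule still applies after converting
// the pointers to void*, because a and b point to elements of foo.
bool void_pointers_keep_array_order() {
    int foo[] = {1, 2, 3, 4, 5};
    void* a = &foo[0];
    void* b = &foo[1];
    return a < b;  // merely unspecified under the C++11 void* clause
}
```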
Core Issue 583: Relational pointer comparisons against the null pointer constant
Section: 8.9 [expr.rel]
In C, this is ill-formed (cf C99 6.5.8):
void f(char* s) {
    if (s < 0) { }
}
...but in C++, it's not. Why? Who would ever need to write (s > 0) when they could just as well write (s != 0)?
This has been in the language since the ARM (and possibly earlier);
apparently it's because the pointer conversions (7.11 [conv.ptr]) need
to be performed on both operands whenever one of the operands is of
pointer type. So it looks like the "null-ptr-to-real-pointer-type"
conversion is hitching a ride with the other pointer conversions.
Proposed resolution (April, 2013):
This issue is resolved by the resolution of issue 1512.
Core Issue 1512: Pointer comparison vs qualification conversions
Section: 8.9 [expr.rel]
According to 8.9 [expr.rel] paragraph 2, describing pointer
comparisons,
Pointer conversions (7.11 [conv.ptr]) and qualification conversions
(7.5 [conv.qual]) are performed on pointer operands (or on a pointer
operand and a null pointer constant, or on two null pointer constants,
at least one of which is non-integral) to bring them to their
composite pointer type. This would appear to make the following
example ill-formed,
bool foo(int** x, const int** y) {
    return x < y;  // valid ?
}
because int** cannot be converted to const int**, according to the rules of 7.5 [conv.qual] paragraph 4.
This seems too strict for pointer comparison, and current
implementations accept the example.
Proposed resolution (November, 2012):
Relevant excerpts from resolution of the above issues are found in the paper: Pointer comparison vs qualification conversions (revision 3).
The following also resolves core issue 583.
Change in 5.9 [expr.rel] paragraphs 1 to 5:
In this section the following statement (the odd clause in C++11) has been expunged:
Pointers to void (after pointer conversions) can be compared, with a result defined as follows: If both pointers represent the same address or are both the null pointer value, the result is true if the operator is <= or >= and false otherwise; otherwise the result is unspecified
And the following statements have been added:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
So in the final working draft of C++14 (n4140) section [expr.rel]/3, the above statements are found as they were stated at the time of the resolution.
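The one-past-the-end rule added by that resolution can be illustrated with a short helper (the name is hypothetical). The one-past-the-end pointer may be formed and compared, but never dereferenced:

```cpp
#include <iterator>

// A pointer just past the last element of an array compares greater than
// a pointer to any element of that array.
bool one_past_end_is_greater() {
    int arr[3] = {1, 2, 3};
    int* end = std::end(arr);  // same as arr + 3; may be compared, never dereferenced
    return end > &arr[2] && end > &arr[0];
}
```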
Digging for the reason why this odd clause was added led me to a much earlier issue 879: Missing built-in comparison operators for pointer types.
The proposed resolution of this issue (in July, 2009) led to the addition of this clause which was voted into WP in October, 2009.
And that is how it came to be included in the C++11 standard.
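For the issue 1512 example, a sketch assuming C++14 or later: both operands are brought to their composite pointer type (`const int* const*`), so mixing `int**` and `const int**` compiles. The helper names are hypothetical; the test uses two null pointers, which compare equal, so `<` must yield false and `==` true:

```cpp
// Assumes C++14 or later: int** and const int** are converted to their composite
// pointer type, const int* const*, before being compared.
bool less_mixed(int** x, const int** y) {
    return x < y;
}

// Equality comparison uses the same composite pointer type.
bool equal_mixed(int** x, const int** y) {
    return x == y;
}
```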

Pointer arithmetic in C++ giving wrong output [duplicate]

Recently, in an interview, I was given the following multiple-choice question.
int a = 0;
cout << a++ << a;
Answers:
a. 10
b. 01
c. undefined behavior
I answered choice b, i.e. output would be "01".
But to my surprise, the interviewer later told me that the correct answer is option c: undefined behavior.
Now, I do know the concept of sequence points in C++. The behavior is undefined for the following statement:
int i = 0;
i += i++ + i++;
But as per my understanding, for the statement cout << a++ << a, operator<<() would be called twice, first as operator<<(a++) and later as operator<<(a).
I also checked the result with the VS2010 compiler, and its output is also '01'.
You can think of:
cout << a++ << a;
As:
std::operator<<(std::operator<<(std::cout, a++), a);
C++ guarantees that all side effects of previous evaluations will have been performed at sequence points. There is no sequence point between the evaluations of function arguments, which means that the argument a can be evaluated before the argument std::operator<<(std::cout, a++) or after it. So the result of the above is undefined.
C++17 update
In C++17 the rules have been updated. In particular:
In a shift operator expression E1<<E2 and E1>>E2, every value computation and side-effect of E1 is sequenced before every value computation and side effect of E2.
Which means that it requires the code to produce result b, which outputs 01.
See P0145R3 Refining Expression Evaluation Order for Idiomatic C++ for more details.
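A sketch of the C++17 guarantee, assuming a compiler in C++17 (or later) mode and using an `std::ostringstream` in place of `std::cout` so the output can be inspected; the function name is hypothetical. Under C++14 or earlier the same expression has undefined behavior, so no output should be relied on there:

```cpp
#include <sstream>
#include <string>

// Assumes C++17 or later: in E1 << E2, E1 is sequenced before E2, and this also
// holds for an overloaded operator<<. So "a++" (including its increment)
// completes before the second "a" is read.
std::string interview_output_cpp17() {
    std::ostringstream out;
    int a = 0;
    out << a++ << a;
    return out.str();
}
```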
Technically, overall this is Undefined Behavior.
But, there are two important aspects to the answer.
The code statement:
std::cout << a++ << a;
is evaluated as:
std::operator<<(std::operator<<(std::cout, a++), a);
The standard does not define the order of evaluation of arguments to a function.
So either:
std::operator<<(std::cout, a++) is evaluated first, or
a is evaluated first, or
the order might be anything the implementation chooses.
This order is unspecified [Ref 1] as per the standard.
[Ref 1]C++03 5.2.2 Function call
Para 8
The order of evaluation of arguments is unspecified. All side effects of argument expression evaluations take effect before the function is entered. The order of evaluation of the postfix expression and the argument expression list is unspecified.
Further, there is no sequence point between evaluation of arguments to a function but a sequence point exists only after evaluation of all arguments[Ref 2].
[Ref 2]C++03 1.9 Program execution [intro.execution]:
Para 17:
When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all function arguments (if any) which takes place before execution of any expressions or statements in the function body.
Note that here the value of a is being accessed more than once without an intervening sequence point; regarding this, the standard says:
[Ref 3]C++03 5 Expressions [expr]:
Para 4:
....
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined.
The code modifies a and, without an intervening sequence point, also reads it for a purpose other than determining the value to be stored. This is a clear violation of the above clause, and hence the result, as mandated by the standard, is undefined behavior [Ref 3].
Sequence points only define a partial ordering. In your case, you have
(once overload resolution is done):
std::cout.operator<<( a++ ).operator<<( a );
There is a sequence point between the a++ and the first call to
std::ostream::operator<<, and there is a sequence point between the
second a and the second call to std::ostream::operator<<, but there
is no sequence point between a++ and a; the only ordering
constraints are that a++ be fully evaluated (including side effects)
before the first call to operator<<, and that the second a be fully
evaluated before the second call to operator<<. (There are also
causal ordering constraints: the second call to operator<< cannot
precede the first, since it requires the results of the first as an
argument.) §5/4 (C++03) states:
Except where noted, the order of
evaluation of operands of individual operators and subexpressions of
individual expressions, and the order in which side effects take place,
is unspecified. Between the previous and next sequence point a scalar
object shall have its stored value modified at most once by the
evaluation of an expression. Furthermore, the prior value shall be
accessed only to determine the value to be stored. The requirements of
this paragraph shall be met for each allowable ordering of the
subexpressions of a full expression; otherwise the behavior is
undefined.
One of the allowable orderings of your expression is a++, a, first
call to operator<<, second call to operator<<; this modifies the
stored value of a (a++), and accesses it other than to determine
the new value (the second a), the behavior is undefined.
The correct answer is to question the question. The statement is unacceptable because a reader cannot see a clear answer. Another way to look at it is that we have introduced side effects (a++) that make the statement much harder to interpret. Concise code is great, provided its meaning is clear.
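If the intent is simply to print 0 and then 1, one way to remove the ambiguity under every standard revision is to split the expression into two statements. A minimal sketch, writing to a hypothetical `std::ostringstream` rather than `std::cout`:

```cpp
#include <sstream>
#include <string>

// Each statement is a separate full-expression, so the increment from the
// first statement is complete before the second statement reads a.
std::string interview_output_portable() {
    std::ostringstream out;
    int a = 0;
    out << a++;  // writes 0; a becomes 1
    out << a;    // writes 1
    return out.str();
}
```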

Is value of x*f(x) unspecified if f modifies x?

I've looked at a bunch of questions regarding sequence points, and haven't been able to figure out if the order of evaluation for x*f(x) is guaranteed if f modifies x, and is this different for f(x)*x.
Consider this code:
#include <iostream>
int fx(int &x) {
    x = x + 1;
    return x;
}
int f1(int &x) {
    return fx(x)*x; // Line A
}
int f2(int &x) {
    return x*fx(x); // Line B
}
int main(void) {
    int a = 6, b = 6;
    std::cout << f1(a) << " " << f2(b) << std::endl;
}
This prints 49 42 on g++ 4.8.4 (Ubuntu 14.04).
I'm wondering whether this is guaranteed behavior or unspecified.
Specifically, in this program, fx gets called twice, with x=6 both times, and returns 7 both times. The difference is that Line A computes 7*7 (taking the value of x after fx returns) while Line B computes 6*7 (taking the value of x before fx returns).
Is this guaranteed behavior? If yes, what part of the standard specifies this?
Also: If I change all the functions to use int *x instead of int &x and make corresponding changes to places they're called from, I get C code which has the same issues. Is the answer any different for C?
In terms of evaluation sequence, it is easier to think of x*f(x) as if it was:
operator*(x, f(x));
so that there are no mathematical preconceptions on how multiplication is supposed to work.
As @dan04 helpfully pointed out, the standard says:
Section 1.9.15: “Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced.”
This means that the compiler is free to evaluate these arguments in any order, the sequence point being the operator* call. The only guarantee is that both arguments have to be evaluated before operator* is called.
In your example, conceptually, you could be certain that at least one of the arguments will be 7, but you cannot be certain that both of them will. To me, this would be enough to label this behaviour as undefined; however, @user2079303's answer explains well why it is not technically the case.
Regardless of whether the behaviour is undefined or indeterminate, you cannot use such an expression in a well-behaved program.
The evaluation order of arguments is not specified by the standard, so the behaviour that you see is not guaranteed.
Since you mention sequence points, I'll consider the C++03 standard, which uses that term; the later standards have changed the wording and abandoned the term.
ISO/IEC 14882:2003(E) §5 /4:
Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified...
There is also discussion on whether this is undefined behaviour or is the order merely unspecified. The rest of that paragraph sheds some light (or doubt) on that.
ISO/IEC 14882:2003(E) §5 /4:
... Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined.
x is indeed modified in f, and its value is read as an operand in the same expression where f is called. And it's not specified whether x reads the modified or non-modified value. That might scream Undefined Behaviour! to you, but hold your horses, because the standard also states:
ISO/IEC 14882:2003(E) §1.9 /17:
... When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all function arguments (if any) which takes place before execution of any expressions or statements in the function body. There is also a sequence point after the copying of a returned value and before the execution of any expressions outside the function. ...
So, if f(x) is evaluated first, then there is a sequence point after copying the returned value. So the above rule about UB does not apply because the read of x is not between the next and previous sequence point. The x operand will have the modified value.
If x is evaluated first, then there is a sequence point after evaluating the arguments of f(x). Again, the rule about UB does not apply. In this case the x operand will have the non-modified value.
In summary, the order is unspecified but there is no undefined behaviour. It's a bug, but the outcome is predictable to some degree. The behaviour is the same in the later standards, even though the wording changed. I'll not delve into those since it's already covered well in other good answers.
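To make the two permitted outcomes concrete, here is a sketch that reuses `fx` from the question and adds hypothetical helpers, two of which force one order each with separate statements. For `x == 6` the original expression may yield either 42 or 49, while each explicit version always yields one of them:

```cpp
// fx as in the question: increments its argument and returns the new value.
int fx(int& x) {
    x = x + 1;
    return x;
}

// The expression from the question: order unspecified, result is 42 or 49 for x == 6.
int original_expr(int& x) {
    return x * fx(x);
}

// Forcing "read x, then call fx": 6 * 7 == 42 for x == 6.
int read_then_call(int& x) {
    int lhs = x;
    int rhs = fx(x);
    return lhs * rhs;
}

// Forcing "call fx, then read x": 7 * 7 == 49 for x == 6.
int call_then_read(int& x) {
    int rhs = fx(x);
    int lhs = x;
    return lhs * rhs;
}
```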
Since you ask about similar situation in C
C89 (draft) 3.3/3:
Except as indicated by the syntax or otherwise specified later (for the function-call operator (), &&, ||, ?:, and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified.
The function call exception is already mentioned here. Following is the paragraph that implies the undefined behaviour if there were no sequence points:
C89 (draft) 3.3/2:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.
And here are the sequence points defined:
C89 (draft) A.2
The following are the sequence points described in 2.1.2.3
The call to a function, after the arguments have been evaluated (3.3.2.2).
...
... the expression in a return statement (3.6.6.4).
The conclusions are the same as in C++.
A quick note on something I don't see covered explicitly by the other answers:
if the order of evaluation for x*f(x) is guaranteed if f modifies x, and is this different for f(x)*x.
Consider, as in Maksim's answer
operator*(x, f(x));
now there are only two ways of evaluating both arguments before the call as required:
auto lhs = x;     // or auto rhs = f(x);
auto rhs = f(x);  // or auto lhs = x;
return lhs * rhs;
So, when you ask
I'm wondering whether this is guaranteed behavior or unspecified.
the standard doesn't specify which of those two behaviours the compiler must choose, but it does specify those are the only valid behaviours.
So, it's neither guaranteed nor entirely unspecified.
Oh, and:
I've looked at a bunch of questions regarding sequence points, and haven't been able to figure out if the order of evaluation ...
Sequence points are used in the C language standard's treatment of this, but not in the C++ standard since C++11.
In the expression x * y, the terms x and y are unsequenced. This is one of the three possible sequencing relations, which are:
A sequenced-before B: A must be evaluated, with all side effects complete, before B begins evaluating.
A and B indeterminately-sequenced: one of the two following cases is true: A is sequenced-before B, or B is sequenced-before A. It is unspecified which of those two cases holds.
A and B unsequenced: There is no sequencing relation defined between A and B.
It is important to note that these are pair-wise relations. We cannot say "x is unsequenced". We can only say that two operations are unsequenced with respect to each other.
Also important is that sequenced-before is transitive, and the latter two relations are symmetric.
Unspecified is a technical term which means that the Standard specifies a set of possible results, any of which may occur. This is different from undefined behaviour, which means that the Standard does not constrain the behaviour at all.
Moving onto the code x * f(x). This is identical to f(x) * x, because as discussed above, x and f(x) are unsequenced, with respect to each other, in both cases.
Now we come to the point where several people seem to be coming unstuck. Evaluating the expression f(x) is unsequenced with respect to x. However, it does not follow that any statements inside the function body of f are also unsequenced with respect to x. In fact, there are sequencing relations surrounding any function call, and those relations cannot be ignored.
Here is the text from C++14:
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [Note: Value computations and side effects associated with different argument expressions are unsequenced. —end note ]
Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with
respect to the execution of the called function.
with footnote:
In other words, function executions do not interleave with each other.
The second paragraph quoted above (beginning "Every evaluation in the calling function") clearly states that, for the two expressions:
A: x = x + 1; inside f(x)
B: evaluating the first x in the expression x * f(x)
their relationship is: indeterminately sequenced.
The text regarding undefined behaviour and sequencing is:
If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, and they are not potentially concurrent (1.10), the behavior is undefined.
In this case, the relation is indeterminately sequenced, not unsequenced. So there is no undefined behaviour.
The result is instead unspecified according to whether x is sequenced before x = x + 1 or the other way around. So there are only two possible outcomes, 42 and 49.
In case anyone had qualms about the x in f(x), the following text applies:
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function.
So the evaluation of that x is sequenced before x = x + 1. This is an example of an evaluation that falls under the case of "specifically sequenced before" in the "Every evaluation in the calling function" paragraph quoted above.
Footnote: the behaviour was exactly the same in C++03, but the terminology was different. In C++03 we say that there is a sequence point upon entry and exit of every function call, therefore the write to x inside the function is separated from the read of x outside the function by at least one sequence point.
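The "function executions do not interleave" rule can be observed with a small tracing sketch; `trace`, `f`, `g` and `run_sum` are hypothetical names. Which call runs first is unspecified, but one body finishes before the other starts, so only two traces are possible:

```cpp
#include <string>

// Hypothetical trace buffer, for illustration only.
std::string trace;

int f() {
    trace += "f(";  // entering f
    trace += ")";   // leaving f
    return 1;
}

int g() {
    trace += "g(";  // entering g
    trace += ")";   // leaving g
    return 2;
}

// f() and g() are unsequenced operands of +, but their bodies are
// indeterminately sequenced: "f(g())"-style interleaving cannot happen.
std::string run_sum() {
    trace.clear();
    int sum = f() + g();
    static_cast<void>(sum);
    return trace;
}
```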
You need to distinguish:
a) Operator precedence and associativity, which controls the order in which the values of subexpressions are combined by their operators.
b) The sequence of subexpression evaluation. E.g. in the expression f(x)/g(x), the compiler can evaluate g(x) first and f(x) afterwards. Nonetheless, the resulting value must be computed by dividing respective sub-values in the right order, of course.
c) The sequence of side-effects of the subexpressions. Roughly speaking, for example, the compiler might, for sake of optimization, decide to write values to the affected variables only at the end of the expression or any other suitable place.
As a very rough approximation, you can say, that within a single expression, the order of evaluation (not associativity etc.) is more or less unspecified. If you need a specific order of evaluation, break down the expression into series of statements like this:
int a = f(x);
int b = g(x);
return a/b;
instead of
return f(x)/g(x);
For exact rules, see http://en.cppreference.com/w/cpp/language/eval_order
Order of evaluation of the operands of almost all C++ operators is
unspecified. The compiler can evaluate operands in any order, and may
choose another order when the same expression is evaluated again
As the order of evaluation is not always the same hence you may get unexpected results.
Order of evaluation

Strange std::map behaviour

The following test program
#include <map>
#include <iostream>
using namespace std;
int main(int argc, char **argv)
{
map<int,int> a;
a[1]=a.size();
for(map<int,int>::const_iterator it=a.begin(); it!=a.end(); ++it)
cout << "first " << (*it).first << " second " << (*it).second << endl;
}
leads to different output when compiled on g++ 4.8.1 (Ubuntu 12.04 LTS):
g++ xxx.cpp
./a.out
first 1 second 1
and on Visual Studio 2012 (Windows 7) (Standard Win32 Console Application Project):
ConsoleApplication1.exe
first 1 second 0
Which compiler is right? Am I doing something wrong?
This is actually a well-formed program that has two equally valid execution paths, so both compilers are right.
a[1] = a.size()
In this expression, the evaluation of the two operands of = are unsequenced.
§1.9/15 [intro.execution] Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced.
However, function calls are not interleaved, so the calls to operator[] and size are actually indeterminately sequenced, rather than unsequenced.
§1.9/15 [intro.execution] Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or after the execution of the body of the called function is indeterminately sequenced with respect to the execution of the called function.
This means that the function calls may happen in one of two orders:
operator[] then size
size then operator[]
If a key doesn't exist and you call operator[] with that key, it will be added to the map, thereby changing the size of the map. So in the first case, the key will be added, the size will be retrieved (which is 1 now), and 1 will be assigned to that key. In the second case, the size will be retrieved (which is 0), the key will be added, and 0 will be assigned to that key.
Note, this is not a situation that brings about undefined behaviour. Undefined behaviour occurs when two modifications or a modification and a read of the same scalar object are unsequenced.
§1.9/15 [intro.execution] If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
In this situation, they are not unsequenced but indeterminately sequenced.
So what we do have is two equally valid orderings of the execution of the program. Either could happen and both give valid output. This is unspecified behaviour.
§1.3.25 [defns.unspecified]
unspecified behavior
behavior, for a well-formed program construct and correct data, that depends on the implementation
So to answer your questions:
Which compiler is right?
Both of them are.
Am I doing something wrong?
Probably. It's unlikely that you would want to write code that has two execution paths like this. Unspecified behaviour can be okay, unlike undefined behaviour, because it always resolves to one of the valid observable outcomes, but it's not worth having in the first place if you can avoid it. Instead, don't write code that has this kind of ambiguity. Depending on what you want the correct path to be, you can do either of the following:
auto size = a.size();
a[1] = size; // value is 0
Or:
a[1];
a[1] = a.size(); // value is 1
If you want the result to be 1 and you know the key doesn't yet exist, you could of course do the first code but assign size + 1.
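For completeness: the analysis above reflects C++11. C++17 (P0145R3) later specified that, for the built-in assignment `E1 = E2`, the right operand is sequenced before the left one. Since `a[1]` returns `int&`, the assignment here is built-in, so under C++17 and later `a.size()` should run first and the stored value is 0. A sketch assuming a C++17 compiler; the function name is hypothetical:

```cpp
#include <map>

// Assumes C++17 or later: for the built-in E1 = E2, E2 is sequenced before E1.
// a.size() is therefore evaluated (giving 0) before operator[] inserts key 1.
int value_after_assignment() {
    std::map<int, int> a;
    a[1] = static_cast<int>(a.size());
    return a[1];
}
```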
In this case, where a[1] returns a primitive type, please refer to this answer. In the case where the std::map's value type is a user-defined type and operator=(T, std::size_t) is defined for that type, the expression:
a[1] = a.size();
can be converted to the corresponding less-syntactic-sugar version:
a[1] = a.size();
a.operator[](1) = a.size();
operator=(a.operator[](1), a.size());
And, as we know from §8.3.6/9:
The order of evaluation of function arguments is unspecified.
which leads to the fact that the result of the above expression is unspecified.
We have, of course, two cases:
If the a.operator[](1) is evaluated first, the size of the map is incremented by 1 leading to the first output (first 1 second 1).
If the a.size() is evaluated first, the output you'll get is the second one (first 1 second 0).
This is known as a sequence-point issue which means certain operations may be performed in any order chosen by the compiler.
If one has side effects on the other, the result is called "unspecified behaviour". It is a bit like "undefined behaviour", except that the result must be one of a fixed set of outcomes, so here it must be either 0 or 1 and can't be any other value. In practice you should usually avoid it.
In your particular case, performing operator[] on a map changes its size (if that element does not yet exist). Thus it has a side effect on the right-hand side of the assignment.