Initialization by null pointer constant: which behaviour is correct?

int main() {
    const int x = 0;
    int* y = x;   // line 3
    int* z = x+x; // line 4
}
Quoth the standard (C++11 §4.10/1):
A null pointer constant is an integral constant expression (5.19) prvalue of integer type that evaluates to
zero or a prvalue of type std::nullptr_t. A null pointer constant can be converted to a pointer type; ...
There are four possibilities:
1. Line 4 is OK, but line 3 isn't. This is because x and x+x are both constant expressions that evaluate to 0, but only x+x is a prvalue. It appears that gcc takes this interpretation (live demo).
2. Lines 3 and 4 are both OK. Although x is an lvalue, the lvalue-to-rvalue conversion is applied, giving a prvalue constant expression equal to 0. The clang on my system (clang-3.0) accepts both lines 3 and 4.
3. Lines 3 and 4 are both not OK. clang-3.4 errors on both lines (live demo).
4. Line 3 is OK, but line 4 isn't. (Included for the sake of completeness, even though no compiler I tried exhibits this behaviour.)
Who is right? Does it depend on which version of the standard we are considering?

The wording in the standard changed as a result of DR 903. The new wording is:
A null pointer constant is an integer literal (2.14.2) with value zero or a prvalue of type std::nullptr_t.
Issue 903 involves a curious corner case in which it is impossible to produce the "correct" overload resolution when a template parameter is a (possibly zero) integer constant.
Apparently a number of possible resolutions were considered, but
There was a strong consensus among the CWG that only the literal 0 should be considered a null pointer constant, not any arbitrary zero-valued constant expression as is currently specified.
So, yes, it depends on whether the compiler has implemented the resolution to DR 903 or not.
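To make the resolution concrete, here is a minimal sketch of what a compiler that has implemented DR 903 (the C++14 wording) accepts and rejects; the names are illustrative only:

int* a = 0;        // OK in every standard: 0 is an integer literal with value zero
const int zero = 0;
int* b = zero;     // ill-formed under the DR 903 wording: a constant expression
                   // that evaluates to 0, but not an integer literal
int* c = nullptr;  // OK since C++11, and the clearest way to spell it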


Is member access on a null pointer defined in C++?

Is address computation on a null pointer defined behavior in C++? Here's a simple example program.
struct A { int x; };

int main() {
    A* p = nullptr;
    &(p->x); // is this undefined behavior?
    return 0;
}
Thanks.
EDIT: Subscripting is covered in this other question.
&(p->x); // is this undefined behavior?
The standard is a bit vague regarding this:
[expr.ref] ... The expression E1->E2 is converted to the equivalent form (*(E1)).E2;
[expr.unary.op] The unary * operator ... the result is an lvalue referring to the object ... to which the expression points.
There is no explicit mention of UB in the section. The quoted rule does appear to conflict with the fact that a null pointer doesn't point to any object, which could be read as meaning that, yes, the behaviour is undefined.
[expr.unary.op] The result of the unary & operator is a pointer to its operand. ... if the operand is an lvalue of type T, the resulting expression is a prvalue of type “pointer to T” whose result is a pointer to the designated object ([intro.memory]).
Again, no designated object exists. Note that at no point is the operand lvalue converted to an rvalue, which would definitely have been UB.
Back in 2000 there was a CWG issue opened to clarify whether indirection through a null pointer is undefined. The proposed resolution (2004), which would have clarified that indirection through null is not itself UB, appears not to have been adopted into the standard so far.
However, whether or not it is UB doesn't matter much, since you don't need to do this. At the very least, the resulting pointer would be invalid and thus useless.
If you were planning to convert the pointer to an integer to get the offset of the member, there is no need: you can instead use the offsetof macro from the standard library, which doesn't have UB.
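For instance, a minimal sketch using the struct from the question; offsetof produces the member's byte offset without ever forming an expression that dereferences a null pointer:

#include <cstddef>

struct A { int x; };

constexpr std::size_t off = offsetof(A, x); // 0 here, since x is the first member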
&(p[1]); // undefined?
Here, behaviour is quite clearly undefined:
[expr.sub] ... The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.
[expr.add] When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0 (does not apply)
Otherwise, if P points to an array element (does not apply)
Otherwise, the behavior is undefined.
&(p[0]); // undefined?
As per previous rules, the first option applies:
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
And now we are back to the question of whether indirection through this null is UB. See the beginning of the answer.
Still, it doesn't really matter. There is no need to write this, since it is simply an unnecessarily complicated way to write sizeof(int) * i (with i being 1 and 0 respectively).
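In other words, a small sketch of the well-defined equivalent (note that the stride of p[i] is really sizeof(A), which for this single-int struct is typically, though not necessarily, the same as sizeof(int)):

#include <cstddef>

struct A { int x; };

std::size_t one_elem = sizeof(A) * 1; // the offset &(p[1]) would encode
std::size_t zero_off = sizeof(A) * 0; // the offset &(p[0]) would encode, i.e. 0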

int numeral -> pointer conversion rules

Consider the following code.
void f(double p) {}
void f(double* p) {}

int main() {
    f(1-1);
    return 0;
}
MSVC 2017 doesn't compile this: it reports an ambiguous overloaded call, since 1-1 is the same as 0 and can therefore be converted to double*. Other spellings, like 0x0, 0L, or static_cast<int>(0), don't work either. Even declaring const int Zero = 0 and calling f(Zero) produces the same error. It only works properly if Zero is not const.
It looks like the same issue applies to GCC 5 and below, but not GCC 6. I am curious if this is a part of C++ standard, a known MSVC bug, or a setting in the compiler. A cursory Google did not yield results.
MSVC considers 1-1 to be a null pointer constant. This was correct by the standard for C++03, where all integral constant expressions with value 0 were null pointer constants, but it was changed so that only zero integer literals are null pointer constants for C++11 with CWG issue 903. This is a breaking change, as you can see in your example and as is also documented in the standard, see [diff.cpp03.conv] of the C++14 standard (draft N4140).
MSVC applies this change only in conformance mode. So your code will compile with the /permissive- flag, but I think the change was implemented only in MSVC 2019, see here.
In the case of GCC, GCC 5 defaults to C++98 mode, while GCC 6 and later default to C++14 mode, which is why the change in behavior seems to depend on the GCC version.
If you call f with a null pointer constant as the argument, then the call is ambiguous, because a null pointer constant can be converted to the null pointer value of any pointer type, and this conversion has the same rank as the conversion of int (or any integral type) to double.
The compiler works correctly, in accordance to [over.match] and [conv], more specifically [conv.fpint] and [conv.ptr].
A standard conversion sequence is [blah blah] Zero or one [...] floating-integral conversions, pointer conversions, [...].
and
A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating-point type. The result is exact if possible [blah blah]
and
A null pointer constant is an integer literal with value zero or [...]. A null pointer constant can be converted to a pointer type; the result is the null pointer value of that type [blah blah]
Now, overload resolution chooses the best match among all candidate functions (which, as a fun feature, need not even be accessible at the call location!). The best match is the one with exact parameter matches or, failing that, the fewest conversions. Zero or one standard conversion may happen (for each parameter), and zero is "better" than one.
(1-1) is an integral constant expression that evaluates to 0. Strictly speaking it is not an integer literal, but under the pre-DR 903 rules the compiler is applying here, that is enough to make it a null pointer constant.
You can convert this zero constant to either double or double* (or std::nullptr_t) with exactly one conversion. So, given that more than one of these functions is declared (as in the example), there is more than one candidate, all candidates are equally good, and no best match exists. The call is ambiguous, and the compiler is right to complain.
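If the ambiguity is a problem rather than the point, any of the following call sites (a sketch, not from the original question) selects one overload unambiguously:

void f(double p) {}
void f(double* p) {}

int main() {
    f(0.0);                     // exact match: calls f(double)
    f(nullptr);                 // only f(double*) accepts std::nullptr_t
    f(static_cast<double*>(0)); // explicit cast: calls f(double*)
    return 0;
}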

MSVC function matching with const enum value 0

I was bitten by an unintended C++ function match by MSVC. I can reduce it to the following test case:
#include <iostream>

enum Code { aaa, bbb };

struct MyVal {
    Code c;
    MyVal(Code c): c(c) { }
};

void test(int i, MyVal val) {
    std::cout << "case " << i << ": value " << val.c << std::endl;
}

void test(int i, double* f) {
    std::cout << "case " << i << ": WRONG" << std::endl;
}

const Code v1 = aaa;
Code v2 = aaa;
const Code v3 = bbb;

int main() {
    const Code w1 = aaa;
    Code w2 = aaa;
    const Code w3 = bbb;

    test(1, v1); // unexpected MSVC WRONG
    test(2, v2);
    test(3, v3);
    test(4, aaa);
    test(5, w1); // unexpected MSVC WRONG
    test(6, w2);
    test(7, w3);
    return 0;
}
I expected that all 7 invocations of test would match the first overload, and GCC (live example) and Clang (live example) match this as intended:
case 1: value 0
case 2: value 0
case 3: value 1
case 4: value 0
case 5: value 0
case 6: value 0
case 7: value 1
But MSVC (live example) matches cases 1 and 5 to the "wrong" overload (I found this behavior in MSVC 2013 and 2015):
case 1: WRONG
case 2: value 0
case 3: value 1
case 4: value 0
case 5: WRONG
case 6: value 0
case 7: value 1
It seems that MSVC prefers the conversion to a pointer for a const enum variable with the (accidental) value 0. I would have expected this behavior with a literal 0, but not with an enum variable.
My questions: Is the MSVC behavior standard-conformant? (Perhaps for an older version of C++?) If not, is this a known extension or bug?
You don't name any standards, but let's see what the differences are:
[C++11: 4.10/1]: A null pointer constant is an integral constant expression (5.19) prvalue of integer type that evaluates to zero or a prvalue of type std::nullptr_t. A null pointer constant can be converted to a pointer type; the result is the null pointer value of that type and is distinguishable from every other value of object pointer or function pointer type. Such a conversion is called a null pointer conversion. Two null pointer values of the same type shall compare equal. The conversion of a null pointer constant to a pointer to cv-qualified type is a single conversion, and not the sequence of a pointer conversion followed by a qualification. [..]
[C++11: 5.19/3]: A literal constant expression is a prvalue core constant expression of literal type, but not pointer type. An integral constant expression is a literal constant expression of integral or unscoped enumeration type. [..]
And:
[C++03: 4.10/1]: A null pointer constant is an integral constant expression (5.19) rvalue of integer type that evaluates to zero. A null pointer constant can be converted to a pointer type; the result is the null pointer value of that type and is distinguishable from every other value of pointer to object or pointer to function type. Two null pointer values of the same type shall compare equal. The conversion of a null pointer constant to a pointer to cv-qualified type is a single conversion, and not the sequence of a pointer conversion followed by a qualification conversion (4.4).
[C++03: 5.19/2]: Other expressions are considered constant-expressions only for the purpose of non-local static object initialization (3.6.2). Such constant expressions shall evaluate to one of the following:
a null pointer value (4.10),
a null member pointer value (4.11),
an arithmetic constant expression,
an address constant expression,
a reference constant expression,
an address constant expression for a complete object type, plus or minus an integral constant expression, or
a pointer to member constant expression.
The key here is that the standard language changed between C++03 and C++11, with the latter tightening what counts as a null pointer constant of this form. (The restriction to only the literal 0 was completed by DR 903 for C++14, as discussed above.)
(They always needed to actually be constants and evaluate to 0, so you can remove v2, v3, w2 and w3 from your testcase.)
A null pointer constant can convert to a double* more easily than going through your user-defined conversion, so…
I believe MSVC is implementing the C++03 rules.
Amusingly, though, if I put GCC in C++03 mode, its behaviour isn't changed, which is technically non-compliant. I suspect the change in the language stemmed from the behaviour of common implementations at the time, rather than the other way around. I can see some evidence that GCC was [allegedly] non-conforming in this regard as early as 2004, so it may also just be that the standard wording change fortuitously un-bugged what had been a GCC bug.
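If C++11 is available, one way to sidestep the whole issue is a scoped enumeration. A minimal sketch (not from the original question): a scoped enum has no implicit conversion to integer types, so none of its values can ever be mistaken for a null pointer constant:

enum class Code { aaa, bbb };

struct MyVal {
    Code c;
    MyVal(Code c) : c(c) {}
};

void test(int i, MyVal val);
void test(int i, double* f);

const Code v1 = Code::aaa; // underlying value 0, but never a null pointer constant

void demo() {
    test(1, v1); // selects test(int, MyVal) under every rule set
}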

C++11 backwards compatibility (conversion of null integer constant to pointer)

The C++ standard allows the implicit conversion of a zero integer constant to a pointer of any type.
The following code is invalid, because the value v is not constant here:
float* foo()
{
    int v = 0;
    return v; // Error
}
But the following code is correct:
float* foo()
{
    const int v = 0;
    return v; // OK in C++98 mode, error in C++11 mode
}
The question is: why do gcc and clang (I tried different versions) compile the code in C++98/03 mode, but emit a warning/error when compiling in C++11/14 mode (-std=c++11)? I tried to find the relevant changes in the C++11 working draft PDF, but had no success.
The Intel 16.0 and VS2015 compilers show no errors or warnings in either case.
GCC and Clang behave differently with -std=c++11 because C++11 changed the definition of a null pointer constant, and then C++14 changed it again, see Core DR 903 which changed the rules in C++14 so that only literals are null pointer constants.
In C++03 4.10 [conv.ptr] said:
A null pointer constant is an integral constant expression (5.19) rvalue of integer type that evaluates to zero.
That allows all sorts of expressions, as long as they are constant and evaluate to zero: enumerators, false, (5 - 5), and so on. This used to cause lots of problems in C++03 code.
In C++11 it says:
A null pointer constant is an integral constant expression (5.19) prvalue of integer type that evaluates to zero or a prvalue of type std::nullptr_t.
And in C++14 it says:
A null pointer constant is an integer literal (2.14.2) with value zero or a prvalue of type std::nullptr_t.
This is a much more restrictive rule, and makes far more sense.
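Given that, a brief sketch of the portable fix for the snippet above (assuming C++11 or later is available):

float* foo()
{
    // return v;    // depends on which standard's null-pointer-constant rules apply
    return nullptr; // well-formed and unambiguous in C++11 and later
}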

Why is it okay to compare a pointer with '\0'? (but not 'A')

I found a bug in my code where I had compared a pointer with '\0'.
Wondering why the compiler didn't warn me about this bug, I tried the following.
#include <cassert>

struct Foo
{
    char bar[5];
};

int main()
{
    Foo f;
    Foo* p = &f;
    p->bar[0] = '\0';
    assert(p->bar == '\0');    // #1. I forgot []. Now comparing the pointer with NULL; fails.
    assert(p->bar == 'A');     // #2. error: ISO C++ forbids comparison between pointer and integer
    assert(p->bar[0] == '\0'); // #3. What I intended; passes.
    return 0;
}
What is special about '\0' which makes #1 legal and #2 illegal?
Please add a reference or quotation to your answer.
What makes it legal and well defined is the fact that '\0' is a null pointer constant, so it can be converted to any pointer type, yielding a null pointer value.
ISO/IEC 14882:2011 4.10 [conv.ptr] / 1:
A null pointer constant is an integral constant expression prvalue of integer type that evaluates to zero or a prvalue of type std::nullptr_t. A null pointer constant can be converted to a pointer type; the result is the null pointer value of that type and is distinguishable from every other value of object pointer or function pointer type. Such a conversion is called a null pointer conversion.
'\0' meets the requirement of being an "integral constant expression prvalue of integer type that evaluates to zero", because char is an integer type and \0 has the value zero. (Note that under the DR 903 wording adopted for C++14, discussed above, a character literal is not an integer literal, so #1 becomes ill-formed there.)
Other integers can only be explicitly converted to a pointer type, via reinterpret_cast, and the result is only meaningful if the integer was itself obtained by converting a valid pointer to an integer type of sufficient size.
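To illustrate the distinction, a small sketch (the names are illustrative; the behaviour follows the C++11 rules quoted above):

#include <cstdint>

std::intptr_t i = 0;                   // zero, but not a constant expression
char* p1 = reinterpret_cast<char*>(i); // implementation-defined value, NOT guaranteed null
char* p2 = '\0';                       // null pointer conversion: p2 is a null pointer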
'\0' is simply a different way of writing 0. I would guess that this is legal because comparing a pointer to 0 makes sense no matter how you wrote the 0, while there is almost never any valid meaning in comparing a pointer to any other non-pointer value.
This is a design error of C++. The rule says that any integral constant expression with value zero can be used as a null pointer constant.
This highly questionable decision allows '\0' to be used as a null pointer (as you found), but also things like (1==2) or even !!!!!!!!!!!1 (an example similar to one in "The C++ Programming Language"; I have no idea whether Stroustrup actually considers this a "cool" feature).
This ambiguity IMO even creates a loophole in the language definition when mixed with the semantics of the ternary operator and the implicit conversion rules: I remember finding a case in which, out of three compilers, one refused to compile and the other two compiled with different semantics ... and after wasting a day reading the standard and asking experts on c.c.l.c++.m, I was unable to decide which of the three compilers was right.
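The original case is not given, but here is a hypothetical sketch of the kind of interaction meant, with cond, x, and p as illustrative names:

const char* p = "hi";
bool cond = true;
int x = 0;

auto r = cond ? 0 : p;    // 0 is a null pointer constant, so the result type is const char*
// auto s = cond ? x : p; // ill-formed: x also has value 0, but it is not a constant,
                          // so int and const char* have no common type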