The following flawed code for a null pointer check compiles with some compilers but not with others (see godbolt):
bool f()
{
char c;
return &c > nullptr;
}
The offensive part is the relational comparison between a pointer and nullptr.
The comparison compiles with
gcc before 11.1
MSVC 19.latest with /std:c++17, but not with /std:c++20.
But no version of clang (I checked down to 4.0) compiles it.
The error newer gccs produce ("ordered comparison of pointer with integer zero ('char*' and 'std::nullptr_t')"1 is a bit different than the one from clang ("invalid operands to binary expression ('char *' and 'std::nullptr_t')").
The C++20 ISO standard says about relational operators applied to pointers in 7.6.9/3ff:
If both operands are pointers, pointer conversions (7.3.12) [...] are performed to
bring them to their composite pointer type (7.2.2). After conversions, the operands shall have the same type.
The result of comparing unequal pointers to objects is defined in terms of a partial order consistent with the following rules:
(4.1) — If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
(4.2) — If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member is required to compare greater provided the two members have the same access control (11.9), neither member is a subobject of zero size, and their class is not a union.
(4.3) — Otherwise, neither pointer is required to compare greater than the other.
("Pointer to object" in this context means only that the type is not a pointer to function, not that the pointer value refers to an actual object.)
(4.1) and (4.2) clearly don't apply here, which leaves (4.3). Does (4.3), "neither pointer is required", mean the behavior is undefined, and the code is invalid? By comparison, the 2012 standard contained the wording in 5.9/2 "[...] if only one of [two pointers of the same type p and q] is null, the results
of p<q, p>q, p<=q, and p>=q are unspecified."
1 The wording doesn't seem correct. nullptr_t is not, if I read the standard correctly, an integral type.
7.6.9 states, "The lvalue-to-rvalue ([conv.lval]), array-to-pointer ([conv.array]), and function-to-pointer ([conv.func]) standard conversions are performed on the operands. .... The converted operands shall have arithmetic, enumeration, or pointer type."
None of the specified conversions is applicable to the literal nullptr. Also, it does not have arithmetic, enumeration, or pointer type. Therefore, the comparison is ill-formed.
Related
After reading about the standard C11 version of Martin Uecker's ICE_P predicate, I tried to implement it in pure C++. The C11 version, making use of _Generic selection is as follows:
#define ICE_P(x) _Generic((1? (void *) ((x)*0) : (int *) 0), int*: 1, void*: 0)
The obvious approach for C++ is to replace _Generic by a template and decltype, such as:
template<typename T> struct is_ice_helper;
template<> struct is_ice_helper<void*> { enum { value = false }; };
template<> struct is_ice_helper<int*> { enum { value = true }; };
#define ICE_P(x) (is_ice_helper<decltype(1? (void *) ((x)*0) : (int *) 0)>::value)
However, it fails the simplest test. Why can't it detect integer constant expressions?
The issue is subtle. The specification for determining the composite type of the conditional expression's pointer operands are similar in C++ to the ones in C, so it starts off looking promising:
(N4659) [expr.cond]
7 Lvalue-to-rvalue, array-to-pointer, and function-to-pointer
standard conversions are performed on the second and third operands.
After those conversions, one of the following shall hold:
[...]
One or both of the second and third operands have pointer type; pointer conversions, function pointer conversions, and qualification
conversions are performed to bring them to their composite pointer
type (Clause [expr]). The result is of the composite pointer type.
[...]
The reduction to the composite pointer type is specified as follows:
(N4659) [expr]
5 The composite pointer type of two operands p1 and p2 having
types T1 and T2, respectively, where at least one is a pointer or
pointer to member type or std::nullptr_t, is:
if both p1 and p2 are null pointer constants, std::nullptr_t;
if either p1 or p2 is a null pointer constant, T2 or T1, respectively;
if T1 or T2 is “pointer to cv1 void” and the other type is “pointer to cv2 T”, where T is an object type or void, “pointer to cv12 void”,
where cv12 is the union of cv1 and cv2;
[...]
So the result of our ICE_P macro is determined by which of the bullets above we land one after checking each in order. Given how we defined is_ice_helper, we know that the composite type is not nullptr_t, otherwise we'd hit the first bullet, and will get an error due to the missing template specialization. So we must be hitting bullet number 3, making the predicate report false. It all seems to hinge on the definition of a null pointer constant.
(N4659) [conv.ptr] (emphasis mine)
1 A null pointer constant is an integer literal with value
zero or a prvalue of type std::nullptr_t. A null pointer
constant can be converted to a pointer type; the result is the null
pointer value of that type and is distinguishable from every other
value of object pointer or function pointer type. Such a conversion is
called a null pointer conversion. Two null pointer values of the same
type shall compare equal. The conversion of a null pointer constant to
a pointer to cv-qualified type is a single conversion, and not the
sequence of a pointer conversion followed by a qualification
conversion. A null pointer constant of integral type can be converted
to a prvalue of type std::nullptr_t.
Since (int*)0 is not a null pointer constant by the definition above, we do not qualify for the first bullet of [expr]/5. The composite type is not std::nullptr_t. Neither is (void *) ((x)*0) a null pointer constant, nor can it be turned into one. Removing the cast (something the definition doesn't allow) leaves us with (x)*0. This is a integer constant expression with value zero. But it is not an integer literal with value zero! The definition of a null pointer constant in C++ deviates from the one in C!
(N1570) 6.3.2.3 Pointers
3 An integer constant expression with the value 0, or such an
expression cast to type void *, is called a null pointer constant. If
a null pointer constant is converted to a pointer type, the resulting
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.
C allows arbitrary constant expressions with value zero to form a null pointer constant, while C++ requires integer literals. Given C++'s rich support for computing constant expressions of a variety of literal types, this seems like a needless restriction. And one that makes the above approach to ICE_P a non-starter in C++.
While checking the references for another question, I noticed an odd clause in C++11, at [expr.rel] ¶3:
Pointers to void (after pointer conversions) can be compared, with a result defined as follows: If both
pointers represent the same address or are both the null pointer value, the result is true if the operator is
<= or >= and false otherwise; otherwise the result is unspecified.
This seems to mean that, once two pointers have been casted to void *, their ordering relation is no longer guaranteed; for example, this:
int foo[] = {1, 2, 3, 4, 5};
void *a = &foo[0];
void *b = &foo[1];
std::cout<<(a < b);
would seem to be unspecified.
Interestingly, this clause wasn't there in C++03 and disappeared in C++14, so if we take the example above and apply the C++14 wording to it, I'd say that ¶3.1
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
would apply, as a and b point to elements of the same array, even though they have been casted to void *. Notice that the wording of ¶3.1 was there pretty much the same in C++11, but seemed to be overridden by the void * clause.
Am I right in my understanding? What was the point of that oddball clause added in C++11 and immediately removed? Or maybe it's still there, but moved to/implied by some other part of the standard?
TL;DR:
in C++98/03 the clause was not present, and the standard did not specify relational operators for void pointers (core issue 879, see end of this post);
the odd clause about comparing void pointers was added in C++11 to resolve it, but this in turn gave rise to two other core issues 583 & 1512 (see below);
the resolution of these issues required the clause to be removed and be replaced with the wording found in C++14 standard, which allows for "normal" void * comparison.
Core Issue 583: Relational pointer comparisons against the null pointer constant
Relational pointer comparisons against the null pointer constant Section: 8.9 [expr.rel]
In C, this is ill-formed (cf C99 6.5.8):
void f(char* s) {
if (s < 0) { }
} ...but in C++, it's not. Why? Who would ever need to write (s > 0) when they could just as well write (s != 0)?
This has been in the language since the ARM (and possibly earlier);
apparently it's because the pointer conversions (7.11 [conv.ptr]) need
to be performed on both operands whenever one of the operands is of
pointer type. So it looks like the "null-ptr-to-real-pointer-type"
conversion is hitching a ride with the other pointer conversions.
Proposed resolution (April, 2013):
This issue is resolved by the resolution of issue 1512.
Core Issue 1512: Pointer comparison vs qualification conversions
Pointer comparison vs qualification conversions Section: 8.9 [expr.rel]
According to 8.9 [expr.rel] paragraph 2, describing pointer
comparisons,
Pointer conversions (7.11 [conv.ptr]) and qualification conversions
(7.5 [conv.qual]) are performed on pointer operands (or on a pointer
operand and a null pointer constant, or on two null pointer constants,
at least one of which is non-integral) to bring them to their
composite pointer type. This would appear to make the following
example ill-formed,
bool foo(int** x, const int** y) {
return x < y; // valid ? } because int** cannot be converted to const int**, according to the rules of 7.5 [conv.qual] paragraph 4.
This seems too strict for pointer comparison, and current
implementations accept the example.
Proposed resolution (November, 2012):
Relevant excerpts from resolution of the above issues are found in the paper: Pointer comparison vs qualification conversions (revision 3).
The following also resolves core issue 583.
Change in 5.9 expr.rel paragraphs 1 to 5:
In this section the following statement (the odd clause in C++11) has been expunged:
Pointers to void (after pointer conversions) can be compared, with a result defined as follows: If both pointers represent the same address or are both the null pointer value, the result is true if the operator is <= or >= and false otherwise; otherwise the result is unspecified
And the following statements have been added:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
So in the final working draft of C++14 (n4140) section [expr.rel]/3, the above statements are found as they were stated at the time of the resolution.
Digging for the reason why this odd clause was added led me to a much earlier issue 879: Missing built-in comparison operators for pointer types.
The proposed resolution of this issue (in July, 2009) led to the addition of this clause which was voted into WP in October, 2009.
And that is how it came to be included in the C++11 standard.
Code sample:
struct name
{
int a, b;
};
int main()
{
&(((struct name *)NULL)->b);
}
Does this cause undefined behaviour? We could debate whether it "dereferences null", however C11 doesn't define the term "dereference".
6.5.3.2/4 clearly says that using * on a null pointer causes undefined behaviour; however it doesn't say the same for -> and also it does not define a -> b as being (*a).b ; it has separate definitions for each operator.
The semantics of -> in 6.5.2.3/4 says:
A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.
However, NULL does not point to an object, so the second sentence seems underspecified.
Also relevant might be 6.5.3.2/1:
Constraints:
The operand of the unary & operator shall be either a function designator, the result of a
[] or unary * operator, or an lvalue that designates an object that is not a bit-field and is
not declared with the register storage-class specifier.
However I feel that the bolded text is defective and should read lvalue that potentially designates an object , as per 6.3.2.1/1 (definition of lvalue) -- C99 messed up the definition of lvalue, so C11 had to rewrite it and perhaps this section got missed.
6.3.2.1/1 does say:
An lvalue is an expression (with an object type other than void) that potentially
designates an object; if an lvalue does not designate an object when it is evaluated, the
behavior is undefined
however the & operator does evaluate its operand. (It doesn't access the stored value but that is different).
This long chain of reasoning seems to suggest that the code causes UB however it is fairly tenuous and it's not clear to me what the writers of the Standard intended. If in fact they intended anything, rather than leaving it up to us to debate :)
From a lawyer point of view, the expression &(((struct name *)NULL)->b); should lead to UB, since you could not find a path in which there would be no UB. IMHO the root cause is that at a moment you apply the -> operator on an expression that does not point to an object.
From a compiler point of view, assuming the compiler programmer was not overcomplicated, it is clear that the expression returns the same value as offsetof(name, b) would, and I'm pretty sure that provided it is compiled without error any existing compiler will give that result.
As written, we could not blame a compiler that would note that in the inner part you use operator -> on an expression than cannot point to an object (since it is null) and issue a warning or an error.
My conclusion is that until there is a special paragraph saying that provided it is only to take its address it is legal do dereference a null pointer, this expression is not legal C.
Yes, this use of -> has undefined behavior in the direct sense of the English term undefined.
The behavior is only defined if the first expression points to an object and not defined (=undefined) otherwise. In general you shouldn't search more in the term undefined, it means just that: the standard doesn't provide a meaning for your code. (Sometimes it points explicitly to such situations that it doesn't define, but this doesn't change the general meaning of the term.)
This is a slackness that is introduced to help compiler builders to deal with things. They may defined a behavior, even for the code that you are presenting. In particular, for a compiler implementation it is perfectly fine to use such code or similar for the offsetof macro. Making this code a constraint violation would block that path for compiler implementations.
Let's start with the indirection operator *:
6.5.3.2 p4:
The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type "pointer to type", the result has type "type". If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined. 102)
*E, where E is a null pointer, is undefined behavior.
There is a footnote that states:
102) Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). It is
always true that if E is a function designator or an lvalue that is a valid operand of the unary &
operator, *&E is a function designator or an lvalue equal to E. If *P is an lvalue and T is the name of
an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points.
Which means that &*E, where E is NULL, is defined, but the question is whether the same is true for &(*E).m, where E is a null pointer and its type is a struct that has a member m?
C Standard doesn't define that behavior.
If it were defined, new problems would arise, one of which is listed below. C Standard is correct to keep it undefined, and provides a macro offsetof that handles the problem internally.
6.3.2.3 Pointers
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant. 66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
This means that an integer constant expression with the value 0 is converted to a null pointer constant.
But the value of a null pointer constant is not defined as 0. The value is implementation defined.
7.19 Common definitions
The macros are
NULL
which expands to an implementation-defined null pointer constant
This means C allows an implementation where the null pointer will have a value where all bits are set and using member access on that value will result in an overflow which is undefined behavior
Another problem is how do you evaluate &(*E).m? Do the brackets apply and is * evaluated first. Keeping it undefined solves this problem.
First, let's establish that we need a pointer to an object:
6.5.2.3 Structure and union members
4 A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.96) If the first expression is a pointer to
a qualified type, the result has the so-qualified version of the type of the designated
member.
Unfortunately, no null pointer ever points to an object.
6.3.2.3 Pointers
3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
Result: Undefined Behavior.
As a side-note, some other things to chew over:
6.3.2.3 Pointers
4 Conversion of a null pointer to another pointer type yields a null pointer of that type.
Any two null pointers shall compare equal.
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.67)
6 Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
So even if the UB should happen to be benign this time, it might still result in some totally unexpected number.
Nothing in the C standard would impose any requirements on what a system could do with the expression. It would, when the standard was written, have been perfectly reasonable for it to to cause the following sequence of events at runtime:
Code loads a null pointer into the addressing unit
Code asks the addressing unit to add the offset of field b.
The addressing unit trigger a trap when attempting to add an integer to a null pointer (which should for robustness be a run-time trap, even though many systems don't catch it)
The system starts executing essentially random code after being dispatched through a trap vector that was never set because code to set it would have wasted been a waste of memory, as addressing traps shouldn't occur.
The very essence of what Undefined Behavior meant at the time.
Note that most of the compilers that have appeared since the early days of C would regard the address of a member of an object located at a constant address as being a compile-time constant, but I don't think such behavior was mandated then, nor has anything been added to the standard which would mandate that compile-time address calculations involving null pointers be defined in cases where run-time calculations would not.
No. Let's take this apart:
&(((struct name *)NULL)->b);
is the same as:
struct name * ptr = NULL;
&(ptr->b);
The first line is obviously valid and well defined.
In the second line, we calculate the address of a field relative to the address 0x0 which is perfectly legal as well. The Amiga, for example, had the pointer to the kernel in the address 0x4. So you could use a method like this to call kernel functions.
In fact, the same approach is used on the C macro offsetof (wikipedia):
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
So the confusion here revolves around the fact that NULL pointers are scary. But from a compiler and standard point of view, the expression is legal in C (C++ is a different beast since you can overload the & operator).
Given a pointer p:
char *p ; // Could be any type
assuming p is properly initialized is the following well-formed:
if (p > 0) // or p > nullptr
More generally is it well-formed to use a relational operator when one operand is a pointer and the other is a null pointer constant?
In C++14 this code is ill-formed but prior to the C++14 this was well-formed code(but the result is unspecified), as defect report 583: Relational pointer comparisons against the null pointer constant notes:
In C, this is ill-formed (cf C99 6.5.8):
void f(char* s) {
if (s < 0) { }
}
...but in C++, it's not. Why? Who would ever need to write (s > 0)
when they could just as well write (s != 0)?
This has been in the language since the ARM (and possibly earlier);
apparently it's because the pointer conversions (4.10 [conv.ptr]) need
to be performed on both operands whenever one of the operands is of
pointer type. So it looks like the "null-ptr-to-real-pointer-type"
conversion is hitching a ride with the other pointer conversions.
In C++14 this was made ill-formed when N3624 was applied to the draft C++14 standard, which is a revision of N3478. The proposed resolution to 583 notes:
This issue is resolved by the resolution of issue 1512.
and issue 1512 proposed resolution is N3478(N3624 is a revision of N3478):
The proposed wording is found in document N3478.
Changes to section 5.9 from C++11 to C++14
Section 5.9 Relational operators changed a lot between the C++11 draft standard and the C++14 draft standard, the following highlights the most relevant differences (emphasis mine going forward), from paragraph 1:
The operands shall have arithmetic, enumeration, or pointer type, or
type std::nullptr_t.
changes to:
The operands shall have arithmetic, enumeration, or pointer type
So the type std::nullptr_t is no longer a valid operand but that still leaves 0 which is a null pointer constant and therefore can be converted(section 4.10) to a pointer type.
This is covered by paragraph 2 which in C++11 says:
[...]Pointer conversions (4.10) and qualification conversions (4.4)
are performed on pointer operands (or on a pointer operand and a null
pointer constant, or on two null pointer constants, at least one of
which is non-integral) to bring them to their composite pointer type.
If one operand is a null pointer constant, the composite pointer type
is std::nullptr_t if the other operand is also a null pointer constant
or, if the other operand is a pointer, the type of the other
operand.[...]
this explicitly provides an exception for a null pointer constant operand, changes to the following in C++14:
The usual arithmetic conversions are performed on operands of
arithmetic or enumeration type. If both operands are pointers, pointer
conversions (4.10) and qualification conversions (4.4) are performed
to bring them to their composite pointer type (Clause 5). After
conversions, the operands shall have the same type.
In which there is no case that allows 0 to be converted to a pointer type. Both operands must be pointers in order for pointer conversions to be applied and it is required that the operands have the same type after conversions. Which is not satisfied in the case where one operand is a pointer type and the other is a null pointer constant 0.
What if both operands are pointers but one is a null pointer value?
R Sahu asks, is the following code well-formed?:
char* p = "";
char* q = nullptr;
if ( p > q ) {}
Yes, in C++14 this code is well formed, both p and q are pointers but the result of the comparison is unspecified. The defined comparisons for two pointers is set out in paragraph 3 and says:
Comparing pointers to objects is defined as follows:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher
subscript compares greater.
If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the
array, the latter pointer compares greater.
If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the
pointer to the later declared member compares greater provided the two
members have the same access control (Clause 11) and provided their
class is not a union.
Null pointers values are not defined here and later on in paragraph 4 it says:
[...]Otherwise, the result of each of the operators is unspecified.
In C++11 it specifically makes the results unspecified in paragraph 3:
If two pointers p and q of the same type point to different objects
that are not members of the same object or elements of the same array
or to different functions, or if only one of them is null, the results
of p<q, p>q, p<=q, and p>=q are unspecified.
I ran into a new warning message after a compiler upgrade.
warning: ordered comparison of pointer with integer zero [-Wextra]
if (inx > 0)
As it turns out inx is a pointer. Normally I would expect to see this old code compared against 0, or NULL. This got me to thinking about signed and unsigned values, and possible risk.
A bit of research suggests:
A pointer greater than zero in c++, what does mean?
Can a pointer (address) ever be negative?
memory address positive or negative value in c?
malloc returns negative value
These seem to suggest that an address (returned by malloc) can never be zero
Which took me to my old copy of the standard.
4.10 Pointer conversions
1 A null pointer constant is an integral constant expression (5.19) prvalue of integer type that evaluates to zero
or a prvalue of type std::nullptr_t. A null pointer constant can be converted to a pointer type; the result
is the null pointer value of that type and is distinguishable from every other value of pointer to object or
pointer to function type. Such a conversion is called a null pointer conversion. Two null pointer values of the
same type shall compare equal. The conversion of a null pointer constant to a pointer to cv-qualified type is
a single conversion, and not the sequence of a pointer conversion followed by a qualification conversion (4.4).
A null pointer constant of integral type can be converted to a prvalue of type std::nullptr_t.
It specifically states that two null pointers compare equal.
With that in mind, is that little piece of code undefined behavior? or is there another piece to the puzzle I am missing?
It's not undefined behaviour, but the result is unspecified if inx is not null.
C++11 5.9/2: If two pointers p and q of the same type point to different objects that are not members of the same object or elements of the same array or to different functions, or if only one of them is null, the results of p<q, p>q, p<=q, and p>=q are unspecified.
So you can be sure that the the conditional code won't execute if inx is null - but not that it will if it's not null. The comparison should probably be inx != 0, which is well defined to be true if and only if inx is non-null.
You're looking at pointer conversions, but you should be looking at pointer comparisons.
Specifically, comparisons between pointers that don't refer to (subobjects of) the same array or object.
Section 5.9, paragraphs 3 and 4, this wording is found in C++14 drafts.
Comparing pointers to objects is defined as follows:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control (Clause 11) and provided their class is not a union.
If two operands p and q compare equal (5.10), p<=q and p>=q both yield true and p<q and p>q both yield
false. Otherwise, if a pointer p compares greater than a pointer q, p>=q, p>q, q<=p, and q<p all yield true
and p<=q, p<q, q>=p, and q>p all yield false. Otherwise, the result of each of the operators is unspecified.
In your case, no "pointer compares greater than" relationship is defined, and therefore the operators act according to their "otherwise" behavior, giving unspecified results. This comparison won't directly crash the program, but it could take either branch through the if, assuming that inx is non-null.