Why does C++11 contain an odd clause about comparing void pointers? - c++

While checking the references for another question, I noticed an odd clause in C++11, at [expr.rel] ¶3:
Pointers to void (after pointer conversions) can be compared, with a result defined as follows: If both
pointers represent the same address or are both the null pointer value, the result is true if the operator is
<= or >= and false otherwise; otherwise the result is unspecified.
This seems to mean that, once two pointers have been casted to void *, their ordering relation is no longer guaranteed; for example, this:
int foo[] = {1, 2, 3, 4, 5};
void *a = &foo[0];
void *b = &foo[1];
std::cout<<(a < b);
would seem to be unspecified.
Interestingly, this clause wasn't there in C++03 and disappeared in C++14, so if we take the example above and apply the C++14 wording to it, I'd say that ¶3.1
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
would apply, as a and b point to elements of the same array, even though they have been casted to void *. Notice that the wording of ¶3.1 was there pretty much the same in C++11, but seemed to be overridden by the void * clause.
Am I right in my understanding? What was the point of that oddball clause added in C++11 and immediately removed? Or maybe it's still there, but moved to/implied by some other part of the standard?

TL;DR:
in C++98/03 the clause was not present, and the standard did not specify relational operators for void pointers (core issue 879, see end of this post);
the odd clause about comparing void pointers was added in C++11 to resolve it, but this in turn gave rise to two other core issues 583 & 1512 (see below);
the resolution of these issues required the clause to be removed and be replaced with the wording found in C++14 standard, which allows for "normal" void * comparison.
Core Issue 583: Relational pointer comparisons against the null pointer constant
Relational pointer comparisons against the null pointer constant Section: 8.9 [expr.rel]
In C, this is ill-formed (cf C99 6.5.8):
void f(char* s) {
if (s < 0) { }
} ...but in C++, it's not. Why? Who would ever need to write (s > 0) when they could just as well write (s != 0)?
This has been in the language since the ARM (and possibly earlier);
apparently it's because the pointer conversions (7.11 [conv.ptr]) need
to be performed on both operands whenever one of the operands is of
pointer type. So it looks like the "null-ptr-to-real-pointer-type"
conversion is hitching a ride with the other pointer conversions.
Proposed resolution (April, 2013):
This issue is resolved by the resolution of issue 1512.
Core Issue 1512: Pointer comparison vs qualification conversions
Pointer comparison vs qualification conversions Section: 8.9 [expr.rel]
According to 8.9 [expr.rel] paragraph 2, describing pointer
comparisons,
Pointer conversions (7.11 [conv.ptr]) and qualification conversions
(7.5 [conv.qual]) are performed on pointer operands (or on a pointer
operand and a null pointer constant, or on two null pointer constants,
at least one of which is non-integral) to bring them to their
composite pointer type. This would appear to make the following
example ill-formed,
bool foo(int** x, const int** y) {
return x < y; // valid ? } because int** cannot be converted to const int**, according to the rules of 7.5 [conv.qual] paragraph 4.
This seems too strict for pointer comparison, and current
implementations accept the example.
Proposed resolution (November, 2012):
Relevant excerpts from resolution of the above issues are found in the paper: Pointer comparison vs qualification conversions (revision 3).
The following also resolves core issue 583.
Change in 5.9 expr.rel paragraphs 1 to 5:
In this section the following statement (the odd clause in C++11) has been expunged:
Pointers to void (after pointer conversions) can be compared, with a result defined as follows: If both pointers represent the same address or are both the null pointer value, the result is true if the operator is <= or >= and false otherwise; otherwise the result is unspecified
And the following statements have been added:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
So in the final working draft of C++14 (n4140) section [expr.rel]/3, the above statements are found as they were stated at the time of the resolution.
Digging for the reason why this odd clause was added led me to a much earlier issue 879: Missing built-in comparison operators for pointer types.
The proposed resolution of this issue (in July, 2009) led to the addition of this clause which was voted into WP in October, 2009.
And that is how it came to be included in the C++11 standard.

Related

p > nullptr: Undefined behavior?

The following flawed code for a null pointer check compiles with some compilers but not with others (see godbolt):
bool f()
{
char c;
return &c > nullptr;
}
The offensive part is the relational comparison between a pointer and nullptr.
The comparison compiles with
gcc before 11.1
MSVC 19.latest with /std:c++17, but not with /std:c++20.
But no version of clang (I checked down to 4.0) compiles it.
The error newer gccs produce ("ordered comparison of pointer with integer zero ('char*' and 'std::nullptr_t')"1 is a bit different than the one from clang ("invalid operands to binary expression ('char *' and 'std::nullptr_t')").
The C++20 ISO standard says about relational operators applied to pointers in 7.6.9/3ff:
If both operands are pointers, pointer conversions (7.3.12) [...] are performed to
bring them to their composite pointer type (7.2.2). After conversions, the operands shall have the same type.
The result of comparing unequal pointers to objects is defined in terms of a partial order consistent with the following rules:
(4.1) — If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
(4.2) — If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member is required to compare greater provided the two members have the same access control (11.9), neither member is a subobject of zero size, and their class is not a union.
(4.3) — Otherwise, neither pointer is required to compare greater than the other.
("Pointer to object" in this context means only that the type is not a pointer to function, not that the pointer value refers to an actual object.)
(4.1) and (4.2) clearly don't apply here, which leaves (4.3). Does (4.3), "neither pointer is required", mean the behavior is undefined, and the code is invalid? By comparison, the 2012 standard contained the wording in 5.9/2 "[...] if only one of [two pointers of the same type p and q] is null, the results
of p<q, p>q, p<=q, and p>=q are unspecified."
1 The wording doesn't seem correct. nullptr_t is not, if I read the standard correctly, an integral type.
7.6.9 states, "The lvalue-to-rvalue ([conv.lval]), array-to-pointer ([conv.array]), and function-to-pointer ([conv.func]) standard conversions are performed on the operands. .... The converted operands shall have arithmetic, enumeration, or pointer type."
None of the specified conversions is applicable to the literal nullptr. Also, it does not have arithmetic, enumeration, or pointer type. Therefore, the comparison is ill-formed.

Do all transient allocations have unique address?

While reading comments of a C++ Weekly video about the constexpr new support in C++20 I found the comment that alleges that C++20 allows UB in constexpr context.
At first I was convinced that comment is right, but more I thought about it more and more I began to suspect that C++20 wording contains some clever language that makes this defined behavior.
Either that all transient allocations return unique addresses or maybe some more general notion in C++ that makes 2 distinct allocation pointers always(even in nonconstexpr context) compare false even if at runtime in reality it is possible that allocator would give you back same address(since you deleted the first allocation).
As a bonus weirdness: you can only use == for comparison, <, > fail...
Here is the program with alleged UB in constexpr:
#include <iostream>
static constexpr bool f()
{
auto p = new int(1);
delete p;
auto q = new int(2);
delete q;
return p == q;
}
int main()
{
constexpr bool res1 = f();
std::cout << res1 << std::endl; // May output 0 or 1
}
godbolt
The result here is implementation-defined. res1 could be false, true, or ill-formed, based on how the implementation wants to define it. And this is just as true for equality comparison as it is for relational comparison.
Both [expr.eq] (for equality) and [expr.rel] (for relational) start by doing an lvalue-to-rvalue conversion on the pointers (because we have to actually read what the value is to do a comparison). [conv.lval]/3 says that the result of that conversion is:
Otherwise, if the object to which the glvalue refers contains an invalid pointer value ([basic.stc.dynamic.deallocation], [basic.stc.dynamic.safety]), the behavior is implementation-defined.
That is the case here: both pointers contain an invalid pointer value, as per [basic.stc.general]/4:
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
with a footnote reading:
Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault.
So the value we get out of the lvalue-to-rvalue conversion is... implementation-defined. It could be implementation-defined in a way that causes those two pointers to compare equal. It could be implementation-defined in a way that causes those two pointers to compare not equal (as apparently all implementations do). Or it could even be implementation-defined in a way that causes the comparison between those two pointers to be unspecified or undefined behavior.
Notably, [expr.const]/5 (the main rule governing constant expressions), despite rejecting undefined behavior and explicitly rejecting any comparison whose result is unspecified ([expr.const]/5.23), says nothing about a comparison whose result is implementation-defined.
There's no undefined behavior here. Anything goes. Which is admittedly very weird during constant evaluation, where we'd expect to see a stricter set of rules.
Notably, with p < q, it appears that gcc and clang reject the comparison as being not a constant expression (which is... an allowed result) while msvc considers both p < q and p > q to be constant expressions whose value is false (which is... also an allowed result).

"is not required" == undefined behavior?

My question is mainly about terminology and how to interpret the standard.
[expr.rel]#4:
The result of comparing unequal pointers to objects is defined in terms of a partial order consistent with the following rules:
(4.1) If two pointers point to different elements of the same array,
or to subobjects thereof, the pointer to the element with the higher
subscript is required to compare greater.
(4.2) If two pointers point
to different non-static data members of the same object, or to
subobjects of such members, recursively, the pointer to the later
declared member is required to compare greater provided the two
members have the same access control ([class.access]), neither member
is a subobject of zero size, and their class is not a union.
(4.3) Otherwise, neither pointer is required to compare greater than the
other.
I am little confused as how to interpret (4.3). Does that mean that this
#include <iostream>
int main() {
int x;
int y;
std::cout << (&x < &y);
std::cout << (&x < &y);
}
is...
valid C++ code and the output is either 11 or 00.
invalid code, because it has undefined behaivour
?
In other words, I know that (4.3) does apply here, but I am not sure about the implications. When the standard says "it can be either A or B" is this the same as saying "it is undefined" ?
The wording has changed in various editions of the C++ standard, and in the recent draft cited in the question. (See my comments on the question for the gory details.)
C++11 says:
Other pointer comparisons are unspecified.
C++17 says:
Otherwise, neither pointer compares greater than the other.
The latest draft, cited in the question, says:
Otherwise, neither pointer is required to compare greater than the other.
That change was made in response to an issue saying ""compares greater" term is needlessly confusing".
If you look at the surrounding context in the draft standard, it's clear that in the remaining cases the result is unspecified. Quoting from [expr.rel] (text in italics is my summary):
The result of comparing unequal pointers to objects is defined in
terms of a partial order consistent with the following rules:
[pointers to elements of the same array]
[pointers to members of the same object]
[remaining cases] Otherwise, neither pointer is required to compare greater than the other.
If two operands p and q compare equal, p<=q and
p>=q both yield true and p<q and p>q both yield
false. Otherwise, if a pointer p compares greater than a pointer q, p>=q, p>q, q<=p, and q<p all
yield true and p<=q, p<q, q>=p, and q>p
all yield false. Otherwise, the result of each of the operators
is unspecified.
So the result of the < operator in such cases is unspecified, but it does not have undefined behavior. It can be either true or false, but I don't believe it's required to be consistent. The program's output could be any of 00, 01, 10, or 11.
For the provided code, this case applies:
(4.3) Otherwise, neither pointer is required to compare greater than the other.
There is no mention of UB, and so a strict reading of "neither is required" suggests that the result of the comparison could be different every time it's evaluated.
This means the program could validly output any of the following results:
00
01
10
11
valid C++ code
Yes.
Nowhere does the standard say that this is UB or ill-formed, and neither is this case lacking a rule describing the behaviour because the quoted 4.3 applies.
and the output is either 11 or 00
I'm not sure that 10 or 01 are technically guaranteed to not be output 1.
Given that neither pointer is required to compare greater than the other, the result of the comparison can be either true of false. There appears to not be an explicit requirement for the result to be the same for each invocation on same operands in this case.
1 But I consider this unlikely in practice. I also think that leaving such possibility open is not intentional. Rather, the intention is to allow for deterministic, but not necessarily total order.
P.S.
auto comp = std::less<>;
std::cout << comp(&x, &y);
std::cout << comp(&x, &y);
would be guaranteed to be either 11 or 00 because std::less (like its friends) is guaranteed to impose a strict total order for pointers.
x and y are not part of the same array, per (4.1). And they are not members of the same object, per (4.2). So, you fall into (4.3), which means if you try to compare them to each other, the result of the comparison is indeterminate, it could be true or false. If it were undefined behavior instead, the standard would likely state that explicitly.

Is &arr[size] valid?

Let's say I have a function, called like this:
void mysort(int *arr, std::size_t size)
{
std::sort(&arr[0], &arr[size]);
}
int main()
{
int a[] = { 42, 314 };
mysort(a, 2);
}
My question is: does the code of mysort (more specifically, &arr[size]) have defined behaviour?
I know it would be perfectly valid if replaced by arr + size; pointer arithmetic allows pointing past-the-end normally. However, my question is specifically about the use of & and [].
Per C++11 5.2.1/1, arr[size] is equivalent to *(arr + size).
Quoting 5.3.1/1, the rules for unary *:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an
object type, or a pointer to a function type and the result is an lvalue referring to the object or function
to which the expression points. If the type of the expression is “pointer to T,” the type of the result is “T.”
[ Note: a pointer to an incomplete type (other than cv void) can be dereferenced. The lvalue thus obtained
can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to a
prvalue, see 4.1. —end note ]
Finally, 5.3.1/3 giving the rules for &:
The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue ... if the type of
the expression is T, the result has type “pointer to T” and is a prvalue that is the address of the designated object (1.7) or a pointer to the designated function.
(Emphasis and ellipses mine).
I can't quite make up my mind about this. I know for sure that forcing an lvalue-to-rvalue conversion on arr[size] would be Undefined. But no such conversion happens in the code. arr + size does not point to an object; but while the paragraphs above talk about objects, they never seem to explicitly call out the necessity for an object to actually exist at that location (unlike e.g. the lvalue-to-rvalue conversion in 4.1/1).
So, the questio is: is mysort, the way it's called, valid or not?
(Note that I'm quoting C++11 above, but if this is handled more explicitly in a later standard/draft, I would be perfectly happy with that).
It's not valid. You bolded "result is an lvalue referring to the object or function to which the expression points" in your question. That's exactly the problem. array + size is a valid pointer value that does not point to an object. Therefore, your quote about *(array + size) does not specify what the result refers to, and that then means there is no requirement for &*(array + size) to give the same value as array + size.
In C, this was considered a defect and fixed so that the spec now says in &*ptr, neither & nor * gets evaluated. C++ hasn't yet received fixed wording. It's the subject of a very old still active DR: DR #232. The intent is that it is valid, just as it is in C, but the standard doesn't say so.
In the context of normal C++ arrays, yes. It is legal to form the address of the one-past-the-end element of the array. It is not legal to read or write to what it is pointing at, however (after all, there is no actual element there). So when you do the &arr[size], the arr[size] forms what you might think of as a reference to the one-past-the-end element, but has not tried to actually access that element yet. Then the & gets you the address of that element. Since nothing has tried to actually follow that pointer, nothing bad has happened.
This isn't by accident, this makes pointers into arrays behave like iterators. Thus &a[0] is essentially .begin() on the array, and &a[size] (where size is the number of elements in the array) is essentially .end(). (See also std::array where this ends up being more explicit)
Edit: Erm, I may have to retract this answer. While it probably applies in most cases, if the type stored in the array has an overridden operator& then when you do the &a[size], the operator& method may attempt to access members of the instance of the type at a[size] where there is no instance.
Assuming size is the actual array size, you are passing a pointer to past-the-end element to std::sort().
So, as I understand it, the question boils down to: is this pointer equivalent to arr.end()?
There is little doubt this is true for every existing compiler, since array iterators are indeed plain old pointers, so &arr[size] is the obvious choice for arr.end().
However, I doubt there is a specific requirement about the actual implementation of plain old array iterators.
So, for the sake of the argument, you could imagine a compiler using a "past end" bit in addition to the actual address to implement plain old array iterators internally and perversely paint your mustache pink if it detected any concievable inconsistency between iterators and addresses obtained through pointer arithmetics.
This freakish compiler would cause a lot of existing C++ code to crash without actually violating the spec, which might just be worth the effort of designing it...
If we admit that arr[i] is just a shorthand for *(arr + i), we can rewrite &arr[size] as &*(arr + size). Hence, we are dereferencing a pointer that points to the past-the-end element, which leads to an undefined behavior. As you correctly say, arr + size would instead be legal, because no dereferencing operation takes place.
Coincidentally, this is also presented as a quiz in Stepanov's notes (page 11).
It's perfectly fine and well defined as long as size is not larger than the size of the actual array (in units of the array elements).
So if main () called mysort (a, 100), &arr [size] would already be undefined behaviour (but most likely undetected, but std::sort would obviously go wrong badly as well).

Given that p is a pointer is "p > nullptr" well-formed?

Given a pointer p:
char *p ; // Could be any type
assuming p is properly initialized is the following well-formed:
if (p > 0) // or p > nullptr
More generally is it well-formed to use a relational operator when one operand is a pointer and the other is a null pointer constant?
In C++14 this code is ill-formed but prior to the C++14 this was well-formed code(but the result is unspecified), as defect report 583: Relational pointer comparisons against the null pointer constant notes:
In C, this is ill-formed (cf C99 6.5.8):
void f(char* s) {
if (s < 0) { }
}
...but in C++, it's not. Why? Who would ever need to write (s > 0)
when they could just as well write (s != 0)?
This has been in the language since the ARM (and possibly earlier);
apparently it's because the pointer conversions (4.10 [conv.ptr]) need
to be performed on both operands whenever one of the operands is of
pointer type. So it looks like the "null-ptr-to-real-pointer-type"
conversion is hitching a ride with the other pointer conversions.
In C++14 this was made ill-formed when N3624 was applied to the draft C++14 standard, which is a revision of N3478. The proposed resolution to 583 notes:
This issue is resolved by the resolution of issue 1512.
and issue 1512 proposed resolution is N3478(N3624 is a revision of N3478):
The proposed wording is found in document N3478.
Changes to section 5.9 from C++11 to C++14
Section 5.9 Relational operators changed a lot between the C++11 draft standard and the C++14 draft standard, the following highlights the most relevant differences (emphasis mine going forward), from paragraph 1:
The operands shall have arithmetic, enumeration, or pointer type, or
type std::nullptr_t.
changes to:
The operands shall have arithmetic, enumeration, or pointer type
So the type std::nullptr_t is no longer a valid operand but that still leaves 0 which is a null pointer constant and therefore can be converted(section 4.10) to a pointer type.
This is covered by paragraph 2 which in C++11 says:
[...]Pointer conversions (4.10) and qualification conversions (4.4)
are performed on pointer operands (or on a pointer operand and a null
pointer constant, or on two null pointer constants, at least one of
which is non-integral) to bring them to their composite pointer type.
If one operand is a null pointer constant, the composite pointer type
is std::nullptr_t if the other operand is also a null pointer constant
or, if the other operand is a pointer, the type of the other
operand.[...]
this explicitly provides an exception for a null pointer constant operand, changes to the following in C++14:
The usual arithmetic conversions are performed on operands of
arithmetic or enumeration type. If both operands are pointers, pointer
conversions (4.10) and qualification conversions (4.4) are performed
to bring them to their composite pointer type (Clause 5). After
conversions, the operands shall have the same type.
In which there is no case that allows 0 to be converted to a pointer type. Both operands must be pointers in order for pointer conversions to be applied and it is required that the operands have the same type after conversions. Which is not satisfied in the case where one operand is a pointer type and the other is a null pointer constant 0.
What if both operands are pointers but one is a null pointer value?
R Sahu asks, is the following code well-formed?:
char* p = "";
char* q = nullptr;
if ( p > q ) {}
Yes, in C++14 this code is well formed, both p and q are pointers but the result of the comparison is unspecified. The defined comparisons for two pointers is set out in paragraph 3 and says:
Comparing pointers to objects is defined as follows:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher
subscript compares greater.
If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the
array, the latter pointer compares greater.
If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the
pointer to the later declared member compares greater provided the two
members have the same access control (Clause 11) and provided their
class is not a union.
Null pointers values are not defined here and later on in paragraph 4 it says:
[...]Otherwise, the result of each of the operators is unspecified.
In C++11 it specifically makes the results unspecified in paragraph 3:
If two pointers p and q of the same type point to different objects
that are not members of the same object or elements of the same array
or to different functions, or if only one of them is null, the results
of p<q, p>q, p<=q, and p>=q are unspecified.