void *p...; if (p > 0) .... Is this undefined behavior? - c++

I ran into a new warning message after a compiler upgrade.
warning: ordered comparison of pointer with integer zero [-Wextra]
if (inx > 0)
As it turns out inx is a pointer. Normally I would expect to see this old code compared against 0, or NULL. This got me to thinking about signed and unsigned values, and possible risk.
A bit of research suggests:
A pointer greater than zero in c++, what does mean?
Can a pointer (address) ever be negative?
memory address positive or negative value in c?
malloc returns negative value
These seem to suggest that an address (returned by malloc) can never be zero
Which took me to my old copy of the standard.
4.10 Pointer conversions
1 A null pointer constant is an integral constant expression (5.19) prvalue of integer type that evaluates to zero
or a prvalue of type std::nullptr_t. A null pointer constant can be converted to a pointer type; the result
is the null pointer value of that type and is distinguishable from every other value of pointer to object or
pointer to function type. Such a conversion is called a null pointer conversion. Two null pointer values of the
same type shall compare equal. The conversion of a null pointer constant to a pointer to cv-qualified type is
a single conversion, and not the sequence of a pointer conversion followed by a qualification conversion (4.4).
A null pointer constant of integral type can be converted to a prvalue of type std::nullptr_t.
It specifically states that two null pointers compare equal.
With that in mind, is that little piece of code undefined behavior? or is there another piece to the puzzle I am missing?

It's not undefined behaviour, but the result is unspecified if inx is not null.
C++11 5.9/2: If two pointers p and q of the same type point to different objects that are not members of the same object or elements of the same array or to different functions, or if only one of them is null, the results of p<q, p>q, p<=q, and p>=q are unspecified.
So you can be sure that the conditional code won't execute if inx is null - but not that it will if it's not null. The comparison should probably be inx != 0, which is well defined to be true if and only if inx is non-null.
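A minimal sketch of that fix, assuming the intent of the original test was simply "is the pointer set?". The helper name is_set is hypothetical and only there for illustration.

```cpp
// Hypothetical helper: reports whether a pointer is non-null.
// An equality comparison against nullptr is well defined for every
// pointer value, unlike the ordered comparison `p > 0`.
bool is_set(const void* p)
{
    return p != nullptr;
}
```

In the code from the question, if (inx > 0) would then read if (is_set(inx)), or simply if (inx != nullptr).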

You're looking at pointer conversions, but you should be looking at pointer comparisons.
Specifically, comparisons between pointers that don't refer to (subobjects of) the same array or object.
Section 5.9, paragraphs 3 and 4, this wording is found in C++14 drafts.
Comparing pointers to objects is defined as follows:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control (Clause 11) and provided their class is not a union.
If two operands p and q compare equal (5.10), p<=q and p>=q both yield true and p<q and p>q both yield
false. Otherwise, if a pointer p compares greater than a pointer q, p>=q, p>q, q<=p, and q<p all yield true
and p<=q, p<q, q>=p, and q>p all yield false. Otherwise, the result of each of the operators is unspecified.
In your case, no "pointer compares greater than" relationship is defined, and therefore the operators act according to their "otherwise" behavior, giving unspecified results. This comparison won't directly crash the program, but it could take either branch through the if, assuming that inx is non-null.

Related

Operator less than between a non-null raw pointer and nullptr

Are the operations nullptr < ptr and ptr < nullptr well defined for a non-null raw pointer ptr != nullptr? Quotes from the C++ standard are welcome.
Such a comparison is well-formed, but its result is unspecified.
[expr.rel]/3 Comparing pointers to objects is defined as follows:
— If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
— If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
— If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control (Clause 11) and provided their class is not a union.
[expr.rel]/4 If two operands p and q compare equal (5.10), p<=q and p>=q both yield true and p<q and p>q both yield false. Otherwise, if a pointer p compares greater than a pointer q, p>=q, p>q, q<=p, and q<p all yield true and p<=q, p<q, q>=p, and q>p all yield false. Otherwise, the result of each of the operators is unspecified.
A null pointer doesn't fall into any of the three clauses of [expr.rel]/3, and so it compares neither greater nor less than a non-null pointer. This case then falls into the "otherwise" clause of [expr.rel]/4.
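If an ordering that includes null pointers is genuinely needed, one option (an addition here, not part of the quoted answer) is std::less: its specializations for pointer types are required to yield a total order even where the built-in < does not. A small sketch, with the helper name ordered_before being hypothetical. Whether a null pointer sorts before or after a non-null one is still up to the implementation; only the consistency is guaranteed.

```cpp
#include <functional>

// Hypothetical helper: orders two pointers using std::less, which the
// standard library guarantees to be a strict total order over pointers,
// even when the built-in < would give an unspecified result.
bool ordered_before(const void* a, const void* b)
{
    return std::less<const void*>()(a, b);
}
```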

How to write C/C++ code correctly when null pointer is not all bits zero

As the comp.lang.c FAQ says, there are architectures where the null pointer is not all bits zero. So the question is what the following construction actually checks:
void* p = get_some_pointer();
if (!p)
return;
Am I comparing p with the machine-dependent null pointer, or am I comparing p with arithmetic zero?
Should I write
void* p = get_some_pointer();
if (NULL == p)
return;
instead to be ready for such architectures or is it just my paranoia?
According to the C spec:
An integer constant expression with the value 0, or such an expression
cast to type void *, is called a null pointer constant. 55) If a null
pointer constant is converted to a pointer type, the resulting
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.
So 0 is a null pointer constant. And if we convert it to a pointer type we will get a null pointer that might be non-all-bits-zero for some architectures. Next let's see what the spec says about comparing pointers and a null pointer constant:
If one operand is a
pointer and the other is a null pointer constant, the null pointer
constant is converted to the type of the pointer.
Let's consider (p == 0): first 0 is converted to a null pointer, and then p is compared with that null pointer, whose actual bit values are architecture-dependent.
Next, see what the spec says about the negation operator:
The result of the logical negation operator ! is 0 if the value of its
operand compares unequal to 0, 1 if the value of its operand compares
equal to 0. The result has type int. The expression !E is equivalent
to (0==E).
This means that (!p) is equivalent to (p == 0) which is, according to the spec, testing p against the machine-defined null pointer.
Thus, you may safely write if (!p) even on architectures where the null pointer is not all-bits-zero.
As for C++, a null pointer constant is defined as:
A null pointer constant is an integral constant expression (5.19)
prvalue of integer type that evaluates to zero or a prvalue of type
std::nullptr_t. A null pointer constant can be converted to a pointer
type; the result is the null pointer value of that type and is
distinguishable from every other value of object pointer or function
pointer type.
Which is close to what we have for C, plus the nullptr syntax sugar. The behavior of operator == is defined by:
In addition, pointers to members can be compared, or a pointer to
member and a null pointer constant. Pointer to member conversions
(4.11) and qualification conversions (4.4) are performed to bring them
to a common type. If one operand is a null pointer constant, the
common type is the type of the other operand. Otherwise, the common
type is a pointer to member type similar (4.4) to the type of one of
the operands, with a cv-qualification signature (4.4) that is the
union of the cv-qualification signatures of the operand types. [ Note:
this implies that any pointer to member can be compared to a null
pointer constant. — end note ]
That leads to conversion of 0 to a pointer type (as for C). For the negation operator:
The operand of the logical negation operator ! is contextually
converted to bool (Clause 4); its value is true if the converted
operand is true and false otherwise. The type of the result is bool.
That means that result of !p depends on how conversion from pointer to bool is performed. The standard says:
A zero value, null pointer value, or null member pointer value is
converted to false;
So if (p==NULL) and if (!p) do the same thing in C++ too.
It doesn't matter if null pointer is all-bits zero or not in the actual machine. Assuming p is a pointer:
if (!p)
is always a legal way to test if p is a null pointer, and it's always equivalent to:
if (p == NULL)
You may be interested in another C-FAQ article: This is strange. NULL is guaranteed to be 0, but the null pointer is not?
Above is true for both C and C++. Note that in C++(11), it's preferred to use nullptr for null pointer literal.
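To make the equivalence concrete, here is a small sketch with three spellings of the same null test. The function names are hypothetical. Each form compares against the null pointer value of the pointer's type, so they should agree regardless of the bit pattern the platform uses for null.

```cpp
// Hypothetical helpers showing three equivalent spellings of a null test.
// Each compares against the null pointer value of the pointer's type,
// whatever bit pattern the platform uses to represent it.
bool null_by_negation(const void* p) { return !p; }
bool null_by_zero(const void* p)     { return p == 0; }
bool null_by_nullptr(const void* p)  { return p == nullptr; }
```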
This answer applies to C.
Don't mix up NULL with null pointers. NULL is just a macro guaranteed to be a null pointer constant. A null pointer constant is guaranteed to be either 0 or (void*)0.
From C11 6.3.2.3:
An integer constant expression with the value 0, or such an expression
cast to type void *, is called a null pointer constant 66). If a null
pointer constant is converted to a pointer type, the resulting
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.
66) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19.
7.19:
The macros are
NULL
which expands to an implementation-defined null pointer constant;
Implementation-defined in the case of NULL, is either 0 or (void*)0. NULL cannot be anything else.
However, when a null pointer constant is assigned to a pointer, you get a null pointer, which may not have the value zero, even though it compares equal to a null pointer constant. The code if (!p) has nothing to do with the NULL macro, you are comparing a null pointer against the arithmetic value zero.
So in theory, code like int* p = NULL may result in a null pointer p which is different from zero.
Back in the day, STRATUS computers had null pointers as 1 in all languages.
This caused issues for C, so their C compiler would allow pointer comparison of 0 and 1 to return true
This would allow:
void * ptr=some_func();
if (!ptr)
{
return;
}
To return on a null ptr even though you could see that ptr had a value of 1 in the debugger
if ((void *)0 == (void *)1)
{
printf("Welcome to STRATUS\n");
}
Would in fact print "Welcome to STRATUS"
If your compiler is any good there are two things (and only two things) to watch out for.
1: Static default initialized (that is, not assigned) pointers won't have NULL in them.
2: memset() on a struct or array or by extension calloc() won't set pointers to NULL.
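For the memset()/calloc() point, a portable alternative in C++ is to let the language initialize the pointer rather than writing zero bytes over it. A minimal sketch, assuming a hypothetical Node type and helper name:

```cpp
// Hypothetical record type with a pointer member.
struct Node {
    int   value;
    Node* next;
};

// Value-initialization sets `next` to the null pointer value of its type,
// whatever its representation; memset/calloc only write all-bits-zero bytes,
// which the language does not promise to be a null pointer.
Node make_empty_node()
{
    return Node();   // value-initialized: value == 0, next is a null pointer
}
```

On mainstream platforms the two approaches produce the same bytes, so the difference only matters on the unusual architectures discussed above.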

Given that p is a pointer is "p > nullptr" well-formed?

Given a pointer p:
char *p ; // Could be any type
assuming p is properly initialized is the following well-formed:
if (p > 0) // or p > nullptr
More generally is it well-formed to use a relational operator when one operand is a pointer and the other is a null pointer constant?
In C++14 this code is ill-formed, but prior to C++14 it was well-formed (with an unspecified result), as defect report 583: Relational pointer comparisons against the null pointer constant notes:
In C, this is ill-formed (cf C99 6.5.8):
void f(char* s) {
if (s < 0) { }
}
...but in C++, it's not. Why? Who would ever need to write (s > 0)
when they could just as well write (s != 0)?
This has been in the language since the ARM (and possibly earlier);
apparently it's because the pointer conversions (4.10 [conv.ptr]) need
to be performed on both operands whenever one of the operands is of
pointer type. So it looks like the "null-ptr-to-real-pointer-type"
conversion is hitching a ride with the other pointer conversions.
In C++14 this was made ill-formed when N3624 was applied to the draft C++14 standard, which is a revision of N3478. The proposed resolution to 583 notes:
This issue is resolved by the resolution of issue 1512.
and the proposed resolution of issue 1512 is N3478 (N3624 is a revision of N3478):
The proposed wording is found in document N3478.
Changes to section 5.9 from C++11 to C++14
Section 5.9 Relational operators changed a lot between the C++11 draft standard and the C++14 draft standard, the following highlights the most relevant differences (emphasis mine going forward), from paragraph 1:
The operands shall have arithmetic, enumeration, or pointer type, or
type std::nullptr_t.
changes to:
The operands shall have arithmetic, enumeration, or pointer type
So the type std::nullptr_t is no longer a valid operand, but that still leaves 0, which is a null pointer constant and can therefore be converted (section 4.10) to a pointer type.
This is covered by paragraph 2 which in C++11 says:
[...]Pointer conversions (4.10) and qualification conversions (4.4)
are performed on pointer operands (or on a pointer operand and a null
pointer constant, or on two null pointer constants, at least one of
which is non-integral) to bring them to their composite pointer type.
If one operand is a null pointer constant, the composite pointer type
is std::nullptr_t if the other operand is also a null pointer constant
or, if the other operand is a pointer, the type of the other
operand.[...]
This explicitly provides an exception for a null pointer constant operand. In C++14 it changes to the following:
The usual arithmetic conversions are performed on operands of
arithmetic or enumeration type. If both operands are pointers, pointer
conversions (4.10) and qualification conversions (4.4) are performed
to bring them to their composite pointer type (Clause 5). After
conversions, the operands shall have the same type.
Here there is no case that allows 0 to be converted to a pointer type. Both operands must be pointers for pointer conversions to be applied, and the operands are required to have the same type after conversions. That requirement is not satisfied when one operand is a pointer type and the other is the null pointer constant 0.
What if both operands are pointers but one is a null pointer value?
R Sahu asks, is the following code well-formed?:
const char* p = "";   // const: a string literal does not convert to char* since C++11
const char* q = nullptr;
if ( p > q ) {}
Yes, in C++14 this code is well-formed: both p and q are pointers, but the result of the comparison is unspecified. The defined comparisons for two pointers are set out in paragraph 3:
Comparing pointers to objects is defined as follows:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher
subscript compares greater.
If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the
array, the latter pointer compares greater.
If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the
pointer to the later declared member compares greater provided the two
members have the same access control (Clause 11) and provided their
class is not a union.
Null pointer values are not covered here, and later on paragraph 4 says:
[...]Otherwise, the result of each of the operators is unspecified.
C++11 specifically makes the results unspecified in paragraph 2:
If two pointers p and q of the same type point to different objects
that are not members of the same object or elements of the same array
or to different functions, or if only one of them is null, the results
of p<q, p>q, p<=q, and p>=q are unspecified.

Are comparisons on out-of-range pointers well-defined?

Given the following code:
char buffer[1024];
char * const begin = buffer;
char * const end = buffer + 1024;
char *p = begin + 2000;
if (p < begin || p > end)
std::cout << "pointer is out of range\n";
Are the comparisons performed (p < begin and p > end) well-defined? Or does this code have undefined behaviour because the pointer has been advanced past the end of the array?
If the comparisons are well defined, what is that definition?
(extra credit: is the evaluation of begin + 2000 itself undefined behaviour?)
I'll assume the C++11 standard. According to section 5.7 (Additive operators) paragraph 5, the behavior of the initialization char *p = begin + 2000 is undefined before you even get to the comparison:
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
The evaluation of begin+2000 is undefined, it's going past the end of the array - you can go up to one past the end, but not further.
From C++11 §5.7/5 Additive operators:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
For pointer comparisons to be specified, assuming you have valid pointers to start with, they essentially need to be pointers to the same array (or one past the end), or pointers to non-static data members of the same access control of the same object (unless it's a union...).
The details are in §5.9/2 Relational operators:
Pointers to objects or functions of the same type (after pointer conversions) can be compared, with a result defined as follows:
If two pointers p and q of the same type point to the same object or function, or both point one past
the end of the same array, or are both null, then p<=q and p>=q both yield true and p<q and p>q
both yield false.
If two pointers p and q of the same type point to different objects that are not members of the same
object or elements of the same array or to different functions, or if only one of them is null, the results
of p<q, p>q, p<=q, and p>=q are unspecified.
If two pointers point to non-static data members of the same object, or to subobjects or array elements
of such members, recursively, the pointer to the later declared member compares greater provided the
two members have the same access control (Clause 11) and provided their class is not a union.
If two pointers point to non-static data members of the same object with different access control
(Clause 11) the result is unspecified.
If two pointers point to non-static data members of the same union object, they compare equal (after
conversion to void*, if necessary). If two pointers point to elements of the same array or one beyond
the end of the array, the pointer to the object with the higher subscript compares higher.
Other pointer comparisons are unspecified.
Your program's behavior is undefined, but not because of the comparison.
The evaluation of the expression begin + 2000 has undefined behavior because the result would point more than one element past the end of the 1024-element array.
Quoting C++11 (actually the N3485 draft), 5.7p5 [expr.add]:
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. [...]
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
In short, just computing an out-of-bounds pointer has undefined behavior; it doesn't matter what operations you perform on that pointer after that.
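One way to get the intended range check without ever forming an out-of-range pointer is to validate the integer offset first and only then do the pointer arithmetic. A minimal sketch, where the helper name advance_checked and the convention of returning nullptr on failure are assumptions for illustration:

```cpp
#include <cstddef>

// Hypothetical helper: returns begin + offset when that pointer lies within
// [begin, begin + size], and nullptr otherwise. The offset is validated as an
// integer first, so no out-of-range pointer is ever computed.
char* advance_checked(char* begin, std::size_t size, std::size_t offset)
{
    if (offset > size)          // offset == size (one past the end) is allowed
        return nullptr;
    return begin + offset;
}
```

With the buffer from the question, advance_checked(buffer, sizeof buffer, 2000) reports failure instead of evaluating begin + 2000.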

Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?

I have seen it asserted several times now that the following code is not allowed by the C++ Standard:
int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];
Is &array[5] legal C++ code in this context?
I would like an answer with a reference to the Standard if possible.
It would also be interesting to know if it meets the C standard. And if it isn't standard C++, why was the decision made to treat it differently from array + 5 or &array[4] + 1?
Yes, it's legal. From the C99 draft standard:
§6.5.2.1, paragraph 2:
A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
§6.5.3.2, paragraph 3 (emphasis mine):
The unary & operator yields the address of its operand. If the operand has type ‘‘type’’,
the result has type ‘‘pointer to type’’. If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated and the result is as if both were
omitted, except that the constraints on the operators still apply and the result is not an
lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator
were removed and the [] operator were changed to a + operator. Otherwise, the result is
a pointer to the object or function designated by its operand.
§6.5.6, paragraph 8:
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
Note that the standard explicitly allows pointers to point one element past the end of the array, provided that they are not dereferenced. By 6.5.2.1 and 6.5.3.2, the expression &array[5] is equivalent to &*(array + 5), which is equivalent to (array+5), which points one past the end of the array. This does not result in a dereference (by 6.5.3.2), so it is legal.
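In C++ the debate about &array[5] can be sidestepped by obtaining the one-past-the-end pointer through array + N or std::end, neither of which applies unary * or [] to the past-the-end position. A small sketch, with the helper name element_count being hypothetical:

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical helper: counts the elements of a built-in array by
// subtracting its begin pointer from its one-past-the-end pointer.
// std::end yields array + N without ever applying unary * or [].
template <typename T, std::size_t N>
std::ptrdiff_t element_count(T (&array)[N])
{
    T* first = std::begin(array);   // same as array + 0
    T* last  = std::end(array);     // same as array + N
    return last - first;
}
```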
Your example is legal, but only because you're not actually using an out of bounds pointer.
Let's deal with out of bounds pointers first (because that's how I originally interpreted your question, before I noticed that the example uses a one-past-the-end pointer instead):
In general, you're not even allowed to create an out-of-bounds pointer. A pointer must point to an element within the array, or one past the end. Nowhere else.
The pointer is not even allowed to exist, which means you're obviously not allowed to dereference it either.
Here's what the standard has to say on the subject:
5.7:5:
When an expression that has integral
type is added to or subtracted from a
pointer, the result has the type of
the pointer operand. If the pointer
operand points to an element of an
array object, and the array is large
enough, the result points to an
element offset from the original
element such that the difference of
the subscripts of the resulting and
original array elements equals the
integral expression. In other words,
if the expression P points to the i-th
element of an array object, the
expressions (P)+N (equivalently,
N+(P)) and (P)-N (where N has the
value n) point to, respectively, the
i+n-th and i−n-th elements of the
array object, provided they exist.
Moreover, if the expression P points
to the last element of an array
object, the expression (P)+1 points
one past the last element of the array
object, and if the expression Q points
one past the last element of an array
object, the expression (Q)-1 points to
the last element of the array object.
If both the pointer operand and the
result point to elements of the same
array object, or one past the last
element of the array object, the
evaluation shall not produce an
overflow; otherwise, the behavior is
undefined.
(emphasis mine)
Of course, this is for operator+. So just to be sure, here's what the standard says about array subscripting:
5.2.1:1:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
Of course, there's an obvious caveat: your example doesn't actually show an out-of-bounds pointer. It uses a "one past the end" pointer, which is different. The pointer is allowed to exist (as the above says), but the standard, as far as I can see, says nothing about dereferencing it. The closest I can find is 3.9.2:3:
[Note: for instance, the address one past the end of an array (5.7) would be considered to
point to an unrelated object of the array’s element type that might be located at that address. —end note ]
Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.
Thanks to ilproxyil for correcting the last bit here, answering the last part of your question:
array + 5 doesn't actually dereference anything, it simply creates a pointer to one past the end of array.
&array[4] + 1 dereferences array+4 (which is perfectly safe), takes the address of that lvalue, and adds one to that address, which results in a one-past-the-end pointer (but that pointer never gets dereferenced).
&array[5] dereferences array+5 (which as far as I can see is legal, and results in "an unrelated object of the array’s element type", as the above said), and then takes the address of that element, which also seems legal enough.
So they don't do quite the same thing, although in this case, the end result is the same.
It is legal.
According to the gcc documentation for C++, &array[5] is legal. In both C++ and in C you may safely address the element one past the end of an array - you will get a valid pointer. So &array[5] as an expression is legal.
However, it is still undefined behavior to attempt to dereference pointers to unallocated memory, even if the pointer points to a valid address. So attempting to dereference the pointer generated by that expression is still undefined behavior (i.e. illegal) even though the pointer itself is valid.
In practice, I imagine it would usually not cause a crash, though.
Edit: By the way, this is generally how the end() iterator for STL containers is implemented (as a pointer to one-past-the-end), so that's a pretty good testament to the practice being legal.
Edit: Oh, now I see you're not really asking if holding a pointer to that address is legal, but if that exact way of obtaining the pointer is legal. I'll defer to the other answerers on that.
I believe that this is legal, and it depends on the 'lvalue to rvalue' conversion taking place. The last line of Core issue 232 has the following:
We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior
Although this is slightly different example, what it does show is that the '*' does not result in lvalue to rvalue conversion and so, given that the expression is the immediate operand of '&' which expects an lvalue then the behaviour is defined.
I don't believe that it is illegal, but I do believe that the behaviour of &array[5] is undefined.
5.2.1 [expr.sub] E1[E2] is identical (by definition) to *((E1)+(E2))
5.3.1 [expr.unary.op] unary * operator ... the result is an lvalue referring to the object or function to which the expression points.
At this point you have undefined behaviour because the expression ((E1)+(E2)) didn't actually point to an object, and the standard only says what the result should be if it does.
1.3.12 [defns.undefined] Undefined behaviour may also be expected when this International Standard omits the description of any explicit definition of behaviour.
As noted elsewhere, array + 5 and &array[0] + 5 are valid and well defined ways of obtaining a pointer one beyond the end of array.
In addition to the above answers, I'll point out operator& can be overridden for classes. So even if it was valid for PODs, it probably isn't a good idea to do for an object you know isn't valid (much like overriding operator&() in the first place).
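To illustrate the point about an overloaded operator&, here is a sketch with a hypothetical type Odd whose unary & deliberately returns something other than the object's address. std::addressof (from <memory>, available since C++11) is the standard way to get the real address in that situation; its use here is an addition, not something the answer itself mentions.

```cpp
#include <memory>

// Hypothetical type whose overloaded unary & does not return the object's address.
struct Odd {
    int payload;
    Odd* operator&() { return nullptr; }
};

// std::addressof bypasses any overloaded operator& and returns the real address.
Odd* real_address(Odd& object)
{
    return std::addressof(object);
}
```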
This is legal:
int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];
Section 5.2.1 Subscripting The expression E1[E2] is identical (by definition) to *((E1)+(E2))
So by this we can say that array_end is equivalent to:
int *array_end = &(*((array) + 5)); // or &(*(array + 5))
Section 5.3.1.1 Unary operator '*': The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or
a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
If the type of the expression is “pointer to T,” the type of the result is “T.” [ Note: a pointer to an incomplete type (other
than cv void) can be dereferenced. The lvalue thus obtained can be used in limited ways (to initialize a reference, for
example); this lvalue must not be converted to an rvalue, see 4.1. — end note ]
The important part of the above:
'the result is an lvalue referring to the object or function'.
The unary operator '*' returns an lvalue referring to the int (no de-reference). The unary operator '&' then gets the address of the lvalue.
As long as there is no de-referencing of an out of bounds pointer then the operation is fully covered by the standard and all behavior is defined. So by my reading the above is completely legal.
The fact that a lot of the STL algorithms depend on the behavior being well defined is a sort of hint that the standards committee has already thought of this, and I am sure there is something that covers this explicitly.
The comment section below presents two arguments:
(please read: but it is long and both of us end up trollish)
Argument 1
this is illegal because of section 5.7 paragraph 5
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
Though the section is relevant, it does not show undefined behavior. All the pointers we are talking about point either within the array or one past the end (which is well defined by the above paragraph).
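To make the quoted paragraph concrete, here is a minimal sketch of the arithmetic it describes. The helper names past_the_end and last_element are hypothetical (they are not from the standard or from this thread), and the sketch deliberately forms the end pointer as array + N, which nobody in this discussion disputes, rather than with &array[N].

```cpp
#include <cstddef>

// Hypothetical helper: forms the one-past-the-end pointer with plain pointer
// arithmetic. The result is only compared or decremented, never read through.
template <std::size_t N>
int* past_the_end(int (&array)[N]) {
    return array + N;
}

// Hypothetical helper: applying (Q)-1 to a one-past-the-end pointer Q yields
// a pointer to the last element, which may then be read safely.
template <std::size_t N>
int& last_element(int (&array)[N]) {
    return *(past_the_end(array) - 1);
}
```

For an int array[5], past_the_end(array) - array is 5 and last_element(array) refers to array[4]; at no point is the one-past-the-end pointer itself read through.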
Argument 2:
The second argument presented below is: * is the de-reference operator.
Though this is a common term used to describe the '*' operator, the term is deliberately avoided in the standard, as 'de-reference' is not well defined in terms of the language and what it means to the underlying hardware.
Accessing the memory one beyond the end of the array is definitely undefined behavior, but I am not convinced the unary * operator accesses the memory (reads from or writes to it) in this context, at least not in a way the standard defines. In this context (as defined by the standard, see 5.3.1.1) the unary * operator returns an lvalue referring to the object. In my understanding of the language, this is not an access to the underlying memory. The result of this expression is then immediately used by the unary & operator, which returns the address of the object referred to by that lvalue.
Many other references to Wikipedia and non-canonical sources are presented, all of which I find irrelevant. C++ is defined by the standard.
Conclusion:
I am willing to concede there are many parts of the standard that I may not have considered and that may prove my above arguments wrong. None are provided below. If you show me a standard reference that shows this is UB, I will:
Leave the answer.
Put in all caps this is stupid and I am wrong for all to read.
This is not an argument:
Not everything in the entire world is defined by the C++ standard. Open your mind.
Working draft (n2798):
"The result of the unary & operator is
a pointer to its operand. The operand
shall be an lvalue or a qualified-id.
In the first case, if the type of the
expression is “T,” the type of the
result is “pointer to T.”" (p. 103)
array[5] is not a qualified-id as best I can tell (the list is on p. 87); the closest would seem to be identifier, but while array is an identifier array[5] is not. It is not an lvalue because "An lvalue refers to an object or function. " (p. 76). array[5] is obviously not a function, and is not guaranteed to refer to a valid object (because array + 5 is after the last allocated array element).
Obviously, it may work in certain cases, but it's not valid C++ or safe.
Note: It is legal to add to get one past the array (p. 113):
"if the expression P [a pointer]
points to the last element of an array
object, the expression (P)+1 points
one past the last element of the array
object, and if the expression Q points
one past the last element of an array
object, the expression (Q)-1 points to
the last element of the array object.
If both the pointer operand and the
result point to elements of the same
array object, or one past the last
element of the array object, the
evaluation shall not produce an
overflow"
But it is not legal to do so using &.
Even if it is legal, why depart from convention? array + 5 is shorter anyway, and in my opinion, more readable.
Edit: If you want it to be symmetric you can write:
int* array_begin = array;
int* array_end = array + 5;
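As a rough sketch of how such a begin/end pair is typically consumed, the snippet below passes the half-open range to std::accumulate. The function names sum_range and sum_of_example_array and the array contents are assumptions made up for illustration.

```cpp
#include <numeric>

// Hypothetical helper: sums the half-open range [begin, end).
// `end` is only used as a stopping point and is never dereferenced.
int sum_range(const int* begin, const int* end) {
    return std::accumulate(begin, end, 0);
}

// Mirrors the two declarations above, then hands the range to the helper.
int sum_of_example_array() {
    int array[5] = {1, 2, 3, 4, 5};
    int* array_begin = array;
    int* array_end = array + 5;
    return sum_range(array_begin, array_end);
}
```

The end pointer is formed purely by pointer arithmetic, so this form avoids the question of whether array[5] may be named at all.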
It should be undefined behaviour, for the following reasons:
Trying to access out-of-bounds elements results in undefined behaviour. Hence the standard does not forbid an implementation throwing an exception in that case (i.e. an implementation checking bounds before an element is accessed). If & (array[size]) were defined to be begin (array) + size, an implementation throwing an exception in case of out-of-bound access would not conform to the standard anymore.
It's impossible to make this yield end (array) if array is not an array but rather an arbitrary collection type.
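To illustrate the second point: a container such as std::list has no operator[], so &c[size] cannot even be written for it, whereas std::begin and std::end (C++11) work for built-in arrays and containers alike. The sketch below assumes C++11; sum_all and sum_array_and_list are hypothetical names.

```cpp
#include <iterator>
#include <list>
#include <numeric>

// Hypothetical generic helper: works for anything std::begin/std::end accept,
// including built-in arrays and containers without operator[].
template <typename Range>
int sum_all(const Range& range) {
    return std::accumulate(std::begin(range), std::end(range), 0);
}

// The same call handles a built-in array and a std::list.
int sum_array_and_list() {
    const int array[5] = {1, 2, 3, 4, 5};
    const std::list<int> list = {6, 7, 8};
    return sum_all(array) + sum_all(list);
}
```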
C++ standard, 5.19, paragraph 4:
An address constant expression is a pointer to an lvalue....The pointer shall be created explicitly, using the unary & operator...or using an expression of array (4.2)...type. The subscripting operator []...can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression.
Looks to me like &array[5] is legal C++, being an address constant expression.
If your example is NOT a general case but a specific one, then it is allowed. You can legally, AFAIK, move one past the allocated block of memory.
It does not work in the general case, though, i.e. where you are trying to access elements more than one past the end of an array.
Just searched the C FAQ:
It is perfectly legal.
The vector<> template class from the STL does exactly this when you call myVec.end(): it gets you a pointer (here as an iterator) which points one element past the end of the array.
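A small sketch of that one-past-the-end convention with std::vector, assuming C++11 for data(). The names count_steps and vector_past_the_end are hypothetical. Note that it only shows end() being compared against; it says nothing about how a particular library computes end() internally.

```cpp
#include <cstddef>
#include <vector>

// Walks the half-open range [begin(), end()) and counts the steps.
// end() is only compared against, never dereferenced.
std::size_t count_steps(const std::vector<int>& values) {
    std::size_t steps = 0;
    for (std::vector<int>::const_iterator it = values.begin(); it != values.end(); ++it) {
        ++steps;
    }
    return steps;
}

// Pointer form of the same idea: data() + size() is one past the last element.
const int* vector_past_the_end(const std::vector<int>& values) {
    return values.data() + values.size();
}
```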