Is &arr[size] valid? - c++

Let's say I have a function, called like this:
#include <algorithm>
#include <cstddef>
void mysort(int *arr, std::size_t size)
{
    std::sort(&arr[0], &arr[size]);
}
int main()
{
    int a[] = { 42, 314 };
    mysort(a, 2);
}
My question is: does the code of mysort (more specifically, &arr[size]) have defined behaviour?
I know it would be perfectly valid if replaced by arr + size; pointer arithmetic allows pointing past-the-end normally. However, my question is specifically about the use of & and [].
Per C++11 5.2.1/1, arr[size] is equivalent to *(arr + size).
Quoting 5.3.1/1, the rules for unary *:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an
object type, or a pointer to a function type and the result is an lvalue referring to the object or function
to which the expression points. If the type of the expression is “pointer to T,” the type of the result is “T.”
[ Note: a pointer to an incomplete type (other than cv void) can be dereferenced. The lvalue thus obtained
can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to a
prvalue, see 4.1. —end note ]
Finally, 5.3.1/3 giving the rules for &:
The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue ... if the type of
the expression is T, the result has type “pointer to T” and is a prvalue that is the address of the designated object (1.7) or a pointer to the designated function.
(Emphasis and ellipses mine).
I can't quite make up my mind about this. I know for sure that forcing an lvalue-to-rvalue conversion on arr[size] would be Undefined. But no such conversion happens in the code. arr + size does not point to an object; but while the paragraphs above talk about objects, they never seem to explicitly call out the necessity for an object to actually exist at that location (unlike e.g. the lvalue-to-rvalue conversion in 4.1/1).
So, the question is: is mysort, the way it's called, valid or not?
(Note that I'm quoting C++11 above, but if this is handled more explicitly in a later standard/draft, I would be perfectly happy with that).

It's not valid. You bolded "result is an lvalue referring to the object or function to which the expression points" in your question. That's exactly the problem. arr + size is a valid pointer value that does not point to an object. Therefore, your quote about *(arr + size) does not specify what the result refers to, which in turn means there is no requirement for &*(arr + size) to give the same value as arr + size.
In C, this was considered a defect and fixed, so that the spec now says that in &*ptr, neither & nor * is evaluated. C++ hasn't yet received fixed wording. It is the subject of a very old, still active DR: DR #232. The intent is that it is valid, just as it is in C, but the standard doesn't say so.
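Until the C++ wording is settled, the question can be sidestepped by forming the past-the-end pointer with pointer arithmetic alone. A minimal sketch, assuming a hypothetical variant named mysort_ptr (the name and the assert precondition are not from the original post):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical variant of mysort from the question.
// The past-the-end pointer is formed with pointer arithmetic only, so the
// expression never applies unary * to a pointer that refers to no object.
void mysort_ptr(int* arr, std::size_t size)
{
    assert(arr != nullptr || size == 0);  // precondition assumed for this sketch
    std::sort(arr, arr + size);
}
```

The caller's contract is unchanged: size must still not exceed the real length of the array.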

In the context of normal C++ arrays, yes. It is legal to form the address of the one-past-the-end element of the array. It is not legal to read or write to what it is pointing at, however (after all, there is no actual element there). So when you do the &arr[size], the arr[size] forms what you might think of as a reference to the one-past-the-end element, but has not tried to actually access that element yet. Then the & gets you the address of that element. Since nothing has tried to actually follow that pointer, nothing bad has happened.
This isn't by accident: it makes pointers into arrays behave like iterators. Thus &a[0] is essentially .begin() on the array, and &a[size] (where size is the number of elements in the array) is essentially .end(). (See also std::array, where this ends up being more explicit.)
Edit: Erm, I may have to retract this answer. While it probably applies in most cases, if the type stored in the array has an overloaded operator&, then when you do &a[size], the operator& function may attempt to access members of the instance at a[size], where there is no instance.
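The operator& concern can be illustrated with a small sketch. The type Odd and the helpers end_of, first_of and length_of are hypothetical names made up for this example; the point is only that arr + N never invokes the overload, and that std::addressof bypasses it for elements that really exist:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical element type whose unary & is overloaded and does not
// return the object's address.
struct Odd {
    int value = 0;
    Odd* operator&() { return nullptr; }
};

// Past-the-end pointer formed by arithmetic; Odd::operator& is never called.
template <std::size_t N>
Odd* end_of(Odd (&arr)[N])
{
    return arr + N;
}

// Address of an element that really exists, bypassing the overload.
template <std::size_t N>
Odd* first_of(Odd (&arr)[N])
{
    static_assert(N > 0, "array must have at least one element");
    return std::addressof(arr[0]);
}

// Number of elements between the two pointers above.
template <std::size_t N>
std::ptrdiff_t length_of(Odd (&arr)[N])
{
    Odd* first = first_of(arr);
    Odd* last = end_of(arr);
    assert(first <= last);  // both point into (or just past) the same array
    return last - first;
}
```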

Assuming size is the actual array size, you are passing a pointer to the past-the-end element to std::sort().
So, as I understand it, the question boils down to: is this pointer equivalent to arr.end()?
There is little doubt this is true for every existing compiler, since array iterators are indeed plain old pointers, so &arr[size] is the obvious choice for arr.end().
However, I doubt there is a specific requirement about the actual implementation of plain old array iterators.
So, for the sake of the argument, you could imagine a compiler using a "past end" bit in addition to the actual address to implement plain old array iterators internally and perversely paint your mustache pink if it detected any conceivable inconsistency between iterators and addresses obtained through pointer arithmetic.
This freakish compiler would cause a lot of existing C++ code to crash without actually violating the spec, which might just be worth the effort of designing it...
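For built-in arrays, the standard library already provides the begin/end pair, so the past-the-end pointer does not have to be spelled by hand. A minimal sketch, assuming a hypothetical helper named sort_array; the assert records that std::end on a built-in array yields arr + N:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical helper: sorts a built-in array whose length is deduced
// from its type.
template <std::size_t N>
void sort_array(int (&arr)[N])
{
    // For a built-in array, std::end returns arr + N.
    assert(std::end(arr) == arr + N);
    std::sort(std::begin(arr), std::end(arr));
}
```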

If we accept that arr[i] is just shorthand for *(arr + i), we can rewrite &arr[size] as &*(arr + size). Hence, we are dereferencing a pointer that points to the past-the-end element, which leads to undefined behavior. As you correctly say, arr + size would instead be legal, because no dereferencing operation takes place.
Coincidentally, this is also presented as a quiz in Stepanov's notes (page 11).

It's perfectly fine and well defined as long as size is not larger than the size of the actual array (in units of the array elements).
So if main() called mysort(a, 100), &arr[size] would already be undefined behaviour (most likely undetected, although std::sort would obviously go badly wrong as well).

Related

Is incrementing a pointer to a 0-sized dynamic array undefined?

AFAIK, although we cannot create a 0-sized static array, we can do it with dynamic ones:
int a[0]{}; // Compile-time error
int* p = new int[0]; // Is well-defined
As I've read, p acts like one-past-end element. I can print the address that p points to.
if (p)
    cout << p << endl;
Although I am sure we cannot dereference that pointer (past the last element), just as we cannot with a past-the-end iterator, what I am not sure of is whether incrementing that pointer p is undefined behaviour (UB), as it is with iterators.
p++; // UB?
Pointers to elements of arrays are allowed to point to a valid element, or one past the end. If you increment a pointer in a way that goes more than one past the end, the behavior is undefined.
For your 0-sized array, p is already pointing one past the end, so incrementing it is not allowed.
See C++17 8.7/4 regarding the + operator (++ has the same restrictions):
If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n; otherwise, the behavior is undefined.
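The rule quoted above can be turned into a guard: only increment while the pointer has not yet reached the past-the-end position. A minimal sketch, assuming hypothetical helpers named try_advance and advances_in_empty_allocation (neither appears in the original answer):

```cpp
#include <cassert>

// Advances p by one element unless it already equals end.
// Returns true if p was moved.
bool try_advance(const int*& p, const int* end)
{
    assert(p != nullptr && end != nullptr);  // precondition assumed for this sketch
    if (p == end) {
        return false;  // already one past the end: ++p would be undefined
    }
    ++p;
    return true;
}

// Counts how many times a pointer into a zero-length allocation can be advanced.
int advances_in_empty_allocation()
{
    int* storage = new int[0];  // well-defined; yields a non-null pointer
    const int* p = storage;
    int count = 0;
    while (try_advance(p, storage)) {
        ++count;
    }
    delete[] storage;
    return count;
}
```

For the zero-length allocation, the begin and end positions coincide, so the guard never lets the pointer move.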
I guess you already have the answer. If you look a bit deeper: you've said that incrementing an off-the-end iterator is UB, so the answer lies in the question of what an iterator is.
The iterator is just an object that has a pointer and incrementing that iterator is really incrementing the pointer it has. Thus in many aspects an iterator is handled in terms of a pointer.
int arr[] = {0,1,2,3,4,5,6,7,8,9};
int *p = arr; // p points to the first element in arr
++p; // p points to arr[1]
Just as we can use iterators to traverse the elements in a vector, we can use pointers to traverse the elements in an array. Of course, to do so, we need to obtain pointers to the first and one past the last element. As we’ve just seen, we can obtain a pointer to the first element by using the array itself or by taking the address-of the first element. We can obtain an off-the-end pointer by using another special property of arrays. We can take the address of the nonexistent element one past the last element of an array:
int *e = &arr[10]; // pointer just past the last element in arr
Here we used the subscript operator to index a nonexisting element; arr has ten elements, so the last element in arr is at index position 9. The only thing we can do with this element is take its address, which we do to initialize e. Like an off-the-end iterator (§ 3.4.1, p. 106), an off-the-end pointer does not point to an element. As a result, we may not dereference or increment an off-the-end pointer.
This is from C++ Primer, 5th edition, by Lippman.
So it is UB; don't do it.
In the strictest sense, this is not Undefined Behavior, but implementation-defined. So, although inadvisable if you plan to support non-mainstream architectures, you can probably do it.
The standard quote given by interjay is a good one, indicating UB, but it is only the second best hit in my opinion, since it deals with pointer-pointer arithmetic (funnily, one is explicitly UB, while the other isn't). There is a paragraph dealing with the operation in the question directly:
[expr.post.incr] / [expr.pre.incr]
The operand shall be [...] or a pointer to a completely-defined object type.
Oh, wait a moment, a completely-defined object type? That's all? I mean, really, type? So you don't need an object at all?
It takes quite a bit of reading to actually find a hint that something in there might not be quite so well-defined. Because so far, it reads as if you are perfectly allowed to do it, no restrictions.
[basic.compound] 3 makes a statement about what type of pointer one may have, and being none of the other three, the result of your operation would clearly fall under 3.4: invalid pointer.
It however doesn't say that you aren't allowed to have an invalid pointer. On the contrary, it lists some very common, normal conditions (e.g. end of storage duration) where pointers regularly become invalid. So that's apparently an allowable thing to happen. And indeed:
[basic.stc] 4
Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
We are doing an "any other" there, so it's not Undefined Behavior, but implementation-defined, thus generally allowable (unless the implementation explicitly says something different).
Unluckily, that's not the end of the story. Although the net result doesn't change any more from here on, it gets more confusing, the longer you search for "pointer":
[basic.compound]
A valid value of an object pointer type represents either the address of a byte in memory or a null pointer. If an object of type T is located at an address A [...] is said to point to that object, regardless of how the value was obtained.
[ Note: For instance, the address one past the end of an array would be considered to point to an unrelated object of the array's element type that might be located at that address. [...]].
Read as: OK, who cares! As long as a pointer points somewhere in memory, I'm good?
[basic.stc.dynamic.safety]
A pointer value is a safely-derived pointer [blah blah]
Read as: OK, safely-derived, whatever. It doesn't explain what this is, nor does it say I actually need it. Safely-derived-the-heck. Apparently I can still have non-safely-derived pointers just fine. I'm guessing that dereferencing them would probably not be such a good idea, but it's perfectly allowable to have them. It doesn't say otherwise.
An implementation may have relaxed pointer safety, in which case the validity of a pointer value does not depend on whether it is a safely-derived pointer value.
Oh, so it may not matter, just what I thought. But wait... "may not"? That means, it may as well. How do I know?
Alternatively, an implementation may have strict pointer safety, in which case a pointer value that is not a safely-derived pointer value is an invalid pointer value unless the referenced complete object is of dynamic storage duration and has previously been declared reachable
Wait, so it's even possible that I need to call declare_reachable() on every pointer? How do I know?
Now, you can convert to intptr_t, which is well-defined, giving an integer representation of a safely-derived pointer. For which, of course, being an integer, it is perfectly legitimate and well-defined to increment it as you please.
And yes, you can convert the intptr_t back to a pointer, which is also well-defined. Only just, not being the original value, it is no longer guaranteed that you have a safely-derived pointer (obviously). Still, all in all, to the letter of the standard, while being implementation-defined, this is a 100% legitimate thing to do:
[expr.reinterpret.cast] 5
A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size [...] and back to the same pointer type [...] original value; mappings between pointers and integers are otherwise implementation-defined.
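The guaranteed part of that paragraph, an unmodified round trip through an integer, can be sketched as follows. The function name round_trip is hypothetical, and std::intptr_t is an optional type, so the sketch assumes a platform that provides it:

```cpp
#include <cassert>
#include <cstdint>

// Converts a pointer to an integer and back without modifying the integer.
// std::intptr_t is optional in the standard; this sketch assumes it exists.
int* round_trip(int* p)
{
    std::intptr_t bits = reinterpret_cast<std::intptr_t>(p);
    int* q = reinterpret_cast<int*>(bits);
    assert(q == p);  // the original value is guaranteed for an unmodified round trip
    return q;
}
```

Anything beyond this, such as changing bits before converting back, falls under the "otherwise implementation-defined" mapping.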
The catch
Pointers are just ordinary integers, only you happen to use them as pointers. Oh if only that was true!
Unluckily, there exist architectures where that isn't true at all, and merely generating an invalid pointer (not dereferencing it, just having it in a pointer register) will cause a trap.
So that's the basis of "implementation-defined". That, and the fact that incrementing a pointer whenever and however you please could of course cause overflow, which the standard doesn't want to deal with. The end of the application address space may not coincide with the location of overflow, and you do not even know whether there is any such thing as overflow for pointers on a particular architecture. All in all it's a nightmarish mess out of all proportion to the possible benefits.
Dealing with the one-past-the-object condition, on the other hand, is easy: the implementation must simply make sure no object is ever allocated so that the last byte in the address space is occupied. So that's well-defined, as it's useful and trivial to guarantee.

std::launder in conjunction with reinterpret_cast [duplicate]

The current draft standard (and presumably C++17) say in [basic.compound/4]:
[ Note: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note ]
So a pointer to an object cannot be reinterpret_cast'd to get its enclosing array pointer.
Now, there is std::launder, [ptr.launder/1]:
template<class T> [[nodiscard]] constexpr T* launder(T* p) noexcept;
Requires: p represents the address A of a byte in memory. An object X that is within its lifetime and whose type is similar to T is located at the address A. All bytes of storage that would be reachable through the result are reachable through p (see below).
And the definition of reachable is in [ptr.launder/3]:
Remarks: An invocation of this function may be used in a core constant expression whenever the value of its argument may be used in a core constant expression. A byte of storage is reachable through a pointer value that points to an object Y if it is within the storage occupied by Y, an object that is pointer-interconvertible with Y, or the immediately-enclosing array object if Y is an array element. The program is ill-formed if T is a function type or cv void.
Now, at first sight, it seems that std::launder can be used to do the aforementioned conversion, because of the part I've emphasized.
But if p points to an element of an array, the bytes of the array are reachable according to this definition (even though p is not pointer-interconvertible with a pointer to the array), just as they are through the result of the launder. So it seems that the definition doesn't say anything about this issue.
So, can std::launder be used to convert an object pointer to its enclosing array pointer?
This depends on whether the enclosing array object is a complete object, and if not, whether you can validly access more bytes through a pointer to that enclosing array object (e.g., because it's an array element itself, or pointer-interconvertible with a larger object, or pointer-interconvertible with an object that's an array element). The "reachable" requirement means that you cannot use launder to obtain a pointer that would allow you to access more bytes than the source pointer value allows, on pain of undefined behavior. This ensures that the possibility that some unknown code may call launder does not affect the compiler's escape analysis.
I suppose some examples could help. Each example below reinterpret_casts an int* pointing to the first element of an array of 10 ints into an int(*)[10]. Since they are not pointer-interconvertible, the reinterpret_cast does not change the pointer value, and you get an int(*)[10] with the value of "pointer to the first element of (whatever the array is)". Each example then attempts to obtain a pointer to the entire array by calling std::launder on the cast pointer.
int x[10];
auto p = std::launder(reinterpret_cast<int(*)[10]>(&x[0]));
This is OK; you can access all elements of x through the source pointer, and the result of the launder doesn't allow you to access anything else.
int x2[2][10];
auto p2 = std::launder(reinterpret_cast<int(*)[10]>(&x2[0][0]));
This is undefined. You can only access elements of x2[0] through the source pointer, but the result (which would be a pointer to x2[0]) would have allowed you to access x2[1], which you can't through the source.
struct X { int a[10]; } x3, x4[2]; // assume no padding
auto p3 = std::launder(reinterpret_cast<int(*)[10]>(&x3.a[0])); // OK
This is OK. Again, you can't access through a pointer to x3.a any byte you can't access already.
auto p4 = std::launder(reinterpret_cast<int(*)[10]>(&x4[0].a[0]));
This is (intended to be) undefined. You would have been able to reach x4[1] from the result because x4[0].a is pointer-interconvertible with x4[0], so a pointer to the former can be reinterpret_cast to yield a pointer to the latter, which then can be used for pointer arithmetic. See https://wg21.link/LWG2859.
struct Y { int a[10]; double y; } x5;
auto p5 = std::launder(reinterpret_cast<int(*)[10]>(&x5.a[0]));
And this is again undefined, because you would have been able to reach x5.y from the resulting pointer (by reinterpret_cast to a Y*) but the source pointer can't be used to access it.
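The first case above, which this answer considers OK because the array is a complete object, can be written as a compilable sketch. The function name last_of_ten is hypothetical, and std::launder requires C++17:

```cpp
#include <cassert>
#include <new>

// Mirrors the "int x[10]" case: x is a complete object, so the result of
// launder cannot reach any byte that the source pointer could not.
// Requires C++17 for std::launder.
int last_of_ten()
{
    int x[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    int (*p)[10] = std::launder(reinterpret_cast<int(*)[10]>(&x[0]));
    assert(static_cast<void*>(p) == static_cast<void*>(&x));  // same address
    return (*p)[9];
}
```

The other cases in the list compile just as readily, which is why compiling proves nothing about whether they are defined.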
Remark: any non-schizophrenic compiler will probably gladly accept that, as it would accept a C-style cast or a reinterpret_cast, so "just try and see" is not an option.
But IMHO, the answer to your question is no. The emphasized immediately-enclosing array object if Y is an array element lies in a Remark paragraph, not in the Requires one. That means that provided the requires section is respected, the remarks one also applies. As an array and its element type are not similar types, the requirement is not satisfied and std::launder cannot be used.
What follows is more of a general (philosophical?) interpretation. At the time of K&R C (in the 70's), C was intended to be able to replace assembly language. For that reason the rule was: the compiler must obey the programmer provided the source code can be translated. So there was no strict aliasing rule, and a pointer was no more than an address with additional arithmetic rules. This changed substantially in C99 and C++03 (not to mention C++11 and later). Programmers are now supposed to use C++ as a high-level language. That means that a pointer is just an object that allows access to another object of a given type, and an array and its element type are totally different types. Memory addresses are now little more than implementation details. So trying to convert a pointer to an array to a pointer to its first element is then against the philosophy of the language and could bite the programmer in a later version of the compiler. Of course real-life compilers still accept it for compatibility reasons, but we should not even try to use it in modern programs.

Obtaining a past-the-end pointer using the address of an array

In C and C++, it is often useful to use a past-the-end pointer to write functions that can operate on arbitrarily large arrays. C++ gives a std::end overload to make this easier. In C, on the other hand, I've found it's not uncommon to see a macro defined and used like this:
#define ARRAYLEN(array) (sizeof(array)/sizeof(array[0]))
// ...
int a [42];
do_something (a, a + ARRAYLEN (a));
I've also seen a pointer arithmetic trick used to let such functions operate on single objects:
int b;
do_something (&b, &b + 1);
It occurred to me that something similar could be done with arrays, since they are considered by C (and, I believe, C++) to be "complete objects." Given an array, we can derive a pointer to an array immediately after it, dereference that pointer, and use array-to-pointer conversion on the resulting reference to an array to get a past-the-end pointer for the original array:
#define END(array) (*(&array + 1))
// ...
int a [42];
do_something (a, END (a));
My question is this: In dereferencing a pointer to a non-existent array object, does this code exhibit undefined behaviour? I'm interested in what the most recent revisions of both C and C++ have to say about this code (not because I intend to use it, as there are better ways of achieving the same result, but because it's an interesting question).
I've used that in my own code, as (&arr)[1].
I'm quite sure it is safe. Array to pointer decay is not "lvalue-to-rvalue conversion", although it starts with an lvalue and ends with an rvalue.
It is undefined behaviour.
a is of type array of 42 int.
&a is of type pointer to array of 42 int. (Note this is not an array-to-pointer conversion)
&a + 1 is also of type pointer to array of 42 int
5.7p5 states:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and [...] otherwise, the behavior is undefined
The pointer does not point to an element of an array object. It points to an array object. So the "otherwise, the behaviour is undefined" is true. Behaviour is undefined.
It is undefined behavior in C: dereferencing a pointer that points beyond an existing object always is, unless that object is itself part of a bigger object that contains more elements.
But the basic idea of using &array + 1 is correct, whenever array is an lvalue. (There are cases where arrays aren't lvalues.) In that case that is a valid pointer operation. Now to obtain a pointer to the first element you just have to cast that back to the base type. In your case that would be
(int*)(&array + 1)
The value of a pointer to array is guaranteed to be the same value as a pointer to its first element, only the types differ.
Unfortunately I don't see a way to make such an expression type agnostic such that you could put this in a generic macro, unless you cast to void*. (With the gcc typeof extension you could do, e.g)
So you'd better stick to the portable (array)+ARRAYLEN(array), that one should work in all cases.
In a weird corner case, an array that is part of a struct and is returned as an rvalue from a function is not an lvalue. I think that the standard allows pointer arithmetic here, too, but I never understood that construction completely, so I am not sure that it will work in that case.
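In C++ specifically, the portable (array) + ARRAYLEN(array) form can be made type-agnostic with a function template instead of a macro. A minimal sketch, assuming a hypothetical helper named array_end; this does not apply to C, where the macro form remains the portable choice:

```cpp
#include <cassert>
#include <cstddef>

#define ARRAYLEN(array) (sizeof(array) / sizeof((array)[0]))

// Hypothetical helper: deduces the length from the array type, so it cannot
// be applied to a plain pointer by mistake.
template <typename T, std::size_t N>
T* array_end(T (&array)[N])
{
    assert(array + N == array + ARRAYLEN(array));  // same pointer as the macro form
    return array + N;
}
```

Neither form dereferences a pointer to a non-existent array, so the question about *(&array + 1) does not arise.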

Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?

I have seen it asserted several times now that the following code is not allowed by the C++ Standard:
int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];
Is &array[5] legal C++ code in this context?
I would like an answer with a reference to the Standard if possible.
It would also be interesting to know if it meets the C standard. And if it isn't standard C++, why was the decision made to treat it differently from array + 5 or &array[4] + 1?
Yes, it's legal. From the C99 draft standard:
§6.5.2.1, paragraph 2:
A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
§6.5.3.2, paragraph 3 (emphasis mine):
The unary & operator yields the address of its operand. If the operand has type ‘‘type’’,
the result has type ‘‘pointer to type’’. If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated and the result is as if both were
omitted, except that the constraints on the operators still apply and the result is not an
lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator
were removed and the [] operator were changed to a + operator. Otherwise, the result is
a pointer to the object or function designated by its operand.
§6.5.6, paragraph 8:
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
Note that the standard explicitly allows pointers to point one element past the end of the array, provided that they are not dereferenced. By 6.5.2.1 and 6.5.3.2, the expression &array[5] is equivalent to &*(array + 5), which is equivalent to (array+5), which points one past the end of the array. This does not result in a dereference (by 6.5.3.2), so it is legal.
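As a minimal sketch of the form nobody disputes (written in C++, since that is what the question uses), the one-past-the-end pointer can be formed with plain pointer arithmetic, so the unary * operator never comes into it. The name sort_ints is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: sorts arr[0..size) in place.
// arr + size is a one-past-the-end pointer; std::sort only compares
// against it and never reads through it.
void sort_ints(int *arr, std::size_t size)
{
    assert(arr != nullptr || size == 0);  // precondition assumed by this sketch
    std::sort(arr, arr + size);
}
```

Called as sort_ints(a, 3) on an array of three ints, this sorts all three elements.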
Your example is legal, but only because you're not actually using an out of bounds pointer.
Let's deal with out of bounds pointers first (because that's how I originally interpreted your question, before I noticed that the example uses a one-past-the-end pointer instead):
In general, you're not even allowed to create an out-of-bounds pointer. A pointer must point to an element within the array, or one past the end. Nowhere else.
The pointer is not even allowed to exist, which means you're obviously not allowed to dereference it either.
Here's what the standard has to say on the subject:
5.7:5:
When an expression that has integral type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integral expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
(emphasis mine)
Of course, this is for operator+. So just to be sure, here's what the standard says about array subscripting:
5.2.1:1:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
Of course, there's an obvious caveat: your example doesn't actually show an out-of-bounds pointer. It uses a "one past the end" pointer, which is different. The pointer is allowed to exist (as the above says), but the standard, as far as I can see, says nothing about dereferencing it. The closest I can find is 3.9.2:3:
[Note: for instance, the address one past the end of an array (5.7) would be considered to
point to an unrelated object of the array’s element type that might be located at that address. —end note ]
Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.
Thanks to ilproxyil for correcting the last bit here, answering the last part of your question:
array + 5 doesn't actually dereference anything; it simply creates a pointer to one past the end of array.
&array[4] + 1 dereferences array+4 (which is perfectly safe), takes the address of that lvalue, and adds one to that address, which results in a one-past-the-end pointer (but that pointer never gets dereferenced).
&array[5] dereferences array+5 (which as far as I can see is legal, and results in "an unrelated object of the array’s element type", as the above said), and then takes the address of that element, which also seems legal enough.
So they don't do quite the same thing, although in this case, the end result is the same.
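A small sketch of the comparison, restricted to the two spellings that nobody here disputes. The function name end_pointers_agree is invented for the example, and the disputed form is left as a comment rather than executed.

```cpp
#include <cassert>

// Hypothetical check: both uncontroversial spellings yield the same
// one-past-the-end pointer for a 5-element array.
bool end_pointers_agree()
{
    int array[5] = {};
    int *a = array + 5;       // pure pointer arithmetic, no indirection
    int *b = &array[4] + 1;   // indirection through a real element, then + 1
    // int *c = &array[5];    // the form under discussion; deliberately not run
    assert(a - array == 5);   // a sits five elements past the start
    return a == b;
}
```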
It is legal.
According to the gcc documentation for C++, &array[5] is legal. In both C++ and in C you may safely address the element one past the end of an array - you will get a valid pointer. So &array[5] as an expression is legal.
However, it is still undefined behavior to attempt to dereference pointers to unallocated memory, even if the pointer points to a valid address. So attempting to dereference the pointer generated by that expression is still undefined behavior (i.e. illegal) even though the pointer itself is valid.
In practice, I imagine it would usually not cause a crash, though.
Edit: By the way, this is generally how the end() iterator for STL containers is implemented (as a pointer to one-past-the-end), so that's a pretty good testament to the practice being legal.
Edit: Oh, now I see you're not really asking if holding a pointer to that address is legal, but if that exact way of obtaining the pointer is legal. I'll defer to the other answerers on that.
I believe that this is legal, and it depends on whether the 'lvalue to rvalue' conversion takes place. The last line of Core issue 232 has the following:
We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior
Although this is a slightly different example, what it does show is that the '*' does not result in an lvalue-to-rvalue conversion, and so, given that the expression is the immediate operand of '&', which expects an lvalue, the behaviour is defined.
I don't believe that it is illegal, but I do believe that the behaviour of &array[5] is undefined.
5.2.1 [expr.sub] E1[E2] is identical (by definition) to *((E1)+(E2))
5.3.1 [expr.unary.op] unary * operator ... the result is an lvalue referring to the object or function to which the expression points.
At this point you have undefined behaviour because the expression ((E1)+(E2)) didn't actually point to an object, and the standard does not say what the result should be unless it does.
1.3.12 [defns.undefined] Undefined behaviour may also be expected when this International Standard omits the description of any explicit definition of behaviour.
As noted elsewhere, array + 5 and &array[0] + 5 are valid and well defined ways of obtaining a pointer one beyond the end of array.
In addition to the above answers, I'll point out that operator& can be overloaded for classes. So even if it were valid for PODs, it probably isn't a good idea to do for an object you know isn't valid (much like overloading operator&() in the first place).
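To illustrate the overloading point, here is a hedged sketch with an invented type Odd whose operator& returns a null pointer. For such a type, &items[i] is a real member function call on items[i], so it needs an actual object at that index. std::addressof (C++11, from <memory>) bypasses the overload.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose unary & is overloaded to return something unhelpful.
struct Odd
{
    int value;
    Odd *operator&() { return nullptr; }
};

// Shows that & on an Odd calls the overload, while std::addressof does not.
bool addressof_ignores_overload()
{
    Odd items[2] = { {1}, {2} };
    Odd *via_overload = &items[0];                  // calls Odd::operator&
    Odd *via_addressof = std::addressof(items[0]);  // true address of items[0]
    assert(via_overload == nullptr);
    return via_addressof == items;                  // items decays to a pointer to items[0]
}
```

With a type like this, items + 2 remains the safe way to spell the past-the-end pointer, because it never invokes a member function.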
This is legal:
int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];
Section 5.2.1 Subscripting The expression E1[E2] is identical (by definition) to *((E1)+(E2))
So by this we can say that array_end is equivalent to:
int *array_end = &(*((array) + 5)); // or &(*(array + 5))
Section 5.3.1.1 Unary operator '*': The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or
a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
If the type of the expression is “pointer to T,” the type of the result is “T.” [ Note: a pointer to an incomplete type (other
than cv void) can be dereferenced. The lvalue thus obtained can be used in limited ways (to initialize a reference, for
example); this lvalue must not be converted to an rvalue, see 4.1. — end note ]
The important part of the above:
'the result is an lvalue referring to the object or function'.
The unary operator '*' is returning an lvalue referring to the int (no de-reference). The unary operator '&' then gets the address of the lvalue.
As long as there is no de-referencing of an out of bounds pointer then the operation is fully covered by the standard and all behavior is defined. So by my reading the above is completely legal.
The fact that a lot of the STL algorithms depend on the behavior being well defined is a sort of hint that the standards committee has already thought of this, and I am sure there is something that covers this explicitly.
The comment section below presents two arguments:
(please read: but it is long and both of us end up trollish)
Argument 1
this is illegal because of section 5.7 paragraph 5
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
And though the section is relevant; it does not show undefined behavior. All the elements in the array we are talking about are either within the array or one past the end (which is well defined by the above paragraph).
Argument 2:
The second argument presented below is: * is the de-reference operator.
And though this is a common term used to describe the '*' operator; this term is deliberately avoided in the standard as the term 'de-reference' is not well defined in terms of the language and what that means to the underlying hardware.
Accessing the memory one beyond the end of the array is definitely undefined behavior, but I am not convinced that the unary * operator accesses the memory (reads from or writes to it) in this context, at least not in a way the standard defines. In this context (as defined by the standard, see 5.3.1.1) the unary * operator returns an lvalue referring to the object. In my understanding of the language this is not access to the underlying memory. The result of this expression is then immediately used by the unary & operator, which returns the address of the object referred to by the lvalue.
Many other references to Wikipedia and non canonical sources are presented. All of which I find irrelevant. C++ is defined by the standard.
Conclusion:
I am willing to concede there are many parts of the standard that I may not have considered and that may prove my above arguments wrong. None are provided below. If you show me a standard reference that shows this is UB, I will:
Leave the answer.
Put in all caps this is stupid and I am wrong for all to read.
This is not an argument:
Not everything in the entire world is defined by the C++ standard. Open your mind.
Working draft (n2798):
"The result of the unary & operator is
a pointer to its operand. The operand
shall be an lvalue or a qualified-id.
In the first case, if the type of the
expression is “T,” the type of the
result is “pointer to T.”" (p. 103)
array[5] is not a qualified-id as best I can tell (the list is on p. 87); the closest would seem to be identifier, but while array is an identifier array[5] is not. It is not an lvalue because "An lvalue refers to an object or function. " (p. 76). array[5] is obviously not a function, and is not guaranteed to refer to a valid object (because array + 5 is after the last allocated array element).
Obviously, it may work in certain cases, but it's not valid C++ or safe.
Note: It is legal to add to get one past the array (p. 113):
"if the expression P [a pointer]
points to the last element of an array
object, the expression (P)+1 points
one past the last element of the array
object, and if the expression Q points
one past the last element of an array
object, the expression (Q)-1 points to
the last element of the array object.
If both the pointer operand and the
result point to elements of the same
array object, or one past the last
element of the array object, the
evaluation shall not produce an
overflow"
But it is not legal to do so using &.
Even if it is legal, why depart from convention? array + 5 is shorter anyway, and in my opinion, more readable.
Edit: If you want it to be symmetric you can write
int* array_begin = array;
int* array_end = array + 5;
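If C++11 is available, the same symmetric pair can also be spelled with std::begin and std::end from <iterator>, which for a built-in array yield array and array + N without any subscripting. This is only a sketch; the function name span_of_five is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical helper: distance between std::begin and std::end of a local array.
std::ptrdiff_t span_of_five()
{
    int array[5] = {};
    int *array_begin = std::begin(array);  // same pointer as array
    int *array_end = std::end(array);      // same pointer as array + 5
    assert(array_end == array + 5);
    return array_end - array_begin;
}
```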
It should be undefined behaviour, for the following reasons:
Trying to access out-of-bounds elements results in undefined behaviour. Hence the standard does not forbid an implementation throwing an exception in that case (i.e. an implementation checking bounds before an element is accessed). If & (array[size]) were defined to be begin (array) + size, an implementation throwing an exception in case of out-of-bound access would not conform to the standard anymore.
It's impossible to make this yield end (array) if array is not an array but rather an arbitrary collection type.
C++ standard, 5.19, paragraph 4:
An address constant expression is a pointer to an lvalue....The pointer shall be created explicitly, using the unary & operator...or using an expression of array (4.2)...type. The subscripting operator []...can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression.
Looks to me like &array[5] is legal C++, being an address constant expression.
If your example is NOT a general case but a specific one, then it is allowed. You can legally, AFAIK, move one past the allocated block of memory.
It does not work in the general case, though, i.e. where you are trying to access elements more than one past the end of an array.
Just searched C-Faq : link text
It is perfectly legal.
The vector<> template class from the stl does exactly this when you call myVec.end(): it gets you a pointer (here as an iterator) which points one element past the end of the array.
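A rough way to see the one-past-the-end relationship with std::vector, without assuming anything about how a particular library implements its iterators: data() + size() lands exactly one element after the last real element. The function name end_is_one_past_last is invented for this sketch, and it assumes a non-empty vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check: for a non-empty vector, data() + size() is one past
// the last element, and end() - begin() equals size().
bool end_is_one_past_last(const std::vector<int> &v)
{
    assert(!v.empty());                          // v.back() needs a real element
    const int *last = &v.back();                 // address of an existing element
    const int *past_end = v.data() + v.size();   // pointer arithmetic only
    return past_end == last + 1
        && v.end() - v.begin() == static_cast<std::ptrdiff_t>(v.size());
}
```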