C++ issue about arrays [duplicate] - c++

This question already has answers here:
With arrays, why is it the case that a[5] == 5[a]?
(20 answers)
Closed 6 years ago.
Can anyone explain me why this is possible ?.
I understand that a is the same as &a[0] and for example i also understand that
*(a+1) is second a element and i am able to use *((a+i)+j) techniques, but first time in my life i saw code like this and i am confused now.
I know that output is fourth element of the a array. But can anyone explain me why this is possible ?
#include<iostream>
int a[] = {1,2,3,4,5};
std::cout << (3)[a] << std::endl;

The binary expression:
x[y]
is defined to be equal to:
*(x+y)
(in the absence of operator [] overloads).
3+a is the same as a+3 so 3[a] is the same as a[3].
In the context of 3+a, the name of array a decays to a pointer to its first element, so we are trying to add a pointer and an integer using the builtin addition operator. The latest draft of C++14 says in section 5.7 [expr.add] para 4:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type
of the pointer operand. If the pointer operand points to an element of an array, and the array is
large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if
the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P))
and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array
object, provided they exist
This is very close to the text of the C++14 standard, but this behaviour has been stable since C89 (and actually since BCPL in the '70s).
Having said that, if you find code like this hunt down the author and beat him over the head with a clue-bat. It is very easy to write C++ that nobody (including yourself after a week) can read - and that is no use to anyone.

Related

Does C or C++ guarantee array < array + SIZE?

Suppose you have an array:
int array[SIZE];
or
int *array = new(int[SIZE]);
Does C or C++ guarantee that array < array + SIZE, and if so where?
I understand that regardless of the language spec, many operating systems guarantee this property by reserving the top of the virtual address space for the kernel. My question is whether this is also guaranteed by the language, rather than just by the vast majority of implementations.
As an example, suppose an OS kernel lives in low memory and sometimes gives the highest page of virtual memory out to user processes in response to mmap requests for anonymous memory. If malloc or ::operator new[] directly calls mmap for the allocation of a huge array, and the end of the array abuts the top of the virtual address space such that array + SIZE wraps around to zero, does this amount to a non-compliant implementation of the language?
Clarification
Note that the question is not asking about array+(SIZE-1), which is the address of the last element of the array. That one is guaranteed to be greater than array. The question is about a pointer one past the end of an array, or also p+1 when p is a pointer to a non-array object (which the section of the standard pointed to by the selected answer makes clear is treated the same way).
Stackoverflow has asked me to clarify why this question is not the same as this one. The other question asks how to implement total ordering of pointers. That other question essentially boils down to how could a library implement std::less such that it works even for pointers to differently allocated objects, which the standard says can only be compared for equality, not greater and less than.
In contrast, my question was about whether one past the end of an array is always guaranteed to be greater than the array. Whether the answer to my question is yes or no doesn't actually change how you would implement std::less, so the other question doesn't seem relevant. If it's illegal to compare to one past the end of an array, then std::less could simply exhibit undefined behavior in this case. (Also, typically the standard library is implemented by the same people as the compiler, and so is free to take advantage of properties of the particular compiler.)
Yes. From section 6.5.8 para 5.
If the expression P points to an element of an array object
and the expression Q points to the last element of the same array
object, the pointer expression Q+1 compares greater than P.
Expression array is P. The expression array + SIZE - 1 points to the last element of array, which is Q.
Thus:
array + SIZE = array + SIZE - 1 + 1 = Q + 1 > P = array
C requires this. Section 6.5.8 para 5 says:
pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values
I'm sure there's something analogous in the C++ specification.
This requirement effectively prevents allocating objects that wrap around the address space on common hardware, because it would be impractical to implement all the bookkeeping necessary to implement the relational operator efficiently.
The guarantee does not hold for the case int *array = new(int[SIZE]); when SIZE is zero .
The result of new int[0] is required to be a valid pointer that can have 0 added to it , but array == array + SIZE in this case, and a strictly less-than test will yield false.
This is defined in C++, from 7.6.6.4 (p139 of current C++23 draft):
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
(4.1) — If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
(4.2) — Otherwise, if P points to an array element i of an array object x with n elements (9.3.4.5) the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 <= i + j <= n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 <= i − j <= n.
(4.3) — Otherwise, the behavior is undefined.
Note that 4.2 explicitly has "<= n", not "< n". It's undefined for any value larger than size(), but is defined for size().
The ordering of array elements is defined in 7.6.9 (p141):
(4.1) If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
Which means the hypothetical element n will compare greater than the array itself (element 0) for all well defined cases of n > 0.
The relevant rule in C++ is [expr.rel]/4.1:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
The above rule appears to only cover pointers to array elements, and array + SIZE doesn't point to an array element. However, as mentioned in the footnote, a one-past-the-end pointer is treated as if it were an array element here. The relevant language rule is in [basic.compound]/3:
For purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n of x and an object of type T that is not an array element is considered to belong to an array with one element of type T.
So C++ guarantees that array + SIZE > array (at least when SIZE > 0), and that &x + 1 > &x for any object x.
array is guaranteed to have consecutive memory space inside. after c++03 or so vectors is guaranteed to have one too for its &vec[0] ... &vec[vec.size() - 1]. This automatically means that that what you're asking about is true
it's called contiguous storage . can be found here for vectors
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0944r0.html
The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size(). Presumably five more years of studying the interactions of contiguity with caching made it clear to WG21 that contiguity needed to be mandated and non-contiguous vector implementation should be clearly banned.
latter is from standard docs. C++03 I've guessed right.

Is &a[n] valid, where n is the size of the array? [duplicate]

This question already has answers here:
Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?
(13 answers)
Closed 8 years ago.
Consider the following simple code to sort an array.
int myarray[4] = {};
std::sort(myarray, myarray + 4);
I know that it is valid to create a pointer to one past the end of a C-style array.
I've recently seen code like this:
std::sort(myarray, &myarray[4]);
I'm not sure this is valid, because it dereferences an element outside the array bounds, even though the element value is not used for anything.
Is this valid code?
A[i] is syntactically equivalent to *(A + i) for an array or pointer A.
So &A[i] is syntactically equivalent to &(*(A + i)).
When *(A + i) does not have undefined behavior, &(*(A + i)) will behave identically to A + i.
The problem is that myarray[4] is syntactically equivalent to *(myarray + 4), which dereferences a location out of the array's bounds. That is undefined behavior according to the standard.
So you should absolutely prefer myarray + 4 over &myarray[4] - the latter is undefined behavior.
That &myarray[4] has "correct" behavior with most - if not all - compilers does not exempt it from having undefined behavior according to the standard.
It is valid for the same reason your original is valid: it can determine where that element would be. In the latter case, just because it identifies a nonexistent element, since it does't try to access it, there is no problem.
Or, more concisely: myarray+4 and &myarray[4] are synonyms.
The standard requires that a pointer to one-past the end of an array be a valid value for the pointer to have. (It doesn't mean that it is OK to dereference).
This is required in a few places, for example in pointer comparison (5.9, relational operators):
If two pointers point to elements of the same array or one beyond the
end of the array, the pointer to the object with the higher subscript
compares higher.
This is actually relied on for the STL. The "end()" function of iterators is equivalent to 1 past the end of the array.
I also see this in section 27.6.2: Stream Buffer requirements
So your snippet:
std::sort(myarray, &myarray[4]);
is basically the same as if you used a vector of int
vector<int> yourArray;
std::sort(yourArray.begin(), yourArray.end());
According to §6.5.3.2 in the C99 standard, its valid C:
The unary & operator yields the address of its operand. If the operand
has type ‘‘type’’, the result has type ‘‘pointer to type’’. If the
operand is the result of a unary * operator, neither that operator nor
the & operator is evaluated and the result is as if both were omitted
There seems to be no equivalent counterpart in C++, probably due to operator overloading.

Does the standard define the type for `a[i]` where `a` is `T [M][N]`?

I, very occasionally, make use of multidimensional arrays, and got curious what the standard says (C11 and/or C++11) about the behavior of indexing with less "dimensions" than the one declared for the array.
Given:
int a[2][2][2] = {{{1, 2}, {3, 4}}, {{5, 6}, {7, 8}}};
Does the standard says what type a[1] is, or a[0][1], is it legal, and whether it should properly index sub-arrays as expected?
auto& b = a[1];
std::cout << b[1][1];
m[1] is just of type int[2][2]. Likewise m[0][1] is just int[2]. And yes, indexing as sub-arrays works the way you think it does.
Does the standard define the type for a[i] where a is T [M][N]?
Of course. The standard basically defines the types of all expressions, and if it does not, it would be a defect report. But I guess you are more interested on what that type might be...
While the standard may not explicitly mention your case, the rules are stated and are simple, given an array a of N elements of type T, the expression a[0] is an lvalue expression of type T. In the variable declaration int a[2][2] the type of a is array of 2 elements of type array of two elements of type int, which applying the rule above means that a[0] is lvalue to an array of 2 elements, or as you would have to type it in a program: int (&)[2]. Adding extra dimensions does not affect the mechanism.
I think this example in C11 explained it implicitly.
C11 6.5.2.1 Array subscripting
EXAMPLE Consider the array object defined by the declaration int x[3][5]; Here x is a 3 × 5 array of ints; more precisely, x is an array of three element objects, each of which is an array of five ints. In the expression x[i], which is equivalent to (*((x) + (i))), x is first converted to a pointer to the initial array of five ints. Then i is adjusted according to the type of x, which conceptually entails multiplying i by the size of the object to which the pointer points, namely an array of five int objects. The results are added and indirection is applied to yield an array of five ints. When used in the expression x[i][j], that array is in turn converted to a pointer to the first of the ints, so x[i][j] yields an int.
The similar is in C++11 8.3.4 Arrays
Example: consider
int x[3][5];
Here x is a 3 × 5 array of integers. When x appears in an expression, it is converted to a pointer to (the first of three) five-membered arrays of integers. In the expression x[i] which is equivalent to *(x + i), x is first converted to a pointer as described; then x + i is converted to the type of x, which involves multiplying i by the length of the object to which the pointer points, namely five integer objects. The results are added
and indirection applied to yield an array (of five integers), which in turn is converted to a pointer to the first of the integers. If there is another subscript the same argument applies again; this time the result is an integer. —end example ] —end note ]
The key point to remember is that, in both C and C++, a multidimensional array is simply an array of arrays (so a 3-dimensional array is an array of arrays of arrays). All the syntax and semantics of multidimensional arrays follow from that (and from the other rules of the language, of course).
So given an object definition:
int m[2][2][2];
m is an object of type int[2][2][2] (an array of two arrays, each of which consists of two elements, each of which consists of two elements, each of which is an array of two ints).
When you write m[1][1][1], you're already evaluating m, m[1] and m[1][1].
The expression m is an lvalue referring to an array object of type int[2][2][2].
In m[1], the array expression m is implicitly converted to ("decays" to) a pointer to the array's first element. This pointer is of type int(*)[2][2], a pointer to a two-element array of two-element arrays of int. m[1] is by definition equivalent to *(m+1); the +1 advances m by one element and dereferences the resulting pointer value. So m[1] refers to an object of type int[2][2] (an array of two arrays, each of which consists of two int elements).
(The array indexing operator [] is defined to operate on a pointer, not an array. In a common case like arr[42], the pointer happens to be the result of an implicit array-to-pointer conversion.)
We repeat the process for m[1][1], giving us a pointer to an array of two ints (of type int(*)[2]).
Finally, m[1][1][1] takes the result of evaluating m[1][1] and repeats the process yet again, giving us an lvalue referring to an object of type int. And that's how multidimensional arrays work.
Just to add to the frivolity, an expression like foo[index1][index2][index3] can work directly with pointers as well as with arrays. That means you can construct something that works (almost) like a true multidimensional array using pointers and allocations of arbitrary size. This gives you the possibility of having "ragged" arrays with different numbers of elements in each row, or even rows and elements that are missing. But then it's up to you to manage the allocation and deallocation for each row, or even for each element.
Recommended reading: Section 6 of the comp.lang.c FAQ.
A side note: There are languages where multidimensional arrays are not arrays of arrays. In Ada, for example (which uses parentheses rather than square brackets for array indexing), you can have an array of arrays, indexed like arr(i)(j), or you can have a two-dimensional array, indexed like arr(i, j). C is different; C doesn't have direct built-in support for multidimensional arrays, but it gives you the tools to build them yourself.

Why does decay to pointer for array argument appear not to apply to sizeof()?

I read a question earlier that was closed due to being an exact duplicate of this
When a function has a specific-size array parameter, why is it replaced with a pointer?
and
How to find the 'sizeof' (a pointer pointing to an array)?
but after reading this I am still confused by how sizeof() works. I understand that passing an array as an argument to a function such as
void foo(int a[5])
will result in the array argument decaying to a pointer. What I did not find in the above 2 question links was a clear answer as to why it is that the sizeof() function itself is exempt from (or at least seemingly exempt from) this pointer decay behaviour. If sizeof() behaved like any other function then
int a[5] = {1,2,3,4,5};
cout << sizeof(a) << endl;
then the above should output 4 instead of 20. Have I missed something obvious as this seems to be a contradiction of the decay to pointer behaviour??? Sorry for bringing this up again but I really am having a hard time of understanding why this happens despite having happily used the function for years without really thinking about it.
Because the standard says so (emphasis mine):
(C99, 6.3.2.1p3) "Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue."
Note that for C++, the standard explicitly says the size is the size of the array:
(C++11, 5.3.3p2 sizeof) "[...] When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n
elements is n times the size of an element."
sizeof is an operator, not a function. It's a specific one at that, too. The parentheses aren't even necessary if it's an expression:
int a;
sizeof (int); //needed because `int` is a type
sizeof a; //optional because `a` is an expression
sizeof (a); //^ also works
As you can see, it's on this chart of precedence as well. It's also one of the non-overloadable operators.

Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?

I have seen it asserted several times now that the following code is not allowed by the C++ Standard:
int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];
Is &array[5] legal C++ code in this context?
I would like an answer with a reference to the Standard if possible.
It would also be interesting to know if it meets the C standard. And if it isn't standard C++, why was the decision made to treat it differently from array + 5 or &array[4] + 1?
Yes, it's legal. From the C99 draft standard:
§6.5.2.1, paragraph 2:
A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
§6.5.3.2, paragraph 3 (emphasis mine):
The unary & operator yields the address of its operand. If the operand has type ‘‘type’’,
the result has type ‘‘pointer to type’’. If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated and the result is as if both were
omitted, except that the constraints on the operators still apply and the result is not an
lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator
were removed and the [] operator were changed to a + operator. Otherwise, the result is
a pointer to the object or function designated by its operand.
§6.5.6, paragraph 8:
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
Note that the standard explicitly allows pointers to point one element past the end of the array, provided that they are not dereferenced. By 6.5.2.1 and 6.5.3.2, the expression &array[5] is equivalent to &*(array + 5), which is equivalent to (array+5), which points one past the end of the array. This does not result in a dereference (by 6.5.3.2), so it is legal.
Your example is legal, but only because you're not actually using an out of bounds pointer.
Let's deal with out of bounds pointers first (because that's how I originally interpreted your question, before I noticed that the example uses a one-past-the-end pointer instead):
In general, you're not even allowed to create an out-of-bounds pointer. A pointer must point to an element within the array, or one past the end. Nowhere else.
The pointer is not even allowed to exist, which means you're obviously not allowed to dereference it either.
Here's what the standard has to say on the subject:
5.7:5:
When an expression that has integral
type is added to or subtracted from a
pointer, the result has the type of
the pointer operand. If the pointer
operand points to an element of an
array object, and the array is large
enough, the result points to an
element offset from the original
element such that the difference of
the subscripts of the resulting and
original array elements equals the
integral expression. In other words,
if the expression P points to the i-th
element of an array object, the
expressions (P)+N (equivalently,
N+(P)) and (P)-N (where N has the
value n) point to, respectively, the
i+n-th and i−n-th elements of the
array object, provided they exist.
Moreover, if the expression P points
to the last element of an array
object, the expression (P)+1 points
one past the last element of the array
object, and if the expression Q points
one past the last element of an array
object, the expression (Q)-1 points to
the last element of the array object.
If both the pointer operand and the
result point to elements of the same
array object, or one past the last
element of the array object, the
evaluation shall not produce an
overflow; otherwise, the behavior is
undefined.
(emphasis mine)
Of course, this is for operator+. So just to be sure, here's what the standard says about array subscripting:
5.2.1:1:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
Of course, there's an obvious caveat: Your example doesn't actually show an out-of-bounds pointer. it uses a "one past the end" pointer, which is different. The pointer is allowed to exist (as the above says), but the standard, as far as I can see, says nothing about dereferencing it. The closest I can find is 3.9.2:3:
[Note: for instance, the address one past the end of an array (5.7) would be considered to
point to an unrelated object of the array’s element type that might be located at that address. —end note ]
Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.
Thanks to ilproxyil for correcting the last bit here, answering the last part of your question:
array + 5 doesn't actually
dereference anything, it simply
creates a pointer to one past the end
of array.
&array[4] + 1 dereferences
array+4 (which is perfectly safe),
takes the address of that lvalue, and
adds one to that address, which
results in a one-past-the-end pointer
(but that pointer never gets
dereferenced.
&array[5] dereferences array+5
(which as far as I can see is legal,
and results in "an unrelated object
of the array’s element type", as the
above said), and then takes the
address of that element, which also
seems legal enough.
So they don't do quite the same thing, although in this case, the end result is the same.
It is legal.
According to the gcc documentation for C++, &array[5] is legal. In both C++ and in C you may safely address the element one past the end of an array - you will get a valid pointer. So &array[5] as an expression is legal.
However, it is still undefined behavior to attempt to dereference pointers to unallocated memory, even if the pointer points to a valid address. So attempting to dereference the pointer generated by that expression is still undefined behavior (i.e. illegal) even though the pointer itself is valid.
In practice, I imagine it would usually not cause a crash, though.
Edit: By the way, this is generally how the end() iterator for STL containers is implemented (as a pointer to one-past-the-end), so that's a pretty good testament to the practice being legal.
Edit: Oh, now I see you're not really asking if holding a pointer to that address is legal, but if that exact way of obtaining the pointer is legal. I'll defer to the other answerers on that.
I believe that this is legal, and it depends on the 'lvalue to rvalue' conversion taking place. The last line Core issue 232 has the following:
We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior
Although this is slightly different example, what it does show is that the '*' does not result in lvalue to rvalue conversion and so, given that the expression is the immediate operand of '&' which expects an lvalue then the behaviour is defined.
I don't believe that it is illegal, but I do believe that the behaviour of &array[5] is undefined.
5.2.1 [expr.sub] E1[E2] is identical (by definition) to *((E1)+(E2))
5.3.1 [expr.unary.op] unary * operator ... the result is an lvalue referring to the object or function to which the expression points.
At this point you have undefined behaviour because the expression ((E1)+(E2)) didn't actually point to an object and the standard does say what the result should be unless it does.
1.3.12 [defns.undefined] Undefined behaviour may also be expected when this International Standard omits the description of any explicit definition of behaviour.
As noted elsewhere, array + 5 and &array[0] + 5 are valid and well defined ways of obtaining a pointer one beyond the end of array.
In addition to the above answers, I'll point out operator& can be overridden for classes. So even if it was valid for PODs, it probably isn't a good idea to do for an object you know isn't valid (much like overriding operator&() in the first place).
This is legal:
int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];
Section 5.2.1 Subscripting The expression E1[E2] is identical (by definition) to *((E1)+(E2))
So by this we can say that array_end is equivalent too:
int *array_end = &(*((array) + 5)); // or &(*(array + 5))
Section 5.3.1.1 Unary operator '*': The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or
a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
If the type of the expression is “pointer to T,” the type of the result is “T.” [ Note: a pointer to an incomplete type (other
than cv void) can be dereferenced. The lvalue thus obtained can be used in limited ways (to initialize a reference, for
example); this lvalue must not be converted to an rvalue, see 4.1. — end note ]
The important part of the above:
'the result is an lvalue referring to the object or function'.
The unary operator '*' is returning a lvalue referring to the int (no de-refeference). The unary operator '&' then gets the address of the lvalue.
As long as there is no de-referencing of an out of bounds pointer then the operation is fully covered by the standard and all behavior is defined. So by my reading the above is completely legal.
The fact that a lot of the STL algorithms depend on the behavior being well defined, is a sort of hint that the standards committee has already though of this and I am sure there is a something that covers this explicitly.
The comment section below presents two arguments:
(please read: but it is long and both of us end up trollish)
Argument 1
this is illegal because of section 5.7 paragraph 5
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
And though the section is relevant; it does not show undefined behavior. All the elements in the array we are talking about are either within the array or one past the end (which is well defined by the above paragraph).
Argument 2:
The second argument presented below is: * is the de-reference operator.
And though this is a common term used to describe the '*' operator; this term is deliberately avoided in the standard as the term 'de-reference' is not well defined in terms of the language and what that means to the underlying hardware.
Though accessing the memory one beyond the end of the array is definitely undefined behavior. I am not convinced the unary * operator accesses the memory (reads/writes to memory) in this context (not in a way the standard defines). In this context (as defined by the standard (see 5.3.1.1)) the unary * operator returns a lvalue referring to the object. In my understanding of the language this is not access to the underlying memory. The result of this expression is then immediately used by the unary & operator operator that returns the address of the object referred to by the lvalue referring to the object.
Many other references to Wikipedia and non canonical sources are presented. All of which I find irrelevant. C++ is defined by the standard.
Conclusion:
I am wiling to concede there are many parts of the standard that I may have not considered and may prove my above arguments wrong. NON are provided below. If you show me a standard reference that shows this is UB. I will
Leave the answer.
Put in all caps this is stupid and I am wrong for all to read.
This is not an argument:
Not everything in the entire world is defined by the C++ standard. Open your mind.
Working draft (n2798):
"The result of the unary & operator is
a pointer to its operand. The operand
shall be an lvalue or a qualified-id.
In the first case, if the type of the
expression is “T,” the type of the
result is “pointer to T.”" (p. 103)
array[5] is not a qualified-id as best I can tell (the list is on p. 87); the closest would seem to be identifier, but while array is an identifier array[5] is not. It is not an lvalue because "An lvalue refers to an object or function. " (p. 76). array[5] is obviously not a function, and is not guaranteed to refer to a valid object (because array + 5 is after the last allocated array element).
Obviously, it may work in certain cases, but it's not valid C++ or safe.
Note: It is legal to add to get one past the array (p. 113):
"if the expression P [a pointer]
points to the last element of an array
object, the expression (P)+1 points
one past the last element of the array
object, and if the expression Q points
one past the last element of an array
object, the expression (Q)-1 points to
the last element of the array object.
If both the pointer operand and the
result point to elements of the same
array object, or one past the last
element of the array object, the
evaluation shall not produce an
overflow"
But it is not legal to do so using &.
Even if it is legal, why depart from convention? array + 5 is shorter anyway, and in my opinion, more readable.
Edit: If you want it to by symmetric you can write
int* array_begin = array;
int* array_end = array + 5;
It should be undefined behaviour, for the following reasons:
Trying to access out-of-bounds elements results in undefined behaviour. Hence the standard does not forbid an implementation throwing an exception in that case (i.e. an implementation checking bounds before an element is accessed). If & (array[size]) were defined to be begin (array) + size, an implementation throwing an exception in case of out-of-bound access would not conform to the standard anymore.
It's impossible to make this yield end (array) if array is not an array but rather an arbitrary collection type.
C++ standard, 5.19, paragraph 4:
An address constant expression is a pointer to an lvalue....The pointer shall be created explicitly, using the unary & operator...or using an expression of array (4.2)...type. The subscripting operator []...can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression.
Looks to me like &array[5] is legal C++, being an address constant expression.
If your example is NOT a general case but a specific one, then it is allowed. You can legally, AFAIK, move one past the allocated block of memory.
It does not work for a generic case though i.e where you are trying to access elements farther by 1 from the end of an array.
Just searched C-Faq : link text
It is perfectly legal.
The vector<> template class from the stl does exactly this when you call myVec.end(): it gets you a pointer (here as an iterator) which points one element past the end of the array.