Out of the bounds in C++ and undefined behaviour

I know that in C++, accessing a buffer out of bounds is undefined behaviour.
Here is an example from cppreference:
int table[4] = {};
bool exists_in_table(int v)
{
// return true in one of the first 4 iterations or UB due to out-of-bounds access
for (int i = 0; i <= 4; i++) {
if (table[i] == v) return true;
}
return false;
}
But I can't find the corresponding paragraph in the C++ standard.
Can anyone point me to the specific paragraph in the standard that covers this case?

It's undefined behavior. We can juxtapose a couple of passages to be convinced of it. First, and I won't explicitly prove it, table[4] is *(table + 4). We need only ask ourselves the properties of the pointer value table + 4 and how it relates to the requirements of the indirection operator.
On the pointer, we have this passage:
[basic.compound]
3 Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
a pointer past the end of an object ([expr.add]), or
the null pointer value for that type, or
an invalid pointer value.
Our pointer is of the second bullet's type, not the first. As for the indirection operator:
[expr.unary.op]
1 The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T”, the type of the result is “T”.
I hope it's obvious from reading this paragraph that the operation is defined for a pointer of the category described by the first bullet in the preceding paragraph.
So we apply an operation to a pointer value for which its behavior is not defined. The result is undefined behavior.

The subscript operator is defined in terms of the addition operator. The array decays to a pointer to its first element in that equivalent expression, so the rules of pointer arithmetic apply. The indirection operator is then applied to the hypothetical result of the addition.
[expr.sub]
A postfix expression followed by an expression in square brackets is a postfix expression.
One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type.
The result is of type “T”.
The type “T” shall be a completely-defined object type.
The expression E1[E2] is identical (by definition) to *((E1)+(E2)), ...
When the array index is more than one past the last element, i.e. E2 > std::size(E1) (which isn't the case in the example program), the hypothetical pointer arithmetic itself is undefined.
[expr.add]
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value ... (does not apply)
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n. (does not apply when i+j > n)
Otherwise, the behavior is undefined.
When E2 == std::size(E1) (which is the case in the last iteration of the example), the hypothetical result of the addition is a pointer one past the end of the array, which points outside the array's storage. The pointer arithmetic itself is well defined.
[basic.compound]
A value of a pointer type that is a pointer ... past the end of an object represents ... the first byte in memory after the end of the storage occupied by the object
Access is defined in terms of objects. But there is no object there, nor even any storage, and thus the behaviour has no definition.
Granted, in some cases there might be an unrelated object at the pointed-to address. The following note says that a pointer past the end is not a pointer to such an unrelated object sharing the address. I couldn't find which normative rule this follows from.
[Note 2: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type, even if the unrelated object is located at that address. ...
Alternatively, we can look at the definition of indirection operator:
[expr.unary.op]
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type ... and the result is an lvalue referring to the object ... to which the expression points. ...
There is a contradiction because there is no object that could be referred to.
So, in conclusion:
int table[N] = {};
table[N] == 0; // UB, accessing non-existing object
table[N + 1]; // UB, [expr.add]
table + N; // OK, one past last element
table[N]; // ¯\_(ツ)_/¯ See CWG 232
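For comparison, here is a minimal sketch of how the cppreference example could be written so that table[4] is never evaluated. The names exists_in_table_fixed and exists_in_table_std are made up for this illustration. The second variant only compares against the one-past-the-end pointer, which the summary above lists as OK.

```cpp
#include <algorithm>
#include <iterator>

int table[4] = {};

// Range-based for visits exactly the four elements, so no index can run past the end.
bool exists_in_table_fixed(int v) {
    for (int element : table) {
        if (element == v) return true;
    }
    return false;
}

// std::end(table) is table + 4: it is formed and compared, never dereferenced.
bool exists_in_table_std(int v) {
    return std::find(std::begin(table), std::end(table), v) != std::end(table);
}
```

Both functions return false for a value that is absent instead of reading past the array.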

Related

process array in chunks using struct then cast as flat array - how to avoid UB (strict aliasing)?

An external API expects a pointer to an array of values (int as simple example here) plus a size.
It is logically clearer to deal with the elements in groups of 4.
So process elements via a "group of 4" struct and then pass the array of those structs to the external API using a pointer cast. See code below.
Spider sense says: "strict aliasing violation" in the reinterpret_cast => possible UB?
Are the static_asserts below enough to ensure:
a) this works in practice
b) this is actually standards compliant and not UB?
Otherwise, what do I need to do, to make it "not UB". A union? How exactly please?
Or is there a different, better way overall?
#include <cstddef>
void f(int*, std::size_t) {
// external implementation
// process array
}
int main() {
static constexpr std::size_t group_size = 4;
static constexpr std::size_t number_groups = 10;
static constexpr std::size_t total_number = group_size * number_groups;
static_assert(total_number % group_size == 0);
int vals[total_number]{};
struct quad {
int val[group_size]{};
};
quad vals2[number_groups]{};
// deal with values in groups of four using member functions of `quad`
static_assert(alignof(int) == alignof(quad));
static_assert(group_size * sizeof(int) == sizeof(quad));
static_assert(sizeof(vals) == sizeof(vals2));
f(vals, total_number);
f(reinterpret_cast<int*>(vals2), total_number); /// is this UB? or OK under above asserts?
}

No, this is not permitted. The relevant C++ standard section is §7.6.1.10. From the first paragraph, we have (emphasis mine)
The result of the expression reinterpret_cast<T>(v) is the result of converting the expression v to type T.
If T is an lvalue reference type or an rvalue reference to function type, the result is an lvalue; if T is an rvalue reference to object type, the result is an xvalue; otherwise, the result is a prvalue and the lvalue-to-rvalue, array-to-pointer, and function-to-pointer standard conversions are performed on the expression v.
Conversions that can be performed explicitly using reinterpret_cast are listed below.
No other conversion can be performed explicitly using reinterpret_cast.
So unless your use case is listed on that particular page, it's not valid. Most of the sections are not relevant to your use case, but this is the one that comes closest.
An object pointer can be explicitly converted to an object pointer of a different type.
When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)).
So a reinterpret_cast from one pointer type to another is equivalent to a static_cast through an appropriately cv-qualified void*. Now, a static_cast that goes from T* to S* can be acceptably used as a S* if the types T and S are pointer-interconvertible. From §6.8.4
Two objects a and b are pointer-interconvertible if:
they are the same object, or
one is a union object and the other is a non-static data member of that object ([class.union]), or
one is a standard-layout class object and the other is the first non-static data member of that object or any base class subobject of that object ([class.mem]), or
there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast ([expr.reinterpret.cast]).
[Note 4: An array object and its first element are not pointer-interconvertible, even though they have the same address.
— end note]
To summarize, you can cast a pointer to a class C to a pointer to its first member (and back) if there's no vtable to stop you. You can cast a pointer to C into another pointer to C (that can come up if you're adding cv-qualifiers; for instance, reinterpret_cast<const C*>(my_c_ptr) is valid if my_c_ptr is C*). There are also some special rules for unions, which don't apply here. However, you can't factor through arrays, as per Note 4. The conversion you want here is quad[] -> quad -> int -> int[], and you can't convert between the quad[] and the quad. If quad was a simple struct that contained only an int, then you could reinterpret a quad* as an int*, but you can't do it through arrays, and certainly not through a nested layer of them.
None of the sections I've cited say anything about alignment. Or size. Or packing. Or padding. None of that matters. All your static_asserts are doing is slightly increasing the probability that the undefined behavior (which is still undefined) will happen to work on more compilers. But you're using a bandaid to repair a dam; it's not going to work.
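If the grouped representation has to stay, one hedged alternative is to copy the values out into a real array of int before calling the API, instead of casting. This is only a sketch: flatten is a hypothetical helper, f_sum is a stand-in for the external f from the question (it just sums so the result can be observed), and the copy costs time and memory that the cast would not.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t group_size = 4;

struct quad {
    int val[group_size]{};
};

// Stand-in for the external API (assumption: it only reads the values).
int f_sum(const int* p, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}

// Copy every group into one contiguous std::vector<int>.
// Each insert reads from within a single quad::val array, so no pointer
// arithmetic ever crosses from one quad into the next.
std::vector<int> flatten(const quad* groups, std::size_t number_groups) {
    std::vector<int> flat;
    flat.reserve(number_groups * group_size);
    for (std::size_t g = 0; g < number_groups; ++g) {
        flat.insert(flat.end(), groups[g].val, groups[g].val + group_size);
    }
    return flat;
}
```

The resulting vector really is an array of int, so flat.data() and flat.size() can be handed to the API without any reinterpret_cast.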

No amount of static_asserts is going to make something which is categorically UB into well-defined behavior in accord with the standard. You did not create an array of ints; you created a struct containing an array of ints. So that's what you have.
It's legal to convert a pointer to a quad into a pointer to an int[group_size] (though you'll need to alter your code appropriately). Or you could just access the array directly and cast that to an int*.
Regardless of how you get a pointer to the first element, it's legal to do pointer arithmetic within that array. But the moment you try to do pointer arithmetic past the boundaries of the array within that quad object, you achieve undefined behavior. Pointer arithmetic is defined based on the existence of an array: [expr.add]/4
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.
Otherwise, the behavior is undefined.
The pointer isn't null, so case 1 doesn't apply. The n above is group_size (because the array is the one within quad), so if the index is > group_size, then case 2 doesn't apply.
Therefore, undefined behavior will happen whenever someone tries to access the array past index 4. There is no cast that can wallpaper over that.
Otherwise, what do I need to do, to make it "not UB". A union? How exactly please?
You don't. What you're trying to do is simply not valid with respect to the C++ object model. You need an array of ints, so you must create an array of ints. You cannot treat an array of something other than ints as an array of ints (well, with minor exceptions of byte-wise arrays, but that's unhelpful to you).
The simplest valid way to process the array in groups is to just... do some nested loops:
int arr[total_number];
for(int* curr = arr; curr != std::end(arr); curr += 4)
{
//Use `curr[0]` to `curr[3]`;
//Or create a `std::span<int, 4> group(curr)`;
}
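A slightly fuller sketch of the same idea, assuming you still want member functions on a "group of 4": keep one real int array and wrap each group in a small non-owning view. quad_view, its members, and fill_groups are hypothetical names, not part of the question's code.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t group_size = 4;

// Non-owning view of one group inside a larger int array (hypothetical helper).
struct quad_view {
    int* first;  // points at element 0 of the group

    int& operator[](std::size_t i) const {
        assert(i < group_size);  // stay inside the group
        return first[i];
    }

    int sum() const {
        int total = 0;
        for (std::size_t i = 0; i < group_size; ++i) total += first[i];
        return total;
    }
};

// Fill every element of group number k with the value k + 1. All pointers are
// derived from the original int array, so the arithmetic is on a real array of int.
void fill_groups(int* vals, std::size_t total_number) {
    for (std::size_t g = 0; g + group_size <= total_number; g += group_size) {
        quad_view group{vals + g};
        for (std::size_t i = 0; i < group_size; ++i) {
            group[i] = static_cast<int>(g / group_size) + 1;
        }
    }
}
```

Because the storage is a plain int array throughout, it can be passed to the external API as-is once the groups have been processed.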

Is reinterpret_cast<char*>(myTypePtr) assumed to point to an array?

We know that char* can alias anything: According to cppreference
Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:
[...]
AliasedType is std::byte, char, or unsigned char: this permits examination of the object representation of any object as an array of bytes. [...]
The statement in boldface is not present in n4659 draft [6.10, (8.8)].
Since doing pointer arithmetic on pointers that don't point to elements of the same array is undefined, can we really access bytes other than the first one using only reinterpret_cast?
Or maybe std::memcpy must be used for that purpose?

auto ptr = reinterpret_cast<char*>(myTypePtr);
The standard permits this conversion, due to:
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)). [ Note: Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. — end note ]
So, assuming myTypePtr has no cv-qualifier, the conversion is equivalent to:
auto ptr = static_cast<char*>(static_cast<void*>(myTypePtr));
And you are permitted to dereference ptr to access the value within the object (the one the pointer points to), due to:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
a char, unsigned char, or std::byte type.
If myTypePtr does not point to an element of an array of char, then applying addition to ptr results in undefined behavior, due to:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i - j] if 0 ≤ i - j ≤ n; otherwise, the behavior is undefined.
For addition or subtraction, if the expressions P or Q have type “pointer to cv T”, where T and the array element type are not similar, the behavior is undefined.
The element type of the object that myTypePtr points to is not char, so applying addition to ptr results in undefined behavior.
Or maybe std::memcpy must be used for that purpose?
Yes, if the object to which myTypePtr points is subject to the following rules:
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes ([intro.memory]) making up the object can be copied into an array of char, unsigned char, or std::byte ([cstddef.syn]). If the content of that array is copied back into the object, the object shall subsequently hold its original value.
OR
For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes ([intro.memory]) making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1.
Unfortunately, however, such a memcpy cannot itself be implemented in conforming C++ under the current standard.
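In practice that means calling the library's std::memcpy rather than writing your own byte loop. Here is a minimal sketch, assuming T is trivially copyable; object_bytes and from_bytes are made-up helper names. The bytes land in a real array of unsigned char, so indexing that array is ordinary array access.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Copy the object representation of obj into an array of unsigned char.
template <class T>
std::array<unsigned char, sizeof(T)> object_bytes(const T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copying is only specified for trivially copyable types");
    std::array<unsigned char, sizeof(T)> bytes{};
    std::memcpy(bytes.data(), &obj, sizeof(T));
    return bytes;
}

// Copy the bytes back into a fresh object of the same type.
template <class T>
T from_bytes(const std::array<unsigned char, sizeof(T)>& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copying is only specified for trivially copyable types");
    T obj;
    std::memcpy(&obj, bytes.data(), sizeof(T));
    return obj;
}
```

T cannot be deduced in from_bytes, so it has to be named explicitly, for example from_bytes<unsigned>(bytes).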

As std::as_bytes and std::as_writable_bytes essentially depend on such usage (which is specified in [span.objectrep]), I guess we can assume it is supported, even though it is not true according to C++17/20 (and the latest working draft).
This defect has been revealed by P1839, and, unfortunately, has not been resolved yet.

Is member access on a null pointer defined in C++?

Is address computation on a null pointer defined behavior in C++? Here's a simple example program.
struct A { int x; };
int main() {
A* p = nullptr;
&(p->x); // is this undefined behavior?
return 0;
}
Thanks.
EDIT Subscripting is covered in this other question.

&(p->x); // is this undefined behavior?
The standard is a bit vague regarding this:
[expr.ref] ... The expression E1->E2 is converted to the equivalent form (*(E1)).E2;
[expr.unary.op] The unary * operator ... the result is an lvalue referring to the object ... to which the expression points.
There is no explicit mention of UB in the section. The quoted rule does appear to conflict with the fact that the null pointer doesn't point to any object. This could be interpreted that yes, behaviour is undefined.
[expr.unary.op] The result of the unary & operator is a pointer to its operand. ... if the operand is an lvalue of type T, the resulting expression is a prvalue of type “pointer to T” whose result is a pointer to the designated object ([intro.memory]).
Again, no designated object exists. Note that at no point is the operand lvalue converted to an rvalue, which would definitely have been UB.
Back in 2000 there was a CWG issue to clarify whether indirection through null is undefined. The proposed resolution (2004), which would clarify that indirection through null is not UB, appears not to have been added to the standard so far.
However, whether it is or isn't UB doesn't matter much, since you don't need to do this. At the very least, the resulting pointer will be invalid and thus useless.
If you were planning to convert the pointer to an integer to get the offset of the member, there is no need to do this because you can instead use the offsetof macro from the standard library, which doesn't have UB.
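For reference, a small sketch of the offsetof route. The second member y is added here purely for illustration; the question's struct has only x.

```cpp
#include <cstddef>

struct A {
    int x;
    int y;  // extra member, added only for this illustration
};

// offsetof is evaluated at compile time and never touches a pointer value.
constexpr std::size_t offset_of_x = offsetof(A, x);
constexpr std::size_t offset_of_y = offsetof(A, y);

static_assert(offset_of_x == 0,
              "the first member of a standard-layout struct sits at offset 0");
```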
&(p[1]); // undefined?
Here, behaviour is quite clearly undefined:
[expr.sub] ... The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise.
[expr.add] When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0 (does not apply)
Otherwise, if P points to an array element (does not apply)
Otherwise, the behavior is undefined.
&(p[0]); // undefined?
As per previous rules, the first option applies:
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
And now we are back to the question of whether indirection through this null is UB. See the beginning of the answer.
Still, it doesn't really matter. There is no need to write this, since it is simply an unnecessarily complicated way to write sizeof(int) * i (with i being 1 and 0 respectively).

UB When Dereferencing Array of Unions

Which of these are undefined behaviour:
template <class T> struct Struct { T t; };
template <class T> union Union { T t; };
template <class T> void function() {
Struct<T> aS[10];
Union<T> aU[10];
// do something with aS[9].t and aU[9].t including initialization
T *aSP = reinterpret_cast<T *>(aS);
T *aUP = reinterpret_cast<T *>(aU);
// so here is this undefined behaviour?
T valueS = aSP[9];
// use valueS in whatever way
// so here is this undefined behaviour?
T valueU = aUP[9];
// use valueU in whatever way
// now is accessing aS[9].t or aU[9].t now UB?
}
So yeah, which of the last 3 operations is UB?
(My reasoning: I don't know about the struct, if there is any requirement for its size to be the same as its single element, but AFAIK the union has to be the same size as the element. Alignment requirements I don't know for the union, but I am guessing it is the same. For the struct I have no idea. In the case of the union I would guess that it is not UB, but as I said, I am really really not sure. For the struct I actually have no idea)

tl;dr: the last two statements in your code above will always invoke undefined behavior. Simply casting a pointer to a union to a pointer to one of its member types is generally fine because it doesn't really do anything (it's unspecified at worst, but never undefined behavior). Note that this is about the cast itself only; using the result of the cast to access an object is a whole different story.
Depending on what T ends up being, Struct<T> may potentially be a standard-layout struct [class.prop]/3 in which case
T *aSP = reinterpret_cast<T *>(aS);
would be well-defined because a Struct<T> would be pointer-interconvertible with its first member (which is of type T) [basic.compound]/4.3. Above reinterpret_cast is equivalent to [expr.reinterpret.cast]/7
T *aSP = static_cast<T *>(static_cast<void *>(aS));
which will invoke the array-to-pointer conversion [conv.array], resulting in a Struct<T>* pointing to the first element of aS. This pointer is then converted to void* (via [expr.static.cast]/4 and [conv.ptr]/2), which is then converted to T*, which would be legal via [expr.static.cast]/13:
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.
Similarly,
T *aUP = reinterpret_cast<T *>(aU);
would be well-defined in C++17 if Union<T> is a standard-layout union and looks to be well-defined in general with the coming version of C++ based on the current standard draft, where a union and one of its members are always pointer-interconvertible [basic.compound]/4.2
All of the above is irrelevant, however, because
T valueS = aSP[9];
and
T valueU = aUP[9];
will invoke undefined behavior no matter what. aSP[9] and aUP[9] are (by definition) the same as *(aSP + 9) and *(aUP + 9) respectively [expr.sub]/1. The pointer arithmetic in these expressions is subject to [expr.add]/4
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) element x[i−j] if 0≤i−j≤n.
Otherwise, the behavior is undefined.
aSP and aUP do not point to an element of an array. Even if aSP and aUP would be pointer-interconvertible with T, you'd only ever be allowed to access element 0 and compute the address of (but not access) element 1 of the hypothetical single-element array…
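One way to stay within these rules, sketched under the assumption that Struct<T> is standard-layout: do the indexing on the real Struct<T> array and only then convert the pointer to that single element. read_via_cast is a hypothetical helper; in real code aS[i].t is simpler and needs no cast at all.

```cpp
#include <cstddef>
#include <type_traits>

template <class T> struct Struct { T t; };

template <class T>
T read_via_cast(Struct<T>* array, std::size_t index) {
    static_assert(std::is_standard_layout<Struct<T>>::value,
                  "pointer-interconvertibility with the first member needs standard layout");
    Struct<T>* element = array + index;         // arithmetic within the Struct<T> array
    T* member = reinterpret_cast<T*>(element);  // points to element->t
    return *member;                             // a single object is read, no further arithmetic
}
```

The difference from the question's code is the order of operations: the pointer arithmetic happens on the Struct<T> array that actually exists, and the T* is never advanced.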

So if we look at the documentation of reinterpret_cast:
5) Any object pointer type T1* can be converted to another object pointer type cv T2*. This is exactly equivalent to static_cast<cv T2*>(static_cast<cv void*>(expression)) (which implies that if T2's alignment requirement is not stricter than T1's, the value of the pointer does not change and conversion of the resulting pointer back to its original type yields the original value). In any case, the resulting pointer may only be dereferenced safely if allowed by the type aliasing rules (see below)
Now, what do the aliasing rules say?
Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:
AliasedType and DynamicType are similar.
AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.
AliasedType is std::byte (since C++17), char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.
So it's neither 2 nor 3. Maybe 1?
Similar:
Informally, two types are similar if, ignoring top-level cv-qualification:
they are the same type; or
they are both pointers, and the pointed-to types are similar; or
they are both pointers to member of the same class, and the types of the pointed-to members are similar; or
they are both arrays of the same size or both arrays of unknown bound, and the array element types are similar.
And, from C++17 draft:
Two objects a and b are pointer-interconvertible if:
they are the same object, or
one is a union object and the other is a non-static data member of that object ([class.union]), or
one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, any base class subobject of that object ([class.mem]), or
there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast. [ Note: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note]
So, as I read it:
T *aSP = reinterpret_cast<T *>(aS); // Is OK
T *aUP = reinterpret_cast<T *>(aU); // Is OK.

I found c++ - Is sizeof(T) == sizeof(int). This specifies that structs do not have to have the same size as their elements (sigh). As for unions, the same would probably apply (after reading the answers, I am led to believe so). This alone is enough to make this situation UB. However, if sizeof(Struct) == sizeof(T), and given the "It's well-established that" statement in https://stackoverflow.com/a/21515546, a pointer to aSP[9] would be the same location as that of aS[9] (at least I think so), and reinterpret_cast'ing that is guaranteed by the standard (according to the quote in https://stackoverflow.com/a/21509729).
EDIT: This is actually wrong. The correct answer is here.

Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?

I have seen it asserted several times now that the following code is not allowed by the C++ Standard:
int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];
Is &array[5] legal C++ code in this context?
I would like an answer with a reference to the Standard if possible.
It would also be interesting to know if it meets the C standard. And if it isn't standard C++, why was the decision made to treat it differently from array + 5 or &array[4] + 1?

Yes, it's legal. From the C99 draft standard:
§6.5.2.1, paragraph 2:
A postfix expression followed by an expression in square brackets [] is a subscripted
designation of an element of an array object. The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that
apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the
initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th
element of E1 (counting from zero).
§6.5.3.2, paragraph 3 (emphasis mine):
The unary & operator yields the address of its operand. If the operand has type ‘‘type’’,
the result has type ‘‘pointer to type’’. If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated and the result is as if both were
omitted, except that the constraints on the operators still apply and the result is not an
lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator
were removed and the [] operator were changed to a + operator. Otherwise, the result is
a pointer to the object or function designated by its operand.
§6.5.6, paragraph 8:
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
Note that the standard explicitly allows pointers to point one element past the end of the array, provided that they are not dereferenced. By 6.5.2.1 and 6.5.3.2, the expression &array[5] is equivalent to &*(array + 5), which is equivalent to (array+5), which points one past the end of the array. This does not result in a dereference (by 6.5.3.2), so it is legal.

Your example is legal, but only because you're not actually using an out of bounds pointer.
Let's deal with out of bounds pointers first (because that's how I originally interpreted your question, before I noticed that the example uses a one-past-the-end pointer instead):
In general, you're not even allowed to create an out-of-bounds pointer. A pointer must point to an element within the array, or one past the end. Nowhere else.
The pointer is not even allowed to exist, which means you're obviously not allowed to dereference it either.
Here's what the standard has to say on the subject:
5.7:5:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
(emphasis mine)
Of course, this is for operator+. So just to be sure, here's what the standard says about array subscripting:
5.2.1:1:
The expression E1[E2] is identical (by deﬁnition) to *((E1)+(E2))
Of course, there's an obvious caveat: Your example doesn't actually show an out-of-bounds pointer. it uses a "one past the end" pointer, which is different. The pointer is allowed to exist (as the above says), but the standard, as far as I can see, says nothing about dereferencing it. The closest I can find is 3.9.2:3:
[Note: for instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address. —end note ]
Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.
Thanks to ilproxyil for correcting the last bit here, answering the last part of your question:
array + 5 doesn't actually dereference anything, it simply creates a pointer to one past the end of array.
&array[4] + 1 dereferences array+4 (which is perfectly safe), takes the address of that lvalue, and adds one to that address, which results in a one-past-the-end pointer (but that pointer never gets dereferenced).
&array[5] dereferences array+5 (which as far as I can see is legal, and results in "an unrelated object of the array’s element type", as the above said), and then takes the address of that element, which also seems legal enough.
So they don't do quite the same thing, although in this case, the end result is the same.
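As a rough sketch, the two uncontested spellings can be compared directly. The function name one_past_end_forms_agree is hypothetical, and the contested form is left commented out on purpose, since whether it is well defined is exactly what is being argued here.

```cpp
#include <cassert>

// Hypothetical helper: returns true if both uncontested spellings
// produce the same one-past-the-end pointer.
bool one_past_end_forms_agree()
{
    int array[5] = {};
    int* a = array + 5;       // pointer arithmetic only, nothing is dereferenced
    int* b = &array[4] + 1;   // applies * to array + 4 (a real element), then adds 1
    // int* c = &array[5];    // contested: forms *(array + 5) before & is applied
    return a == b;            // comparing one-past-the-end pointers is allowed
}
```

Neither a nor b is ever read through, so the sketch stays within what 5.7:5 explicitly permits.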

It is legal.
According to the gcc documentation for C++, &array[5] is legal. In both C++ and in C you may safely address the element one past the end of an array - you will get a valid pointer. So &array[5] as an expression is legal.
However, it is still undefined behavior to attempt to dereference pointers to unallocated memory, even if the pointer points to a valid address. So attempting to dereference the pointer generated by that expression is still undefined behavior (i.e. illegal) even though the pointer itself is valid.
In practice, I imagine it would usually not cause a crash, though.
Edit: By the way, this is generally how the end() iterator for STL containers is implemented (as a pointer to one-past-the-end), so that's a pretty good testament to the practice being legal.
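A small sketch of the same idea with raw pointers over a std::vector. Whether end() is literally a pointer depends on the implementation, so this example builds the one-past-the-end pointer from data() and size() instead; sum_with_pointers is a made-up name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: sums a vector by walking the half-open range [first, last).
int sum_with_pointers(const std::vector<int>& v)
{
    const int* first = v.data();
    const int* last  = v.data() + v.size();  // one past the end: compared, never read
    int total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p;   // p is strictly before last whenever this line runs
    }
    return total;
}
```

The one-past-the-end pointer only serves as a stopping marker in the comparison; the loop never applies * to it.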
Edit: Oh, now I see you're not really asking if holding a pointer to that address is legal, but if that exact way of obtaining the pointer is legal. I'll defer to the other answerers on that.

I believe that this is legal, and it depends on whether the 'lvalue to rvalue' conversion takes place. The last line of Core issue 232 has the following:
We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior
Although this is a slightly different example, what it does show is that the '*' does not result in an lvalue-to-rvalue conversion, and so, given that the expression is the immediate operand of '&', which expects an lvalue, the behaviour is defined.

I don't believe that it is illegal, but I do believe that the behaviour of &array[5] is undefined.
5.2.1 [expr.sub] E1[E2] is identical (by definition) to *((E1)+(E2))
5.3.1 [expr.unary.op] unary * operator ... the result is an lvalue referring to the object or function to which the expression points.
At this point you have undefined behaviour because the expression ((E1)+(E2)) didn't actually point to an object, and the standard only says what the result should be when it does.
1.3.12 [defns.undefined] Undefined behaviour may also be expected when this International Standard omits the description of any explicit definition of behaviour.
As noted elsewhere, array + 5 and &array[0] + 5 are valid and well defined ways of obtaining a pointer one beyond the end of array.

In addition to the above answers, I'll point out that operator& can be overloaded for classes. So even if it was valid for PODs, it probably isn't a good idea to do for an object you know isn't valid (much like overloading operator&() in the first place).
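To illustrate, here is a sketch with a made-up class Odd whose operator& deliberately returns nullptr. With such a type, &items[i] is a member function call on items[i], so for a one-past-the-end index it would be called on an object that does not exist. The example only touches a valid element and uses std::addressof (C++11, from <memory>) to get the real address.

```cpp
#include <cassert>
#include <memory>   // std::addressof

// Hypothetical type whose unary & is overloaded to return something unhelpful.
struct Odd {
    int value = 0;
    Odd* operator&() { return nullptr; }
};

// Returns true if the overloaded & and std::addressof disagree as expected.
bool overloaded_ampersand_differs()
{
    Odd items[2];
    Odd* via_operator  = &items[0];                 // calls Odd::operator&, yields nullptr
    Odd* via_addressof = std::addressof(items[0]);  // bypasses the overload
    return via_operator == nullptr && via_addressof == items;
}
```

For such types, items + 2 remains the form that involves no call on a nonexistent element.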

This is legal:
int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];
Section 5.2.1 Subscripting: The expression E1[E2] is identical (by definition) to *((E1)+(E2))
So by this we can say that array_end is equivalent to:
int *array_end = &(*((array) + 5)); // or &(*(array + 5))
Section 5.3.1.1 Unary operator '*': The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T,” the type of the result is “T.” [ Note: a pointer to an incomplete type (other than cv void) can be dereferenced. The lvalue thus obtained can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to an rvalue, see 4.1. — end note ]
The important part of the above:
'the result is an lvalue referring to the object or function'.
The unary operator '*' returns an lvalue referring to the int (no de-reference). The unary operator '&' then gets the address of the lvalue.
As long as there is no de-referencing of an out of bounds pointer then the operation is fully covered by the standard and all behavior is defined. So by my reading the above is completely legal.
The fact that a lot of the STL algorithms depend on the behavior being well defined is a sort of hint that the standards committee has already thought of this, and I am sure there is something that covers this explicitly.
The comment section below presents two arguments:
(please read: but it is long and both of us end up trollish)
Argument 1
this is illegal because of section 5.7 paragraph 5
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
And though the section is relevant; it does not show undefined behavior. All the elements in the array we are talking about are either within the array or one past the end (which is well defined by the above paragraph).
Argument 2:
The second argument presented below is: * is the de-reference operator.
And though this is a common term used to describe the '*' operator; this term is deliberately avoided in the standard as the term 'de-reference' is not well defined in terms of the language and what that means to the underlying hardware.
Accessing the memory one beyond the end of the array is definitely undefined behavior, but I am not convinced the unary * operator accesses the memory (reads/writes to memory) in this context (not in a way the standard defines). In this context (as defined by the standard (see 5.3.1.1)) the unary * operator returns an lvalue referring to the object. In my understanding of the language this is not access to the underlying memory. The result of this expression is then immediately used by the unary & operator, which returns the address of the object referred to by the lvalue.
Many other references to Wikipedia and non canonical sources are presented. All of which I find irrelevant. C++ is defined by the standard.
Conclusion:
I am willing to concede there are many parts of the standard that I may not have considered and that may prove my above arguments wrong. NONE are provided below. If you show me a standard reference that shows this is UB, I will
Leave the answer.
Put in all caps this is stupid and I am wrong for all to read.
This is not an argument:
Not everything in the entire world is defined by the C++ standard. Open your mind.

Working draft (n2798):
"The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified-id. In the first case, if the type of the expression is “T,” the type of the result is “pointer to T.”" (p. 103)
array[5] is not a qualified-id as best I can tell (the list is on p. 87); the closest would seem to be identifier, but while array is an identifier array[5] is not. It is not an lvalue because "An lvalue refers to an object or function. " (p. 76). array[5] is obviously not a function, and is not guaranteed to refer to a valid object (because array + 5 is after the last allocated array element).
Obviously, it may work in certain cases, but it's not valid C++ or safe.
Note: It is legal to add to get one past the array (p. 113):
"if the expression P [a pointer] points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow"
But it is not legal to do so using &.

Even if it is legal, why depart from convention? array + 5 is shorter anyway, and in my opinion, more readable.
Edit: If you want it to be symmetric you can write
int* array_begin = array;
int* array_end = array + 5;

It should be undefined behaviour, for the following reasons:
Trying to access out-of-bounds elements results in undefined behaviour. Hence the standard does not forbid an implementation throwing an exception in that case (i.e. an implementation checking bounds before an element is accessed). If & (array[size]) were defined to be begin (array) + size, an implementation throwing an exception in case of out-of-bound access would not conform to the standard anymore.
It's impossible to make this yield end (array) if array is not an array but rather an arbitrary collection type.
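The first reason can be sketched with a made-up bounds-checked type, CheckedArray, standing in for a checking implementation. It is only an illustration of the argument, not how any real compiler or library performs its checks.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical bounds-checked array, standing in for a checking implementation.
struct CheckedArray {
    int data[5] = {};
    static constexpr std::size_t size = 5;

    int& operator[](std::size_t i)
    {
        if (i >= size) throw std::out_of_range("index out of range");
        return data[i];
    }
    int* begin() { return data; }
    int* end()   { return data + size; }   // pointer arithmetic only, no check needed
};

// Returns true if &a[size] throws while a.end() does not.
bool subscript_throws_but_end_does_not()
{
    CheckedArray a;
    int* e = a.end();
    bool threw = false;
    try {
        int* p = &a[CheckedArray::size];   // operator[] runs its check before & is applied
        (void)p;
    } catch (const std::out_of_range&) {
        threw = true;
    }
    return threw && e == a.begin() + CheckedArray::size;
}
```

With a type like this, & (a[size]) cannot be made to yield end (a), because the subscript is evaluated, and rejected, before the address is taken.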

C++ standard, 5.19, paragraph 4:
An address constant expression is a pointer to an lvalue....The pointer shall be created explicitly, using the unary & operator...or using an expression of array (4.2)...type. The subscripting operator []...can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression.
Looks to me like &array[5] is legal C++, being an address constant expression.

If your example is NOT a general case but a specific one, then it is allowed. You can legally, AFAIK, move one past the allocated block of memory.
It does not work for a generic case though i.e where you are trying to access elements farther by 1 from the end of an array.
Just searched the C FAQ.

It is perfectly legal.
The vector<> template class from the STL does exactly this when you call myVec.end(): it gets you a pointer (here as an iterator) which points one element past the end of the array.
