The current draft standard (and presumably C++17) say in [basic.compound/4]:
[ Note: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note ]
So a pointer to an object cannot be reinterpret_cast'd to get its enclosing array pointer.
Now, there is std::launder, [ptr.launder/1]:
template<class T> [[nodiscard]] constexpr T* launder(T* p) noexcept;
Requires: p represents the address A of a byte in memory. An object X that is within its lifetime and whose type is similar to T is located at the address A. All bytes of storage that would be reachable through the result are reachable through p (see below).
And the definion of reachable is in [ptr.launder/3]:
Remarks: An invocation of this function may be used in a core constant expression whenever the value of its argument may be used in a core constant expression. A byte of storage is reachable through a pointer value that points to an object Y if it is within the storage occupied by Y, an object that is pointer-interconvertible with Y, or the immediately-enclosing array object if Y is an array element. The program is ill-formed if T is a function type or cv void.
Now, at first sight, it seems that std::launder is can be used to do the aforementioned conversion, because of the part I've put emphasis.
But. If p points to an object of an array, the bytes of the array is reachable according to this definition (even though p is not pointer-interconvertible to array-pointer), just like the result of the launder. So, it seems that the definition doesn't say anything about this issue.
So, can std::launder be used to convert an object pointer to its enclosing array pointer?
This depends on whether the enclosing array object is a complete object, and if not, whether you can validly access more bytes through a pointer to that enclosing array object (e.g., because it's an array element itself, or pointer-interconvertible with a larger object, or pointer-interconvertible with an object that's an array element). The "reachable" requirement means that you cannot use launder to obtain a pointer that would allow you to access more bytes than the source pointer value allows, on pain of undefined behavior. This ensures that the possibility that some unknown code may call launder does not affect the compiler's escape analysis.
I suppose some examples could help. Each example below reinterpret_casts a int* pointing to the first element of an array of 10 ints into a int(*)[10]. Since they are not pointer-interconvertible, the reinterpret_cast does not change the pointer value, and you get a int(*)[10] with the value of "pointer to the first element of (whatever the array is)". Each example then attempts to obtain a pointer to the entire array by calling std::launder on the cast pointer.
int x[10];
auto p = std::launder(reinterpret_cast<int(*)[10]>(&x[0]));
This is OK; you can access all elements of x through the source pointer, and the result of the launder doesn't allow you to access anything else.
int x2[2][10];
auto p2 = std::launder(reinterpret_cast<int(*)[10]>(&x2[0][0]));
This is undefined. You can only access elements of x2[0] through the source pointer, but the result (which would be a pointer to x2[0]) would have allowed you to access x2[1], which you can't through the source.
struct X { int a[10]; } x3, x4[2]; // assume no padding
auto p3 = std::launder(reinterpret_cast<int(*)[10]>(&x3.a[0])); // OK
This is OK. Again, you can't access through a pointer to x3.a any byte you can't access already.
auto p4 = std::launder(reinterpret_cast<int(*)[10]>(&x4[0].a[0]));
This is (intended to be) undefined. You would have been able to reach x4[1] from the result because x4[0].a is pointer-interconvertible with x4[0], so a pointer to the former can be reinterpret_cast to yield a pointer to the latter, which then can be used for pointer arithmetic. See https://wg21.link/LWG2859.
struct Y { int a[10]; double y; } x5;
auto p3 = std::launder(reinterpret_cast<int(*)[10]>(&x5.a[0]));
And this is again undefined, because you would have been able to reach x5.y from the resulting pointer (by reinterpret_cast to a Y*) but the source pointer can't be used to access it.
Remark: any non schizophrenic compiler will probably gladly accept that, as it would accept a C-style cast or a re-interpret cast, so just try and see is not an option.
But IMHO, the answer to your question is no. The emphasized immediately-enclosing array object if Y is an array element lies in a Remark paragraph, not in the Requires one. That means that provided the requires section is respected, the remarks one also applies. As an array and its element type are not similar types, the requirement is not satisfied and std::launder cannot be used.
What follows is more of a general (philosophycal?) interpretation. At the time of K&R C (in the 70's), C was intended to be able to replace assembly language. For that reason the rule was: the compiler must obey the programmer provided the source code can be translated. So no strict aliasing rule and a pointer was no more that an address with additional arithmetics rules. This strongly changed in C99 and C++03 (not speaking of C++11 +). Programmers are now supposed to use C++ as a high level language. That means that a pointer is just an object that allows to access another object of a given type, and an array and its element type are totally different types. Memory addresses are now little more than implementation details. So trying to convert a pointer to an array to a pointer to its first element is then against the philosophy of the language and could bite the programmer in a later version of the compiler. Of course real life compiler still accept it for compatibility reasons, but we should not even try to use it in modern programs.
Related
(In reference to this question and answer.)
Before the C++17 standard, the following sentence was included in [basic.compound]/3:
If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.
But since C++17, this sentence has been removed.
For example I believe that this sentence made this example code defined, and that since C++17 this is undefined behavior:
alignas(int) unsigned char buffer[2*sizeof(int)];
auto p1=new(buffer) int{};
auto p2=new(p1+1) int{};
*(p1+1)=10;
Before C++17, p1+1 holds the address to *p2 and has the right type, so *(p1+1) is a pointer to *p2. In C++17 p1+1 is a pointer past-the-end, so it is not a pointer to object and I believe it is not dereferencable.
Is this interpretation of this modification of the standard right or are there other rules that compensate the deletion of the cited sentence?
Is this interpretation of this modification of the standard right or are there other rules that compensate the deletion of this cited sentence?
Yes, this interpretation is correct. A pointer past the end isn't simply convertible to another pointer value that happens to point to that address.
The new [basic.compound]/3 says:
Every value of pointer type is one of the following:
(3.1)
a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2)
a pointer past the end of an object ([expr.add]), or
Those are mutually exclusive. p1+1 is a pointer past the end, not a pointer to an object. p1+1 points to a hypothetical x[1] of a size-1 array at p1, not to p2. Those two objects are not pointer-interconvertible.
We also have the non-normative note:
[ Note: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type that might be located at that address. [...]
which clarifies the intent.
As T.C. points out in numerous comments (notably this one), this is really a special case of the problem that comes with trying to implement std::vector - which is that [v.data(), v.data() + v.size()) needs to be a valid range and yet vector doesn't create an array object, so the only defined pointer arithmetic would be going from any given object in the vector to past-the-end of its hypothetical one-size array. Fore more resources, see CWG 2182, this std discussion, and two revisions of a paper on the subject: P0593R0 and P0593R1 (section 1.3 specifically).
In your example, *(p1 + 1) = 10; should be UB, because it is one past the end of the array of size 1. But we are in a very special case here, because the array was dynamically constructed in a larger char array.
Dynamic object creation is described in 4.5 The C++ object model [intro.object], §3 of the n4659 draft of the C++ standard:
3 If a complete object is created (8.3.4) in storage associated with another object e of type “array of N
unsigned char” or of type “array of N std::byte” (21.2.1), that array provides storage for the created
object if:
(3.1) — the lifetime of e has begun and not ended, and
(3.2) — the storage for the new object fits entirely within e, and
(3.3) — there is no smaller array object that satisfies these constraints.
The 3.3 seems rather unclear, but the examples below make the intent more clear:
struct A { unsigned char a[32]; };
struct B { unsigned char b[16]; };
A a;
B *b = new (a.a + 8) B; // a.a provides storage for *b
int *p = new (b->b + 4) int; // b->b provides storage for *p
// a.a does not provide storage for *p (directly),
// but *p is nested within a (see below)
So in the example, the buffer array provides storage for both *p1 and *p2.
The following paragraphs prove that the complete object for both *p1 and *p2 is buffer:
4 An object a is nested within another object b if:
(4.1) — a is a subobject of b, or
(4.2) — b provides storage for a, or
(4.3) — there exists an object c where a is nested within c, and c is nested within b.
5 For every object x, there is some object called the complete object of x, determined as follows:
(5.1) — If x is a complete object, then the complete object of x is itself.
(5.2) — Otherwise, the complete object of x is the complete object of the (unique) object that contains x.
Once this is established, the other relevant part of draft n4659 for C++17 is [basic.coumpound] §3(emphasize mine):
3 ... Every
value of pointer type is one of the following:
(3.1) — a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) — a pointer past the end of an object (8.7), or
(3.3) — the null pointer value (7.11) for that type, or
(3.4) — an invalid pointer value.
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (4.4) occupied by the object or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (8.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address. A pointer value
becomes invalid when the storage it denotes reaches the end of its storage duration; see 6.7. —end note ]
For purposes of pointer arithmetic (8.7) and comparison (8.9, 8.10), a pointer past the end of the last element
of an array x of n elements is considered to be equivalent to a pointer to a hypothetical element x[n]. The
value representation of pointer types is implementation-defined. Pointers to layout-compatible types shall
have the same value representation and alignment requirements (6.11)...
The note A pointer past the end... does not apply here because the objects pointed to by p1 and p2 and not unrelated, but are nested into the same complete object, so pointer arithmetics make sense inside the object that provide storage: p2 - p1 is defined and is (&buffer[sizeof(int)] - buffer]) / sizeof(int) that is 1.
So p1 + 1 is a pointer to *p2, and *(p1 + 1) = 10; has defined behaviour and sets the value of *p2.
I have also read the C4 annex on the compatibility between C++14 and current (C++17) standards. Removing the possibility to use pointer arithmetics between objects dynamically created in a single character array would be an important change that IMHO should be cited there, because it is a commonly used feature. As nothing about it exist in the compatibility pages, I think that it confirms that it was not the intent of the standard to forbid it.
In particular, it would defeat that common dynamic construction of an array of objects from a class with no default constructor:
class T {
...
public T(U initialization) {
...
}
};
...
unsigned char *mem = new unsigned char[N * sizeof(T)];
T * arr = reinterpret_cast<T*>(mem); // See the array as an array of N T
for (i=0; i<N; i++) {
U u(...);
new(arr + i) T(u);
}
arr can then be used as a pointer to the first element of an array...
To expand on the answers given here is an example of what I believe the revised wording is excluding:
Warning: Undefined Behaviour
#include <iostream>
int main() {
int A[1]{7};
int B[1]{10};
bool same{(B)==(A+1)};
std::cout<<B<< ' '<< A <<' '<<sizeof(*A)<<'\n';
std::cout<<(same?"same":"not same")<<'\n';
std::cout<<*(A+1)<<'\n';//!!!!!
return 0;
}
For entirely implementation dependent (and fragile) reasons possible output of this program is:
0x7fff1e4f2a64 0x7fff1e4f2a60 4
same
10
That output shows that the two arrays (in that case) happen to be stored in memory such that 'one past the end' of A happens to hold the value of the address of the first element of B.
The revised specification is ensuring that regardless A+1 is never a valid pointer to B. The old phrase 'regardless of how the value is obtained' says that if 'A+1' happens to point to 'B[0]' then it's a valid pointer to 'B[0]'.
That can't be good and surely never the intention.
The current draft standard (and presumably C++17) say in [basic.compound/4]:
[ Note: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note ]
So a pointer to an object cannot be reinterpret_cast'd to get its enclosing array pointer.
Now, there is std::launder, [ptr.launder/1]:
template<class T> [[nodiscard]] constexpr T* launder(T* p) noexcept;
Requires: p represents the address A of a byte in memory. An object X that is within its lifetime and whose type is similar to T is located at the address A. All bytes of storage that would be reachable through the result are reachable through p (see below).
And the definion of reachable is in [ptr.launder/3]:
Remarks: An invocation of this function may be used in a core constant expression whenever the value of its argument may be used in a core constant expression. A byte of storage is reachable through a pointer value that points to an object Y if it is within the storage occupied by Y, an object that is pointer-interconvertible with Y, or the immediately-enclosing array object if Y is an array element. The program is ill-formed if T is a function type or cv void.
Now, at first sight, it seems that std::launder is can be used to do the aforementioned conversion, because of the part I've put emphasis.
But. If p points to an object of an array, the bytes of the array is reachable according to this definition (even though p is not pointer-interconvertible to array-pointer), just like the result of the launder. So, it seems that the definition doesn't say anything about this issue.
So, can std::launder be used to convert an object pointer to its enclosing array pointer?
This depends on whether the enclosing array object is a complete object, and if not, whether you can validly access more bytes through a pointer to that enclosing array object (e.g., because it's an array element itself, or pointer-interconvertible with a larger object, or pointer-interconvertible with an object that's an array element). The "reachable" requirement means that you cannot use launder to obtain a pointer that would allow you to access more bytes than the source pointer value allows, on pain of undefined behavior. This ensures that the possibility that some unknown code may call launder does not affect the compiler's escape analysis.
I suppose some examples could help. Each example below reinterpret_casts a int* pointing to the first element of an array of 10 ints into a int(*)[10]. Since they are not pointer-interconvertible, the reinterpret_cast does not change the pointer value, and you get a int(*)[10] with the value of "pointer to the first element of (whatever the array is)". Each example then attempts to obtain a pointer to the entire array by calling std::launder on the cast pointer.
int x[10];
auto p = std::launder(reinterpret_cast<int(*)[10]>(&x[0]));
This is OK; you can access all elements of x through the source pointer, and the result of the launder doesn't allow you to access anything else.
int x2[2][10];
auto p2 = std::launder(reinterpret_cast<int(*)[10]>(&x2[0][0]));
This is undefined. You can only access elements of x2[0] through the source pointer, but the result (which would be a pointer to x2[0]) would have allowed you to access x2[1], which you can't through the source.
struct X { int a[10]; } x3, x4[2]; // assume no padding
auto p3 = std::launder(reinterpret_cast<int(*)[10]>(&x3.a[0])); // OK
This is OK. Again, you can't access through a pointer to x3.a any byte you can't access already.
auto p4 = std::launder(reinterpret_cast<int(*)[10]>(&x4[0].a[0]));
This is (intended to be) undefined. You would have been able to reach x4[1] from the result because x4[0].a is pointer-interconvertible with x4[0], so a pointer to the former can be reinterpret_cast to yield a pointer to the latter, which then can be used for pointer arithmetic. See https://wg21.link/LWG2859.
struct Y { int a[10]; double y; } x5;
auto p3 = std::launder(reinterpret_cast<int(*)[10]>(&x5.a[0]));
And this is again undefined, because you would have been able to reach x5.y from the resulting pointer (by reinterpret_cast to a Y*) but the source pointer can't be used to access it.
Remark: any non schizophrenic compiler will probably gladly accept that, as it would accept a C-style cast or a re-interpret cast, so just try and see is not an option.
But IMHO, the answer to your question is no. The emphasized immediately-enclosing array object if Y is an array element lies in a Remark paragraph, not in the Requires one. That means that provided the requires section is respected, the remarks one also applies. As an array and its element type are not similar types, the requirement is not satisfied and std::launder cannot be used.
What follows is more of a general (philosophycal?) interpretation. At the time of K&R C (in the 70's), C was intended to be able to replace assembly language. For that reason the rule was: the compiler must obey the programmer provided the source code can be translated. So no strict aliasing rule and a pointer was no more that an address with additional arithmetics rules. This strongly changed in C99 and C++03 (not speaking of C++11 +). Programmers are now supposed to use C++ as a high level language. That means that a pointer is just an object that allows to access another object of a given type, and an array and its element type are totally different types. Memory addresses are now little more than implementation details. So trying to convert a pointer to an array to a pointer to its first element is then against the philosophy of the language and could bite the programmer in a later version of the compiler. Of course real life compiler still accept it for compatibility reasons, but we should not even try to use it in modern programs.
(In reference to this question and answer.)
Before the C++17 standard, the following sentence was included in [basic.compound]/3:
If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.
But since C++17, this sentence has been removed.
For example I believe that this sentence made this example code defined, and that since C++17 this is undefined behavior:
alignas(int) unsigned char buffer[2*sizeof(int)];
auto p1=new(buffer) int{};
auto p2=new(p1+1) int{};
*(p1+1)=10;
Before C++17, p1+1 holds the address to *p2 and has the right type, so *(p1+1) is a pointer to *p2. In C++17 p1+1 is a pointer past-the-end, so it is not a pointer to object and I believe it is not dereferencable.
Is this interpretation of this modification of the standard right or are there other rules that compensate the deletion of the cited sentence?
Is this interpretation of this modification of the standard right or are there other rules that compensate the deletion of this cited sentence?
Yes, this interpretation is correct. A pointer past the end isn't simply convertible to another pointer value that happens to point to that address.
The new [basic.compound]/3 says:
Every value of pointer type is one of the following:
(3.1)
a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2)
a pointer past the end of an object ([expr.add]), or
Those are mutually exclusive. p1+1 is a pointer past the end, not a pointer to an object. p1+1 points to a hypothetical x[1] of a size-1 array at p1, not to p2. Those two objects are not pointer-interconvertible.
We also have the non-normative note:
[ Note: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type that might be located at that address. [...]
which clarifies the intent.
As T.C. points out in numerous comments (notably this one), this is really a special case of the problem that comes with trying to implement std::vector - which is that [v.data(), v.data() + v.size()) needs to be a valid range and yet vector doesn't create an array object, so the only defined pointer arithmetic would be going from any given object in the vector to past-the-end of its hypothetical one-size array. Fore more resources, see CWG 2182, this std discussion, and two revisions of a paper on the subject: P0593R0 and P0593R1 (section 1.3 specifically).
In your example, *(p1 + 1) = 10; should be UB, because it is one past the end of the array of size 1. But we are in a very special case here, because the array was dynamically constructed in a larger char array.
Dynamic object creation is described in 4.5 The C++ object model [intro.object], §3 of the n4659 draft of the C++ standard:
3 If a complete object is created (8.3.4) in storage associated with another object e of type “array of N
unsigned char” or of type “array of N std::byte” (21.2.1), that array provides storage for the created
object if:
(3.1) — the lifetime of e has begun and not ended, and
(3.2) — the storage for the new object fits entirely within e, and
(3.3) — there is no smaller array object that satisfies these constraints.
The 3.3 seems rather unclear, but the examples below make the intent more clear:
struct A { unsigned char a[32]; };
struct B { unsigned char b[16]; };
A a;
B *b = new (a.a + 8) B; // a.a provides storage for *b
int *p = new (b->b + 4) int; // b->b provides storage for *p
// a.a does not provide storage for *p (directly),
// but *p is nested within a (see below)
So in the example, the buffer array provides storage for both *p1 and *p2.
The following paragraphs prove that the complete object for both *p1 and *p2 is buffer:
4 An object a is nested within another object b if:
(4.1) — a is a subobject of b, or
(4.2) — b provides storage for a, or
(4.3) — there exists an object c where a is nested within c, and c is nested within b.
5 For every object x, there is some object called the complete object of x, determined as follows:
(5.1) — If x is a complete object, then the complete object of x is itself.
(5.2) — Otherwise, the complete object of x is the complete object of the (unique) object that contains x.
Once this is established, the other relevant part of draft n4659 for C++17 is [basic.coumpound] §3(emphasize mine):
3 ... Every
value of pointer type is one of the following:
(3.1) — a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) — a pointer past the end of an object (8.7), or
(3.3) — the null pointer value (7.11) for that type, or
(3.4) — an invalid pointer value.
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (4.4) occupied by the object or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (8.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address. A pointer value
becomes invalid when the storage it denotes reaches the end of its storage duration; see 6.7. —end note ]
For purposes of pointer arithmetic (8.7) and comparison (8.9, 8.10), a pointer past the end of the last element
of an array x of n elements is considered to be equivalent to a pointer to a hypothetical element x[n]. The
value representation of pointer types is implementation-defined. Pointers to layout-compatible types shall
have the same value representation and alignment requirements (6.11)...
The note A pointer past the end... does not apply here because the objects pointed to by p1 and p2 and not unrelated, but are nested into the same complete object, so pointer arithmetics make sense inside the object that provide storage: p2 - p1 is defined and is (&buffer[sizeof(int)] - buffer]) / sizeof(int) that is 1.
So p1 + 1 is a pointer to *p2, and *(p1 + 1) = 10; has defined behaviour and sets the value of *p2.
I have also read the C4 annex on the compatibility between C++14 and current (C++17) standards. Removing the possibility to use pointer arithmetics between objects dynamically created in a single character array would be an important change that IMHO should be cited there, because it is a commonly used feature. As nothing about it exist in the compatibility pages, I think that it confirms that it was not the intent of the standard to forbid it.
In particular, it would defeat that common dynamic construction of an array of objects from a class with no default constructor:
class T {
...
public T(U initialization) {
...
}
};
...
unsigned char *mem = new unsigned char[N * sizeof(T)];
T * arr = reinterpret_cast<T*>(mem); // See the array as an array of N T
for (i=0; i<N; i++) {
U u(...);
new(arr + i) T(u);
}
arr can then be used as a pointer to the first element of an array...
To expand on the answers given here is an example of what I believe the revised wording is excluding:
Warning: Undefined Behaviour
#include <iostream>
int main() {
int A[1]{7};
int B[1]{10};
bool same{(B)==(A+1)};
std::cout<<B<< ' '<< A <<' '<<sizeof(*A)<<'\n';
std::cout<<(same?"same":"not same")<<'\n';
std::cout<<*(A+1)<<'\n';//!!!!!
return 0;
}
For entirely implementation dependent (and fragile) reasons possible output of this program is:
0x7fff1e4f2a64 0x7fff1e4f2a60 4
same
10
That output shows that the two arrays (in that case) happen to be stored in memory such that 'one past the end' of A happens to hold the value of the address of the first element of B.
The revised specification is ensuring that regardless A+1 is never a valid pointer to B. The old phrase 'regardless of how the value is obtained' says that if 'A+1' happens to point to 'B[0]' then it's a valid pointer to 'B[0]'.
That can't be good and surely never the intention.
Let's say I have a function, called like this:
void mysort(int *arr, std::size_t size)
{
std::sort(&arr[0], &arr[size]);
}
int main()
{
int a[] = { 42, 314 };
mysort(a, 2);
}
My question is: does the code of mysort (more specifically, &arr[size]) have defined behaviour?
I know it would be perfectly valid if replaced by arr + size; pointer arithmetic allows pointing past-the-end normally. However, my question is specifically about the use of & and [].
Per C++11 5.2.1/1, arr[size] is equivalent to *(arr + size).
Quoting 5.3.1/1, the rules for unary *:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an
object type, or a pointer to a function type and the result is an lvalue referring to the object or function
to which the expression points. If the type of the expression is “pointer to T,” the type of the result is “T.”
[ Note: a pointer to an incomplete type (other than cv void) can be dereferenced. The lvalue thus obtained
can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to a
prvalue, see 4.1. —end note ]
Finally, 5.3.1/3 giving the rules for &:
The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue ... if the type of
the expression is T, the result has type “pointer to T” and is a prvalue that is the address of the designated object (1.7) or a pointer to the designated function.
(Emphasis and ellipses mine).
I can't quite make up my mind about this. I know for sure that forcing an lvalue-to-rvalue conversion on arr[size] would be Undefined. But no such conversion happens in the code. arr + size does not point to an object; but while the paragraphs above talk about objects, they never seem to explicitly call out the necessity for an object to actually exist at that location (unlike e.g. the lvalue-to-rvalue conversion in 4.1/1).
So, the questio is: is mysort, the way it's called, valid or not?
(Note that I'm quoting C++11 above, but if this is handled more explicitly in a later standard/draft, I would be perfectly happy with that).
It's not valid. You bolded "result is an lvalue referring to the object or function to which the expression points" in your question. That's exactly the problem. array + size is a valid pointer value that does not point to an object. Therefore, your quote about *(array + size) does not specify what the result refers to, and that then means there is no requirement for &*(array + size) to give the same value as array + size.
In C, this was considered a defect and fixed so that the spec now says in &*ptr, neither & nor * gets evaluated. C++ hasn't yet received fixed wording. It's the subject of a very old still active DR: DR #232. The intent is that it is valid, just as it is in C, but the standard doesn't say so.
In the context of normal C++ arrays, yes. It is legal to form the address of the one-past-the-end element of the array. It is not legal to read or write to what it is pointing at, however (after all, there is no actual element there). So when you do the &arr[size], the arr[size] forms what you might think of as a reference to the one-past-the-end element, but has not tried to actually access that element yet. Then the & gets you the address of that element. Since nothing has tried to actually follow that pointer, nothing bad has happened.
This isn't by accident, this makes pointers into arrays behave like iterators. Thus &a[0] is essentially .begin() on the array, and &a[size] (where size is the number of elements in the array) is essentially .end(). (See also std::array where this ends up being more explicit)
Edit: Erm, I may have to retract this answer. While it probably applies in most cases, if the type stored in the array has an overridden operator& then when you do the &a[size], the operator& method may attempt to access members of the instance of the type at a[size] where there is no instance.
Assuming size is the actual array size, you are passing a pointer to past-the-end element to std::sort().
So, as I understand it, the question boils down to: is this pointer equivalent to arr.end()?
There is little doubt this is true for every existing compiler, since array iterators are indeed plain old pointers, so &arr[size] is the obvious choice for arr.end().
However, I doubt there is a specific requirement about the actual implementation of plain old array iterators.
So, for the sake of the argument, you could imagine a compiler using a "past end" bit in addition to the actual address to implement plain old array iterators internally and perversely paint your mustache pink if it detected any concievable inconsistency between iterators and addresses obtained through pointer arithmetics.
This freakish compiler would cause a lot of existing C++ code to crash without actually violating the spec, which might just be worth the effort of designing it...
If we admit that arr[i] is just a shorthand for *(arr + i), we can rewrite &arr[size] as &*(arr + size). Hence, we are dereferencing a pointer that points to the past-the-end element, which leads to an undefined behavior. As you correctly say, arr + size would instead be legal, because no dereferencing operation takes place.
Coincidentally, this is also presented as a quiz in Stepanov's notes (page 11).
It's perfectly fine and well defined as long as size is not larger than the size of the actual array (in units of the array elements).
So if main () called mysort (a, 100), &arr [size] would already be undefined behaviour (but most likely undetected, but std::sort would obviously go wrong badly as well).
I had a discussion with someone on IRC and this question turned up. We are allowed by the Standard to change an object of type int by a char lvalue.
int a;
char *b = (char*) &a;
*b = 0;
Would we be allowed to do this in the opposite direction, if we know that the alignment is fine?
The issue I'm seeing is that the aliasing rule does not cover the simple case of the following, if one considers the aliasing rule as a non-symmetric relation
int a;
a = 0;
The reason is, that each object contains a sequence of sizeof(obj) unsigned char objects (called the "object representation"). If we change the int, we will change some or all of those objects. However, the aliasing rule only states we are allowed to change a int by an char or unsigned char, but not the other way around. Another example
int a[1];
int *ra = a;
*ra = 0;
Only one direction is described by 3.10/15 ("An aggregate or union type that includes..."), but this time we need the other way around ("A type that is the element or non-static data member type of an aggregate...").
Is the other direction implied? This question also applies to C.
The aliasing rule simply states that there's one "effective type" (C99 6.5.7, plus footnote 73) for any given object in memory, and any accesses to such an object go through one of:
A type compatible with the effective type (qualifiers such as const and restrict, as well signed/unsigned-ness may vary)
A struct or union containing one such type
A character type
The effective type isn't specified in advanced, of course - it's just a construct that is used to specify aliasing. But the intent is simply that you don't access the same object with two different non-character types.
So the answer is, yes, you can indeed go the other direction.
The standard (C99 6.3.2.3 §7) defines pointer casts like this as "just fine", the casted pointer will point at the same address. (Unless the CPU has alignment which makes the cast impossible, then it is undefined behavior.)
That is, the actual cast in itself is fine. What will happen if you start to manipulate the data... now that's another implementation-defined story.
Here's from the standard:
"A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned (57) for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.
When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers
to the remaining bytes of the object."
"57) In general, the concept ‘‘correctly aligned’’ is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is
correctly aligned for a pointer to type C."
I think I'm a bit confused by the question, but the 2nd and 3rd examples are accessing the int through an lvalue that has the object's type (int in the examples).
C++ 3.10/15 states as it's first item that it's OK to access the object through an lvalue that has the the type of "the dynamic type of the object".
What am I misunderstanding in the question?