What does "possibly-hypothetical" mean in the pointer arithmetic rules? - c++

In the standard's specification for pointer arithmetic ([expr.add]/4.2), we have:
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 ≤ i − j ≤ n.
What does "possibly-hypothetical" mean here? The passage already constrains the resulting pointer to be within the range of the array, including the one-past-the-end slot. Is that what it's referring to?

Yes, it's the one-past-the-end "element".
[basic.compound]/3: [..] For purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n of x and an object of type T that is not an array element is considered to belong to an array with one element of type T. [..]
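To make the quoted rule concrete, here is a minimal sketch of what it permits. The function names count_elements and single_object_has_end are illustrative only and do not come from the standard. The pointer to the hypothetical element n can be formed and compared, but not dereferenced.

```cpp
#include <cstddef>

// Counts the elements of a 4-element array by walking from the first element
// to the one-past-the-end position.
std::size_t count_elements() {
    int x[4] = {10, 20, 30, 40};

    int* first = x;      // points to element 0
    int* end = x + 4;    // points to the hypothetical element 4: fine to form

    std::size_t count = 0;
    for (int* p = first; p != end; ++p) {  // comparing against end is fine
        ++count;                           // *p would be valid here, since p != end
    }

    // Not allowed: *end   (would access the hypothetical element)
    // Not allowed: x + 5  (i + j > n, so the arithmetic itself is undefined)
    return count;
}

// An object that is not an array element is treated as belonging to an array
// of one element, so it also has a one-past-the-end position.
bool single_object_has_end() {
    int value = 7;
    int* p = &value;
    int* end = p + 1;    // hypothetical element 1 of a one-element array
    return end - p == 1;
}
```

Calling count_elements() yields 4 and single_object_has_end() yields true.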

Related

Is it allowed to access the second row of a 2d array from the first? [duplicate]

I am wondering if the C++ standard guarantees that multidimensional arrays (not dynamically allocated) are flattened into a 1D array of exactly the same space. For example, if I have
char x[100];
char y[10][10];
Would these both be equivalent? I'm aware that most compilers would flatten y, but is this actually guaranteed to happen? Reading section 11.3.4 Arrays of the C++ Standard, I cannot actually find anywhere that guarantees this.
The C++ standard guarantees that y[i] follows immediately after y[i-1]. Since y[i-1] is 10 characters long, then, logically speaking, y[i] should take place 10 characters later in memory; however, could a compiler pad y[i-1] with extra characters to keep y[i] aligned?
What you are looking for is found in [dcl.array]/6
An object of type “array of N U” contains a contiguously allocated non-empty set of N subobjects of type U, known as the elements of the array, and numbered 0 to N-1.
What this states is that if you have an array like int arr[10], then you have 10 ints that are contiguous in memory. This definition works recursively, though, so if you have
int arr[5][10];
then what you have is an array of 5 int[10] arrays. Applying the definition above, the 5 int[10] arrays are contiguous, and the ints within each int[10] are contiguous, so all 50 ints are contiguous. So yes, a 2D array looks just like a 1D array in memory, since that is really what it is.
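The size consequence of that contiguity can be checked at compile time. This is a small sketch using the x and y declarations from the question; total_ints is a hypothetical helper name. It relies on sizeof of an array of n elements being n times the size of one element.

```cpp
#include <cstddef>

char x[100];
char y[10][10];

// An array has no padding between or after its elements, so both occupy
// exactly 100 bytes.
static_assert(sizeof(y) == sizeof(x), "char[10][10] is as large as char[100]");
static_assert(sizeof(y[0]) == 10, "each row is exactly 10 chars");

// Number of ints stored in an int[5][10].
constexpr std::size_t total_ints() {
    return sizeof(int[5][10]) / sizeof(int);
}

static_assert(total_ints() == 50, "5 rows of 10 ints");
```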
This does not mean you can get a pointer to arr[0][0] and iterate to arr[4][9] with it. Per [expr.add]/4:
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.
Otherwise, the behavior is undefined.
What this states is that if you have a pointer to an array element, the only pointers you can form from it by arithmetic are those to elements 0 through array_size of that same array, where index array_size is the one-past-the-end position. So if you did
int* it = &arr[0][0];
then what it points to is the first element of the first int[10] array, which means you can legally increment it only as far as it + 10, since that is the past-the-end element of the first array. Going into the second array is UB even though the two arrays are contiguous.
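One way to visit all 50 ints while staying inside these rules is to restart the pointer at the beginning of each row. The following is a minimal sketch; sum_all is a hypothetical name chosen for illustration.

```cpp
// Sums every int in arr without ever moving a pointer across a row boundary.
int sum_all(const int (&arr)[5][10]) {
    int total = 0;
    for (const auto& row : arr) {        // row refers to one const int[10]
        const int* it = row;             // element 0 of this row
        const int* row_end = row + 10;   // one past the end of this row
        for (; it != row_end; ++it) {
            total += *it;                // it always points into the current row
        }
    }
    return total;
}
```

Each inner loop stops at row + 10, so no pointer is ever advanced beyond the past-the-end position of the row it started in.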

Is accessing the middle of a multidimensional array via a pointer to its first element UB?

Consider the following code:
int data[2][2];
int* p(&data[0][0]);
p[3] = 0;
Or equivalently:
int data[2][2];
int (&row0)[2] = data[0];
int* p = &row0[0];
p[3] = 0;
It's not clear to me whether this is undefined behaviour or not.
p is a pointer to the first element of an array row0 with 2 elements, therefore p[3] accesses past the end of the array, which is UB according to 7.6.6 [expr.add]:
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) element x[i−j] if 0 ≤ i − j ≤ n.
Otherwise, the behavior is undefined.
I don't see anything in the standard that gives special treatment to multidimensional arrays, so I can only conclude that the above is, in fact, UB.
Am I correct?
What about the case of data being declared as std::array<std::array<int, 2>, 2>? This case seems even more likely to be UB, as structs may have padding.
Yes, you are correct, and there is not much to add to it. There are no multidimensional arrays in the C++ type system; there are only arrays (of arrays of arrays of arrays, ad libitum).
Accessing an element beyond an array's bounds is undefined behavior.
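If the goal is to address the elements by a single flat index, one option is to split that index into a row and a column instead of stepping a pointer past the first row. This is a sketch only; set_flat is a hypothetical helper, and the std::array overload is included because the question asks about that case too.

```cpp
#include <array>
#include <cstddef>

// Writes value to the k-th int of data in row-major order, for k in [0, 4).
void set_flat(int (&data)[2][2], std::size_t k, int value) {
    data[k / 2][k % 2] = value;   // every subscript stays inside its own array
}

// Same idea for nested std::array; the layout of the wrapper does not matter,
// because no pointer arithmetic crosses from one inner array into the next.
void set_flat(std::array<std::array<int, 2>, 2>& data, std::size_t k, int value) {
    data[k / 2][k % 2] = value;
}
```

With this helper, set_flat(data, 3, 0) has the effect the question intended for p[3] = 0, namely writing to data[1][1].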

Adding onto an array pointer

So I've recently come across something that isn't really intuitive to me and got me a little confused. If I allocate an array on the heap like this:
uint32_t* Array = new uint32_t[5];
and then try to add a certain amount of bytes to the array pointer like this:
Array + 3
the resulting address is that of Array advanced by sizeof(uint32_t) * 3 bytes, not by 3 bytes.
Why is this being done?
Additive operators (§7.6.6/4) [expr.add]/4:
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n.
Otherwise, the behavior is undefined.
Subscripting (§7.6.1.1/1) [expr.sub]/1
A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type. The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise. The expression E1 is sequenced before the expression E2.
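Put together, the two quotes say that adding 3 to a uint32_t* moves it by three elements, which is what makes Array[3] and *(Array + 3) the same thing. Here is a minimal sketch, using a local array instead of new for brevity; the function names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>

// Distance between array and array + 3, counted in elements.
std::ptrdiff_t element_distance() {
    std::uint32_t array[5] = {};
    std::uint32_t* p = array + 3;   // points to element 3, not to byte 3
    return p - array;               // pointer difference is also in elements
}

// The same distance expressed in bytes.
std::size_t byte_distance() {
    return static_cast<std::size_t>(element_distance()) * sizeof(std::uint32_t);
}

// E1[E2] is defined as *((E1) + (E2)), so these all name the same element.
bool subscript_matches_arithmetic() {
    std::uint32_t array[5] = {0, 1, 2, 3, 4};
    return &array[3] == array + 3 && array[3] == *(array + 3) && 3[array] == array[3];
}
```

If the arithmetic were done in bytes, Array + 3 would point into the middle of the first element and subscripting could not be defined this way.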

Is pointer arithmetic on allocated storage UB?

Let's say I want to implement std::vector without invoking any undefined behavior (UB). Does the code below invoke UB?
struct X{int i;};
int main(){
auto p = static_cast<X*>(::operator new(sizeof(X)*2));
new(p) X{};
new(p+1) X{};// p+1 UB?
}
The following selection of quotes from the standard may help:
[basic.stc.dynamic.allocation]
The pointer returned (by an allocation function) shall be suitably aligned so that it can be converted to a pointer to any
suitable complete object type (21.6.2.1) and then used to access the object or array in the storage allocated
(until the storage is explicitly deallocated by a call to a corresponding deallocation function).
[expr.add]
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0<=i + j<=n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0<=i − j <=n; otherwise, the behavior is undefined.
My interpretation is that allocation provides a possibly-hypothetical array of X (in C++, arrays are objects), so pointer arithmetic on allocated storage, as in the example, may not invoke undefined behavior. Or is my interpretation of "hypothetical" wrong? What could I do instead if the previous code snippet is UB?
Yes, technically it has undefined behaviour, though we tend to ignore that. P0593 should fix it properly.
The phrase "possibly-hypothetical" refers to one-past-the-end "elements", and does not permit this case.
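As for what to do instead, one option is to obtain the storage through std::allocator rather than calling ::operator new directly. This is a sketch under an assumption: as I understand it, since C++20 (where P0593 was adopted) std::allocator<T>::allocate(n) is specified to return a pointer to the initial element of an array of n T whose elements have not yet started their lifetime, so p + 1 points to a real array element. The function name construct_two is hypothetical.

```cpp
#include <memory>

struct X { int i; };

// Builds two X objects in allocator-provided storage and returns i0 + i1.
int construct_two() {
    std::allocator<X> alloc;
    using Traits = std::allocator_traits<std::allocator<X>>;

    // Assumption (C++20 wording): p points to element 0 of an X[2] array
    // object whose elements are not yet alive.
    X* p = Traits::allocate(alloc, 2);

    Traits::construct(alloc, p, X{1});       // start the lifetime of element 0
    Traits::construct(alloc, p + 1, X{2});   // p + 1 is element 1 of that array

    int sum = p[0].i + p[1].i;

    Traits::destroy(alloc, p + 1);           // destroy in reverse order
    Traits::destroy(alloc, p);
    Traits::deallocate(alloc, p, 2);
    return sum;
}
```

Going through allocator_traits is also how std::vector itself is specified to construct and destroy its elements, which keeps the sketch close to the stated goal of implementing a vector.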