I am wondering if the C++ standard guarantees that multidimensional arrays (not dynamically allocated) are flattened into a 1D array of exactly the same space. For example, if I have
char x[100];
char y[10][10];
Would these both be equivalent? I'm aware that most compilers would flatten y, but is this actually guaranteed to happen? Reading section 11.3.4 Arrays of the C++ Standard, I cannot actually find anywhere that guarantees this.
The C++ standard guarantees that y[i] follows immediately after y[i-1]. Since y[i-1] is 10 characters long, then, logically speaking, y[i] should take place 10 characters later in memory; however, could a compiler pad y[i-1] with extra characters to keep y[i] aligned?
What you are looking for is found in [dcl.array]/6
An object of type “array of N U” contains a contiguously allocated non-empty set of N subobjects of type U, known as the elements of the array, and numbered 0 to N-1.
What this states is that if you have an array like int arr[10], then you have 10 ints that are contiguous in memory. This definition works recursively, though, so if you have
int arr[5][10];
then what you have is an array of 5 int[10] arrays. Applying the definition above, the 5 int[10] arrays are contiguous, and each int[10] is itself contiguous, so all 50 ints are contiguous. There is no room for padding between rows either, because the size of an array of N elements is N times the size of one element ([expr.sizeof]). So yes, a 2D array looks just like a 1D array in memory, since that is really what it is.
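As a minimal sketch of the size argument, the check below compares the size of a two-dimensional array type with the flat array type of the same element count. The helper name same_size_as_flat is made up for illustration; it only inspects sizes and says nothing about which pointer arithmetic is allowed.

```cpp
#include <cstddef>

// Hypothetical helper: true when T[Rows][Cols] occupies exactly as many
// bytes as T[Rows * Cols], and exactly Rows times the size of one row.
template <typename T, std::size_t Rows, std::size_t Cols>
constexpr bool same_size_as_flat() {
    return sizeof(T[Rows][Cols]) == sizeof(T[Rows * Cols]) &&
           sizeof(T[Rows][Cols]) == Rows * sizeof(T[Cols]);
}

// The two shapes mentioned in this thread.
static_assert(same_size_as_flat<char, 10, 10>(), "char y[10][10] vs char x[100]");
static_assert(same_size_as_flat<int, 5, 10>(), "int arr[5][10] vs int[50]");
```

Because the checks are static_asserts, a translation unit containing this block only compiles if the sizes really match.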
This does not mean you can get a pointer to arr[0][0] and iterate to arr[4][9] with it. Per [expr.add]/4
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.
Otherwise, the behavior is undefined.
What this states is that if you have a pointer to an element of an array, then the positions you can reach from it are the indices [0, array_size] of that same array. So if you did
int * it = &arr[0][0];
then what it points to is the first element of the first array, which means you can legally only increment it up to it + 10, since that is the past-the-end element of the first array. Going into the second array is UB even though the arrays are contiguous.
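One way to visit all 50 ints without leaving any row is to restart the pointer at the beginning of each row. The function below is a sketch under that assumption; the name sum_all is hypothetical.

```cpp
#include <cstddef>

// Sums every int of a 5x10 array while keeping each pointer inside
// the row it was obtained from.
int sum_all(const int (&arr)[5][10]) {
    int total = 0;
    for (std::size_t r = 0; r < 5; ++r) {
        const int* it = arr[r];        // points to arr[r][0]
        const int* end = arr[r] + 10;  // one past the end of row r, never dereferenced
        for (; it != end; ++it) {
            total += *it;
        }
    }
    return total;
}
```

Each inner loop stops at the past-the-end pointer of its own row, which is the furthest position [expr.add]/4 allows.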
Related
Suppose you have an array:
int array[SIZE];
or
int *array = new(int[SIZE]);
Does C or C++ guarantee that array < array + SIZE, and if so where?
I understand that regardless of the language spec, many operating systems guarantee this property by reserving the top of the virtual address space for the kernel. My question is whether this is also guaranteed by the language, rather than just by the vast majority of implementations.
As an example, suppose an OS kernel lives in low memory and sometimes gives the highest page of virtual memory out to user processes in response to mmap requests for anonymous memory. If malloc or ::operator new[] directly calls mmap for the allocation of a huge array, and the end of the array abuts the top of the virtual address space such that array + SIZE wraps around to zero, does this amount to a non-compliant implementation of the language?
Clarification
Note that the question is not asking about array+(SIZE-1), which is the address of the last element of the array. That one is guaranteed to be greater than array. The question is about a pointer one past the end of an array, or also p+1 when p is a pointer to a non-array object (which the section of the standard pointed to by the selected answer makes clear is treated the same way).
Stackoverflow has asked me to clarify why this question is not the same as this one. The other question asks how to implement total ordering of pointers. That other question essentially boils down to how could a library implement std::less such that it works even for pointers to differently allocated objects, which the standard says can only be compared for equality, not greater and less than.
In contrast, my question was about whether one past the end of an array is always guaranteed to be greater than the array. Whether the answer to my question is yes or no doesn't actually change how you would implement std::less, so the other question doesn't seem relevant. If it's illegal to compare to one past the end of an array, then std::less could simply exhibit undefined behavior in this case. (Also, typically the standard library is implemented by the same people as the compiler, and so is free to take advantage of properties of the particular compiler.)
Yes. From section 6.5.8, paragraph 5, of the C standard:
If the expression P points to an element of an array object
and the expression Q points to the last element of the same array
object, the pointer expression Q+1 compares greater than P.
Expression array is P. The expression array + SIZE - 1 points to the last element of array, which is Q.
Thus:
array + SIZE = array + SIZE - 1 + 1 = Q + 1 > P = array
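The chain above can be written out as code. This is only a sketch: the value 16 for SIZE and both function names are assumptions made for the example, and any positive SIZE should behave the same way.

```cpp
#include <cstddef>

constexpr std::size_t SIZE = 16;  // assumed value, must be greater than zero here

// Automatic array: Q + 1 > P with P = array and Q = array + SIZE - 1.
bool end_compares_greater_automatic() {
    int array[SIZE] = {};
    int* p = array;             // points to the first element
    int* q = array + SIZE - 1;  // points to the last element
    return q + 1 > p;
}

// Dynamically allocated array: the same comparison on the result of new[].
bool end_compares_greater_dynamic() {
    int* array = new int[SIZE];
    bool result = array < array + SIZE;
    delete[] array;
    return result;
}
```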
C requires this. Section 6.5.8 para 5 says:
pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values
I'm sure there's something analogous in the C++ specification.
This requirement effectively prevents allocating objects that wrap around the address space on common hardware, because it would be impractical to implement all the bookkeeping necessary to implement the relational operator efficiently.
The guarantee does not hold for the case int *array = new(int[SIZE]); when SIZE is zero.
The result of new int[0] is required to be a valid pointer that can have 0 added to it, but array == array + SIZE in this case, and a strictly less-than test will yield false.
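A short sketch of the zero-length case, with a hypothetical function name:

```cpp
// Returns true when, for a zero-length allocation, begin and end compare
// equal and the strict less-than test is false.
bool empty_allocation_has_equal_ends() {
    int* array = new int[0];    // valid pointer to an array of zero elements
    int* end = array + 0;       // adding 0 is allowed
    bool equal = (array == end);
    bool less = (array < end);  // false, because the pointers are equal
    delete[] array;
    return equal && !less;
}
```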
This is defined in C++, in 7.6.6 [expr.add], paragraph 4 (p139 of the current C++23 draft):
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
(4.1) — If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
(4.2) — Otherwise, if P points to an array element i of an array object x with n elements (9.3.4.5) the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 <= i + j <= n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 <= i − j <= n.
(4.3) — Otherwise, the behavior is undefined.
Note that 4.2 explicitly has "<= n", not "< n". The result is undefined for any index larger than the array size n, but it is defined for n itself.
The ordering of array elements is defined in 7.6.9 (p141):
(4.1) If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
Which means the hypothetical element n will compare greater than the array itself (element 0) for all well defined cases of n > 0.
The relevant rule in C++ is [expr.rel]/4.1:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
The above rule appears to only cover pointers to array elements, and array + SIZE doesn't point to an array element. However, as mentioned in the footnote, a one-past-the-end pointer is treated as if it were an array element here. The relevant language rule is in [basic.compound]/3:
For purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n of x and an object of type T that is not an array element is considered to belong to an array with one element of type T.
So C++ guarantees that array + SIZE > array (at least when SIZE > 0), and that &x + 1 > &x for any object x.
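The second half of that statement, for an object that is not an array element, can be sketched as follows. The function name is made up for the example.

```cpp
// Returns true when the pointer one past a non-array object compares
// greater than the pointer to the object itself.
bool one_past_object_is_greater() {
    int x = 0;
    int* p = &x;        // x is treated as an array with one element
    int* past = p + 1;  // hypothetical element 1, must not be dereferenced
    return past > p;
}
```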
An array is guaranteed to occupy contiguous storage. Since C++03 or so, a vector is guaranteed to as well, for &vec[0] ... &vec[vec.size() - 1]. This automatically means that what you're asking about is true.
This property is called contiguous storage. For vectors, it is described here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0944r0.html
The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size(). Presumably five more years of studying the interactions of contiguity with caching made it clear to WG21 that contiguity needed to be mandated and non-contiguous vector implementation should be clearly banned.
The latter is from the WG21 paper linked above. So my guess of C++03 was right.
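The identity for vectors can be checked directly. The function below is a small sketch and its name is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Checks the identity &v[n] == &v[0] + n for every valid index n.
// For an empty vector the loop body never runs, so v[0] is never touched.
bool is_contiguous(const std::vector<int>& v) {
    for (std::size_t n = 0; n < v.size(); ++n) {
        if (&v[n] != &v[0] + n) {
            return false;
        }
    }
    return true;
}
```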
Consider the following code:
int data[2][2];
int* p(&data[0][0]);
p[3] = 0;
Or equivalently:
int data[2][2];
int (&row0)[2] = data[0];
int* p = &row0[0];
p[3] = 0;
It's not clear to me whether this is undefined behaviour or not.
p is a pointer to the first element of an array row0 with 2 elements, therefore p[3] accesses past the end of the array, which is UB according to 7.6.6 [expr.add]:
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) element x[i−j] if 0 ≤ i − j ≤ n.
Otherwise, the behavior is undefined.
I don't see anything in the standard that gives special treatment to multidimensional arrays, so I can only conclude that the above is, in fact, UB.
Am I correct?
What about the case of data being declared as std::array<std::array<int, 2>, 2>? This case seems even more likely to be UB, as structs may have padding.
Yes, you are correct, and there is not much to add to it. There are no multidimensional arrays in the C++ type system, there are only arrays (of arrays of arrays of arrays ad libitum).
Accessing an element beyond the bounds of an array is undefined behavior.
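If a flat index is what you want, one defined way to get it is to split the index into a row and a column instead of walking a pointer across rows. The sketch below does this for the std::array case; Grid and flat_at are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Grid = std::array<std::array<int, 2>, 2>;

// Maps a flat index k in [0, 4) to data[k / 2][k % 2], so every access
// stays inside the inner array it belongs to.
int& flat_at(Grid& data, std::size_t k) {
    assert(k < 4 && "flat index out of range");
    return data[k / 2][k % 2];
}
```

With this helper, flat_at(data, 3) refers to data[1][1], the element that p[3] was trying to reach.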
As far as I know, a multidimensional array on the stack occupies contiguous memory in row-major order. Is it undefined behavior, according to the ISO C++ Standard, to index a multidimensional array using a pointer to its elements? For example:
#include <iostream>
#include <type_traits>
int main() {
    int a[5][4]{{1,2,3,4},{},{5,6,7,8}};
    constexpr auto sz = sizeof(a) / sizeof(std::remove_all_extents<decltype(a)>::type);
    int *p = &a[0][0];
    int i = p[11]; // <-- here
    p[19] = 20; // <-- here
    for (int k = 0; k < sz; ++k)
        std::cout << p[k] << ' '; // <-- and here
    return 0;
}
The above code compiles and runs correctly as long as the pointer does not go outside the bounds of array a. But does this work because of compiler-defined behavior or because the language standard guarantees it? Any reference from the ISO C++ Standard would be best.
The problem here is the strict aliasing rule, which appears in my draft n3337 for C++11 in 3.10 Lvalues and rvalues [basic.lval] § 10. This is an exhaustive list that does not explicitly allow aliasing a multidimensional array as a unidimensional one of the whole size.
It is indeed required that arrays are allocated consecutively in memory, which proves that the size of a multidimensional array, say T arr[n][m], is the product of its dimensions and the size of an element: n * m * sizeof(T). When converted to char pointers, you can even do pointer arithmetic over the whole array, because any pointer to an object can be converted to a char pointer, and that char pointer can be used to access the consecutive bytes of the object (*).
But unfortunately, for any other type, the standard only allows pointer arithmetic inside one array (and by definition, subscripting an array is the same as dereferencing a pointer after pointer arithmetic: a[i] is *(a + i)). So if you respect both the rule on pointer arithmetic and the strict aliasing rule, the global indexing of a multidimensional array is not defined by the C++11 standard, unless you go through char pointer arithmetic:
int a[3][4];
int *p = &a[0][0]; // perfectly defined
int b = p[3]; // ok you are in same row which means in same array
b = p[5]; // oops: you dereference past the end of the array that forms the first row
char *cq = (((char *) p) + 5 * sizeof(int)); // ok: char pointer arithmetics inside an object
int *q = (int *) cq; // ok because what lies there is an int object
b = *q; // almost the same as p[5] but behaviour is defined
That char pointer arithmetic, along with the fear of breaking a lot of existing code, explains why all well-known compilers silently accept the aliasing of a multidimensional array with a 1D one of the same overall size (it leads to the same internal code), but technically, the global pointer arithmetic is only valid for char pointers.
(*) The standard declares in 1.7 The C++ memory model [intro.memory] that
The fundamental storage unit in the C++ memory model is the byte... The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every
byte has a unique address.
and later in 3.9 Types [basic.types] §2
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object
holds a valid value of type T, the underlying bytes making up the object can be copied into an array
of char or unsigned char.
and to copy them you must access them through a char * or unsigned char *
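The byte-copy guarantee quoted above gives another way to obtain a genuinely flat view: copy the bytes of the whole array into a one-dimensional array and index that. This is a sketch that swaps std::memcpy in for the char pointer arithmetic of the earlier snippet; the function name is made up.

```cpp
#include <cstring>

// Copies a 3x4 array into a flat 12-element array byte by byte and
// returns the element at flat position 5.
int read_flat_5(const int (&a)[3][4]) {
    static_assert(sizeof(int[3][4]) == sizeof(int[12]), "arrays contain no padding");
    int flat[12];
    std::memcpy(flat, a, sizeof(flat));  // copies the underlying bytes of a
    return flat[5];                      // holds the same value as a[1][1]
}
```

The cost is a copy, so this suits small arrays or one-off conversions rather than hot loops.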
I believe the behavior in your example is technically undefined.
The standard has no concept of a multidimensional array. What you've actually declared is an "array of 5 arrays of 4 ints". That is a[0] and a[1] are actually two different arrays of 4 ints, both of which are contained in the array a. What this means is that a[0][0] and a[1][0] are not elements of the same array.
[expr.add]/4 says the following (emphasis mine)
When an expression that has integral type is added to or subtracted from a pointer, the result has the type
of the pointer operand. If the pointer operand points to an element of an array object, and the array is
large enough, the result points to an element offset from the original element such that the difference of
the subscripts of the resulting and original array elements equals the integral expression. In other words, if
the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P))
and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array
object, provided they exist. Moreover, if the expression P points to the last element of an array object,
the expression (P)+1 points one past the last element of the array object, and if the expression Q points
one past the last element of an array object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined
So, since p[11] expands to *(p + 11) and since p and p + 11 are not elements of the same array (one is an element of a[0] and the other is more than one element past the end of a[0]), the behavior of that addition is undefined.
I would, however, be very surprised to find any implementation where such an addition resulted in anything other than the one you expect.
If you declare
int arr[3][4][5];
the type of arr is int[3][4][5], the type of arr[0] is int[4][5], and so on. It is an array of arrays of arrays, NOT an array of pointers. What happens if we increment the first index? The pointer moves forward by the size of one element of arr, but an element of arr is a two-dimensional array! Measured in ints, that step is sizeof(int[4][5])/sizeof(int), or 20 ints.
Iterating this way, we find that arr[a][b][c] equals *(*(*(arr + a) + b) + c). Since arrays never contain padding (which also keeps them layout-compatible with C99), this addresses the same location as:
*((int*)arr + 20*a + 5*b + c)
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integral expression
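The index arithmetic 20*a + 5*b + c can be checked without casting anything. The sketch below fills the array in declaration order with 0, 1, 2, ... and compares each stored value with the computed offset; flat_offset and offsets_match are hypothetical names.

```cpp
#include <cstddef>

// Flat, row-major position of arr[a][b][c] inside int arr[3][4][5].
constexpr std::size_t flat_offset(std::size_t a, std::size_t b, std::size_t c) {
    return 20 * a + 5 * b + c;
}

static_assert(sizeof(int[4][5]) == 20 * sizeof(int), "one arr[a] spans 20 ints");
static_assert(sizeof(int[5]) == 5 * sizeof(int), "one arr[a][b] spans 5 ints");

// Fills arr in declaration order with a running counter, then checks that
// the value stored at arr[a][b][c] equals the computed flat offset.
bool offsets_match() {
    int arr[3][4][5];
    int counter = 0;
    for (auto& plane : arr)
        for (auto& row : plane)
            for (int& cell : row)
                cell = counter++;
    for (std::size_t a = 0; a < 3; ++a)
        for (std::size_t b = 0; b < 4; ++b)
            for (std::size_t c = 0; c < 5; ++c)
                if (arr[a][b][c] != static_cast<int>(flat_offset(a, b, c)))
                    return false;
    return true;
}
```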
I very occasionally make use of multidimensional arrays, and got curious about what the standard says (C11 and/or C++11) about the behavior of indexing with fewer "dimensions" than the number declared for the array.
Given:
int a[2][2][2] = {{{1, 2}, {3, 4}}, {{5, 6}, {7, 8}}};
Does the standard say what type a[1] or a[0][1] has, whether such an expression is legal, and whether it indexes sub-arrays as expected?
auto& b = a[1];
std::cout << b[1][1];
a[1] is just of type int[2][2]. Likewise, a[0][1] is just int[2]. And yes, indexing as sub-arrays works the way you think it does.
Does the standard define the type for a[i] where a is T [M][N]?
Of course. The standard basically defines the types of all expressions, and if it does not, it would be a defect report. But I guess you are more interested on what that type might be...
While the standard may not explicitly mention your case, the rules are stated and are simple: given an array a of N elements of type T, the expression a[0] is an lvalue expression of type T. In the variable declaration int a[2][2], the type of a is "array of 2 elements of type array of 2 elements of type int", which by the rule above means that a[0] is an lvalue referring to an array of 2 ints, or, as you would have to type it in a program, int (&)[2]. Adding extra dimensions does not affect the mechanism.
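These types can be confirmed at compile time. The sketch below uses the array from the question; the function name is an assumption.

```cpp
#include <type_traits>

int a[2][2][2] = {{{1, 2}, {3, 4}}, {{5, 6}, {7, 8}}};

// decltype applied to an lvalue expression of type T yields T&.
static_assert(std::is_same<decltype(a[1]), int (&)[2][2]>::value,
              "a[1] is an lvalue of type int[2][2]");
static_assert(std::is_same<decltype(a[0][1]), int (&)[2]>::value,
              "a[0][1] is an lvalue of type int[2]");
static_assert(std::is_same<decltype(a[0][1][0]), int&>::value,
              "a[0][1][0] is an lvalue of type int");

// Binds a reference to the sub-array a[1] and indexes it like any 2D array.
int last_of_second_block() {
    auto& b = a[1];
    return b[1][1];
}
```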
I think this example in C11 explained it implicitly.
C11 6.5.2.1 Array subscripting
EXAMPLE Consider the array object defined by the declaration int x[3][5]; Here x is a 3 × 5 array of ints; more precisely, x is an array of three element objects, each of which is an array of five ints. In the expression x[i], which is equivalent to (*((x) + (i))), x is first converted to a pointer to the initial array of five ints. Then i is adjusted according to the type of x, which conceptually entails multiplying i by the size of the object to which the pointer points, namely an array of five int objects. The results are added and indirection is applied to yield an array of five ints. When used in the expression x[i][j], that array is in turn converted to a pointer to the first of the ints, so x[i][j] yields an int.
A similar example is in C++11 8.3.4 Arrays:
Example: consider
int x[3][5];
Here x is a 3 × 5 array of integers. When x appears in an expression, it is converted to a pointer to (the first of three) five-membered arrays of integers. In the expression x[i] which is equivalent to *(x + i), x is first converted to a pointer as described; then x + i is converted to the type of x, which involves multiplying i by the length of the object to which the pointer points, namely five integer objects. The results are added and indirection applied to yield an array (of five integers), which in turn is converted to a pointer to the first of the integers. If there is another subscript the same argument applies again; this time the result is an integer. —end example ] —end note ]
The key point to remember is that, in both C and C++, a multidimensional array is simply an array of arrays (so a 3-dimensional array is an array of arrays of arrays). All the syntax and semantics of multidimensional arrays follow from that (and from the other rules of the language, of course).
So given an object definition:
int m[2][2][2];
m is an object of type int[2][2][2] (an array of two elements, each of which is an array of two elements, each of which is an array of two ints).
When you write m[1][1][1], you're already evaluating m, m[1] and m[1][1].
The expression m is an lvalue referring to an array object of type int[2][2][2].
In m[1], the array expression m is implicitly converted to ("decays" to) a pointer to the array's first element. This pointer is of type int(*)[2][2], a pointer to a two-element array of two-element arrays of int. m[1] is by definition equivalent to *(m+1); the +1 advances m by one element and dereferences the resulting pointer value. So m[1] refers to an object of type int[2][2] (an array of two arrays, each of which consists of two int elements).
(The array indexing operator [] is defined to operate on a pointer, not an array. In a common case like arr[42], the pointer happens to be the result of an implicit array-to-pointer conversion.)
We repeat the process for m[1][1]: the array m[1] decays to a pointer to an array of two ints (of type int(*)[2]), so m[1][1] refers to an object of type int[2].
Finally, m[1][1][1] takes the result of evaluating m[1][1] and repeats the process yet again, giving us an lvalue referring to an object of type int. And that's how multidimensional arrays work.
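The decay steps can be made visible with decltype. Adding 0 to an array expression is used here only as a trick to force the array-to-pointer conversion; the function name is hypothetical.

```cpp
#include <type_traits>

int m[2][2][2] = {};

// Each level of the array decays to a pointer to its own element type.
static_assert(std::is_same<decltype(m + 0), int (*)[2][2]>::value,
              "m decays to int(*)[2][2]");
static_assert(std::is_same<decltype(m[1] + 0), int (*)[2]>::value,
              "m[1] decays to int(*)[2]");
static_assert(std::is_same<decltype(m[1][1] + 0), int*>::value,
              "m[1][1] decays to int*");

// m[1][1][1] and its expansion into pointer arithmetic name the same object.
bool subscript_matches_pointer_form() {
    return &m[1][1][1] == &*(*(*(m + 1) + 1) + 1);
}
```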
Just to add to the frivolity, an expression like foo[index1][index2][index3] can work directly with pointers as well as with arrays. That means you can construct something that works (almost) like a true multidimensional array using pointers and allocations of arbitrary size. This gives you the possibility of having "ragged" arrays with different numbers of elements in each row, or even rows and elements that are missing. But then it's up to you to manage the allocation and deallocation for each row, or even for each element.
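A minimal sketch of a ragged structure built from pointers. To keep it short it uses statically allocated rows rather than dynamic allocation, and all names are made up for the example.

```cpp
// Two rows of different lengths.
int row0[] = {1, 2, 3};
int row1[] = {4};
int* rows[] = {row0, row1};

// foo[i][j] is written exactly as for an array of arrays, but here each row
// is a separate object, and nothing checks that j is within that row.
int ragged_at(int** foo, int i, int j) {
    return foo[i][j];
}
```

Calling ragged_at(rows, 0, 2) reads the third element of the first row, while ragged_at(rows, 1, 0) reads the only element of the second.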
Recommended reading: Section 6 of the comp.lang.c FAQ.
A side note: There are languages where multidimensional arrays are not arrays of arrays. In Ada, for example (which uses parentheses rather than square brackets for array indexing), you can have an array of arrays, indexed like arr(i)(j), or you can have a two-dimensional array, indexed like arr(i, j). C is different; C doesn't have direct built-in support for multidimensional arrays, but it gives you the tools to build them yourself.