As far as I know, a multidimensional array on the stack occupies contiguous memory in row-major order. Is it undefined behavior, according to the ISO C++ Standard, to index a multidimensional array through a pointer to its elements? For example:
#include <iostream>
#include <type_traits>
int main() {
int a[5][4]{{1,2,3,4},{},{5,6,7,8}};
constexpr auto sz = sizeof(a) / sizeof(std::remove_all_extents<decltype(a)>::type);
int *p = &a[0][0];
int i = p[11]; // <-- here
p[19] = 20; // <-- here
for (int k = 0; k < sz; ++k)
std::cout << p[k] << ' '; // <-- and here
return 0;
}
The above code compiles and runs correctly as long as the pointer does not go beyond the bounds of the array a. But does this happen because of compiler-defined behavior or because of the language standard? Any reference to the ISO C++ Standard would be best.
The problem here is the strict aliasing rule, which in my draft n3337 for C++11 is in 3.10 Lvalues and rvalues [basic.lval] § 10. It is an exhaustive list that does not explicitly allow aliasing a multidimensional array as a unidimensional one of the same total size.
It is indeed required that arrays are allocated contiguously in memory, which means that the size of a multidimensional array, say T arr[n][m], is the product of its dimensions and the size of an element: n * m * sizeof(T). When converted to char pointers, you can even do pointer arithmetic on the whole array, because any pointer to an object can be converted to a char pointer, and that char pointer can be used to access the consecutive bytes of the object (*).
But unfortunately, for any other type, the standard only allows pointer arithmetic inside one array (and by definition, subscripting an array is the same as dereferencing a pointer after pointer arithmetic: a[i] is *(a + i)). So if you respect both the rule on pointer arithmetic and the strict aliasing rule, global indexing of a multidimensional array is not defined by the C++11 standard, unless you go through char pointer arithmetic:
int a[3][4];
int *p = &a[0][0]; // perfectly defined
int b = p[3]; // ok you are in same row which means in same array
b = p[5]; // oops: you dereference past the end of the array that forms the first row
char *cq = (((char *) p) + 5 * sizeof(int)); // ok: char pointer arithmetics inside an object
int *q = (int *) cq; // ok because what lies there is an int object
b = *q; // almost the same as p[5] but behaviour is defined
That char pointer arithmetic, along with the fear of breaking a lot of existing code, explains why all well-known compilers silently accept aliasing a multidimensional array as a 1D one of the same total size (it leads to the same generated code). Technically, though, pointer arithmetic across the whole array is only valid for char pointers.
(*) The standard declares in 1.7 The C++ memory model [intro.memory] that
The fundamental storage unit in the C++ memory model is the byte... The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every
byte has a unique address.
and later in 3.9 Types [basic.types] §2
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object
holds a valid value of type T, the underlying bytes making up the object can be copied into an array
of char or unsigned char.
and to copy them you must access them through a char * or unsigned char *
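The byte-copy guarantee quoted above suggests one way to get a flat view without any cross-row pointer arithmetic: copy the object representation into a genuine 1D array. This is only a minimal sketch; the function name flatten and the 3x4 / 12 dimensions are assumptions picked to match the int a[3][4] example, not anything the standard prescribes.

```cpp
#include <cassert>
#include <cstring>

// The two types occupy the same number of bytes, so a byte-wise copy fits exactly.
static_assert(sizeof(int[3][4]) == sizeof(int[12]),
              "a 3x4 array and a flat array of 12 ints have the same size");

// Copies the bytes of a 3x4 array into a flat array of 12 ints.
// int is trivially copyable, so copying its underlying bytes is allowed.
void flatten(const int (&src)[3][4], int (&dst)[12]) {
    std::memcpy(dst, src, sizeof(dst));
}
```

After flatten(a, flat), indexing flat[5] stays inside a single array of 12 ints, so the rule on pointer arithmetic is not in question. The price is a copy rather than a view.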
I believe the behavior in your example is technically undefined.
The standard has no concept of a multidimensional array. What you've actually declared is an "array of 5 arrays of 4 ints". That is a[0] and a[1] are actually two different arrays of 4 ints, both of which are contained in the array a. What this means is that a[0][0] and a[1][0] are not elements of the same array.
[expr.add]/4 says the following (emphasis mine)
When an expression that has integral type is added to or subtracted from a pointer, the result has the type
of the pointer operand. If the pointer operand points to an element of an array object, and the array is
large enough, the result points to an element offset from the original element such that the difference of
the subscripts of the resulting and original array elements equals the integral expression. In other words, if
the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P))
and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array
object, provided they exist. Moreover, if the expression P points to the last element of an array object,
the expression (P)+1 points one past the last element of the array object, and if the expression Q points
one past the last element of an array object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined
So, since p[11] expands to *(p + 11) and since p and p + 11 do not point to elements of the same array (one points to an element of a[0] and the other points more than one element past the end of a[0]), the behavior of that addition is undefined.
I would, however, be very surprised to find any implementation where such an addition resulted in anything other than the one you expect.
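If you want flat indexing that stays within the wording quoted above, one option is to split the flat index into a row and a column so that every subscript stays inside its own array. A minimal sketch, assuming the int a[5][4] from the question; the names kRows, kCols, flat_at and flat_sum are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kRows = 5;  // hypothetical names, matching int a[5][4]
constexpr std::size_t kCols = 4;

// Returns the element at row-major position k. Each subscript stays inside
// the array it is applied to, so no pointer ever leaves its row.
int& flat_at(int (&a)[kRows][kCols], std::size_t k) {
    assert(k < kRows * kCols);
    return a[k / kCols][k % kCols];
}

// Sums all elements in row-major order using only in-bounds subscripts.
int flat_sum(int (&a)[kRows][kCols]) {
    int total = 0;
    for (std::size_t k = 0; k < kRows * kCols; ++k)
        total += flat_at(a, k);
    return total;
}
```

With this, flat_at(a, 11) takes the place of p[11] and flat_at(a, 19) = 20 takes the place of p[19] = 20. Compilers typically optimize the division and modulo well when kCols is a compile-time constant, though that is an expectation rather than a guarantee.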
if you declare
int arr[3][4][5];
the type of arr is int[3][4][5], the type of arr[0] is int[4][5], etc. It is an array of arrays of arrays, NOT an array of pointers. Let's see what happens if we increment the first index. It shifts the pointer forward by the size of one array element, but an element of arr is a two-dimensional array! Counted in ints, that is equivalent to incrementing by sizeof(int[4][5])/sizeof(int): (int*)arr + 20.
Iterating this way, we find that arr[a][b][c] equals *(*(*(arr + a) + b) + c), which, given that arrays never contain padding (to comply with the mandatory compatibility of POD types with C99), addresses the same location as:
*((int*)arr + 20*a + 5*b + c)
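The strides 20 and 5 in that formula can be checked at compile time, because sizeof of an array is required to be the element count times the element size. A small sketch, assuming the int arr[3][4][5] above; kA, kB, kC and flat_index are hypothetical names, and the function only computes the position, it does not dereference anything.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants matching int arr[3][4][5] from the text.
constexpr std::size_t kA = 3, kB = 4, kC = 5;

static_assert(sizeof(int[kC]) == kC * sizeof(int),
              "arr[a][b] spans 5 ints");
static_assert(sizeof(int[kB][kC]) == kB * kC * sizeof(int),
              "arr[a] spans 20 ints");
static_assert(sizeof(int[kA][kB][kC]) == kA * kB * kC * sizeof(int),
              "arr spans 60 ints, so there is no padding anywhere");

// Row-major position of arr[a][b][c], counted in ints from arr[0][0][0].
std::size_t flat_index(std::size_t a, std::size_t b, std::size_t c) {
    assert(a < kA && b < kB && c < kC);
    return (kB * kC) * a + kC * b + c;  // 20*a + 5*b + c
}
```

So flat_index(1, 0, 0) is 20 and the last element, arr[2][3][4], sits at position 59.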
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integral expression
Related
Suppose you have an array:
int array[SIZE];
or
int *array = new(int[SIZE]);
Does C or C++ guarantee that array < array + SIZE, and if so where?
I understand that regardless of the language spec, many operating systems guarantee this property by reserving the top of the virtual address space for the kernel. My question is whether this is also guaranteed by the language, rather than just by the vast majority of implementations.
As an example, suppose an OS kernel lives in low memory and sometimes gives the highest page of virtual memory out to user processes in response to mmap requests for anonymous memory. If malloc or ::operator new[] directly calls mmap for the allocation of a huge array, and the end of the array abuts the top of the virtual address space such that array + SIZE wraps around to zero, does this amount to a non-compliant implementation of the language?
Clarification
Note that the question is not asking about array+(SIZE-1), which is the address of the last element of the array. That one is guaranteed to be greater than array. The question is about a pointer one past the end of an array, or also p+1 when p is a pointer to a non-array object (which the section of the standard pointed to by the selected answer makes clear is treated the same way).
Stackoverflow has asked me to clarify why this question is not the same as this one. The other question asks how to implement total ordering of pointers. That other question essentially boils down to how could a library implement std::less such that it works even for pointers to differently allocated objects, which the standard says can only be compared for equality, not greater and less than.
In contrast, my question was about whether one past the end of an array is always guaranteed to be greater than the array. Whether the answer to my question is yes or no doesn't actually change how you would implement std::less, so the other question doesn't seem relevant. If it's illegal to compare to one past the end of an array, then std::less could simply exhibit undefined behavior in this case. (Also, typically the standard library is implemented by the same people as the compiler, and so is free to take advantage of properties of the particular compiler.)
Yes. From section 6.5.8, paragraph 5, of the C standard:
If the expression P points to an element of an array object
and the expression Q points to the last element of the same array
object, the pointer expression Q+1 compares greater than P.
Expression array is P. The expression array + SIZE - 1 points to the last element of array, which is Q.
Thus:
array + SIZE = array + SIZE - 1 + 1 = Q + 1 > P = array
C requires this. Section 6.5.8 para 5 says:
pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values
I'm sure there's something analogous in the C++ specification.
This requirement effectively prevents allocating objects that wrap around the address space on common hardware, because it would be impractical to implement all the bookkeeping necessary to implement the relational operator efficiently.
The guarantee does not hold for the case int *array = new(int[SIZE]); when SIZE is zero.
The result of new int[0] is required to be a valid pointer that can have 0 added to it, but array == array + SIZE in this case, and a strict less-than test will yield false.
This is defined in C++, from 7.6.6.4 (p139 of current C++23 draft):
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
(4.1) — If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
(4.2) — Otherwise, if P points to an array element i of an array object x with n elements (9.3.4.5) the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i + j of x if 0 <= i + j <= n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 <= i − j <= n.
(4.3) — Otherwise, the behavior is undefined.
Note that 4.2 explicitly has "<= n", not "< n". It's undefined for any value larger than size(), but is defined for size().
The ordering of array elements is defined in 7.6.9 (p141):
(4.1) If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
Which means the hypothetical element n will compare greater than the array itself (element 0) for all well defined cases of n > 0.
The relevant rule in C++ is [expr.rel]/4.1:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
The above rule appears to only cover pointers to array elements, and array + SIZE doesn't point to an array element. However, as mentioned in the footnote, a one-past-the-end pointer is treated as if it were an array element here. The relevant language rule is in [basic.compound]/3:
For purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n of x and an object of type T that is not an array element is considered to belong to an array with one element of type T.
So C++ guarantees that array + SIZE > array (at least when SIZE > 0), and that &x + 1 > &x for any object x.
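As a small illustration of the two guarantees, here is a sketch with two hypothetical helper templates (end_is_greater and next_is_greater are names I made up). Both only form and compare the one-past-the-end pointer and never dereference it.

```cpp
#include <cassert>
#include <cstddef>

// Compares the one-past-the-end pointer of a built-in array with the
// pointer to its first element. N is always > 0 for a built-in array.
template <typename T, std::size_t N>
bool end_is_greater(T (&array)[N]) {
    T* first = array;
    T* past_end = array + N;  // valid to form and compare, not to dereference
    return past_end > first;
}

// A non-array object is treated as an array of one element, so &x + 1 is
// its one-past-the-end pointer.
template <typename T>
bool next_is_greater(T& x) {
    return &x + 1 > &x;
}
```

Per the rules quoted above, both functions should return true on a conforming implementation.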
An array is guaranteed to occupy contiguous memory. Since C++03 or so, a vector is guaranteed to as well, for &vec[0] ... &vec[vec.size() - 1]. This automatically means that what you're asking about is true.
It's called contiguous storage. It can be found here for vectors:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0944r0.html
The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size(). Presumably five more years of studying the interactions of contiguity with caching made it clear to WG21 that contiguity needed to be mandated and non-contiguous vector implementation should be clearly banned.
The latter is from the standards committee documents. So C++03: I guessed right.
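The identity from that quote can be checked directly. A minimal sketch; is_contiguous is a hypothetical helper, and it is written for std::vector<int> only to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Checks the identity &v[n] == &v[0] + n for every valid n.
// For an empty vector the loop body never runs, so v[0] is never touched.
bool is_contiguous(const std::vector<int>& v) {
    for (std::size_t n = 0; n < v.size(); ++n) {
        if (&v[n] != &v[0] + n)
            return false;
    }
    return true;
}
```

On a conforming implementation this returns true for any std::vector<int>.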
I am wondering if the C++ standard guarantees that multidimensional arrays (not dynamically allocated) are flattened into a 1D array of exactly the same space. For example, if I have
char x[100];
char y[10][10];
Would these both be equivalent? I'm aware that most compilers would flatten y, but is this actually guaranteed to happen? Reading section 11.3.4 Arrays of the C++ Standard, I cannot actually find anywhere that guarantees this.
The C++ standard guarantees that y[i] follows immediately after y[i-1]. Since y[i-1] is 10 characters long, then, logically speaking, y[i] should take place 10 characters later in memory; however, could a compiler pad y[i-1] with extra characters to keep y[i] aligned?
What you are looking for is found in [dcl.array]/6
An object of type “array of N U” contains a contiguously allocated non-empty set of N subobjects of type U, known as the elements of the array, and numbered 0 to N-1.
What this states is that if you have an array like int arr[10], then you have 10 ints that are contiguous in memory. This definition works recursively, though, so if you have
int arr[5][10]
then what you have is an array of 5 int[10] arrays. If we apply the definition from above, then we know that the 5 int[10] arrays are contiguous and the int[10]s themselves are contiguous, so all 50 ints are contiguous. So yes, a 2D array looks just like a 1D array in memory, since really that is what it is.
This does not mean you can get a pointer to arr[0][0] and iterate to arr[4][9] with it. Per [expr.add]/4
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.
Otherwise, the behavior is undefined.
What this states is that if you have a pointer to an element of an array, then the only results you may form by adding to or subtracting from it are positions [0, array_size] of that same array. So if you did
int * it = &arr[0][0]
then what it points to is the first element of the first array, which means you can legally only increment it up to it + 10, since that is the past-the-end element of the first array. Going into the second array is UB even though the arrays are contiguous.
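Both halves of this answer can be put into code: the size guarantee can be checked at compile time, and iteration can be done one row at a time so that no pointer goes past its own row's end. A sketch only; sum_rows is a hypothetical helper written for the int arr[5][10] above.

```cpp
#include <cassert>
#include <cstddef>

// sizeof of an array is the element count times the element size,
// so rows cannot be padded.
static_assert(sizeof(char[10][10]) == sizeof(char[100]),
              "y[10][10] occupies exactly as much space as x[100]");
static_assert(sizeof(int[5][10]) == 50 * sizeof(int),
              "5 rows of 10 ints are 50 ints");

// Walks every element with a plain int*, but restarts the pointer at each
// row so it never moves beyond that row's one-past-the-end position.
int sum_rows(int (&arr)[5][10]) {
    int total = 0;
    for (std::size_t r = 0; r < 5; ++r) {
        int* it = &arr[r][0];
        int* end = it + 10;  // past-the-end of row r, never dereferenced
        for (; it != end; ++it)
            total += *it;
    }
    return total;
}
```

A nested range-based for over arr would do the same thing with less ceremony; the pointer version is shown only because it mirrors the it + 10 limit discussed above.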
As a beginner programmer, I am dealing with some simple problems related to pointers. In the following code I found that *a and a print the same hexadecimal value, but I can't understand the reason.
#include <stdio.h>
#include <stdlib.h>
int main(void){
int a[5][5];
a[0][0] = 1;
printf("*a=%p a=%p \n", (void *)*a, (void *)a); /* %p expects a void * argument */
return 0;
}
Here is the output:
*a=0x7ffddb8919f0 a=0x7ffddb8919f0
An array and its first element have the same address. :)
For this declaration
int a[5][5];
expression a used in the printf call is implicitly converted to the pointer to its first element. Expression *a yields the first element of the array that is in turn a one-dimensional array that also is converted to a pointer to its first element.
Thus expressions a and *a have the same value as expression &a[0][0]
In C and C++ languages values of array type T [N] are implicitly converted to values of pointer type T * in most contexts (with few exceptions). The resultant pointer points to the first element of the original array (index 0). This phenomenon is informally known as array type decay.
printf argument is one of those contexts when array type decay happens.
A 2D array of type int [5][5] is nothing else than an "1D array of 1D arrays", i.e. it is an array of 5 elements, with each element itself being an array of 5 ints.
The above array type decay rule naturally applies to this situation.
The expression a, which originally has array type int [5][5], decays to a pointer of type int (*)[5]. The pointer points to element a[0], which is the beginning of sub-array a[0] in memory. This is the first pointer you print.
The expression *a is a dereference operator applied to sub-expression a. Sub-expression a in this context behaves in exactly the same way as before: it decays to pointer of type int (*)[5] that points to a[0]. Thus the result of *a is a[0] itself. But a[0] is also an array. It is an array of int[5] type. It is also subject to array type decay. It decays to pointer of type int *, which points to the first element of a[0], i.e. to a[0][0]. This is the second pointer you print.
The reason both pointer values are the same numerically is that the beginning of sub-array a[0] corresponds to the same memory location as element a[0][0].
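The two decay steps and the equal addresses can be written down in C++ as a rough sketch. The alias Grid and the function same_address are names made up for this illustration; std::decay applies the same array-to-pointer conversion described above.

```cpp
#include <cassert>
#include <type_traits>

using Grid = int[5][5];  // hypothetical alias for the type of a

// The expression a decays from int[5][5] to int (*)[5].
static_assert(std::is_same<std::decay<Grid>::type, int (*)[5]>::value,
              "a 2D array decays to a pointer to its first row");
// *a has type int[5], which in turn decays to int*.
static_assert(std::is_same<std::decay<int[5]>::type, int*>::value,
              "a row decays to a pointer to its first int");

// Converts the differently typed pointers to void* so their addresses
// can be compared with each other.
bool same_address(Grid& a) {
    const void* outer = a;            // decayed a: points to a[0]
    const void* inner = *a;           // decayed *a: points to a[0][0]
    const void* element = &a[0][0];
    return outer == inner && inner == element;
}
```

The pointer types differ (int (*)[5] versus int *), which is why the comparison goes through void *; only the addresses are the same.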
a is an array of arrays of int. It is not a pointer to a pointer to an int, although in most expressions it decays to a pointer to its first row.
So a and *a both yield the same address (which is the address of a[0][0]).
*a decays to a pointer as well, and a[0] starts at the same address as a[0][0].
I know you can add an int to a pointer (pointer + 5), subtract two pointers, and subtract an int from a pointer, but can you also put the int on the left? So 5 + pointer.
You can, but restrictions apply. Pointer arithmetic is only valid within an array (or 1 past the end of an array).
Here's some of the rules:
5.7 Additive operators [expr.add]
5) [...] If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined.
and
6) When two pointers to elements of the same array object are subtracted, the result is the difference of the
subscripts of the two array elements. [...] Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is undefined.
pasted here for confirmation.
So
int* x = new int;
int* y = new int;
is okay, but:
y-x;
x + 4;
y - 1;
are undefined behavior. Relational comparisons (<, >, <=, >=) between x and y are not reliable either: their result is unspecified in C++ (and the behavior is undefined in C).
However x+1 and 1+x are okay (a single object counts as an array of size 1)
Adding an int to a pointer is syntactically okay but there are so many issues that you have to watch out for, e.g. overflow errors.
Ideally, you should do it only within an array.
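To make the 5 + pointer case concrete, here is a short sketch. The helper names commutes and third_element and the array size 8 are assumptions for illustration; all arithmetic stays inside one array or one past a single object.

```cpp
#include <cassert>

// The integer may appear on either side of +; both forms give the same pointer.
// The caller must make sure p + n stays within (or one past) p's array.
bool commutes(int* p, int n) {
    return n + p == p + n;
}

// Reads values[2] by writing the integer on the left of the addition.
int third_element(int (&values)[8]) {
    int* p = values;
    int* q = 2 + p;  // same as p + 2, still inside values
    return *q;
}
```

For a single object x, 1 + &x is fine as well, since x counts as an array of size 1; dereferencing that result is not.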