I am a beginner in C++ and I am having a hard time understanding why we use [ ] when dealing with pointers that point to arrays.
As far as I know new int[5] returns a pointer that points to an array of size 5.
So if we were to store this pointer in a variable we would do: int *arr = new int[5].
What I am not understanding is: if I want to access index 0 of that array, I would do arr[0].
Why is this syntax correct? Because in my mind arr is a pointer, so I would have to dereference the pointer in order to access the array.
Why don't we have to dereference the pointer?
In my mind I would do something like (*arr)[0], but that is incorrect.
The array subscript operator [] dereferences a pointer implicitly.
This is spelled out in section 8.2.1p1 of the C++17 standard:
The expression E1[E2] is identical (by definition) to *((E1)+(E2))
And section 6.5.2.1p2 of the C11 standard:
The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))
So given your example, arr[0] is exactly the same as *(arr + 0). The value between the brackets is added to the pointer value to point to the desired array element and the resulting pointer is dereferenced to get the object.
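To make the equivalence concrete, here is a minimal sketch. The helper name subscript_matches_pointer_arithmetic is made up for illustration, and the std::unique_ptr is only there so the array gets freed; neither comes from the question.

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper: reads one element through several equivalent
// spellings and reports whether they all agree.
bool subscript_matches_pointer_arithmetic() {
    std::unique_ptr<int[]> owner(new int[5]{});  // owns the array, frees it on return
    int* arr = owner.get();                      // pointer to the first element

    arr[2] = 42;                                 // subscript form

    bool same_value = (*(arr + 2) == 42);        // explicit arithmetic, then dereference
    bool same_address = (&arr[2] == arr + 2);    // both name the same int
    bool commuted = (2[arr] == 42);              // E1[E2] is *((E1)+(E2)), so the operands can swap
    return same_value && same_address && commuted;
}
```

Calling the helper should return true, since every spelling reaches the same int.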
Also, it's not quite correct to say your example points to an array, but rather it points to the first element of an array. A pointer to an array would look like this:
int arr[5];
int (*p)[5] = &arr;
One other thing that may be confusing is the fact that an array, in most contexts, decays to a pointer to its first element. This means you can do this:
int arr[5];
int *p = arr;
arr[1] = 5; // sets element 1 of arr
p[1] = 7; // also sets element 1 of arr
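The difference between the two kinds of pointer can be checked at compile time. This is only a sketch; pointer_kinds_demo is a hypothetical name, and the static_asserts merely restate the types discussed above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper contrasting a pointer to the first element with a
// pointer to the whole array.
int pointer_kinds_demo() {
    int arr[5] = {};
    int* first = arr;        // the array decays to a pointer to arr[0]
    int (*whole)[5] = &arr;  // pointer to the array itself

    static_assert(std::is_same<decltype(*first), int&>::value,
                  "dereferencing int* yields an int");
    static_assert(std::is_same<decltype(*whole), int (&)[5]>::value,
                  "dereferencing int (*)[5] yields the whole array");
    static_assert(sizeof(*whole) == 5 * sizeof(int),
                  "the pointed-to object is the full array");

    first[1] = 7;            // same as *(first + 1)
    (*whole)[2] = 9;         // dereference to get the array, then subscript
    return arr[1] + arr[2];  // both writes landed in arr
}
```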
I was wondering how *(&array + 1) actually works. I saw this as an easy way to calculate the array length and want to understand it properly before using it. I'm not very experienced with pointer arithmetic, but with my understanding &array gives the address of the first element of the array. (&array + 1) would go to end of the array in terms of address. But shouldn't *(&array + 1) give the value, which is at this address. Instead it prints out the address. I would really appreciate your help to get the pointer stuff clear in my head.
Here is the simple example I'm working on:
int numbers[] = {5,8,9,3,4,6,1};
int length = *(&numbers + 1) - numbers;
(This answer is for C++.)
&numbers is a pointer to the array itself. It has type int (*)[7].
&numbers + 1 is a pointer to the byte right after the array, where another array of 7 ints would be located. It still has type int (*)[7].
*(&numbers + 1) dereferences this pointer, yielding an lvalue of type int[7] referring to the byte right after the array.
*(&numbers + 1) - numbers: Using the - operator forces both operands to undergo the array-to-pointer conversion, so pointers can be subtracted. *(&numbers + 1) is converted to an int* pointing at the byte after the array. numbers is converted to an int* pointing at the first byte of the array. Their difference is the number of ints between the two pointers, which is the number of ints in the array.
Edit: Although there's no valid object pointed to by &numbers + 1, this is what's called a "past the end" pointer. If p is a pointer to T, pointing to a valid object of type T, then it's always valid to compute p + 1, even though p may point to a single object, or to the object at the end of an array. In that case, you get a "past the end" pointer, which does not point to a valid object, but is still a valid pointer. You can use this pointer for pointer arithmetic, and even dereference it to yield an lvalue, as long as you do not try to read or write through that lvalue. Note that you can only go one element past the end of an object; attempting to go any further leads to undefined behaviour.
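The types in each step can be verified without ever evaluating the dereference, because decltype does not evaluate its operand. The sketch below uses a hypothetical helper, past_the_end_addresses_agree, and only compares the two past-the-end addresses.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: checks the types used by the length idiom and
// compares the two past-the-end addresses. Nothing is read through them.
bool past_the_end_addresses_agree() {
    int numbers[] = {5, 8, 9, 3, 4, 6, 1};

    static_assert(std::is_same<decltype(&numbers), int (*)[7]>::value,
                  "&numbers points to the whole array");
    static_assert(std::is_same<decltype(&numbers + 1), int (*)[7]>::value,
                  "adding 1 keeps the pointer-to-array type");
    static_assert(std::is_same<decltype(*(&numbers + 1)), int (&)[7]>::value,
                  "the dereference is an lvalue of array type");

    const void* by_array = &numbers + 1;   // steps over one int[7]
    const void* by_element = numbers + 7;  // steps over seven ints
    return by_array == by_element;
}
```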
The expression &numbers gives you the address of the array, not the first member (although numerically they are the same). The type of this expression is int (*)[7], i.e. a pointer to an array of size 7.
The expression &numbers + 1 adds sizeof(int[7]) bytes to the address of array. The resulting pointer points right after the array.
The problem however is when you then dereference this pointer with *(&numbers + 1). Dereferencing a pointer that points one element past the end of an array invokes undefined behavior.
The proper way to get the number of elements of an array is sizeof(numbers)/sizeof(numbers[0]). This assumes that the array was defined in the current scope and is not a parameter to a function.
but with my understanding &array gives the address of the first element of the array.
This understanding is misleading. &array gives the address of the array. Sure, the value of that address is the same as that of the first element, but the type of the expression is different. The type of the expression &array is "pointer to array of N elements of type T" (where N is the length that you're looking for and T is int).
But shouldn't *(&array + 1) give the value, which is at this address.
Well yes... but it's here that the type of the expression becomes important. Indirecting a pointer to an array (rather than pointer to an element of the array) will result in the array itself.
In the subtraction expression, both array operands decay into pointer to first element. Since the subtraction uses decayed pointers, the unit of the pointer arithmetic is in terms of the element size.
I saw this as an easy way to calculate the array length
There are easier ways:
std::size(numbers)
And in C:
sizeof(numbers)/sizeof(numbers[0])
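Both approaches only work while the expression still has array type. Here is a sketch of each; array_length is a hypothetical stand-in written in the spirit of std::size (it is not the library's implementation), used so the example also builds before C++17.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, similar in spirit to std::size: the array is taken
// by reference, so N is deduced from its type and no decay happens.
template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) noexcept {
    return N;
}

std::size_t length_via_template() {
    int numbers[] = {5, 8, 9, 3, 4, 6, 1};
    return array_length(numbers);
}

std::size_t length_via_sizeof() {
    int numbers[] = {5, 8, 9, 3, 4, 6, 1};
    // Total bytes divided by bytes per element.
    return sizeof(numbers) / sizeof(numbers[0]);
}
```

One advantage of the template form is that passing a plain int* fails to compile, whereas the sizeof division silently yields a wrong number.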
e.g.
int arr[2][3] = ...
The type of arr[0] is
int (*)[3] // pointer to int[3], which is a pointer.
Or
int[3] // an array whose size is 3, which is an array.
Google tells me nothing about the question.
I know pointer and array are different types(derived types).
Maybe C and C++ treat it differently, I hope to see standard wording.
arr[0] is of type int [3] which is not a pointer.
int (*p)[3] is of type int(*)[3] meaning pointer to an array of 3 elements.
Pointer is not array and array is not pointer.
Now when you pass this 2d array to a function (or any case where decaying occurs) then it decays into pointer to the first element which is int (*)[3].
To be more clear, in C a 2D array is nothing but an array of arrays.
Dissecting
arr is an array, each element of which is again an array with 3 elements.
arr[0] in most of the cases (except sizeof etc) will decay into pointer to first element it contains which is an int*.
arr[0][0] is an int.
At last &arr[0] .. guess what? This is of type int(*)[3].
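In C++ these types can be confirmed with decltype. This is only a sketch, and subarray_types_check is a made-up name; note that decltype reports a reference for lvalue expressions, so arr[0] shows up as int (&)[3].

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: the static_asserts restate the types from the
// dissection above; the runtime part exercises the two decays.
bool subarray_types_check() {
    int arr[2][3] = {{1, 2, 3}, {4, 5, 6}};

    static_assert(std::is_same<decltype(arr), int[2][3]>::value,
                  "arr is an array of 2 arrays of 3 ints");
    static_assert(std::is_same<decltype(arr[0]), int (&)[3]>::value,
                  "arr[0] is an lvalue of type int[3], not a pointer");
    static_assert(std::is_same<decltype(&arr[0]), int (*)[3]>::value,
                  "&arr[0] is a pointer to int[3]");
    static_assert(std::is_same<decltype(arr[0][0]), int&>::value,
                  "arr[0][0] is an int");
    static_assert(sizeof(arr[0]) == 3 * sizeof(int),
                  "sizeof sees the array, not a decayed pointer");

    int (*row)[3] = arr;  // arr decays to a pointer to its first row
    int* elem = arr[0];   // arr[0] decays to a pointer to its first int
    return (*row)[1] == 2 && elem[2] == 3;
}
```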
As far as I know, multidimensional array on stack will occupy continuous memory in row order. Is it undefined behavior to index multidimensional array using a pointer to elements according to ISO C++ Standard? For example:
#include <iostream>
#include <type_traits>
int main() {
int a[5][4]{{1,2,3,4},{},{5,6,7,8}};
constexpr auto sz = sizeof(a) / sizeof(std::remove_all_extents<decltype(a)>::type);
int *p = &a[0][0];
int i = p[11]; // <-- here
p[19] = 20; // <-- here
for (int k = 0; k < sz; ++k)
std::cout << p[k] << ' '; // <-- and here
return 0;
}
The above code will compile and run correctly as long as the pointer does not go outside the bounds of array a. But does this work because of compiler-defined behavior or because of the language standard? Any reference from the ISO C++ Standard would be best.
The problem here is the strict aliasing rule that exists in my draft n3337 for C++11 in 3.10 Lvalues and rvalues [basic.lval] § 10. This is an exhaustive list that does not explicitly allow aliasing a multidimensional array as a unidimensional one of the whole size.
It is indeed required that arrays are allocated contiguously in memory, which implies that the size of a multidimensional array, say T arr[n][m], is the product of its dimensions and the size of an element: n * m * sizeof(T). When converted to char pointers, you can even do pointer arithmetic on the whole array, because any pointer to an object can be converted to a char pointer, and that char pointer can be used to access the consecutive bytes of the object (*).
But unfortunately, for any other type, the standard only allows pointer arithmetic inside one array (and by definition dereferencing an array element is the same as dereferencing a pointer after pointer arithmetic: a[i] is *(a + i)). So if you respect both the rule on pointer arithmetic and the strict aliasing rule, the global indexing of a multi-dimensional array is not defined by the C++11 standard, unless you go through char pointer arithmetic:
int a[3][4];
int *p = &a[0][0]; // perfectly defined
int b = p[3]; // ok you are in same row which means in same array
b = p[5]; // oops: you dereference past the end of the array that forms the first row
char *cq = (((char *) p) + 5 * sizeof(int)); // ok: char pointer arithmetics inside an object
int *q = (int *) cq; // ok because what lies there is an int object
b = *q; // almost the same as p[5] but behaviour is defined
That char pointer arithmetic, along with the fear of breaking a lot of existing code, explains why all well-known compilers silently accept aliasing a multi-dimensional array with a 1D one of the same total size (it leads to the same generated code), but technically, the global pointer arithmetic is only valid for char pointers.
(*) The standard declares in 1.7 The C++ memory model [intro.memory] that
The fundamental storage unit in the C++ memory model is the byte... The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every
byte has a unique address.
and later in 3.9 Types [basic.types] §2
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object
holds a valid value of type T, the underlying bytes making up the object can be copied into an array
of char or unsigned char.
and to copy them you must access them through a char * or unsigned char *
I believe the behavior in your example is technically undefined.
The standard has no concept of a multidimensional array. What you've actually declared is an "array of 5 arrays of 4 ints". That is a[0] and a[1] are actually two different arrays of 4 ints, both of which are contained in the array a. What this means is that a[0][0] and a[1][0] are not elements of the same array.
[expr.add]/4 says the following (emphasis mine)
When an expression that has integral type is added to or subtracted from a pointer, the result has the type
of the pointer operand. If the pointer operand points to an element of an array object, and the array is
large enough, the result points to an element offset from the original element such that the difference of
the subscripts of the resulting and original array elements equals the integral expression. In other words, if
the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P))
and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array
object, provided they exist. Moreover, if the expression P points to the last element of an array object,
the expression (P)+1 points one past the last element of the array object, and if the expression Q points
one past the last element of an array object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined
So, since p[11] expands to *(p + 11) and since p and p + 11 are not elements of the same array (one is an element of a[0] and the other is more than one element past the end of a[0]), the behavior of that addition is undefined.
I would, however, be very surprised to find any implementation where such an addition resulted in anything other than the one you expect.
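If you want flat indexing without leaning on that technically undefined addition, one option is to split the flat index into a row and a column, so every subscript stays inside its own sub-array. The following is a sketch only; at_flat and flat_index_demo are hypothetical names, not anything from the question or the standard library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: maps a flat index k onto a[row][col] so that no
// pointer arithmetic ever crosses a row boundary.
template <typename T, std::size_t Rows, std::size_t Cols>
T& at_flat(T (&a)[Rows][Cols], std::size_t k) {
    assert(k < Rows * Cols);  // stay inside the whole 2D array
    return a[k / Cols][k % Cols];
}

int flat_index_demo() {
    int a[5][4]{{1, 2, 3, 4}, {}, {5, 6, 7, 8}};
    at_flat(a, 19) = 20;                     // writes a[4][3]
    return at_flat(a, 11) + at_flat(a, 19);  // a[2][3] + a[4][3]
}
```

With the initializer from the question, index 11 lands on a[2][3] (the value 8) and index 19 on a[4][3], so the demo returns 28.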
if you declare
int arr[3][4][5];
the type of arr is int[3][4][5], the type of arr[0] is int[4][5], etc. It is an array of arrays of arrays, but NOT an array of pointers. What happens if we increment the first index? It shifts the pointer forward by the size of one array element, but an element of arr is a two-dimensional array! Measured in ints, that is a step of sizeof(int[4][5])/sizeof(int), i.e. (int*)arr + 20.
Iterating this way, we find that arr[a][b][c] equals *(*(*(arr + a) + b) + c). Provided that there is never any padding within arrays (to comply with mandatory compatibility of POD types with C99), this corresponds to:
*((int*)arr + 20*a + 5*b + c)
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integral expression
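The offset formula can be sanity-checked by comparing addresses as integers rather than by doing int* arithmetic across sub-arrays. This is a sketch under assumptions: flat_offset and offsets_match_layout are hypothetical names, std::uintptr_t is assumed to exist, and the pointer-to-integer mapping is assumed to be linear, which holds on mainstream platforms but is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: element offset of arr[a][b][c] within int[3][4][5].
constexpr std::size_t flat_offset(std::size_t a, std::size_t b, std::size_t c) {
    return 20 * a + 5 * b + c;
}

bool offsets_match_layout() {
    int arr[3][4][5] = {};
    const auto base = reinterpret_cast<std::uintptr_t>(&arr[0][0][0]);

    for (std::size_t a = 0; a < 3; ++a)
        for (std::size_t b = 0; b < 4; ++b)
            for (std::size_t c = 0; c < 5; ++c) {
                // Nested form: each step decays one array level.
                if (&arr[a][b][c] != *(*(arr + a) + b) + c) return false;
                // Byte distance from the first int matches the formula.
                const auto addr = reinterpret_cast<std::uintptr_t>(&arr[a][b][c]);
                if (addr - base != flat_offset(a, b, c) * sizeof(int)) return false;
            }
    return true;
}
```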
I came across this construction inside a function (e is a parameter passed to the function):
short (*tt)[][2] = (short (*)[][2])(heater_ttbl_map[e]);
and its use (where i is a counter inside a for loop):
(*tt)[i][0]
I think I got the first part of the assignment:
short (*tt)[][2]
From what I understand, tt is declared as a pointer to an array of arrays of shorts.
The second part is confusing me though. It looks like some sort of cast, but I'm not sure I understand what it does, especially this: (*). How does it work?
heater_ttbl_map is declared like this (where pointer1 and pointer2 are both bidimensional arrays of shorts):
static void *heater_ttbl_map[2] = {(void*)pointer1, (void*)pointer2};
As for its use, I understand that what tt points at is dereferenced (and the result is element 0 of row i of the array, which is a short), but why write it like this:
(*tt)[i][0]
and not like this:
*tt[i][0]
is it because tt is not an array itself but a pointer to an array?
Due to operator precedence ([] has higher precedence than the unary * operator), there is a difference between the two statements:
(*tt)[i][0]
Here you access the element at index [i][0] of the array that tt points to.
Whereas in this:
*tt[i][0]
the element at index [i][0] is accessed first (as if tt were, say, a 2-D array of pointers) and then dereferenced.
Using them interchangeably can access or dereference an invalid memory location and lead to undefined behaviour.
As ameyCU explained, the [] subscript operator has higher precedence than the unary * operator, so the expression *a[i] will be parsed as *(a[i]); IOW, you're indexing into a and dereferencing the result.
This works if a is an array of T (or a pointer to T; more on that below). However, if a is a pointer to an array of T, that won't do what you want. This is probably best explained visually.
Assume the declarations:
int arr[3] = { 0, 1, 2 };
int (*parr)[3] = &arr; // type of &arr is int (*)[3], not int **
Here's what things look like in memory (sort of; addresses are pulled out of thin air):

Address   Item    Memory cell
-------   ----    -----------
                  +---+
0x8000    arr:    | 0 | <--------+
                  +---+          |
0x8004            | 1 |          |
                  +---+          |
0x8008            | 2 |          |
                  +---+          |
                   ...           |
                  +---+          |
0x8080    parr:   |   | ---------+
                  +---+
                   ...
So you see the array arr with its three elements, and the pointer parr pointing to arr. We want to access the second element of arr (value 1 at address 0x8004) through the pointer parr. What happens if we write *parr[1]?
First of all, remember that the expression a[i] is defined as *(a + i); that is, given a pointer value a, offset i elements (not bytes) from a and dereference the result. But what does it mean to offset i elements from a?
Pointer arithmetic is based on the size of the pointed-to type; if p is a pointer to T, then p+1 will give me the location of the next object of type T. So, if p points to an int object at address 0x1000, then p+1 will give me the address of the int object following p, i.e. 0x1000 + sizeof (int).
So, if we write parr[1], what does that give us? Since parr points to a 3-element array of int, parr + 1 will give us the address of the next 3-element array of int, i.e. 0x8000 + sizeof (int [3]), or 0x800c (assuming 4-byte int type).
Remember from above that [] has higher precedence than unary *, so the expression *parr[1] will be parsed as *(parr[1]), which evaluates to *(0x800c).
That's not what we want. To access arr[1] through parr, we must make sure parr has been dereferenced before the subscript operation is applied by explicitly grouping the * operator with parentheses: (*parr)[1]. *parr evaluates to 0x8000 which has type "3-element array of int"; we then access the second element of that array (0x8000 + sizeof (int), or 0x8004) to get the desired value.
Now, let's look at something - if a[i] is equivalent to *(a+i), then it follows that a[0] is equivalent to *a. That means we can write (*parr)[1] as (parr[0])[1], or just parr[0][1]. Now, you don't want to do that for this case since parr is just a pointer to a 1D array, not a 2D array. But this is how 2D array indexing works. Given a declaration like T a[M][N];, the expression a will "decay" to type T (*)[N] in most circumstances. If I wrote something like
int arr[3][2] = {{1,2},{3,4},{5,6}};
int (*parr)[2] = arr; // don't need the & this time, since arr "decays" to type
// int (*)[2]
then to access an element of arr through parr, all I need to do is write parr[i][j]; parr[i] implicitly dereferences the parr pointer.
This is where things get confusing; arrays are not pointers, and they don't store any pointers internally. Instead, if an array expression is not the operand of the sizeof or unary & operators, its type is converted from "N-element array of T" to "pointer to T", and the value of the expression is the address of the first element of the array. This is why you can use the [] operator on both array and pointer objects.
This is also why we used the & operator to get the address of arr in our code snippet; if it's not the operand of the & operator, the expression "decays" from type "3-element array of int" to "pointer to int".
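Here is a compact sketch of the two cases discussed above, written so it also compiles as C++. The function names and the grid and prow variables are made up for illustration; *parr[1] is deliberately left out of the executable code because it would read past arr.

```cpp
#include <cassert>

// Hypothetical helper: reads arr[1] through a pointer to the whole array.
int read_through_array_pointer() {
    int arr[3] = {0, 1, 2};
    int (*parr)[3] = &arr;

    int via_parens = (*parr)[1];  // dereference first, then index: arr[1]
    int via_double = parr[0][1];  // parr[0] is *parr, so this is the same element
    // *parr[1] would parse as *(parr[1]) and step past arr, so it is not evaluated here.
    return via_parens * 10 + via_double;
}

// Hypothetical helper: a 2D array decays to a pointer to its first row.
int read_through_row_pointer() {
    int grid[3][2] = {{1, 2}, {3, 4}, {5, 6}};
    int (*prow)[2] = grid;  // no & needed, grid decays to int (*)[2]
    return prow[2][1];      // prow[2] implicitly dereferences prow + 2
}
```

The first helper returns 11 (both reads see the value 1) and the second returns 6.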
As a beginner programmer I am dealing with some simple problems related to Pointers. In the following code I found the value of *a and a are same in hexadecimal. But I can't understand the reason.
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int a[5][5];
a[0][0] = 1;
printf("*a=%p a=%p \n", (void *)*a, (void *)a); /* %p expects a void * argument */
return 0;
}
Here is the output:
*a=0x7ffddb8919f0 a=0x7ffddb8919f0
An array and its first element have the same address.:)
For this declaration
int a[5][5];
expression a used in the printf call is implicitly converted to the pointer to its first element. Expression *a yields the first element of the array that is in turn a one-dimensional array that also is converted to a pointer to its first element.
Thus expressions a and *a have the same value as expression &a[0][0]
In C and C++ languages values of array type T [N] are implicitly converted to values of pointer type T * in most contexts (with few exceptions). The resultant pointer points to the first element of the original array (index 0). This phenomenon is informally known as array type decay.
printf argument is one of those contexts when array type decay happens.
A 2D array of type int [5][5] is nothing else than an "1D array of 1D arrays", i.e. it is an array of 5 elements, with each element itself being an array of 5 ints.
The above array type decay rule naturally applies to this situation.
The expression a, which originally has array type int [5][5], decays to a pointer of type int (*)[5]. The pointer points to element a[0], which is the beginning of sub-array a[0] in memory. This is the first pointer you print.
The expression *a is a dereference operator applied to sub-expression a. Sub-expression a in this context behaves in exactly the same way as before: it decays to pointer of type int (*)[5] that points to a[0]. Thus the result of *a is a[0] itself. But a[0] is also an array. It is an array of int[5] type. It is also subject to array type decay. It decays to pointer of type int *, which points to the first element of a[0], i.e. to a[0][0]. This is the second pointer you print.
The reason both pointer values are the same numerically is that the beginning of sub-array a[0] corresponds to the same memory location as element a[0][0].
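A small C++ sketch of the same point: the decayed pointers have different types but compare equal once converted to void pointers. The helper name same_address_different_types is hypothetical, and auto is used only to capture the decayed types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: captures the decayed forms of a and *a and compares
// their addresses after converting to const void*.
bool same_address_different_types() {
    int a[5][5] = {};

    auto outer = a;   // a decays to a pointer to its first row
    auto inner = *a;  // *a is a[0], which decays to a pointer to a[0][0]

    static_assert(std::is_same<decltype(outer), int (*)[5]>::value,
                  "a decays to int (*)[5]");
    static_assert(std::is_same<decltype(inner), int*>::value,
                  "*a decays to int*");

    const void* outer_addr = outer;
    const void* inner_addr = inner;
    const void* elem_addr = &a[0][0];
    return outer_addr == inner_addr && inner_addr == elem_addr;
}
```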
a is an array of arrays of int. It is tempting to think of it as a pointer to a pointer to an int, but no pointer objects are stored anywhere; the pointers only appear when the array expressions decay.
So a and *a both decay to pointers holding the same address (that of a[0][0]).
*a is the array a[0], which decays to a pointer, and a[0] starts at the same address as a[0][0].