At: http://www.fredosaurus.com/notes-cpp/arrayptr/26arraysaspointers.html
Under: Pointer addition and element size
There is the following code:
// Assume sizeof(int) is 4.
int b[100]; // b is an array of 100 ints.
int* p; // p is a pointer to an int.
p = b; // Assigns address of first element of b. Ie, &b[0]
p = p + 1; // Adds 4 to p (4 == 1 * sizeof(int)). Ie, &b[1]
How did "p" in the last line become "4"?
Thanks.
(I assume that you mean "1" in the last line, not "p")
Pointer arithmetic in both C and C++ is a logical addition, not a numeric addition. Adding one to a pointer means "produce a pointer to the object that comes in memory right after this one," which means that the compiler automatically scales up whatever you're incrementing the pointer with by the size of the object being pointed at. This prevents you from having a pointer into the middle of an object, or a misaligned pointer, or both.
Because p is a pointer to a type that is 4 bytes in size. The + operator on a pointer is really a pointer shift: the compiler knows the size of the pointed-to type and shifts the address by the appropriate number of bytes.
If you change int to short, p will be shifted by 2 bytes (assuming sizeof(short) is 2).
The comment in the code you posted explains it: adding an integer x to a pointer increases the pointer's value by x multiplied by the sizeof of the type it points to.
This is convenient because it doesn't usually make sense to change the pointer in smaller increments - you wouldn't want it to point into the middle of one of the elements.
I was wondering how *(&array + 1) actually works. I saw this as an easy way to calculate the array length and want to understand it properly before using it. I'm not very experienced with pointer arithmetic, but with my understanding &array gives the address of the first element of the array. (&array + 1) would go to end of the array in terms of address. But shouldn't *(&array + 1) give the value, which is at this address. Instead it prints out the address. I would really appreciate your help to get the pointer stuff clear in my head.
Here is the simple example I'm working on:
int numbers[] = {5,8,9,3,4,6,1};
int length = *(&numbers + 1) - numbers;
(This answer is for C++.)
&numbers is a pointer to the array itself. It has type int (*)[7].
&numbers + 1 is a pointer to the byte right after the array, where another array of 7 ints would be located. It still has type int (*)[7].
*(&numbers + 1) dereferences this pointer, yielding an lvalue of type int[7] referring to the byte right after the array.
*(&numbers + 1) - numbers: Using the - operator forces both operands to undergo the array-to-pointer conversion, so pointers can be subtracted. *(&numbers + 1) is converted to an int* pointing at the byte after the array. numbers is converted to an int* pointing at the first byte of the array. Their difference is the number of ints between the two pointers---which is the number of ints in the array.
Edit: Although there's no valid object pointed to by &numbers + 1, this is what's called a "past the end" pointer. If p is a pointer to T, pointing to a valid object of type T, then it's always valid to compute p + 1, even though *p may be a single object, or the object at the end of an array. In that case, you get a "past the end" pointer, which does not point to a valid object, but is still a valid pointer. You can use this pointer for pointer arithmetic, and even dereference it to yield an lvalue, as long as you do not try to read or write through that lvalue. Note that you can only go one element past the end of an object (here, one whole int[7]); attempting to go any further leads to undefined behaviour.
The expression &numbers gives you the address of the array, not the first member (although numerically they are the same). The type of this expression is int (*)[7], i.e. a pointer to an array of size 7.
The expression &numbers + 1 adds sizeof(int[7]) bytes to the address of array. The resulting pointer points right after the array.
The problem however is when you then dereference this pointer with *(&numbers + 1). Dereferencing a pointer that points one element past the end of an array invokes undefined behavior.
The proper way to get the number of elements of an array is sizeof(numbers)/sizeof(numbers[0]). This assumes that the array was defined in the current scope and is not a parameter to a function.
but with my understanding &array gives the address of the first element of the array.
This understanding is misleading. &array gives the address of the array. Sure, the value of that address is the same as that of the first element, but the type of the expression is different. The type of the expression &array is "pointer to array of N elements of type T" (where N is the length that you're looking for and T is int).
But shouldn't *(&array + 1) give the value, which is at this address.
Well yes... but it's here that the type of the expression becomes important. Indirecting a pointer to an array (rather than pointer to an element of the array) will result in the array itself.
In the subtraction expression, both array operands decay into pointer to first element. Since the subtraction uses decayed pointers, the unit of the pointer arithmetic is in terms of the element size.
I saw this as an easy way to calculate the array length
There are easier ways:
std::size(numbers)
And in C:
sizeof(numbers)/sizeof(numbers[0])
As far as I know, a multidimensional array on the stack occupies contiguous memory in row-major order. Is it undefined behavior, according to the ISO C++ Standard, to index a multidimensional array using a pointer to its elements? For example:
#include <iostream>
#include <type_traits>
int main() {
int a[5][4]{{1,2,3,4},{},{5,6,7,8}};
constexpr auto sz = sizeof(a) / sizeof(std::remove_all_extents<decltype(a)>::type);
int *p = &a[0][0];
int i = p[11]; // <-- here
p[19] = 20; // <-- here
for (int k = 0; k < sz; ++k)
std::cout << p[k] << ' '; // <-- and here
return 0;
}
The above code will compile and run correctly as long as the pointer does not go outside the bounds of array a. But does this happen because of compiler-defined behavior or because of the language standard? Any reference from the ISO C++ Standard would be best.
The problem here is the strict aliasing rule, which appears in my draft n3337 for C++11 in 3.10 Lvalues and rvalues [basic.lval] § 10. It is an exhaustive list, and it does not explicitly allow aliasing a multidimensional array as a one-dimensional array of the whole size.
It is indeed required that arrays are allocated contiguously in memory, which means that the size of a multidimensional array, say T arr[n][m], is the product of its dimensions and the size of an element: n * m * sizeof(T). When converted to char pointers, you can even do pointer arithmetic over the whole array, because any pointer to an object can be converted to a char pointer, and that char pointer can be used to access the consecutive bytes of the object (*).
But unfortunately, for any other type, the standard only allows pointer arithmetic inside one array (and by definition, subscripting an array element is the same as dereferencing a pointer after pointer arithmetic: a[i] is *(a + i)). So if you respect both the rule on pointer arithmetic and the strict aliasing rule, global indexing of a multidimensional array is not defined by the C++11 standard, unless you go through char pointer arithmetic:
int a[3][4];
int *p = &a[0][0]; // perfectly defined
int b = p[3]; // ok you are in same row which means in same array
b = p[5]; // oops: you dereference past the declared array that forms the first row
char *cq = (((char *) p) + 5 * sizeof(int)); // ok: char pointer arithmetics inside an object
int *q = (int *) cq; // ok because what lies there is an int object
b = *q; // almost the same as p[5] but behaviour is defined
That char pointer arithmetic, along with the fear of breaking a lot of existing code, explains why all well-known compilers silently accept aliasing a multidimensional array as a 1D one of the same overall size (it leads to the same generated code). But technically, pointer arithmetic across the whole array is only valid for char pointers.
(*) The standard declares in 1.7 The C++ memory model [intro.memory] that
The fundamental storage unit in the C++ memory model is the byte... The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every
byte has a unique address.
and later in 3.9 Types [basic.types] §2
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object
holds a valid value of type T, the underlying bytes making up the object can be copied into an array
of char or unsigned char.
and to copy them you must access them through a char * or unsigned char *
I believe the behavior in your example is technically undefined.
The standard has no concept of a multidimensional array. What you've actually declared is an "array of 5 arrays of 4 ints". That is a[0] and a[1] are actually two different arrays of 4 ints, both of which are contained in the array a. What this means is that a[0][0] and a[1][0] are not elements of the same array.
[expr.add]/4 says the following (emphasis mine)
When an expression that has integral type is added to or subtracted from a pointer, the result has the type
of the pointer operand. If the pointer operand points to an element of an array object, and the array is
large enough, the result points to an element offset from the original element such that the difference of
the subscripts of the resulting and original array elements equals the integral expression. In other words, if
the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P))
and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array
object, provided they exist. Moreover, if the expression P points to the last element of an array object,
the expression (P)+1 points one past the last element of the array object, and if the expression Q points
one past the last element of an array object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements of the same array object, or one past
the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is
undefined
So, since p[11] expands to *(p + 11) and since p and p + 11 are not elements of the same array (one is an element of a[0] and the other is more than one element past the end of a[0]), the behavior of that addition is undefined.
I would, however, be very surprised to find any implementation where such an addition resulted in anything other than the one you expect.
If you declare
int arr[3][4][5];
the type of arr is int[3][4][5], the type of arr[0] is int[4][5], and so on. It is an array of arrays of arrays, NOT an array of pointers. What happens if we increment the first index? The pointer shifts forward by the size of one element of arr, but an element of arr is a two-dimensional array. Measured in ints, that step is sizeof(int[4][5])/sizeof(int), so it lands at (int*)arr + 20.
Iterating this way, we find that arr[a][b][c] equals *(*(*(arr + a) + b) + c) and, provided that there is never any padding in arrays (to comply with mandatory compatibility of POD types with C99), refers to the same location as:
*((int*)arr + 20*a + 5*b + c)
When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integral expression
Let's say I have a 2D array int m[3][4]
and a typedef is defined as
typedef int array[4];
array *ptr = m;
What does this mean? Is it a pointer to an array of 4 elements, or an array of 4 pointers?
When we do cout << ptr or cout << *ptr, it prints the same address. How is that possible?
ptr is a pointer to an array of four elements of type int. m is an array of 3 elements, each of which is an array of four elements of type int. To give you a picture: m is an array of 3 elements of type array. That is why initializing ptr with m is no problem: m, being an array, is implicitly converted to a pointer to its first element.
ptr is a pointer to an array, so that pointer value is printed. When *ptr is used, you get an array, which again gets implicitly converted to a pointer to the first element of that array. That is the same starting point, so you get the same value.
As everything starts at the same point, you get the same value. Things start to differ when you apply ++ to the pointers. I will leave you with that.
Check out the clockwise/spiral rule. After reading this you can read the declaration as being a pointer to the type-alias array, in other words a pointer to an array of four integers.
As per "clockwise/spiral rule" ptr is a pointer to array of 4 int.
Could anyone please explain what is happening here?
int main()
{
int **p = 0;
//p=? and why| *p=? and why|**p=? and why
++p;
//p=? and why| *p=? and why|**p=? and why
printf("%d\n", p);
return 1;
}
output:-
4 (why?)
First of all, p is a pointer to a pointer-to-integer.
int **p = 0;
p = 0, *p = nothing, **p = less than nothing.
++p;
Same as p = p + 1. That means moving one pointer-to-int further along. A pointer is, at least on your OS, 32 bits (4 bytes) long. p now points 4 bytes after 0. The value of p is 4.
p is a pointer to a pointer-to-int. It's being initialised to 0, i.e. it's a null pointer.
It's then being incremented to point at the next consecutive pointer-to-int in memory.* The next pointer will be at address 4, because on your platform the size of a pointer is 4 bytes.
Then printf interprets the pointer value as an integer, and so displays "4".
* Note, however, that this is now undefined behaviour.
It is clear. You have a pointer to a pointer to int (int **p means a pointer to a pointer to int) that actually holds the address 0. A pointer in itself, in your architecture, is 32 bits (4 bytes) long, so incrementing p moves its address forward by 4 bytes, that is, 0+4 = 4.
Go get a nice C book and learn about pointer arithmetic. You'll be glad the rest of your life! :)
++p is actually undefined behaviour, but what appears to have happened on your implementation is that sizeof(int*) is 4, and a null pointer is address 0. Recall that pointer increment, when it's not UB, adds a number of bytes to the address equal to the size of the referand type. So it's not all that surprising that when you take a null pointer of type int** (hence the referand type is int*) and increment it, you end up at the address 4. It's just not guaranteed.
Passing a pointer when the %d format expects an int is also undefined behavior, but it appears that the representation of int and int** are sufficiently compatible, and the varargs calling convention on your implementation treats them sufficiently similarly, that it has successfully printed 4. That's also not very surprising for implementations where sizeof(int) == sizeof(int**), but also isn't guaranteed.
Of course since it's undefined behavior, there are other possible explanations for what you see.
p is a pointer to pointer to int. And it's initialized to 0, i.e. NULL.
When you increment it, it now points to the next pointer to int, which, on 32-bit systems, happens to be at address 4.
What does this mean: that a pointer increment points to the address of the next base type of the pointer?
For example:
p1++; // p1 is a pointer to an int
Does this statement mean that the address pointed to by p1 should change to the address of the next int or it should just be incremented by 2 (assuming an int is 2 bytes), in which case the particular address may not contain an int?
I mean, if p1 is, say, 0x442012, will p1++ be 0x442014 (which may be part of the address of a double) or will it point to the next int which is in an address like 0x44201F?
Thanks
Pointer arithmetic doesn’t care about the content – or validity – of the pointee. It will simply increment the pointer address using the following formula:
new_value = reinterpret_cast<char*>(p) + sizeof(*p);
(Assuming a pointer to non-const – otherwise the cast wouldn’t work.)
That is, it will increment the pointer by an amount of sizeof(*p) bytes, regardless of things like pointee value and memory alignment.
The compiler will add sizeof(int) (usually 4) to the numeric value of the pointer. If p1 is 0x442012 before the increment, then after the increment it will be 0x442012 + 4 = 0x442016.
Mind you, 0x442012 is not a multiple of 4, so it is unlikely to be the address of a valid four-byte int, though it would be fine for your two-byte ints.
It certainly won't go looking for the next integer. That would require magic.
p1++ gives rise to assembly language instructions which increment p1 by the size of what it points to. So you get
(char *)p1 = (char *)p1 + sizeof (object pointed to by p1)
(When this question was answered) Typically an int is 4 bytes, so it would increment by 4, but it depends on the sizeof() on your machine.
It does not go to "the next int".
An example: assume a 4-byte int and p1 = 0x20424 (where p1 is an int*). Then
p1++
would set the new value of p1 to 0x20428. NOT 0x20425.
If p1 is pointing into the element of index n of an array of objects of type int (a non-array object counts as an array of length 1 for this purpose), then after p1++, p1 is either:
Pointing to the element of index n+1 if the array is of length greater than n+1.
The 'past-the-end' address of the array, if the array is of length exactly n+1.
p1++ causes undefined behavior if p1 is not pointing to an element of an array of objects of type int.
The only meaning that the C and C++ languages give to the notion of "address" is the value of a pointer object.
Any relationship that C/C++'s notion of address has to the notion of a numeric addresses you'd consider in assembly language is purely an implementation detail (albeit, an extremely common implementation detail).
Pointer arithmetic is done in multiples of sizeof(*pointer). That is, for a pointer to int, an increment advances to the next integer (4 bytes for 32-bit integers).