Sizeof Pointer to Array - c++

If I have an array declared like this:
int a[3][2];
then why is:
sizeof(a+0) == 8
whereas:
sizeof(a) == 24
I don't understand how adding 0 to the pointer changes the sizeof output. Is there maybe some implicit type cast?

If you add 0 to a, then a is first converted to a pointer value of type int(*)[2] (pointing to the first element of an array of type int[3][2]). Then 0 is added to that, which adds 0 * sizeof(int[2]) bytes to the address represented by that pointer value. Since that multiplication yields 0, it will yield the same pointer value. Since it is a pointer, sizeof(a+0) yields the size of a pointer, which is 8 bytes on your box.
If you do sizeof(a), there is no reason for the compiler to convert a to a pointer value (that only makes sense if you want to index elements or do pointer arithmetic involving the addresses of the elements). So the expression a keeps its array type, and you get the size of int[3][2] instead of the size of int(*)[2]. That is 3 * 2 * sizeof(int), which on your box is 24 bytes.
Hope this clarifies things.

sizeof tells you the size of the type of the expression. When you add 0 to a, the type becomes a pointer (8 bytes on 64-bit systems).
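The two answers above can be checked at compile time. This is a minimal sketch, not part of the original answers: SizeofReport and sizeof_report are hypothetical names made up for the illustration, and the concrete numbers (24 and 8) depend on the platform, so the code only compares against sizeof expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper type, used only for this illustration.
struct SizeofReport {
    std::size_t whole_array;   // sizeof(a)
    std::size_t after_plus_0;  // sizeof(a + 0)
};

inline SizeofReport sizeof_report() {
    int a[3][2] = {};

    // As the operand of decltype or sizeof, `a` keeps its array type.
    static_assert(std::is_same<decltype(a), int[3][2]>::value,
                  "a is an array of 3 arrays of 2 int");
    // Arithmetic forces the array-to-pointer conversion first.
    static_assert(std::is_same<decltype(a + 0), int (*)[2]>::value,
                  "a + 0 is a pointer to int[2]");

    // Adding 0 leaves the address unchanged; only the type differs.
    assert(static_cast<void*>(a + 0) == static_cast<void*>(a));

    return SizeofReport{sizeof(a), sizeof(a + 0)};
}
```

Calling sizeof_report() on a typical 64-bit system with 4-byte int should give 24 and 8, matching the numbers in the question.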


How *(&arr + 1) - arr is working to give the array size [duplicate]

int arr[] = { 3, 5, 9, 2, 8, 10, 11 };
int arrSize = *(&arr + 1) - arr;
std::cout << arrSize;
I can't figure out how this works. Can anyone help me with this?
If we "draw" the array together with the pointers, it will look something like this:
+--------+--------+-----+--------+-----+
| arr[0] | arr[1] | ... | arr[6] | ... |
+--------+--------+-----+--------+-----+
^        ^                       ^
|        |                       |
&arr[0]  &arr[1]                 |
|                                |
&arr                             &arr + 1
The type of the expressions &arr and &arr + 1 is int (*)[7]. If we dereference either of those pointers, we get a value of type int[7], and as with all arrays, it will decay to a pointer to its first element.
So what's happening is that we take the difference between a pointer to the first element of *(&arr + 1) (the dereference really makes this UB, but it will still work with any sane compiler) and a pointer to the first element of *&arr, i.e. of arr itself.
All pointer arithmetic is done in the base-unit of the pointed-to type, which in this case is int, so the result is the number of int elements between the two addresses being pointed at.
It might be useful to know that an array will naturally decay to a pointer to its first element, ie the expression arr will decay to &arr[0], which will have the type int *.
Also, for any pointer (or array) p and index i, the expression *(p + i) is exactly equal to p[i]. So *(&arr + 1) is really the same as (&arr)[1] (which makes the UB much more visible).
That program has undefined behaviour. (&arr + 1) is a valid pointer that points "one beyond" arr, and has type int(*)[7], however it doesn't point to an int [7], so dereferencing it is invalid.
It so happens that your implementation assumes there is a second int [7] after the one you declare, and subtracts the location of the first element of that array that exists from the location of the first element of the fictitious array that the pointer arithmetic invented.
You need to explore what the type of the &arr expression is, and how that affects the + 1 operation on it.
Pointer arithmetic works in 'raw units' of the pointed-to type; &arr is the address of your array, so it points to an object of type, "array of 7 int". Adding 1 to that pointer actually adds the size of the type to the address – so 7 * sizeof(int) is added to the address.
However, in the outer expression (subtraction of arr), the operands are pointers to int objects [1] (not arrays), so the 'units' are just sizeof(int), which is 7 times smaller than in the inner expression. Thus, the subtraction results in the size of the array.
[1] This is because, in such expressions, an array variable (such as the second operand, arr) decays to a pointer to its first element; further, your first operand is also an array, as the * operator dereferences the modified value of the array pointer.
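The difference in 'units' can be sketched as follows. This is an illustration under assumptions, not the answerer's code: byte_distance and element_count_demo are hypothetical names, and the sketch uses arr + 7 for the end pointer so that nothing past the array is dereferenced.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: distance between two int pointers, measured in bytes.
inline std::ptrdiff_t byte_distance(const int* first, const int* last) {
    return reinterpret_cast<const char*>(last) -
           reinterpret_cast<const char*>(first);
}

inline std::ptrdiff_t element_count_demo() {
    int arr[] = {3, 5, 9, 2, 8, 10, 11};
    const int* begin = arr;    // decays to &arr[0]
    const int* end = arr + 7;  // one past the last element; never dereferenced

    // In bytes the two pointers are 7 * sizeof(int) apart ...
    assert(byte_distance(begin, end) ==
           static_cast<std::ptrdiff_t>(7 * sizeof(int)));
    // ... but int* subtraction counts ints, so this is the element count.
    return end - begin;
}
```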
Note on Possible UB: Other answers (and comments thereto) have suggested that the dereferencing operation, *(&arr + 1), invokes undefined behaviour. However, looking through this Draft C++17 Standard, there is the vaguest of suggestions that it may not:
6.7.2 Compound Types
...
3    … For purposes of pointer arithmetic (8.5.6) and comparison
(8.5.9, 8.5.10), a pointer past the end of the last element of an
array x of n elements is considered to be equivalent to a pointer to a
hypothetical element x[n].
But I won't claim "Language-Lawyer" status here, as there is no explicit mention in that section about dereferencing such a pointer.
If you have a declaration like this
int arr[] = { 3, 5, 9, 2, 8, 10, 11 };
then the expression &arr + 1 will point to the memory after the last element of the array. The value of the expression is equal to the value of the expression arr + 7, where 7 is the number of elements in the array declared above. The only difference is that the expression &arr + 1 has the type int ( * )[7] while the expression arr + 7 has the type int *.
So due to the pointer arithmetic the difference ( arr + 7 ) - arr will yield 7: the number of elements in the array.
On the other hand, dereferencing the expression &arr + 1, which has the type int ( * )[7], we get an lvalue of the type int[7] that, when used in the expression *(&arr + 1) - arr, is converted to a pointer of the type int * and has the same value as arr + 7, as pointed out above. So the expression will yield the number of elements in the array.
The only difference between these two expressions
( arr + 7 ) - arr
and
*( &arr + 1 ) - arr
is that in the first case we will need explicitly to specify the number of elements in the array to get the address of the memory after the last element of the array while in the second case the compiler itself will calculate the address of the memory after the last element of the array knowing the array declaration.
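The claim that &arr + 1 and arr + 7 hold the same address but differ in type can be checked without dereferencing either pointer. A minimal sketch, with same_end_address as a hypothetical name:

```cpp
#include <cassert>
#include <type_traits>

inline bool same_end_address() {
    int arr[] = {3, 5, 9, 2, 8, 10, 11};

    static_assert(std::is_same<decltype(&arr + 1), int (*)[7]>::value,
                  "&arr + 1 is a pointer to int[7]");
    static_assert(std::is_same<decltype(arr + 7), int*>::value,
                  "arr + 7 is a pointer to int");

    // Neither pointer is dereferenced; only the addresses are compared.
    const void* by_array = &arr + 1;
    const void* by_element = arr + 7;

    assert((arr + 7) - arr == 7);  // the explicit-count form of the expression
    return by_array == by_element;
}
```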
As others have mentioned, *(&arr + 1) triggers undefined behavior because &arr + 1 is a pointer to one-past-the end of an array of type int [7] and that pointer is subsequently dereferenced.
An alternate way of doing this would be to convert the relevant pointers to uintptr_t, subtract, and divide by the element size.
int arrSize = static_cast<int>((reinterpret_cast<uintptr_t>(&arr + 1) -
reinterpret_cast<uintptr_t>(arr)) / sizeof *arr);
Or using C-style casts:
int arrSize = (int)(((uintptr_t)(&arr + 1) - (uintptr_t)arr) / sizeof *arr);
This one is simple:
arr, when used in most expressions, converts to a pointer to the 0'th element of the array (&arr[0]);
&arr gives a pointer to the whole array: the same address, but with type int (*)[7];
&arr+1 gives a pointer to the address sizeof(arr)*1 bytes past the start of arr;
*(&arr + 1) turns the previous value into an int[7] that converts to an int * holding that same address, i.e. &arr[0]+7;
*(&arr + 1) - arr subtracts the pointer to arr[0], leaving just 7, i.e. sizeof(arr)/sizeof(arr[0]).
So the only tricks here are that static arrays in C keep all their static type information, including their total size, and that when you add an integer to a pointer, C compilers don't just add the value to the address: the standards require the address to increase by the value times sizeof() of the type the pointer points to, so *(p+idx) gives the same result as p[idx].
C language is designed to allow for very simplistic compilers so inside it is full of little tricks like this. I would not recommend using them in production code though. Remember about other developers who may need to read and maintain your code later and use the most simple and obvious stuff available instead (for the example it is obviously just using sizeof() directly).

Calculate array length via pointer arithmetic

I was wondering how *(&array + 1) actually works. I saw this as an easy way to calculate the array length and want to understand it properly before using it. I'm not very experienced with pointer arithmetic, but with my understanding &array gives the address of the first element of the array. (&array + 1) would go to end of the array in terms of address. But shouldn't *(&array + 1) give the value, which is at this address. Instead it prints out the address. I would really appreciate your help to get the pointer stuff clear in my head.
Here is the simple example I'm working on:
int numbers[] = {5,8,9,3,4,6,1};
int length = *(&numbers + 1) - numbers;
(This answer is for C++.)
&numbers is a pointer to the array itself. It has type int (*)[7].
&numbers + 1 is a pointer to the byte right after the array, where another array of 7 ints would be located. It still has type int (*)[7].
*(&numbers + 1) dereferences this pointer, yielding an lvalue of type int[7] referring to the byte right after the array.
*(&numbers + 1) - numbers: Using the - operator forces both operands to undergo the array-to-pointer conversion, so pointers can be subtracted. *(&numbers + 1) is converted to an int* pointing at the byte after the array. numbers is converted to an int* pointing at the first byte of the array. Their difference is the number of ints between the two pointers---which is the number of ints in the array.
Edit: Although there's no valid object pointed to by &numbers + 1, this is what's called a "past the end" pointer. If p is a pointer to T, pointing to a valid object of type T, then it's always valid to compute p + 1, whether *p is a single object or the object at the end of an array. In that case, you get a "past the end" pointer, which does not point to a valid object, but is still a valid pointer. You can use this pointer for pointer arithmetic, and even dereference it to yield an lvalue, as long as you do not try to read or write through that lvalue. Note that you can only go one element past the end of an object; attempting to go any further leads to undefined behaviour.
The expression &numbers gives you the address of the array, not the first member (although numerically they are the same). The type of this expression is int (*)[7], i.e. a pointer to an array of size 7.
The expression &numbers + 1 adds sizeof(int[7]) bytes to the address of array. The resulting pointer points right after the array.
The problem however is when you then dereference this pointer with *(&numbers + 1). Dereferencing a pointer that points one element past the end of an array invokes undefined behavior.
The proper way to get the number of elements of an array is sizeof(numbers)/sizeof(numbers[0]). This assumes that the array was defined in the current scope and is not a parameter to a function.
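The caveat about function parameters can be sketched like this. The function names are hypothetical, and sizeof(decltype(numbers)) is used instead of sizeof(numbers) only to sidestep the warning many compilers emit for the latter; both yield the pointer size.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Inside the scope that defines the array, sizeof sees the full int[7].
inline std::size_t count_in_scope() {
    int numbers[] = {5, 8, 9, 3, 4, 6, 1};
    return sizeof(numbers) / sizeof(numbers[0]);
}

// A parameter written as int[7] is adjusted to int*, so the length is gone.
inline std::size_t size_of_parameter_type(int numbers[7]) {
    static_assert(std::is_same<decltype(numbers), int*>::value,
                  "the parameter is really a pointer");
    return sizeof(decltype(numbers));
}

// Hypothetical driver that passes a real 7-element array to the function above.
inline std::size_t size_seen_by_callee() {
    int numbers[] = {5, 8, 9, 3, 4, 6, 1};
    assert(sizeof(numbers) == 7 * sizeof(int));
    return size_of_parameter_type(numbers);
}
```

Here count_in_scope() gives 7, while size_seen_by_callee() gives the size of a pointer, so dividing by sizeof(numbers[0]) inside the callee would produce a meaningless count.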
but with my understanding &array gives the address of the first element of the array.
This understanding is misleading. &array gives the address of the array. Sure, the value of that address is the same as that of the first element, but the type of the expression is different. The type of the expression &array is "pointer to array of N elements of type T" (where N is the length that you're looking for and T is int).
But shouldn't *(&array + 1) give the value, which is at this address.
Well yes... but it's here that the type of the expression becomes important. Indirecting a pointer to an array (rather than pointer to an element of the array) will result in the array itself.
In the subtraction expression, both array operands decay into pointer to first element. Since the subtraction uses decayed pointers, the unit of the pointer arithmetic is in terms of the element size.
I saw this as an easy way to calculate the array length
There are easier ways:
std::size(numbers)
And in C:
sizeof(numbers)/sizeof(numbers[0])
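For built-in arrays, std::size (C++17, declared in <iterator>) works by deducing the length from the array type. The sketch below uses a hypothetical stand-in named array_length so that it also compiles as C++11; it is meant as an illustration of the idea rather than the library's actual source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for what std::size does with a built-in array:
// the length N is deduced from the array type, no pointer arithmetic needed.
template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) noexcept {
    return N;
}

inline std::size_t numbers_length() {
    int numbers[] = {5, 8, 9, 3, 4, 6, 1};
    // Agrees with the sizeof ratio shown above.
    assert(array_length(numbers) == sizeof(numbers) / sizeof(numbers[0]));
    return array_length(numbers);
}
```

Unlike the sizeof ratio, passing a plain pointer to array_length fails to compile instead of silently giving a wrong answer.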

How does sizeof know the size of array? [duplicate]

I have the following code:
#include <stdio.h>

int main(void) {
    int array[5] = {3, 6, 9, -8, 1};
    /* sizeof yields a size_t, which is printed with %zu */
    printf("the size of the array is %zu\n", sizeof(array));
    /* %p expects a void * argument */
    printf("the address of array is %p\n", (void *)array);
    printf("the address of array is %p\n", (void *)&array);
    int *x = array;
    printf("the address of x is %p\n", (void *)x);
    printf("the size of x is %zu\n", sizeof(x));
    return 0;
}
The output is
the size of the array is 20
the address of array is 0x7fff02309560
the address of array is 0x7fff02309560
the address of x is 0x7fff02309560
the size of x is 8
I know the variable array will be seen as a pointer to the first element of the array, so I understand that the size of x is 8. But I don't know why the size of the array is 20. Shouldn't it be 8 (on a 64-bit machine)?
Besides, how does the program know that it is 20? As far as I know, C doesn't store the number of elements. How come sizeof(array) and sizeof(x) are different? I read several posts about array decaying but found nothing on this problem.
The name of an array decays to a pointer to the first element of the array in most situations. There are a couple of exceptions to that rule though. The two most important are when the array name is used as the operand of either the sizeof operator or the address-of operator (&). In these cases, the name of the array remains an identifier for the array as a whole.
For a non-VLA array, this means that the size of the array can be determined statically (at compile time) and the result of the expression will be the size of the array (in bytes), not the size of a pointer.
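The decay rule and its two exceptions can be verified at compile time. A minimal sketch in C++ (the question is C, but the rule is the same in both languages); decay_exceptions_hold is a hypothetical name:

```cpp
#include <cstddef>
#include <type_traits>

inline bool decay_exceptions_hold() {
    int array[5] = {3, 6, 9, -8, 1};

    // Ordinary use: the array converts to a pointer to its first element.
    static_assert(std::is_same<decltype(array + 0), int*>::value,
                  "array decays in arithmetic");
    // Operand of &: no conversion, so the result points to the whole array.
    static_assert(std::is_same<decltype(&array), int (*)[5]>::value,
                  "&array is a pointer to int[5]");
    // Operand of sizeof: no conversion, so the size is that of int[5].
    static_assert(sizeof(array) == 5 * sizeof(int),
                  "sizeof sees the whole array");

    // Once stored in a pointer, the element count is no longer in the type.
    int* x = array;
    return sizeof(x) == sizeof(int*) && x == &array[0];
}
```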
When you take the address of the array, you get the same value (i.e., the same address) as if you'd just used the name of the array without taking the address. The type is different though--when you explicitly take the address, what you get is a pointer of type "pointer to array of N items of type T". That means (for one example) that while array+1 points to the second element of the array, &array+1 points to another array just past the end of the entire array.
Assuming an array of at least two items, *(array+1) will refer to the second element of the array. Regardless of the array size, &array+1 will yield an address past the end of the array, so attempting to dereference that address gives undefined behavior.
In your case, given that the size of the array is 20, and the size of one element of the array is 4, if array was, say, 0x1000, then array+1 would be 0x1004 and &array+1 would be 0x1014 (0x14 = 20).
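The 0x1004 versus 0x1014 arithmetic can be reproduced by converting the pointers to integers. This sketch assumes the usual flat mapping from pointers to std::uintptr_t (the mapping is implementation-defined), and Offsets and offsets_from_base are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type, used only for this illustration.
struct Offsets {
    std::uintptr_t element_step;  // (array + 1) relative to array, in bytes
    std::uintptr_t array_step;    // (&array + 1) relative to array, in bytes
};

inline Offsets offsets_from_base() {
    int array[5] = {3, 6, 9, -8, 1};
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(array);

    // array and &array are the same address; only their types differ.
    assert(reinterpret_cast<std::uintptr_t>(&array) == base);

    return Offsets{
        reinterpret_cast<std::uintptr_t>(array + 1) - base,
        reinterpret_cast<std::uintptr_t>(&array + 1) - base,
    };
}
```

With 4-byte int this gives offsets of 4 and 20, i.e. 0x4 and 0x14 as in the example above.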
Your array has a static length so it can be determined at compile time. Your compiler knows the sizeof(int) = 4 and your static array length [5]. 4 * 5 = 20
Edit: Your compiler's int is probably 32-bit, but addresses are 64-bit. That is why sizeof(pointer) returns 8.
Note that sizeof is not a library function. According to K&R, sizeof is
"a compile-time unary operator [...] that can be used to compute the
size of any object"
So sizeof doesn't know how big the array is; the compiler knows how big the array is, and by definition (K&R again),
"when applied to an array, the result is the total number of bytes
in the array."
A pointer and an array are 2 different data types.
Array can hold elements of similar data type. The memory for array is contiguous.
Pointer is used to point to some valid memory location.
sizeof(type) gives you the number of bytes of the type you pass.
Now if you pass array then the compiler knows that this is an array and number of elements in it and it just multiplies that many elements with the respective data-type size value.
In this case:
5*4 = 20
Again the sizeof(int) or sizeof(pointer) is platform dependent. In this case you are seeing sizeof(pointer) as 8.
No, arrays do not decay as operands of the sizeof operator. This is one of the few places where arrays don't decay. If an int is 4 bytes on your machine, then the total number of bytes of the array should be 20 (4 * 5). We don't even need an object to test this.
sizeof(int[5]) // 20
sizeof(int*) // 8 on a 64-bit machine
C11: 6.5.3.4 (p2)
The sizeof operator yields the size (in bytes) of its operand, which may be an
expression or the parenthesized name of a type. The size is determined from the type of
the operand. [...]
In the declaration
int array[5]
the type of array is an array of 5 ints. The compiler will determine the size of array from this type.
Try this
int x = sizeof(array)/sizeof(int);
printf("the size of the array is %d\n", x);

Pointer incrementing in C++

What does this mean: that a pointer increment points to the address of the next base type of the pointer?
For example:
p1++; // p1 is a pointer to an int
Does this statement mean that the address pointed to by p1 should change to the address of the next int or it should just be incremented by 2 (assuming an int is 2 bytes), in which case the particular address may not contain an int?
I mean, if p1 is, say, 0x442012, will p1++ be 0x442014 (which may be part of the address of a double) or will it point to the next int which is in an address like 0x44201F?
Thanks
Pointer arithmetic doesn’t care about the content – or validity – of the pointee. It will simply increment the pointer address using the following formula:
new_value = reinterpret_cast<char*>(p) + sizeof(*p);
(Assuming a pointer to non-const – otherwise the cast wouldn’t work.)
That is, it will increment the pointer by an amount of sizeof(*p) bytes, regardless of things like pointee value and memory alignment.
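The formula can be compared against the built-in increment directly. A minimal sketch, with manual_step_matches_increment as a hypothetical name; the pointer stays inside a two-element array so that both the increment and the read are well defined.

```cpp
#include <cassert>

inline bool manual_step_matches_increment() {
    int values[2] = {10, 20};
    int* p = values;

    // The formula from the text: advance the address by sizeof(*p) bytes.
    char* raw = reinterpret_cast<char*>(p) + sizeof(*p);
    int* manual = reinterpret_cast<int*>(raw);

    ++p;  // the built-in increment

    assert(*p == 20);    // p now points to the second element
    return manual == p;  // both routes end at the same address
}
```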
The compiler will add sizeof(int) (usually 4) to the numeric value of the pointer. If p1 is 0x442012 before the increment, then after the increment it will be 0x442012 + 4 = 0x442016.
Mind you, 0x442012 is not a multiple of 4, so it is unlikely to be the address of a valid four-byte int, though it would be fine for your two-byte ints.
It certainly won't go looking for the next integer. That would require magic.
p1++ gives rise to assembly language instructions which increment p1 by the size of what it points to. So you get
(char *)p1 = (char *)p1 + sizeof (object pointed to by p1)
(When this question was answered) Typically an int is 4 bytes, so it would increment by 4, but it depends on the sizeof() on your machine.
It does not go to "the next int".
An example: assume a 4 byte address and p1 = 0x20424 (where p1 is an int*). Then
p1++
would set the new value of p1 to 0x20428. NOT 0x20425.
If p1 is pointing into the element of index n of an array of objects of type int (a non-array object counts as an array of length 1 for this purpose), then after p1++, p1 is either:
Pointing to the element of index n+1 if the array is of length greater than n+1.
The 'past-the-end' address of the array, if the array is of length exactly n+1.
p1++ causes undefined behavior if p1 is not pointing to an element of an array of objects of type int.
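The past-the-end address described above is what makes the common half-open loop work: the pointer is compared against the end but never dereferenced there. A small sketch with hypothetical names sum and sum_demo, covering both the array case and the single-object case:

```cpp
#include <cassert>

// Sums the ints in [first, last); last is only compared, never dereferenced.
inline int sum(const int* first, const int* last) {
    int total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p;
    }
    return total;
}

inline int sum_demo() {
    int values[] = {3, 5, 9};
    int single = 42;

    // A lone object counts as an array of length 1, so &single + 1 is valid.
    assert(sum(&single, &single + 1) == 42);

    // values + 3 is the past-the-end address of the three-element array.
    return sum(values, values + 3);
}
```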
The only meaning that the C and C++ languages give to the notion of "address" is the value of a pointer object.
Any relationship that C/C++'s notion of address has to the notion of a numeric addresses you'd consider in assembly language is purely an implementation detail (albeit, an extremely common implementation detail).
Pointer arithmetic is done in multiples of sizeof(*pointer): that is, for a pointer to int, an increment will advance to the next integer (4 bytes for 32-bit integers).

Pointer addition and element size

At: http://www.fredosaurus.com/notes-cpp/arrayptr/26arraysaspointers.html
Under: Pointer addition and element size
There is the following code:
// Assume sizeof(int) is 4.
int b[100]; // b is an array of 100 ints.
int* p; // p is a pointer to an int.
p = b; // Assigns address of first element of b. Ie, &b[0]
p = p + 1; // Adds 4 to p (4 == 1 * sizeof(int)). Ie, &b[1]
How did "p" in the last line become "4"?
Thanks.
(I assume that you mean "1" in the last line, not "p")
Pointer arithmetic in both C and C++ is a logical addition, not a numeric addition. Adding one to a pointer means "produce a pointer to the object that comes in memory right after this one," which means that the compiler automatically scales up whatever you're incrementing the pointer with by the size of the object being pointed at. This prevents you from having a pointer into the middle of an object, or a misaligned pointer, or both.
Because p is a pointer to a type whose size is 4 bytes. The + operator on pointers is actually a pointer shift: the compiler knows the size of the pointed-to type and shifts the pointer by the appropriate number of bytes.
If you change int to short, p will be shifted by 2 bytes (where short is 2 bytes).
The comment in the code you posted explains it: adding an integer x to a pointer increases the pointer's value by x multiplied by the sizeof the type it points to.
This is convenient because it doesn't usually make sense to change the pointer in smaller increments - you wouldn't want it to point into the middle of one of the elements.
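The scaling by element size can be observed for several types at once. A minimal sketch, where step_in_bytes is a hypothetical helper; the results are compared against sizeof rather than fixed numbers, since type sizes vary by platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many bytes does `p + 1` move a T* forward?
template <typename T>
std::size_t step_in_bytes() {
    T pair[2] = {};
    T* p = pair;
    T* next = p + 1;  // logically "the next T", i.e. &pair[1]

    assert(next == &pair[1]);
    return static_cast<std::size_t>(reinterpret_cast<char*>(next) -
                                    reinterpret_cast<char*>(p));
}
```

For example, step_in_bytes<short>() and step_in_bytes<int>() give sizeof(short) and sizeof(int), typically 2 and 4.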