Anyone please elaborate what is happining here?
int main()
{
int **p = 0;
//p=? and why| *p=? and why|**p=? and why
++p;
//p=? and why| *p=? and why|**p=? and why
printf("%d\n", p);
return 1;
}
output:-
4 (why?)
First of all, p is a pointer to a pointer-to-integer.
int **p = 0;
p = 0, *p = nothing, **p = less than nothing.
++p;
Same as p = p + 1. Means the size of one pointer to a pointer-to-int further. A pointer is basically, at least on your OS, 32 bits length (4 bytes). p now points 4 bytes after 0. The value of p is 4.
p is a pointer to a pointer-to-int. It's being initialised to 0, i.e. it's a null pointer.
It's then being incremented to point at the next consecutive pointer-to-int in memory.* The next pointer will be at address 4, because on your platform the size of a pointer is 4 bytes.
Then printf interprets the pointer value as an integer, and so displays "4".
* Note, however, that this is now undefined behaviour.
It is clear. You have a pointer to a pointer to int (int **p means a pointer to a pointer to int), that actually holds the address 0). A pointer in itself, in your architecture, is 32 bits (4 bytes) long, so incrementing p gives you p+4, that is, 0+4 = 4.
Go get a nice C book and learn about pointer arithmetic. You'll be glad the rest of your life! :)
++p is actually undefined behaviour, but what appears to have happened on your implementation is that sizeof(int*) is 4, and a null pointer is address 0. Recall that pointer increment, when it's not UB, adds a number of bytes to the address equal to the size of the referand type. So it's not all that surprising that when you take a null pointer of type int** (hence the referand type is int*) and increment it, you end up at the address 4. It's just not guaranteed.
Passing a pointer when the %d format expects an int is also undefined behavior, but it appears that the representation of int and int** are sufficiently compatible, and the varargs calling convention on your implementation treats them sufficiently similarly, that it has successfully printed 4. That's also not very surprising for implementations where sizeof(int) == sizeof(int**), but also isn't guaranteed.
Of course since it's undefined behavior, there are other possible explanations for what you see.
p is a pointer to pointer to int. And it's initialized to 0, i.e. NULL.
When you increment it, it now points to next pointer to int, which, on 32-bit systems, happens to be 4.
Related
I was wondering how *(&array + 1) actually works. I saw this as an easy way to calculate the array length and want to understand it properly before using it. I'm not very experienced with pointer arithmetic, but with my understanding &array gives the address of the first element of the array. (&array + 1) would go to end of the array in terms of address. But shouldn't *(&array + 1) give the value, which is at this address. Instead it prints out the address. I would really appreciate your help to get the pointer stuff clear in my head.
Here is the simple example I'm working on:
int numbers[] = {5,8,9,3,4,6,1};
int length = *(&numbers + 1) - numbers;
(This answer is for C++.)
&numbers is a pointer to the array itself. It has type int (*)[7].
&numbers + 1 is a pointer to the byte right after the array, where another array of 7 ints would be located. It still has type int (*)[7].
*(&numbers + 1) dereferences this pointer, yielding an lvalue of type int[7] referring to the byte right after the array.
*(&numbers + 1) - numbers: Using the - operator forces both operands to undergo the array-to-pointer conversion, so pointers can be subtracted. *(&numbers + 1) is converted to an int* pointing at the byte after the array. numbers is converted to an int* pointing at the first byte of the array. Their difference is the number of ints between the two pointers---which is the number of ints in the array.
Edit: Although there's no valid object pointed to by &numbers + 1, this is what's called a "past the end" pointer. If p is a pointer to T, pointing to a valid object of type T, then it's always valid to compute p + 1, even though *p may be a single object, or the object at the end of an array. In that case, you get a "past the end" pointer, which does not point to a valid object, but is still a valid pointer. You can use this pointer for pointer arithmetic, and even dereference it to yield an lvalue, as long as you do not try to read or write through that lvalue. Note that you can only go one byte past-the-end of an object; attempting to go any further leads to undefined behaviour.
The expression &numbers gives you the address of the array, not the first member (although numerically they are the same). The type of this expression is int (*)[7], i.e. a pointer to an array of size 7.
The expression &numbers + 1 adds sizeof(int[7]) bytes to the address of array. The resulting pointer points right after the array.
The problem however is when you then dereference this pointer with *(&numbers + 1). Dereferencing a pointer that points one element past the end of an array invokes undefined behavior.
The proper way to get the number of elements of an array is sizeof(numbers)/sizeof(numbers[0]). This assumes that the array was defined in the current scope and is not a parameter to a function.
but with my understanding &array gives the address of the first element of the array.
This understanding is misleading. &array gives the address of the array. Sure, the value of that address is the same same as the first element, but the type of the expression is different. The type of the expression &array is "pointer to array of N elements of type T" (where N is the length that you're looking for and T is int).
But shouldn't *(&array + 1) give the value, which is at this address.
Well yes... but it's here that the type of the expression becomes important. Indirecting a pointer to an array (rather than pointer to an element of the array) will result in the array itself.
In the subtraction expression, both array operands decay into pointer to first element. Since the subtraction uses decayed pointers, the unit of the pointer arithmetic is in terms of the element size.
I saw this as an easy way to calculate the array length
There are easier ways:
std::size(numbers)
And in C:
sizeof(numbers)/sizeof(numbers[0])
Is this code correct?
int arr[2];
int (*ptr)[2] = (int (*)[2]) &arr[1];
ptr[0][0] = 0;
Obviously ptr[0][1] would be invalid by accessing out of bounds of arr.
Note: There's no doubt that ptr[0][0] designates the same memory location as arr[1]; the question is whether we are allowed to access that memory location via ptr. Here are some more examples of when an expression does designate the same memory location but it is not permitted to access the memory location that way.
Note 2: Also consider **ptr = 0; . As pointed out by Marc van Leeuwen, ptr[0] is equivalent to *(ptr + 0), however ptr + 0 seems to fall foul of the pointer arithmetic section. But by using *ptr instead, that is avoided.
Not an answer but a comment that I can't seem to word well without being a wall of text:
Given arrays are guaranteed to store their contents contiguously so that they can be 'iterated over' using a pointer. If I can take a pointer to the begin of an array and successively increment that pointer until I have accessed every element of the array then surely that makes a statement that the array can be accessed as a series of whatever type it is composed of.
Surely the combination of:
1) Array[x] stores its first element at address 'array'
2) Successive increments of the a pointer to it are sufficient to access the next item
3) Array[x-1] obeys the same rules
Then it should be legal to at least look at the address 'array' as if it were type array[x-1] instead of type array[x].
Furthermore given the points about being contiguous and how pointers to elements in the array have to behave, surely it must be legal to then group any contiguous subset of array[x] as array[y] where y < x and it's upper bound does not exceed the extent of array[x].
Not being a language-lawyer this is just me spouting some rubbish. I am very interested in the outcome of this discussion though.
EDIT:
On further consideration of the original code, it seems to me that arrays are themselves very much a special case in many regards. They decay to a pointer, and I believe can be aliased as per what I just said earlier in this post.
So without any standardese to back up my humble opinion, an array can't really be invalid or 'undefined' as a whole if it doesn't really get treated as a whole uniformly.
What does get treated uniformly are the individual elements. So I think it only makes sense to talk about whether accessing a specific element is valid or defined.
For C++ (I'm using draft N4296) [dcl.array]/7 says in particular that if the result of subscripting is an array, it's immediately converted to pointer. That is, in ptr[0][0] ptr[0] is first converted to int* and only then second [0] is applied to it. So it's perfectly valid code.
For C (C11 draft N1570) 6.5.2.1/3 states the same.
Yes, this is correct code. Quoting N4140 for C++14:
[expr.sub]/1 ... The expression E1[E2] is identical (by definition) to *((E1)+(E2))
[expr.add]/5 ... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
There is no overflow here. &*(*(ptr)) == &ptr[0][0] == &arr[1].
For C11 (N1570) the rules are the same. §6.5.2.1 and §6.5.6
Let me give a dissenting opinion: this is (at least in C++) undefined behaviour, for much the same reason as in the other question that this question linked to.
First let me clarify the example with some typedefs that will simplify the discussion.
typedef int two_ints[2];
typedef int* int_ptr;
typedef two_ints* two_ints_ptr;
two_ints arr;
two_ints_ptr ptr = (two_ints_ptr) &arr[1];
int_ptr temp = ptr[0]; // the two_ints value ptr[0] gets converted to int_ptr
temp[0] = 0;
So the question is whether, although there is no object of type two_ints whose address coincides with that of arr[1] (in the same sense that the adress of arr coincides with that of arr[0]), and therefore no object to which ptr[0] could possibly point to, one can nonetheless convert the value of that expression to one of type int_ptr (here given the name temp) that does point to an object (namely the integer object also called arr[1]).
The point where I think behaviour is undefined is in the evaluation of ptr[0], which is equivalent (per 5.2.1[expr.sub]) to *(ptr+0); more precisely the evaluation of ptr+0 has undefined behaviour.
I'll cite my copy of the C++ which is not official [N3337], but probably the language has not changed; what bothers me slightly is that the section number does not at all match the one mentioned at the accepted answer of the linked question. Anyway, for me it is §5.7[expr.add]
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce overflow; otherwise the behavior is undefined.
Since the pointer operand ptr has type pointer to two_ints, the "array object" mentioned in the cited text would have to be an array of two_ints objects. However there is only one such object here, the fictive array whose unique element is arr that we are supposed to conjure up in such situations (as per: "pointer to nonarray object behaves the same as a pointer to the first element of an array of length one..."), but clearly ptr does not point to its unique element arr. So even though ptr and ptr+0 are no doubt equal values, neither of them point to elements of any array object at all (not even a fictive one), nor one past the end of such an array object, and the condition of the cited phrase is not met. The consequence is (not that overflow is produced, but) that behavior is undefined.
So behavior is already undefined before the indirection operator * is applied. I would not argue for undefined behavior from the latter evaluation, even though the phrase "the result is an lvalue referring to the object or function to which the expression points" is hard to interpret for expressions that do not refer to any object at all. But I would be lenient in interpreting this, since I think dereferencing a pointer past an array should not itself be undefined behavior (for instance if used to initialise a reference).
This would suggest that if instead of ptr[0][0] one wrote (*ptr)[0] or **ptr, then behaviour would not be undefined. This is curious, but it would not be the first time the C++ standard surprises me.
It depends on what you mean by "correct". You are doing a cast on the ptr to arr[1]. In C++ this will probably be a reinterpret_cast. C and C++ are languages which (most of the time) assume that the programmer knows what he is doing. That this code is buggy has nothing to do with the fact that it is valid C/C++ code.
You are not violating any rules in the standards (as far as I can see).
Trying to answer here why the code works on commonly used compilers:
int arr[2];
int (*ptr)[2] = (int (*)[2]) &arr[1];
printf("%p\n", (void*)ptr);
printf("%p\n", (void*)*ptr);
printf("%p\n", (void*)ptr[0]);
All lines print the same address on commonly used compilers. So, ptr is an object for which *ptr represents the same memory location as ptr on commonly used compilers and therefore ptr[0] is really a pointer to arr[1] and therefore arr[0][0] is arr[1]. So, the code assigns a value to arr[1].
Now, let's suppose a perverse implementation where a pointer to an array (NOTE: I'm saying pointer to an array, i.e. &arr which has the type int(*)[], not arr which means the same as &arr[0] and has the type int*) is the pointer to the second byte within the array. Then dereferencing ptr is the same as subtracting 1 from ptr using char* arithmetic. For structs and unions, it is guaranteed that pointer to such types is the same as pointer to the first element of such types, but in casting pointer to array into pointer no such guarantee was found for arrays (i.e. that pointer to an array would be the same as pointer to the first element of the array) and as a matter of fact #FUZxxl planned to file a defect report about the standard. For such a perverse implementation, *ptr i.e. ptr[0] would not be the same as &arr[1]. On RISC processors, it would as a matter of fact cause problems due to data alignment.
Some additional fun:
int arr[2] = {0, 0};
int *ptr = (int*)&arr;
ptr[0] = 5;
printf("%d\n", arr[0]);
Should that code work? It prints 5.
Even more fun:
int arr[2] = {0, 0};
int (*ptr)[3] = (int(*)[3])&arr;
ptr[0][0] = 6;
printf("%d\n", arr[0]);
Should this work? It prints 6.
This should obviously work:
int arr[2] = {0, 0};
int (*ptr)[2] = &arr;
ptr[0][0] = 7;
printf("%d\n", arr[0]);
I have been programming c/c++ for many years, but todays accidental discovery made me somewhat curious... Why does both outputs produce the same result in the code below? (arr is of course the address of arr[0], i.e. a pointer to arr[0]. I would have expected &arr to be the adress of that pointer, but it has the same value as arr)
int arr[3];
cout << arr << endl;
cout << &arr << endl;
Remark: This question was closed, but now it is opened again. (Thanks ?)
I know that &arr[0] and arr evaluates to the same number, but that is not my question! The question is why &arr and arr evaluates to the same number. If arr is a literal (not stored anyware), then the compiler should complain and say that arr is not an lvalue. If the address of the arr is stored somewhere then &arr should give me the address of that location. (but this is not the case)
if I write
const int* arr2 = arr;
then arr2[i]==arr[i] for any integer i, but &arr2 != arr.
#include <cassert>
struct foo {
int x;
int y;
};
int main() {
foo f;
void* a = &f.x;
void* b = &f;
assert(a == b);
}
For the same reason the two addresses a and b above are the same. The address of an object is the same as the address of its first member (Their types however, are different).
arr
_______^_______
/ \
| [0] [1] [2] |
--------------------+-----+-----+-----+--------------------------
some memory | | | | more memory
--------------------+-----+-----+-----+--------------------------
^
|
the pointers point here
As you can see in this diagram, the first element of the array is at the same address as the array itself.
They're not the same. They just are at the same memory location. For example, you can write arr+2 to get the address of arr[2], but not (&arr)+2 to do the same.
Also, sizeof arr and sizeof &arr are different.
The two have the same value but different types.
When it's used by itself (not the operand of & or sizeof), arr evaluates to a pointer to int holding the address of the first int in the array.
&arr evaluates to a pointer to array of three ints, holding the address of the array. Since the first int in the array has to be at the very beginning of the array, those addresses must be equal.
The difference between the two becomes apparent if you do some math on the results:
arr+1 will be equal to arr + sizeof(int).
((&arr) + 1) will be equal to arr + sizeof(arr) == arr + sizeof(int) * 3
Edit: As to how/why this happens, the answer is fairly simple: because the standard says so. In particular, it says (§6.3.2.1/3):
Except when it is the operand of the sizeof operator or the unary & operator, or is a
string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue.
[note: this particular quote is from the C99 standard, but I believe there's equivalent language in all versions of both the C and C++ standards].
In the first case (arr by itself), arr is not being used as the operand of sizeof, unary &, etc., so it is converted (not promoted) to the type "pointer to type" (in this case, "pointer to int").
In the second case (&arr), the name obviously is being used as the operand of the unary & operator -- so that conversion does not take place.
The address is the same but both expressions are different. They just start at the same memory location. The types of both expressions are different.
The value of arr is of type int * and the value of &arr is of type int (*)[3].
& is the address operator and the address of an object is a pointer to that object. The pointer to an object of type int [3] is of type int (*)[3]
They are not the same.
A bit more strict explanation:
arr is an lvalue of type int [3]. An attempt to use
arr in some expressions like cout << arr will result in lvalue-to-rvalue conversion which, as there are no rvalues of array type, will convert it to an rvalue of type int * and with the value equal to &arr[0]. This is what you can display.
&arr is an rvalue of type int (*)[3], pointing to the array object itself. No magic here :-) This pointer points to the same address as &arr[0] because the array object and its first member start in the exact same place in the memory. That's why you have the same result when printing them.
An easy way to confirm that they are different is comparing *(arr) and *(&arr): the first is an lvalue of type int and the second is an lvalue of type int[3].
Pointers and arrays can often be treated identically, but there are differences. A pointer does have a memory location, so you can take the address of a pointer. But an array has nothing pointing to it, at runtime. So taking the address of an array is, to the compiler, syntactically defined to be the same as the address of the first element. Which makes sense, reading that sentence aloud.
I found Graham Perks' answer to be very insightful, I even went ahead and tested this in an online compiler:
int main()
{
int arr[3] = {1,2,3};
int *arrPointer = arr; // this is equivalent to: int *arrPointer = &arr;
printf("address of arr: %p\n", &arr);
printf("address of arrPointer: %p\n", &arrPointer);
printf("arr: %p\n", arr);
printf("arrPointer: %p\n", arrPointer);
printf("*arr: %d\n", *arr);
printf("*arrPointer: %d\n", *arrPointer);
return 0;
}
Outputs:
address of arr: 0x7ffed83efbac
address of arrPointer: 0x7ffed83efba0
arr: 0x7ffed83efbac
arrPointer: 0x7ffed83efbac
*arr: 1
*arrPointer: 1
It seems the confusion was that arr and arrPointer are equivalent. However, as Graham Parks detailed in his answer, they are not.
Visually, the memory looks something like this:
[Memory View]
[memory address: value stored]
arrPointer:
0x7ffed83efba0: 0x7ffed83efbac
arr:
0x7ffed83efbac: 1
0x7ffed83efbb0: 2
0x7ffed83efbb4: 3
As you can see, arrPointer is a label for memory address 0x7ffed83efba0 which has 4 bytes of allocated memory which hold the memory address of arr[0].
On the other hand, arr is a label for memory address 0x7ffed83efbac, and as per Jerry Coffin's answer, since the type of variable arr is "array of type", it gets converted to a "pointer of type" (which points to the array's starting address), and thus printing arr yields 0x7ffed83efbac.
The key difference is arrPointer is an actual pointer and has its own memory slot allocated to hold the value of the memory it's pointing to, so &arrPointer != arrPointer. Since arr is not technically a pointer but an array, the memory address we see when printing arr is not stored elsewhere, but rather determined by the conversion mentioned above. So, the values (not types) of &arrPointer and arrPointer are equal.
What does this mean: that a pointer increment points to the address of the next base type of the pointer?
For example:
p1++; // p1 is a pointer to an int
Does this statement mean that the address pointed to by p1 should change to the address of the next int or it should just be incremented by 2 (assuming an int is 2 bytes), in which case the particular address may not contain an int?
I mean, if p1 is, say, 0x442012, will p1++ be 0x442014 (which may be part of the address of a double) or will it point to the next int which is in an address like 0x44201F?
Thanks
Pointer arithmetic doesn’t care about the content – or validity – of the pointee. It will simply increment the pointer address using the following formula:
new_value = reinterpret_cast<char*>(p) + sizeof(*p);
(Assuming a pointer to non-const – otherwise the cast wouldn’t work.)
That is, it will increment the pointer by an amount of sizeof(*p) bytes, regardless of things like pointee value and memory alignment.
The compiler will add sizeof(int) (usually 4) to the numeric value of the pointer. If p1 is 0x442012 before the increment, then after the increment it will be 0x442012 + 4 = 0x442016.
Mind you, 0x442012 is not a multiple of 4, so it is unlikely to be the address of a valid four-byte int, though it would be fine for your two-byte ints.
It certainly won't go looking for the next integer. That would require magic.
p1++ gives rise to assembly language instructions which increment p1 by the size of what it points to. So you get
(char *)p1 = (char *)p1 + sizeof (object pointed to by p1)
(When this question was answered) Typically an int is 4 bytes, so it would increment by 4, but it depends on the sizeof() on your machine.
It does not go to "the next int".
An example: assume a 4 byte address and p1 = 0x20424 (where p1 is an int*). Then
p1++
would set the new value of p1 to 0x20428. NOT 0x20425.
If p1 is pointing into the element of index n of an array of objects of type int (a non-array object counts as an array of length 1 for this purpose), then after p1++, p1 is either:
Pointing to the element of index n+1 if the array is of length greater than n+1.
The 'past-the-end' address of the array, if the array is of length exactly n+1.
p1++ causes undefined behavior if p1 is not pointing to an element of an array of objects of type int.
The only meaning that the C and C++ languages give to the notion of "address" is the value of a pointer object.
Any relationship that C/C++'s notion of address has to the notion of a numeric addresses you'd consider in assembly language is purely an implementation detail (albeit, an extremely common implementation detail).
Pointer arithmetic are done in sizoeof(*pointer) multiples - that is, for a pointer to int, increment will advance to the next integer (or 4 bytes for 32 bit integers).
At: http://www.fredosaurus.com/notes-cpp/arrayptr/26arraysaspointers.html
Under: Pointer addition and element size
There is the following code:
// Assume sizeof(int) is 4.
int b[100]; // b is an array of 100 ints.
int* p; // p is a a pointer to an int.
p = b; // Assigns address of first element of b. Ie, &b[0]
p = p + 1; // Adds 4 to p (4 == 1 * sizeof(int)). Ie, &b[1]
How did "p" in the last line become "4"?
Thanks.
(I assume that you mean "1" in the last line, not "p")
Pointer arithmetic in both C and C++ is a logical addition, not a numeric addition. Adding one to a pointer means "produce a pointer to the object that comes in memory right after this one," which means that the compiler automatically scales up whatever you're incrementing the pointer with by the size of the object being pointed at. This prevents you from having a pointer into the middle of an object, or a misaligned pointer, or both.
because p is pointer to a type with size 4 bytes. + operator on pointers is actually pointer shift. compiler knows the size of pointed type and shifts it by appropriate value
if you change int to short, p will be shifted by 2 bytes
The comment in the code you post it explains it: addition of an integer x to a pointer increases the pointer's value by x multiplied by the sizeof the type it is pointing to.
This is convenient because it doesn't usually make sense to change the pointer in smaller increments - you wouldn't want it to point into the middle of one of the elements.