What does this mean: that a pointer increment points to the address of the next base type of the pointer?
For example:
p1++; // p1 is a pointer to an int
Does this statement mean that the address pointed to by p1 should change to the address of the next int or it should just be incremented by 2 (assuming an int is 2 bytes), in which case the particular address may not contain an int?
I mean, if p1 is, say, 0x442012, will p1++ be 0x442014 (which may be part of the address of a double) or will it point to the next int which is in an address like 0x44201F?
Thanks
Pointer arithmetic doesn’t care about the content – or validity – of the pointee. It will simply increment the pointer address using the following formula:
new_value = reinterpret_cast<char*>(p) + sizeof(*p);
(Assuming a pointer to non-const – otherwise the cast wouldn’t work.)
That is, it will increment the pointer by an amount of sizeof(*p) bytes, regardless of things like pointee value and memory alignment.
The compiler will add sizeof(int) (usually 4) to the numeric value of the pointer. If p1 is 0x442012 before the increment, then after the increment it will be 0x442012 + 4 = 0x442016.
Mind you, 0x442012 is not a multiple of 4, so it is unlikely to be the address of a valid four-byte int, though it would be fine for your two-byte ints.
It certainly won't go looking for the next integer. That would require magic.
p1++ gives rise to assembly language instructions which increment p1 by the size of what it points to. So you get
(char *)p1 = (char *)p1 + sizeof (object pointed to by p1)
Typically (at the time this question was answered) an int is 4 bytes, so it would increment by 4, but it depends on sizeof(int) on your machine.
It does not go to "the next int".
An example: assume a 4-byte int and p1 = 0x20424 (where p1 is an int*). Then
p1++
would set the new value of p1 to 0x20428. NOT 0x20425.
If p1 is pointing into the element of index n of an array of objects of type int (a non-array object counts as an array of length 1 for this purpose), then after p1++, p1 is either:
Pointing to the element of index n+1 if the array is of length greater than n+1.
The 'past-the-end' address of the array, if the array is of length exactly n+1.
p1++ causes undefined behavior if p1 is not pointing to an element of an array of objects of type int.
The only meaning that the C and C++ languages give to the notion of "address" is the value of a pointer object.
Any relationship that C/C++'s notion of address has to the numeric addresses you'd consider in assembly language is purely an implementation detail (albeit an extremely common one).
Pointer arithmetic is done in multiples of sizeof(*pointer). That is, for a pointer to int, an increment advances to the next int position (4 bytes further for a 32-bit int).
I have the following question.
Given that a pointer holds the value of a memory address, why is it permitted to add an integer
data type value to a pointer variable but not a double data type?
My thoughts: Is it because we assume that the pointer is an int as well, or maybe because if we add a double will increase its length?
Thank you for your time.
You almost answered your question yourself: a pointer is a memory address. A memory address is an integer. You can add integers to integers and get integers as a result. Adding a float to an integer gives you a float, which cannot be used as a memory address.
For example, char *x = 0; is the address of a single byte; What would char *y = 0.5; mean? A byte that's somehow made up of the second half of the byte at address 0 and the first half of the byte at address 1?? This may make sense, but what about char *x = 3.1415926; or any similar floating-point number??
My thoughts: Is it because we assume that the pointer is an int as well, or maybe because if we add a double will increase its length?
If you look at the documentation, it says:
Certain addition, subtraction, increment, and decrement operators are defined for pointers to elements of arrays: such pointers satisfy the LegacyRandomAccessIterator requirements and allow the C++ library algorithms to work with raw arrays.
(emphasis is mine) and you should remember that:
*(ptr + 1)
is equal to:
ptr[1]
and array indexes are integers, so the language does not define operations on pointers with floating-point operands, as that would not make any sense.
You cannot add a double* (pointer) to an int* (pointer) under the conventions of C. A pointer holds a memory address (it "stores/points to the address of another variable"), and how that value is interpreted is determined by its type, in this case int (a 4-byte block of memory, if I recall). A double is a double-precision, 64-bit floating-point data type. It simply cannot be done, even at the most "hardware" of levels.
I was wondering how *(&array + 1) actually works. I saw this as an easy way to calculate the array length and want to understand it properly before using it. I'm not very experienced with pointer arithmetic, but with my understanding &array gives the address of the first element of the array. (&array + 1) would go to end of the array in terms of address. But shouldn't *(&array + 1) give the value, which is at this address. Instead it prints out the address. I would really appreciate your help to get the pointer stuff clear in my head.
Here is the simple example I'm working on:
int numbers[] = {5,8,9,3,4,6,1};
int length = *(&numbers + 1) - numbers;
(This answer is for C++.)
&numbers is a pointer to the array itself. It has type int (*)[7].
&numbers + 1 is a pointer to the byte right after the array, where another array of 7 ints would be located. It still has type int (*)[7].
*(&numbers + 1) dereferences this pointer, yielding an lvalue of type int[7] referring to the byte right after the array.
*(&numbers + 1) - numbers: Using the - operator forces both operands to undergo the array-to-pointer conversion, so pointers can be subtracted. *(&numbers + 1) is converted to an int* pointing at the byte after the array. numbers is converted to an int* pointing at the first byte of the array. Their difference is the number of ints between the two pointers---which is the number of ints in the array.
Edit: Although there's no valid object pointed to by &numbers + 1, this is what's called a "past the end" pointer. If p is a pointer to T, pointing to a valid object of type T, then it's always valid to compute p + 1, even if *p is a single object or the last object in an array. In that case, you get a "past the end" pointer, which does not point to a valid object, but is still a valid pointer. You can use this pointer for pointer arithmetic, and even dereference it to yield an lvalue, as long as you do not try to read or write through that lvalue. Note that you can only go one element past the end of an object; attempting to go any further leads to undefined behaviour.
The expression &numbers gives you the address of the array, not the first member (although numerically they are the same). The type of this expression is int (*)[7], i.e. a pointer to an array of size 7.
The expression &numbers + 1 adds sizeof(int[7]) bytes to the address of array. The resulting pointer points right after the array.
The problem however is when you then dereference this pointer with *(&numbers + 1). Dereferencing a pointer that points one element past the end of an array invokes undefined behavior.
The proper way to get the number of elements of an array is sizeof(numbers)/sizeof(numbers[0]). This assumes that the array was defined in the current scope and is not a parameter to a function.
but with my understanding &array gives the address of the first element of the array.
This understanding is misleading. &array gives the address of the array. Sure, the value of that address is the same as the address of the first element, but the type of the expression is different. The type of the expression &array is "pointer to array of N elements of type T" (where N is the length that you're looking for and T is int).
But shouldn't *(&array + 1) give the value, which is at this address.
Well yes... but it's here that the type of the expression becomes important. Indirecting a pointer to an array (rather than pointer to an element of the array) will result in the array itself.
In the subtraction expression, both array operands decay into pointer to first element. Since the subtraction uses decayed pointers, the unit of the pointer arithmetic is in terms of the element size.
I saw this as an easy way to calculate the array length
There are easier ways:
std::size(numbers)
And in C:
sizeof(numbers)/sizeof(numbers[0])
Signed overflow is undefined. Unsigned overflow is defined as modulo arithmetic.
So my question is, is the following defined or undefined:
#include <assert.h>
#include <cstdint>
#include <cstdio>
struct X { int x;/* ... anything ... */ };
X array[3] = { 2, 3, 4 /* or other init values that is compatible with X */};
X* element = array + 1;
std::uintptr_t plus1 = 1;
std::uintptr_t minus1 = 0-plus1;
int main()
{
printf("%p\n%p\n", element + plus1, array + 2);
printf("\n");
printf("%p\n%p\n", element + minus1, array);
assert(element + plus1 == array + 2);
assert(element + minus1 == array);
}
Though I state plus1/minus1, I really mean any +/- value. If I understand it correctly, this should work. Am I correct?
Pointer arithmetic is defined in the abstract machine.
In the abstract machine, ptr + x is only valid if ptr points into an array object and the result lies within that array or one past its end; in other words, ptr must be at least x elements away from the relevant edge.
This abstract machine does not care about the specific size of pointers or signed or unsigned integers. In this abstract machine, signed and unsigned integers have values that are real integers, or unspecified values.
minus1 with a 32 bit uintptr_t is equal to 0xffffffff, a large positive integer.
Does element point within an object that is large enough that 0xffffffff*sizeof(X) later it is still within the object? No it does not.
So element+minus1 is an undefined operation. Anything can happen.
On your hardware, a naive interpretation of pointer arithmetic into machine code may result in it wrapping around. But relying on this isn't safe.
For one thing, optimizers sometimes like proving things. If they prove element is greater than array's address, then no unsigned addition to element can possibly make it equal to array. So the compiler could optimize element+unsigned value == array to false.
Such optimizations can occur if you change optimization settings, upgrade your compiler, or completely innocuous things like change where it is inlined, or other code that is inlined, or heuristics at link time optimization, or the phase of the moon changing.
Relying on it working is dangerous, even when it does, as you now become responsible for auditing not the source code, but the machine code it generates.
Pointer arithmetic is only well defined as long as the pointers involved remain within a single array object or just past the end of the array. So technically, the expression element + minus1 is undefined behavior -- because minus1 is a very large value, it runs past the end of the array.
Now in practice, this is likely to work, but it is still technically undefined.
std::uintptr_t is an unsigned integer type.
std::intptr_t is a signed integer type.
And so overflow of std::uintptr_t is defined, whereas overflow of std::intptr_t leads to UB.
In addition, pointer arithmetic is only valid within array (to one past the end of an array).
minus1 is the greatest number that std::uintptr_t can hold.
From http://en.cppreference.com/w/cpp/language/operator_arithmetic
If the pointer P points to the ith element of an array, then the expressions P+n, n+P, and P-n are pointers of the same type that point to the i+nth, i+nth, and i-nth element of the same array, respectively. The result of pointer addition may also be a one-past-the-end pointer (that is, pointer P such that the expression P-1 points to the last element of the array). Any other situations (that is, attempts to generate a pointer that isn't pointing at an element of the same array or one past the end) invoke undefined behavior.
element + minus1 is not the same as element - 1.
element + minus1 is outside the valid range of the array, and so leads to UB.
Could anyone please elaborate on what is happening here?
int main()
{
int **p = 0;
//p=? and why| *p=? and why|**p=? and why
++p;
//p=? and why| *p=? and why|**p=? and why
printf("%d\n", p);
return 1;
}
output:-
4 (why?)
First of all, p is a pointer to a pointer-to-integer.
int **p = 0;
p = 0, *p = nothing, **p = less than nothing.
++p;
Same as p = p + 1. This moves p forward by the size of one pointer-to-int. A pointer is, at least on your OS, 32 bits (4 bytes) long, so p now points 4 bytes after 0. The value of p is 4.
p is a pointer to a pointer-to-int. It's being initialised to 0, i.e. it's a null pointer.
It's then being incremented to point at the next consecutive pointer-to-int in memory.* The next pointer will be at address 4, because on your platform the size of a pointer is 4 bytes.
Then printf interprets the pointer value as an integer, and so displays "4".
* Note, however, that this is now undefined behaviour.
It is clear. You have a pointer to a pointer to int (int **p means a pointer to a pointer to int) that actually holds the address 0. A pointer in itself, in your architecture, is 32 bits (4 bytes) long, so incrementing p gives you p+4, that is, 0+4 = 4.
Go get a nice C book and learn about pointer arithmetic. You'll be glad the rest of your life! :)
++p is actually undefined behaviour, but what appears to have happened on your implementation is that sizeof(int*) is 4, and a null pointer is address 0. Recall that pointer increment, when it's not UB, adds a number of bytes to the address equal to the size of the referand type. So it's not all that surprising that when you take a null pointer of type int** (hence the referand type is int*) and increment it, you end up at the address 4. It's just not guaranteed.
Passing a pointer when the %d format expects an int is also undefined behavior, but it appears that the representation of int and int** are sufficiently compatible, and the varargs calling convention on your implementation treats them sufficiently similarly, that it has successfully printed 4. That's also not very surprising for implementations where sizeof(int) == sizeof(int**), but also isn't guaranteed.
Of course since it's undefined behavior, there are other possible explanations for what you see.
p is a pointer to pointer to int. And it's initialized to 0, i.e. NULL.
When you increment it, it points to where the next pointer to int would be, and on 32-bit systems that address happens to be 4.
At: http://www.fredosaurus.com/notes-cpp/arrayptr/26arraysaspointers.html
Under: Pointer addition and element size
There is the following code:
// Assume sizeof(int) is 4.
int b[100]; // b is an array of 100 ints.
int* p; // p is a pointer to an int.
p = b; // Assigns address of first element of b. Ie, &b[0]
p = p + 1; // Adds 4 to p (4 == 1 * sizeof(int)). Ie, &b[1]
How did "p" in the last line become "4"?
Thanks.
(I assume that you mean "1" in the last line, not "p")
Pointer arithmetic in both C and C++ is a logical addition, not a numeric addition. Adding one to a pointer means "produce a pointer to the object that comes in memory right after this one," which means that the compiler automatically scales up whatever you're incrementing the pointer with by the size of the object being pointed at. This prevents you from having a pointer into the middle of an object, or a misaligned pointer, or both.
Because p is a pointer to a type whose size is 4 bytes. The + operator on pointers is really a pointer shift: the compiler knows the size of the pointed-to type and shifts the pointer by the appropriate number of bytes.
If you change int to short, p will be shifted by 2 bytes (assuming a 2-byte short).
The comment in the code you posted explains it: adding an integer x to a pointer increases the pointer's value by x multiplied by the sizeof the type it is pointing to.
This is convenient because it doesn't usually make sense to change the pointer in smaller increments - you wouldn't want it to point into the middle of one of the elements.