I don't understand C++ pointer arithmetic

I don't understand C++ pointer arithmetic - c++

I have the following program which defines 2 integers and a pointer to an integer.
#include <stdio.h>
int main() {
int bla=999;
int a=42;
int* pa=&a;
printf("%d \n", *pa);
printf("%d \n", pa);
pa++;
//*pa=666; //runs (no error), but the console is showing nothing at all
printf("%d \n", *pa);
printf("%d \n", pa);
pa++;
//*pa=666; //runs and changes the value of *pa to 666;
printf("%d \n", *pa);
printf("%d \n", pa);
}
The output is:
42
2686740
2686744
2686744 //this value is strange, I think
999
2686748
The adresses are making sense to me, but the fourth value is strange, because it is exactly the adress of the int. Can somebody explain that behaviour ?
When I comment *pa=666 (the first apperance) in, the console shows nothing, so here is some sort of error, but the compiler does not show an error. Maybe this is because of the size of int on my system, I have a 64bit-windows-os, so maybe the int is 64 bit and not 32 ? And because of that the *pa-value is 999 after the second increment and not the first ?
I am sure, there are a lot of C-programmers out there who can explain what is going on :)

int* pa=&a;
pa is pointer to an integer and accessing *pa is defined.
Once you increment your pointer, then the pointer is pointing to some memory(after p) which is not allocated by you or not known to you so dereferencing it leads to undefined beahvior.
pa++;
*pa is UB
Edit:
Use proper format specifier to print the pointer value %p as pointed out by #haccks

The output is not strange, it is to be expected: You have three variables in main(), all of which are stored on the stack, and which happen to be right one after the other. One of these variables is the pointer itself. So, when you dereference the pointer in the third line, you get the current value of the pointer itself.
Nevertheless, this output is not predictable, it is undefined behavior: You are only allowed to use pointer arithmetic to access data within a single memory object, and in your case, the memory object is just a single int. Consequently, accessing *pa after the first pa++ is illegal, and the program is allowed to do anything from that point on.
More specifically, there is no guarantee which other variables follow a certain variable, in which order they follow, or if there is accessable memory at all. Even reading *pa after the first pa++ is allowed to crash your program. As you have witnessed, you will not experience a crash in many cases (which would be easy to debug), yet the code is still deeply broken.

You are using wrong format specifier to print address. This will invoke undefined behavior and once the UB is invoked, all bets are off. Use %p instead.
printf("%p \n", (void *)pa);
Another problem is that after execution of pa++;, you are accessing unallocated memory and another reason for UB.

You're not smarter than your compiler.
As said by another answer what you do is Undefined Behaviour. With pa you are just doing non-sense, it does'nt correspond to any reasonlable algorithm for a defined goal: it's non sense.
However I will propose you a possible scenario of what's happenning. Though much of it could be false because compilers do optimizations.
int bla=999;
int a=42;
int* pa=&a;
These variables are allocated on the stack.
When writing pa = &a you say "I want pointer pa to be equal to the address of a".
Probably the compiler could have allocated the memory in the order or declaration, which would give something like:
bla would have address 0x00008880
a would have address 0x00008884
pa would have address 0x00008888
when you do pa++ you're telling: move my pointer of int to the next position of int in memory.
As ints are 32 bits, you're doing pa = pa + 4bytes i.e. pa = 0x00008888
Notice that, by chance !,you're probably pointing to the address of the pa pointer.
So now the pointer pa contains its own address... which is pretty esoteric and could be called ouroboros.
Then you're asking again pa++... so pa = pa + 4 bytes i.e. pa = 0x0000888c
So now you are probably accessing an unknown memory zone. It could be an access violation. It's undefined behaviour if you ever want to read or write.

When you first assigned the pointer it pointed to 2686740. The pointer is an integer pointer and integers use 4 bytes (usually, on your machine it used 4). That means pa++ is going to increase the value to be 4 more which is 2686744. Doing it again resulted in 2686748
If you were to look at the resulting assembly code the order of your local variables would be switched around. The ordering was a, pa, bla when the code ran. Because you don't have explicit control over this ordering the output of your printing is considered to be undefined
After the first time you did pa++ the pointer pointed at itself, that is why you got the "strange value"
As mentioned by many of the other answers, this is not good use of pointers and should be avoided. You don't have control over what the pointer is pointing to in this situation. A much better use of pointer arithmetic would be pointing at the beginning of an array and then doing pa++ to point to the next element in the array. The only problem you could experience then would be incrementing past the last element of the array

Are you trying to increment the value in a through the pointer *pa?
If so, do: (*pa)++. The brackets are crucial as they mean "take the value of the pointer", then use that address to increment whatever it is referencing.
This is completely different to *pa++ which simply returns the value pointed to by *pa and then increments the pointer (not the thing that it is referencing).
One of the little traps of C syntax. K&R has a few pages devoted to this, I suggest you try some of the examples there.

Related

Can anybody explain why *var=i is valid [duplicate]

char a[] = "hello";
My understanding is that a acts like a constant pointer to a string. I know writing a++ won't work, but why?

No, it's not OK to increment an array. Although arrays are freely convertible to pointers, they are not pointers. Therefore, writing a++ will trigger an error.
However, writing
char *p = a;
p++;
is fine, becuase p is a pointer, with value equal to the location of a's initial element.

a++ is not well-formed since a decays to a pointer, and the result of the decay is not an lvalue (so there is no persistent object whose state could be "incremented").
If you want to manipulate pointers to the array, you should first create such a pointer:
char* p = a; // decayed pointer initializes p
a++; // OK
++a; // even OKer

This is a very good question actually. Before discussing this, let's back to the basic concepts.
What happens when we declare a variable ?
int a=10;
Well, we get a memory location to store the variable a. Apart from this an entry is created into Symbol table that contains the address of the variable and the name of the memory location (a in this case).
Once the entry is created, you can never change anything into the symbol table, means you can't update the address. Getting an address for a variable is not in our hand, it's done by our computer system.
Let's say, we get address 400 for our variable a.
Now computer has assigned an address for the variable a, so at a later point, we can't ask computer to change this address 400 because again, it's not in our hand, our computer system does it.
Now you have an idea about what happens when we declare a variable.let's come to our question.
Let's declare an array.
int arr[10]
So, when we declare this array, we create the entry into the symbol table and, store the address and the name of the array into the symbol table.
let's assume we get address 500 for this variable.
Let's see what happens when we want to do something like this :
arr++
when we increment arr, we want to increment 500, that is not possible and not in our hand, because it has been decided by the computer system, so we can't change it.
instead of doing this we can declare a pointer variable
int * p= &arr;
What happens in this situation is: again an entry is created into the symbol table that stores the name and the address of the pointer variable p.
So when we try to increment p by doing p++, we are not changing the value into the symbol table, instead we are changing the value of the address of the pointer variable, that we can do and we are allowed to do.
Also it's very obvious that if we will increment the a the ultimately we are going to loss the address of our array. if we loss the address of array then how will we access the array at a later point ?

It is never legal in C to assign to an expression of array type. Increment (++) involves assignment, and is thus also not legal.
What you showed at the top is a special syntax for initializing a char array variable.

I think this answer here explains "why" it's not a good idea;
It's because array is treated as a constant pointer in the function it is declared.
There is a reason for it. Array variable is supposed to point to the first element of the array or first memory instance of the block of the contiguous memory locations in which it is stored. So, if we will have the liberty to to change(increment or decrement ) the array pointer, it won't point to the first memory location of the block. Thus it will loose it's purpose.

Why is there a value printed and not NULL/0 after incrementing a pointer in C++?

I am fairly new to C++, so excuse if this is quite basic.
I am trying to understand the value printed after I increment my pointer in the following piece of code
int main()
{
int i = 5;
int* pointeri = &i;
cout << pointeri << "\n";
pointeri++;
i =7;
cout << *pointeri << "\n";
}
When I deference the pointer, it prints a random Integer. I am trying to understand, what is really happening here, why isn't the pointer pointing at NULL and does the random integer have a significance ?

The C++ language has a concept of Undefined Behavior. It means that it is possible to write code that does not constitute a valid program, and the compiler won't stop or even warn you. What such code does when executed is unknown.
Your program is a typical example. After the line int* pointeri = &i;, the pointer is pointing to the value i. After pointeri++ it is pointing to the memory location after the value i. What is stored at that location is unknown and the behavior of such code is undefined.
Needless to say, great care should be taken when coding in C++ in order to stay in the realm of defined behavior, in order to have meaningful and predictable results when running the program.

why isn't the pointer pointing at NULL
Because you haven't assigned or initialised the pointer to null.
and does the random integer have a significance ?
No.
Why is there a value printed ...
Because the behaviour of the program is undefined.

As you know, a "pointer" is simply an integer variable whose value is understood to be a memory address. If that value is zero, by convention we call it NULL and understand this to mean that "it doesn't point at anything." Otherwise, the value is presumed to be valid.
If you "increment" a pointer, its value is non-zero and therefore presumed to be valid. If you dereference it, you will either get "unpredictable data" or a memory-addressing fault.

Does free() operator delete the address from the dynamic variable?

Let's consider below program:
int main ()
{
int *p, *r;
p = (int*)malloc(sizeof(int));
cout<<"Addr of p = "<<p <<endl;
cout<<"Value of p = "<<*p <<endl;
free(p);
cout<<"After free(p)"<<endl;
r = (int*)malloc(sizeof(int));
cout<<"Addr of r = "<<r <<endl;
cout<<"Value of r = "<<*r <<endl;
*p = 100;
cout<<"Value of p = "<<*p <<endl;
cout<<"Value of r = "<<*r <<endl;
return 0;
}
Output:
Addr of p = 0x2f7630
Value of p = 3111728
free(p)
Addr of r = 0x2f7630
Value of r = 3111728
*p = 100
Value of p = 100
Value of r = 100
In the above code, p and r are dynamically created.
p is created and freed. r is created after p is freed.
On changing the value in p, r's value also gets changed. But I have already freed p's memory, then why on changing p's value, r's value also gets modified with the same value as that of p?
I have come to below conclusion. Please comment if I am right?
Explanation:
Pointer variables p and q are dynamically declared. Garbage values are stored initially. Pointer variable p is freed/deleted. Another pointer variable r is declared. The addresses allocated for r is same as that of p (p still points to the old address). Now if the value of p is modified, r’s value also gets modified with the same value as that of p (since both variables are pointing to the same address).
The operator free() only frees the memory address from the pointer variable and returns the address to the operating system for re-use, but the pointer variable (p in this case) still points to the same old address.

The free() function and the delete operator do not change the content of a pointer, as the pointer is passed by value.
However, the stuff in the location pointed to by the pointer may not be available after using free() or delete.
So if we have memory location 0x1000:
+-----------------+
0x1000 | |
| stuff in memory |
| |
+-----------------+
Lets assume that the pointer variable p contains 0x1000, or points to the memory location 0x1000.
After the call to free(p), the operating system is allowed to reuse the memory at 0x1000. It may not use it immediately or it could allocate the memory to another process, task or program.
However, the variable p was not altered, so it still points to the memory area. In this case, the variable p still has a value, but you should not dereference (use the memory) because you don't own the memory any more.

Your analysis is superficially close in some ways but not correct.
p and r are defined to be pointers in the first statement of main(). The are not dynamically created. They are defined as variables of automatic storage duration with main(), so they cease to exist when (actually if, in the case of your program) main() returns.
It is not p that is created and freed. malloc() dynamically allocates memory and, if it succeeds, returns a pointer which identifies that dynamically allocated memory (or a NULL pointer if the dynamic allocation fails) but does not initialise it. The value returned by malloc() is (after conversion into a pointer to int, which is required in C++) assigned to p.
Your code then prints the value of p.
(I have highlighted the next para in italic, since I'll refer back to it below).
The next statement prints the value of *p. Doing that means accessing the value at the address pointed to by p. However, that memory is uninitialised, so the result of accessing *p is undefined behaviour. With your implementation (compiler and library), at this time, that happens to result in a "garbage value", which is then printed. However, that behaviour is not guaranteed - it could actually do anything. Different implementations could give different results, such as abnormal termination (crash of your program), reformatting a hard drive, or [markedly less likely in practice] playing the song "Crash" by the Primitives through your computer's loud speakers.
After calling free(p) your code goes through a similar sequence with the pointer r.
The assignment *p = 100 has undefined behaviour, since p holds the value returned by the first malloc() call, but that has been passed to free(). So, as far as your program is concerned, that memory is no longer guaranteed to exist.
The first cout statement after that accesses *p. Since p no longer exists (having being passed to free()) that gives undefined behaviour.
The second cout statement after that accesses *r. That operation has undefined behaviour, for exactly the same reason I described in the italic paragraph above (for p, as it was then).
Note, however, that there have been five occurrences of undefined behaviour in your code. When even a single instance of undefined behaviour occurs, all bets are off for being able to predict behaviour of your program. With your implementation, the results happen to be printing p and r with the same value (since malloc() returns the same value 0x2f7630 in both cases), printing a garbage value in both cases, and then (after the statement *p = 100) printing the value of 100 when printing *p and *r.
However, none of those results are guaranteed. The reason for no guarantee is that the meaning of "undefined behaviour" in the C++ standard is that the standard describes no limits on what is permitted, so an implementation is free to do anything. Your analysis might be correct, for your particular implementation, at the particular time you compiled, linked, and ran your code. It might even be correct next week, but be incorrect a month from now after updating your standard library (e.g. applying bug fixes). It is probably incorrect for other implementations.
Lastly, a couple of minor points.
Firstly, your code is incomplete, and would not even compile in the form you have described it. In discussion above, I have assumed your code is actually preceded by
#include <iostream>
#include <cstdlib>
using namespace std;
Second, malloc() and free() are functions in the standard library. They are not operators.

Your analysis of what actually happened is correct; however, the program is not guaranteed to behave this way reliably. Every use of p after free(p) "provokes undefined behavior". (This also happens when you access *p and *r without having written anything there first.) Undefined behavior is worse than just producing an unpredictable result, and worse than just potentially causing the program to crash, because the compiler is explicitly allowed to assume that code that provokes undefined behavior will never execute. For instance, it would be valid for the compiler to treat your program as identical to
int main() {}
because there is no control flow path in your program that does not provoke undefined behavior, so it must be the case that the program will never run at all!

free() frees the heap memory to be re-used by OS. But the contents present in the memory address are not erased/removed.

Array values at negative one

Are there any repricussions having a value in an array stored at -1? could it affect the program or computer in a bad way? I am really curious, I'm new to programming and any clarification I can get really helps, thanks.

There's no way to store anything in an array object at index -1. A mere attempt to obtain a pointer to that non-existing element results in undefined behavior.
Negative indices (like -1) may appear in array-like contexts in situations when base pointer is not the array object itself, but rather an independent pointer pointing into the middle of another array object, as in
int a[10];
int *p = &a[5];
p[-1] = 42; // OK, sets `a[4]`
p[-2] = 5; // OK, sets `a[3]`
But any attempts to access non-existent elements before the beginning of the actual array result in undefined behavior
a[-1]; // undefined behavior
p[-6]; // undefined behavior

You see if you are trying to take element of array by pointing some value in brackets, basically you're specifying offset (multiplied by size of allocated type) from memory address. If you've allocated array in a typical way like int *a = new int[N], memory you're allowed to use is limited from address a until a + <size of memory allocated> (which in this case sizeof (int) * N), so by trying to get value with index -1 you are getting out of bounds of your array and it certainly will lead you to error or possible program crash.
There's of course a chance that your memory pointer is not the one at the beginning of some allocated sequence, like (considering previous example) int *b = a + 1, in this case you may (at least compiler allows that) take value of a[-1] and it would be valid, but since it's pretty hard to manage correctness of code like this I would still recommend against it.

Why is it allowed to cast a pointer to a reference?

Originally being the topic of this question, it emerged that the OP just overlooked the dereference. Meanwhile, this answer got me and some others thinking - why is it allowed to cast a pointer to a reference with a C-style cast or reinterpret_cast?
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
char& c2 = reinterpret_cast<char&>(pc);
}
The above code compiles without any warning or error (regarding the cast) on Visual Studio while GCC will only give you a warning, as shown here.
My first thought was that the pointer somehow automagically gets dereferenced (I work with MSVC normally, so I didn't get the warning GCC shows), and tried the following:
#include <iostream>
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
std::cout << *pc << "\n";
c1 = 'B';
std::cout << *pc << "\n";
}
With the very interesting output shown here. So it seems that you are accessing the pointed-to variable, but at the same time, you are not.
Ideas? Explanations? Standard quotes?

Well, that's the purpose of reinterpret_cast! As the name suggests, the purpose of that cast is to reinterpret a memory region as a value of another type. For this reason, using reinterpret_cast you can always cast an lvalue of one type to a reference of another type.
This is described in 5.2.10/10 of the language specification. It also says there that reinterpret_cast<T&>(x) is the same thing as *reinterpret_cast<T*>(&x).
The fact that you are casting a pointer in this case is totally and completely unimportant. No, the pointer does not get automatically dereferenced (taking into account the *reinterpret_cast<T*>(&x) interpretation, one might even say that the opposite is true: the address of that pointer is automatically taken). The pointer in this case serves as just "some variable that occupies some region in memory". The type of that variable makes no difference whatsoever. It can be a double, a pointer, an int or any other lvalue. The variable is simply treated as memory region that you reinterpret as another type.
As for the C-style cast - it just gets interpreted as reinterpret_cast in this context, so the above immediately applies to it.
In your second example you attached reference c to the memory occupied by pointer variable pc. When you did c = 'B', you forcefully wrote the value 'B' into that memory, thus completely destroying the original pointer value (by overwriting one byte of that value). Now the destroyed pointer points to some unpredictable location. Later you tried to dereference that destroyed pointer. What happens in such case is a matter of pure luck. The program might crash, since the pointer is generally non-defererencable. Or you might get lucky and make your pointer to point to some unpredictable yet valid location. In that case you program will output something. No one knows what it will output and there's no meaning in it whatsoever.
One can rewrite your second program into an equivalent program without references
int main(){
char* pc = new char('A');
char* c = (char *) &pc;
std::cout << *pc << "\n";
*c = 'B';
std::cout << *pc << "\n";
}
From the practical point of view, on a little-endian platform your code would overwrite the least-significant byte of the pointer. Such a modification will not make the pointer to point too far away from its original location. So, the code is more likely to print something instead of crashing. On a big-endian platform your code would destroy the most-significant byte of the pointer, thus throwing it wildly to point to a totally different location, thus making your program more likely to crash.

It took me a while to grok it, but I think I finally got it.
The C++ standard specifies that a cast reinterpret_cast<U&>(t) is equivalent to *reinterpret_cast<U*>(&t).
In our case, U is char, and t is char*.
Expanding those, we see that the following happens:
we take the address of the argument to the cast, yielding a value of type char**.
we reinterpret_cast this value to char*
we dereference the result, yielding a char lvalue.
reinterpret_cast allows you to cast from any pointer type to any other pointer type. And so, a cast from char** to char* is well-formed.

I'll try to explain this using my ingrained intuition about references and pointers rather than relying on the language of the standard.
C didn't have reference types, it only had values and pointer types (addresses) - since, physically in memory, we only have values and addresses.
In C++ we've added references to the syntax, but you can think of them as a kind of syntactic sugar - there is no special data structure or memory layout scheme for holding references.
Well, what "is" a reference from that perspective? Or rather, how would you "implement" a reference? With a pointer, of course. So whenever you see a reference in some code you can pretend it's really just a pointer that's been used in a special way: if int x; and int& y{x}; then we really have a int* y_ptr = &x; and if we say y = 123; we merely mean *(y_ptr) = 123;. This is not dissimilar from how, when we use C array subscripts (a[1] = 2;) what actually happens is that a is "decayed" to mean pointer to its first element, and then what gets executed is *(a + 1) = 2.
(Side note: Compilers don't actually always hold pointers behind every reference; for example, the compiler might use a register for the referred-to variable, and then a pointer can't point to it. But the metaphor is still pretty safe.)
Having accepted the "reference is really just a pointer in disguise" metaphor, it should now not be surprising that we can ignore this disguise with a reinterpret_cast<>().
PS - std::ref is also really just a pointer when you drill down into it.

Its allowed because C++ allows pretty much anything when you cast.
But as for the behavior:
pc is a 4 byte pointer
(char)pc tries to interpret the pointer as a byte, in particular the last of the four bytes
(char&)pc is the same, but returns a reference to that byte
When you first print pc, nothing has happened and you see the letter you stored
c = 'B' modifies the last byte of the 4 byte pointer, so it now points to something else
When you print again, you are now pointing to a different location which explains your result.
Since the last byte of the pointer is modified the new memory address is nearby, making it unlikely to be in a piece of memory your program isn't allowed to access. That's why you don't get a seg-fault. The actual value obtained is undefined, but is highly likely to be a zero, which explains the blank output when its interpreted as a char.

when you're casting, with a C-style cast or with a reinterpret_cast, you're basically telling the compiler to look the other way ("don't you mind, I know what I'm doing").
C++ allows you to tell the compiler to do that. That doesn't mean it's a good idea...

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js