After reading a lot of questions about null pointers, I am still confused about memory allocation for a null pointer.
Consider the following code:
#include <iostream>

int main() {
    int a=22;
    int *p=&a;          // now p points to a
    std::cout<<*p;      // outputs 22
    std::cout<<p;       // outputs the memory address of object a
    int *n=nullptr;     // pointer n is initialized to null
    std::cout<<n;
}
After compiling and running this code, printing pointer n outputs the literal constant 0. And if I try this:
std::cout<<*n;
This line compiles, but the program fails when it runs. What is wrong with this code? I expected it to print the memory location of this pointer.
std::cout<<p;
Does this output the location of the pointer in memory, or the location of the object it points to?
Many or all of these questions may already have been answered elsewhere, but somehow I am unable to understand the answers because I am a beginner in C++.
A null pointer doesn't point to anything. It doesn't contain a valid address but a "non-address". The concept is what matters; you shouldn't worry about the value it actually has.
The only thing that matters is that you can't dereference a null pointer, because doing so causes undefined behavior, and that's why your program fails at runtime (std::cout<<*n).
std::cout<<p;
In general, this outputs the value of the variable p; what that value means depends on p's type. In your case p is a pointer to int (int *), so its value is the address of an int. Since the pointer itself is an lvalue, you can take its address, so if you want to see where your pointer n is located in memory, just output its address:
std::cout << &n << std::endl;
As many other answers say, do not dereference a null pointer, as that leads to UB (undefined behavior). So again:
std::cout << n << std::endl; // value of pointer n, i.e. an address; in your case 0
std::cout << &n << std::endl; // address of pointer n itself; will not be 0
std::cout << *n << std::endl; // undefined behavior: you try to dereference a null pointer
If you want to see the address of nullptr itself, you cannot: it is a literal constant (a prvalue), not an lvalue, so it has no address:
std::cout << &nullptr << std::endl; // compile error, nullptr is not an lvalue
When you compile:
std::cout << *n;
The compiler will typically generate code something like this:
mov rax, qword ptr [rbp - 0x40]
mov esi, dword ptr [rax]
call cout
The first line loads the value of the pointer n, which is stored on the stack at [rbp - 0x40], into the CPU register RAX. In this case n holds the null pointer, whose value is 0, so RAX now contains 0.
The second line tries to read memory from the location (0) held in RAX. On a typical computer setup, memory location 0 is protected (it isn't a valid data memory location). This causes an invalid memory access and you get a crash*.
It never reaches the third line.
*However, this isn't necessarily true in all circumstances: on a microcontroller without an operating system, this might successfully read the value at memory location 0. However, *nullptr wouldn't be a good way of expressing this intention! See http://c-faq.com/null/machexamp.html for more discussion. If you want the full detail on nullptr: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2431.pdf
nullptr is a special value, selected in such a way that no valid pointer could get this value. On many systems, the value is equal to numeric zero, but it is not a good idea to think of nullptr in terms of its numeric value.
To understand the meaning of nullptr you should first consider the meaning of a pointer: it is a variable that refers to something else, which may also refer to nothing at all. You need to be able to distinguish the state "my pointer refers to something" from the state "my pointer refers to nothing at all". This is where nullptr comes in: if a pointer is equal to nullptr, you know that it references "nothing at all".
Note: dereferencing nullptr (i.e. applying the unary asterisk operator to it) is undefined behavior. It may crash, or it may print some value, but it would be a "garbage value".
Dereferencing a null pointer is undefined behavior, so anything at all can happen. But a null pointer still has to be stored somewhere in memory, and printing it just shows its value. Typically, compilers represent a null pointer as a value with all bits 0, which is the 0 you see.
Just because it's a golden quote, here's what Scott Meyers has to say about undefined behavior in his book Effective C++, 2nd Ed.:
"Nevertheless, there is something very troubling here. Your program's
behavior is undefined -- you have no way of knowing what will
happen... That means compilers may generate code to do whatever they
like: reformat your disk, send suggestive email to your boss, fax
source code to your competitors, whatever."
It is undefined behavior to dereference a null pointer. Any behavior, whether the compiler chooses it or it happens unintentionally, is valid.
Changing the program in other places may also change the behavior of this code line.
I suspect that your confusion revolves around the fact that there are two memory locations involved here.
In this code:
int *n=nullptr;// pointer n is initialized to null
There is one variable, n, and that variable occupies space in memory. You can take the address of n and prove this to yourself:
std::cout << &n << "\n";
You'll see that the address of n is something legitimate, i.e. not NULL.
n happens to be of type pointer-to-int, and its value is null. That means it doesn't point to anything at all; it's in a state where you can't dereference it.
But "dereference it" is exactly what you are doing here:
std::cout<<*n;
n itself is valid, but there is nothing for it to point to. That is why your program has undefined behavior.
I am fairly new to C++, so please excuse me if this is quite basic.
I am trying to understand the value printed after I increment my pointer in the following piece of code:
#include <iostream>
using namespace std;

int main()
{
int i = 5;
int* pointeri = &i;
cout << pointeri << "\n";
pointeri++;
i =7;
cout << *pointeri << "\n";
}
When I dereference the pointer, it prints a random integer. I am trying to understand what is really happening here: why isn't the pointer pointing at NULL, and does the random integer have any significance?
The C++ language has a concept of Undefined Behavior. It means that it is possible to write code that does not constitute a valid program, and the compiler won't stop or even warn you. What such code does when executed is unknown.
Your program is a typical example. After the line int* pointeri = &i;, the pointer points to the variable i. After pointeri++, it points to the memory location just past i. What is stored at that location is unknown, and the behavior of dereferencing such a pointer is undefined.
Needless to say, great care should be taken when coding in C++ to stay in the realm of defined behavior, so that running the program gives meaningful and predictable results.
why isn't the pointer pointing at NULL
Because you haven't assigned or initialised the pointer to null.
and does the random integer have a significance ?
No.
Why is there a value printed ...
Because the behaviour of the program is undefined.
As you know, a "pointer" is simply an integer variable whose value is understood to be a memory address. If that value is zero, by convention we call it NULL and understand this to mean that "it doesn't point at anything." Otherwise, the value is presumed to be valid.
If you "increment" a pointer, its value is non-zero and therefore presumed to be valid. If you dereference it, you will either get "unpredictable data" or a memory-addressing fault.
Is the following code guaranteed to work?
int* arr = new int[2];
std::cout << &arr[0x100];
Is this considered good practice or would it be cleaner to add an offset the regular way?
Edit: By "working" I mean that it should print the pointer to the theoretical member at 0x100. Basically, whether this is equivalent to "std::cout << ((unsigned int)arr + 0x100*sizeof(int));".
With my compiler (Cygwin GCC), taking the address of this element gives the same result as doing the pointer arithmetic, although both are undefined behavior (UB). As Jens mentioned in a comment below, I found the following from http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html helpful.
It is also worth pointing out that both Clang and GCC nail down a few behaviors that the C standard leaves undefined. The things I'll describe are both undefined according to the standard and treated as undefined behavior by both of these compilers in their default modes.
Dereferences of Wild Pointers and Out of Bounds Array Accesses: Dereferencing random pointers (like NULL, pointers to free'd memory, etc) and the special case of accessing an array out of bounds is a common bug in C applications which hopefully needs no explanation. To eliminate this source of undefined behavior, array accesses would have to each be range checked, and the ABI would have to be changed to make sure that range information follows around any pointers that could be subject to pointer arithmetic. This would have an extremely high cost for many numerical and other applications, as well as breaking binary compatibility with every existing C library.
The pointer arithmetic is also UB. So you have an address, but you cannot dereference the pointer to it, so there is really no use in having this address at all. Even just computing the address is UB and should not be done in code.
See this answer for out-of-bounds pointers:
Why is out-of-bounds pointer arithmetic undefined behaviour?
My sample code:
#include <iostream>

int main() {
int* arr = new int[2];
std::cout << arr << std::endl;
std::cout << &(arr[0])<< std::endl;
std::cout << &(arr[1])<< std::endl;
std::cout << &arr[0x100] << std::endl; // UB, cannot be dereferenced
std::cout << &arr[256] << std::endl; // same as 0x100: UB, so no use in having it
std::cout << arr + 0x100 << std::endl; // UB here too, no use in having this address
delete[] arr;
}
Sample Output:
0x60003ae50
0x60003ae50
0x60003ae54
0x60003b250
0x60003b250
0x60003b250
In the first line you allocate 2 integer values. In the second line, you access memory outside this range. This is not allowed at all.
Edit: Some interesting comments here. But I don't understand why it should be necessary to cite the standard for such a simple answer, or why pointer arithmetic is discussed here so much.
From a logical view, std::cout << &arr[0x100] consists of 3 steps:
1. access the non-existent member of the array
2. get the address of the non-existent member
3. use the address of the non-existent member
If the first step is invalid, aren't all the following undefined?
Consider the program below:
int main ()
{
int *p, *r;
p = (int*)malloc(sizeof(int));
cout<<"Addr of p = "<<p <<endl;
cout<<"Value of p = "<<*p <<endl;
free(p);
cout<<"After free(p)"<<endl;
r = (int*)malloc(sizeof(int));
cout<<"Addr of r = "<<r <<endl;
cout<<"Value of r = "<<*r <<endl;
*p = 100;
cout<<"Value of p = "<<*p <<endl;
cout<<"Value of r = "<<*r <<endl;
return 0;
}
Output:
Addr of p = 0x2f7630
Value of p = 3111728
free(p)
Addr of r = 0x2f7630
Value of r = 3111728
*p = 100
Value of p = 100
Value of r = 100
In the above code, p and r are dynamically created.
p is created and freed. r is created after p is freed.
On changing the value in p, r's value also gets changed. But I have already freed p's memory, so why does changing p's value also change r's value to the same value?
I have come to the conclusion below. Please comment on whether I am right.
Explanation:
Pointer variables p and r are dynamically declared. Garbage values are stored initially. Pointer variable p is freed/deleted. Another pointer variable r is declared. The address allocated for r is the same as that of p (p still points to the old address). Now if the value of p is modified, r's value also gets modified with the same value as that of p (since both variables point to the same address).
The operator free() only frees the memory address from the pointer variable and returns the address to the operating system for re-use, but the pointer variable (p in this case) still points to the same old address.
The free() function and the delete operator do not change the content of a pointer, as the pointer is passed by value.
However, the stuff in the location pointed to by the pointer may not be available after using free() or delete.
So if we have memory location 0x1000:
+-----------------+
0x1000 | |
| stuff in memory |
| |
+-----------------+
Let's assume that the pointer variable p contains 0x1000, i.e. it points to the memory location 0x1000.
After the call to free(p), the operating system is allowed to reuse the memory at 0x1000. It may not use it immediately or it could allocate the memory to another process, task or program.
However, the variable p was not altered, so it still points to the memory area. In this case, the variable p still has a value, but you should not dereference (use the memory) because you don't own the memory any more.
Your analysis is superficially close in some ways but not correct.
p and r are defined to be pointers in the first statement of main(). They are not dynamically created. They are defined as variables of automatic storage duration within main(), so they cease to exist when (actually if, in the case of your program) main() returns.
It is not p that is created and freed. malloc() dynamically allocates memory and, if it succeeds, returns a pointer which identifies that dynamically allocated memory (or a NULL pointer if the dynamic allocation fails) but does not initialise it. The value returned by malloc() is (after conversion into a pointer to int, which is required in C++) assigned to p.
Your code then prints the value of p.
(I have highlighted the next paragraph in italics, since I'll refer back to it below.)
The next statement prints the value of *p. Doing that means accessing the value at the address pointed to by p. However, that memory is uninitialised, so the result of accessing *p is undefined behaviour. With your implementation (compiler and library), at this time, that happens to result in a "garbage value", which is then printed. However, that behaviour is not guaranteed - it could actually do anything. Different implementations could give different results, such as abnormal termination (crash of your program), reformatting a hard drive, or [markedly less likely in practice] playing the song "Crash" by the Primitives through your computer's loud speakers.
After calling free(p) your code goes through a similar sequence with the pointer r.
The assignment *p = 100 has undefined behaviour, since p holds the value returned by the first malloc() call, but that has been passed to free(). So, as far as your program is concerned, that memory is no longer guaranteed to exist.
The first cout statement after that accesses *p. Since the memory p points to no longer exists (having been passed to free()), that gives undefined behaviour.
The second cout statement after that accesses *r. That operation has undefined behaviour, for exactly the same reason I described in the italic paragraph above (for p, as it was then).
Note, however, that there have been five occurrences of undefined behaviour in your code. When even a single instance of undefined behaviour occurs, all bets are off for being able to predict behaviour of your program. With your implementation, the results happen to be printing p and r with the same value (since malloc() returns the same value 0x2f7630 in both cases), printing a garbage value in both cases, and then (after the statement *p = 100) printing the value of 100 when printing *p and *r.
However, none of those results are guaranteed. The reason for no guarantee is that the meaning of "undefined behaviour" in the C++ standard is that the standard describes no limits on what is permitted, so an implementation is free to do anything. Your analysis might be correct, for your particular implementation, at the particular time you compiled, linked, and ran your code. It might even be correct next week, but be incorrect a month from now after updating your standard library (e.g. applying bug fixes). It is probably incorrect for other implementations.
Lastly, a couple of minor points.
Firstly, your code is incomplete, and would not even compile in the form you have described it. In discussion above, I have assumed your code is actually preceded by
#include <iostream>
#include <cstdlib>
using namespace std;
Second, malloc() and free() are functions in the standard library. They are not operators.
Your analysis of what actually happened is correct; however, the program is not guaranteed to behave this way reliably. Every use of p after free(p) "provokes undefined behavior". (This also happens when you access *p and *r without having written anything there first.) Undefined behavior is worse than just producing an unpredictable result, and worse than just potentially causing the program to crash, because the compiler is explicitly allowed to assume that code that provokes undefined behavior will never execute. For instance, it would be valid for the compiler to treat your program as identical to
int main() {}
because there is no control flow path in your program that does not provoke undefined behavior, so it must be the case that the program will never run at all!
free() releases the heap memory so it can be reused, but the contents at that memory address are not erased.
I have the following program which defines 2 integers and a pointer to an integer.
#include <stdio.h>
int main() {
int bla=999;
int a=42;
int* pa=&a;
printf("%d \n", *pa);
printf("%d \n", pa);
pa++;
//*pa=666; //runs (no error), but the console is showing nothing at all
printf("%d \n", *pa);
printf("%d \n", pa);
pa++;
//*pa=666; //runs and changes the value of *pa to 666;
printf("%d \n", *pa);
printf("%d \n", pa);
}
The output is:
42
2686740
2686744
2686744 //this value is strange, I think
999
2686748
The addresses make sense to me, but the fourth value is strange, because it is exactly the address of the int. Can somebody explain that behaviour?
When I uncomment *pa=666 (the first appearance), the console shows nothing, so there is some sort of error, but the compiler does not report one. Maybe this is because of the size of int on my system: I have 64-bit Windows, so maybe int is 64 bits and not 32? And is that why the value of *pa is 999 after the second increment and not the first?
I am sure, there are a lot of C-programmers out there who can explain what is going on :)
int* pa=&a;
pa is a pointer to an integer, and accessing *pa is defined.
Once you increment your pointer, it points to some memory (after a) that was not allocated by you and is not known to you, so dereferencing it leads to undefined behavior.
pa++;
*pa is UB
Edit:
Use the proper format specifier, %p, to print the pointer value, as pointed out by @haccks.
The output is not strange, it is to be expected: You have three variables in main(), all of which are stored on the stack, and which happen to be right one after the other. One of these variables is the pointer itself. So, when you dereference the pointer in the third line, you get the current value of the pointer itself.
Nevertheless, this output is not predictable, it is undefined behavior: You are only allowed to use pointer arithmetic to access data within a single memory object, and in your case, the memory object is just a single int. Consequently, accessing *pa after the first pa++ is illegal, and the program is allowed to do anything from that point on.
More specifically, there is no guarantee which other variables follow a certain variable, in which order they follow, or if there is accessible memory at all. Even reading *pa after the first pa++ is allowed to crash your program. As you have witnessed, you will not experience a crash in many cases (which would be easy to debug), yet the code is still deeply broken.
You are using the wrong format specifier to print an address. This invokes undefined behavior, and once UB is invoked, all bets are off. Use %p instead:
printf("%p \n", (void *)pa);
Another problem is that after pa++; executes, you are accessing memory you did not allocate, which is another source of UB.
You're not smarter than your compiler.
As another answer said, what you are doing is undefined behaviour. What you do with pa is just nonsense: it doesn't correspond to any reasonable algorithm for a defined goal.
However, I will propose a possible scenario of what's happening, though much of it could be wrong because compilers perform optimizations.
int bla=999;
int a=42;
int* pa=&a;
These variables are allocated on the stack.
When writing pa = &a you say "I want pointer pa to be equal to the address of a".
The compiler may have allocated the memory in the order of declaration, which would give something like:
bla would have address 0x00008880
a would have address 0x00008884
pa would have address 0x00008888
When you do pa++, you're saying: move my int pointer to the next int position in memory.
As ints are 32 bits, you're doing pa = pa + 4 bytes, i.e. pa = 0x00008888.
Notice that, by chance, you're probably now pointing to the address of the pa pointer itself.
So now the pointer pa contains its own address... which is pretty esoteric and could be called an ouroboros.
Then you do pa++ again, so pa = pa + 4 bytes, i.e. pa = 0x0000888c.
So now you are probably accessing an unknown memory zone. It could be an access violation. Reading or writing there is undefined behaviour.
When you first assigned the pointer, it pointed to 2686740. The pointer is an int pointer, and ints usually use 4 bytes (on your machine they used 4). That means pa++ increases the value by 4, to 2686744. Doing it again resulted in 2686748.
If you were to look at the resulting assembly code, you would see that the order of your local variables was switched around: the ordering was a, pa, bla when the code ran. Because you don't have explicit control over this ordering, the output of your printing is considered to be undefined.
After the first time you did pa++, the pointer pointed at itself; that is why you got the "strange value".
As mentioned by many of the other answers, this is not a good use of pointers and should be avoided. You don't have control over what the pointer is pointing to in this situation. A much better use of pointer arithmetic would be pointing at the beginning of an array and then doing pa++ to point to the next element in the array. The only problem you could experience then would be incrementing past the last element of the array.
Are you trying to increment the value in a through the pointer *pa?
If so, do: (*pa)++. The parentheses are crucial, as they mean "take the value of the pointer", then use that address to increment whatever it is referencing.
This is completely different from *pa++, which simply returns the value pointed to by pa and then increments the pointer (not the thing that it is referencing).
One of the little traps of C syntax. K&R has a few pages devoted to this, I suggest you try some of the examples there.
This was originally the topic of another question, but it emerged that the OP had just overlooked the dereference. Meanwhile, this answer got me and some others thinking: why is it allowed to cast a pointer to a reference with a C-style cast or reinterpret_cast?
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
char& c2 = reinterpret_cast<char&>(pc);
}
The above code compiles without any warning or error (regarding the cast) on Visual Studio while GCC will only give you a warning, as shown here.
My first thought was that the pointer somehow automagically gets dereferenced (I normally work with MSVC, so I didn't get the warning GCC shows), so I tried the following:
#include <iostream>
int main() {
char c = 'A';
char* pc = &c;
char& c1 = (char&)pc;
std::cout << *pc << "\n";
c1 = 'B';
std::cout << *pc << "\n";
}
With the very interesting output shown here. So it seems that you are accessing the pointed-to variable, but at the same time, you are not.
Ideas? Explanations? Standard quotes?
Well, that's the purpose of reinterpret_cast! As the name suggests, the purpose of that cast is to reinterpret a memory region as a value of another type. For this reason, using reinterpret_cast you can always cast an lvalue of one type to a reference of another type.
This is described in 5.2.10/10 of the language specification. It also says there that reinterpret_cast<T&>(x) is the same thing as *reinterpret_cast<T*>(&x).
The fact that you are casting a pointer in this case is totally and completely unimportant. No, the pointer does not get automatically dereferenced (taking into account the *reinterpret_cast<T*>(&x) interpretation, one might even say that the opposite is true: the address of that pointer is automatically taken). The pointer in this case serves as just "some variable that occupies some region in memory". The type of that variable makes no difference whatsoever. It can be a double, a pointer, an int or any other lvalue. The variable is simply treated as memory region that you reinterpret as another type.
As for the C-style cast - it just gets interpreted as reinterpret_cast in this context, so the above immediately applies to it.
In your second example you attached reference c1 to the memory occupied by pointer variable pc. When you did c1 = 'B', you forcefully wrote the value 'B' into that memory, thus completely destroying the original pointer value (by overwriting one byte of that value). Now the destroyed pointer points to some unpredictable location. Later you tried to dereference that destroyed pointer. What happens in such a case is a matter of pure luck. The program might crash, since the pointer is generally non-dereferenceable. Or you might get lucky and your pointer might point to some unpredictable yet valid location. In that case your program will output something. No one knows what it will output, and there's no meaning in it whatsoever.
One can rewrite your second program into an equivalent program without references:
#include <iostream>

int main(){
char* pc = new char('A');
char* c = (char *) &pc;
std::cout << *pc << "\n";
*c = 'B';
std::cout << *pc << "\n";
}
From the practical point of view, on a little-endian platform your code would overwrite the least-significant byte of the pointer. Such a modification will not move the pointer too far away from its original location, so the code is more likely to print something than to crash. On a big-endian platform your code would destroy the most-significant byte of the pointer, sending it wildly off to a totally different location and making your program more likely to crash.
It took me a while to grok it, but I think I finally got it.
The C++ standard specifies that a cast reinterpret_cast<U&>(t) is equivalent to *reinterpret_cast<U*>(&t).
In our case, U is char, and t is char*.
Expanding those, we see that the following happens:
we take the address of the argument to the cast, yielding a value of type char**.
we reinterpret_cast this value to char*
we dereference the result, yielding a char lvalue.
reinterpret_cast allows you to cast from any pointer type to any other pointer type. And so, a cast from char** to char* is well-formed.
I'll try to explain this using my ingrained intuition about references and pointers rather than relying on the language of the standard.
C didn't have reference types, it only had values and pointer types (addresses) - since, physically in memory, we only have values and addresses.
In C++ we've added references to the syntax, but you can think of them as a kind of syntactic sugar - there is no special data structure or memory layout scheme for holding references.
Well, what "is" a reference from that perspective? Or rather, how would you "implement" a reference? With a pointer, of course. So whenever you see a reference in some code you can pretend it's really just a pointer that's been used in a special way: if int x; and int& y{x}; then we really have an int* y_ptr = &x; and if we say y = 123; we merely mean *(y_ptr) = 123;. This is not dissimilar from how, when we use C array subscripts (a[1] = 2;), what actually happens is that a is "decayed" to mean a pointer to its first element, and then what gets executed is *(a + 1) = 2.
(Side note: Compilers don't actually always hold pointers behind every reference; for example, the compiler might use a register for the referred-to variable, and then a pointer can't point to it. But the metaphor is still pretty safe.)
Having accepted the "reference is really just a pointer in disguise" metaphor, it should now not be surprising that we can ignore this disguise with a reinterpret_cast<>().
PS - std::ref is also really just a pointer when you drill down into it.
It's allowed because C++ allows pretty much anything when you cast.
But as for the behavior:
pc is a 4-byte pointer (on a 32-bit platform)
(char)pc tries to interpret the pointer as a byte, in particular the least significant of the four bytes (on a little-endian machine)
(char&)pc is the same, but returns a reference to that byte
When you first print *pc, nothing has happened yet and you see the letter you stored
c1 = 'B' modifies that byte of the 4-byte pointer, so it now points to something else
When you print again, you are now pointing to a different location, which explains your result.
Since only the least significant byte of the pointer is modified, the new memory address is nearby, making it unlikely to be in a piece of memory your program isn't allowed to access. That's why you don't get a segfault. The actual value obtained is undefined, but is highly likely to be zero, which explains the blank output when it's interpreted as a char.
When you're casting, with a C-style cast or with a reinterpret_cast, you're basically telling the compiler to look the other way ("don't you mind, I know what I'm doing").
C++ allows you to tell the compiler to do that. That doesn't mean it's a good idea...