Pointer to deallocated variable changes address - c++

This code:
#include <iostream>
using namespace std;
int* fun()
{
int a = 5;
int* pointerA = &a;
cout << pointerA << endl;
return pointerA;
}
int main()
{
int* p = fun();
cout << p << endl;
return 0;
}
Prints the following:
0x[some address]
0
I understand that the variable a is deallocated when the function fun() returns, but why does cout << p << endl; return 0? Shouldn't it still point to the same address in memory, even though variable is technically not there anymore? Is this a compiler feature or undefined behavior?
EDIT: I found the culprit. I am using CodeBlocks, and in this project's build options, there is a flag "optimize even more (for speed) [-O2]". If it is checked, I get 0, and if I uncheck the flag, I get the same address 0x[some address], which is expected behavior.
I apologize for not mentioning my IDE.

Using the return value of fun has implementation-defined behavior, because fun returns an invalid pointer value (see the quote below for why). On some platforms it may even generate a runtime fault. So the value of p is implementation-defined as well. Most likely it becomes an invalid pointer value, so any use of it is implementation-defined.
basic.stc/4:
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.

It is probably a compiler feature. In this case it is very easy to see that the pointer returned by fun will be invalid and thus further usage of the pointer will result in undefined behaviour. If you try a different compiler it might be different. E.g. for me in Visual Studio 2012 it does return the actual address instead of 0.

Related

Constexpr version of function gives different result when called the same way

main.cpp:
#include <iostream>
static constexpr bool f1()
{
auto p = new int(1);
delete p;
auto q = new int(2);
delete q;
return p == q;
}
static bool f2() // Same body as f1
{
auto p = new int(1);
delete p;
auto q = new int(2);
delete q;
return p == q;
}
int main()
{
constexpr bool i1 = f1();
std::cout << i1 << std::endl;
auto i2 = f2();
std::cout << i2 << std::endl;
}
Compilation command line:
clang++ -std=c++20 -pedantic-errors main.cpp -o prog
Output from running prog (this is what I got, but may be different for you):
0
1
How is this possible? How is it even possible that I am allowed to define f1 that way given that it has unspecified behaviour?
Deleting a pointer invalidates it.
Any use of an invalid pointer value has implementation-defined behavior (except for indirecting through and passing to a deallocation function, which have undefined behaviour; neither is done in the example).
In the example that behaviour happened to be different in two slightly different cases.
How is this possible?
The compiler produced a program that outputs "0\n1". It is possible.
If you want to know if this conforms to the standard: Yes.
Whether this is intentional by the implementation... I suspect not directly, but rather by coincidence. My entirely hypothetical guess about the implementation:
There may be a piece of logic that sets invalid pointers to null. This has the useful side-effect that programs that have "use after free" bug (undefined behaviour) are less likely to read/write arbitrary memory (heap smashing) and instead avoid that due to null pointer check, or outright crash due to indirecting through null pointer. This potentially would reduce the severity of security vulnerabilities caused by such bug. As a side-effect, two unspecified values that would happen to be null pointers would also happen to compare equal.
But in constexpr case there may be another piece of logic which analyses that the pointers never point to same object and therefore are never equal, and constant-fold the return value as false before the null "protection" occurs.
Standard quote:
[basic.stc]
When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior. [footnote 31]
[footnote 31] Some implementations might define that copying an invalid pointer value causes a system-generated runtime fault.

Why is there a value printed and not NULL/0 after incrementing a pointer in C++?

I am fairly new to C++, so excuse if this is quite basic.
I am trying to understand the value printed after I increment my pointer in the following piece of code
int main()
{
int i = 5;
int* pointeri = &i;
cout << pointeri << "\n";
pointeri++;
i =7;
cout << *pointeri << "\n";
}
When I dereference the pointer, it prints a random integer. I am trying to understand what is really happening here: why isn't the pointer pointing at NULL, and does the random integer have any significance?
The C++ language has a concept of Undefined Behavior. It means that it is possible to write code that does not constitute a valid program, and the compiler won't stop or even warn you. What such code does when executed is unknown.
Your program is a typical example. After the line int* pointeri = &i;, the pointer is pointing to the value i. After pointeri++ it is pointing to the memory location after the value i. What is stored at that location is unknown and the behavior of such code is undefined.
Needless to say, great care should be taken when coding in C++ in order to stay in the realm of defined behavior, in order to have meaningful and predictable results when running the program.
why isn't the pointer pointing at NULL
Because you haven't assigned or initialised the pointer to null.
and does the random integer have a significance ?
No.
Why is there a value printed ...
Because the behaviour of the program is undefined.
As you know, a "pointer" is simply an integer variable whose value is understood to be a memory address. If that value is zero, by convention we call it NULL and understand this to mean that "it doesn't point at anything." Otherwise, the value is presumed to be valid.
If you "increment" a pointer, its value is non-zero and therefore presumed to be valid. If you dereference it, you will either get "unpredictable data" or a memory-addressing fault.

assembly code for a reference [duplicate]

Is there any way to find the address of a reference?
Making it more specific: The address of the variable itself and not the address of the variable it is initialized with.
References don't have their own addresses. Although references may be implemented as pointers, there is no need or guarantee of this.
The C++ FAQ says it best:
Unlike a pointer, once a reference is
bound to an object, it can not be
"reseated" to another object. The
reference itself isn't an object (it
has no identity; taking the address of
a reference gives you the address of
the referent; remember: the reference
is its referent).
Please also see my answer here for a comprehensive list of how references differ from pointers.
The reference is its referent
NO. There is no way to get the address of a reference.
That is because a reference is not an object, it is an alias (this means it is another name for an object).
int x = 5;
int& y = x;
std::cout << &x << " : " << &y << "\n";
This will print out the same address.
This is because 'y' is just another name (an alias) for the object 'x'.
The ISO standard says it best:
There shall be no references to references, no arrays of references, and no pointers to references.
I don't like the logic a lot of people are using here, that you can't do it because the reference isn't "guaranteed to be just a pointer somewhere anyway." Just as int x may be only a processor register with no address, but magically becomes a memory location when & x is used, it still may be possible for the compiler to allow what you want.
In the past, many compilers did allow exactly what you're asking for, eg
int x, y;
int &r = x;
&r = &y; // use address as an lvalue; assign a new referent
I just checked and GCC will compile it, but with a strongly worded warning, and the resulting program is broken.
No.
As Bjarne Stroustrup says in TC++PL, a reference can be thought of as just another name for an existing entity (object or function). While this is not always the most precise description of the underlying low-level mechanism that implements references, it is a very good description of the concept the references are intended to implement at the language level. Not surprisingly, the language provides no means to obtain the address of reference itself.
At language level reference is not guaranteed to occupy a place in storage, and therefore in general case it has no address.
Just use the '&' operator.
e.g :
int x = 3;
int &y = x;
cout<<&y<<endl;
This will print the address of x, since y is nothing more than another name for x.
From another instance of this same question: $8.3.2/3 - "It is unspecified whether or not a reference requires storage (3.7).".
So the C++ standard allows the compiler/runtime implementor to choose whether or not a reference lives in a separate memory location. However note that if it does live in a separate memory location, you can't find its address in a standard-compliant manner. So don't do it.
If you take an address of a reference, by definition in the C++ standard, you will get the address of what it refers to, rather than the address of the reference, if in fact that reference even exists as a separate entity in the runtime environment or not.
Not reliably, as references don't have to have a unique location in addressable memory.
Not by itself. If you want its "address", put it in a struct or class. Even then, that isn't guaranteed to get you anywhere near what you probably want, which is a pointer. If you want proof, sizeof applied to a reference yields the size of the referent type. Try it with char& and see.
It is possible, but not strictly using C++. Since the reference is passed as a parameter of a function, its value will be stored on the stack or in a register. This is hardware architecture dependent. Access to these values will require inline assembly. Consult the reference manual for the processor you are using to determine stack behavior and register addresses. Corrupting the stack or registers can very easily cause BSOD, data loss, or even permanent damage to your system. Proceed with extreme caution.
If you implement a reference as a member of a struct, you then can get its address:
struct TestRef{
int& r;
int i;
TestRef(int& ref): r(ref){
}
};
The reference is indeed a pointer (in my case, using the Xcode compiler), and you can update its value to re-assign the reference to a new variable.
To do so, we need to find the address of the reference and overwrite its value with the address of another variable.
The address of the reference TestRef.r is the address of the TestRef object, because r is the first member of TestRef.
You can re-assign the reference by updating the value stored in the memory of TestRef.r.
The code below shows that you can get the address of a reference and re-assign the reference to a different variable. Note: my OS is 64-bit (Xcode on a MacBook Pro 2015, macOS 10.15.1).
#include <cstdint>
#include <iostream>
using namespace std;
struct TestRef{
int& r;
int i;
TestRef(int& ref): r(ref){}
};
int main(int argc, const char * argv[]) {
int i = 10;
int j = 11;
TestRef r(i); // r.r is reference to i
cout << r.r << " " << i << " " << j << endl; // Output: 10 10 11
int64_t* p = (int64_t*)&r; // int32_t in 32 bit OS;
// Note:
// p is the address of TestRef r and also the address of the reference r.r
// *p is the address of i variable
//
// Difficult to understand? r.r indeed a pointer to i variable
// *p will return the address inside the memory of r.r
// that is the address of i variable
// this statement is true: *p == &i
// ------>
// now we change the value of *p to the address of j
// then r.r will be the reference of j instead the reference of i
*p = (int64_t)&j; // int32_t in 32 bit OS;
cout << r.r << " " << i << " " << j << endl; // Output: 11 10 11
return 0;
}
So in fact you can work around to re-assign a reference, like a hacker.

Returning a reference and returning a value

I am not quite sure that I have understood why there is a problem when we return the reference of a local random variable. So let's say that we have this example.
int *myFunc() {
int phantom = 4;
return &phantom;
}
Then the usual argument is that when the function is used, the memory of the variable phantom is no longer available after the execution of the code line int phantom = 4; so it cannot be returned (at least this is what I have understood so far). On the other hand, for the function,
int myFunc() {
int phantom = 4;
return phantom;
}
the value of the integer variable phantom will return. (I see the returning of the value as the dereferencing of an underlying pointer for the variable phantom).
What do I miss here?? Why in the first case there is a compilation error and in the second case everything works??
The first doesn't return a reference, it returns a pointer: a pointer to a local variable that goes out of scope once the function ends, leaving you with a stray pointer to a variable that doesn't exist anymore. That's why you get a compiler warning (usually not an actual error).
The second code copies the value. The local variable inside the function will never need to be referenced or used once the return statement finished.
You don't miss much. Only that in the first case there will be no compiler error.
[...] the memory of the variable phantom is no longer available after the execution of the code line int phantom = 4; so it cannot be returned
No, it can be returned and compilers may issue a warning for that but afaik no error. However, you should not!!
Btw, the memory is available, but it is undefined behaviour to access it after the function returned (not after the line int phantom = 4;).
In the second case:
I see the returning of the value as the dereferencing of an underlying pointer for the variable phantom
You are overcomplicating this. Returning a value from a function may be implemented by using a pointer, but that's an implementation detail. The only thing you have to care about here is that what is returned is the value. So no problems in the second case.
I am not quite sure that I have understood why there is a problem when
we return the reference of a local random variable
Because the C++ standard says it is undefined behaviour if you use such a function, and you want to avoid undefined behaviour in your program. What your program does should be determined by the rules of the C++ language, not be random.
Note that you return a pointer and not a reference. But it's undefined behaviour in both cases.
Then the usual argument is that when the function is used, the memory
of the variable phantom is no longer available after the execution of
the code line int phantom = 4; so it cannot be returned (at least this
is what I have understood so far).
This is an implementation-centric point of view and may aid you in understanding the problem.
Nevertheless, it's important to distinguish between the observable behaviour of a program and the compiler's internal tricks to produce that behaviour. You don't even know whether any memory is occupied by a variable anyway. Considering the "as-if" rule and compiler optimisations, the entire function may have been removed, even if the behaviour was defined. That's just one example of what might really happen behind the scenes.
But again, it's undefined behaviour anyway, so anything may happen.
The question is, then, why does the C++ standard not define a behaviour for the case when you return a pointer like this and then try to access the pointee? The answer to that would be that it doesn't make sense. The object named by the local variable phantom ends its life when the function returns. So you would have a pointer to something which no longer exists, yet it's still an int*, and dereferencing a non-nullptr int* should yield an int. That's a contradiction, and the C++ standard just does not bother to resolve such a meaningless situation.
Note how this observation is based on C++ language rules, not on compiler-implementation issues.
Why in the first case there is a compilation error and in the second
case everything works??
It's certainly a warning and not an error, unless your compiler options are such that every warning is turned into an error. The compiler must not reject the code, because it's not ill-formed.
Still, your compiler tries to be helpful in the first case, because it wants to keep you from creating a program with undefined behaviour.
In the second case, the behaviour is not undefined. Returning by value means that a copy is made of the object which you want to return. The copy is made before the original is destroyed, and the caller then receives that copy. That's not meaningless and not a contradiction in any way, so it's safe and defined behaviour.
In the first case, returning by value does not help you, because although the pointer itself is safely copied, its contents are what eventually causes undefined behaviour.
The first case
int* myFunc()
{
int phantom = 4;
return &phantom; // you are returning the address of phantom
} // but phantom will not "exist" outside of myfunc
Doesn't work because variable phantom is a local variable and it lives only during the execution of myfunc. After that, it's gone.
You are returning an address of a variable, that will practically "not exist" any more.
The rule: never return pointers or references to local variables.
This is OK:
int myFunc()
{
int phantom = 4;
return phantom; // you are returning by value;
// it doesn't matter where phantom "lives"
}
int main()
{
int x = myFunc(); // the value returned by myFunc will be copied to x
}
Try returning the pointer and dereference it to get the value.
#include <iostream>
using namespace std;
int *myFunc()
{
int number = 4;
int *phantom = &number;
return phantom;
}
int main()
{
cout << myFunc() << endl; //0x....
cout << *myFunc() << endl; // undefined behaviour: may appear to print 4, but nothing is guaranteed
return 0;
}

What is the result of the Reference Operator "&" on const variables?

I was asked how can a value of a const variable can be changed.
My obvious answer was "pointers!" but I tried the next piece of code and I'm puzzled...
int main()
{
const int x = 5;
int *ptr = (int *)(&x); // "Cast away" the const-ness..
cout << "Value at " << ptr << ":"<< (*ptr) <<endl;
*ptr = 6;
cout << "Now the value of "<< ptr << " is: " << (*ptr) <<endl;
cout << "But the value of x is still " << x <<endl;
return 0;
}
And the output was:
Value at <some address> :5
Now the value of <same address> is: 6
But the value of x is still 5
Now, I'm not sure exactly what is returned from '&x' but it's definitely not the actual address of x, since the value at x wasn't changed!
But on the other hand, ptr did contain the value of x at the beginning!
So, what is it exactly?
EDIT compiled with VS2010
Your program invokes undefined behavior (writing to a const variable through a pointer is undefined behavior), so anything might happen. That being said here's the most likely explanation why you get the behavior you see on your particular implementation:
When you do &x, you do get the address of x. When you do *ptr = 6, you do write 6 to x's memory location. However when you do cout << x, you don't actually read from x's memory location because your compiler optimized the code by replacing x with 5 here. Since x is const the compiler is allowed to do that since there is no legal C++ program in which doing so would change the program's behavior.
The compiler caches x in a register, so the value in memory changes, but the last print-out is still the same. Check the generated assembly (compile with -S on GCC or Clang, or /FA on MSVC).
First of all, this behavior is undefined. That said, here's what's probably going on:
When you do this:
int *ptr = (int *)(&x);
The 5 is stored at some address in memory. That's why the pointer seems to work properly (although writing through it after casting away the const is still undefined behavior).
However, due to compiler optimizations x = 5 is just inlined as a literal in the final print statement. The compiler thinks it's safe because x is declared const.
cout << "But the value of x is still " << x <<endl;
That's why you print out the original value 5.
Maybe you are experiencing a side effect of code optimization. Try running the same code with all optimization disabled, or check the generated asm. I guess the compiler is reusing the value it holds in some register throughout the function, since it relies on the const, so even if you actually change the value in memory, the changed value is not propagated. The reason for that, as Keith noted in the comments, is that you are playing with undefined behavior.
What is returned from &x is a pointer to const int (i.e. int const*). Now, pointers are indeed implemented as holding the address, but pointers are not addresses, and your example shows quite nicely why: the type of the pointer, even though not present at run time, still plays an important role.
In your case, you are casting away the const, and thus lying to the compiler: "this pointer points to a non-const int". However, the compiler knows from the declaration that the value of x cannot change (it was declared const), and freely makes use of that fact (and the standard allows it: your attempt to change it through a pointer to non-const int is undefined behaviour, and therefore the compiler is allowed to do anything).