Trying to access pointer after resetting - c++

While debugging an application and experimenting a bit, I came across quite strange behaviour that can be reproduced with the following code:
#include <iostream>
#include <memory>

int main()
{
    std::unique_ptr<int> p(new int);
    *p = 10;

    int& ref = *p;
    int* direct_p = &(*p);

    p.reset();

    std::cout << *p << "\n";        // a) SIGSEGV
    std::cout << ref << "\n";       // b) 0
    std::cout << *direct_p << "\n"; // c) 0
    return 0;
}
As I see it, all three variants should cause undefined behaviour. Keeping that in mind, I have these questions:
Why do ref and direct_p nevertheless point to zero, and not 10? (I mean, the mechanism of the int's destruction seems strange to me; what's the point of the compiler overwriting unused memory?)
Why don't b) and c) raise SIGSEGV?
Why does the behaviour of a) differ from b) and c)?

p.reset(); is equivalent to p.reset(nullptr);, so the unique_ptr's internal pointer is set to null. Consequently, doing *p ends up with the same result as dereferencing a raw pointer that's null.
On the other hand, ref and direct_p are still left pointing at the memory formerly occupied by that int. Trying to use them to read that memory gets into Undefined Behavior territory, so in principle we can't conclude anything...
But in practice, there are a few things we can make educated assumptions and guesses about.
Since that memory location was valid shortly before, it's most likely still present (hasn't been unmapped from the address space, or other such implementation-specific things) when your program accesses it through ref and direct_p. C++ doesn't demand that the memory should become completely inaccessible. So in this case you simply end up "successfully" reading whatever happens to be at that memory location at that point during the program's execution.
As for why the value happens to be 0, well there are a couple possibilities. One is that you could be running in a debug mode which purposefully zeroes out deallocated memory. Another possibility is that by the time you access that memory through ref and direct_p something else has already re-used it for a different purpose which ended up leaving it with that value. Your std::cout << *p << "\n"; line could potentially have done that.

Undefined behaviour does not mean that code must trigger an abnormal termination. It means that anything can happen. Abnormal termination is only one possible result. Inconsistency of behaviour between different instances of undefined behaviour is another. Another possibility (albeit rare in practice) is appearing to "work correctly" (however one defines "work correctly") until the next full moon, and then mysteriously behaving differently.
From a perspective of increasing average programmer skill and increasing software quality, electrocuting the programmer whenever they write code with undefined behaviour might be considered desirable.

As others have said undefined behavior means quite literally anything can happen. The code is unpredictable. But let me try to shed some light on question 'b' with an example.
SIGSEGV is a hardware fault reported by the MMU (memory management unit). The level of memory protection, and therefore how readily a SIGSEGV is raised, can depend greatly on the MMU your hardware is using (source). If your deallocated pointer happens to point to an OK address, you will be able to read the memory there; if it points somewhere bad, your MMU will freak out and raise a SIGSEGV in your program.
Take, for example, the MPC5200. This processor is quite old and has a somewhat rudimentary MMU; it can be quite difficult to get it to crash with a segfault.
For example the following will not necessarily cause a SIGSEGV on the MPC5200:
int *p = NULL;
*p;
*p = 1;
printf("%d", *p); // This actually prints 1 which is insane
The only way I could get this to throw a segfault was with the following code:
int *p = NULL;
while (true) {
    *(--p) = 1;
}
To wrap up, undefined behavior really does mean undefined.

Why do ref and direct_p nevertheless point to zero, and not 10? (I mean,
the mechanism of the int's destruction seems strange to me; what's the
point of the compiler overwriting unused memory?)
It's not the compiler; it's the C/C++ runtime libraries that change the memory. In your particular case, libc does something funny: when the value is freed, it reuses the start of the freed chunk for its own free-list bookkeeping, as a hardware watchpoint shows:
Hardware watchpoint 3: *direct_p
_int_free (have_lock=0, p=0x614c10, av=0x7ffff7535b20 <main_arena>) at malloc.c:3925
3925 while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2)) != old2);
Why don't b) and c) fire SIGSEGV?
SIGSEGV is raised by the kernel when an attempt is made to access memory outside the process's mapped address space. Normally, libc won't actually return pages to the kernel after you deallocate memory; that would be too expensive. So you are reading an address that libc has marked as free, but the kernel doesn't know that. You can use a debugging allocator that puts freed memory behind inaccessible guard pages (e.g. ElectricFence, great for debugging) to make such accesses fault immediately.
Why does the behaviour of a) differ from b) and c)?
You made the value of p point to some memory address, say 100. You then effectively created aliases for that memory location, so direct_p and ref both refer to address 100. Note that they aren't references to the variable p; they are references to the memory it pointed at, so changes you make to p have no effect on them. You then deallocated p, and its internal value became 0 (i.e. it now points to memory address 0). Attempting to read from address 0 reliably raises SIGSEGV on typical platforms. Reading from address 100 is a bad idea, but is not necessarily fatal (as explained above).

Related

accidental left shift of long long data type modifies seemingly unrelated variable

I have the following code in which I accidentally did left shifting instead of right shifting of the variable p.
However, when I ran the code, the pointer root was being reset to null (for input 2 and many others).
Shouldn't there be a segmentation fault, because of the array bits?
Could someone please explain this behaviour?
Thanks in advance.
#include <bits/stdc++.h>
using namespace std;

typedef struct $ {
    struct $* left;
    struct $* right;
    $() {
        left = NULL;
        right = NULL;
    }
} vertex;

int main()
{
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    long long p;
    cin >> p;

    bool bits[30];
    vertex* root = new vertex();
    cout << "root: " << root << endl;

    int j = 0;
    while (p) {
        bits[29 - j] = p & 1;
        j++;
        p <<= 1;
    }

    cout << "root: " << root << endl;
    return 0;
}
If you expect a segmentation fault in one situation or another, you will very often be disappointed. There is no guarantee of one; that would be too easy. Or to put it differently, guaranteeing it would be too costly.
Accessing an array outside of its range causes undefined behaviour and that means you are guaranteed that there is no guarantee.
https://en.wikipedia.org/wiki/Undefined_behavior
So because of UB, guessing at why anything in your case happens is actually wrong and does not really explain anything. It is kind of forbidden to even guess.
However, as long as you do not rely on it, here is some guessing, offered to minimise your frustration.
The way you define your variables is relevant.
bool bits[30];
vertex* root = new vertex();
In that order, many compilers/linkers will place the pointer root in memory at an address just below the array bits.
In that location it falls victim to being overwritten if you start writing at addresses below the array, which you do here
bits[29 - j] = p&1;
when j is greater than 29. And that is very likely to happen, because the loop condition is not guaranteed to stop before that.
By the way, even with a right shift, some long long inputs would require more than 29 shifts to reach 0.
In modern computer systems, memory is managed and allocated by the operating system in pages (usually of a size around 4 KB). A segmentation fault occurs when the memory management hardware (generally located within the CPU) catches a process accessing a memory page not belonging to it. This causes an interrupt, which the operating system then handles, usually by killing the violating process.
However, accessing data on your stack (whether on purpose or accidentally) cannot possibly cause a memory access violation, because - obviously - the stack is located within a memory page owned by you; otherwise you couldn't access data on the stack at all.
Notice, though, that if the top of the stack is near the border of two memory pages, it might be possible to cause a segmentation fault by accessing a memory location near the top of the stack (on the next page).

Accessing unavailable memory location in character array (Out of range value)

I have a piece of code:
#include <iostream>
using namespace std;

int main() {
    char a[5][5] = {{'x','y','z','a','v'}, {'d','g','h','v','x'}};
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 6; j++) {
            cout << a[i][j];
        }
    }
    return 0;
}
As you can see, the first and second dimensions are of size 5 elements each. With the double for loops I am just printing what was initialized for variable a.
As the bound on the inner index j increases, the output changes dramatically.
Why does this happen?
Are pointers the solution to this? If yes, how? If no, what can we do to avoid run-time errors caused by this incorrect access?
You might be treating this issue like an out-of-bounds error in Java, where the behavior is strictly defined: you'll get an ArrayIndexOutOfBoundsException, and the program will immediately terminate, unless the exception is caught and handled.
In C++, this kind of out-of-bounds error is undefined behavior, which means the compiler is allowed to do whatever silly thing it thinks will achieve the best performance. Generally speaking, this results in the compiler just blindly performing the same pointer arithmetic it would perform on array accesses that are in-bounds, regardless of whether the memory is valid or not.
In your case, because you've allocated 25 chars' worth of memory, you'll access valid memory (in most environments, UB notwithstanding) at least until i * 5 + j >= 25, at which point any number of things could happen:
You could get garbage data off the stack
You could crash the program with a Segmentation Fault (Access Violation in Windows/Visual Studio)
The loop could refuse to terminate at the index you expect it to terminate.
That last one is an incredible bug: If aggressive loop optimization is occurring, you could get some very odd behavior when you make mistakes like this in your code.
What's almost certainly happening in the code you wrote is a milder version of that first point, with one correction: though you allocated space for 25 chars, you listed initializers for only 10 of them, and in C++ the remaining elements of a partially initialized aggregate are value-initialized, i.e. set to '\0'. So in-bounds reads past the listed values print null characters, not garbage. The actual undefined behaviour here comes from a[i][5], which indexes one past the end of each row; in practice it typically reads whatever sits next in memory, such as the first element of the following row.

Undefined behaviour observed in C++/memory allocation

#include <iostream>
using namespace std;

int main()
{
    int a = 50;
    int b = 50;
    int *ptr = &b;
    ptr++;
    *ptr = 40;
    cout << "a= " << a << " b= " << b << endl;
    cout << "address a " << &a << " address b= " << &b << endl;
    return 0;
}
The above code prints :
a= 50 b= 50
address a 0x7ffdd7b1b710 address b= 0x7ffdd7b1b714
Whereas when I remove the following line from the above code
cout<<"address a "<<&a<<" address b= "<<&b<<endl;
I get output as
a= 40 b= 50
My understanding was that the stack grows downwards, so the second answer seems to be the correct one. I am not able to understand why the print statement would mess up the memory layout.
EDIT:
I forgot to mention, I am using 64 bit x86 machine, with OS as ubuntu 14.04 and gcc version 4.8.4
First of all, it's all undefined behavior. The C++ standard says that you can increment pointers only as long as you are in array boundaries (plus one element after), with some more exceptions for standard layout classes, but that's about it. So, in general, snooping around with pointers is uncharted territory.
Coming to your actual code: since you never ask for a's address, the compiler probably either left a in a register or even propagated it as a constant throughout the code. For this reason, a never touches the stack, and you cannot corrupt it using the pointer.
Note also that the compiler isn't required to place variables on the stack in declaration order; they are reordered however it sees fit, and they can even move within the stack frame (or be replaced) during the function. A seemingly small change to the function may make the compiler alter the stack layout completely. So even comparing the addresses as you did says nothing about the direction of stack growth.
UB: you have taken a pointer to b, moved that pointer with ptr++ so that it points at some unknown, unassigned memory, and then tried to write to that memory region, which causes undefined behaviour.
On VS 2008, stepping through this in the debugger will show you a message which is very self-explanatory.

Why do I get a random number when increasing the integer value of a pointer?

I am an expert C# programmer, but I am very new to C++. I get the basic idea of pointers just fine, but I was playing around. You can get the actual integer value of a pointer by casting it as an int:
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
Which makes sense; it's a memory address. But I can move to the next pointer, and cast it as an int:
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
int* jptr = (int*)((int)iptr + 1);
int j = (int)*iptr;
and I get a seemingly random number (although this would not make a good PRNG). What is this number? Is it another number used by the same process? Is it possibly from a different process? Is this bad practice, or disallowed? And if not, is there a use for this? It's kind of cool.
What is this number? Is it another number used by the same process? Is it possibly from a different process?
You cannot generally cast pointers to integers and back and expect them to be dereferencable. Integers are numbers. Pointers are pointers. They are totally different abstractions and are not compatible.
If integers are not large enough to be able to store the internal representation of pointers (which is likely the case; integers are usually 32 bits long and pointers are usually 64 bits long), or if you modify the integer before casting it back to a pointer, your program exhibits undefined behaviour and as such anything can happen.
See C++: Is it safe to cast pointer to int and later back to pointer again?
Is this bad practice, or disallowed?
Disallowed? Nah.
Bad practice? Terrible practice.
You move one byte past i's address (the +1 is applied to the cast integer value, not via pointer arithmetic) and print out the result, which might be part of another number stored in your program's address space. The value is unknown and this is undefined behaviour. There is also a good chance that you might get an error (meaning your program can blow up). Ever heard of SIGSEGV, the segmentation violation signal?
You are discovering that random places in memory contain "unknown" data. Not only that, but you may find yourself pointing to memory that your process does not have "rights" to so that even the act of reading the contents of an address can cause a segmentation fault.
In general, if you allocate some memory to a pointer (for example with malloc), you may look at those locations (which may contain stale data "from last time") and modify them. But data that does not belong to a pointer's block of memory can exhibit all kinds of undefined behaviour.
Incidentally, if you want to look at the "next" location, just do
NextValue = *(iptr + 1);
Don't do any casting; pointer arithmetic knows (in your case) exactly what the above means: "the contents of the next int-sized location".
int i = 5;
int* iptr = &i;
int ptrValue = (int)iptr;
int* jptr = (int*)((int)iptr + 1);
int j = (int)*iptr;
You can cast a pointer to an integer and back again, and it will give you the same value.
Is it possibly from a different process? No, it's not; you can't access the memory of another process, except via ReadProcessMemory and WriteProcessMemory under the Win32 API.
You get a different number because you added 1 to the pointer; subtract 1 and you will see the same value again.
When you define an integer by
int i = 5;
it means you allocate a space in your thread's stack and initialize it to 5. Then you take a pointer to this memory, which is actually a position in your current thread's stack.
When you increase your pointer by 1, it points to the next location in your thread's stack, and you parse that again as an integer:
int* jptr = (int*)((int)iptr + 1);
int j = (int)*jptr;
Then you will get an integer from your thread's stack which is close to where you defined your int i.
Of course this is not recommended, unless you want to become a hacker and exploit stack overflows (here it means what it says, not the site name, ha!).
Using a pointer to point to a random address is very dangerous. You must not point to an address unless you know what you're doing. You could overwrite its contents, or you may try to modify a constant in read-only memory, which leads to undefined behaviour.
Pointer arithmetic is useful, for example, when you want to retrieve the elements of an array. But don't cast the pointer to an integer; just point to the start of the array and increase the pointer by 1 to get the next element:
int arr[5] = {1, 2, 3, 4, 5};
int *p = arr;
printf("%d", *p); // this will print 1
p++; // pointer arithmetics
printf("%d", *p); // this will print 2
It's not "random". It just means that there is some data at the next address.
Reading a 32-bit word at address A will copy the 4 bytes at [A], [A+1], [A+2], [A+3] into a register. But if you dereference an int at [A+1], the CPU will load the bytes from [A+1] to [A+4]. Since the value at [A+4] is unknown, it may make you think the number is "random".
Anyway this is EXTREMELY dangerous 💀 since
the pointer is misaligned. You may see the program runs fine because x86 allows for unaligned accesses (with some performance penalty). But most other architectures prohibit unaligned operations and your program will just end in segmentation fault. For more information read Purpose of memory alignment, Data Alignment: Reason for restriction on memory address being multiple of data type size
you may not be allowed to touch the next byte, as it may be outside your address space, be write-only, or be in use by another variable whose value you would be changing, among other reasons. You'll also get a segfault in that case
the next byte may not be initialized, and reading it can crash your application on some architectures
That's why the C and C++ standard state that reading memory outside an array invokes undefined behavior. See
How dangerous is it to access an array out of bounds?
Access array beyond the limit in C and C++
Is accessing a global array outside its bound undefined behavior?

Allocating local variables on the stack & using pointer arithmetic

I read that in a function, local variables are placed on the stack as they are defined, after the parameters have been put there first.
This is also mentioned here:
5. All function arguments are placed on the stack.
6. The instructions inside of the function begin executing.
7. Local variables are pushed onto the stack as they are defined.
So I expect that if the C++ code is like this:
#include "stdafx.h"
#include <iostream>

int main()
{
    int a = 555;
    int b = 666;
    int *p = &a;
    std::cout << *(p + 1);
    return 0;
}
and if an int here has 4 bytes, and we call the memory location on the stack containing the first byte of int 555 x, then 'moving' another 4 bytes toward the top of the stack via *(p+1) we should be looking at memory at address x + 4.
However, the output of this is -858993460, and it is always like that no matter what value int b has. Evidently it's some standard value. Of course, I am accessing memory that I should not, as it belongs to the variable b. It was just an experiment.
How come I neither get the expected value nor an illegal access error?
Where is my assumption wrong?
What could -858993460 represent?
What everyone else has said (i.e. "don't do that") is absolutely true. Don't do that. However, to actually answer your question, p+1 is most likely pointing at either a pointer to the caller's stack frame or the return address itself. The system-maintained stack pointer is decremented when you push something on it. This is implementation dependent, officially speaking, but every stack pointer I've ever seen (this is since the 16-bit era) has been like this. Thus, if as you say, local variables are pushed on the stack as they are initialized, &a should == &b + 1.
Perhaps an illustration is in order. Suppose I compile your code for 32 bit x86 with no optimizations, and the stack pointer esp is 20 (this is unlikely, for the record) before I call your function. This is what memory looks like right before the line where you invoke cout:
4: 12 (value of p)
8: 666 (value of b)
12: 555 (value of a)
16: -858993460 (return address)
p+1, since p is an int*, is 16. The memory at this location isn't read protected because it's needed to return to the calling function.
Note that this answer is academic; it's possible that the compiler's optimizations or differences between processors caused the unexpected result. However, I would not expect p+1 to equal &b on any processor architecture with any calling convention I've ever seen, because the stack usually grows downward.
Your assumptions are true in theory (From the CS point of view).
In practice there is no guarantee to do pointer arithmetic in that way expecting those results.
For example, your assumption "all function arguments are placed on the stack" is not true: the allocation of function arguments is implementation-defined (depending on the architecture, it could use registers or the stack), and the compiler is also free to keep local variables in registers if it sees fit.
Also, the assumption "int size is 4 bytes, so adding 4 to the pointer reaches b" is false. The compiler could have added padding between a and b to ensure memory alignment.
The conclusion: don't use low-level tricks like this; they are implementation-defined. Even if you must (regardless of our advice) do it, you need to know how the compiler works and how it generates code.