Undefined behaviour observed in C++/memory allocation - c++

#include <iostream>
using namespace std;
int main()
{
int a=50;
int b=50;
int *ptr = &b;
ptr++;
*ptr = 40;
cout<<"a= "<<a<<" b= "<<b<<endl;
cout<<"address a "<<&a<<" address b= "<<&b<<endl;
return 0;
}
The above code prints:
a= 50 b= 50
address a 0x7ffdd7b1b710 address b= 0x7ffdd7b1b714
Whereas when I remove the following line from the above code
cout<<"address a "<<&a<<" address b= "<<&b<<endl;
I get output as
a= 40 b= 50
My understanding was that the stack grows downwards, so the second output seems to be the correct one. I am not able to understand why the print statement would mess up the memory layout.
EDIT:
I forgot to mention, I am using 64 bit x86 machine, with OS as ubuntu 14.04 and gcc version 4.8.4

First of all, it's all undefined behavior. The C++ standard says that you can increment pointers only as long as you are in array boundaries (plus one element after), with some more exceptions for standard layout classes, but that's about it. So, in general, snooping around with pointers is uncharted territory.
Coming to your actual code: since you are never asking for its address, probably the compiler either just left a in a register, or even straight propagated it as a constant throughout the code. For this reason, a never touches the stack, and you cannot corrupt it using the pointer.
Notice anyhow that the compiler isn't restricted to pushing/popping variables on the stack in the order of their declaration - it reorders them in whatever way it sees fit, and they can even move within the stack frame (or be replaced) over the course of the function - and a seemingly small change in the function may make the compiler alter the stack layout completely. So, even comparing the addresses as you did says nothing about the direction of stack growth.
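As an illustration of the register/constant-propagation scenario described above (a hand-written sketch, not actual compiler output): if the optimizer decides that a's value is known and its address is not needed, it can emit code roughly equivalent to the following, leaving no stack slot for a that the stray write could hit:
#include <iostream>
using namespace std;
int main()
{
    int b = 50;
    int *ptr = &b;
    ptr++;                                        // one past b: may not be dereferenced
    *ptr = 40;                                    // still undefined behaviour: a stray write
    cout << "a= " << 50 << " b= " << b << endl;   // 'a' folded to the constant 50
    return 0;
}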

UB - You have taken a pointer to b, then moved that pointer with ptr++, so it now points to some unknown, unowned memory; writing to that memory region causes undefined behavior.
On VS 2008, debugging it step-by-step raises a run-time check failure reporting that the stack around a local variable was corrupted, which is fairly self-explanatory.

Related

Code running successfully using vector but showing error using array

I was practicing an array manipulation question. While solving it I declared an array (array A in the code).
For some test cases I got a segmentation fault. I replaced the array with a vector and got AC. I don't know the reason for this; please explain.
#include <bits/stdc++.h>
using namespace std;
int main()
{
int n,m,a,b,k;
cin>>n>>m;
vector<long int> A(n+2);
//long int A[n+2]={0};
for(int i=0;i<m;i++)
{
cin>>a>>b>>k;
A[a]+=k;
A[b+1]-=k;
}
long res=0;
for(int i=1;i<n+2;i++)
{
A[i]+=A[i-1];
if(res<A[i])
res=A[i];
}
cout<<res;
return 0;
}
Since it looks like you haven't been programming in C++ for very long, I will try to break it down for you to make it simpler to understand:
First of all, C++ does not initialize values for you - this is not Java - so please do not do:
int n,m,a,b,k;
And then use:
A[a]+=k;
A[b+1]-=k;
At this point we have no idea what a and b are - they might be -300 for all we know, since they come straight from the input (and if the read fails they stay uninitialized) and are never checked against the size of A. Hence, occasionally you get lucky and the value does not cause a segmentation fault, and other times you are not so lucky and it does.
long int A[n+2]={0}; is not legal in Standard C++. There are a bunch of reasons for this and I think you stumbled over one of them.
Compilers that allow Variable Length Arrays follow the example of C99, and the array is allocated on the stack. The stack is a limited resource, usually between 1 and 10 MB on a desktop computer. If the user inputs an n of sufficient size, the array will take up too much of the stack or breach the bounds of the stack, resulting in Undefined Behaviour. Often this behaviour manifests as a segmentation fault from accessing memory that is so far off the end of the stack that it's not controlled by the program. There are typically no warnings when you overflow the stack; often a program crash or corrupted data is the way you find out, and it's too late to salvage the program by then.
On the other hand, a vector allocates its internal buffer from the free store, and on a modern PC with virtual memory and 64 bit addressing the free store is fantastically huge, and the allocation throws an exception if you attempt to exceed what it can provide.
Another important difference is
long int A[n+2]={0};
likely did not zero initialize the array. This is the case with g++. The first byte will be set to zero and the remainder are uninitialized. Such is the curse of using non-Standard extensions. You cannot count on the behaviour guaranteed by the Standard.
std::vector will zero initialize the whole array or set the array to whatever value you tell it to use.
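For completeness, a minimal sketch (my own code, not the original poster's) of two standard-conforming ways to get an n+2 element zero-initialised buffer without relying on a VLA extension:
#include <iostream>
#include <memory>
#include <vector>
int main()
{
    int n = 10;                                // illustrative size; the real n would come from cin
    std::vector<long> A(n + 2);                // every element value-initialised to 0
    auto B = std::make_unique<long[]>(n + 2);  // heap-allocated array, also zero-initialised
    std::cout << A[0] << " " << B[0] << "\n";  // prints "0 0"
    return 0;
}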

C++/Address Space: 2 Bytes per address?

I was just trying something and I was wondering how this could be. I have the following code:
int var1 = 132;
int var2 = 200;
int *secondvariable = &var2;
cout << *(secondvariable+2) << endl << sizeof(int) << endl;
I get the output
132
4
So how is it possible that the second int is only 2 addresses higher? I mean shouldn't it be 4 addresses? I'm currently under WIN10 x64.
With cout << *(secondvariable+2) you don't print a pointer, you print the value at secondvariable[2], which is invalid indexing and leads to undefined behavior.
If you want to print a pointer then drop the dereference and print secondvariable+2.
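A minimal sketch of that suggestion (my illustration; note that even forming secondvariable + 2 is formally out of range for a pointer to a single int, so this is only to show what gets printed):
#include <iostream>
int main()
{
    int var2 = 200;
    int *secondvariable = &var2;
    // Print the pointer itself instead of dereferencing it.
    std::cout << (secondvariable + 2) << '\n';  // prints an address, no dereference
    std::cout << sizeof(int) << '\n';
    return 0;
}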
While you already are far in the field of undefined behaviour (see Some programmer dude's answer) due to indexing an array out of bounds (a single variable is considered an array of length 1 for such matters), some technical background:
Alignment! Compilers are allowed to place variables at addresses such that they can be accessed most efficiently. As you seem to have gotten valid output by adding 2*sizeof(int) to the second variable's address, you apparently reached the first one by accident. Apparently, the compiler decided to leave a gap between the two variables so that both can be aligned to addresses divisible by 8.
Be aware, though, that you don't have any guarantee for such alignment, different compilers might decide differently (or same compiler on another system), and alignment even might be changed via compiler flags.
On the other hand, arrays are guaranteed to occupy contiguous memory, so you would have gotten the expected result in the following example:
int array[2];
int* a0 = &array[0];
int* a1 = &array[1];
uintptr_t diff = reinterpret_cast<uintptr_t>(a1) - reinterpret_cast<uintptr_t>(a0);
std::cout << diff;
The reinterpret_cast to uintptr_t (or, alternatively, casting to char*) ensures that you get the address difference in bytes, not in multiples of sizeof(int)...
This is not how C++ works.
You can't "navigate" your scope like this.
Such pointer antics have completely undefined behaviour and shall not be relied upon.
You are not punching holes in tape now, you are writing a description of a program's semantics, that gets converted by your compiler into something executable by a machine.
Code to these abstractions and everything will be fine.

Trying to access pointer after resetting

Debugging an application and experimenting a bit, I came across quite strange behaviour that can be reproduced with the following code:
#include <iostream>
#include <memory>
int main()
{
std::unique_ptr<int> p(new int);
*p = 10;
int& ref = *p;
int* direct_p = &(*p);
p.reset();
std::cout << *p << "\n"; // a) SIGSEGV
std::cout << ref << "\n"; // b) 0
std::cout << *direct_p << "\n"; // c) 0
return 0;
}
As I see it, all three variants have to cause undefined behaviour. Keeping that in the mind, I have these questions:
Why do ref and direct_p nevertheless read as zero (not 10)? (I mean, the mechanism of the int's destruction seems strange to me - why would the compiler bother to overwrite unused memory?)
Why don't b) and c) fire SIGSEGV?
Why does behaviour of a) differ from b) and c)?
p.reset(); is the equivalent of p.reset(nullptr);. So the unique_ptr's internal pointer is being set to null. Consequently doing *p ends up with the same result as trying to dereference a raw pointer that's null.
On the other hand, ref and direct_p are still left pointing at the memory formerly occupied by that int. Trying to use them to read that memory gets into Undefined Behavior territory, so in principle we can't conclude anything...
But in practice, there are a few things we can make educated assumptions and guesses about.
Since that memory location was valid shortly before, it's most likely still present (hasn't been unmapped from the address space, or other such implementation-specific things) when your program accesses it through ref and direct_p. C++ doesn't demand that the memory should become completely inaccessible. So in this case you simply end up "successfully" reading whatever happens to be at that memory location at that point during the program's execution.
As for why the value happens to be 0, well there are a couple of possibilities. One is that you could be running in a debug mode which purposefully zeroes out deallocated memory. Another possibility is that by the time you access that memory through ref and direct_p, something else has already re-used it for a different purpose, which ended up leaving it with that value. Your std::cout << *p << "\n"; line could potentially have done that.
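A minimal sketch (my own illustration, not from the question) that shows the two situations without any dereference: after reset() the stored pointer is null, while a previously taken raw copy still holds the old, now dangling, address:
#include <iostream>
#include <memory>
int main()
{
    std::unique_ptr<int> p(new int(10));
    int* direct_p = p.get();           // raw alias of the managed int
    std::cout << p.get() << "\n";      // some non-null address
    p.reset();                         // deletes the int and stores nullptr
    std::cout << p.get() << "\n";      // typically prints 0 (null)
    std::cout << direct_p << "\n";     // still the old address, now dangling
    return 0;
}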
Undefined behaviour does not mean that the code must trigger an abnormal termination. It means that anything can happen. Abnormal termination is only one possible result. Inconsistency of behaviour between different instances of undefined behaviour is another. Another possibility (albeit rare in practice) is appearing to "work correctly" (however one defines "work correctly") until the next full moon, and then mysteriously behaving differently.
From a perspective of increasing average programmer skill and increasing software quality, electrocuting the programmer whenever they write code with undefined behaviour might be considered desirable.
As others have said, undefined behavior means quite literally anything can happen. The code is unpredictable. But let me try to shed some light on question 'b' with an example.
SIGSEGV is a hardware fault reported by hardware with an MMU (memory management unit). The level of memory protection, and therefore when a SIGSEGV is raised, can depend greatly on the MMU your hardware is using (source). If your stray pointer happens to point to an OK address you will be able to read the memory there; if it points somewhere bad, the MMU will freak out and raise a SIGSEGV for your program.
Take, for example, the MPC5200. This processor is quite old and has a somewhat rudimentary MMU. It can be quite difficult to get it to crash with a segfault.
For example the following will not necessarily cause a SIGSEGV on the MPC5200:
int *p = NULL;
*p;
*p = 1;
printf("%d", *p); // This actually prints 1 which is insane
The only way I could get this to throw a segfault was with the following code:
int *p = NULL;
while (true) {
*(--p) = 1;
}
To wrap up, undefined behavior really does mean undefined.
Why nevertheless ref and direct_p point to zero? (not 10) (I mean, the mechanism of int's destruction seems strange to me, what's the point for compiler to rewrite on unused memory?)
It's not the compiler, it's the C/C++ runtime library that changes the memory. In your particular case, libc does something funny: it reuses the freed heap data for its own bookkeeping when the value is freed:
Hardware watchpoint 3: *direct_p
_int_free (have_lock=0, p=0x614c10, av=0x7ffff7535b20 <main_arena>) at malloc.c:3925
3925 while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2)) != old2);
Why b) and c) don't fire SIGSEGV?
SIGSEGV is triggered by the kernel if an attempt is made to access memory outside of the allocated address space. Normally, libc won't actually remove the pages after deallocating memory - that would be too expensive. You are reading an address that libc has marked as free - but the kernel doesn't know about that. You can use a library that puts guard pages around allocations (e.g. ElectricFence, great for debugging) to make that happen.
Why behavior of a) differs from b) and c)?
You made the value of p point to some memory address, say 100. You then effectively created aliases for that memory location, so direct_p and ref will point to 100. Note that they aren't references to the variable p, they are references to the memory it pointed at, so changes you make to p have no effect on them. You then deallocated p, and its value became 0 (i.e. it now points to memory address 0). Attempting to read a value from memory address 0 guarantees a SIGSEGV. Reading values from memory address 100 is a bad idea, but is not fatal (as explained above).

How does the C++ new statement work?

Why does the last "cout" line always output "16"?
On my machine, sizeof(int) is 4.
#include <iostream>
using namespace std;
int main() {
int *pint1 = new int;
int *pint2 = new int;
cout<<pint1 <<endl;
cout<<pint2 <<endl;
cout<<(int)pint2 - (int)pint1 <<endl;
}
You cannot really make any assumptions about where the two ints end up in memory (the pointers pint1 and pint2 are stack allocated, but the ints they point to are allocated on the heap), as the layout is not specified by the C++ standard. They certainly do not have to be in contiguous memory.
For example, using pointer arithmetic pint1 + 1 does not, in general, land you on pint2.
Pointer arithmetic could be used if you'd written int* pint = new int[2];. Then, pint + 1 points to the second element.
To keep things standard, using
cout << (std::ptrdiff_t)(pint2 - pint1) << endl;
would be preferred, where I'm using (std::ptrdiff_t) explicitly. The way you've currently written it could give you undefined behaviour if you have a 32 bit int on a 64 bit architecture (e.g. Win64, MSVC).
(And don't forget to delete the allocated memory).
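For comparison, a small sketch (not from the original answer) of the array case, where the arithmetic is well defined because both ints live in one contiguous allocation:
#include <iostream>
int main()
{
    int* pint = new int[2];   // two ints in one contiguous allocation
    std::cout << (pint + 1) - pint << "\n";                // 1: difference measured in elements
    std::cout << reinterpret_cast<char*>(pint + 1)
                 - reinterpret_cast<char*>(pint) << "\n";  // sizeof(int), typically 4
    delete[] pint;
    return 0;
}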
Because malloc addresses are 16-byte aligned by default on your platform, and new typically just uses malloc.
The result depends heavily on the platform and the malloc implementation on your system, but most systems will print 16.
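A quick sketch to check that claim on a given machine (my illustration; the 16-byte figure is a common observation on 64-bit platforms, not a guarantee):
#include <cstdint>
#include <iostream>
int main()
{
    int* p = new int;
    int* q = new int;
    // The low bits of the returned addresses reveal the allocator's alignment granularity.
    std::cout << (reinterpret_cast<std::uintptr_t>(p) % 16) << "\n";  // often 0
    std::cout << (reinterpret_cast<std::uintptr_t>(q) % 16) << "\n";  // often 0
    delete p;
    delete q;
    return 0;
}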
Because of internal implementation behaviour. The allocator MAY place the next memory block just after the previous one, but that is entirely an implementation detail, and it may stop doing so at any moment.
By the way, my compiler (gcc on Ubuntu) prints 32:
0x12b2010
0x12b2030
32
Possibly due to alignment, but also because every memory chunk allocated with new or malloc has to remember its own size, and thus is larger than what you requested. The 'extra' bookkeeping info is usually stored right before the returned pointer.

Allocating local variables on the stack & using pointer arithmetic

I read that in a function the local variables are put on the stack, as they are defined, after the parameters have been put there first.
This is mentioned also here
5. All function arguments are placed on the stack.
6. The instructions inside of the function begin executing.
7. Local variables are pushed onto the stack as they are defined.
So I expect that if the C++ code is like this:
#include "stdafx.h"
#include <iostream>
int main()
{
int a = 555;
int b = 666;
int *p = &a;
std::cout << *(p+1);
return 0;
}
and if an int here has 4 bytes, and we call the address on the stack where the int 555 begins x, then 'moving' another 4 bytes towards the top of the stack via *(p+1) we should be looking at memory at address x + 4.
However, the output of this is -858993460, and it is always that value no matter what int b holds. Evidently it's some standard value. Of course I am accessing memory which I should not, as this is supposed to be the variable b. It was just an experiment.
How come I neither get the expected value nor an illegal access error?
Where is my assumption wrong?
What could -858993460 represent?
What everyone else has said (i.e. "don't do that") is absolutely true. Don't do that. However, to actually answer your question, p+1 is most likely pointing at either a pointer to the caller's stack frame or the return address itself. The system-maintained stack pointer is decremented when you push something on it. This is implementation dependent, officially speaking, but every stack pointer I've ever seen (this is since the 16-bit era) has been like this. Thus, if as you say, local variables are pushed on the stack as they are initialized, &a should == &b + 1.
Perhaps an illustration is in order. Suppose I compile your code for 32 bit x86 with no optimizations, and the stack pointer esp is 20 (this is unlikely, for the record) before I call your function. This is what memory looks like right before the line where you invoke cout:
4: 12 (value of p)
8: 666 (value of b)
12: 555 (value of a)
16: -858993460 (return address)
p+1, since p is an int*, is 16. The memory at this location isn't read protected because it's needed to return to the calling function.
Note that this answer is academic; it's possible that the compiler's optimizations or differences between processors caused the unexpected result. However, I would not expect p+1 to == &b on any processor architecture with any calling convention I've ever seen because the stack usually grows downward.
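If you want to see your compiler's actual layout rather than the hypothetical numbers above, a small sketch like this prints the addresses instead of dereferencing anything (my illustration; the layout it reveals is implementation-specific):
#include <iostream>
int main()
{
    int a = 555;
    int b = 666;
    int *p = &a;
    std::cout << "&a  = " << &a << "\n"
              << "&b  = " << &b << "\n"
              << "p+1 = " << (p + 1) << "\n";  // p and p+1 are the only pointer values
                                               // that may even be formed from &a here
    return 0;
}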
Your assumptions are true in theory (from a CS point of view).
In practice there is no guarantee that pointer arithmetic like that produces those results.
For example, your assumption "All function arguments are placed on the stack" is not true: how function arguments are passed is implementation-defined (depending on the architecture, they may go in registers or on the stack), and the compiler is also free to keep local variables in registers if it sees fit.
Also the assumption "int size is 4 bytes, so adding 4 to the address gets to b" is false. The compiler could have added padding between a and b to ensure memory alignment.
The conclusion here is: don't rely on low-level tricks like this, they are implementation-defined. Even if you do it anyway (regardless of our advice), you have to know how your compiler works and how it generates the code.
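To close with a sketch of the one layout guarantee the language does give (my illustration, not the original code): elements of a single array are contiguous, so this version of the experiment is well defined:
#include <iostream>
int main()
{
    int v[2] = { 555, 666 };  // array elements are guaranteed to be contiguous
    int *p = &v[0];
    std::cout << *(p + 1);    // well-defined: prints 666
    return 0;
}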