I understand pointer allocation of memory fully, but deallocation of memory only on a higher level. What I'm most curious about is how C++ keeps track of what memory has already been deallocated?
int* ptr = new int;
cout << ptr;
delete ptr;
cout << ptr;
// still pointing to the same place however it knows you can't access it or delete it again
*ptr; // BAD
delete ptr; // BAD
How does C++ know I deallocated that memory. If it just turns it to arbitrary garbage binary numbers, wouldn't I just be reading in that garbage number when I dereference the pointer?
Instead, of course, C++ knows that these are segfaults somehow.
C++ does not track memory for you. It doesn't know, it doesn't care. It is up to you: the programmer. (De)allocation is a request to the underlying OS. Or more precisely it is a call to libc++ (or possibly some other lib) which may or may not access the OS, that is an implementation detail. Either way the OS (or some other library) tracks what parts of memory are available to you.
When you try to access memory that the OS did not assign to you, the OS issues a segfault (technically it is raised by the CPU, assuming it supports memory protection; it's a bit complicated). And this is a good situation: the OS is telling you that you have a bug in your code. Note that the OS doesn't care whether you use C++, C, Rust or anything else. From the OS's perspective everything is machine code.
What is worse, however, is that even after delete the memory may still be owned by your process (remember those libraries that track memory?). Accessing such a pointer is therefore undefined behaviour: anything can happen, including correct execution of the code, which is why such bugs are often hard to find.
If it just turns it to arbitrary garbage binary numbers, wouldn't I just be reading in that garbage number when I dereference the pointer?
Who says it turns into garbage? What really happens to the underlying memory (whether the OS reclaims it, or it is filled with zeros or some garbage, or maybe nothing) is none of your concern. Everything you need to know is that after delete it is no longer safe to use the pointer. Even (or especially) when it looks ok.
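Since neither the language nor the OS will reliably stop you, one common defensive habit is to reset the pointer yourself right after deleting it. Here is a minimal sketch; destroy_and_reset and demo_reset are hypothetical names made up for this illustration, and the trick only protects this one pointer variable, not any copies of the address held elsewhere.

```cpp
#include <cassert>

// Hypothetical helper: delete the object and reset the caller's pointer so
// later code can test for nullptr instead of touching freed memory.
template <typename T>
void destroy_and_reset(T*& ptr) {
    delete ptr;     // deleting a null pointer is a well-defined no-op
    ptr = nullptr;  // the old address must never be dereferenced again
}

// Returns true if the pointer is null after the first and the second call.
bool demo_reset() {
    int* p = new int(42);
    destroy_and_reset(p);
    const bool null_after_first = (p == nullptr);
    destroy_and_reset(p);  // safe: this is a delete on nullptr
    assert(p == nullptr);
    return null_after_first && p == nullptr;
}
```

In everyday code a smart pointer such as std::unique_ptr usually makes this kind of helper unnecessary.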
How does C++ know I deallocated that memory.
When you use a delete expression, "C++ knows" that you deallocated that memory.
If it just turns it to arbitrary garbage binary numbers
C++ doesn't "turn [deallocated memory] to arbitrary garbage binary numbers". C++ merely makes the memory available for other allocations. Changing the state of that memory may be a side effect of some other part of the program using that memory - which it is now free to do.
wouldn't I just be reading in that garbage number when I dereference the pointer?
When you indirect through the pointer, the behaviour of the program is undefined.
Instead, of course, C++ knows that these are segfaults somehow.
This is where your operating system helpfully stepped in. You did something that did not make sense, and the operating system killed the misbehaving process. This is one of the many things that may but might not happen when the behaviour of the program is undefined.
I take it that you wonder what delete actually does. Here it is:
First of all, it destructs the object. If the object has a destructor, it is called, and does whatever it is programmed to do.
delete then proceeds to deallocate the memory itself. This means that the deallocator function (::operator delete() in most cases in C++) typically takes the memory object, and adds it to its own, internal data structures. I.e. it makes sure that the next call to ::operator new() can find the deallocated memory slab. The next new might then reuse that memory slab for other purposes.
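If you want to see the two steps for yourself, you can give a class its own operator new and operator delete and log what happens. This is only a sketch for illustration; Widget, events and run_widget_demo are made-up names, and the class-specific functions simply forward to the global ones.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <vector>

// Event log for the demonstration (hypothetical helper).
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

struct Widget {
    Widget()  { events().push_back("constructor"); }
    ~Widget() { events().push_back("destructor"); }

    // Class-specific allocation functions, picked up by new/delete on Widget.
    static void* operator new(std::size_t size) {
        events().push_back("operator new");
        return ::operator new(size);
    }
    static void operator delete(void* ptr) {
        events().push_back("operator delete");
        ::operator delete(ptr);
    }
};

// Creates and deletes one Widget and returns the recorded sequence.
std::vector<std::string> run_widget_demo() {
    events().clear();
    Widget* w = new Widget;  // step 1: allocate, step 2: construct
    assert(w != nullptr);
    delete w;                // step 1: destroy, step 2: deallocate
    return events();
}
```

The returned log lists the allocation, the constructor, the destructor and the deallocation in that order, which matches the description above.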
The entire management of memory happens by using data structures that you do not see, or need to know that they exist. How an implementation of ::operator new() and ::operator delete() organizes its internal data is strictly and fully up to the implementation. It doesn't concern you.
What concerns you is that the language standard defines any access to a memory object as undefined behavior after you have passed it to the delete operator. Undefined behavior does not mean that the memory needs to vanish magically, or that it becomes inaccessible, or that it is filled with garbage. Usually none of these happens immediately, because making the memory inaccessible or filling it with garbage would require explicit action from the CPU, so implementations generally don't touch what's written in the memory. You are just forbidden to make further accesses, because it's now up to the system to use the memory for any other purpose it likes.
C++ still carries a strong C inheritance when it comes to memory addressing. C was invented to build an OS (the first version of Unix), where it makes sense to use well-known register addresses or to perform arbitrary low-level operations. That means that when you address memory through a pointer, you as the programmer are supposed to know what lies there, and the language just trusts you.
On common implementations, the language runtime requests chunks of memory from the OS for new dynamic objects and keeps track of used and unused memory blocks. The goal is to reuse free blocks for new dynamic objects instead of asking the OS for each and every allocation and deallocation.
Still on common implementations, nothing changes in a freshly allocated or deallocated block except the pointers that maintain the list of free blocks. AFAIK few implementations return memory to the OS before the end of the process. But a free block can later be reused. That is why, when a careless programmer accesses a block that has been reused and interprets its contents as pointers, a SEGFAULT is not far away: the program may try to use arbitrary addresses that are not mapped for the process.
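To make the free-list idea concrete, here is a toy fixed-size pool. It is a sketch under simplifying assumptions (one block size, no thread safety, made-up names BlockPool, kBlockSize and kBlockCount) and not how any particular runtime implements new and delete.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Toy pool: free blocks are chained together through their own storage.
class BlockPool {
public:
    static constexpr std::size_t kBlockSize = 32;
    static constexpr std::size_t kBlockCount = 4;

    BlockPool() {
        // Put every block on the free list.
        for (std::size_t i = 0; i < kBlockCount; ++i) {
            deallocate(storage_ + i * kBlockSize);
        }
    }

    // Hands out the most recently freed block, or nullptr when exhausted.
    void* allocate() {
        if (free_list_ == nullptr) return nullptr;
        Node* block = free_list_;
        free_list_ = block->next;  // unlink the first free block
        return block;
    }

    // Only the link pointer at the start of the block is written; the rest
    // of the block keeps whatever bytes the previous user left behind.
    void deallocate(void* ptr) {
        free_list_ = new (ptr) Node{free_list_};
    }

private:
    struct Node { Node* next; };
    alignas(std::max_align_t) unsigned char storage_[kBlockSize * kBlockCount];
    Node* free_list_ = nullptr;
};
```

Allocating, deallocating and allocating again returns the same address, so a stale pointer to the first allocation would silently alias the second one.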
BTW, the only thing required by the standard is that accessing an object past the end of its lifetime (here, using the pointer after the delete statement) invokes Undefined Behaviour. Said differently, anything can happen, from an immediate crash to normal results, by way of a later crash or abnormal results in unrelated places of the program...
It is said that the memory allocated by new should be freed by delete, but a modern desktop OS will reclaim the memory even though you don't delete it. So why should we delete the memory allocated by new?
Also, assert is known for not calling destructors, and it seems to be widely used in the STL (at least VS2015 does that). If it's advised to delete the memory allocated by new (classes like string, map and vector use the destructor to delete the allocated memory), why do the developers still use lots of asserts then?
Why should we delete the memory allocated by new?
Because otherwise
the memory is leaked. Not leaking memory is absolutely crucial for long running software such as servers and daemons because the leaks will accumulate and consume all available memory.
the destructors of the objects will not be called. The logic of the program may depend on the destructors being called. Not calling some destructors may cause non-memory resources being leaked as well.
Also assert is known as not calling the destructors
A failed assert terminates the entire process, so it doesn't really matter much whether the logic of the program remains consistent, nor whether memory or other resources are leaked since the process isn't going to reuse those resources anyway.
and it seems like it's widely used in STL (at least VS2015 does that)
To be accurate, I don't think the standard library is specified to use the assert macro. The only situation where it could use it is if you have undefined behaviour. And if you have UB, then leaked memory is the least of your worries.
If you know that the destructor of the object is trivial, and you know that the object is used throughout the program (so, it's essentially a singleton), then it's quite safe to leak the object on purpose. This does have a drawback that it will be detected by a memory leak sanitizer that you would probably want to use to detect accidental memory leaks.
It is said that the memory allocated by new should be freed by
delete, but a modern desktop OS will reclaim the memory even though
you don't delete it. So why should we delete the memory allocated
by new?
Careful! The OS reclaims the memory only after your program has finished. This is not like garbage collection in Java or C#, which frees memory while the program is running.
If you don't delete (or more precisely, if you don't make sure that delete is called by resource-managing classes like std::unique_ptr, std::string or std::vector), then memory usage will continue to grow until you run out of memory.
Not to mention that destructors will not run, which matters if you have objects of types whose destructors perform more than just releasing memory.
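As a rough illustration of letting a resource-managing class call delete for you, the sketch below allocates inside a loop through std::unique_ptr and counts live objects. Tracked and peak_live_objects are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked()  { ++live; }
    ~Tracked() { --live; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
};
int Tracked::live = 0;

// Allocates one object per iteration and reports the highest live count seen.
int peak_live_objects(int iterations) {
    int peak = 0;
    for (int i = 0; i < iterations; ++i) {
        auto item = std::make_unique<Tracked>();
        assert(item != nullptr);
        if (Tracked::live > peak) peak = Tracked::live;
    }  // item goes out of scope here; its destructor calls delete
    return peak;
}
```

However many iterations run, at most one object is alive at a time and none is left afterwards, whereas the same loop with a raw new and no delete would keep every object alive until the process ends.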
Also assert is known as not calling the destructors,
More precisely, assert causes the program to terminate in a way that destructors are not called, unless the corresponding translation unit was compiled with the NDEBUG preprocessor macro defined, in which case the assert does nothing.
and it seems like it's widely used in STL (at least VS2015 does that).
Yes, the standard-library implementation of Visual C++ 2015 does that a lot. You should also use it liberally in your own code to detect bugs.
The C++ standard itself does not specify how and if asserts should appear in the implementation of a standard-library function. It does specify situations where the behaviour of the program is undefined; those often correspond to an assert in the library implementation (which makes sense, because if the behaviour is undefined anyway, then the implementation is free to do anything, so why not use that liberty in a useful way to give you bug detection?).
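In your own code the same idea looks roughly like this. The function element_at is a made-up example, not something taken from any standard-library implementation: the precondition is checked in builds without NDEBUG and compiled out otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accessor with a documented precondition: index < values.size().
// Without NDEBUG a violation aborts the program at the assert; with NDEBUG the
// check disappears and an out-of-range index is plain undefined behaviour.
int element_at(const std::vector<int>& values, std::size_t index) {
    assert(index < values.size() && "index out of range");
    return values[index];
}
```

Calling element_at with a valid index behaves like operator[]; calling it with an invalid one stops a debug build at the point where the bug was introduced.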
If it's advised to delete the memory allocated by new (classes like
string, map and vector use the destructor to delete the allocated
memory), why the developers still use lots of assert then?
Because if an assertion fails, then you want your program to terminate immediately because you have detected a bug. If you have detected a bug, then your program is by definition in an unknown state. Allowing your program to continue in an unknown state by running destructors is possibly dangerous and may compromise system integrity and data consistency.
After all, as I said above, destructors may do more than just call delete a few times. Destructors close files, flush buffers, write into logs, close network connections, clear the screen, join on a thread, or commit or roll back database transactions. Destructors can do a lot of things which can modify and possibly corrupt system resources.
It is a common pattern that applications, in the course of their execution, dynamically create objects that will not be used for the whole program execution. If an application creates a lot of such objects of temporary lifetime, it somehow has to manage memory in order not to run out of it. Note that memory is still limited, since operating systems usually do not assign all available memory to an application. Operating systems, especially those driving limited devices like mobile phones, may even kill applications once they put too much pressure on memory.
Hence, you should free the memory of those objects that are not used any more. C++ offers different storage durations to make this handling easier. Automatic storage duration, which is the default for local variables, destroys objects once they go out of scope (i.e. their enclosing block, e.g. the function in which they are defined, finishes). Static objects remain until the end of normal program execution (if reached), and dynamically allocated objects remain until you call delete.
Note that no object survives the end of program execution, as the operating system reclaims the complete application memory. For normal program termination, destructors of static objects are called (but not those of dynamically created objects that have not been deleted before). For abnormal program termination, such as that triggered by a failed assert (which calls abort) or by the operating system, no destructors are called; you can think of it as the program stopping because you turned off the power. (std::exit sits in between: it runs the destructors of static objects but not those of automatic ones.)
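The difference between automatic and dynamic lifetime can be observed with a small tracing type. This is just a sketch; Noisy, trace and lifetime_demo are names invented for the illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Trace buffer for the demonstration (hypothetical helper).
std::vector<std::string>& trace() {
    static std::vector<std::string> log;
    return log;
}

// Records its own construction ("+name") and destruction ("-name").
struct Noisy {
    std::string name;
    explicit Noisy(std::string n) : name(std::move(n)) { trace().push_back("+" + name); }
    ~Noisy() { trace().push_back("-" + name); }
};

std::vector<std::string> lifetime_demo() {
    trace().clear();
    Noisy* dynamic = new Noisy("dynamic");  // lives until delete
    assert(dynamic != nullptr);
    {
        Noisy automatic("automatic");       // lives until the closing brace
    }
    trace().push_back("block left");
    delete dynamic;                         // without this line it would leak
    return trace();
}
```

The automatic object is destroyed as soon as its block ends, while the dynamic one is destroyed only at the explicit delete.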
If you don't delete, you introduce a memory leak. Each time new is invoked without a matching delete, the process wastes some portion of its address space, until it ultimately runs out of memory.
After your program finishes you do not need to care about memory leaks, so in principle this would be fine:
int main(){
    int* x = new int(1);
}
However, that's not how one usually uses memory. Often you need to allocate memory for something that you use only for a short time, and then you want to free that memory when you don't need it anymore. Consider this example:
int main(){
    while ( someCondition ) {
        Foo* x = new Foo();
        doSomething(*x);
    } // <- already here we do not need x anymore
}
That code will accumulate more and more memory for x even though only a single instance of x is in use at any time. That's why one should free memory at the latest at the end of the scope where it is needed (once you have left the scope you have no way to free it!). Because forgetting a delete isn't nice, one should make use of RAII whenever possible:
int main(){
    while ( someCondition ) {
        Foo x;
        doSomething(x);
    } // <- memory for x is freed automatically
}
How do memory leak detectors actually work? What are the underlying concepts in general? Can take C++ as the language to explain this.
There are a couple of different ways that leak detectors work. You can replace the implementation of malloc and free with ones that can track more information during allocation and are not concerned with performance. This is similar to how dmalloc works. In general, any address that is malloc'ed but not free'd is leaked.
The basic implementation is actually pretty simple. You just maintain a lookup table of every allocation and its line number, and remove the entry when it is freed. Then when the program is done you can list all leaked memory. The hard part is determining when and where the allocation should have been freed. This is even harder when there are multiple pointers to the same address.
In practice, you'll probably want more than just the single line number, but rather a stack trace for the lost allocations.
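The lookup-table idea can be sketched in a few lines. This is a simplified illustration, not how dmalloc is implemented; LeakTracker, AllocationInfo and the method names are hypothetical, and a real tool would hook malloc/free (or new/delete) and store stack traces rather than a single file and line.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// What we remember about each live allocation.
struct AllocationInfo {
    std::size_t size;
    std::string file;
    int line;
};

class LeakTracker {
public:
    // Called wherever memory is allocated.
    void record_alloc(const void* ptr, std::size_t size, const char* file, int line) {
        live_[ptr] = AllocationInfo{size, file, line};
    }

    // Called wherever memory is freed. Returns false for an unknown or
    // already freed pointer, which would indicate a double or invalid free.
    bool record_free(const void* ptr) {
        return live_.erase(ptr) == 1;
    }

    // Everything still in the table at program end was never freed.
    std::vector<AllocationInfo> leaks() const {
        std::vector<AllocationInfo> result;
        for (const auto& entry : live_) result.push_back(entry.second);
        return result;
    }

private:
    std::map<const void*, AllocationInfo> live_;
};
```

A wrapper macro that passes __FILE__ and __LINE__ to record_alloc is the usual way to capture the allocation site without writing it by hand.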
Another approach is the one valgrind takes: it implements an entire virtual machine to keep track of addresses and memory references and the associated bookkeeping. The valgrind approach is much more expensive, but also much more effective, as it can also tell you about other types of memory errors like out-of-bounds reads or writes.
Valgrind essentially instruments the underlying instructions and can track when a given memory address has no more references. It can do this by tracking assignments of addresses, and so it can tell you not just that a piece of memory was lost, but exactly when it became lost.
C++ makes things a little harder for both types of leak detectors because it adds the new and delete operators. Technically new can be a completely different source of memory than malloc. However, in practice many real C++ implementations just use malloc to implement new or have an option to use malloc instead of the alternate approach.
Also higher level languages like C++ tend to have alternative higher level ways of allocating memory like std::vector or std::list. A basic leak detector would report the potentially many allocations made by the higher level modes separately. That's much less useful than saying the entire container was lost.
Here's a published technical paper on how our CheckPointer tool works.
Fundamentally it tracks the lifetimes of all values (heap and stack), and their sizes according to their types as defined by the language. This allows CheckPointer to find not only leaks, but out-of-array bound accesses, even for arrays in the stack, which valgrind won't do.
In particular, it analyzes the source code to find all pointer uses.
(This is quite the task just by itself).
It keeps track of pointer meta data for each pointer, consisting of
A reference to the object meta data for the heap-allocated object or global or local variable or function pointed to by the pointer, and
The address range of the (sub)object of the object that the pointer may currently access. This may be smaller than the address range of the whole object; e.g. if you take the address of a struct member, the instrumented source code will only allow access to that member when using the resulting pointer.
It also tracks the kind and location of each object, i.e. whether it is a function, a global, thread-local or local variable, heap-allocated memory, or a string literal constant:
The address range of the object that may be safely accessed, and
For each pointer stored in the heap-allocated object or variable, a reference to the pointer metadata for that pointer.
All this tracking is accomplished by transforming the original program source, into a program which does what the original program does, and interleaves various meta-data checking or updating routines. The resulting program is compiled and run. Where a meta-data check fails at runtime, a backtrace is provided with a report of the type of failure (invalid pointer, pointer outside valid bounds, ...)
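To give a feel for what "a pointer plus the address range it may access" means, here is a hand-written fat pointer. It is purely illustrative and is not CheckPointer's implementation (which instruments the source automatically); CheckedPtr and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical "fat pointer": a raw address plus the range it may access.
template <typename T>
class CheckedPtr {
public:
    CheckedPtr(T* base, std::size_t count) : begin_(base), end_(base + count) {}

    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }

    // Narrow the accessible range, similar to taking the address of a
    // single member or a slice of an array.
    CheckedPtr subrange(std::size_t offset, std::size_t count) const {
        if (offset > size() || count > size() - offset) {
            throw std::out_of_range("subrange outside pointer bounds");
        }
        return CheckedPtr(begin_ + offset, count);
    }

    // Every access is checked against the recorded range.
    T& at(std::size_t index) const {
        if (index >= size()) {
            throw std::out_of_range("access outside pointer bounds");
        }
        return begin_[index];
    }

private:
    T* begin_;
    T* end_;
};
```

Note that a pointer narrowed with subrange rejects accesses that would still be inside the whole array, which mirrors the struct-member case described above.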
This is tagged C and C++ and no operating system is mentioned. This answer is for Windows.
C
Windows has the concept of virtual memory. Any memory a process can get is virtual memory. This is done through VirtualAlloc() [MSDN]. You can imagine the leak detector to put a breakpoint on that function and whenever it is called, it gets the callstack and saves it somewhere. Then it can do similar for VirtualFree()[MSDN].
The difference can then be identified and shown along with the callstacks that have been saved.
C++
C++ has a different concept: it takes the large 64 KB blocks which it gets from VirtualAlloc() and splits them into smaller pieces, called the heap. The C++ heap manager comes from Microsoft and offers the additional functions HeapAlloc() [MSDN] and HeapFree() [MSDN].
Then, you could do the same as before, but actually, that feature is already built-in. Microsoft's GFlags [MSDN] tool can enable the tracking:
In this case it will save up to 50 MB of callstack information for C++ heap manager calls.
Since that setting can also be enabled via the Windows Registry, a memory leak detector can make use of it easily.
General concept
As you can see, the general concept is to keep track of allocations and deallocations, compare them and show the callstacks of the difference.
Well, I am sorry if this feels like a repetition of old questions. I have gone through several questions on Stack Overflow and the Modern Operating Systems book by Tanenbaum, and have yet to clear my doubts regarding this.
First off, I would appreciate any book/resource that I should go through in more detail to better understand this structure. I fail to understand if these are concepts generally explained in OS books, or Programming Languages or Architecture books.
Before I ask my questions, I will list out my findings based on readings about stacks/heaps
Heap
Contains all Instance Variables, Dynamically Allocated (new/malloc), and Global Variables only
Does not use the data structure heap anymore, uses more complex structures
Access through memory locations, individual process responsible for memory allocated on it
Defragmentation, and allocation of memory is done by the OS (If yes or no, please answer my question on who manages the heap, os or runtime environ)
Shared among all threads within the process which have access to its reference
Stack
Contains all local variables only. (Pushed on at function call)
Uses an actual Stack Data Structure for operation
Faster to access due to contiguous nature
Now, For a few of my questions regarding the same.
Global Variables, where do they get allocated? (My belief is that they get allocated on the heap, If so, When do they get allocated, at runtime or compile time, and one further question, can this memory be cleared (as in using delete)? )
What is the structure of the heap? How is the heap organized (is it managed by the os or the run time environment (as set up by the C/C++ compiler) ).
Does the stack hold ONLY method, and their local variables?
Each application (Process) is given a separate heap, but if you exceed heap allocations, then does it mean that the os was not able to allocate more memory? (I am assuming lack of memory causes the OS to reallocate to avoid fragmentation)
The Heap is accessible from all threads within the process (I believe this to be true). If yes all threads can access Instance Variables, Dynamically allocated variables, global variables (If they have a reference to it)
Different processes, cannot access each others heap (even if they are passed the address)
A Stack overflow crashes
only the current thread
current process
all processes
In C/C++, does memory get allocated during run time on the stack for block variables within a function (For example, if a sub-block (e.g. a for loop) of code creates a new variable, is that allocated during run-time on the stack (or the heap) or is it preallocated?) When are they removed (Block level scope, how is that maintained)? My belief on this is, all additions to the stack are made at runtime before the start of a block, and whenever the end of that block is reached, all elements added till that point are popped.
The CPU's support for the stack register is limited to a stack pointer that can be incremented (pop) and decremented (push) via normal access to memory. (Is this true?)
Last, are both stack, and heap structures generated by the OS/Runtime environment that exist on Main Memory (As an abstraction?)
I know this is a lot, and I appear to be very confused throughout, I would appreciate it if you could point me in the right direction to get these things cleared up!
Global variables are allocated in a static section of memory that's laid out at compile time. The values are initialized during startup before main is entered. The initialization may, of course, allocate on the heap (e.g. a statically allocated std::string will have the structure itself sit in the statically laid out memory, but the string data it contains is allocated on the heap during startup). These things are destroyed during normal program shutdown. You can't free them before then; if you need to, you can wrap the value in a pointer and initialize the pointer on program startup.
The heap is managed by an allocator library. There's one that comes with the C runtime, but there are also custom ones like tcmalloc or jemalloc that you can use in place of the standard allocator. These allocators get large pages of memory from the OS using system calls, and then give you portions of these pages when you call malloc. The organization of the heap is somewhat complex and varies between allocators; you can look up how they work on their websites.
Yes-ish. Though you can use library functions like alloca to make a chunk of space on the stack, and use that for whatever you want.
Each process has a separate memory space, that is, it thinks it is all alone and no other process exists. Generally the OS will give you more memory if you ask for it, but it can also enforce limits (like ulimit on Linux), at which point it can refuse to give you more memory. Fragmentation isn't an issue for the OS because it gives out memory in pages. However, fragmentation in your process may cause your allocator to ask for more pages, even if there's empty space.
Yes.
Yes, however there's generally OS specific ways to create shared-memory regions that multiple processes can access.
A stack overflow doesn't crash anything by itself; it causes memory values to be written in places that may hold other values, thus corrupting them. Acting on corrupted memory causes crashes. When your process accesses unmapped memory (see note below) it crashes, not just a thread, but the whole process. It would not affect other processes since their memory spaces are isolated. (This is not true in old operating systems like Windows 95, where all processes shared the same memory space.)
In C++, stack-allocated objects are created when the block is entered, and destroyed when the block is exited. The actual space on the stack may be allocated less precisely though, but the construction and destruction will take place at those particular points.
The stack pointer on x86 processors can be arbitrarily manipulated. It's common for compilers to generate code that simply adjusts the stack pointer by the amount of space needed and then sets the memory for values on the stack, instead of doing a bunch of push operations.
The stacks and heap of the process all live in the same memory space.
An overview of how memory is organized may be helpful:
You have physical memory which the kernel sees.
The kernel maps pages of physical memory to pages of virtual memory when a process asks for it.
A process operates in its own virtual memory space, ignorant of other processes on the system.
When a process starts, it puts sections of the executable (code, globals, etc) into some of these virtual memory pages.
The allocator requests pages from the OS in order to satisfy malloc calls; this memory constitutes the heap.
When a thread starts (or the initial thread for the process), it asks the OS for a few pages that form the stack. (You can also ask your heap allocator, and use the space it gives you as a stack.)
When the program is running, it can freely access all memory in its address space: heap, stack, whatever.
When you attempt to access a region of your memory space that is not mapped, your program crashes. (more specifically, you get a signal from the OS, which you can choose to handle).
Stack overflows tend to cause your programs to access such unmapped regions, which is why stack overflows tend to crash your program.
Where global variables are allocated is actually system dependent. Some systems will place them statically in the binary, some will allocate them on the heap, some will allocate them on the stack. If a global variable is a pointer, you can delete the value it points to, but there is no way to clear that memory otherwise. Destructors for global variables will be called automatically when the application exits (well, maybe not with a SIGTERM)
I'm not positive, but I imagine it's managed by the operating system, specifically the kernel.
Yes, and only to a certain point. For instance, you can't do infinite recursion because values will (no pun intended) stack up. You'll wind up with a, wait for it, stack overflow (AHH, there it is, he said it!)
Some operating systems may impose a heap size limit through a separate process, but generally if you fail to allocate memory it's because there's no memory left.
All threads share a common heap, and so yes, they can all access global variables, dynamically allocated, etc..
Generally correct, though on some really bare-bones architectures this may not be true. For the most part, the OS executes the process in its own virtual address space, so the pointer values you're using actually refer to different physical memory addresses than they would appear to.
The current process, if by process you mean OS-level process.
I'm assuming that is correct, but I don't know myself.
This one's out of my wheelhouse.
Yes, sort of. As I mentioned before, most OSes use page tables to map process pointers to main memory. Also, consider paging to disk (swapping).
I am trying to understand memory allocation in C++.
A question that comes to my mind is why is it so necessary to allocate memory? And what happens if we use memory without allocating it?
Also, I was shocked to see how careless C++ is on memory allocation. It gives free access to memory through arrays with no bounds checking.
int main()
{
    int *p = new int[5];
    p[1] = 3;
    p[11118] = 9;
    cout<<p[11118]<<'\n';
}
The above code works, outputs 9.
In what cases would assigning a value to a non allocated memory location be dangerous? What are the potential ill-effects? Is it possible that the memory location I am accessing has been allocated to some other program and assigning a value to it might cause that program to crash/behave in a very unexpected fashion?
The above code is Undefined Behaviour. It can work, work incorrectly, not work at all, crash, or order pizza through Microsoft Skype. Thou shalt not rely on undefined behavior.
Why is it necessary to allocate memory?
Because that way, you mark the memory as yours. Nobody else can use it. It also verifies that there is in fact memory available. If your system only has 1000 bytes of memory, just picking byte 1500 to store some data at is a bad idea.
What happens if we use memory without allocating it?
Nobody knows. The address you write to might not exist. A different process might have already started using it, so you overwrite its data. The memory could be protected; in the former case, for instance, the operating system may notice that you are accessing memory another process has laid claim to, and stop you. You might own that region of memory while a different part of your program is using it for some reason, and you've overwritten your own data.
Free access to memory through arrays with no bounds checking.
That code does not work... it functions as expected, at the moment, but that is not the same thing. Formally, that is undefined behavior, so the compiler can emit code to do anything it wants.
In what cases would assigning a value to a non allocated memory location be dangerous?
I gave some examples above. It is also possible to break your stack. When you call a function, the address the function should return to is stored. If you overwrite that value through careless memory access, then when you leave that function, who knows where you'll end up? Maybe the person exploiting your program... a common exploit is to load executable code into some part of memory, then use a bug in an existing program to run it. Once, on an embedded device I was working on, I had a fencepost error that resulted in my function returning into the middle of another instruction elsewhere. That should have crashed my chip, but as luck would have it the second half of that instruction was itself a valid instruction. The sequence of code that ended up running caused the device to gain sentience, and eventually finished the project we were working on itself. Now, it just plays WoW in my basement. Thus is the horror of undefined behavior.
Many good answers, but I feel that there's something missing regarding "why we need to allocate memory". I think it is important to know how the control flow of a computer program works at the lowest level, since C and C++ are relatively thin layers of abstraction over the hardware.
While it is possible to write a program in one huge global scope with ifs and gotos alone, most real-world programs are split into functions, which are separate, movable modules which can call each other at will. To keep track of all the data (arguments, return value, local variables), all this data is put on a one-dimensional, contiguous area of memory called the stack. Calling a function puts stuff on the stack, and returning from a function pops the data back off, and the same area of memory is overwritten by the next function call.
That way, all function code can be stored abstractly by just remembering offsets to local data relative to its entry point, and the same function can be called from many different contexts -- the function's local variables may be at different absolute addresses, but they're always at the same relative position relative to the function's entry address.
The fact that the stack memory is constantly overwritten as functions get called and return means that you cannot place any persistent data on the stack, i.e. in a local variable, because the memory for the local variables is not kept intact after the function returns. If your function needs to store persistent data somewhere, it must store that data somewhere else. This other location is the so-called heap, on which you manually (also called "dynamically") request persistent storage via malloc or new. That area of memory lies elsewhere and will not be recycled or overwritten by anyone, and you may safely pass a pointer to that memory around for as long as you like. The only downside is that unless you manually tell the system that you're done, it won't be able to use the memory for anything else, which is why you must manually clean up this dynamically allocated memory. But the need for functions to store persistent information is the reason we need to allocate memory.
(Just to complete the picture: local variables on the stack are said to be "automatically allocated". There is also "static allocation", which happens at compile time and is where global variables live. If you have a global char[30000], you may happily read from and write to that from anywhere in your program.)
Allocating memory on the heap allows dynamic allocation of a dynamic amount of memory with a dynamic lifetime.
If you want bounds-checking, you can get it through std::vector::at().
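For instance, a minimal sketch (the function name at_detects_out_of_range is just for illustration): at() throws std::out_of_range on a bad index, whereas operator[] with the same index would be undefined behavior.

```cpp
#include <stdexcept>
#include <vector>

// Returns true if at() rejected the out-of-range index by throwing.
bool at_detects_out_of_range() {
    std::vector<int> v{1, 2, 3};
    try {
        v.at(10) = 42;  // index 10 does not exist; at() checks and throws
    } catch (const std::out_of_range&) {
        return true;    // the bad access was caught instead of corrupting memory
    }
    return false;
}
```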
In what cases would assigning a value to a non-allocated memory location be dangerous?
All cases.
What are the potential ill effects?
Unexpected behavior.
Is it possible that the memory location I am accessing has been allocated to some other program and assigning a value to it might cause that program to crash/behave in a very unexpected fashion?
Depends on the operating system.
This seems like two questions:
Why doesn't C++ do bounds-checking?
Why do we need dynamic memory allocation?
My answers:
Because then it'd be slower. You can always write an accessor function that checks bounds, like std::vector::at().
Because not being able to resize memory at runtime can be very inconvenient (see early FORTRAN).
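As a rough illustration of the second answer (collect_evens is a hypothetical helper, not anything from the question): the number of elements is only known at runtime, and std::vector obtains more dynamically allocated storage as it grows.

```cpp
#include <vector>

// Collects the even numbers from 0 up to and including limit.
// How many there are depends on a value known only at runtime.
std::vector<int> collect_evens(int limit) {
    std::vector<int> out;  // starts empty; storage is requested on demand
    for (int i = 0; i <= limit; i += 2) {
        out.push_back(i);  // may reallocate to a larger heap block
    }
    return out;
}
```

With a fixed-size array you would have to guess the maximum size up front.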
In most operating systems, there is a distinct separation between the physical memory available in the host computer, and the logical memory footprint that application code can see. This is mediated, in most cases, by a part of the CPU called the Memory Management Unit (or MMU), and it serves a number of useful goals.
The most obvious is that it allows you to assign more memory to an application (or multiple applications) than is actually present on the machine. When the application touches an address, the MMU translates it; if the page is not currently resident, it raises a fault so the operating system can figure out where that memory really is, either in core or on disk if it has been paged out.
Another use for this is to set aside some addresses for purposes other than application use. For instance, the GPUs in most computers are controlled through a region of memory that is visible to the CPU as core memory, and the CPU can read or write to that area very efficiently. The MMU provides a way for the OS to use that memory while making it inaccessible to normal applications.
Because of this segmenting, and for other reasons, the full range of addresses is not normally available to applications until they ask the OS for some memory for a particular purpose. For instance, on Linux, applications ask for more core memory by calling brk or sbrk, and they ask for memory-mapped I/O by calling mmap. Until an address is returned through one of those calls, the address is unmapped, and accessing it will cause a segfault, normally terminating the offending program.
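Here is a minimal, Linux/POSIX-only sketch of asking the OS for a mapping directly. It assumes an anonymous private mapping (MAP_ANONYMOUS, not backed by a file or device) and a 4096-byte length chosen purely for illustration; map_write_read is a made-up name. Normally malloc or new makes calls like this on your behalf.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Maps some fresh memory, writes to it, reads it back, and unmaps it.
// Returns the value read back, or -1 if the mapping could not be created.
int map_write_read() {
    const std::size_t len = 4096;  // illustrative size, not queried from the OS
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    int* ints = static_cast<int*>(p);
    ints[0] = 123;         // fine: the address range is now mapped
    int value = ints[0];
    munmap(p, len);        // after this, touching ints[0] would likely segfault
    return value;
}
```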
Some platforms only let an application touch memory that is known to have been mapped, but C++ errs on the side of performance: it never does bounds checking automatically, because that would require some extra instructions to be executed, and on some platforms those instructions could be very costly. On the other hand, C++ does provide bounds checking, if you want it, through the standard template library.
Is it possible that the memory location I am accessing has been allocated to some other program and assigning a value to it might cause that program to crash/behave in a very unexpected fashion?
No, modern OSs are designed precisely to avoid that (for security reasons).
And you have to allocate memory because, although every process has its own 4 GB virtual address space (as provided by 32-bit Windows), they all share the same xx GB of physical RAM the user has in the machine. Allocating memory lets the operating system know which applications need more memory, so it can give it only to those that need it.
Why would my "hello world" need the same RAM Crysis 2 needs? :P
EDIT:
OK, someone misunderstood what I meant. I didn't say it's fine, that everyone can do it and nothing will happen. I just said doing this won't harm any external process. It is still undefined behavior because no one knows what's at p + 11118, but UB doesn't mean "it can order a pizza through Skype" or other "exciting things"; at most an access violation, nothing more.