I understand pointer allocation of memory fully, but deallocation of memory only at a higher level. What I'm most curious about is: how does C++ keep track of what memory has already been deallocated?
int* ptr = new int;
cout << ptr;
delete ptr;
cout << ptr;
// still pointing to the same place, yet somehow it knows you can't access it or delete it again
*ptr;       // BAD
delete ptr; // BAD
How does C++ know I deallocated that memory? If it just turns it to arbitrary garbage binary numbers, wouldn't I just be reading in that garbage number when I dereference the pointer?
Instead, of course, C++ somehow knows that these are segfaults.
C++ does not track memory for you. It doesn't know, it doesn't care. It is up to you: the programmer. (De)allocation is a request to the underlying OS. Or more precisely, it is a call to libc++ (or possibly some other lib), which may or may not access the OS; that is an implementation detail. Either way, the OS (or some other library) tracks what parts of memory are available to you.
When you try to access memory that the OS did not assign to you, the OS will raise a segfault (technically it is raised by the CPU, assuming it supports memory protection; it's a bit complicated). And this is a good situation. That way the OS tells you: hey, you have a bug in your code. Note that the OS doesn't care whether you use C++, C, Rust or anything else. From the OS's perspective everything is machine code.
However, what is worse is that even after delete the memory may still be owned by your process (remember those libs that track memory?). So accessing such a pointer is undefined behaviour: anything can happen, including correct execution of the code (that's why such bugs are often hard to find).
If it just turns it to arbitrary garbage binary numbers, wouldn't I just be reading in that garbage number when I dereference the pointer?
Who says it turns into garbage? What really happens to the underlying memory (whether the OS reclaims it, or it is filled with zeros or some garbage, or maybe nothing) is none of your concern. Everything you need to know is that after delete it is no longer safe to use the pointer. Even (or especially) when it looks ok.
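To illustrate: the following sketch is deliberately buggy, and since the behaviour after delete is undefined, any outcome is possible - a crash, garbage output, or output that looks perfectly correct.

#include <iostream>

int main()
{
    int* p = new int(42);
    delete p;                 // the allocator reclaims the block; p itself is unchanged
    std::cout << *p << '\n';  // undefined behaviour: may print 42, print garbage,
                              // segfault, or do something else entirely
}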
How does C++ know I deallocated that memory?
When you use a delete expression, "C++ knows" that you deallocated that memory.
If it just turns it to arbitrary garbage binary numbers
C++ doesn't "turn [deallocated memory] to arbitrary garbage binary numbers". C++ merely makes the memory available for other allocations. Changing the state of that memory may be a side effect of some other part of the program using that memory - which it is now free to do.
wouldn't I just be reading in that garbage number when I dereference the pointer?
When you indirect through the pointer, the behaviour of the program is undefined.
Instead, of course, C++ somehow knows that these are segfaults.
This is where your operating system helpfully stepped in. You did something that did not make sense, and the operating system killed the misbehaving process. This is one of the many things that may, but need not, happen when the behaviour of the program is undefined.
I take it that you wonder what delete actually does. Here it is:
First of all, it destructs the object. If the object has a destructor, it is called, and does whatever it is programmed to do.
delete then proceeds to deallocate the memory itself. This means that the deallocator function (::operator delete() in most cases in C++) typically takes the memory block and adds it to its own internal data structures. I.e. it makes sure that the next call to ::operator new() can find the deallocated memory slab. The next new might then reuse that slab for other purposes.
The entire management of memory happens by using data structures that you do not see, or need to know that they exist. How an implementation of ::operator new() and ::operator delete() organizes its internal data is strictly and fully up to the implementation. It doesn't concern you.
What concerns you is that the language standard defines any access to a memory object to be undefined behavior after you have passed it to the delete operator. Undefined behavior does not mean that the memory needs to vanish magically, or that it becomes inaccessible, or that it is filled with garbage. Usually none of these happens immediately, because making the memory inaccessible or filling it with garbage would require explicit action from the CPU, so implementations don't generally touch what's written in the memory. You are just forbidden to make further accesses, because it's now up to the system to use the memory for any other purpose it likes.
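As a rough illustration of those "internal data structures", here is a toy pool allocator that keeps a free list. The class and its layout are invented for this sketch; production implementations of ::operator new()/::operator delete() are far more sophisticated. Notice that deallocate() does not erase anything; it merely links the block back into the list:

#include <new>

class Pool {
    union Slab { Slab* next; unsigned char bytes[32]; }; // a slab holds 32 bytes, or a link when free
    Slab storage[1024];
    Slab* freeList = nullptr;

public:
    Pool()
    {
        for (Slab& s : storage) {   // initially, every slab is on the free list
            s.next = freeList;
            freeList = &s;
        }
    }

    void* allocate()
    {
        if (!freeList) throw std::bad_alloc{};
        Slab* s = freeList;         // pop the head of the free list
        freeList = s->next;
        return s;                   // contents are whatever happened to be there before
    }

    void deallocate(void* p)
    {
        Slab* s = static_cast<Slab*>(p);
        s->next = freeList;         // "freeing" just pushes the slab back onto the list;
        freeList = s;               // nothing in the slab is erased
    }
};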
C++ still has a strong C inheritance when it comes to memory addressing. And C was invented to build an OS (the first version of Unix), where it makes sense to address well-known hardware registers and perform other low-level operations. That means that when you address memory through a pointer, you as the programmer are supposed to know what lies there, and the language just trusts you.
On common implementations, the language requests chunks of memory from the OS for new dynamic objects, and keeps track of used and unused memory blocks. The goal is to re-use free blocks for new dynamic objects instead of asking the OS for each and every allocation and de-allocation.
Still for common implementations, nothing changes in a freshly allocated or deallocated block but the pointers maintaining the list of free blocks. AFAIK few implementations return memory to the OS before the end of the process. But a free block can later be re-used, which is why a careless programmer who reads pointers out of a block that has since been re-used is not far from a SEGFAULT: the program may try to use arbitrary addresses that are not mapped for the process.
BTW, the only point required by the standard is that accessing an object past its end of life (specifically here, using the pointer after the delete expression) invokes Undefined Behaviour. Said differently, anything can happen, from an immediate crash to normal results, passing through a later crash or abnormal results in unrelated places of the program...
So I know that the heap is where dynamically allocated memory lives. Unlike the stack, heap memory keeps existing until the user deletes it.
However, I have a problem with the "an object has been returned back to the memory heap" part. Does "returned back" mean that the object has been freed? If not, why is it a disaster to have a pointer/reference to that object?
Firstly, heap and stack are not C++ concepts. They refer to particular types of memory, as managed on some systems.
Second, what is often described as "heap" is, in C++, referred to as "dynamically allocated memory".
When dynamically allocated memory is released by the program (e.g. using operator delete on something obtained using operator new, using free() on a pointer returned by C's malloc()) that memory no longer exists as far as your program is concerned, but the pointer value does not change. Using a pointer or reference to something that no longer exists gives, in the language of the C++ standard, undefined behaviour.
Practically, the memory might exist physically on your system, or even still be allocated by the host system to your program. However, once it has been released by your program, there is nothing preventing the memory being reused. Your program might use operator new to allocate more memory, so the memory that was previously released is now being used for something else - completely unrelated to the original usage - by your program. That is inherently dangerous - the results can be anything (which is, loosely, the implication of undefined behaviour in the C++ standard).
The operating system might also have recovered the logical or physical memory (after all, your program has indicated no more use for that memory) and allocated it to another program. That is dangerous for both your program and the other program that has been allocated memory. Which is why most modern operating systems prevent that, e.g. by forcibly terminating your program if the operating system detects an access to memory it no longer owns.
A real danger is that you might access a released object for a while, and everything seems to work as required. Only to crash later on. There can often be a significant time interval between some memory being released by your program, and being reused for something else.
Such flaws in programs tend to infuriate users of that program (users, for some obscure reason, tend to be less than appreciative when programs crash in unpredictable and unrepeatable ways in the middle of their work). Such flaws in your code also tend to be very difficult to track down, and therefore hard to correct.
I have a problem with the "an object has been returned back to the memory heap" part. Does "returned back" mean that the object has been freed? If not, why is it a disaster to have a pointer/reference to that object?
It might help to break down the terminology and clarify a few things:
Firstly, the term Object in C++ refers to something residing in memory, but does not refer to the memory itself.
It's worth making a distinction between an object and the memory storing the object because the actions of allocating memory and initialising an object are distinct and separate.
The following line is doing two things:
new int(123);
It is allocating an area of memory on the heap to your program whose size in bytes is equal to sizeof(int). The effect of the allocation of memory is not to create an object, nor is it even to change anything about the content of that memory; it is simply to ringfence that memory for your program to use.
Any "junk" or garbage values which might have been pre-existing in that memory will not be changed.
After the memory is allocated, that chunk of memory is initialised with an int object containing 123. Objects may only be initialised within an area of memory which is already allocated to your program.
Given that the new operator is doing two different things, it follows that the delete operator may also perform two different actions:
delete will destroy the object; this is usually limited to releasing resources owned by the object. Destroying an object typically does not bother to change or reset any memory which is about to be deallocated because that's usually a waste of CPU cycles.
After an object is destroyed, you might find the remains or partial remains of that object stuck around in memory for a while, but as far as your program is concerned, the object itself is dead/gone and the memory contents are junk.
delete will de-allocate the memory. The process of de-allocation is not to change the content of that memory, it is simply to release that memory for use by something else. Trying to access that area of memory after it has been deallocated is undefined behaviour.
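To make the two steps of each operation concrete, here is a sketch that spells out by hand roughly what new int(123) and delete do (the compiler-generated code is morally equivalent, though not literally this):

#include <new>

int main()
{
    void* raw = ::operator new(sizeof(int)); // new, step 1: allocate raw memory; contents unspecified
    int*  p   = new (raw) int(123);          // new, step 2: construct an int there ("placement new")

    using T = int;
    p->~T();                                 // delete, step 1: destroy the object (trivial for int)
    ::operator delete(raw);                  // delete, step 2: release the memory; nothing requires
                                             // its contents to change
}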
Remember that a pointer cannot ever point to an object as such, because an object is "the thing in memory" rather than the memory itself; raw pointers in C++ are fairly simple/dumb - they don't track anything about the state of the objects in their pointed-to memory. If an object is moved or copied or destroyed, the pointer will know nothing about it.
A pointer is an address for a location in memory.
It helps to think of pointer-values (addresses) and pointer-variables which contain those values/addresses. Sadly the term pointer has historically had a bit of a double meaning, referring both to the pointer-value and the pointer-variable, which can seem confusing.
Typically the memory location pointed-by a pointer-variable should contain a valid object within an allocated area of memory, but it also might not.
C++ does not protect you against dangling pointers, where a dangling pointer is a pointer-variable containing an address of an area of memory which has not been allocated to your program.
Neither does C++ protect you against pointers-to un-initialised memory (e.g. if you have allocated memory using malloc but not initialised its contents).
Lastly, C++ does not protect you against memory leaks; you as the programmer are responsible for keeping track of all your allocated memory.
These last 3 points are among some of the many reasons why new/delete tend not to be used very often in modern C++, and why & references, std::vector, std::shared_ptr and other alternatives are preferred; they take care of many nuances and gotchas surrounding new/delete.
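A brief sketch of what those alternatives look like in practice:

#include <memory>
#include <vector>

int main()
{
    std::vector<int> v(5);                // owns its array; deallocates automatically
    v.at(1) = 3;                          // bounds-checked access

    auto up = std::make_unique<int>(42);  // single owner; freed at end of scope
    auto sp = std::make_shared<int>(123); // shared, reference-counted ownership
}   // no delete anywhere: no leaks, no double-frees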
For the sake of this question I will picture memory as a simple array of bytes, and I will be talking about heap memory because it is possible to dynamically allocate it.
Lets say that I am instantiating some class, and creating an object on the heap where some memory has already been allocated. Then, after creating the object, I allocate some more memory (maybe by instantiating another class). This implies the use of new and delete keywords, of course.
The memory now looks like this:
... byte byte my_object ... my_object byte byte ...
What exactly happens when delete my_object; is executed? Is all other memory shifted to the left by sizeof(MyClass)? If so, by whom? The OS? Then what happens when there is no OS to provide virtual memory?
No, nothing gets shifted. Instead, memory gets fragmented, meaning that you now have a unused hole in the middle of used memory. A subsequent allocation might be able to re-use part or all of that memory (provided the requested number of bytes is small enough to fit in the hole).
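For instance (whether and when the hole is actually reused is entirely up to the allocator; this is illustrative only):

int main()
{
    int* a = new int(1);
    int* b = new int(2);
    delete a;            // leaves a hole before b; b does not move
    int* c = new int(3); // the allocator may hand back a's old slot, or may not -
                         // there is no guarantee either way
    delete b;
    delete c;
}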
Some languages/environments support compacting garbage collectors. Such collectors are permitted to move objects around and can therefore eliminate holes if they choose to. Such approaches are complicated to implement since the collector needs to know the location of every single pointer within the program. Collectors of this type are therefore more suitable for higher-level languages.
If the memory were shifted, that'd be a pretty bad OS IMO. Typically, the OS is notified that that memory is available for re-use. It's not even required to be cleared (and most of the time isn't). When no more memory can be allocated, you'd typically get an exception (std::bad_alloc, if you're using new) or a NULL pointer back (if you're using malloc).
If fragmentation is a concern (it sometimes is), you can use existing memory pools which can deal with that, but even so, most of the responsibility still falls on the programmer.
Memory is not shifted to the left. Imagine what would happen if it was. All those pointers "on the right" would become invalid.
On a typical implementation (without a moving garbage collector for instance) nothing will be moved.
Bames53 says that Herb Sutter says that the standard says that automatic movement of allocated objects is illegal. Thanks Bames53.
I will preface this question with C/C++, as it mostly pertains to that, and I have seen it have the most impact with C/C++.
This has concerned me for some time, and I understand some of this problem can be avoided (and I would like to avoid the lectures on ways to avoid it, but rather focus on the aftermath just in case it does happen), but I still have the underlying question.
initial thoughts:
A pointer simply serves as an address to an object somewhere else in memory (this can be because the number of things of that type may need to change, as with int[], or because the nature of the thing can change over its lifespan, as with polymorphism).
Any time the keyword new is used, it should have a corresponding delete (if not multiple, depending on exception handling and multiple exit points).
When a dynamically allocated memory chunk is acted upon by the keyword delete, the destructor is called (and its actions are performed, if any), the memory chunk is returned to the system store to be made available for other things, and (depending on compiler, macros, or programmer) the pointer is set to NULL to avoid illegal memory accesses.
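That last convention works because deleting a null pointer is defined to be a no-op; a minimal sketch:

int main()
{
    int* ptr = new int(5);
    delete ptr;
    ptr = nullptr;  // a second delete is now a harmless no-op, and *ptr will
                    // typically fault immediately instead of silently misbehaving
    delete ptr;     // fine: deleting a null pointer does nothing
}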
situation:
When I am writing a program that uses dynamic memory (a combination of pointers, new, and delete), if something happens and the program terminates unexpectedly (unhandled exception, memory access error, illegal operation, etc.), the system should attempt to remove all memory that the program is using and return it to the system, but pointers are not always cleared. This may vary between operating systems and compilers (in how program termination is performed), but the things that were pointed to may still exist in memory, because all that was deleted was the pointer and not the thing that was pointed to. Granted, this can be quite a small loss (less than a MB for a small program), but for, say, stress testing a data store or processing large files, this can be quite large, possibly even in the GB range.
The direct question is: what steps can be taken to get that memory back? The only thing that I have found that works is to just restart the system (this is when using g++ and VS2008/2010 on a Windows system).
If the program terminates, then all memory it was using is returned to the system. At least under Windows which you say you are using. If you think this is not happening, then perhaps your program is not actually terminating at all.
The heap is bound to the allocator, and the allocator is bound to the process. When the process exits, the heap comes undone. Only system-shared resources aren't deallocated.
I am trying to understand memory allocation in C++.
A question that comes to my mind is why is it so necessary to allocate memory? And what happens if we use memory without allocating it?
Also, I was shocked to see how careless C++ is about memory allocation. It gives free access to memory through arrays with no bounds checking.
#include <iostream>
using namespace std;

int main()
{
    int *p = new int[5];
    p[1] = 3;
    p[11118] = 9;              // far out of bounds: undefined behaviour
    cout << p[11118] << '\n';
}
The above code works, outputs 9.
In what cases would assigning a value to a non-allocated memory location be dangerous? What are the potential ill-effects? Is it possible that the memory location I am accessing has been allocated to some other program and assigning a value to it might cause that program to crash/behave in a very unexpected fashion?
The above code is Undefined Behaviour. It can work, work incorrectly, not work at all, crash, or order pizza through Microsoft Skype. Thou shalt not rely on undefined behavior.
Why is it necessary to allocate memory?
Because that way, you mark the memory as yours. Nobody else can use it. It also verifies that there is in fact memory available. If your system only has 1000 bytes of memory, just picking byte 1500 to store some data at is a bad idea.
What happens if we use memory without allocating it?
Nobody knows. The address you write to might not exist. A different process might have already started using it, so you overwrite their data. The memory could be protected: the operating system may notice that you are accessing memory another process has laid claim to, and stop you. Or you might own that region of memory, but a different part of the program is using it for some reason, and you've overwritten your own data.
Free access to memory through arrays with no bounds checking.
That code does not work... it functions as expected, at the moment, but that is not the same thing. Formally, that is undefined behavior, so the compiler can emit code to do anything it wants.
In what cases would assigning a value to a non-allocated memory location be dangerous?
I gave some examples above. It is also possible to break your stack. When you call a function, the address the function should return to is stored. If you overwrite that value through careless memory access, then when you leave that function, who knows where you'll end up? Maybe the person exploiting your program... a common exploit is to load executable code into some part of memory, then use a bug in an existing program to run it. Once, on an embedded device I was working on, I had a fencepost error that resulted in my function returning into the middle of another instruction elsewhere. That should have crashed my chip, but as luck would have it the second half of that instruction was itself a valid instruction. The sequence of code that ended up running caused the device to gain sentience, and eventually finished the project we were working on itself. Now, it just plays WoW in my basement. Thus is the horror of undefined behavior.
Many good answers, but I feel that there's something missing regarding "why we need to allocate memory". I think it is important to know how the control flow of a computer program works at the lowest level, since C and C++ are relatively thin layers of abstraction over the hardware.
While it is possible to write a program in one huge global scope with ifs and gotos alone, most real-world programs are split into functions, which are separate, movable modules which can call each other at will. To keep track of all the data (arguments, return value, local variables), all this data is put on a one-dimensional, contiguous area of memory called the stack. Calling a function puts stuff on the stack, and returning from a function pops the data back off, and the same area of memory is overwritten by the next function call.
That way, all function code can be stored abstractly by just remembering offsets to local data relative to its entry point, and the same function can be called from many different contexts -- the function's local variables may be at different absolute addresses, but they're always at the same relative position relative to the function's entry address.
The fact that the stack memory is constantly overwritten as functions get called and return means that you cannot place any persistent data on the stack, i.e. in a local variable, because the memory for the local variables is not kept intact after the function returns. If your function needs to store persistent data somewhere, it must store that data somewhere else. This other location is the so-called heap, on which you manually (also called "dynamically") request persistent storage via malloc or new. That area of memory lies elsewhere and will not be recycled or overwritten by anyone, and you may safely pass a pointer to that memory around for as long as you like. The only downside is that unless you manually tell the system that you're done, it won't be able to use the memory for anything else, which is why you must manually clean up this dynamically allocated memory. But the need for functions to store persistent information is the reason we need to allocate memory.
(Just to complete the picture: local variables on the stack are said to be "automatically allocated". There is also "static allocation", which happens at compile time and is where global variables live. If you have a global char[30000], you may happily read from and write to that from anywhere in your program.)
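A sketch of the difference between stack and heap lifetime (the function names are invented for illustration):

int* dangling()
{
    int local = 42;
    return &local;      // WRONG: local's stack slot is recycled as soon as we return
}

int* persistent()
{
    return new int(42); // OK: heap storage outlives the call, but the caller
                        // must eventually delete it
}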
Allocating memory on the heap allows dynamic allocation of a dynamic amount of memory with a dynamic lifetime.
If you want bounds-checking, you can get it through std::vector::at().
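For example, this is the bounds-checked counterpart of the out-of-bounds snippet above:

#include <iostream>
#include <stdexcept>
#include <vector>

int main()
{
    std::vector<int> v(5);
    v[1] = 3;             // unchecked, like a raw array
    try {
        v.at(11118) = 9;  // checked: throws instead of scribbling on memory
    } catch (const std::out_of_range& e) {
        std::cout << e.what() << '\n';
    }
}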
In what cases would assigning value to a non allocated memory location would be dangerous?
All cases.
What are the potential ill-effects?
Unexpected behavior.
Is it possible that the memory location I am accessing has been allocated to some other program and assigning a value to it might cause that program to crash/behave in a very unexpected fashion?
Depends on the operating system.
This seems like two questions:
Why doesn't C++ do bounds-checking?
Why do we need dynamic memory allocation?
My answers:
Because then it'd be slower. You can always write an accessor function that checks bounds, like std::vector::at().
Because not being able to resize memory at runtime can be very inconvenient (see early FORTRAN).
In most operating systems, there is a distinct separation between the physical memory available in the host computer, and the logical memory footprint that application code can see. This is mediated, in most cases, by a part of the CPU called the Memory Management Unit (or MMU), and it serves a number of useful goals.
The most obvious is that it allows you to assign more memory to an application (or multiple applications) than is actually present on the machine. When the application asks for some data from memory, the MMU translates the address, trapping into the operating system (a page fault) to figure out where that memory really is, either in core or on disk if it has been paged out.
Another use for this is to segment some addresses for purposes other than application use. For instance, the GPUs in most computers are controlled through a region of memory that is visible to the CPU as core memory, and the CPU can read or write to that area of memory very efficiently. The MMU provides a way for the OS to use that memory but make it inaccessible to normal applications.
Because of this segmenting, and for other reasons, the full range of addresses is not normally available to applications until they ask the OS for some memory for a particular purpose. For instance, on Linux, applications ask for more core memory by calling brk or sbrk, and they ask for memory-mapped IO by calling mmap. Until an address is returned through one of those calls, the address is unmapped, and accessing it will cause a segfault, normally terminating the offending program.
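A Linux-specific sketch of that dance (a page size of 4096 bytes is assumed here):

#include <sys/mman.h>
#include <cassert>

int main()
{
    // Ask the OS to map one fresh, zero-filled page into our address space.
    void* p = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);

    static_cast<char*>(p)[0] = 'x';    // fine: this page is now mapped
    munmap(p, 4096);
    // static_cast<char*>(p)[0] = 'x'; // after munmap the page is gone - segfault (typically)
}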
Some platforms only expose memory to the application that it knows has been mapped, but C++ errs on the side of performance: it never does bounds checking automatically, because that would require extra instructions to be executed, and on some platforms those instructions could be very costly. On the other hand, C++ does provide for bounds checking, if you want it, through the standard template library.
Is it possible that the memory location I am accessing has been allocated to some other program and assigning a value to it might cause that program to crash/behave in a very unexpected fashion?
No, modern OSs are designed just to avoid that (for security reasons).
And you have to allocate memory because, although every process has its own 4GB address space (provided by 32-bit Windows), they all share the same xxGB the user has on his machine. Allocating memory helps the operating system know which applications need more memory and give it only to those that need it.
Why would my "hello world" need the same RAM Crysis 2 needs? :P
EDIT:
Ok, someone misunderstood what I meant. I didn't say it's OK and everyone can do it and nothing will happen. I just said doing this won't harm any external process. It is still undefined behavior, because no one knows what's at p + 11118, but UB doesn't mean "it can order a pizza through Skype" or other "exciting things"; at most an access violation, nothing more.
I know free() won't call the destructor, but what else will this cause besides that the member variable won't be destructed properly?
Also, what if we delete a pointer that is allocated by malloc?
It is implementation defined whether new uses malloc under the hood. Mixing new with free and malloc with delete could cause a catastrophic failure at runtime if the code was ported to a new machine, a new compiler, or even a new version of the same compiler.
I know free() won't call the destructor
And that is reason enough not to do it.
In addition, there's no requirement for a C++ implementation to even use the same memory areas for malloc and new so it may be that you're trying to free memory from a totally different arena, something which will almost certainly be fatal.
Many points:
It's undefined behaviour, and hence inherently risky and subject to change or breakage at any time and for no reason at all.
(As you know) delete calls the destructor and free doesn't... you may have some POD type and not care, but it's easy for someone else to add, say, a string to that type without realising there are weird limitations on its content.
If you malloc and forget to use placement new to construct an object in the memory, then invoke a member function as if the object existed (including delete, which calls the destructor), the member function may attempt operations using pointers with garbage values.
new and malloc may get memory from different heaps.
Even if new calls malloc to get its memory, there may not be a 1:1 correspondence between the new/delete and underlying malloc/free behaviour.
e.g. new may have extra logic such as small-object optimisations that have proven beneficial to typical C++ programs but harmful to typical C programs.
Someone may overload new, or link in a debug version of malloc/realloc/free, either of which could break if you're not using the functions properly.
Tools like ValGrind, Purify and Insure won't be able to differentiate between the deliberately dubious and the accidentally dubious.
In the case of arrays, delete[] invokes all the destructors and free() won't, but also the heap memory typically has a counter of the array size (for 32-bit VC++2005 Release builds, for example, the array size is in the 4 bytes immediately before the pointer value visibly returned by new[]). This extra value may or may not be there for POD types (not for VC++2005), but if it is, free() certainly won't expect it. Not all heap implementations allow you to free a pointer that's been shifted from the value returned by malloc().
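In short, keep the pairs matched. A sketch of the safe combinations:

#include <cstdlib>
#include <string>

int main()
{
    int* a = static_cast<int*>(std::malloc(sizeof(int)));
    std::free(a);                          // malloc pairs with free

    std::string* s = new std::string("hi");
    delete s;                              // new pairs with delete (runs the destructor)

    std::string* arr = new std::string[3];
    delete[] arr;                          // new[] pairs with delete[] (runs all destructors)
}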
An important difference is that new and delete also call the constructor and destructor of the object. Thus, you may get unexpected behavior. That is the most important thing, I think.
Because it might not be the same allocator, which could lead to weird, unpredictable behaviour. Plus, you shouldn't be using malloc/free at all, and you should avoid using new/delete where they're not necessary.
It totally depends on the implementation -- it's possible to write an implementation where this actually works fine. But there's no guarantee that the pool of memory new allocates from is the same pool that free() wants to return the memory to. Imagine that both malloc() and new use a few bytes of extra memory at the beginning of each allocated block to specify how large the block is. Further, imagine that malloc() and new use different formats for this info -- for example, malloc() uses the number of bytes, but new uses the number of 4-byte long words (just an example). Now, if you allocate with malloc() and free with delete, the info delete expects won't be valid, and you'll end up with a corrupted heap.