C++ garbage collected compiler - c++

Does anyone know of a good compiler for C++ that supports garbage collection? I know that it was considered for C++11 but was not implemented.

One of the most-often heard of approaches is to use Hans Boehm's GC, which can be plugged into C++. Of course, an alternative is to use smart pointers that keep track of the use of objects.
For everybody who upvoted the "who needs this" comment, the answer is that it can be more expensive:
Imagine that you fork() your program and now start adjusting refcounters in objects that remain constant otherwise. This will cause performance overhead because it means that the OS can't share the memory between the two processes, i.e. it breaks copy-on-write. In some cases, it can mean that the OS has to swap in memory only to copy and adjust reference counters.
Another example is something like the suggested boost::shared_ptr. Each of these has an additional allocation as overhead in order to store the reference counter, weak reference counter and deleter. This doesn't come for free either. Further, an instance thereof has twice the size of a pointer.
Then, if you use a plain size_t for the refcounter with built-in increment/decrement, your code isn't thread-safe. If you use atomic integers instead, incrementing and decrementing takes much longer, because atomic operations force cache synchronization between cores and prevent reordering. Remember, every time you copy such a pointer, you have to increment the reference counter, and every time an instance is destroyed, you have to decrement it again. Maintaining a reference count can add up to much higher overhead than running a mark-and-sweep GC now and then.
Lastly, refcounted pointers need the programmer to actively consider the possibility of cycles. GCs can detect and break cycles automatically.
If you keep the above in mind, a GC is an alternative. It does have disadvantages, like non-deterministic cleanup, but Java and C# show that you can live with this and there is nothing that keeps you from programming it yourself in those places where you really need it.

Usually, you can get around fine using RAII and smart pointers (such as shared_ptr and unique_ptr in C++11).
However, if you need garbage collection, look into Boehm's garbage collector. You could overload operator new as follows.
enum GCPlacement {
NoGC,
GC,
};
void* operator new(size_t size, GCPlacement gcp) {
void* toReturn;
if (gcp == GC) toReturn = GC_MALLOC(size);
else toReturn = GC_MALLOC_UNCOLLECTABLE(size);
if (!toReturn) throw std::bad_alloc();
else return toReturn;
}
void operator delete(void* p, GCPlacement) {
GC_FREE(p);
}
Now you can allocate garbage-collected memory as follows:
Object* o = new (GC) Object();
If you want, you can also derive certain classes from the gc class provided by boehmgc to indicate these should always be allocated using garbage collection.
C++/CLI is another solution, but be advised that it technically is not C++ (it is an extension of a partial implementation of C++) and it ties you to the Microsoft/.NET platform -- essentially, it's just C# with a C++ syntax.

Related

Why garbage collection when RAII is available?

I hear talks of C++14 introducing a garbage collector in the C++ standard library itself.
What is the rationale behind this feature? Isn't this the reason that RAII exists in C++?
How will the presence of standard library garbage collector affect the RAII semantic?
How does it matter to me(the programmer) or the way in which I write C++ programs?
Garbage collection and RAII are useful in different contexts. The presence of GC should not affect your use of RAII. Since RAII is well-known, I give two examples where GC is handy.
Garbage collection would be a great help in implementing lock-free data structures.
[...] it turns out that deterministic memory freeing is quite a fundamental problem in lock-free data structures. (from Lock-Free Data Structures By Andrei Alexandrescu)
Basically the problem is that you have to make sure you are not deallocating the memory while a thread is reading it. That's where GC becomes handy: It can look at the threads and only do the deallocation when it is safe. Please read the article for details.
Just to be clear here: it doesn't mean that the WHOLE WORLD should be garbage collected as in Java; only the relevant data should be garbage collected accurately.
In one of his presentations, Bjarne Stroustrup also gave a good, valid example where GC becomes handy. Imagine an application written in C/C++, 10M SLOC in size. The application works reasonably well (fairly bug free) but it leaks. You neither have the resources (man hours) nor the functional knowledge to fix this. The source code is a somewhat messy legacy code. What do you do? I agree that it is perhaps the easiest and cheapest way to sweep the problem under the rug with GC.
As it has been pointed out by sasha.sochka, the garbage collector will be optional.
My personal concern is that people would start using GC like it is used in Java and would write sloppy code and garbage collect everything. (I have the impression that shared_ptr has already become the default 'go to' even in cases where unique_ptr or, hell, stack allocation would do it.)
I agree with @DeadMG that there is no GC in the current C++ standard, but I would like to add the following quotation from B. Stroustrup:
When (not if) automatic garbage collection becomes part of C++, it will be optional
So Bjarne is sure that it will be added in future. At least the chairman of the EWG (Evolution Working Group) and one of the most important committee members (and more importantly language creator) wants to add it.
Unless he changed his opinion we can expect it to be added and implemented in the future.
There are some algorithms which are complicated/inefficient/impossible to write without a GC. I suspect this is the major selling point for GC in C++, and can't ever see it being used as a general-purpose allocator.
Why not a general-purpose allocator?
First, We have RAII, and most (including me) seem to believe that this is a superior method of resource management. We like determinism because it makes writing robust, leak-free code a lot simpler and makes performance predictable.
Second, you'll need to place some very un-C++-like restrictions on how you can use memory. For instance, you'd need at least one reachable, un-obfuscated pointer. Obfuscated pointers, as are popular in common tree container libraries (using alignment-guaranteed low bits for color flags) among others, won't be recognizable by the GC.
Related to that, the things which make modern GCs so usable are going to be very difficult to apply to C++ if you support any number of obfuscated pointers. Generational defragmenting GCs are really cool, because allocating is extremely cheap (essentially just incrementing a pointer) and eventually your allocations get compacted into something smaller with improved locality. To do this, objects need to be movable.
To make an object safely movable, the GC needs to be able to update all the pointers to it. It won't be able to find obfuscated ones. This could be accommodated, but wouldn't be pretty (probably a gc_pin type or similar, used like the current std::lock_guard, whenever you need a raw pointer). Usability would be out the door.
Without making things movable, a GC would be significantly slower and less scalable than what you're used to elsewhere.
Usability reasons (resource management) and efficiency reasons (fast, movable allocations) out of the way, what else is GC good for? Certainly not general-purpose. Enter lock-free algorithms.
Why lock-free?
Lock-free algorithms work by letting an operation under contention go temporarily "out of sync" with the data structure and detecting/correcting this at a later step. One effect of this is that under contention memory might be accessed after it has been deleted. For example, if you have multiple threads competing to pop a node from a LIFO, it is possible for one thread to pop and delete the node before another thread has realized the node was already taken:
Thread A:
Get pointer to root node.
Get pointer to next node from root node.
Suspend
Thread B:
Get pointer to root node.
Suspend
Thread A:
Pop node. (replace root node pointer with next node pointer, if root node pointer hasn't changed since it was read.)
Delete node.
Suspend
Thread B:
Get pointer to next node from our pointer of root node, which is now "out of sync" and was just deleted so instead we crash.
With GC you can avoid the possibility of reading from uncommitted memory because the node would never be deleted while Thread B is referencing it. There are ways around this, such as hazard pointers or catching SEH exceptions on Windows, but these can hurt performance significantly. GC tends to be the most optimal solution here.
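The pop sequence above can be sketched with std::atomic. This is only an illustration under stated assumptions, not a production stack: the class name DeferredStack and the retired_next field are hypothetical, and popped nodes are parked on a "retired" list and freed only in the destructor. That deferral stands in for what a collector would provide, namely that a node stays valid while another thread may still be reading it.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical sketch: a Treiber-style LIFO whose popped nodes are not deleted
// right away. They are moved to a retired list and freed in the destructor.
template <typename T>
class DeferredStack {
    struct Node {
        T value;
        Node* next;          // link in the live stack (may be read by other poppers)
        Node* retired_next;  // link in the retired list (touched only by the popper)
    };
    std::atomic<Node*> head_{nullptr};
    std::atomic<Node*> retired_{nullptr};

public:
    ~DeferredStack() {
        // Assumes no other thread uses the stack any more.
        for (Node* n = head_.load(); n != nullptr;) {
            Node* next = n->next;
            delete n;
            n = next;
        }
        for (Node* n = retired_.load(); n != nullptr;) {
            Node* next = n->retired_next;
            delete n;
            n = next;
        }
    }

    void push(T value) {
        Node* n = new Node{std::move(value), head_.load(), nullptr};
        // On failure, compare_exchange_weak stores the current head in n->next.
        while (!head_.compare_exchange_weak(n->next, n)) {
        }
    }

    bool pop(T& out) {
        Node* old_head = head_.load();
        // Reading old_head->next is safe only because nodes are never freed
        // while the stack is in use. With an immediate delete, this is the
        // read that crashes in the Thread B step described above.
        while (old_head != nullptr &&
               !head_.compare_exchange_weak(old_head, old_head->next)) {
        }
        if (old_head == nullptr) return false;
        out = old_head->value;
        // Defer reclamation instead of calling delete here.
        old_head->retired_next = retired_.load();
        while (!retired_.compare_exchange_weak(old_head->retired_next, old_head)) {
        }
        return true;
    }
};

// Single-threaded check of the LIFO order; returns the popped digits as a number.
int pop_order_demo() {
    DeferredStack<int> s;
    s.push(1);
    s.push(2);
    s.push(3);
    int a = 0, b = 0, c = 0;
    bool ok = s.pop(a) && s.pop(b) && s.pop(c);
    assert(ok);
    int extra = 0;
    assert(!s.pop(extra));  // the stack is empty now
    return a * 100 + b * 10 + c;
}
```

The price of this sketch is that memory only grows until the stack is destroyed; hazard pointers or a collector exist precisely to reclaim such nodes earlier without reintroducing the crash.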
There isn't, because there isn't one. The only features C++ ever had for GC were introduced in C++11 and they're just marking memory, there's no collector required. Nor will there be in C++14.
There is no way in hell a collector could pass Committee, is my opinion.
None of the answers so far touch upon the most important benefit of adding garbage-collection to a language: In the absence of language-supported garbage-collection, it's almost impossible to guarantee that no object will be destroyed while references to it exist. Worse, if such a thing does happen, it's almost impossible to guarantee that a later attempt to use the reference won't end up manipulating some other random object.
Although there are many kinds of objects whose lifetimes can be much better managed by RAII than by a garbage collector, there's considerable value in having the GC manage nearly all objects, including those whose lifetime is controlled by RAII. An object's destructor should kill the object and make it useless, but leave the corpse behind for the GC. Any reference to the object will thus become a reference to the corpse, and will remain one until it (the reference) ceases to exist entirely. Only when all references to the corpse have ceased to exist will the corpse itself do so.
While there are ways of implementing garbage collectors without inherent language support, such implementations either require that the GC be informed any time references are created or destroyed (adding considerable hassle and overhead), or run the risk that a reference the GC doesn't know about might exist to an object which is otherwise unreferenced. Compiler support for GC eliminates both those problems.
GC has the following advantages:
It can handle circular references without programmer assistance (with RAII-style, you have to use weak_ptr to break circles). So a RAII style application can still "leak" if it is used improperly.
Creating/destroying tons of shared_ptr's to a given object can be expensive because refcount increment/decrement are atomic operations. In multi-threaded applications the memory locations which contains refcounts will be "hot" places, putting a lot of pressure on the memory subsystem. GC isn't prone to this specific issue, because it uses reachable sets instead of refcounts.
I am not saying that GC is the best/good choice. I am just saying that it has different characteristics. In some scenarios that might be an advantage.
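The circular-reference point can be shown in a few lines. This is a minimal sketch; the StrongNode and WeakNode types and the helper functions are made up for illustration. Two objects that own each other through shared_ptr survive the scope that created them, while the weak_ptr variant is released.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node types for illustration.
struct StrongNode {
    std::shared_ptr<StrongNode> peer;  // owning back-reference: forms a cycle
};

struct WeakNode {
    std::weak_ptr<WeakNode> peer;  // non-owning back-reference: no cycle
};

// Returns true if the shared_ptr cycle is still alive after its scope ended.
bool strong_cycle_survives_scope() {
    std::weak_ptr<StrongNode> probe;
    {
        auto a = std::make_shared<StrongNode>();
        auto b = std::make_shared<StrongNode>();
        a->peer = b;
        b->peer = a;
        probe = a;
    }
    bool still_alive = !probe.expired();
    // Break the cycle by hand so that this demo itself does not leak.
    if (auto a = probe.lock()) a->peer.reset();
    return still_alive;
}

// Returns true if both nodes were destroyed when their scope ended.
bool weak_links_are_released() {
    std::weak_ptr<WeakNode> probe;
    {
        auto a = std::make_shared<WeakNode>();
        auto b = std::make_shared<WeakNode>();
        a->peer = b;
        b->peer = a;
        probe = a;
    }
    return probe.expired();
}
```

A tracing collector would reclaim the first pair without the manual reset, because neither node is reachable from the program any more.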
Definitions:
RCB GC: Reference-Counting Based GC.
MSB GC: Mark-Sweep Based GC.
Quick Answer:
MSB GC should be added into the C++ standard, because it is more handy than RCB GC in certain cases.
Two illustrative examples:
Consider a global buffer whose initial size is small, and any thread can dynamically enlarge its size and keep the old contents accessible for other threads.
Implementation 1 (MSB GC Version):
int* g_buf = 0;
size_t g_current_buf_size = 1024;
void InitializeGlobalBuffer()
{
g_buf = gcnew int[g_current_buf_size];
}
int GetValueFromGlobalBuffer(size_t index)
{
return g_buf[index];
}
void EnlargeGlobalBufferSize(size_t new_size)
{
if (new_size > g_current_buf_size)
{
auto tmp_buf = gcnew int[new_size];
memcpy(tmp_buf, g_buf, g_current_buf_size * sizeof(int));
std::swap(tmp_buf, g_buf);
g_current_buf_size = new_size; // record the new size so later calls copy the right amount
}
}
Implementation 2 (RCB GC Version):
std::shared_ptr<int> g_buf;
size_t g_current_buf_size = 1024;
std::shared_ptr<int> NewBuffer(size_t size)
{
return std::shared_ptr<int>(new int[size], []( int *p ) { delete[] p; });
}
void InitializeGlobalBuffer()
{
g_buf = NewBuffer(g_current_buf_size);
}
int GetValueFromGlobalBuffer(size_t index)
{
return g_buf.get()[index]; // shared_ptr<int> has no operator[]
}
void EnlargeGlobalBufferSize(size_t new_size)
{
if (new_size > g_current_buf_size)
{
auto tmp_buf = NewBuffer(new_size);
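memcpy(tmp_buf.get(), g_buf.get(), g_current_buf_size * sizeof(int)); // memcpy needs raw pointers
std::swap(tmp_buf, g_buf);
g_current_buf_size = new_size; // record the new size so later calls copy the right amount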
//
// Now tmp_buf owns the old g_buf, when tmp_buf is destructed,
// the old g_buf will also be deleted.
//
}
}
PLEASE NOTE:
After calling std::swap(tmp_buf, g_buf);, tmp_buf owns the old g_buf. When tmp_buf is destructed, the old g_buf will also be deleted.
If another thread is calling GetValueFromGlobalBuffer(index); to fetch the value from the old g_buf, then A Race Hazard Will Occur!!!
So, though implementation 2 looks as elegant as implementation 1, it doesn't work!
If we want to make implementation 2 work correctly, we must add some kind of locking mechanism; then it will be not only slower, but also less elegant than implementation 1.
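For comparison, here is one way the locking variant could look. It is a sketch under assumptions, not the poster's code: the namespace locked, the mutex, and the fill parameter are introduced for illustration, and elements are treated as read-only once a buffer has been published. Readers copy the shared_ptr while holding the lock, so the array they read from cannot be freed underneath them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <mutex>

namespace locked {  // hypothetical namespace, keeps these names apart from the globals above

std::mutex buf_mutex;
std::shared_ptr<int> buf;
std::size_t buf_size = 0;

// Allocates a zero-initialized array owned by a shared_ptr with an array deleter.
std::shared_ptr<int> NewBuffer(std::size_t size) {
    return std::shared_ptr<int>(new int[size](), std::default_delete<int[]>());
}

void Initialize(std::size_t size, int fill) {
    std::shared_ptr<int> fresh = NewBuffer(size);
    for (std::size_t i = 0; i < size; ++i) fresh.get()[i] = fill;
    std::lock_guard<std::mutex> lock(buf_mutex);
    buf = fresh;
    buf_size = size;
}

// The caller must pass an index that is valid for the buffer.
int GetValue(std::size_t index) {
    std::shared_ptr<int> snapshot;
    {
        std::lock_guard<std::mutex> lock(buf_mutex);
        snapshot = buf;  // keeps this array alive even if the buffer is enlarged
    }
    return snapshot.get()[index];
}

void Enlarge(std::size_t new_size) {
    std::lock_guard<std::mutex> lock(buf_mutex);
    if (new_size <= buf_size) return;
    std::shared_ptr<int> bigger = NewBuffer(new_size);
    std::memcpy(bigger.get(), buf.get(), buf_size * sizeof(int));
    buf.swap(bigger);
    buf_size = new_size;
    // `bigger` now owns the old array. It is freed once the last reader
    // snapshot that refers to it has gone away.
}

bool demo() {
    Initialize(4, 7);
    Enlarge(8);
    return GetValue(3) == 7 && GetValue(7) == 0;
}

}  // namespace locked
```

Every read now takes a mutex and performs an atomic reference-count update, which is the extra cost the paragraph above refers to.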
Conclusion:
It is good to take MSB GC into the C++ standard as an optional feature.

Why should C++ programmers minimize use of 'new'?

I stumbled upon Stack Overflow question Memory leak with std::string when using std::list<std::string>, and one of the comments says this:
Stop using new so much. I can't see any reason you used new anywhere you did. You can create objects by value in C++ and it's one of the huge advantages to using the language. You do not have to allocate everything on the heap. Stop thinking like a Java programmer.
I'm not really sure what he means by that.
Why should objects be created by value in C++ as often as possible, and what difference does it make internally? Did I misinterpret the answer?
There are two widely-used memory allocation techniques: automatic allocation and dynamic allocation. Commonly, there is a corresponding region of memory for each: the stack and the heap.
Stack
The stack always allocates memory in a sequential fashion. It can do so because it requires you to release the memory in the reverse order (First-In, Last-Out: FILO). This is the memory allocation technique for local variables in many programming languages. It is very, very fast because it requires minimal bookkeeping and the next address to allocate is implicit.
In C++, this is called automatic storage because the storage is reclaimed automatically at the end of scope. As soon as execution of the current code block (delimited using {}) is completed, memory for all variables in that block is automatically released. This is also the moment where destructors are invoked to clean up resources.
Heap
The heap allows for a more flexible memory allocation mode. Bookkeeping is more complex and allocation is slower. Because there is no implicit release point, you must release the memory manually, using delete or delete[] (free in C). However, the absence of an implicit release point is the key to the heap's flexibility.
Reasons to use dynamic allocation
Even if using the heap is slower and potentially leads to memory leaks or memory fragmentation, there are perfectly good use cases for dynamic allocation, as it's less limited.
Two key reasons to use dynamic allocation:
You don't know how much memory you need at compile time. For instance, when reading a text file into a string, you usually don't know what size the file has, so you can't decide how much memory to allocate until you run the program.
You want to allocate memory which will persist after leaving the current block. For instance, you may want to write a function string readfile(string path) that returns the contents of a file. In this case, even if the stack could hold the entire file contents, you could not return from a function and keep the allocated memory block.
Why dynamic allocation is often unnecessary
In C++ there's a neat construct called a destructor. This mechanism allows you to manage resources by aligning the lifetime of the resource with the lifetime of a variable. This technique is called RAII and is the distinguishing point of C++. It "wraps" resources into objects. std::string is a perfect example. This snippet:
int main ( int argc, char* argv[] )
{
std::string program(argv[0]);
}
actually allocates a variable amount of memory. The std::string object allocates memory using the heap and releases it in its destructor. In this case, you did not need to manually manage any resources and still got the benefits of dynamic memory allocation.
In particular, it implies that in this snippet:
int main ( int argc, char* argv[] )
{
std::string * program = new std::string(argv[0]); // Bad!
delete program;
}
there is unneeded dynamic memory allocation. The program requires more typing (!) and introduces the risk of forgetting to deallocate the memory. It does this with no apparent benefit.
Why you should use automatic storage as often as possible
Basically, the last paragraph sums it up. Using automatic storage as often as possible makes your programs:
faster to type;
faster when run;
less prone to memory/resource leaks.
Bonus points
In the referenced question, there are additional concerns. In particular, the following class:
class Line {
public:
Line();
~Line();
std::string* mString;
};
Line::Line() {
mString = new std::string("foo_bar");
}
Line::~Line() {
delete mString;
}
Is actually a lot more risky to use than the following one:
class Line {
public:
Line();
std::string mString;
};
Line::Line() {
mString = "foo_bar";
// note: there is a cleaner way to write this.
}
The reason is that std::string properly defines a copy constructor. Consider the following program:
int main ()
{
Line l1;
Line l2 = l1;
}
Using the original version, this program will likely crash, as it uses delete on the same string twice. Using the modified version, each Line instance will own its own string instance, each with its own memory and both will be released at the end of the program.
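A small check of that claim, as a sketch: the Line class below follows the modified version, written with a member initializer list (presumably the "cleaner way" the code comment alludes to), and the function name copies_are_independent is made up for illustration.

```cpp
#include <cassert>
#include <string>

class Line {
public:
    Line() : mString("foo_bar") {}  // initialize the member directly
    std::string mString;
};

// Returns true if modifying the copy leaves the original untouched.
bool copies_are_independent() {
    Line l1;
    Line l2 = l1;         // the compiler-generated copy constructor copies the string
    l2.mString += "_2";   // changes only l2's string
    return l1.mString == "foo_bar" && l2.mString == "foo_bar_2";
}
```

No destructor, copy constructor or assignment operator had to be written; the ones the compiler generates do the right thing because std::string does.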
Other notes
Extensive use of RAII is considered a best practice in C++ because of all the reasons above. However, there is an additional benefit which is not immediately obvious. Basically, it's better than the sum of its parts. The whole mechanism composes. It scales.
If you use the Line class as a building block:
class Table
{
Line borders[4];
};
Then
int main ()
{
Table table;
}
allocates four std::string instances, four Line instances, one Table instance and all the string's contents and everything is freed automagically.
Because the stack is faster and leak-proof
In C++, it takes but a single instruction to allocate space—on the stack—for every local scope object in a given function, and it's impossible to leak any of that memory. That comment intended (or should have intended) to say something like "use the stack and not the heap".
The reason why is complicated.
First, C++ is not garbage collected. Therefore, for every new, there must be a corresponding delete. If you fail to put this delete in, then you have a memory leak. Now, for a simple case like this:
std::string *someString = new std::string(...);
//Do stuff
delete someString;
This is simple. But what happens if "Do stuff" throws an exception? Oops: memory leak. What happens if "Do stuff" issues return early? Oops: memory leak.
And this is for the simplest case. If you happen to return that string to someone, now they have to delete it. And if they pass it as an argument, does the person receiving it need to delete it? When should they delete it?
Or, you can just do this:
std::string someString(...);
//Do stuff
No delete. The object was created on the "stack", and it will be destroyed once it goes out of scope. You can even return the object, thus transferring its contents to the calling function. You can pass the object to functions (typically as a reference or const-reference: void SomeFunc(std::string &iCanModifyThis, const std::string &iCantModifyThis)). And so forth.
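To see that the "Do stuff throws" case is covered, here is a minimal sketch. Tracker, do_stuff and work are hypothetical names; Tracker counts destructor calls so that the cleanup becomes observable.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper: bumps a counter when it is destroyed.
struct Tracker {
    explicit Tracker(int& counter) : counter_(counter) {}
    ~Tracker() { ++counter_; }
    int& counter_;
};

void do_stuff(bool fail) {
    if (fail) throw std::runtime_error("do_stuff failed");
}

void work(int& destroyed, bool fail) {
    Tracker t(destroyed);  // automatic storage, no new
    do_stuff(fail);        // may throw; t is destroyed during stack unwinding
}

// Returns how many Tracker destructors ran although work() exited by exception.
int destructors_run_when_throwing() {
    int destroyed = 0;
    try {
        work(destroyed, true);
    } catch (const std::runtime_error&) {
        // The Tracker has already been destroyed by the time we get here.
    }
    return destroyed;
}
```

The same holds for an early return: the destructor runs on every path out of the scope.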
All without new and delete. There's no question of who owns the memory or who's responsible for deleting it. If you do:
std::string someString(...);
std::string otherString;
otherString = someString;
It is understood that otherString has a copy of the data of someString. It isn't a pointer; it is a separate object. They may happen to have the same contents, but you can change one without affecting the other:
someString += "More text.";
if(otherString == someString) { /*Will never get here */ }
See the idea?
Objects created by new must be eventually deleted lest they leak. The destructor won't be called, memory won't be freed, the whole bit. Since C++ has no garbage collection, it's a problem.
Objects created by value (i.e. on the stack) automatically die when they go out of scope. The destructor call is inserted by the compiler, and the memory is freed automatically upon function return.
Smart pointers like unique_ptr, shared_ptr solve the dangling reference problem, but they require coding discipline and have other potential issues (copyability, reference loops, etc.).
Also, in heavily multithreaded scenarios, new is a point of contention between threads; there can be a performance impact for overusing new. Stack object creation is by definition thread-local, since each thread has its own stack.
The downside of value objects is that they die once the host function returns - you cannot pass a reference to those back to the caller, only by copying, returning or moving by value.
C++ doesn't employ a memory manager of its own. Other languages like C# and Java have a garbage collector to handle memory.
C++ implementations typically use operating system routines to allocate memory, and too much new/delete can fragment the available memory.
In any application, if memory is used frequently, it's advisable to preallocate it and release it when it is no longer required.
Improper memory management can lead to memory leaks, which are really hard to track down. So using stack objects within the scope of a function is a proven technique.
The downside of using stack objects is that multiple copies of objects are created when returning them, passing them to functions, etc. However, smart compilers are well aware of these situations and optimize them well for performance.
It's really tedious in C++ if memory is allocated and released in two different places. The responsibility for release is always a question, and mostly we rely on some commonly accessible pointers, stack objects (as much as possible) and techniques like auto_ptr (RAII objects).
The best thing is that you have control over the memory, and the worst thing is that you lose that control if the application's memory management is improper. Crashes caused by memory corruption are the nastiest and hardest to trace.
I see that a few important reasons for doing as few new's as possible are missed:
Operator new has a non-deterministic execution time
Calling new may or may not cause the OS to allocate a new physical page to your process. This can be quite slow if you do it often. Or it may already have a suitable memory location ready; we don't know. If your program needs to have consistent and predictable execution time (like in a real-time system or game/physics simulation), you need to avoid new in your time-critical loops.
Operator new is an implicit thread synchronization
Yes, you heard me. Your OS needs to make sure your page tables are consistent and as such calling new will cause your thread to acquire an implicit mutex lock. If you are consistently calling new from many threads you are actually serialising your threads (I've done this with 32 CPUs, each hitting on new to get a few hundred bytes each, ouch! That was a royal p.i.t.a. to debug.)
The rest, such as slow, fragmentation, error prone, etc., have already been mentioned by other answers.
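One common way to keep allocation out of a time-critical loop is to reserve capacity up front. The following is a sketch with a made-up function name; it relies on the guarantee that std::vector does not reallocate while size() stays within capacity().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if the loop ran without the vector reallocating its storage.
bool fill_without_reallocating(std::size_t n) {
    std::vector<int> samples;
    samples.reserve(n);  // single allocation, done before the hot loop
    const int* before = samples.data();
    for (std::size_t i = 0; i < n; ++i) {
        samples.push_back(static_cast<int>(i));  // stays within the reserved capacity
    }
    return samples.data() == before && samples.size() == n;
}
```

Whether this matters in practice depends on the allocator and the workload, so it is worth measuring before restructuring code around it.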
Pre-C++17:
Because it is prone to subtle leaks even if you wrap the result in a smart pointer.
Consider a "careful" user who remembers to wrap objects in smart pointers:
foo(shared_ptr<T1>(new T1()), shared_ptr<T2>(new T2()));
This code is dangerous because there is no guarantee that either shared_ptr is constructed before either T1 or T2. Hence, if one of new T1() or new T2() fails after the other succeeds, then the first object will be leaked because no shared_ptr exists to destroy and deallocate it.
Solution: use make_shared.
Post-C++17:
This is no longer a problem: C++17 imposes a constraint on the order of these operations, in this case ensuring that each call to new() must be immediately followed by the construction of the corresponding smart pointer, with no other operation in between. This implies that, by the time the second new() is called, it is guaranteed that the first object has already been wrapped in its smart pointer, thus preventing any leaks in case an exception is thrown.
A more detailed explanation of the new evaluation order introduced by C++17 was provided by Barry in another answer.
Thanks to @Remy Lebeau for pointing out that this is still a problem under C++17 (although less so): the shared_ptr constructor can fail to allocate its control block and throw, in which case the pointer passed to it is not deleted.
Solution: use make_shared.
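Applied to the call above, the make_shared version might look like this. The bodies of T1, T2 and foo are placeholders invented for the sketch; only the call expression is the point.

```cpp
#include <cassert>
#include <memory>

// Placeholder types and function, invented for this sketch.
struct T1 { int value = 1; };
struct T2 { int value = 2; };

int foo(std::shared_ptr<T1> a, std::shared_ptr<T2> b) {
    return a->value + b->value;
}

int call_foo_safely() {
    // Each make_shared call allocates and takes ownership in one step,
    // so no raw pointer exists that could leak if the other call throws.
    return foo(std::make_shared<T1>(), std::make_shared<T2>());
}
```

As a side effect, make_shared usually places the object and the control block in a single allocation.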
To a great extent, that's someone elevating their own weaknesses to a general rule. There's nothing wrong per se with creating objects using the new operator. What there is some argument for is that you have to do so with some discipline: if you create an object you need to make sure it's going to be destroyed.
The easiest way of doing that is to create the object in automatic storage, so C++ knows to destroy it when it goes out of scope:
{
File foo = File("foo.dat");
// Do things
}
Now, observe that when you fall off that block after the end-brace, foo is out of scope. C++ will call its destructor automatically for you. Unlike Java, you don't need to wait for the garbage collection to find it.
Had you written
{
File * foo = new File("foo.dat");
you would want to match it explicitly with
delete foo;
}
or even better, allocate your File * as a "smart pointer". If you aren't careful about that it can lead to leaks.
The answer itself makes the mistaken assumption that if you don't use new you don't allocate on the heap; in fact, in C++ you don't know that. At most, you know that a small amount of memory, say one pointer, is certainly allocated on the stack. However, consider if the implementation of File is something like:
class File {
private:
FileImpl * fd;
public:
File(String fn){ fd = new FileImpl(fn);}
Then FileImpl will still be allocated on the heap.
And yes, you'd better be sure to have
~File(){ delete fd ; }
in the class as well; without it, you'll leak memory from the heap even if you didn't apparently allocate on the heap at all.
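A variant that does not depend on remembering the destructor is to hold the implementation in a std::unique_ptr. This is a sketch, not the answer's code: it assumes C++14 for std::make_unique, substitutes std::string for String, renames the member to fd_, and adds a live_impls counter purely to make the cleanup observable.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

static int live_impls = 0;  // hypothetical counter, only for the demonstration

class FileImpl {
public:
    explicit FileImpl(std::string fn) : name_(std::move(fn)) { ++live_impls; }
    ~FileImpl() { --live_impls; }
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

class File {
public:
    // The FileImpl still lives on the heap, but unique_ptr owns it.
    explicit File(std::string fn) : fd_(std::make_unique<FileImpl>(std::move(fn))) {}
    const std::string& name() const { return fd_->name(); }

private:
    std::unique_ptr<FileImpl> fd_;  // deleted automatically when File is destroyed
};

// Returns the number of FileImpl objects still alive after the File went out of scope.
int live_impls_after_scope() {
    {
        File foo("foo.dat");
        assert(live_impls == 1);
        assert(foo.name() == "foo.dat");
    }
    return live_impls;
}
```

Because unique_ptr is not copyable, File also becomes move-only, which rules out the double delete that copying the raw-pointer version would cause.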
new() shouldn't be used as little as possible. It should be used as carefully as possible. And it should be used as often as necessary as dictated by pragmatism.
Allocation of objects on the stack, relying on their implicit destruction, is a simple model. If the required scope of an object fits that model then there's no need to use new(), with the associated delete() and checking of NULL pointers.
In the case where you have lots of short-lived objects allocation on the stack should reduce the problems of heap fragmentation.
However, if the lifetime of your object needs to extend beyond the current scope then new() is the right answer. Just make sure that you pay attention to when and how you call delete() and the possibilities of NULL pointers, using deleted objects and all of the other gotchas that come with the use of pointers.
When you use new, objects are allocated to the heap. It is generally used when you anticipate expansion. When you declare an object such as,
Class var;
it is placed on the stack.
You will always have to call delete on an object that you placed on the heap with new. This opens up the potential for memory leaks. Objects placed on the stack are not prone to leaking memory!
One notable reason to avoid overusing the heap is for performance -- specifically involving the performance of the default memory management mechanism used by C++. While allocation can be quite quick in the trivial case, doing a lot of new and delete on objects of non-uniform size without strict order leads not only to memory fragmentation, but it also complicates the allocation algorithm and can absolutely destroy performance in certain cases.
That's the problem that memory pools were created to solve: they mitigate the inherent disadvantages of traditional heap implementations while still allowing you to use the heap as necessary.
Better still, though, to avoid the problem altogether. If you can put it on the stack, then do so.
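For readers who have not seen a memory pool, here is a deliberately small sketch of the idea. FixedPool and Particle are hypothetical names; the pool is single-threaded, holds a fixed number of equally sized slots, and assumes that T's constructor does not throw.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical fixed-capacity pool: all slots have the same size, so
// allocating and freeing are just pointer updates on a free list.
template <typename T, std::size_t N>
class FixedPool {
    union Slot {
        Slot* next;                                   // used while the slot is free
        alignas(T) unsigned char storage[sizeof(T)];  // used while the slot holds a T
    };
    Slot slots_[N];
    Slot* free_list_ = nullptr;

public:
    FixedPool() {
        for (std::size_t i = 0; i < N; ++i) {
            slots_[i].next = free_list_;
            free_list_ = &slots_[i];
        }
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Returns nullptr when the pool is exhausted.
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_list_ == nullptr) return nullptr;
        Slot* slot = free_list_;
        free_list_ = slot->next;
        return ::new (static_cast<void*>(slot->storage)) T(std::forward<Args>(args)...);
    }

    // `object` must have been returned by create() on this pool.
    void destroy(T* object) {
        object->~T();
        Slot* slot = reinterpret_cast<Slot*>(object);
        slot->next = free_list_;
        free_list_ = slot;
    }
};

struct Particle {  // hypothetical payload type
    float x;
    float y;
};

bool pool_reuses_slots() {
    FixedPool<Particle, 2> pool;
    Particle* a = pool.create(Particle{1.0f, 2.0f});
    Particle* b = pool.create(Particle{3.0f, 4.0f});
    bool exhausted = (pool.create(Particle{5.0f, 6.0f}) == nullptr);
    pool.destroy(a);
    Particle* d = pool.create(Particle{7.0f, 8.0f});  // takes the slot that was just freed
    bool reused = (d == a);
    bool values_ok = (b->x == 3.0f && d->y == 8.0f);
    pool.destroy(d);
    pool.destroy(b);
    return exhausted && reused && values_ok;
}
```

Because every slot has the same size, this pool cannot fragment, which is the property the paragraph on non-uniform allocations is after.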
I tend to disagree with the idea of using new "too much". Though the original poster's use of new with system classes is a bit ridiculous. (int *i; i = new int[9999];? really? int i[9999]; is much clearer.) I think that is what was getting the commenter's goat.
When you're working with system objects, it's very rare that you'd need more than one reference to the exact same object. As long as the value is the same, that's all that matters. And system objects don't typically take up much space in memory. (one byte per character, in a string). And if they do, the libraries should be designed to take that memory management into account (if they're written well). In these cases, (all but one or two of the news in his code), new is practically pointless and only serves to introduce confusions and potential for bugs.
When you're working with your own classes/objects, however (e.g. the original poster's Line class), then you have to begin thinking about the issues like memory footprint, persistence of data, etc. yourself. At this point, allowing multiple references to the same value is invaluable - it allows for constructs like linked lists, dictionaries, and graphs, where multiple variables need to not only have the same value, but reference the exact same object in memory. However, the Line class doesn't have any of those requirements. So the original poster's code actually has absolutely no needs for new.
I think the poster meant to say "You do not have to allocate everything on the heap" rather than "on the stack".
Basically, objects are allocated on the stack (if the object size allows, of course) because of the cheap cost of stack-allocation, rather than heap-based allocation which involves quite some work by the allocator, and adds verbosity because then you have to manage data allocated on the heap.
Two reasons:
It's unnecessary in this case. You're making your code needlessly more complicated.
It allocates space on the heap, and it means that you have to remember to delete it later, or it will cause a memory leak.
Many answers have gone into various performance considerations. I want to address the comment which puzzled OP:
Stop thinking like a Java programmer.
Indeed, in Java, as explained in the answer to this question,
You use the new keyword when an object is being explicitly created for the first time.
but in C++, objects of type T are created like so: T{} (or T{ctor_argument1,ctor_arg2} for a constructor with arguments). That's why usually you just have no reason to want to use new.
So, why is it ever used at all? Well, for two reasons:
You need to create many values the number of which is not known at compile time.
Due to limitations of the C++ implementation on common machines - to prevent a stack overflow by allocating too much space creating values the regular way.
Now, beyond what the comment you quoted implied, you should note that even those two cases above are covered well enough without you having to "resort" to using new yourself:
You can use container types from the standard libraries which can hold a runtime-variable number of elements (like std::vector).
You can use smart pointers, which give you a pointer similar to the one new returns, but ensure that the memory gets released when the "pointer" goes out of scope.
and for this reason, it is an official item in the C++ community Coding Guidelines to avoid explicit new and delete: Guideline R.11.
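A minimal sketch of both alternatives, assuming C++14 or later (for std::make_unique). The Line type and the helper names make_lines and make_big_buffer are made up for illustration and are not from the question:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical value type, standing in for the question's Line class.
struct Line {
    int x1, y1, x2, y2;
};

// Case 1: the element count is only known at run time.
// std::vector owns the storage and frees it in its destructor.
std::vector<Line> make_lines(std::size_t count) {
    return std::vector<Line>(count, Line{0, 0, 1, 1});
}

// Case 2: the data is too large to live on the stack.
// std::unique_ptr frees the array when it goes out of scope.
std::unique_ptr<double[]> make_big_buffer(std::size_t n) {
    return std::make_unique<double[]>(n);  // elements are value-initialized to 0.0
}
```

Neither function contains a visible new or delete, and neither can leak if the caller forgets to clean up.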
The core reason is that objects on the heap are always more difficult to use and manage than simple values. Writing code that is easy to read and maintain is always the first priority of any serious programmer.
Another scenario is when the library we are using provides value semantics and makes dynamic allocation unnecessary. std::string is a good example.
For object-oriented code, however, using a pointer - which means using new to create the object beforehand - is a must. To reduce the complexity of resource management, we have dozens of tools that make it as simple as possible, such as smart pointers. The object-based paradigm and the generic paradigm assume value semantics and require little or no new, just as the posters elsewhere stated.
Traditional design patterns, especially those mentioned in GoF book, use new a lot, as they are typical OO code.
new is the new goto.
Recall why goto is so reviled: while it is a powerful, low-level tool for flow control, people often used it in unnecessarily complicated ways that made code difficult to follow. Furthermore, the most useful and easiest to read patterns were encoded in structured programming statements (e.g. for or while); the ultimate effect is that code where goto is the appropriate tool is rather rare. If you are tempted to write goto, you're probably doing things badly (unless you really know what you're doing).
new is similar: it is often used to make things unnecessarily complicated and harder to read, and the most useful usage patterns have been encoded into various classes. Furthermore, if you need to use any new usage patterns for which there aren't already standard classes, you can write your own classes that encode them!
I would even argue that new is worse than goto, due to the need to pair new and delete statements.
Like goto, if you ever think you need to use new, you are probably doing things badly — especially if you are doing so outside of the implementation of a class whose purpose in life is to encapsulate whatever dynamic allocations you need to do.
One more point in addition to all the correct answers above: it depends on what sort of programming you are doing. In Windows kernel development, for example, the stack is severely limited and you might not be able to take page faults like in user mode.
In such environments, new or C-like API calls are preferred and even required.
Of course, this is merely an exception to the rule.
new allocates objects on the heap. Otherwise, objects are allocated on the stack. Look up the difference between the two.

How to implement garbage collection in C++

I saw some post about implement GC in C and some people said it's impossible to do it because C is weakly typed. I want to know how to implement GC in C++.
I want some general idea about how to do it. Thank you very much!
This is a Bloomberg interview question my friend told me. He did badly at that time. We want to know your ideas about this.
Garbage collection in C and C++ are both difficult topics for a few reasons:
Pointers can be typecast to integers and vice-versa. This means that I could have a block of memory that is reachable only by taking an integer, typecasting it to a pointer, then dereferencing it. A garbage collector has to be careful not to think a block is unreachable when indeed it still can be reached.
Pointers are not opaque. Many garbage collectors, like stop-and-copy collectors, like to move blocks of memory around or compact them to save space. Since you can explicitly look at pointer values in C and C++, this can be difficult to implement correctly. You would have to be sure that if someone was doing something tricky with typecasting to integers that you correctly updated the integer if you moved a block of memory around.
Memory management can be done explicitly. Any garbage collector will need to take into account that the user is able to explicitly free blocks of memory at any time.
In C++, there is a separation between allocation/deallocation and object construction/destruction. A block of memory can be allocated with sufficient space to hold an object without any object actually being constructed there. A good garbage collector would need to know, when it reclaims memory, whether or not to call the destructor for any objects that might be allocated there. This is especially true for the standard library containers, which often make use of std::allocator to use this trick for efficiency reasons.
Memory can be allocated from different areas. C and C++ can get memory either from the built-in freestore (malloc/free or new/delete), or from the OS via mmap or other system calls, and, in the case of C++, from get_temporary_buffer (released with return_temporary_buffer). The programs might also get memory from some third-party library. A good garbage collector needs to be able to track references to memory in these other pools and (possibly) would have to be responsible for cleaning them up.
Pointers can point into the middle of objects or arrays. In many garbage-collected languages like Java, object references always point to the start of the object. In C and C++ pointers can point into the middle of arrays, and in C++ into the middle of objects (if multiple inheritance is used). This can greatly complicate the logic for detecting what's still reachable.
So, in short, it's extremely hard to build a garbage collector for C or C++. Most libraries that do garbage collection in C and C++ are extremely conservative in their approach and are technically unsound - they assume that you won't, for example, take a pointer, cast it to an integer, write it to disk, and then load it back in at some later time. They also assume that any value in memory that's the size of a pointer could possibly be a pointer, and so sometimes refuse to free unreachable memory because there's a nonzero chance that there's a pointer to it.
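To make the "pointer hidden in an integer" problem concrete, here is a small sketch. The hide and reveal helpers are hypothetical names invented for this example; the point is only that, while the pointer is stored in masked form, no word in memory holds the object's address, so a collector scanning for pointer-like bit patterns would have nothing to find:

```cpp
#include <cstdint>

// Hypothetical helper: store a pointer as an integer XOR-ed with a mask.
// The resulting value does not look like a pointer to the object.
inline std::uintptr_t hide(int* p, std::uintptr_t mask) {
    return reinterpret_cast<std::uintptr_t>(p) ^ mask;
}

// Hypothetical helper: undo the masking and recover the original pointer.
inline int* reveal(std::uintptr_t hidden, std::uintptr_t mask) {
    return reinterpret_cast<int*>(hidden ^ mask);
}
```

With manual memory management this round trip is harmless. With a collector that scans memory for pointers, the object could be reclaimed between hide and reveal.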
As others have pointed out, the Boehm GC does do garbage collection for C and C++, but subject to the aforementioned restrictions.
Interestingly, C++11 includes some new library functions (such as std::declare_reachable and std::undeclare_reachable) that allow the programmer to mark regions of memory as reachable and unreachable in anticipation of future garbage collection efforts. It may be possible in the future to build a really good C++11 garbage collector with this sort of information. In the meantime though, you'll need to be extremely careful not to break any of the above rules.
Look into the Boehm Garbage Collector.
C isn't C++, but both have the same "weakly typed" issues. It's not the implicit typecasts that cause an issue, though, but the tendency towards "punning" (subverting the type system), especially in data structure libraries.
There are garbage collectors out there for C and/or C++. The Boehm conservative collector is probably the best known. It's conservative in that, if it sees a bit pattern that looks like a pointer to some object, it doesn't collect that object. That value might be some other type of value completely, so the object could be collected, but "conservative" means playing safe.
Even a conservative collector can be fooled, though, if you use calculated pointers. There's a data structure, for example, where every list node has a field giving the difference between the next-node and previous-node addresses. The idea is to give double-linked list behaviour with a single link per node, at the expense of more complex iterators. Since there's no explicit pointer anywhere to most of the nodes, they may be wrongly collected.
Of course this is a very exceptional special case.
More important - you can either have reliable destructors or garbage collection, not both. When a garbage cycle is collected, the collector cannot decide which destructor to call first.
Since the RAII pattern is pervasive in C++, and that relies on destructors, there is IMO a conflict. There may be valid exceptions, but my view is that if you want garbage collection, you should use a language that's designed from the ground up for garbage collection (Java, C#, ...).
You could either use smart pointers or create your own container object which will track references and handle memory allocation etc. Smart pointers would probably be preferable. Often times you can avoid dynamic heap allocation altogether.
For example:
char* pCharArray = new char[128];
// do some stuff with characters
delete [] pCharArray;
The danger with the above is that if anything throws between the new and the delete, your delete will not be executed. Something like the above could easily be replaced with safer "garbage collected" code:
std::vector<char> charArray(128);
// do some stuff with characters
Bloomberg has notoriously irrelevant interview questions from a practical coding standpoint. Like most interviewers, though, they are more concerned with how you think and with your communication skills than with the actual solution.
You can read about the shared_ptr struct.
It implements a simple reference-counting garbage collector.
If you want a real garbage collector, you can overload the new operator.
Create a struct similar to shared_ptr, call it Object.
This will wrap the new object created. Now with overloading its operators, you can control the GC.
All you need to do now, is just implement one of the many GC algorithms
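To see the reference counting that shared_ptr does, here is a small sketch using std::shared_ptr from C++11 (the answer may have had boost's version in mind; the behaviour shown is the same). The Counts struct and shared_owner_demo function are names invented for this example:

```cpp
#include <memory>

// Hypothetical record of the counts observed at three points in time.
struct Counts {
    long one_owner;
    long two_owners;
    long after_scope;
};

Counts shared_owner_demo() {
    std::shared_ptr<int> a = std::make_shared<int>(42);
    Counts c{};
    c.one_owner = a.use_count();       // only `a` owns the int
    {
        std::shared_ptr<int> b = a;    // copying increments the count
        c.two_owners = a.use_count();
    }                                  // `b` is destroyed: count decremented
    c.after_scope = a.use_count();
    return c;                          // the int is freed when `a` dies here
}
```

The object is destroyed exactly when the last owner goes away. Note that this scheme alone does not reclaim cycles of objects that own each other.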
The claim you saw is false; the Boehm collector supports C and C++. I suggest reading the Boehm collector's documentation (particularly this page) for a good overview of how one might write a garbage collector in C or C++.

Why is creating STL containers dynamically considered bad practice?

Title says it.
Sample of bad practice:
std::vector<Point>* FindPoints()
{
std::vector<Point>* result = new std::vector<Point>();
//...
return result;
}
What's wrong with it if I delete that vector later?
I mostly program in C#, so this problem is not very clear for me in C++ context.
As a rule of thumb, you don't do this because the less you allocate on the heap, the less you risk leaking memory. :)
std::vector is also useful because it automatically manages the memory used for the vector in RAII fashion; by allocating it on the heap you now require an explicit deallocation (with delete result) to avoid leaking its memory. Things are made more complicated by exceptions, which can alter your return path and skip any delete you put on the way. (In C# you don't have such problems because inaccessible memory is just reclaimed periodically by the garbage collector.)
If you want to return an STL container you have several choices:
Just return it by value. In theory you incur a copy penalty because of the temporaries created in the process of returning result, but newer compilers should be able to elide the copy using NRVO (see the note at the end of this answer). There may also be std::vector implementations that use a copy-on-write optimization like many std::string implementations do, but I've never heard of one.
On C++0x compilers, instead, the move semantics should trigger, avoiding any copy.
Store the pointer of result in an ownership-transferring smart pointer like std::auto_ptr (or std::unique_ptr in C++0x), and also change the return type of your function to std::auto_ptr<std::vector<Point > >; in that way, your pointer is always encapsulated in a stack object that is automatically destroyed when the function exits (in any way), and that destroys the vector if it's still owned by it. Also, it's completely clear who owns the returned object.
Make the result vector a parameter passed by reference by the caller, and fill that one instead of returning a new vector.
Hardcore STL option: you would instead provide your data as iterators; the client code would then use std::copy+std::back_inserter or whatever to store such data in whichever container it wants. Not seen much (it can be tricky to code right) but it's worth mentioning.
As @Steve Jessop pointed out in the comments, NRVO works completely only if the return value is used directly to initialize a variable in the calling method; otherwise, it would still be able to elide the construction of the temporary return value, but the assignment operator for the variable to which the return value is assigned could still be called (see @Steve Jessop's comments for details).
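For the "hardcore STL option", one closely related variant is to have the function write into an output iterator supplied by the caller, rather than exposing its own iterator pair. The sketch below takes that variant. The Point struct and the three placeholder points are assumptions made up for the example, since the question elides the actual search:

```cpp
#include <iterator>
#include <vector>

// Hypothetical stand-in for the question's Point type.
struct Point {
    int x;
    int y;
};

// Writes every found point to `out` and returns the advanced iterator.
// The caller decides which container (if any) receives the points.
template <typename OutputIt>
OutputIt FindPoints(OutputIt out) {
    for (int i = 0; i < 3; ++i) {
        *out++ = Point{i, i * i};  // placeholder for the real search
    }
    return out;
}
```

A caller that wants a vector would write std::vector<Point> pts; FindPoints(std::back_inserter(pts)); and a caller that wants a std::list or a fixed array can pass a different iterator without the function changing.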
Creating anything dynamically is bad practice unless it's really necessary. There's rarely a good reason to create a container dynamically, so it's usually not a good idea.
Edit: Usually, instead of worrying about things like how fast or slow returning a container is, most of the code should deal only with an iterator (or two) into the container.
Creating objects dynamically in general is considered a bad practice in C++. What if an exception is thrown from your "//..." code? You'll never be able to delete the object. It is easier and safer to simply do:
std::vector<Point> FindPoints()
{
std::vector<Point> result;
//...
return result;
}
Shorter, safer, more straightforward... As for performance, modern compilers will optimize away the copy on return, and if they are not able to, the move constructor will be used, so this is still a cheap operation.
Perhaps you're referring to this recent question: C++: vector<string> *args = new vector<string>(); causes SIGABRT
One liner: It's bad practice because it's a pattern that's prone to memory leaks.
You're forcing the caller to accept dynamic allocation and take charge of its lifetime. It's ambiguous from the declaration whether the pointer returned is a static buffer, a buffer owned by some other API (or object), or a buffer that's now owned by the caller. You should avoid this pattern in any language (including plain C) unless it's clear from the function name what's going on (e.g. strdup, malloc).
The usual way is to instead do this:
void FindPoints(std::vector<Point>* ret) {
std::vector<Point> result;
//...
ret->swap(result);
}
void caller() {
//...
std::vector<Point> foo;
FindPoints(&foo);
// foo deletes itself
}
All objects are on the stack, and all the deletion is taken care of by the compiler. Or just return by value, if you're running a C++0x compiler+STL, or don't mind the copy.
I like Jerry Coffin's answer. Additionally, if you want to avoid returning a copy, consider passing the result container as a reference, and the swap() method may be needed sometimes.
void FindPoints(std::vector<Point> &points)
{
std::vector<Point> result;
//...
result.swap(points);
}
Programming is the art of finding good compromises. Dynamically allocated memory can have its place, of course, and I can even think of problems where a good compromise between code complexity and efficiency is obtained using std::vector<std::vector<T>*>.
However std::vector does a great job of hiding most needs of dynamically allocated arrays, and managed pointers are many times just a perfect solution for dynamically allocated single instances. This means that it's just not so common finding cases where an unmanaged dynamically allocated container (or dynamically allocated whatever, actually) is the best compromise in C++.
This in my opinion doesn't make dynamic allocation "bad", but just "suspect" if you see it in code, because there's a high probability that better solutions are possible.
In your case for example I see no reason for using dynamic allocation; just making the function return a std::vector would be efficient and safe. With any decent compiler Return Value Optimization will be used when assigning to a newly declared vector, and if you need to assign the result to an existing vector you can still do something like:
FindPoints().swap(myvector);
that will not do any copying of the data but just some pointer twiddling (note that you cannot use the apparently more natural myvector.swap(FindPoints()) because of a C++ rule that is sometimes annoying that forbids passing temporaries as non-const references).
In my experience the biggest source of needs of dynamically allocated objects are complex data structures where the same instance can be reached using multiple access paths (e.g. instances are at the same time both in a doubly linked list and indexed by a map). In the standard library containers are always the only owner of the contained objects (C++ is a copy semantic language) so it may be difficult to implement those solutions efficiently without the pointer and dynamic allocation concept.
Often you can still find reasonable-enough compromises that just use standard containers, however (maybe paying for some extra O(log N) lookups that you could have avoided), and that, considering the much simpler code, can be IMO the best compromise in most cases.
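As a sketch of such a compromise, the class below keeps elements in a std::list (which owns them and preserves insertion order) and indexes the same elements through a std::map of list iterators. Registry and its member names are made up for this example. Lookups by name cost the extra O(log N) mentioned above, but there is no new or delete anywhere:

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry: the list owns the items, the map is a second
// access path to the very same items (list iterators stay valid until
// the element they refer to is erased).
class Registry {
public:
    void add(const std::string& name, int value) {
        auto found = index_.find(name);
        if (found != index_.end()) {       // already present: update in place
            found->second->second = value;
            return;
        }
        items_.emplace_back(name, value);
        index_[name] = std::prev(items_.end());
    }

    bool remove(const std::string& name) {
        auto found = index_.find(name);
        if (found == index_.end()) return false;
        items_.erase(found->second);       // O(1) unlink from the list
        index_.erase(found);
        return true;
    }

    // Returns nullptr when the name is unknown.
    const int* find(const std::string& name) const {
        auto found = index_.find(name);
        return found == index_.end() ? nullptr : &found->second->second;
    }

    std::size_t size() const { return items_.size(); }

private:
    using Item = std::pair<std::string, int>;
    std::list<Item> items_;
    std::map<std::string, std::list<Item>::iterator> index_;
};
```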

General guidelines to avoid memory leaks in C++ [closed]

What are some general tips to make sure I don't leak memory in C++ programs? How do I figure out who should free memory that has been dynamically allocated?
I thoroughly endorse all the advice about RAII and smart pointers, but I'd also like to add a slightly higher-level tip: the easiest memory to manage is the memory you never allocated. Unlike languages like C# and Java, where pretty much everything is a reference, in C++ you should put objects on the stack whenever you can. As I've seen several people (including Dr Stroustrup) point out, the main reason why garbage collection has never been popular in C++ is that well-written C++ doesn't produce much garbage in the first place.
Don't write
Object* x = new Object;
or even
shared_ptr<Object> x(new Object);
when you can just write
Object x;
Use RAII
Forget Garbage Collection (Use RAII instead). Note that even the Garbage Collector can leak, too (if you forget to "null" some references in Java/C#), and that Garbage Collector won't help you to dispose of resources (if you have an object which acquired a handle to a file, the file won't be freed automatically when the object will go out of scope if you don't do it manually in Java, or use the "dispose" pattern in C#).
Forget the "one return per function" rule. This is a good C advice to avoid leaks, but it is outdated in C++ because of its use of exceptions (use RAII instead).
And while the "Sandwich Pattern" is a good C advice, it is outdated in C++ because of its use of exceptions (use RAII instead).
This post may seem repetitive, but in C++, the most basic pattern to know is RAII.
Learn to use smart pointers, whether from boost, TR1 or even the lowly (but often efficient enough) auto_ptr (but you must know its limitations).
RAII is the basis of both exception safety and resource disposal in C++, and no other pattern (sandwich, etc.) will give you both (and most of the time, it will give you none).
See below a comparison of RAII and non RAII code:
void doSandwich()
{
T * p = new T() ;
// do something with p
delete p ; // leaks if the processing of p throws or returns early
}
void doRAIIDynamic()
{
std::auto_ptr<T> p(new T()) ; // you can use other smart pointers, too (std::auto_ptr was removed in C++17; use std::unique_ptr there)
// do something with p
// WON'T EVER LEAK, even in case of exceptions, returns, breaks, etc.
}
void doRAIIStatic()
{
T p ;
// do something with p
// WON'T EVER LEAK, even in case of exceptions, returns, breaks, etc.
}
About RAII
To summarize (after the comment from Ogre Psalm33), RAII relies on three concepts:
Once the object is constructed, it just works! Do acquire resources in the constructor.
Object destruction is enough! Do free resources in the destructor.
It's all about scopes! Scoped objects (see doRAIIStatic example above) will be constructed at their declaration, and will be destroyed the moment the execution exits the scope, no matter how the exit (return, break, exception, etc.).
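The three points can be checked with a tiny experiment. In the sketch below, Resource is a hypothetical class whose counter stands in for a real resource such as a file handle or a lock, and use_resource is an invented function that can leave its scope either normally or through an exception:

```cpp
#include <stdexcept>

// Hypothetical resource: open_count stands in for a handle, lock, socket...
struct Resource {
    static int open_count;
    Resource() { ++open_count; }     // acquire in the constructor
    ~Resource() { --open_count; }    // release in the destructor
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;
};
int Resource::open_count = 0;

void use_resource(bool fail) {
    Resource r;                      // constructed at its declaration
    if (fail) {
        throw std::runtime_error("processing failed");
    }
    // Both the normal return and the exceptional exit run ~Resource().
}
```

After either kind of exit, Resource::open_count is back to zero, without any cleanup code written in use_resource itself.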
This means that in correct C++ code, most objects won't be constructed with new, and will be declared on the stack instead. And for those constructed using new, all will be somehow scoped (e.g. attached to a smart pointer).
As a developer, this is very powerful indeed as you won't need to care about manual resource handling (as done in C, or for some objects in Java which makes intensive use of try/finally for that case)...
Edit (2012-02-12)
"scoped objects ... will be destructed ... no matter the exit" that's not entirely true. there are ways to cheat RAII. any flavour of terminate() will bypass cleanup. exit(EXIT_SUCCESS) is an oxymoron in this regard.
– wilhelmtell
wilhelmtell is quite right about that: there are exceptional ways to cheat RAII, all leading to an abrupt stop of the process.
Those are exceptional ways because C++ code is not littered with terminate, exit, etc., or in the case with exceptions, we do want an unhandled exception to crash the process and core dump its memory image as is, and not after cleaning.
But we must still know about those cases because, while they rarely happen, they can still happen.
(who calls terminate or exit in casual C++ code?... I remember having to deal with that problem when playing with GLUT: This library is very C-oriented, going as far as actively designing it to make things difficult for C++ developers like not caring about stack allocated data, or having "interesting" decisions about never returning from their main loop... I won't comment about that).
Instead of managing memory manually, try to use smart pointers where applicable.
Take a look at the Boost lib, TR1, and smart pointers.
Also, smart pointers are now part of the C++ standard, as of C++11.
You'll want to look at smart pointers, such as boost's smart pointers.
Instead of
int main()
{
Object* obj = new Object();
//...
delete obj;
}
boost::shared_ptr will automatically delete once the reference count is zero:
int main()
{
boost::shared_ptr<Object> obj(new Object());
//...
// destructor destroys when reference count is zero
}
Note my last comment, "when reference count is zero", which is the coolest part. If you have multiple users of your object, you won't have to keep track of whether the object is still in use. Once nobody refers to your shared pointer, the object gets destroyed.
This is not a panacea, however. Though you can access the base pointer, you wouldn't want to pass it to a 3rd party API unless you were confident about what it was doing. Lots of times, you're "posting" stuff to some other thread for work to be done AFTER the creating scope is finished. This is common with PostThreadMessage in Win32:
void foo()
{
boost::shared_ptr<Object> obj(new Object());
// Simplified here
PostThreadMessage(...., (LPARAM)obj.get());
// Destructor destroys! pointer sent to PostThreadMessage is invalid! Zohnoes!
}
As always, use your thinking cap with any tool...
Read up on RAII and make sure you understand it.
Bah, you young kids and your new-fangled garbage collectors...
Very strong rules on "ownership" - what object or part of the software has the right to delete the object. Clear comments and wise variable names to make it obvious if a pointer "owns" or is "just look, don't touch". To help decide who owns what, follow as much as possible the "sandwich" pattern within every subroutine or method.
create a thing
use that thing
destroy that thing
Sometimes it's necessary to create and destroy in widely different places; i think it's hard to avoid that.
In any program requiring complex data structures, i create a strict clear-cut tree of objects containing other objects - using "owner" pointers. This tree models the basic hierarchy of application domain concepts. Example a 3D scene owns objects, lights, textures. At the end of the rendering when the program quits, there's a clear way to destroy everything.
Many other pointers are defined as needed whenever one entity needs access to another, to scan over arrays or whatever; these are the "just looking" pointers. In the 3D scene example, an object uses a texture but does not own it; other objects may use that same texture. The destruction of an object does not invoke destruction of any textures.
Yes it's time consuming but that's what i do. I rarely have memory leaks or other problems. But then i work in the limited arena of high-performance scientific, data acquisition and graphics software. I don't often deal with transactions like in banking and ecommerce, event-driven GUIs or highly networked asynchronous chaos. Maybe the new-fangled ways have an advantage there!
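The 3D scene example could be sketched as follows. This is only one possible rendering of the idea: the answer does not say how its "owner" pointers are implemented, so the use of std::unique_ptr (C++14 for std::make_unique) and all type and member names here are assumptions:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Texture {
    std::string name;
};

struct Object {
    std::string name;
    const Texture* texture;  // "just looking": never deleted through this pointer
};

// The scene is the single owner of its textures and objects.
class Scene {
public:
    const Texture* add_texture(std::string name) {
        textures_.push_back(std::make_unique<Texture>(Texture{std::move(name)}));
        return textures_.back().get();   // hand out a non-owning pointer
    }

    void add_object(std::string name, const Texture* texture) {
        objects_.push_back(Object{std::move(name), texture});
    }

    std::size_t texture_count() const { return textures_.size(); }
    const std::vector<Object>& objects() const { return objects_; }

private:
    std::vector<std::unique_ptr<Texture>> textures_;  // "owner" pointers
    std::vector<Object> objects_;
};
```

Destroying the Scene destroys every texture exactly once, no matter how many objects were looking at it.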
Most memory leaks are the result of not being clear about object ownership and lifetime.
The first thing to do is to allocate on the Stack whenever you can. This deals with most of the cases where you need to allocate a single object for some purpose.
If you do need to 'new' an object then most of the time it will have a single obvious owner for the rest of its lifetime. For this situation I tend to use a bunch of collections templates that are designed for 'owning' objects stored in them by pointer. They are implemented with the STL vector and map containers but have some differences:
These collections can not be copied or assigned to. (once they contain objects.)
Pointers to objects are inserted into them.
When the collection is deleted the destructor is first called on all objects in the collection. (I have another version where it asserts if destructed and not empty.)
Since they store pointers you can also store inherited objects in these containers.
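With C++11 and later, much of what these custom collections provide can be approximated with a standard container of std::unique_ptr. The sketch below is that approximation, not the answer's own templates; Shape, Square, Rect and total_area are names invented for the example:

```cpp
#include <memory>
#include <vector>

struct Shape {
    virtual ~Shape() = default;          // required to delete through a base pointer
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

struct Rect : Shape {
    Rect(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
    double width;
    double height;
};

// Cannot be copied (std::unique_ptr is move-only); destroying the vector
// runs the destructor of every element it holds.
using ShapeList = std::vector<std::unique_ptr<Shape>>;

double total_area(const ShapeList& shapes) {
    double sum = 0.0;
    for (const auto& shape : shapes) {
        sum += shape->area();
    }
    return sum;
}
```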
My beef with STL is that it is so focused on value objects, while in most applications objects are unique entities that do not have the meaningful copy semantics required for use in those containers.
Great question!
If you are using C++ and developing a real-time, CPU- and memory-bound application (like a game), you need to write your own memory manager.
I think the best you can do is merge some interesting works by various authors. I can give you some hints:
Fixed size allocators are heavily discussed everywhere on the net
Small Object Allocation was introduced by Alexandrescu in 2001 in his perfect book "Modern C++ Design"
A great advancement (with source code distributed) can be found in an amazing article in Game Programming Gems 7 (2008) named "High Performance Heap allocator", written by Dimitar Lazarov
A great list of resources can be found in this article
Do not start writing a noob unuseful allocator by yourself... DOCUMENT YOURSELF first.
One technique that has become popular with memory management in C++ is RAII. Basically you use constructors/destructors to handle resource allocation. Of course there are some other obnoxious details in C++ due to exception safety, but the basic idea is pretty simple.
The issue generally comes down to one of ownership. I highly recommend reading the Effective C++ series by Scott Meyers and Modern C++ Design by Andrei Alexandrescu.
There's already a lot about how to not leak, but if you need a tool to help you track leaks take a look at:
BoundsChecker under VS
MMGR C/C++ lib from FluidStudio
http://www.paulnettle.com/pub/FluidStudios/MemoryManagers/Fluid_Studios_Memory_Manager.zip (its overrides the allocation methods and creates a report of the allocations, leaks, etc)
Use smart pointers everywhere you can! Whole classes of memory leaks just go away.
Share and know memory ownership rules across your project. Using the COM rules makes for the best consistency ([in] parameters are owned by the caller, callee must copy; [out] params are owned by the caller, callee must make a copy if keeping a reference; etc.)
valgrind is a good tool to check your program for memory leaks at runtime, too.
It is available on most flavors of Linux (including Android) and on Darwin.
If you write unit tests for your programs, you should get in the habit of systematically running valgrind on the tests. It can catch many memory leaks at an early stage. It is also usually easier to pinpoint them in simple tests than in a full application.
Of course this advice stays valid for any other memory check tool.
Also, don't use manually allocated memory if there's a std library class (e.g. vector). Make sure if you violate that rule that you have a virtual destructor.
If you can't/don't use a smart pointer for something (although that should be a huge red flag), type in your code with:
allocate()
if allocation succeeded:
{
// scope: the code that uses the allocation goes here
deallocate()
}
That's obvious, but make sure you type it before you type any code in the scope
A frequent source of these bugs is when you have a method that accepts a reference or pointer to an object but leaves ownership unclear. Style and commenting conventions can make this less likely.
Let the case where the function takes ownership of the object be the special case. In all situations where this happens, be sure to write a comment next to the function in the header file indicating this. You should strive to make sure that in most cases the module or class which allocates an object is also responsible for deallocating it.
Using const can help a lot in some cases. If a function will not modify an object, and does not store a reference to it that persists after it returns, accept a const reference. From reading the caller's code it will be obvious that your function has not accepted ownership of the object. You could have had the same function accept a non-const pointer, and the caller may or may not have assumed that the callee accepted ownership, but with a const reference there's no question.
Do not use non-const references in argument lists. It is very unclear when reading the caller code that the callee may have kept a reference to the parameter.
I disagree with the comments recommending reference counted pointers. This usually works fine, but when you have a bug and it doesn't work, the problem can be very hard to track down, especially if your destructor does something non-trivial or the program is multithreaded. Definitely try to adjust your design to not need reference counting if it's not too hard.
Tips in order of Importance:
-Tip#1 Always remember to declare your destructors "virtual".
-Tip#2 Use RAII
-Tip#3 Use boost's smartpointers
-Tip#4 Don't write your own buggy smart pointers, use boost (on a project I'm on right now I can't use boost, and I've suffered having to debug my own smart pointers; I would definitely not take the same route again, but then again right now I can't add boost to our dependencies)
-Tip#5 If it's some casual/non-performance critical (as in games with thousands of objects) work, look at Thorsten Ottosen's boost pointer container
-Tip#6 Find a leak detection header for your platform of choice such as Visual Leak Detector's "vld" header
If you can, use boost shared_ptr and standard C++ auto_ptr. Those convey ownership semantics.
When you return an auto_ptr, you are telling the caller that you are giving them ownership of the memory.
When you return a shared_ptr, you are telling the caller that you have a reference to it and they take part of the ownership, but it isn't solely their responsibility.
These semantics also apply to parameters. If the caller passes you an auto_ptr, they are giving you ownership.
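In C++11 and later the same conventions are usually written with std::unique_ptr in place of auto_ptr and std::shared_ptr in place of the boost version. A small sketch under that assumption (C++14 for std::make_unique); Widget and the four function names are invented for the example:

```cpp
#include <memory>
#include <utility>

struct Widget {
    int id;
};

// Returning unique_ptr: the caller becomes the sole owner.
std::unique_ptr<Widget> make_widget(int id) {
    return std::make_unique<Widget>(Widget{id});
}

// Taking unique_ptr by value: the function takes ownership from the caller.
int consume(std::unique_ptr<Widget> w) {
    return w->id;  // the Widget is destroyed when this function returns
}

// Taking a const reference: the function only looks, ownership is untouched.
int inspect(const Widget& w) {
    return w.id;
}

// Taking shared_ptr by value: the function shares ownership for the call.
// Returns the reference count seen while it holds its copy.
long share(std::shared_ptr<Widget> w) {
    return w.use_count();
}
```

The signature alone tells the reader who is responsible for the object, and the compiler enforces it: passing a unique_ptr to consume requires an explicit std::move at the call site.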
Others have mentioned ways of avoiding memory leaks in the first place (like smart pointers). But a profiling and memory-analysis tool is often the only way to track down memory problems once you have them.
Valgrind memcheck is an excellent free one.
For MSVC only, add the following to the top of each .cpp file:
#ifdef _DEBUG
#define new DEBUG_NEW
#endif
Then, when debugging with VS2003 or greater, you will be told of any leaks when your program exits (it tracks new/delete). It's basic, but it has helped me in the past.
valgrind (only available for *nix platforms) is a very nice memory checker
If you are going to manage your memory manually, you have two cases:
I created the object (perhaps indirectly, by calling a function that allocates a new object), I use it (or a function I call uses it), then I free it.
Somebody gave me the reference, so I should not free it.
If you need to break any of these rules, please document it.
It is all about pointer ownership.
Try to avoid allocating objects dynamically. As long as classes have appropriate constructors and destructors, use a variable of the class type, not a pointer to it, and you avoid dynamic allocation and deallocation because the compiler will do it for you.
Actually that's also the mechanism used by "smart pointers" and referred to as RAII by some of the other writers ;-) .
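A small sketch of what "the compiler will do it for you" means, assuming a hypothetical class Tracked that counts its live instances. The object is a plain local variable, so there is no new or delete anywhere.

```cpp
#include <cassert>

// Hypothetical class that counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() {
        assert(live > 0);
        --live;
    }
    Tracked(const Tracked&) = delete;             // no copies, to keep the count simple
    Tracked& operator=(const Tracked&) = delete;
};
int Tracked::live = 0;

int live_inside_scope() {
    Tracked t;             // automatic object: the constructor runs here
    return Tracked::live;  // read while t is still alive
}                          // t's destructor runs here, no delete needed
```

After live_inside_scope() returns, Tracked::live is back to zero, whichever path was used to leave the function.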
When you pass objects to other functions, prefer reference parameters over pointers. This avoids some possible errors.
Declare parameters const where possible, especially pointers to objects. That way objects can't be modified "accidentally" (except if you cast the const away ;-))). Note that const does not stop delete, which is legal on a pointer to const, although passing such a pointer to free() is rejected.
Minimize the number of places in the program where you do memory allocation and deallocation. E.g. if you allocate or free the same type several times, write a function for it (or a factory method ;-)).
This way you can create debug output (which addresses are allocated and deallocated, ...) easily, if required.
Use a factory function to allocate objects of several related classes from a single function.
If your classes have a common base class with a virtual destructor, you can free all of them using the same function (or static method).
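The factory idea above could be sketched as follows. Shape, Square, Rect, create_shape, destroy_shape and g_live_shapes are hypothetical names; the point is only that allocation and deallocation each happen in exactly one function, which is also where the debug output goes.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical class hierarchy with a common base class.
class Shape {
public:
    virtual ~Shape() {}  // virtual: delete through Shape* runs the derived destructor
    virtual double area() const = 0;
};

class Square : public Shape {
public:
    explicit Square(double s) : side_(s) {}
    double area() const { return side_ * side_; }
private:
    double side_;
};

class Rect : public Shape {
public:
    Rect(double w, double h) : w_(w), h_(h) {}
    double area() const { return w_ * h_; }
private:
    double w_, h_;
};

static int g_live_shapes = 0;  // shapes created but not yet destroyed

// The only place where shapes are allocated. Returns a null pointer for an unknown kind.
Shape* create_shape(const std::string& kind, double a, double b) {
    Shape* s = 0;
    if (kind == "square") s = new Square(a);
    else if (kind == "rect") s = new Rect(a, b);
    if (s) {
        ++g_live_shapes;
        std::fprintf(stderr, "alloc %s at %p\n", kind.c_str(), static_cast<void*>(s));
    }
    return s;
}

// The only place where shapes are freed, whatever their concrete type.
void destroy_shape(Shape* s) {
    if (!s) return;
    assert(g_live_shapes > 0);
    std::fprintf(stderr, "free at %p\n", static_cast<void*>(s));
    --g_live_shapes;
    delete s;
}
```

If g_live_shapes is not zero at program exit, the log on stderr shows which addresses were allocated and never freed.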
Check your program with tools like Purify (unfortunately it costs many $/€/...).
You can intercept the memory allocation functions and see if there are some memory zones not freed upon program exit (though it is not suitable for all the applications).
It can also be done at compile time by replacing operators new and delete and other memory allocation functions.
For example, see this site: [Debugging memory allocation in C++]
Note: There is a similar trick for the delete operator, something like this:
#define DEBUG_DELETE PrepareDelete(__LINE__,__FILE__); delete
#define delete DEBUG_DELETE
You can store the file name and line number in some variables, so that the overloaded delete operator knows where it was called from. This way you get a trace of every delete and malloc in your program. At the end of the memory-checking sequence you should be able to report which allocated blocks were never deleted, identified by file name and line number, which I guess is what you want.
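A minimal sketch of this kind of tracking. It is a variation on the macro trick rather than the exact code from the linked site: it uses a placement form of operator new that receives __FILE__ and __LINE__, plus a helper function in place of a redefined delete. TRACKED_NEW, tracked_delete, report_leaks and the fixed-size table are all hypothetical names and choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical fixed-size table of live allocations. It uses no dynamic
// memory itself, so the tracker never calls back into operator new.
struct AllocRecord {
    void* ptr;
    const char* file;
    int line;
};
static const int kMaxRecords = 1024;
static AllocRecord g_records[kMaxRecords];  // zero-initialized: all slots free

// Placement form of operator new that records the call site.
// If the table is full, the block is allocated but not recorded.
void* operator new(std::size_t size, const char* file, int line) {
    void* p = std::malloc(size ? size : 1);
    if (!p) throw std::bad_alloc();
    for (int i = 0; i < kMaxRecords; ++i) {
        if (g_records[i].ptr == 0) {
            g_records[i].ptr = p;
            g_records[i].file = file;
            g_records[i].line = line;
            break;
        }
    }
    return p;
}

// Forget a pointer; returns false if it was never recorded.
static bool untrack(void* p) {
    for (int i = 0; i < kMaxRecords; ++i) {
        if (g_records[i].ptr == p) {
            g_records[i].ptr = 0;
            return true;
        }
    }
    return false;
}

// Matching placement delete: the compiler calls it only if a constructor throws.
void operator delete(void* p, const char*, int) noexcept {
    untrack(p);
    std::free(p);
}

#define TRACKED_NEW new (__FILE__, __LINE__)

// Counterpart of TRACKED_NEW. p must be the exact pointer TRACKED_NEW returned.
template <typename T>
void tracked_delete(T* p) {
    if (!p) return;
    p->~T();
    bool known = untrack(p);
    assert(known && "pointer was not allocated with TRACKED_NEW");
    (void)known;
    std::free(p);
}

// Print every block that is still allocated and return how many there are.
int report_leaks() {
    int count = 0;
    for (int i = 0; i < kMaxRecords; ++i) {
        if (g_records[i].ptr != 0) {
            std::fprintf(stderr, "leak: %p allocated at %s:%d\n",
                         g_records[i].ptr, g_records[i].file, g_records[i].line);
            ++count;
        }
    }
    return count;
}
```

Calling report_leaks() at the end of the checked sequence lists each block that is still allocated, together with the file and line that created it.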
You could also try something like BoundsChecker under Visual Studio which is pretty interesting and easy to use.
We wrap all our allocation functions with a layer that puts a brief string at the front and a sentinel flag at the end. So, for example, you'd have a call to "myalloc( pszSomeString, iSize, iAlignment );" or "new( "description", iSize ) MyObject();", which internally allocates the specified size plus enough space for your header and sentinel. Of course, don't forget to comment this out for non-debug builds! It takes a little more memory to do this, but the benefits far outweigh the costs.
This has three benefits. First, it lets you quickly track down what code is leaking, by searching for memory allocated in certain 'zones' but not cleaned up when those zones should have been freed. Second, it can detect when a boundary has been overwritten, by checking that all sentinels are intact. This has saved us numerous times when trying to find those well-hidden crashes or array missteps. The third benefit is in tracking the use of memory to see who the big players are: a collation of certain descriptions in a MemDump tells you when 'sound' is taking up way more space than you anticipated, for example.
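A rough sketch of such a wrapper, under assumed details: the header layout, the 4-byte sentinel value and the names debug_alloc, debug_free, tag_of and sentinel_intact are all hypothetical, not the poster's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <new>

// Assumed block layout: [Header][user bytes][sentinel]
struct alignas(std::max_align_t) Header {
    char tag[16];      // brief description, truncated if too long
    std::size_t size;  // number of user bytes
};
static const unsigned char kSentinel[4] = {0xDE, 0xAD, 0xBE, 0xEF};

// Allocate size user bytes, preceded by a tagged header and followed by a sentinel.
void* debug_alloc(const char* tag, std::size_t size) {
    assert(tag != 0);
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(Header) + size + sizeof(kSentinel)));
    if (!raw) return 0;
    Header* h = new (raw) Header;  // construct the header in place
    std::snprintf(h->tag, sizeof(h->tag), "%s", tag);
    h->size = size;
    unsigned char* user = raw + sizeof(Header);
    std::memcpy(user + size, kSentinel, sizeof(kSentinel));
    return user;
}

// Find the header that sits directly in front of the user bytes.
static const Header* header_of(const void* user) {
    return reinterpret_cast<const Header*>(
        static_cast<const unsigned char*>(user) - sizeof(Header));
}

const char* tag_of(const void* user) { return header_of(user)->tag; }

// True if nothing has written past the end of the user bytes.
bool sentinel_intact(const void* user) {
    const unsigned char* end =
        static_cast<const unsigned char*>(user) + header_of(user)->size;
    return std::memcmp(end, kSentinel, sizeof(kSentinel)) == 0;
}

void debug_free(void* user) {
    if (!user) return;
    std::free(static_cast<unsigned char*>(user) - sizeof(Header));
}
```

A memory dump would walk the live blocks and group them by tag_of(); a periodic check of sentinel_intact() catches a buffer overrun shortly after it happens instead of at the eventual crash.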
C++ is designed with RAII in mind. There is really no better way to manage memory in C++, I think.
But be careful not to allocate very big chunks (like buffer objects) in local scope. It can cause stack overflows and, if there is a flaw in bounds checking while using that chunk, you can overwrite other variables or return addresses, which leads to all kinds of security holes.
One of the few cases where you allocate and destroy in different places is thread creation (the parameter you pass).
But even this case is easy.
Here is the function/method creating a thread:
struct myparams {
int x;
std::vector<double> z;
};
std::auto_ptr<myparams> param(new myparams(x, ...));
// Release ownership only if thread creation succeeded
if (0 == pthread_create(&th, NULL, th_func, param.get())) param.release();
...
And here is the thread function:
extern "C" void* th_func(void* p) {
try {
// Take ownership of the parameter; it is deleted when param goes out of scope
std::auto_ptr<myparams> param(static_cast<myparams*>(p));
...
} catch(...) {
}
return 0;
}
Pretty easy, isn't it? If the thread creation fails, the resource is freed (deleted) by the auto_ptr; otherwise ownership passes to the thread.
What if the thread is so fast that it frees the resource before
param.release();
gets called in the main function/method? Nothing! release() only makes the auto_ptr give up its pointer without touching the object, so we simply 'tell' the auto_ptr to skip the deallocation.
C++ memory management is easy, isn't it?
Cheers,
Ema!
Manage memory the same way you manage other resources (handles, files, db connections, sockets...). GC would not help you with them either.
Exactly one return from any function. That way you can do deallocation there and never miss it.
It's too easy to make a mistake otherwise:
A* a = new A();
if (Bad()) {delete a; return;}
B* b = new B();
if (Bad()) {delete a; delete b; return;}
... // etc.
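A minimal sketch of the single-exit style, so that every allocation is released in one place. Res, g_live and the fail_at parameter (which simulates where Bad() fires) are hypothetical and exist only to make the example checkable.

```cpp
#include <cassert>

static int g_live = 0;  // resources currently allocated

// Hypothetical resource that counts its live instances.
struct Res {
    Res() { ++g_live; }
    ~Res() {
        assert(g_live > 0);
        --g_live;
    }
};

// fail_at simulates the step at which Bad() reports a failure (0 = never).
int do_work(int fail_at) {
    int status = 0;
    Res* a = 0;
    Res* b = 0;

    a = new Res();
    if (fail_at == 1) status = -1;

    if (status == 0) {
        b = new Res();
        if (fail_at == 2) status = -2;
    }

    if (status == 0) {
        // ... real work with a and b would go here ...
    }

    // Single cleanup point; delete on a null pointer is a no-op.
    delete b;
    delete a;
    return status;
}
```

Whichever step fails, g_live is back to zero after do_work() returns. Note that this sketch does not cover an exception thrown between an allocation and the cleanup point.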