How to implement garbage collection in C++ - c++

I saw some post about implement GC in C and some people said it's impossible to do it because C is weakly typed. I want to know how to implement GC in C++.
I want some general idea about how to do it. Thank you very much!
This is a Bloomberg interview question my friend told me. He did badly at that time. We want to know your ideas about this.

Garbage collection in C and C++ are both difficult topics for a few reasons:
Pointers can be typecast to integers and vice-versa. This means that I could have a block of memory that is reachable only by taking an integer, typecasting it to a pointer, then dereferencing it. A garbage collector has to be careful not to think a block is unreachable when indeed it still can be reached.
Pointers are not opaque. Many garbage collectors, like stop-and-copy collectors, like to move blocks of memory around or compact them to save space. Since you can explicitly look at pointer values in C and C++, this can be difficult to implement correctly. You would have to be sure that if someone was doing something tricky with typecasting to integers that you correctly updated the integer if you moved a block of memory around.
Memory management can be done explicitly. Any garbage collector will need to take into account that the user is able to explicitly free blocks of memory at any time.
In C++, there is a separation between allocation/deallocation and object construction/destruction. A block of memory can be allocated with sufficient space to hold an object without any object actually being constructed there. A good garbage collector would need to know, when it reclaims memory, whether or not to call the destructor for any objects that might be allocated there. This is especially true for the standard library containers, which often make use of std::allocator to use this trick for efficiency reasons.
Memory can be allocated from different areas. C and C++ can get memory either from the built-in freestore (malloc/free or new/delete), or from the OS via mmap or other system calls, and, in the case of C++, from get_temporary_buffer or return_temporary_buffer. The programs might also get memory from some third-party library. A good garbage collector needs to be able to track references to memory in these other pools and (possibly) would have to be responsible for cleaning them up.
Pointers can point into the middle of objects or arrays. In many garbage-collected languages like Java, object references always point to the start of the object. In C and C++ pointers can point into the middle of arrays, and in C++ into the middle of objects (if multiple inheritance is used). This can greatly complicate the logic for detecting what's still reachable.
So, in short, it's extremely hard to build a garbage collector for C or C++. Most libraries that do garbage collection in C and C++ are extremely conservative in their approach and are technically unsound - they assume that you won't, for example, take a pointer, cast it to an integer, write it to disk, and then load it back in at some later time. They also assume that any value in memory that's the size of a pointer could possibly be a pointer, and so sometimes refuse to free unreachable memory because there's a nonzero chance that there's a pointer to it.
As others have pointed out, the Boehm GC does do garbage collection for C and C++, but subject to the aforementioned restrictions.
Interestingly, C++11 includes some new library functions that allow the programmer to mark regions of memory as reachable and unreachable in anticipation of future garbage collection efforts. It may be possible in the future to build a really good C++11 garbage collector with this sort of information. In the meantime though, you'll need to be extremely careful not to break any of the above rules.

Look into the Boehm Garbage Collector.

C isn't C++, but both have the same "weakly typed" issues. It's not the implicit typecasts that cause an issue, though, but the tendency towards "punning" (subverting the type system), especially in data structure libraries.
There are garbage collectors out there for C and/or C++. The Boehm conservative collector is probably the best know. It's conservative in that, if it sees a bit pattern that looks like a pointer to some object, it doesn't collect that object. That value might be some other type of value completely, so the object could be collected, but "conservative" means playing safe.
Even a conservative collector can be fooled, though, if you use calculated pointers. There's a data structure, for example, where every list node has a field giving the difference between the next-node and previous-node addresses. The idea is to give double-linked list behaviour with a single link per node, at the expense of more complex iterators. Since there's no explicit pointer anywhere to most of the nodes, they may be wrongly collected.
Of course this is a very exceptional special case.
More important - you can either have reliable destructors or garbage collection, not both. When a garbage cycle is collected, the collector cannot decide which destructor to call first.
Since the RAII pattern is pervasive in C++, and that relies on destructors, there is IMO a conflict. There may be valid exceptions, but my view is that if you want garbage collection, you should use a language that's designed from the ground up for garbage collection (Java, C#, ...).

You could either use smart pointers or create your own container object which will track references and handle memory allocation etc. Smart pointers would probably be preferable. Often times you can avoid dynamic heap allocation altogether.
For example:
char* pCharArray = new char[128];
// do some stuff with characters
delete [] pCharArray;
The danger with the above being if anything throws between the new and the delete your delete will not be executed. Something like above could easily be replaced with safer "garbage collected" code:
std::vector<char> charArray;
// do some stuff with characters
Bloomberg has notoriously irrelevant interview questions from a practical coding standpoint. Like most interviewers they are primarily concerned with how you think and your communication skills than the actual solution though.

You can read about the shared_ptr struct.
It implements a simple reference-counting garbage collector.
If you want a real garbage collector, you can overload the new operator.
Create a struct similar to shared_ptr, call it Object.
This will wrap the new object created. Now with overloading its operators, you can control the GC.
All you need to do now, is just implement one of the many GC algorithms

The claim you saw is false; the Boehm collector supports C and C++. I suggest reading the Boehm collector's documentation (particularly this page)for a good overview of how one might write a garbage collector in C or C++.

Related

Why use new and delete at all?

I'm new to C++ and I'm wondering why I should even bother using new and delete? It can cause problems (memory leaks) and I don't get why I shouldn't just initialize a variable without the new operator. Can someone explain it to me? It's hard to google that specific question.
For historical and efficiency reasons, C++ (and C) memory management is explicit and manual.
Sometimes, you might allocate on the call stack (e.g. by using VLAs or alloca(3)). However, that is not always possible, because
stack size is limited (depending on the platform, to a few kilobytes or a few megabytes).
memory need is not always FIFO or LIFO. It does happen that you need to allocate memory, which would be freed (or becomes useless) much later during execution, in particular because it might be the result of some function (and the caller - or its caller - would release that memory).
You definitely should read about garbage collection and dynamic memory allocation. In some languages (Java, Ocaml, Haskell, Lisp, ....) or systems, a GC is provided, and is in charge of releasing memory of useless (more precisely unreachable) data. Read also about weak references. Notice that most GCs need to scan the call stack for local pointers.
Notice that it is possible, but difficult, to have quite efficient garbage collectors (but usually not in C++). For some programs, Ocaml -with a generational copying GC- is faster than the equivalent C++ code -with explicit memory management.
Managing memory explicitly has the advantage (important in C++) that you don't pay for something you don't need. It has the inconvenience of putting more burden on the programmer.
In C or C++ you might sometimes consider using the Boehm's conservative garbage collector. With C++ you might sometimes need to use your own allocator, instead of the default std::allocator. Read also about smart pointers, reference counting, std::shared_ptr, std::unique_ptr, std::weak_ptr, and the RAII idiom, and the rule of three (in C++, becoming the rule of 5). The recent wisdom is to avoid explicit new and delete (e.g. by using standard containers and smart pointers).
Be aware that the most difficult situation in managing memory are arbitrary, perhaps circular, graphs (of reference).
On Linux and some other systems, valgrind is a useful tool to hunt memory leaks.
The alternative, allocating on the stack, will cause you trouble as stack sizes are often limited to Mb magnitudes and you'll get lots of value copies. You'll also have problems sharing stack-allocated data between function calls.
There are alternatives: using std::shared_ptr (C++11 onwards) will do the delete for you once the shared pointer is no longer being used. A technique referred to by the hideous acronym RAII is exploited by the shared pointer implementation. I mention it explicitly since most resource cleanup idioms are RAII-based. You can also make use of the comprehensive data structures available in the C++ Standard Template Library which eliminate the need to get your hands too dirty with explicit memory management.
But formally, every new must be balanced with a delete. Similarly for new[] and delete[].
Indeed in many cases new and delete are not needed, you can just use standard containers instead and leaving to them the allocation/deallocation management.
One of the reasons for which you may need to use allocation explicitly is for objects where the identity is important (i.e. they are not just values that can be copied around).
For example if you have a gui "window" object then making copies probably doesn't make sense and thus you're more or less ruling out all standard containers (they're designed for objects that can be copied and assigned). In this case if the object needs to survive the function that creates it probably the simplest solution is to just allocate explicitly it on the heap, possibly using a smart pointer to avoid leaks or use-after-delete.
In other cases it may be important to avoid copies not because they're illegal, but just not very efficient (big objects) and explicitly handling the instance lifetime may be a better (faster) solution.
Another case where explicit allocation/deallocation may be the best option are complex data structures that cannot be represented by the standard library (for example a tree in which each node is also part of a doubly-linked list).
Modern C++ styles often frown on explicit calls to new and delete outside of specialized resource management code.
This is not because the stack/automatic storage is sufficient, but rather because RAII smart resource owners (be they containers, shared pointers, or something else) make almost all direct memory wrangling unnessecary. And as the problem of memory management is often error prone, this makes your code more robust, easier to read, and sometimes faster (as the fancy resource owners can use techniques you might not bother with everywhere).
This is exemplified by the rule of zero: write no destructor, copy/move assign, copy/move constructor. Store state in smart storage, and have it handle it for you.
None of the above applies when you yourself are writing smart memory owning classes. This is a rare thing to need to do, however. It also requires C++14 (for make_unique) to get rid of the penultimate excuse to call new.
Now, the free store is still used, just not directly, under the above style. The free store (aka heap) is needed because automatic storage (aka the stack) only supports really simple object lifetime rules (scope based, compile time deterministic size and count, FILO order). As runtime sized and counted data is common, and object lifetime is often not that simple, the free store is used by most programs. Sometimes copying an object around on the stack is enough to make the simple lifetime less of a problem, but at other times identity is important.
The final reason is stack overflow. On some C++ implementations the stack/automatic storage is seriously constrained in size. What more is that there is rarely if ever a reliable failure mode when you put to much stuff in it. By storing large data on the free store, we can reduce the chance the stack will overflow.
First, if you don't need dynamic allocation, don't use it.
The most frequent reason for needing dynamic allocation is that
the object will have a lifetime which is determined by the
program logic rather than lexical scope. The new and
delete operators are designed to support explicitly managed
lifetimes.
Another common reason is that the size or structure of the
"object" is determined at runtime. For simple cases (arrays,
etc.) there are standard classes (std::vector) which will
handle this for you, but for more complicated structures (e.g.
graphs and trees), you'll have to do this yourself. (The usual
technique here is to create a class representing the graph or
tree, and have it manage the memory.)
And there is the case where the object must be polymorphic, and
the actual type won't be known until runtime. (There are some
tricky ways of handling this without dynamic allocation in the
simplest cases, but in general, you'll need dynamic allocation.)
In this case, std::unique_ptr might be appropriate to handle
the delete, or if the object must be shared, std::shared_ptr
(although usually, objects which must be shared fall into the
first category, above, and so smart pointers aren't
appropriate).
There are probably other reasons as well, but these are the
three that I've encountered the most often.
Only on simple programs you can know beforehand how much memory you'd use. In general you can not foresee how much memory you'd use.
However with modern C++11 you generally rely on standard libraries like vector and map for memory allocation, and the use of smart pointers helps you avoid memory leaks, so you don't really need to use new and delete explicitly by hand.
When you are using New then your object stores in Heap, and it remains there until you don't manually delete it. but in the case without using new your object goes in Stack and it destroys automatically when it goes out of scope.
Stack is set to a fix size, so if there is no any block for assign a new object then Stack Overflow occurs. This often happens when a lot of nested functions are being called, or if there is an infinite recursive call. If the current size of the heap is too small to accommodate new memory, then more memory can be added to the heap by the operating system.
Another reason may be if you are explicitly calling an external library or API with a C-style interface. Setting up a callback in such cases often means context data must be supplied and returned in the callback, and such an interface usually provides only a 'simple' void* or int*. Allocating an object or struct with new is appropriate for such actions, (you can delete it later in the callback, should you need to).

Compacting garbage collector implementation in C++0x

I'm implementing a compacting garbage collector for my own personal use in C++0x, and I've got a question. Obviously the mechanics of the collector depend upon moving objects, and I've been wondering how to implement this in terms of the smart pointer types that point to it. I've been thinking about either pointer-to-pointer in the pointer type itself, or, the collector maintains a list of pointers that point to each object so that they can be modified, removing the need for a double de-ref when accessing the pointer but adding some extra overhead during collection and additional memory overhead. What's the best way to go here?
Edit: My primary concern is for speedy allocation and access. I'm not concerned with particularly efficient collections or other maintenance, because that's not really what the GC is intended for.
There's nothing straight forward about grafting on extra GC to C++, let alone a compacting algorithm. It isn't clear exactly what you're trying to do and how it will interact with the rest of the C++ code.
I have actually written a gc in C++ which works with existing C++ code, and it had a compactor at one stage (though I dropped it because it was too slow). But there are many nasty semantic problems. I mentioned to Bjarne only a few weeks ago that C++ lacks the operator required to do it properly and the situation is that it is unlikely to ever exist because it has limited utility..
What you actually need is a "re-addres-me" operator. What happens is that you do not actually move objects around. You just use mmap to change the object address. This is much faster, and, in effect, it is using the VM features to provide handles.
Without this facility you have to have a way to perform an overlapping move of an object, which you cannot do in C++ efficiently: you'd have to move to a temporary first. In C, it is much easier, you can use memmove. At some stage all the pointers to or into the moved objects have to be adjusted.
Using handles does not solve this problem, it just reduces the problem from arbitrary sized objects to constant sized ones: these are easier to manage in an array, but the same problem exists: you have to manage the storage. If you remove lots of handle from the array randomly .. you still have a problem with fragmentation.
So don't bother with handles, they don't work.
This is what I did in Felix: you call new(shape, collector) T(args). Here the shape is a descriptor of the type, including a list of offsets which contain (GC) pointers, and the address of a routine to finalise the object (by default, it calls the destructor).
It also contains a flag saying if the object can be moved with memmove. If the object is big or immobile, it is allocated by malloc. If the object is small and mobile, it is allocated in an arena, provided there is space in the arena.
The arena is compacted by moving all the objects in it, and using the shape information to globally adjust all the pointers to or into these objects. Compaction can be done incrementally.
The downside for a C++ programmer is the need to construct a correct shape object to pass. This doesn't bother me because I'm implementing a language which can generate the shape information automatically.
Now: the key point is: to do compaction, you must use a precise collector. Compaction cannot work with a conservative collector. This is very important. It is fine to allow some leakage if you see an value that looks like a pointer but happens to be an integer: some object won't be collected, but this is usually no big deal. But for compaction you have to adjust the pointers but you'd better not change that integer: so you have to know for sure when something is a pointer, so your collector has to be precise: the shape must be known.
In Ocaml this is relatively simple: everything is either a pointer or integer and the low bit is used at run time to tell. Objects pointed at have a code telling the type, and there are only a few types: either a scalar (don't scan it) or an aggregate (scan it, it only contains integers or pointers).
This is a pretty straight-forward question so here's a straight-forward answer:
Mark-and-sweep (and occasionally mark-and-compact to avoid heap fragmentation) is the fastest when it comes to allocation and access (avoiding double de-refs). It's also very easy to implement. Since you're not worried about collection performance impact (mark-and-sweep tends to freeze up the process in a nondeterministically), this should be the way to go.
Implementation details found at:
http://www.brpreiss.com/books/opus5/html/page424.html#secgarbagemarksweep
http://www.brpreiss.com/books/opus5/html/page428.html
A nursery generation will give you the best possible allocation performance because it is just a pointer bump.
You could implement pointer updates without using double indirection by using techniques like a shadow stack but this will be slow and very error prone if you're writing this C++ code by hand.

Why don't some languages allow declaration of pointers?

I'm programming in C++ right now, and I love using pointers. But it seems that other, newer languages like Java, C#, and Python don't allow you to explicitly declare pointers. In other words, you can't write both int x and int * y, and have x be a value while y is a pointer, in any of those languages. What is the reasoning behind this?
Pointers aren't bad, they are just easy to get wrong. In newer languages they have found ways of doing the same things, but with less risk of shooting yourself in the foot.
There is nothing wrong with pointers though. Go ahead and love them.
Toward your example, why would you want both x and y pointing to the same memory? Why not just always call it x?
One more point, pointers mean that you have to manage the memory lifetime yourself. Newer languages prefer to use garbage collection to manage the memory and allowing for pointers would make that task quite difficult.
I'll start with one of my favorite Scott Meyers quotes:
When I give talks on exception handling, I teach people two things:
POINTERS ARE YOUR ENEMIES, because they lead to the kinds of problems that auto_ptr is designed to eliminate.
POINTERS ARE YOUR FRIENDS, because operations on pointers can't throw.
Then I tell them to have a nice day :-)
The point is that pointers are extremely useful and it's certainly necessary to understand them when programming in C++. You can't understand the C++ memory model without understanding pointers. When you are implementing a resource-owning class (like a smart pointer, for example), you need to use pointers, and you can take advantage of their no-throw guarantee to write exception-safe resource owning classes.
However, in well-written C++ application code, you should never have to work with raw pointers. Never. You should always use some layer of abstraction instead of working directly with pointers:
Use references instead of pointers wherever possible. References cannot be null and they make code easier to understand, easier to write, and easier to code review.
Use smart pointers to manage any pointers that you do use. Smart pointers like shared_ptr, auto_ptr, and unique_ptr help to ensure that you don't leak resources or free resources prematurely.
Use containers like those found in the standard library for storing collections of objects instead of allocating arrays yourself. By using containers like vector and map, you can ensure that your code is exception safe (meaning that even when an exception is thrown, you won't leak resources).
Use iterators when working with containers. It's far easier to use iterators correctly than it is to use pointers correctly, and many library implementations provide debug support for helping you to find where you are using them incorrectly.
When you are working with legacy or third-party APIs and you absolutely must use raw pointers, write a class to encapsulate usage of that API.
C++ has automatic resource management in the form of Scope-Bound Resource Management (SBRM, also called Resource Acquisition is Initialization, or RAII). Use it. If you aren't using it, you're doing it wrong.
Pointers can be abused, and managed languages prefer to protect you from potential pitfalls. However, pointers are certainly not bad - they're an integral feature of the C and C++ languages, and writing C/C++ code without them is both tricky and cumbersome.
A true "pointer" has two characteristics.
It holds the address of another object (or primitive)
and exposes the numeric nature of that address so you can do arithmetic.
Typically the arithmetic operations defined for pointers are:
Adding an integer to a pointer into an array, which returns the address of another element.
Subtracting two pointers into the same array, which returns the number of elements in-between (inclusive of one end).
Comparing two pointers into the same array, which indicates which element is closer to the head of the array.
Managed languages generally lead you down the road of "references" instead of pointers. A reference also holds the address of another object (or primitive), but arithmetic is disallowed.
Among other things, this means you can't use pointer arithmetic to walk off the end of an array and treat some other data using the wrong type. The other way of forming an invalid pointer is taken care of in such environments by using garbage collection.
Together this ensures type-safety, but at a terrible loss of generality.
I try to answer OP's question directly:
In other words, you can't write both
int x and int * y, and have x be a
value while y is a pointer, in any of
those languages. What is the reasoning
behind this?
The reason behind this is the managed memory model in these languages. In C# (or Python, or Java,...) the lifetime of resources and thus usage of memory is managed automatically by the underlying runtime, or by it's garbage collector, to be precise. Briefly: The application has no control over the location of a resource in memory. It is not specified - and is even not guaranteed to stay constant during the lifetime of a resource. Hence, the notion of pointer as 'a location of something in virtual or physical memory' is completely irrelevant.
As someone already mentioned, pointers can, and actually, will go wrong if you have a massive application. This is one of the reasons we sometimes see Windows having problems due to NULL pointers created! I personally don't like pointers because it causes terrible memory leak and no matter how good you manage your memory, it will eventually hunt you down in some ways. I experienced this a lot with OpenCV when working around image processing applications. Having lots of pointers floating around, putting them in a list and then retrieving them later caused problems for me. But again, there are good sides of using pointers and it is often a good way to tune up your code. It all depends on what you are doing, what specs you have to meet, etc.
Pointers aren't bad. Pointers are difficult. The reason you can't have int x and int * y in Java, C#, etc., is that such languages want to put you away from solving coding problems (which are eventually subtle mistakes of yours), and they want to bring you closer to producing solutions to your project. They want to give you productivity.
As I said before, pointers aren't bad, they're just diffucult. In a Hello World programs pointers seems to be a piece of cake. Nevertheless when the program start growing, the complexity of managing pointers, passing the correct values, counting the objects, deleting the pointers, etc. start to get complex.
Now, from the point of view of the programmer (the user of the language, in this case you) this would lead to another problem that will become evident over the time: the longer it takes you don't read the code, the more difficult it will become to understand it again (i.e. projects from past years, months or event days). Added to this the fact that sometimes pointers use more than one level of indirection (TheClass ** ptr).
Last but not least, there are topics when the pointers application is very useful. As I mention before, they aren't bad. In Algorithms field they are very useful because in mathemical context you can refer to a specific value using a simple pointer (i.e. int *) and change is value without creating another object, and ironically it becomes easier to implement the algorithms with the use of pointers rather than without it.
Reminder: Each time when you ask why pointers or another thing is bad or is good, instead try to think in the historical context around when such topic or technology emerged and the problem that they tried to solve. In this case, pointers where needed to access to the memory of the PDP computers at Bell laboratories.

Is it secure to use malloc?

Somebody told me that allocating with malloc is not secure anymore, I'm not a C/C++ guru but I've made some stuff with malloc and C/C++. Does anyone know about what risks I'm into?
Quoting him:
[..] But indeed the weak point of C/C++ it is the security, and the Achilles' heel is indeed malloc and the abuse of pointers. C/C++ it is a well known insecure language. [..] There would be few apps in what I would not recommend to continue programming with C++."
It's probably true that C++'s new is safer than malloc(), but that doesn't automatically make malloc() more unsafe than it was before. Did your friend say why he considers it insecure?
However, here's a few things you should pay attention to:
1) With C++, you do need to be careful when you use malloc()/free() and new/delete side-by-side in the same program. This is possible and permissible, but everything that was allocated with malloc() must be freed with free(), and not with delete. Similarly, everything that was allocated with new must be freed with delete, and never with free(). (This logic goes even further: If you allocate an array with new[], you must free it with delete[], and not just with delete.) Always use corresponding counterparts for allocation and deallocation, per object.
int* ni = new int;
free(ni); // ERROR: don't do this!
delete ni; // OK
int* mi = (int*)malloc(sizeof(int));
delete mi; // ERROR!
free(mi); // OK
2) malloc() and new (speaking again of C++) don't do exactly the same thing. malloc() just gives you a chunk of memory to use; new will additionally call a contructor (if available). Similarly, delete will call a destructor (if available), while free() won't. This could lead to problems, such as incorrectly initialized objects (because the constructor wasn' called) or un-freed resources (because the destructor wasn't called).
3) C++'s new also takes care of allocating the right amount of memory for the type specified, while you need to calculate this yourself with malloc():
int *ni = new int;
int *mi = (int*)malloc(sizeof(int)); // required amount of memory must be
// explicitly specified!
// (in some situations, you can make this
// a little safer against code changes by
// writing sizeof(*mi) instead.)
Conclusion:
In C++, new/delete should be preferred over malloc()/free() where possible. (In C, new/delete is not available, so the choice would be obvious there.)
[...] C/C++ it is a well known insecure language. [...]
Actually, that's wrong. Actually, "C/C++" doesn't even exist. There's C, and there's C++. They share some (or, if you want, a lot of) syntax, but they are indeed very different languages.
One thing they differ in vastly is their way to manage dynamic memory. The C way is indeed using malloc()/free() and if you need dynamic memory there's very little else you can do but use them (or a few siblings of malloc()).
The C++ way is to not to (manually) deal with dynamic resources (of which memory is but one) at all. Resource management is handed to a few well-implemented and -tested classes, preferably from the standard library, and then done automatically. For example, instead of manually dealing with zero-terminated character buffers, there's std::string, instead of manually dealing with dynamically allocated arrays, there std:vector, instead of manually dealing with open files, there's the std::fstream family of streams etc.
Your friend could be talking about:
The safety of using pointers in general. For example in C++ if you're allocating an array of char with malloc, question why you aren't using a string or vector. Pointers aren't insecure, but code that's buggy due to incorrect use of pointers is.
Something about malloc in particular. Most OSes clear memory before first handing it to a process, for security reasons. Otherwise, sensitive data from one app, could be leaked to another app. On OSes that don't do that, you could argue that there's an insecurity related to malloc. It's really more related to free.
It's also possible your friend doesn't know what he's talking about. When someone says "X is insecure", my response is, "in what way?".
Maybe your friend is older, and isn't familiar with how things work now - I used to think C and C++ were effectively the same until I discovered many new things about the language that have come out in the last 10 years (most of my teachers were old-school Bell Laboratories guys who wrote primarily in C and had only a cursory knowledge of C++ - and Bell Laboratories engineers invented C++!). Don't laugh at him/her - you might be there someday too!
I think your friend is uncomfortable with the idea that you have to do your own memory management - ie, its easy to make mistakes. In that regard, it is insecure and he/she is correct... However, that insecure aspect can be overcome with good programming practices, like RAII and using smart pointers.
For many applications, though, having automated garbage collection is probably fine, and some programmers are confused about how pointers work, so as far as getting new, inexperienced developers to program effectively in C/C++ without some training might be difficult. Which is maybe why your friend thinks C/C++ should be avoided.
It's the only way to allocate and deallocate memory in C natively. If you misuse it, it can be as insecure as anything else. Microsoft provides some "secure" versions of other functions, that take an extra size_t parametre - maybe your friend was referring to something similar? If that's the case, perhaps he simply prefers calloc() over malloc()?
If you are using C, you have to use malloc to allocate memory, unless you have a third-party library that will allocate / manage your memory for you.
Certainly your friend has a point that it is difficult to write secure code in C, especially when you are allocating memory and dealing with buffers. But we all know that, right? :)
What he maybe wanted to warn you is about pointers usage. Yes, that will cause problems if you don't understand how it works. Otherwise, ask what your friend meant, or ask him for a reference that proof his affirmation.
Saying that malloc is not safe is like saying "don't use system X because it's insecure".
Until that, use malloc in C, and new in C++.
If you use malloc in C++, people will look mad at you, but that's fine in very specific occasions.
There is nothing wrong with malloc as such. Your friend apparently means that manual memory management is insecure and easily leads to bugs. Compared to other languages where the memory is managed automatically by a garbage collector (not that it is not possible to have leaks - nowadays nobody cares if the program cleans up when it terminates, what matters is that something is not hogging memory while the program is running).
Of course in C++ you wouldn't really touch malloc at all (because it simply isn't functionally equivalent to new and just doesn't do what you need, assuming most of the time you don't want just to get raw memory). And in addition, it is completely possible to program using techniques which almost entirely eliminate the possibility of memory leaks and corruption (RAII), but that takes expertise.
Technically speaking, malloc was never secure to begin with, but that aside, the only thing I can think of is the infamous "OOM killer" (OOM = out-of-memory) that the Linux kernel uses. You can read up on it if you want. Other than that, I don't see how malloc itself is inherently insecure.
In C++, there is no such problem if you stick to good conventions. In C, well, practice. Malloc itself is not an inherently insecure function at all - people simply can deal with it's results inadequately.
It is not secure to use malloc because it's not possible to write a large scale application and ensure every malloc is freed in an efficient manner. Thus, you will have tons of memory leaks which may or may not be a problem... but, when you double free, or use the wrong delete etc, undefined behaviour can result. Indeed, using the wrong delete in C++ will typically allow arbitrary code execution.
The ONLY way for code written in a language like C or C++ to be secure is to mathematically prove the entire program with its dependencies factored in.
Modern memory-safe languages are safe from these types of bugs as long as the underlying language implementation isn't vulnerable (which is indeed rare because these are all written in C/C++, but as we move towards hardware JVMs, this problem will go away).
Perhaps the person was referring to the possibility of accessing data via malloc()?
Malloc doesn't affect the contents of the region that it provides, so it MAY be possible to collect data from other processes by mallocing a large area and then scanning the contents.
free() doesn't clear memory either so data paced into dynamically allocated buffers is, in principle, accessible.
I know someone who, many years ago admittedly, exploited malloc to create an inter-process communication scheme when he found that mallocs of equal size would return the address of the most recently free'd block.

General guidelines to avoid memory leaks in C++ [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
What are some general tips to make sure I don't leak memory in C++ programs? How do I figure out who should free memory that has been dynamically allocated?
I thoroughly endorse all the advice about RAII and smart pointers, but I'd also like to add a slightly higher-level tip: the easiest memory to manage is the memory you never allocated. Unlike languages like C# and Java, where pretty much everything is a reference, in C++ you should put objects on the stack whenever you can. As I've see several people (including Dr Stroustrup) point out, the main reason why garbage collection has never been popular in C++ is that well-written C++ doesn't produce much garbage in the first place.
Don't write
Object* x = new Object;
or even
shared_ptr<Object> x(new Object);
when you can just write
Object x;
Use RAII
Forget Garbage Collection (Use RAII instead). Note that even the Garbage Collector can leak, too (if you forget to "null" some references in Java/C#), and that Garbage Collector won't help you to dispose of resources (if you have an object which acquired a handle to a file, the file won't be freed automatically when the object will go out of scope if you don't do it manually in Java, or use the "dispose" pattern in C#).
Forget the "one return per function" rule. This is a good C advice to avoid leaks, but it is outdated in C++ because of its use of exceptions (use RAII instead).
And while the "Sandwich Pattern" is a good C advice, it is outdated in C++ because of its use of exceptions (use RAII instead).
This post seem to be repetitive, but in C++, the most basic pattern to know is RAII.
Learn to use smart pointers, both from boost, TR1 or even the lowly (but often efficient enough) auto_ptr (but you must know its limitations).
RAII is the basis of both exception safety and resource disposal in C++, and no other pattern (sandwich, etc.) will give you both (and most of the time, it will give you none).
See below a comparison of RAII and non RAII code:
void doSandwich()
{
T * p = new T() ;
// do something with p
delete p ; // leak if the p processing throws or return
}
void doRAIIDynamic()
{
std::auto_ptr<T> p(new T()) ; // you can use other smart pointers, too
// do something with p
// WON'T EVER LEAK, even in case of exceptions, returns, breaks, etc.
}
void doRAIIStatic()
{
T p ;
// do something with p
// WON'T EVER LEAK, even in case of exceptions, returns, breaks, etc.
}
About RAII
To summarize (after the comment from Ogre Psalm33), RAII relies on three concepts:
Once the object is constructed, it just works! Do acquire resources in the constructor.
Object destruction is enough! Do free resources in the destructor.
It's all about scopes! Scoped objects (see doRAIIStatic example above) will be constructed at their declaration, and will be destroyed the moment the execution exits the scope, no matter how the exit (return, break, exception, etc.).
This means that in correct C++ code, most objects won't be constructed with new, and will be declared on the stack instead. And for those constructed using new, all will be somehow scoped (e.g. attached to a smart pointer).
As a developer, this is very powerful indeed as you won't need to care about manual resource handling (as done in C, or for some objects in Java which makes intensive use of try/finally for that case)...
Edit (2012-02-12)
"scoped objects ... will be destructed ... no matter the exit" that's not entirely true. there are ways to cheat RAII. any flavour of terminate() will bypass cleanup. exit(EXIT_SUCCESS) is an oxymoron in this regard.
– wilhelmtell
wilhelmtell is quite right about that: There are exceptional ways to cheat RAII, all leading to the process abrupt stop.
Those are exceptional ways because C++ code is not littered with terminate, exit, etc., or in the case with exceptions, we do want an unhandled exception to crash the process and core dump its memory image as is, and not after cleaning.
But we must still know about those cases because, while they rarely happen, they can still happen.
(who calls terminate or exit in casual C++ code?... I remember having to deal with that problem when playing with GLUT: This library is very C-oriented, going as far as actively designing it to make things difficult for C++ developers like not caring about stack allocated data, or having "interesting" decisions about never returning from their main loop... I won't comment about that).
Instead of managing memory manually, try to use smart pointers where applicable.
Take a look at the Boost lib, TR1, and smart pointers.
Also smart pointers are now a part of C++ standard called C++11.
You'll want to look at smart pointers, such as boost's smart pointers.
Instead of
int main()
{
Object* obj = new Object();
//...
delete obj;
}
boost::shared_ptr will automatically delete once the reference count is zero:
int main()
{
boost::shared_ptr<Object> obj(new Object());
//...
// destructor destroys when reference count is zero
}
Note my last note, "when reference count is zero, which is the coolest part. So If you have multiple users of your object, you won't have to keep track of whether the object is still in use. Once nobody refers to your shared pointer, it gets destroyed.
This is not a panacea, however. Though you can access the base pointer, you wouldn't want to pass it to a 3rd party API unless you were confident with what it was doing. Lots of times, your "posting" stuff to some other thread for work to be done AFTER the creating scope is finished. This is common with PostThreadMessage in Win32:
void foo()
{
boost::shared_ptr<Object> obj(new Object());
// Simplified here
PostThreadMessage(...., (LPARAM)ob.get());
// Destructor destroys! pointer sent to PostThreadMessage is invalid! Zohnoes!
}
As always, use your thinking cap with any tool...
Read up on RAII and make sure you understand it.
Bah, you young kids and your new-fangled garbage collectors...
Very strong rules on "ownership" - what object or part of the software has the right to delete the object. Clear comments and wise variable names to make it obvious if a pointer "owns" or is "just look, don't touch". To help decide who owns what, follow as much as possible the "sandwich" pattern within every subroutine or method.
create a thing
use that thing
destroy that thing
Sometimes it's necessary to create and destroy in widely different places; i think hard to avoid that.
In any program requiring complex data structures, i create a strict clear-cut tree of objects containing other objects - using "owner" pointers. This tree models the basic hierarchy of application domain concepts. Example a 3D scene owns objects, lights, textures. At the end of the rendering when the program quits, there's a clear way to destroy everything.
Many other pointers are defined as needed whenever one entity needs access another, to scan over arays or whatever; these are the "just looking". For the 3D scene example - an object uses a texture but does not own; other objects may use that same texture. The destruction of an object does not invoke destruction of any textures.
Yes it's time consuming but that's what i do. I rarely have memory leaks or other problems. But then i work in the limited arena of high-performance scientific, data acquisition and graphics software. I don't often deal transactions like in banking and ecommerce, event-driven GUIs or high networked asynchronous chaos. Maybe the new-fangled ways have an advantage there!
Most memory leaks are the result of not being clear about object ownership and lifetime.
The first thing to do is to allocate on the Stack whenever you can. This deals with most of the cases where you need to allocate a single object for some purpose.
If you do need to 'new' an object then most of the time it will have a single obvious owner for the rest of its lifetime. For this situation I tend to use a bunch of collections templates that are designed for 'owning' objects stored in them by pointer. They are implemented with the STL vector and map containers but have some differences:
These collections can not be copied or assigned to. (once they contain objects.)
Pointers to objects are inserted into them.
When the collection is deleted the destructor is first called on all objects in the collection. (I have another version where it asserts if destructed and not empty.)
Since they store pointers you can also store inherited objects in these containers.
My beaf with STL is that it is so focused on Value objects while in most applications objects are unique entities that do not have meaningful copy semantics required for use in those containers.
Great question!
if you are using c++ and you are developing real-time CPU-and-memory boud application (like games) you need to write your own Memory Manager.
I think the better you can do is merge some interesting works of various authors, I can give you some hint:
Fixed size allocator is heavily discussed, everywhere in the net
Small Object Allocation was introduced by Alexandrescu in 2001 in his perfect book "Modern c++ design"
A great advancement (with source code distributed) can be found in an amazing article in Game Programming Gem 7 (2008) named "High Performance Heap allocator" written by Dimitar Lazarov
A great list of resources can be found in this article
Do not start writing a noob unuseful allocator by yourself... DOCUMENT YOURSELF first.
One technique that has become popular with memory management in C++ is RAII. Basically you use constructors/destructors to handle resource allocation. Of course there are some other obnoxious details in C++ due to exception safety, but the basic idea is pretty simple.
The issue generally comes down to one of ownership. I highly recommend reading the Effective C++ series by Scott Meyers and Modern C++ Design by Andrei Alexandrescu.
There's already a lot about how to not leak, but if you need a tool to help you track leaks take a look at:
BoundsChecker under VS
MMGR C/C++ lib from FluidStudio
http://www.paulnettle.com/pub/FluidStudios/MemoryManagers/Fluid_Studios_Memory_Manager.zip (its overrides the allocation methods and creates a report of the allocations, leaks, etc)
User smart pointers everywhere you can! Whole classes of memory leaks just go away.
Share and know memory ownership rules across your project. Using the COM rules makes for the best consistency ([in] parameters are owned by the caller, callee must copy; [out] params are owned by the caller, callee must make a copy if keeping a reference; etc.)
valgrind is a good tool to check your programs memory leakages at runtime, too.
It is available on most flavors of Linux (including Android) and on Darwin.
If you use to write unit tests for your programs, you should get in the habit of systematicaly running valgrind on tests. It will potentially avoid many memory leaks at an early stage. It is also usually easier to pinpoint them in simple tests that in a full software.
Of course this advice stay valid for any other memory check tool.
Also, don't use manually allocated memory if there's a std library class (e.g. vector). Make sure if you violate that rule that you have a virtual destructor.
If you can't/don't use a smart pointer for something (although that should be a huge red flag), type in your code with:
allocate
if allocation succeeded:
{ //scope)
deallocate()
}
That's obvious, but make sure you type it before you type any code in the scope
A frequent source of these bugs is when you have a method that accepts a reference or pointer to an object but leaves ownership unclear. Style and commenting conventions can make this less likely.
Let the case where the function takes ownership of the object be the special case. In all situations where this happens, be sure to write a comment next to the function in the header file indicating this. You should strive to make sure that in most cases the module or class which allocates an object is also responsible for deallocating it.
Using const can help a lot in some cases. If a function will not modify an object, and does not store a reference to it that persists after it returns, accept a const reference. From reading the caller's code it will be obvious that your function has not accepted ownership of the object. You could have had the same function accept a non-const pointer, and the caller may or may not have assumed that the callee accepted ownership, but with a const reference there's no question.
Do not use non-const references in argument lists. It is very unclear when reading the caller code that the callee may have kept a reference to the parameter.
I disagree with the comments recommending reference counted pointers. This usually works fine, but when you have a bug and it doesn't work, especially if your destructor does something non-trivial, such as in a multithreaded program. Definitely try to adjust your design to not need reference counting if it's not too hard.
Tips in order of Importance:
-Tip#1 Always remember to declare your destructors "virtual".
-Tip#2 Use RAII
-Tip#3 Use boost's smartpointers
-Tip#4 Don't write your own buggy Smartpointers, use boost (on a project I'm on right now I can't use boost, and I've suffered having to debug my own smart pointers, I would definately not take the same route again, but then again right now I can't add boost to our dependencies)
-Tip#5 If its some casual/non-performance critical (as in games with thousands of objects) work look at Thorsten Ottosen's boost pointer container
-Tip#6 Find a leak detection header for your platform of choice such as Visual Leak Detection's "vld" header
If you can, use boost shared_ptr and standard C++ auto_ptr. Those convey ownership semantics.
When you return an auto_ptr, you are telling the caller that you are giving them ownership of the memory.
When you return a shared_ptr, you are telling the caller that you have a reference to it and they take part of the ownership, but it isn't solely their responsibility.
These semantics also apply to parameters. If the caller passes you an auto_ptr, they are giving you ownership.
Others have mentioned ways of avoiding memory leaks in the first place (like smart pointers). But a profiling and memory-analysis tool is often the only way to track down memory problems once you have them.
Valgrind memcheck is an excellent free one.
For MSVC only, add the following to the top of each .cpp file:
#ifdef _DEBUG
#define new DEBUG_NEW
#endif
Then, when debugging with VS2003 or greater, you will be told of any leaks when your program exits (it tracks new/delete). It's basic, but it has helped me in the past.
valgrind (only avail for *nix platforms) is a very nice memory checker
If you are going to manage your memory manually, you have two cases:
I created the object (perhaps indirectly, by calling a function that allocates a new object), I use it (or a function I call uses it), then I free it.
Somebody gave me the reference, so I should not free it.
If you need to break any of these rules, please document it.
It is all about pointer ownership.
Try to avoid allocating objects dynamically. As long as classes have appropriate constructors and destructors, use a variable of the class type, not a pointer to it, and you avoid dynamical allocation and deallocation because the compiler will do it for you.
Actually that's also the mechanism used by "smart pointers" and referred to as RAII by some of the other writers ;-) .
When you pass objects to other functions, prefer reference parameters over pointers. This avoids some possible errors.
Declare parameters const, where possible, especially pointers to objects. That way objects can't be freed "accidentially" (except if you cast the const away ;-))).
Minimize the number of places in the program where you do memory allocation and deallocation. E. g. if you do allocate or free the same type several times, write a function for it (or a factory method ;-)).
This way you can create debug output (which addresses are allocated and deallocated, ...) easily, if required.
Use a factory function to allocate objects of several related classes from a single function.
If your classes have a common base class with a virtual destructor, you can free all of them using the same function (or static method).
Check your program with tools like purify (unfortunately many $/€/...).
You can intercept the memory allocation functions and see if there are some memory zones not freed upon program exit (though it is not suitable for all the applications).
It can also be done at compile time by replacing operators new and delete and other memory allocation functions.
For example check in this site [Debugging memory allocation in C++]
Note: There is a trick for delete operator also something like this:
#define DEBUG_DELETE PrepareDelete(__LINE__,__FILE__); delete
#define delete DEBUG_DELETE
You can store in some variables the name of the file and when the overloaded delete operator will know which was the place it was called from. This way you can have the trace of every delete and malloc from your program. At the end of the memory checking sequence you should be able to report what allocated block of memory was not 'deleted' identifying it by filename and line number which is I guess what you want.
You could also try something like BoundsChecker under Visual Studio which is pretty interesting and easy to use.
We wrap all our allocation functions with a layer that appends a brief string at the front and a sentinel flag at the end. So for example you'd have a call to "myalloc( pszSomeString, iSize, iAlignment ); or new( "description", iSize ) MyObject(); which internally allocates the specified size plus enough space for your header and sentinel. Of course, don't forget to comment this out for non-debug builds! It takes a little more memory to do this but the benefits far outweigh the costs.
This has three benefits - first it allows you to easily and quickly track what code is leaking, by doing quick searches for code allocated in certain 'zones' but not cleaned up when those zones should have freed. It can also be useful to detect when a boundary has been overwritten by checking to ensure all sentinels are intact. This has saved us numerous times when trying to find those well-hidden crashes or array missteps. The third benefit is in tracking the use of memory to see who the big players are - a collation of certain descriptions in a MemDump tells you when 'sound' is taking up way more space than you anticipated, for example.
C++ is designed RAII in mind. There is really no better way to manage memory in C++ I think.
But be careful not to allocate very big chunks (like buffer objects) on local scope. It can cause stack overflows and, if there is a flaw in bounds checking while using that chunk, you can overwrite other variables or return addresses, which leads to all kinds security holes.
One of the only examples about allocating and destroying in different places is thread creation (the parameter you pass).
But even in this case is easy.
Here is the function/method creating a thread:
struct myparams {
int x;
std::vector<double> z;
}
std::auto_ptr<myparams> param(new myparams(x, ...));
// Release the ownership in case thread creation is successfull
if (0 == pthread_create(&th, NULL, th_func, param.get()) param.release();
...
Here instead the thread function
extern "C" void* th_func(void* p) {
try {
std::auto_ptr<myparams> param((myparams*)p);
...
} catch(...) {
}
return 0;
}
Pretty easyn isn't it? In case the thread creation fails the resource will be free'd (deleted) by the auto_ptr, otherwise the ownership will be passed to the thread.
What if the thread is so fast that after creation it releases the resource before the
param.release();
gets called in the main function/method? Nothing! Because we will 'tell' the auto_ptr to ignore the deallocation.
Is C++ memory management easy isn't it?
Cheers,
Ema!
Manage memory the same way you manage other resources (handles, files, db connections, sockets...). GC would not help you with them either.
Exactly one return from any function. That way you can do deallocation there and never miss it.
It's too easy to make a mistake otherwise:
new a()
if (Bad()) {delete a; return;}
new b()
if (Bad()) {delete a; delete b; return;}
... // etc.