How do I know who holds the shared_ptr<>?

How do I know who holds the shared_ptr<>? - c++

I use boost::shared_ptr in my application in C++. The memory problem is really serious, and the application takes large amount of memory.
However, because I put every newed object into a shared_ptr, when the application exits, no memory leaking can be detected.
There must be something like std::vector<shared_ptr<> > pool holding the resource. How can I know who holds the shared_ptr, when debugging?
It is hard to review code line by line. Too much code...

You can't know, by only looking at a shared_ptr, where the "sibling pointers" are. You can test if one is unique() or get the use_count(), among other methods.

The popular widespread use of shared_ptr will almost inevitably cause unwanted and unseen memory occupation.
Cyclic references are a well known cause and some of them can be indirect and difficult to spot especially in complex code that is worked on by more that one programmer; a programmer may decide than one object needs a reference to another as a quick fix and doesn't have time to examine all the code to see if he is closing a cycle. This hazard is hugely underestimated.
Less well understood is the problem of unreleased references. If an object is shared out to many shared_ptrs then it will not be destroyed until every one of them is zeroed or goes out of scope. It is very easy to overlook one of these references and end up with objects lurking unseen in memory that you thought you had finished with.
Although strictly speaking these are not memory leaks (it will all be released before the program exits) they are just as harmful and harder to detect.
These problems are the consequences of expedient false declarations:
Declaring what you really want to be single ownership as shared_ptr. scoped_ptr would be correct but then any other reference to that object will have to be a raw pointer, which could be left dangling.
Declaring what you really want to be a passive observing reference as shared_ptr. weak_ptr would be correct but then you have the hassle of converting it to share_ptr every time you want to use it.
I suspect that your project is a fine example of the kind of trouble that this practice can get you into.
If you have a memory intensive application you really need single ownership so that your design can explicitly control object lifetimes.
With single ownership opObject=NULL; will definitely delete the object and it will do it now.
With shared ownership spObject=NULL; ........who knows?......

One solution to dangling or circular smart pointer references we've done is customize the smart pointer class to add a debug-only bookkeeping function. Whenever a smartpointer adds a reference to an object, it takes a stack trace and puts it in a map whose each entry keeps track of
The address of the object being allocated (what the pointer points to)
The addresses of each smartpointer object holding a reference to the object
The corresponding stacktraces of when each smartpointer was constructed
When a smartpointer goes out of scope, its entry in the map gets deleted. When the last smartpointer to an object gets destroyed, the pointee object gets its entry in the map removed.
Then we have a "track leaks" command with two functions: '[re]start leak tracking' (which clears the whole map and enabled tracking if its not already), and 'print open references', which shows all outstanding smartpointer references created since the 'start leak tracking' command was issued. Since you can see the stack traces of where those smart pointers came into being, you can easily know exactly who's keeping from your object being freed. It slows things down when its on, so we don't leave it on all the time.
It's a fair amount of work to implement, but definitely worth it if you've got a codebase where this happens a lot.

You may be experiencing a shared pointer memory leak via cycles. What happens is your shared objects may hold references to other shared objects which eventually lead back to the original. When this happens the cycle keeps all reference counts at 1 even though no one else can access the objects. The solution is weak pointers.

Try refactoring some of your code so that ownership is more explicitly expressed by the use of weak pointers instead of shared pointers in some places.
When looking at your class hierarchy it's possible to determine which class really should hold a shared pointer and which merely needs only the weak one, so you can avoid cycles if there are any and if the "real" owner object is destructed, "non-owner" objects should have already been gone. If it turns out that some objects lose pointers too early, you have to look into object destruction sequence in your app and fix it.

You're obviously holding onto references to your objects within your application. This means that you are, on purpose, keeping things in memory. That means, you don't have a memory leak. A memory leak is when memory is allocated, and then you do not keep a reference to its address.
Basically, you need to look at your design and figure out why you are keeping so many objects and data in memory, and how can you minimize it.
The one possibility that you have a pseudo-memory leak is that you are creating more objects than you think you are. Try putting breakpoints on all statements containing a 'new'. See if your application is constructing more objects than you thought it should, and then read through that code.
The problem is really not so much a memory-leak as it is an issue of your application's design.

I was going to suggest using UMDH if you are on windows. It is a very powerful tool. Use it find allocations per transaction/time-period that you expect to be freed then find who is holding them.
There is more information on this SO answer
Find memory leaks caused by smart pointers

It is not possible to tell what objects own shared_ptr from within the program. If you are on Linux, one sure way to debug the memory leaks is the Valgrind tool -- while it won't directly answer your question, it will tell where the memory was allocated, which is usually sufficient for fixing the problem. I imagine Windows has comparable tools, but I am do not know which one is best.

Related

How to avoid null-pointers in multi-threading on the same List?

Say, I have a vector of dynamic object pointers and have different threads working on those objects.
It is possible that while one thread is working on an object, the main thread is deleting it. It does this by setting a flag in the object to mark it for deletion and then starting to free up its memory.
I have thought about taking care of this by checking for the flag before each single access to the object, but theoretically the following could happen (example code for illustration, although I am trying to make it reflect the situation as best as possible there could still be errors in it):
object = copyPointerFromVector(someIndex);
if(!object->markedForDeletion){
---flag set, object cleaned up by main thread and erased from vector
object->getValues(something); //crash with access violation
}
While it is probably rare this of course is still unacceptable. As someone obviously very very rusty with multi-threading, what is the right way of solving this issue?

Note up front: I assume you know about synchronization (mutexes, condition variables, atomics etc), i.e. the primitive building blocks used for multithreaded programming and that your question is about how to use them. You need those basics.
The problem you have is basically one of unclear ownership, not one of synchronization. Of course, ownership between threads requires synchronization, so it is also involved. Still, when one part of your program is destroying a shared object while another part is still using it, it's because it wrongly assumes it was the sole owner and could dispose of the object. More generally, in multithreading, you could also say that it changes data structures without synchronization, but this case is special enough and there are according tools to deal with it.
The tools to deal with this are called reference counting and garbage collection. Of those, the easiest to apply is probably reference counting. For that, all you need is a smart pointer that keeps track of the number of owners of an object. For example, std::shared_ptr gives you exactly that and it manages the reference count in a thread-safe way. In order to "delete" an object from the mentioned vector, you just remove the smart pointer. If the refcount goes to zero, it was the last one and gets deleted for you.
Garbage collection is a bit more complex. It involves scanning your process memory for references/pointers to objects and deleting the objects which aren't referenced any more. This requires installing a garbage collector though and it's a more complex change to an existing program.

Should I free long-lived memory that would normally be freed at the very end of the program?

I am currently writing a library that parse some structured binary data into a set of objects. These objects are expected to outlive any user code, and would normally be freed at the end or after the main function.
I am using shared (and weak) pointers to manage the memory of each object, but it is causing a lot of added complexity to the program, and raises structural issues that I will not get into in this particular question.
Considering that:
traversing the entirety of the binary data is expensive and I cannot afford to do it more than one time,
each visited entry is used to build an object, that then gets registered (i.e. added into the set),
entries in the binary data may rely on other entries that appears later but gets parsed immediately, and registered when the entry is visited again,
duplicate entries may appear at any moment, but I need to merge those duplicates into one instance (and update any pointer referencing those duplicates to the new merged entry) before registration,
every single one of those objects is guaranteed to be of one of many POD types deriving a common class, so nothing except memory needs to be cleaned up,
the resulting program will run on a modern OS (or in this case, that collects memory from dead processes),
I am very tempted to just use raw pointers, never free the memory taken by those objects and let the OS do its cleanup after the process exits.
What would be the best course of action?

If you're writing reusable code, you need to at least provide the option of cleaning up. What if some program uses your library for one operation, and then continues running? It's not safe to assume that the process exits immediately after your library's task is complete.

The other answers cover the general and standard approach: in an ideal world, yes, you'd clean up your memory, because it makes the code more generic and more reusable and helps with tooling. As others have said, std::unique_ptr for owning pointers and raw pointers for non-owning pointers should work well.
There are a couple of more specialized approaches that may or may not be useful:
Use a pool allocator (such as Boost.Pool, or roll your own) to allocate a bunch of memory up front then dole out pieces of it for your objects. You can then free every object at once by deleting the pool.
Intentionally not freeing memory is occasionally a valid technique. See, e.g., "Increasing Compiler Performance by Over 75%", by Walter Bright. Of course, a compiler is a specialized problem domain, and Walter Bright is probably one of the top compiler developers alive, so techniques that work for his problem domain shouldn't be blindly applied elsewhere.

the resulting program will run on a modern OS (or in this case, that collects memory from dead processes)
I am very tempted to just use raw pointers, never free the memory taken by those objects and let the OS do its cleanup after the process exits.
If you take this approach, then anyone who uses your library and then uses valgrind to try to detect memory leaks in their program will report massive leaks coming from your library and complain to you about it, so if I were you I definitely would not do this.

If you are writing a library then you should provide a cleanup function that frees all memory that you allocated.
A practical example of why this is useful is if a Windows DLL uses your library. When the library is loaded, static data is initialized. When the library is unloaded, static data is cleared. If your library has some global pointers to memory that is never freed, then load-unload cycles of the DLL will leak memory.

If the objects are all of the same type, then rather than allocating each one independently, you could just put them all into a vector and have them refer to each other by index number instead of using pointers. The vector's built-in memory management takes care of allocating space as needed, and when you're done with the objects, you can just destroy the vector to deallocate them all at once. (Note that vector::clear() doesn't actually free the memory, though it does make it available to store a new set of objects in the vector.)
If your objects aren't all the same type, you'll want to look into the more general concept of region-based memory management. As above, the idea is that you can allocate all your objects in a relatively small number of memory chunks (possibly just one), which can be freed later without having to track all the
individual objects allocated within.

If your ownership and lifetimes are clear I suggest you use unique_ptr for the owning pointers and raw pointers for the non-owning pointers. It should be less complex than shared_ptr and weak_ptr whilst still managing memory automatically.
I don't think not managing memory at all is an option. But I think using smart pointers to express ownership is not just about good memory management it also makes code easier to reason about.

Try to think of future maintenance work. Suppose your code needs to be broken up or other stuff done after it. In this case you're opening yourself up to leaks or being a resource hog later down the line.

Cleaning up (or being able to do s) is good. It may seem obvious now that an application should work with a single structured binary dataset throughout its entire lifetime, but you'll start kicking yourself once you realize you need to write an application that needs to reset half-way through and start over with another dataset.
(a related thing that's easy to overlook is that an application may need to work with two completely independent datasets at the same time, so try not to design your library to exclude that use case!)
That said, I think you may be focusing too much on the extremes. Code that shouldn't participate in memory management can use raw pointers, and this is reasonable when there is no risk of these pointers outliving your structured dataset in memory.
However, that doesn't mean that code that does participate in memory management needs to use raw pointers too. You can use smart pointers to manage your data structures even if you are passing raw pointers out to the user.
That aside, keep in mind that, in my experience, pointers are usually the wrong semantics — usually, most use cases are most natural with reference or value semantics, which means you should be passing around raw references, or passing around lightweight wrapper class that have reference or value semantics, but are implemented as containing a pointer to the actual data. Or even as a copy of the actual data if appropriate.

Pointer pointing to deleted stack memory

I believe what I have just experienced is called "undefined behavior", but I'm not quite sure. Basically, I had an instance declared in an outer scope that holds addresses of a class. In the inner level I instantiated an object on the stack and stored the address of that instance into the holder.
After the inner scope had escaped, I checked to see if I could still access methods and properties of the removed instance. To my surprise it worked without any problem.
Is there a simple way to combat this? Is there a way I can clear deleted pointers from the list?
example:
std::vector<int*> holder;
{
int inside = 12;
holder.push_back(&inside);
}
cout << "deleted variable:" << holder[0] << endl;

Is there a simple way to combat this?
Sure, there are a number of ways to avoid this sort of problem.
The easiest way would be to not use pointers at all -- pass objects by value instead. i.e. In your example code, you could use a std::vector<int> instead of a std::vector<int *>.
If your objects are not copy-able for some reason, or are large enough that you think it will be too expensive to make copies of them, you could allocate them on the heap instead, and manage their lifetimes automatically using shared_ptr or unique_ptr or some other smart-pointer class. (Note that passing objects by value is more efficient than you might think, even for larger objects, since it avoids having to deal with the heap, which can be expensive... and modern CPUs are most efficient when dealing with contiguous memory. Finally, modern C++ has various optimizations that allow the compiler to avoid actually doing a data copy in many circumstances)
In general, retaining pointers to stack objects is a bad idea unless you are 100% sure that the pointer's lifetime will be a subset of the lifetime of the stack object it points to. (and even then it's probably a bad idea, because the next programmer who takes over the code after you've moved on to your next job might not see this subtle hazard and is therefore likely to inadvertently introduce dangling-pointer bugs when making changes to the code)
After the inner scope had escaped, I checked to see if I could still
access methods and properties of the removed instance. To my surprise
it worked without any problem.
That can happen if the memory where the object was hasn't been overwritten by anything else yet -- but definitely don't rely on that behavior (or any other particular behavior) if/when you dereference an invalid pointer, unless you like spending a lot of quality time with your debugger chasing down random crashes and/or other odd behavior :)
Is there a way I can clear deleted pointers from the list?
In principle, you could add code to the objects' destructors that would go through the list and look for pointers to themselves and remove them. In practice, I think that is a poor approach, since it uses up CPU cycles trying to recover from an error that a better design would not have allowed to be made in the first place.
Btw this is off topic but it might interest you that the Rust programming language is designed to detect and prevent this sort of error by catching it at compile-time. Maybe someday C++ will get something similar.

There is no such thing as deleted pointer. Pointer is just a number, representing some address in your process virtual address space. Even if stack frame is long gone, memory, that was holding it is still available, since it was allocated when thread started, so technically speaking, it is still a valid pointer, valid in terms, that you could dereference it and get something. But since object it was pointing is already gone, valid term will be dangling pointer. Moral is that if you have pointer to the object in the stack frame, there is no way to determine is it valid or not, not even using functions like IsBadReadPtr (Win32 API just for example). The best way to prevent such situations is avoid returning and storing pointers to the stack objects.
However, if you wish to track your heap allocated memory and automatically deallocate it after it is no longer used, you could utilize smart pointers (std::shared_ptr, boost::shared_ptr, etc).

Manual Object Ownership vs Smart Pointers

Right now, object ownership/deletion in my C++ project is manually tracked (via comments mostly). Almost every heap allocated object is created using a factory of sorts
e.g.
auto b = a->createInstanceOfB(); //a owns b
auto c = b->createInstanceOfC(); //b owns c
//auto k = new K(); //not in the code
...
//b is no longer used..
a->destroyInstanceOfB(b); //destroyInstanceOf calls delete on it
What benefits, if any, will smart pointers provide in this sitution?

It's not the creation you should worry about, it's the deletion.
With smart pointers (the reference counting kind), objects can be commonly owned be several other objects, and when the last reference goes out of scope, the object is deleted automatically. This way, you won't have to manually delete anything anymore, you can only leak memory when you have circular dependencies, and your objects are never deleted from elsewhere behind your back.
The single-owner-only type (std::auto_ptr) also relieves you of your deleting duty, but it only allows one owner at a time (though ownership can be transferred). This is useful for objects that you pass around as pointers, but you still want them automatically cleaned up when they go out of scope (so that they work well in containers, and the stack unrolling in the case of an exception works as expected).
In any case, smart pointers make ownership explicit in your code, not only to you and your teammates, but also to the compiler - doing it wrong is likely to produce either a compiler error, or a runtime error that is relatively easy to catch with defensive coding. In manually memory-managed code, it is easy to get the ownership situation wrong somewhere (due to misreading comments, or assuming things the wrong way), and the resulting bug is typically hard to track down - you'll leak memory, overwrite stuff that's not yours, the program crashes at random, etc.; these all have in common that the situation where the bug occurs is unrelated to the offending code section.

Smart pointers enforce ownership semantics- that is, it's guaranteed that the object will be freed correctly even in the case of exceptions. You should always use them purely because of the safety, even if they express only very simple semantics such as std::unique_ptr. Moreover, a pointer that enforces the semantics reduces the need to document it, and less documentation means less documentation to be out of date or incorrect- especially where multiple parts of the same program express the same semantics.
Ultimately, smart pointers reduce many sources of error and there's little reason not to use them.

If an object is only owned by one other object, and dies with it, fine. Still need to make sure there's no dangling references, but this is not the hard case.
The hard case, is where you share ownership. In that case, you will want to have smart-ptrs (or something) to automatically figure out when to actually delete an object.
Note that shared ownership is not necessary everywhere, and avoiding it will likely simplify things down the road when your product goes bloaty. :)

How to guard against memory leaks?

I was recently interviewing for a C++ position, and I was asked how I guard against creating memory leaks. I know I didn't give a satisfactory answer to that question, so I'm throwing it to you guys. What are the best ways to guard against memory leaks?
Thanks!

What all the answers given so far boil down to is this: avoid having to call delete.
Any time the programmer has to call delete, you have a potential memory leak.
Instead, make the delete call happen automatically. C++ guarantees that local objects have their destructors called when they go out of scope. Use that guarantee to ensure your memory allocations are automatically deleted.
At its most general, this technique means that every memory allocation should be wrapped inside a simple class, whose constructor allocates the necessary memory, and destructor releases it.
Because this is such a commonly-used and widely applicable technique, smart pointer classes have been created that reduce the amount of boilerplate code. Rather than allocating memory, their constructors take a pointer to the memory allocation already made, and stores that. When the smart pointer goes out of scope, it is able to delete the allocation.
Of course, depending on usage, different semantics may be called for. Do you just need the simple case, where the allocation should last exactly as long as the wrapper class lives? Then use boost::scoped_ptr or, if you can't use boost, std::auto_ptr. Do you have an unknown number of objects referencing the allocation with no knowledge of how long each of them will live? Then the reference-counted boost::shared_ptr is a good solution.
But you don't have to use smart pointers. The standard library containers do the trick too. They internally allocate the memory required to store copies of the objects you put into them, and they release the memory again when they're deleted. So the user doesn't have to call either new or delete.
There are countless variations of this technique, changing whose responsibility it is to create the initial memory allocation, or when the deallocation should be performed.
But what they all have in common is the answer to your question: The RAII idiom: Resource Acquisition Is Initialization. Memory allocations are a kind of resource. Resources should be acquired when an object is initialized, and released by the object itslef, when it is destroyed.
Make the C++ scope and lifetime rules do your work for you. Never ever call delete outside of a RAII object, whether it is a container class, a smart pointer or some ad-hoc wrapper for a single allocation. Let the object handle the resource assigned to it.
If all delete calls happen automatically, there's no way you can forget them. And then there's no way you can leak memory.

Don't allocate memory on the heap if you don't need to. Most work can be done on the stack, so you should only do heap memory allocations when you absolutely need to.
If you need a heap-allocated object that is owned by a single other object then use std::auto_ptr.
Use standard containers, or containers from Boost instead of inventing your own.
If you have an object that is referred to by several other objects and is owned by no single one in particular then use either std::tr1::shared_ptr or std::tr1::weak_ptr -- whichever suits your use case.
If none of these things match your use case then maybe use delete. If you do end up having to manually manage memory then just use memory leak detection tools to make sure that you aren't leaking anything (and of course, just be careful). You shouldn't ever really get to this point though.

You'd do well to read up on RAII.

replace new with shared_ptr's. Basically RAII. make code exception safe. Use the stl everywhere possible. If you use reference counting pointers make sure that they don't form cycles. SCOPED_EXIT from boost is also very useful.

(Easy) Never ever let a raw pointer own a object (search your code for the regexp "\= *new". Use shared_ptr or scoped_ptr instead, or even better, use real variables instead of pointers as often as you can.
(Hard) Make sure you don't have any circular references, with shared_ptrs pointing to each other, use weak_ptr to break them.
Done!

Use all kind of smart pointers.
Use certain strategy for creation and deletion of objects, like who creates that is responsible for delete.

make sure that you understand exactly how an object will be deleted everytime you create one
make sure you understand who owns the pointer every time one is returned to you
make sure your error paths dispose of objects you have created appropriately
be paranoid about the above

In addition to the advice about RAII, remember to make your base class destructor virtual if there are any virtual functions.

To avoid memory leaks, what you must do is to have a clear and definite notion of who is responsible for deleting any dynamically allocated object.
C++ allows construction of objects on the stack (i.e. as kind-of local variables). This binds creation and destruction the the control flow: an objects is created when program execution reaches its declaration, and the object is destroyed when execution escapes the block in which that declaration was made. Whenever allocation need matches that pattern, then use it. This will save you much of the trouble.
For other usages, if you can define and document a clear notion of responsibility, then this may work fine. For instance, you have a method or a function which returns a pointer to a newly allocated object, and you document that the caller becomes responsible for ultimately deleting that instance. Clear documentation coupled with good programmer discipline (something which is not easily achieved !) can solve many remaining problems of memory management.
In some situations, including undisciplined programmers and complex data structures, you may have to resort to more advanced techniques, such as reference counting. Each object is awarded a "counter" which is the number of other variables which point to it. Whenever a piece of code decides to no longer point to the object, the counter is decreased. When the counter reaches zero, the object is deleted. Reference counting requires strict counter handling. This can be done with so-called "smart pointers": these are object which are functionally pointers, but which automatically adjust the counter upon their own creation and destruction.
Reference counting works quite good in many situations, but they cannot handle cyclic structures. So for the most complex situations, you have to resort to the heavy artillery, i.e. a garbage collector. The one I link to is the GC for C and C++ written by Hans Boehm, and it has been used in some rather big projects (e.g. Inkscape). The point of a garbage collector is to maintain a global view on the complete memory space, to know whether a given instance is still in use or not. This is the right tool when local-view tools, such as reference counting, are not enough. One could argue that, at that point, one should ask oneself whether C++ is the right language for the problem at hand. Garbage collection works best when the language is cooperative (this unlocks a host of optimizations which are not doable when the compiler is unaware of what happens with memory, as a typical C or C++ compiler).
Note that none of the techniques described above allows the programmer to stop thinking. Even a GC can suffer from memory leaks, because it uses reachability as an approximation of future usage (there are theoretical reasons which imply that it is not possible, in full generality, to accurately detect all objects which will not be used thereafter). You may still have to set some fields to NULL to inform the GC that you will no longer access an object through a given variable.

I start by reading the following: https://stackoverflow.com/search?q=%5Bc%2B%2B%5D+memory+leak

A very good way is using Smart Pointers, the boost/tr1::shared_ptr. The memory will be free'd, once the (stack allocated) smart pointer goes out of scope.

You can use the utility.
If you work on Linux - use valgrid (it's free).
Use deleaker on Windows.

Smart pointers.
Memory management.
Override 'new' and 'delete' or use your own macros/templates.

On x86 you can regularly use Valgrind to check your code

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js