Manual Object Ownership vs Smart Pointers


Right now, object ownership/deletion in my C++ project is manually tracked (via comments mostly). Almost every heap allocated object is created using a factory of sorts
e.g.
auto b = a->createInstanceOfB(); //a owns b
auto c = b->createInstanceOfC(); //b owns c
//auto k = new K(); //not in the code
...
//b is no longer used..
a->destroyInstanceOfB(b); //destroyInstanceOf calls delete on it
What benefits, if any, will smart pointers provide in this situation?

It's not the creation you should worry about, it's the deletion.
With smart pointers (the reference counting kind), objects can be jointly owned by several other objects, and when the last reference goes out of scope, the object is deleted automatically. This way, you won't have to manually delete anything anymore, you can only leak memory when you have circular dependencies, and your objects are never deleted from elsewhere behind your back.
The single-owner-only type (std::unique_ptr since C++11; the older std::auto_ptr was deprecated in C++11 and removed in C++17) also relieves you of your deleting duty, but it only allows one owner at a time (though ownership can be transferred). This is useful for objects that you pass around as pointers, but you still want them automatically cleaned up when they go out of scope (so that they work well in containers, and the stack unwinding in the case of an exception works as expected).
In any case, smart pointers make ownership explicit in your code, not only to you and your teammates, but also to the compiler - doing it wrong is likely to produce either a compiler error, or a runtime error that is relatively easy to catch with defensive coding. In manually memory-managed code, it is easy to get the ownership situation wrong somewhere (due to misreading comments, or assuming things the wrong way), and the resulting bug is typically hard to track down - you'll leak memory, overwrite stuff that's not yours, the program crashes at random, etc.; these all have in common that the situation where the bug occurs is unrelated to the offending code section.
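As a rough sketch of what this could look like for the factory-style code in the question: only the names createInstanceOfB and createInstanceOfC come from the question, while the class layout, the live counters and ownershipDemo are assumptions made up for illustration. The owner keeps its children in std::unique_ptr (std::make_unique needs C++14), so the "a owns b" comment becomes part of the type and destroyInstanceOfB is no longer needed for cleanup.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical classes modelled on the question: A owns Bs, B owns Cs.
struct C {
    static int live;              // number of C objects currently alive
    C() { ++live; }
    ~C() { --live; }
};
int C::live = 0;

struct B {
    static int live;              // number of B objects currently alive
    B() { ++live; }
    ~B() { --live; }

    // b owns c: the returned raw pointer is a non-owning observer.
    C* createInstanceOfC() {
        cs.push_back(std::make_unique<C>());
        return cs.back().get();
    }

    std::vector<std::unique_ptr<C>> cs;
};
int B::live = 0;

struct A {
    // a owns b: ownership is stated by the member type, not by a comment.
    B* createInstanceOfB() {
        bs.push_back(std::make_unique<B>());
        return bs.back().get();
    }

    std::vector<std::unique_ptr<B>> bs;
};

// Creates a small object tree and lets it go out of scope.
void ownershipDemo() {
    {
        A a;
        B* b = a.createInstanceOfB();
        b->createInstanceOfC();
        assert(B::live == 1 && C::live == 1);
    }   // a is destroyed here, which destroys b, which destroys c
    assert(B::live == 0 && C::live == 0);
}
```

Calling ownershipDemo() from main exercises the whole tree. The raw pointers handed out are still only valid while the owner lives, which is the dangling-reference caveat mentioned in other answers.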

Smart pointers enforce ownership semantics: that is, it's guaranteed that the object will be freed correctly even in the case of exceptions. You should always use them purely because of the safety, even if they express only very simple semantics such as std::unique_ptr. Moreover, a pointer that enforces the semantics reduces the need to document it, and less documentation means less documentation to be out of date or incorrect, especially where multiple parts of the same program express the same semantics.
Ultimately, smart pointers reduce many sources of error and there's little reason not to use them.
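To make the exception-safety point concrete, here is a minimal sketch. Resource, mayThrow and cleanedUpAfterException are hypothetical names invented for this example. It shows a std::unique_ptr releasing its object during stack unwinding without any explicit delete.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

struct Resource {
    static int live;              // number of Resource objects currently alive
    Resource() { ++live; }
    ~Resource() { --live; }
};
int Resource::live = 0;

// Hypothetical function that fails halfway through its work.
void mayThrow() {
    auto r = std::make_unique<Resource>();
    assert(Resource::live == 1);
    throw std::runtime_error("something went wrong");
    // No delete is needed: r is released during stack unwinding.
}

// Returns true if the Resource was cleaned up despite the exception.
bool cleanedUpAfterException() {
    try {
        mayThrow();
    } catch (const std::runtime_error&) {
        return Resource::live == 0;
    }
    return false;
}
```

With a raw pointer and a manual delete at the end of mayThrow, the same throw would skip the delete and leak the object.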

If an object is only owned by one other object, and dies with it, fine. You still need to make sure there are no dangling references, but this is not the hard case.
The hard case is where you share ownership. In that case, you will want smart pointers (or something similar) to automatically figure out when to actually delete an object.
Note that shared ownership is not necessary everywhere, and avoiding it will likely simplify things down the road when your product goes bloaty. :)

Related

Reliably Ensure Memory Safety in C++ 14

I'm converting some old C++ code to use shared_ptr, unique_ptr and weak_ptr, and I keep running into design problems.
I have "generator" methods that return new objects, and accessor methods that return pointers to existing objects. At first glance the solution seems simple; return shared_ptr for new objects, and weak_ptr for accessors.
shared_ptr completely avoids dangling pointers, since if the object is ever deleted all of its shared and weak pointers know about it. But I keep running into cases where I'm not sure if there are cyclic references among my shared pointers. There are many classes and some of them point at each other; is it possible that at some point a cycle formed? The code is sufficiently complex that it's hard to tell - new classes are being created from instructions in a script file. So I don't know if shared_ptr is actually preventing memory leaks and have been manually deleting all objects, which seems to defeat the point.
I considered using unique_ptr instead, since I don't actually need shared ownership anywhere. (The old C++ code certainly didn't have any shared ownership, it's raw pointers only.) But I can't make weak_ptrs from a unique_ptr, so I have to use raw pointers as stand-ins for weak pointers. This solves the memory leak problem, but I can be left with dangling pointers when the unique_ptr is destroyed.
So it seems that I can have one or the other: bulletproof memory leak prevention or bulletproof dangling pointer prevention, but not both.
People have told me I need to keep the entire program structure in my head so I can verify there are no shared pointer cycles, but that seems error prone. My head is, after all, only so big. Is there a way to achieve memory safety while only needing to consider local code?
To me, that is the central tenet of OO programming, and it seems I have lost it in this case.

A strategy that may work for you is to ensure that all the shared pointers in all of your managed objects are const.
Since a const shared_ptr field can only be assigned when it is constructed, this ensures that the objects can only hold shared pointers to objects that were created before they were. (OK, there are ways around that, but you're not going to do it by mistake)
Since "created before" is a total ordering, that ensures that the graph of shared pointers is acyclic.
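A small sketch of this idea, assuming a hypothetical Node type (the name, the dependsOn member and the live counter are invented here). Because the member is a const std::shared_ptr, it can only be set in the constructor, so a node can only point at a node that already existed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical node type: the link to another node is const, so it can only
// be set in the constructor, i.e. to a node that already exists.
struct Node {
    static int live;              // number of Node objects currently alive
    explicit Node(std::shared_ptr<Node> older = nullptr)
        : dependsOn(std::move(older)) { ++live; }
    ~Node() { --live; }

    const std::shared_ptr<Node> dependsOn;
};
int Node::live = 0;

void acyclicDemo() {
    {
        auto first = std::make_shared<Node>();
        auto second = std::make_shared<Node>(first);
        // first->dependsOn = second;   // would not compile: member is const
        assert(Node::live == 2);
    }
    assert(Node::live == 0);   // no cycle was possible, so nothing leaked
}
```

The trade-off is that such objects cannot be re-linked after construction, which may or may not fit a design driven by a script file.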

Pointer pointing to deleted stack memory

I believe what I have just experienced is called "undefined behavior", but I'm not quite sure. Basically, I had an instance declared in an outer scope that holds addresses of a class. In the inner level I instantiated an object on the stack and stored the address of that instance into the holder.
After the inner scope had escaped, I checked to see if I could still access methods and properties of the removed instance. To my surprise it worked without any problem.
Is there a simple way to combat this? Is there a way I can clear deleted pointers from the list?
example:
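#include <iostream>
#include <vector>

int main() {
    std::vector<int*> holder;
    {
        int inside = 12;
        holder.push_back(&inside);
    }
    // Undefined behavior: inside no longer exists at this point.
    std::cout << "deleted variable:" << *holder[0] << std::endl;
}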

Is there a simple way to combat this?
Sure, there are a number of ways to avoid this sort of problem.
The easiest way would be to not use pointers at all -- pass objects by value instead. i.e. In your example code, you could use a std::vector<int> instead of a std::vector<int *>.
If your objects are not copy-able for some reason, or are large enough that you think it will be too expensive to make copies of them, you could allocate them on the heap instead, and manage their lifetimes automatically using shared_ptr or unique_ptr or some other smart-pointer class. (Note that passing objects by value is more efficient than you might think, even for larger objects, since it avoids having to deal with the heap, which can be expensive... and modern CPUs are most efficient when dealing with contiguous memory. Finally, modern C++ has various optimizations that allow the compiler to avoid actually doing a data copy in many circumstances)
In general, retaining pointers to stack objects is a bad idea unless you are 100% sure that the pointer's lifetime will be a subset of the lifetime of the stack object it points to. (and even then it's probably a bad idea, because the next programmer who takes over the code after you've moved on to your next job might not see this subtle hazard and is therefore likely to inadvertently introduce dangling-pointer bugs when making changes to the code)
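The two alternatives described above might look roughly like this. The function names are made up for illustration, and std::make_unique assumes C++14.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Option 1: store copies, so nothing in the container can dangle.
std::vector<int> holdByValue() {
    std::vector<int> holder;
    {
        int inside = 12;
        holder.push_back(inside);   // copies the value
    }
    return holder;                  // holder[0] is still valid here
}

// Option 2: allocate on the heap and let the container own the object.
std::vector<std::unique_ptr<int>> holdByUniquePtr() {
    std::vector<std::unique_ptr<int>> holder;
    {
        auto inside = std::make_unique<int>(12);
        holder.push_back(std::move(inside));   // ownership moves into holder
    }
    return holder;
}

void holderDemo() {
    assert(holdByValue().at(0) == 12);
    assert(*holdByUniquePtr().at(0) == 12);
}
```

In both versions the element outlives the inner scope because the container, not the scope, controls its lifetime.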
After the inner scope had escaped, I checked to see if I could still access methods and properties of the removed instance. To my surprise it worked without any problem.
That can happen if the memory where the object was hasn't been overwritten by anything else yet -- but definitely don't rely on that behavior (or any other particular behavior) if/when you dereference an invalid pointer, unless you like spending a lot of quality time with your debugger chasing down random crashes and/or other odd behavior :)
Is there a way I can clear deleted pointers from the list?
In principle, you could add code to the objects' destructors that would go through the list and look for pointers to themselves and remove them. In practice, I think that is a poor approach, since it uses up CPU cycles trying to recover from an error that a better design would not have allowed to be made in the first place.
Btw this is off topic but it might interest you that the Rust programming language is designed to detect and prevent this sort of error by catching it at compile-time. Maybe someday C++ will get something similar.

There is no such thing as a deleted pointer. A pointer is just a number representing an address in your process's virtual address space. Even when the stack frame is long gone, the memory that held it is still available, since the stack was allocated when the thread started. So, technically speaking, the pointer is still "valid" in the sense that dereferencing it will usually yield something, although doing so is undefined behavior. Since the object it pointed to is already gone, the correct term is dangling pointer. The moral is that if you have a pointer to an object in a stack frame, there is no way to determine whether it is still valid, not even with functions like IsBadReadPtr (Win32 API, just as an example). The best way to prevent such situations is to avoid returning and storing pointers to stack objects.
However, if you wish to track your heap allocated memory and automatically deallocate it after it is no longer used, you could utilize smart pointers (std::shared_ptr, boost::shared_ptr, etc).
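For heap objects managed by std::shared_ptr, the list can hold std::weak_ptr entries, which (unlike raw pointers) can tell whether their object is gone. This is only a sketch under that assumption. pruneExpired and expiredPointerDemo are hypothetical names, and it does not help with stack objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Removes every entry whose object has already been destroyed.
void pruneExpired(std::vector<std::weak_ptr<int>>& holder) {
    holder.erase(std::remove_if(holder.begin(), holder.end(),
                                [](const std::weak_ptr<int>& w) { return w.expired(); }),
                 holder.end());
}

// Returns how many entries survive pruning after the inner scope ends.
std::size_t expiredPointerDemo() {
    std::vector<std::weak_ptr<int>> holder;
    auto outer = std::make_shared<int>(1);
    holder.push_back(outer);
    {
        auto inside = std::make_shared<int>(12);
        holder.push_back(inside);
        assert(!holder[1].expired());
    }   // inside is destroyed here; holder[1] can detect that
    assert(holder[1].expired());
    pruneExpired(holder);
    return holder.size();   // only the entry for outer remains
}
```

To use an entry safely, call lock() on it and check the resulting shared_ptr before dereferencing.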

Refactoring code to use Boost shared pointers

I wrote a project using normal pointers and now I'm fed up with manual memory management.
What are the issues that one could anticipate during refactoring?
Until now, I already spent an hour replacing X* with shared_ptr<X> for types I want to automatically manage memory. Then I changed dynamic_cast to dynamic_pointer_cast. I still see many more errors (comparing with NULL, passing this to a function).
I know the question is a bit vague and subjective, but I think I can benefit from experience of someone who has already done this.
Are there some pitfalls?

Although it's easy to just use boost::shared_ptr everywhere, you should use the correct smart pointer as per ownership semantics.
In most cases, you will want to use std::unique_ptr by default, unless the ownership is shared among multiple object instances.
If you run into cyclical ownership problems, you can break up the cycles with boost::weak_ptr.
Also keep in mind that while passing shared_ptr's around, you should always pass them by const reference for performance reasons (avoids an atomic increment) unless you really want to confer ownership to a different entity.
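The difference between the two ways of passing can be observed through use_count(). This sketch uses std::shared_ptr rather than the Boost class, and Widget and the function names are hypothetical.

```cpp
#include <cassert>
#include <memory>

struct Widget { int value = 42; };   // hypothetical payload type

// Only reads the object: no copy of the shared_ptr, no reference-count change.
long useCountInside(const std::shared_ptr<Widget>& w) {
    return w.use_count();
}

// Takes a copy: the callee shares ownership for the duration of the call.
long useCountInsideCopy(std::shared_ptr<Widget> w) {
    return w.use_count();
}

void passingDemo() {
    auto w = std::make_shared<Widget>();
    assert(useCountInside(w) == 1);       // passed by const reference
    assert(useCountInsideCopy(w) == 2);   // passed by value
    assert(w.use_count() == 1);           // the copy is gone again
}
```

Passing by const reference relies on the caller keeping its own shared_ptr alive for the duration of the call, which is normally the case for a local variable or member.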

Are there some pitfalls?
Yes, by Murphy's law if you blindly replace every pointer with shared_ptr, it'll turn out that isn't what you wanted, and you'll spend the next 6 months hunting bugs you introduced.
What are the issues that one could anticipate during refactoring?
Inefficient memory management, unused resources being stored longer than necessary, memory leaks (circular references), invalid reference counting (same raw pointer assigned to multiple different shared_ptrs).
Do NOT blindly replace everything with shared_ptr. Carefully investigate program structure and make sure that shared_ptr is NEEDED and it represents EXACTLY what you want.
Also, make sure you use version control that supports easy branching (git or mercurial), so when you break something you can revert to previous state or run something similar to "git bisect" to locate problem.
obviously you need to replace X* with shared_ptr
Wrong. It depends on context. If you have a pointer that points into the middle of some array (say, pixel data manipulation), then you won't be able to replace it with shared_ptr (and you won't need to). You need to use shared_ptr only when you need to ensure automatic deallocation of object. Automatic deallocation of object isn't always what you want.

If you wish to stick with boost, you should consider if you want a boost::shared_ptr or a boost::scoped_ptr. A shared_ptr is a resource to be shared between classes, whereas a scoped_ptr sounds more like what you may want (at least in some places). A scoped_ptr will automatically delete the memory when it goes out of scope.
Be wary when passing a shared_ptr to a function. The general rule with shared_ptr is to pass by value so a copy is created. If you pass it by reference then the pointer's reference count will not be incremented. In this case, you might end up deleting a piece of memory that you wanted kept alive.
There is a case, however, when you might want to pass a shared_ptr by reference. That is, if you want the memory to be allocated inside a different function. In this case, just make sure that the caller still holds the pointer for the lifetime of the function it is calling.
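#include <boost/shared_ptr.hpp>
#include <iostream>

// Allocates the int inside the function; the caller's pointer is updated
// because it is passed by reference.
void allocPtr( boost::shared_ptr< int >& ptrByRef )
{
    ptrByRef.reset( new int );
    *ptrByRef = 3;
}

int main()
{
    boost::shared_ptr< int > myPointer; // an object, not an unbound reference
    // I want a function to alloc the memory for this pointer.
    allocPtr( myPointer ); // I must be careful that I still hold the pointer
                           // when the function terminates
    std::cout << *myPointer << std::endl;
}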

I'm listing the steps/issues involved. They worked for me, but I can't vouch that they are 100% correct
0) Check whether there could be cyclic shared pointers. If so, can this lead to a memory leak? In my case, luckily, cycles need not be broken, because the objects in a cycle are still useful and should not be destroyed. Otherwise, use weak pointers to break cycles.
1) You need to replace "most" X* with shared_ptr<X>. A shared_ptr is (only?) created immediately after every dynamic allocation of X. At all other times, it is copy constructed, or constructed with an empty pointer (to signal NULL). To be safe (but a bit inefficient), pass these shared_ptrs only by value. Anyway, it's likely that you never passed your pointers by reference to begin with, so no additional change is required.
2) You might have used dynamic_cast<X*>(y) in some places. Replace that with
dynamic_pointer_cast<X>(y)
3) Wherever you passed NULL (e.g. to signal that a computation failed), pass an empty shared pointer.
4) Remove all delete statements for the concerned types.
5) Make your base class B inherit from enable_shared_from_this<B>. Then wherever you passed this, pass shared_from_this(). You might have to do a static cast (static_pointer_cast) if the function expected a derived type. Keep in mind that when you call shared_from_this(), some shared_ptr must already own this. In particular, don't call shared_from_this() in the constructor of the class.
I'm sure one could semi-automate this process to get a semantically equivalent but not necessarily very-efficient code. The programmer probably only needs to reason about cyclic reference(if any).
I used regexes a lot in many of these steps. It took about 3-4 hours. The code compiles and has executed correctly so far.
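Steps 2, 3 and 5 above can be sketched as follows. The example uses the std:: equivalents of the Boost classes, and Base, Derived, makeOrFail and self are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <memory>

struct Base : std::enable_shared_from_this<Base> {
    virtual ~Base() = default;
    // Step 5: hand out a shared_ptr to *this instead of the raw this pointer.
    std::shared_ptr<Base> self() { return shared_from_this(); }
};

struct Derived : Base {
    int payload = 7;
};

// Step 3: an empty shared_ptr plays the role that NULL used to play.
std::shared_ptr<Base> makeOrFail(bool succeed) {
    if (!succeed) return nullptr;
    return std::make_shared<Derived>();
}

void refactoringDemo() {
    std::shared_ptr<Base> b = makeOrFail(true);
    assert(b);                                  // replaces a comparison with NULL
    assert(!makeOrFail(false));

    // Step 2: dynamic_pointer_cast instead of dynamic_cast<Derived*>.
    std::shared_ptr<Derived> d = std::dynamic_pointer_cast<Derived>(b);
    assert(d && d->payload == 7);

    // shared_from_this() joins the existing ownership group.
    std::shared_ptr<Base> again = b->self();
    assert(again == b && b.use_count() == 3);   // b, d and again
}
```

Note that self() is only safe here because the object was created through make_shared. Calling it on an object that no shared_ptr owns fails.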

There is a tool that tries to automatically convert to smart pointers. I've never tried it. Here is a quote from the abstract of the following paper:
http://www.cs.rutgers.edu/~santosh.nagarakatte/papers/ironclad-oopsla2013.pdf
To enforce safety properties that are difficult to check statically,
Ironclad C++ applies dynamic checks via templated “smart
pointer” classes.
Using a semi-automatic refactoring tool, we have ported
nearly 50K lines of code to Ironclad C++

How to guard against memory leaks?

I was recently interviewing for a C++ position, and I was asked how I guard against creating memory leaks. I know I didn't give a satisfactory answer to that question, so I'm throwing it to you guys. What are the best ways to guard against memory leaks?
Thanks!

What all the answers given so far boil down to is this: avoid having to call delete.
Any time the programmer has to call delete, you have a potential memory leak.
Instead, make the delete call happen automatically. C++ guarantees that local objects have their destructors called when they go out of scope. Use that guarantee to ensure your memory allocations are automatically deleted.
At its most general, this technique means that every memory allocation should be wrapped inside a simple class, whose constructor allocates the necessary memory, and destructor releases it.
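A minimal sketch of such a wrapper class, assuming a hypothetical IntBuffer that owns an int array (the class, its live counter and raiiDemo are invented for this example):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal RAII wrapper around a heap-allocated int array.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : data_(new int[n]()), size_(n) { ++live; }
    ~IntBuffer() { delete[] data_; --live; }

    // Copying would lead to a double delete, so it is disabled here.
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

    static int live;   // number of buffers currently alive
private:
    int* data_;
    std::size_t size_;
};
int IntBuffer::live = 0;

void raiiDemo() {
    {
        IntBuffer buf(16);   // constructor allocates
        buf[0] = 3;
        assert(IntBuffer::live == 1 && buf.size() == 16);
    }                        // destructor releases, no explicit delete
    assert(IntBuffer::live == 0);
}
```

In real code std::vector<int> would do the same job. The class is spelled out only to show where the new and delete end up.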
Because this is such a commonly-used and widely applicable technique, smart pointer classes have been created that reduce the amount of boilerplate code. Rather than allocating memory, their constructors take a pointer to the memory allocation already made, and store that. When the smart pointer goes out of scope, it is able to delete the allocation.
Of course, depending on usage, different semantics may be called for. Do you just need the simple case, where the allocation should last exactly as long as the wrapper class lives? Then use boost::scoped_ptr or, if you can't use boost, std::auto_ptr (superseded by std::unique_ptr since C++11). Do you have an unknown number of objects referencing the allocation with no knowledge of how long each of them will live? Then the reference-counted boost::shared_ptr is a good solution.
But you don't have to use smart pointers. The standard library containers do the trick too. They internally allocate the memory required to store copies of the objects you put into them, and they release the memory again when they're deleted. So the user doesn't have to call either new or delete.
There are countless variations of this technique, changing whose responsibility it is to create the initial memory allocation, or when the deallocation should be performed.
But what they all have in common is the answer to your question: The RAII idiom: Resource Acquisition Is Initialization. Memory allocations are a kind of resource. Resources should be acquired when an object is initialized, and released by the object itself, when it is destroyed.
Make the C++ scope and lifetime rules do your work for you. Never ever call delete outside of a RAII object, whether it is a container class, a smart pointer or some ad-hoc wrapper for a single allocation. Let the object handle the resource assigned to it.
If all delete calls happen automatically, there's no way you can forget them. And then there's no way you can leak memory.

Don't allocate memory on the heap if you don't need to. Most work can be done on the stack, so you should only do heap memory allocations when you absolutely need to.
If you need a heap-allocated object that is owned by a single other object then use std::auto_ptr (or std::unique_ptr, which replaced it in C++11).
Use standard containers, or containers from Boost instead of inventing your own.
If you have an object that is referred to by several other objects and is owned by no single one in particular then use either std::tr1::shared_ptr or std::tr1::weak_ptr -- whichever suits your use case.
If none of these things match your use case then maybe use delete. If you do end up having to manually manage memory then just use memory leak detection tools to make sure that you aren't leaking anything (and of course, just be careful). You shouldn't ever really get to this point though.

You'd do well to read up on RAII.

Replace new with shared_ptrs. Basically RAII. Make code exception safe. Use the STL everywhere possible. If you use reference counting pointers, make sure that they don't form cycles. BOOST_SCOPE_EXIT from Boost is also very useful.

(Easy) Never ever let a raw pointer own an object (search your code for the regexp "\= *new"). Use shared_ptr or scoped_ptr instead, or even better, use real variables instead of pointers as often as you can.
(Hard) Make sure you don't have any circular references, with shared_ptrs pointing to each other, use weak_ptr to break them.
Done!
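The "hard" part might look like this in a typical parent/child arrangement. Parent, Child and the live counters are hypothetical names made up for this sketch, which uses the std:: smart pointers.

```cpp
#include <cassert>
#include <memory>

struct Parent;

struct Child {
    static int live;                // number of Child objects currently alive
    Child() { ++live; }
    ~Child() { --live; }
    std::weak_ptr<Parent> parent;   // back-reference: observes, does not own
};
int Child::live = 0;

struct Parent {
    static int live;                // number of Parent objects currently alive
    Parent() { ++live; }
    ~Parent() { --live; }
    std::shared_ptr<Child> child;   // owning reference
};
int Parent::live = 0;

void cycleDemo() {
    {
        auto p = std::make_shared<Parent>();
        p->child = std::make_shared<Child>();
        p->child->parent = p;       // would leak if parent were a shared_ptr
        assert(p.use_count() == 1); // the weak_ptr does not count as an owner
        assert(p->child->parent.lock() == p);
    }
    assert(Parent::live == 0 && Child::live == 0);
}
```

If Child::parent were a shared_ptr instead, both counters would still be 1 after the inner scope, which is exactly the leak described above.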

Use all kinds of smart pointers.
Use a consistent strategy for creation and deletion of objects, such as: whoever creates an object is responsible for deleting it.

make sure that you understand exactly how an object will be deleted every time you create one
make sure you understand who owns the pointer every time one is returned to you
make sure your error paths dispose of objects you have created appropriately
be paranoid about the above

In addition to the advice about RAII, remember to make your base class destructor virtual if there are any virtual functions.
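A short sketch of why this matters, with hypothetical Shape and Square classes. The object is destroyed through a pointer to the base class, so the derived destructor only runs because the base destructor is virtual.

```cpp
#include <cassert>
#include <memory>

struct Shape {
    virtual ~Shape() = default;          // virtual: deleting via Shape* is safe
    virtual double area() const = 0;
};

struct Square : Shape {
    static int live;                     // number of Square objects currently alive
    explicit Square(double s) : side(s) { ++live; }
    ~Square() override { --live; }
    double area() const override { return side * side; }
    double side;
};
int Square::live = 0;

void virtualDestructorDemo() {
    {
        std::unique_ptr<Shape> s = std::make_unique<Square>(2.0);
        assert(s->area() == 4.0);
        assert(Square::live == 1);
    }   // ~Square() runs because ~Shape() is virtual
    assert(Square::live == 0);
}
```

Without the virtual destructor, deleting a Square through a Shape pointer is undefined behavior, and in practice the derived part is often not cleaned up.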

To avoid memory leaks, what you must do is to have a clear and definite notion of who is responsible for deleting any dynamically allocated object.
C++ allows construction of objects on the stack (i.e. as kind-of local variables). This binds creation and destruction to the control flow: an object is created when program execution reaches its declaration, and the object is destroyed when execution escapes the block in which that declaration was made. Whenever an allocation need matches that pattern, use it. This will save you much of the trouble.
For other usages, if you can define and document a clear notion of responsibility, then this may work fine. For instance, you have a method or a function which returns a pointer to a newly allocated object, and you document that the caller becomes responsible for ultimately deleting that instance. Clear documentation coupled with good programmer discipline (something which is not easily achieved !) can solve many remaining problems of memory management.
In some situations, including undisciplined programmers and complex data structures, you may have to resort to more advanced techniques, such as reference counting. Each object is awarded a "counter" which is the number of other variables which point to it. Whenever a piece of code decides to no longer point to the object, the counter is decreased. When the counter reaches zero, the object is deleted. Reference counting requires strict counter handling. This can be done with so-called "smart pointers": these are objects which are functionally pointers, but which automatically adjust the counter upon their own creation and destruction.
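To illustrate the mechanism, here is a deliberately minimal, hand-rolled counted pointer. CountedPtr is a hypothetical class written for this sketch. It is not thread-safe, and real code should use std::shared_ptr or boost::shared_ptr instead.

```cpp
#include <cassert>

// Hypothetical, deliberately minimal counted pointer (not thread-safe).
template <class T>
class CountedPtr {
public:
    explicit CountedPtr(T* p) : ptr_(p), count_(new long(1)) {}
    CountedPtr(const CountedPtr& other) : ptr_(other.ptr_), count_(other.count_) {
        ++*count_;                       // one more variable points at the object
    }
    CountedPtr& operator=(const CountedPtr& other) {
        if (this != &other) {
            ++*other.count_;             // acquire the new object first
            release();                   // then let go of the old one
            ptr_ = other.ptr_;
            count_ = other.count_;
        }
        return *this;
    }
    ~CountedPtr() { release(); }

    T& operator*() const { return *ptr_; }
    long count() const { return *count_; }

private:
    void release() {
        if (--*count_ == 0) {            // last reference is gone
            delete ptr_;
            delete count_;
        }
    }
    T* ptr_;
    long* count_;
};

void countingDemo() {
    CountedPtr<int> a(new int(5));
    assert(a.count() == 1);
    {
        CountedPtr<int> b = a;           // counter goes up on copy
        assert(a.count() == 2 && *b == 5);
    }                                    // counter goes down when b dies
    assert(a.count() == 1);
}
```

The counter handling lives entirely in the copy operations and the destructor, which is what makes it "strict" without relying on programmer discipline.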
Reference counting works quite well in many situations, but it cannot handle cyclic structures. So for the most complex situations, you have to resort to the heavy artillery, i.e. a garbage collector. One example is the GC for C and C++ written by Hans Boehm, which has been used in some rather big projects (e.g. Inkscape). The point of a garbage collector is to maintain a global view on the complete memory space, to know whether a given instance is still in use or not. This is the right tool when local-view tools, such as reference counting, are not enough. One could argue that, at that point, one should ask oneself whether C++ is the right language for the problem at hand. Garbage collection works best when the language is cooperative (this unlocks a host of optimizations which are not doable when the compiler is unaware of what happens with memory, as is the case with a typical C or C++ compiler).
Note that none of the techniques described above allows the programmer to stop thinking. Even a GC can suffer from memory leaks, because it uses reachability as an approximation of future usage (there are theoretical reasons which imply that it is not possible, in full generality, to accurately detect all objects which will not be used thereafter). You may still have to set some fields to NULL to inform the GC that you will no longer access an object through a given variable.

I start by reading the following: https://stackoverflow.com/search?q=%5Bc%2B%2B%5D+memory+leak

A very good way is using smart pointers, such as boost/tr1::shared_ptr. The memory will be freed once the (stack allocated) smart pointer goes out of scope.

You can use a leak-detection utility.
If you work on Linux, use Valgrind (it's free).
Use Deleaker on Windows.

Smart pointers.
Memory management.
Override 'new' and 'delete' or use your own macros/templates.

On x86 you can regularly use Valgrind to check your code

How do I know who holds the shared_ptr<>?

I use boost::shared_ptr in my application in C++. The memory problem is really serious, and the application takes large amount of memory.
However, because I put every newed object into a shared_ptr, when the application exits, no memory leaking can be detected.
There must be something like std::vector<shared_ptr<> > pool holding the resource. How can I know who holds the shared_ptr, when debugging?
It is hard to review code line by line. Too much code...

You can't know, by only looking at a shared_ptr, where the "sibling pointers" are. You can test if one is unique() or get the use_count(), among other methods.

The popular widespread use of shared_ptr will almost inevitably cause unwanted and unseen memory occupation.
Cyclic references are a well known cause, and some of them can be indirect and difficult to spot, especially in complex code that is worked on by more than one programmer; a programmer may decide that one object needs a reference to another as a quick fix and not have time to examine all the code to see whether he is closing a cycle. This hazard is hugely underestimated.
Less well understood is the problem of unreleased references. If an object is shared out to many shared_ptrs then it will not be destroyed until every one of them is zeroed or goes out of scope. It is very easy to overlook one of these references and end up with objects lurking unseen in memory that you thought you had finished with.
Although strictly speaking these are not memory leaks (it will all be released before the program exits) they are just as harmful and harder to detect.
These problems are the consequences of expedient false declarations:
Declaring what you really want to be single ownership as shared_ptr. scoped_ptr would be correct but then any other reference to that object will have to be a raw pointer, which could be left dangling.
Declaring what you really want to be a passive observing reference as shared_ptr. weak_ptr would be correct but then you have the hassle of converting it to shared_ptr every time you want to use it.
I suspect that your project is a fine example of the kind of trouble that this practice can get you into.
If you have a memory intensive application you really need single ownership so that your design can explicitly control object lifetimes.
With single ownership opObject=NULL; will definitely delete the object and it will do it now.
With shared ownership spObject=NULL; ........who knows?......
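The contrast can be shown in a few lines. Big and resetDemo are hypothetical names, and reset() is used here in place of assigning NULL.

```cpp
#include <cassert>
#include <memory>

struct Big {
    static int live;              // number of Big objects currently alive
    Big() { ++live; }
    ~Big() { --live; }
};
int Big::live = 0;

void resetDemo() {
    // Single ownership: reset() destroys the object immediately.
    std::unique_ptr<Big> opObject = std::make_unique<Big>();
    opObject.reset();
    assert(Big::live == 0);

    // Shared ownership: reset() only drops this one reference.
    std::shared_ptr<Big> spObject = std::make_shared<Big>();
    std::shared_ptr<Big> forgotten = spObject;   // some other holder
    spObject.reset();
    assert(Big::live == 1);                      // still alive
    forgotten.reset();
    assert(Big::live == 0);                      // destroyed only now
}
```

In a large code base the "forgotten" holder is typically somewhere far away from the code that believes it released the object.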

One solution to dangling or circular smart pointer references that we've used is to customize the smart pointer class to add a debug-only bookkeeping function. Whenever a smartpointer adds a reference to an object, it takes a stack trace and puts it in a map in which each entry keeps track of:
The address of the object being allocated (what the pointer points to)
The addresses of each smartpointer object holding a reference to the object
The corresponding stacktraces of when each smartpointer was constructed
When a smartpointer goes out of scope, its entry in the map gets deleted. When the last smartpointer to an object gets destroyed, the pointee object gets its entry in the map removed.
Then we have a "track leaks" command with two functions: '[re]start leak tracking' (which clears the whole map and enables tracking if it's not already on), and 'print open references', which shows all outstanding smartpointer references created since the 'start leak tracking' command was issued. Since you can see the stack traces of where those smart pointers came into being, you can easily know exactly who's keeping your object from being freed. It slows things down when it's on, so we don't leave it on all the time.
It's a fair amount of work to implement, but definitely worth it if you've got a codebase where this happens a lot.
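A heavily simplified sketch of the bookkeeping idea, not the implementation described above. TrackedPtr, registry and openReferences are hypothetical names. A plain text label stands in for the stack trace, and the wrapper sits around std::shared_ptr instead of being a custom smart pointer class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Debug-only registry: object address -> labels of the handles that hold it.
// A label stands in for the stack trace described above.
inline std::map<const void*, std::multiset<std::string>>& registry() {
    static std::map<const void*, std::multiset<std::string>> r;
    return r;
}

// Hypothetical wrapper that records who holds a reference.
template <class T>
class TrackedPtr {
public:
    TrackedPtr(std::shared_ptr<T> p, std::string holder)
        : ptr_(std::move(p)), holder_(std::move(holder)) {
        registry()[ptr_.get()].insert(holder_);
    }
    ~TrackedPtr() {
        auto it = registry().find(ptr_.get());
        if (it == registry().end()) return;
        it->second.erase(it->second.find(holder_));   // remove one entry
        if (it->second.empty()) registry().erase(it);
    }
    TrackedPtr(const TrackedPtr&) = delete;            // keep the sketch simple
    TrackedPtr& operator=(const TrackedPtr&) = delete;

    const std::shared_ptr<T>& get() const { return ptr_; }

private:
    std::shared_ptr<T> ptr_;
    std::string holder_;
};

// "Print open references", reduced to a count for this sketch.
std::size_t openReferences(const void* object) {
    auto it = registry().find(object);
    return it == registry().end() ? 0 : it->second.size();
}

void trackingDemo() {
    auto raw = std::make_shared<int>(1);
    const void* address = raw.get();
    TrackedPtr<int> cache(raw, "cache");
    {
        TrackedPtr<int> worker(cache.get(), "worker");
        assert(openReferences(address) == 2);
    }
    assert(openReferences(address) == 1);   // "cache" is still holding it
    assert(registry()[address].count("cache") == 1);
}
```

A real version would capture an actual stack trace per handle and would need locking if smart pointers are created on several threads.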

You may be experiencing a shared pointer memory leak via cycles. What happens is your shared objects may hold references to other shared objects which eventually lead back to the original. When this happens the cycle keeps all reference counts at 1 even though no one else can access the objects. The solution is weak pointers.

Try refactoring some of your code so that ownership is more explicitly expressed by the use of weak pointers instead of shared pointers in some places.
When looking at your class hierarchy it's possible to determine which class really should hold a shared pointer and which merely needs only the weak one, so you can avoid cycles if there are any and if the "real" owner object is destructed, "non-owner" objects should have already been gone. If it turns out that some objects lose pointers too early, you have to look into object destruction sequence in your app and fix it.

You're obviously holding onto references to your objects within your application. This means that you are, on purpose, keeping things in memory. That means, you don't have a memory leak. A memory leak is when memory is allocated, and then you do not keep a reference to its address.
Basically, you need to look at your design and figure out why you are keeping so many objects and data in memory, and how can you minimize it.
The one possibility that you have a pseudo-memory leak is that you are creating more objects than you think you are. Try putting breakpoints on all statements containing a 'new'. See if your application is constructing more objects than you thought it should, and then read through that code.
The problem is really not so much a memory-leak as it is an issue of your application's design.

I was going to suggest using UMDH if you are on Windows. It is a very powerful tool. Use it to find allocations per transaction/time-period that you expect to be freed, then find who is holding them.
There is more information in this SO answer:
Find memory leaks caused by smart pointers

It is not possible to tell what objects own a shared_ptr from within the program. If you are on Linux, one sure way to debug the memory leaks is the Valgrind tool: while it won't directly answer your question, it will tell you where the memory was allocated, which is usually sufficient for fixing the problem. I imagine Windows has comparable tools, but I do not know which one is best.
