Are data structures an appropriate place for shared_ptr? - c++

I'm in the process of implementing a binary tree in C++. Traditionally, I'd have a pointer to left and a pointer to right, but manual memory management typically ends in tears. Which leads me to my question...
Are data structures an appropriate place to use shared_ptr?

I think it depends on where you'd be using them. I'm assuming that what you're thinking of doing is something like this:
template <class T>
class BinaryTreeNode
{
//public interface ignored for this example
private:
shared_ptr<BinaryTreeNode<T> > left;
shared_ptr<BinaryTreeNode<T> > right;
T data;
}
This would make perfect sense if you're expecting your data structure to handle dynamically created nodes. However, since that's not the normal design, I think it's inappropriate.
My answer would be that no, it's not an appropriate place to use shared_ptr, as the use of shared_ptr implies that the object is actually shared - however, a node in a binary tree is not ever shared. However, as Martin York pointed out, why reinvent the wheel - there's already a smart pointer type that does what we're trying to do - auto_ptr. So go with something like this:
template <class T>
class BinaryTreeNode
{
//public interface ignored for this example
private:
auto_ptr<BinaryTreeNode<T> > left;
auto_ptr<BinaryTreeNode<T> > right;
T data;
}
If anyone asks why data isn't a shared_ptr, the answer is simple - if copies of the data are good for the client of the library, they pass in the data item, and the tree node makes a copy. If the client decides that copies are a bad idea, then the client code can pass in a shared_ptr, which the tree node can safely copy.

Because left and right are not shared boost::shared_ptr<> is probably not the correct smart pointer.
This would be a good place to try std::auto_ptr<>

Yes, absolutely.
But be careful if you have a circular data structure. If you have two objects, both with a shared ptr to each other, then they will never be freed without manually clearing the shared ptr. The weak ptr can be used in this case. This, of course, isn't a worry with a binary tree.

Writing memory management manually is not so difficult on those happy occasions where each object has a single owner, which can therefore delete what it owns in its destructor.
Given that a tree by definition consists of nodes which each have a single parent, and therefore an obvious candidate for their single owner, this is just such a happy occasion. Congratulations!
I think it would be well worth* developing such a solution in your case, AND also trying the shared_ptr approach, hiding the differences entirely behind an identical interface, so you switch between the two and compare the difference in performance with some realistic experiments. That's the only sure way to know whether shared_ptr is suitable for your application.
(* for us, if you tell us how it goes.)

Never use shared_ptr for the the nodes of a data structure. It can cause the destruction of the node to be suspended or delayed if at any point the ownership was shared. This can cause destructors to be called in the wrong sequence.
It is a good practice in data structures for the constructors of nodes to contain any code that couples with other nodes and the destructors to contain code that de-couples from other nodes. Destructors called in the wrong sequence can break this design.

There is a bit of extra overhead with a shared_ptr, notably in space requirements, but if your elements are individually allocated then shared_ptr would be perfect.

Do you even need pointers? It seems you could use boost::optional<BinaryTreeNode<T> > left, right.

Related

Is it bad practice to have a unique_ptr member that has a raw pointer accessor?

I have a class that has a unique_ptr member, and this class retains sole ownership of this object. However, external classes may require access to this object. In this case, should I just return a raw pointer? shared_ptr doesn't seem to be correct because that would imply that the accessing class now shares ownership of that memory, whereas I want to make it clear that the original class is the sole owner.
For example, perhaps I have a tree class that owns a root node. Another class may wish to explore the tree for some reason, and requires a pointer to the root node to do this. A partial implementation might look like:
class Tree
{
public:
Node* GetRoot()
{
return m_root.Get();
}
private:
std::unique_ptr<Node> m_root;
};
Is this bad practice? What would a better solution be?
A more normal implementation might be for the Tree to expose iterators or provide a visit mechanism to explore the tree, rather than exposing the implementation details of the Tree itself. Exposing the implementation details means that you can never change the tree's underlying structure without risk of breaking who-knows-how-many clients of that code.
If you absolutely insist there's a need for this, at least return the pointer as const, such as const Node* GetRoot() const because external clients should absolutely not be mutating the tree structure.
Scott Meyers, Effective C++ Item 15 says yes, although the primary reason is for interactivity with legacy code. If you are in a controlled environment - e.g. a startup where there is little legacy code and it's easy to extend a class as needed or in a homework assignment - there may be little need and it may only encourage using the class incorrectly. But the more reusable approach is to expose raw resources.
Keep in mind that when people say "avoid pointers" they really mean "use ownership semantics", and that pointers are not inherently bad.
Returning a non-owning pointer is an acceptable way of getting references that may also be null and isn't anymore dangerous than the alternatives. Now, in your case this would only make sense if your tree was meant to be used as a tree, rather than using a tree internally such as std::map does.
It's uncommon to do this though as usually you'll want the pointers and resources abstracted away.

C++ Linked List remove all

So this is a bit of a conceptual question. I'm writing a LinkedList in C++, and as Java is my first language, I start to write my removeAll function so that it just joins the head an the tail nodes (I'm using sentinel Nodes btw). But I instantly realize that this won't work in C++ because I have to free the memory for the Nodes!
Is there some way around iterating through the entire list, deleting every element manually?
You can make each node own the next one, i.e. be responsible for destroying it when it is destroyed itself. You can do this by using a smart pointer like std::unique_ptr:
struct node {
// blah blah
std::unique_ptr<node> next;
};
Then you can just destroy the first node and all the others will be accounted for: they will all be destroyed in a chain reaction of unique_ptr destructors.
If this is a doubly-linked list, you should not use unique_ptrs in both directions, however. That would make each node own the next one, and be owned by the next one! You should make this ownership relation exist only in one direction. In the other use regular non-owning pointers: node* previous;
However, this will not work as is for the sentinel node: it should not be destroyed. How to handle that depends on how the sentinel node is identified and other properties of the list.
If you can tell the sentinel node apart easily, like, for example, checking a boolean member, you can use a custom deleter that avoids deleting the sentinel:
struct delete_if_not_sentinel {
void operator()(node* ptr) const {
if(!ptr->is_sentinel) delete ptr;
}
};
typedef std::unique_ptr<node, delete_if_not_sentinel> node_handle;
struct node {
// blah blah
node_handle next;
};
This stops the chain reaction at the sentinel.
You could do it like Java if you used a c++ garbage collector. Not many do. In any case, it saves you at most a constant factor in running time, as you spend the cost to allocate each element in the list anyway.
Yes. Well, sort of... If you implement your list to use a memory pool then it is responsible for all data in that pool and the entire list can be deleted by deleting the memory pool (which may contain one or more large chunks of memory).
When you use memory pools, you generally have at least one of the following considerations:
limitations on how your objects are created and destroyed;
limitations on what kind of data you can store;
extra memory requirements on each node (to reference the pool);
a simple, intuitive pool versus a complex, confusing pool.
I am no expert on this. Generally when I've needed fast memory management it's been for memory that is populated once, with no need to maintain free-lists etc. Memory pools are much easier to design and implement when you have specific goals and design constraints. If you want some magic bullet that works for all situations, you're probably out of luck.

How do I hand out weak_ptrs to this in my constructor?

I'm creating a class that will be part of a DAG. The constructor will take pointers to other instances and use them to initialize a dependency list.
After the dependency list is initialized, it can only ever be shortened - the instance can never be added as a dependency of itself or any of its children.
::std::shared_ptr is a natural for handling this. Reference counts were made for handling DAGs.
Unfortunately, the dependencies need to know their dependents - when a dependency is updated, it needs to tell all of its dependents.
This creates a trivial cycle that can be broken with ::std::weak_ptr. The dependencies can just forget about dependents that go away.
But I cannot find a way for a dependent to create a ::std::weak_ptr to itself while it's being constructed.
This does not work:
object::object(shared_ptr<object> dependency)
{
weak_ptr<object> me = shared_from_this();
dependency->add_dependent(me);
dependencies_.push_back(dependency);
}
That code results in the destructor being called before the constructor exits.
Is there a good way to handle this problem? I'm perfectly happy with a C++11-only solution.
Instead of a constructor, use a function to build the nodes of your graph.
std::shared_ptr<Node> mk_node(std::vector<std::shared_ptr<Node>> const &dependencies)
{
std::shared_ptr<Node> np(new Node(dependencies));
for (size_t i=0; i<dependencies.size(); i++)
dependencies[i].add_dependent(np); // makes a weak_ptr copy of np
return np;
}
If you make this a static member function or a friend of your Node class, you can make the actual constructor private.
Basically, you can't. You need a shared_ptr or weak_ptr to make a weak_ptr and obviously self can only be aware of of its own shared_ptr only in form of weak_ptr (otherwise there's no point in counting references). And, of course, there could be no self-shared_ptr when the object isn't yet constructed.
You can't.
The best I've come up with is to make the constructor private and have a public factory function that returns a shared_ptr to a new object; the factory function can then call a private method on the object post-construction that initialises the weak_ptr.
I'm understanding your question as conceptually related to garbage collection issues.
GC is a non-modular feature: it deals with some global property of the program (more precisely, being a live data is a global, non-modular, property inside a program - there are situations where you cannot deal with that in a modular & compositional way.). AFAIK, STL or C++ standard libraries does not help much for global program features.
A possible answer might be to use (or implement yourself) a garbage collector; Boehm's GC could be useful to you, if you are able to use it in your entire program.
And you could also use GC algorithms (even copying generational ones) to deal with your issue.
Unfortunately, the dependencies need to know their dependents. This is because when a dependency is updated, it needs to tell all of its dependents. And there is a trivial cycle. Fortunately, this cycle can be broken with ::std::weak_ptr. The dependencies can just forget about dependents that go away.
It sounds like a dependency can't reference a dependent unless the dependent exists. (E.g. if the dependent is destroyed, the dependency is destroyed too -- that's what a DAG is after all)
If that's the case, you can just hand out plain pointers (in this case, this). You're never going to need to check inside the dependency if the dependent is alive, because if the dependent died then the dependency should have also died.
Just because the object is owned by a shared_ptr doesn't mean that all pointers to it themselves must be shared_ptrs or weak_ptrs - just that you have to define clear semantics as to when the pointers become invalidated.
It sounds to me like you're trying to conflate to somewhat different items: a single object (a node in the DAG), and managing a collection of those objects.
class DAG {
class node {
std::vector<std::weak_ptr<node> > dependents;
public:
node(std::vector<weak_ptr<node> > d) : dependents(d) {}
};
weak_ptr<node> root;
};
Now, it may be true that DAG will only ever hold a weak_ptr<node> rather than dealing with an instance of a node directly. To the node itself, however, this is more or less irrelevant. It needs to maintain whatever key/data/etc., it contains, along with its own list of dependents.
At the same time, by nesting it inside of DAG (especially if we make the class definition of node private to DAG), we can minimize access to node, so very little other code has to be concerned with anything about a node. Depending on the situation, you might also want to do things like deleting some (most?) of the functions in node that the compiler will generate by default (e.g., default ctor, copy ctor, assignment operator).
This has been driving me nuts as well.
I considered adopting the policy of using pointers to break cycles... But I'm really not found of this because I really like how clear the intent of the weak_ptr is when you see it in your code (you know it's there to break cycles).
Right now I'm leaning toward writing my own weak_ptr class.
Maybe this will help:
inherit from enable_shared_from_this which basically holds a weak_ptr.
This will allow you to use this->shared_from_this();
shared_ptr's know to look if the class inherits from the class and use the classes weak_ptr when pointing to the object (prevents 2 shared pointers from counting references differently)
More about it: cppreference

Dynamically allocated list in C++

I made a cute generic (i.e. template) List class to handle lists in C++. The reason for that is that I found the std::list class terribly ugly for everyday use and since I constantly use lists, I needed a new one. The major improvement is that with my class, I can use [] to get items from it. Also, still to be implemented is an IComparer system to sort things.
I'm using this List class in OBJLoader, my class that loads Wavefront .obj files and converts them to meshes. OBJLoader contains lists of pointers to the following "types": 3D positions, 3D normals, uv texture coordinates, vertices, faces and meshes. The vertices list has objects that must be linked to some objects in all of the 3D positions, 3D normals and uv texture coordinates lists. Faces link to vertices and meshes link to faces. So they are all inter-connected.
For the sake of simplicity, let's consider that, in some context, there are just two lists of pointers: List<Person*> and List<Place*>. Person class contains, among others, the field List<Place*> placesVisited and the Place class contains the field List<Person*> peopleThatVisited. So we have the structure:
class Person
{
...
public:
Place* placeVisited;
...
};
class Place
{
...
public:
List<People*> peopleThatVisited;
};
Now we have the following code:
Person* psn1 = new Person();
Person* psn2 = new Person();
Place* plc1 = new Place();
Place* plc2 = new Place();
Place* plc2 = new Place();
// make some links between them here:
psn1->placesVisited.Add(plc1, plc2);
psn2->placesVisited.Add(plc2, plc3);
// add the links to the places as well
plc1->peopleThatVisited.Add(psn1);
plc2->peopleThatVisited.Add(psn1, psn2);
plc3->peopleThatVisited.Add(plc3);
// to make things worse:
List<Person*> allThePeopleAvailable;
allThePeopleAvailable.Add(psn1);
allThePeopleAvailable.Add(psn2);
List<Place*> allThePlacesAvailable;
allThePlacesAvailable.Add(plc1);
allThePlacesAvailable.Add(plc2);
allThePlacesAvailable.Add(plc3);
All done. What happens when we reach }? All the dtors are called and the program crashes because it tries to delete things two or more times.
The dtor of my list looks like this:
~List(void)
{
cursor = begin;
cursorPos = 0;
while(cursorPos < capacity - 1)
{
cursor = cursor->next;
cursorPos++;
delete cursor->prev;
}
delete cursor;
}
where Elem is:
struct Elem
{
public:
Elem* prev;
T value;
Elem* next;
};
and T is the generic List type.
Which brings us back to the question: What ways are there to safely delete my List classes? The elements inside may or may not be pointers and, if they are pointers, I would like to be able, when I delete my List, to specify whether I want to delete the elements inside or just the Elem wrappers around them.
Smart pointers could be an answer, but that would mean that I can't have a List<bubuType*>, but just List<smart_pointer_to_bubuType>. This could be ok, but again: declaring a List<bubuType*> would cause no error or warning and in some cases the smart pointers would cause some problems in the implementation: for example, I might want to declare a List<PSTR> for some WinAPI returns. I think getting those PSTR inside smart pointers would be an ugly job. Thus, the solution I'm looking for I think should be somehow related to the deallocation system of the List template.
Any ideas?
Without even looking at your code, I say: scrap it!
C++ has a list class template that's about as efficient as it gets, well-known to all C++ programmers, and comes bug-free with your compiler.
Learn to use the STL.1 Coming from other OO languages, the STL might seem strange, but there's an underlying reason for its strangeness, an alien beauty combining abstraction and performance - something considered impossible before Stepanov came and thought up the STL.
Rest assured that you are not the only one struggling with understanding the STL. When it came upon us, we all struggled to grasp its concepts, to learn its particularities, to understand how it ticks. The STL is a strange beast, but then it manages to combine two goals everybody thought could never be combined, so it's allowed to seem unfamiliar at first.
I bet writing your own linked list class used to be the second most popular indoor sport of C++ programmers - right after writing your own string class. Those of us who had been programming C++ 15 years ago nowadays enjoy ripping out those bug-ridden, inefficient, strange, and unknown string, list, and dictionary classes rotting in old code, and replacing it with something that is very efficient, well-known, and bug-free. Starting your own list class (other than for educational purposes) has to be one of the worst heresies.
If you program in C++, get used to one of the mightiest tools in its box as soon as possible.
1Note that the term "STL" names that part of the C++ standard library that stems from Stepanov's library (plus things like std::string which got an STL interface attached as an afterthought), not the whole standard library.
The best answer is that you must think on the lifetime of each one of the objects, and the responsibility of managing that lifetime.
In particular, in a relation from people and the sites they have visited, most probably neither of them should be naturally made responsible for the lifetime of the others: people can live independently from the sites that they have visited, and places exists regardless of whether they have been visited. This seems to hint that the lifetime of both people and sites is unrelated to the others, and that the pointers held are not related to resource management, but are rather references (not in the C++ sense).
Once you know who is responsible for managing the resource, that should be the code that should delete (or better hold the resource in a container or suitable smart pointer if it needs to be dynamically allocated), and you must ensure that the deletion does not happen before the other objects that refer to the same elements finish with them.
If at the end of the day, in your design ownership is not clear, you can fall back to using shared_ptr (either boost or std) being careful not to create circular dependencies that would produce memory leaks. Again to use the shared_ptrs correctly you have to go back and think, think on the lifetime of the objects...
Always, always use smart pointers if you are responsible for deallocating that memory. Do not ever use raw pointers unless you know that you're not responsible for deleting that memory. For WinAPI returns, wrap them into smart pointers. Of course, a list of raw pointers is not an error, because you may wish to have a list of objects whose memory you do not own. But avoiding smart pointers is most assuredly not a solution to any problem, because they're a completely essential tool.
And just use the Standard list. That's what it's for.
In the lines:
// make some links between them here:
psn1->placesVisited.Add(plc1, plc2);
psn2->placesVisited.Add(plc2, plc3);
// add the links to the places as well
plc1->peopleThatVisited.Add(psn1);
plc2->peopleThatVisited.Add(psn1, psn2);
plc3->peopleThatVisited.Add(plc3);
You have instances on the heap that contain pointers to each others. Not only one, but adding the same Place pointer to more than one person will also cause the problem (deleting more than one time the same object in memory).
Telling you to learn STL or to use shared_ptr (Boost) can be a good advice, however, as David Rodríguez said, you need to think of the lifetime of the objects. In other words, you need to redesign this scenario so that the objects do not contain pointers to each other.
Example: Is it really necessary to use pointers? - in this case, and if you require STL lists or vectors, you need to use shared_ptr, but again, if the objects make reference to each other, not even the best shared_ptr implementation will make it.
If this relationship between places and persons is required, design a class or use a container that will carry the references to each other instead of having the persons and places point at each other. Like a many-to-many table in a RDBMS. Then you will have a class/container that will take care of deleting the pointers at the end of the process. This way, no relations between Places and Persons will exist, only in the container.
Regards, J. Rivero
Without looking the exact code of the Add function, and the destructor of your list, it's hard to pin point the problem.
However, as said in the comments, the main problem with this code is that you don't use std::list or std::vector. There are proven efficient implementations, which fit what you need.
First of all, I would certainly use the STL (standard template library), but, I see that you are learning C++, so as an exercise it can be nice to write such a thing as a List template.
First of all, you should never expose data members (e.g. placeVisited and peopleThatVisited). This is a golden rule in object oriented programming. You should use getter and setter methods for that.
Regarding the problem around double deletion: the only solution is having a wrapper class around your pointers, that keeps track of outstanding references. Have a look at the boost::shared_ptr. (Boost is another magnificent well-crafted C++ library).
The program crashes because delete deallocates the memory of the pointer it is given it isn't removing the items from you list. You have the same pointer in at least two lists, so delete is getting called on the same block of memory multiple times this causes the crash.
First, smart pointers are not the answer here. All they will do is
guarantee that the objects never get deleted (since a double linked list
contains cycles, by definition).
Second, there's no way you can pass an argument to the destructor
telling it to delete contained pointers: this would have to be done
either via the list's type, one of its template arguments, partial
specialization or an argument to the constructor. (The latter would
probably require partial specialization as well, to avoid trying to
delete a non-pointer.)
Finally, the way to not delete an object twice is to not call delete on
it twice. I'm not quite sure what you thing is happening in your
destructor, but you never change cursor, so every time through, you're
deleting the same two elements. You probably need something more along
the lines of:
while ( cursor not at end ) {
Elem* next = cursor->next;
delete cursor;
cursor = next;
}
--
James Kanze
And you could easily implement [] around a normal list:
template <class Type>
class mystdlist : public std::list<Type> {
public:
Type& operator[](int index) {
list<Type>::iterator iter = this.begin();
for ( int i = 0; i < index; i++ ) {
iter++;
}
return *iter;
}
};
Why you would want to do this is strange, IMHO. If you want O(1) access, use a vector.

Best way to manage memory ownership in C++? Shared-pointers or other mechanisms?

In a new piece of code I have several different classes that refer to each other. Something like this (this is not my actual situation but an example of something similar):
class BookManager
{
...
};
class Book
{
public:
void setBookManager(BookManager *bookManager) {m_bookManager = bookManager;}
private:
BookManager *m_bookManager;
};
Every book refers to a book manager, but the problem is that many books will have its own specific BookManager, but some books may share a common BookManager.
The caller doesn't really specify what the Book should do with its BookManager, but in about 90% of the cases, the BookManager can be destroyed together with the Book. In about 10% of the cases, the same BookManager is reused for multiple books, and the BookManager must not be deleted with the Book.
Deleting the BookManager together with the Book is handy in those 90% of the cases, as the caller of Book::setBookManager doesn't need to remember the BookManager anymore. It just dies with the Book itself.
I see two alternative solutions for solving this.
First is to make extensive use of shared pointers. If the caller is not interested anymore in the BookManager afterwards, it doesn't keep a shared pointer to it. If it is still interested in it, or if it wants the BookManager to be shared over multiple books, it keeps the shared pointer and passes it to those multiple books.
A second alternative is to tell the Book explicitly what to do with the ownership of the book, like this:
class Book
{
public:
void setBookManager(BookManager *bookManager, book takeOwnership=true)
{
m_bookManager = bookManager;
m_hasOwnership = takeOwnership;
}
~Book()
{
if (m_hasOwnership && m_bookManager) delete m_bookManager;
}
private:
BookManager *m_bookManager;
bool m_hasOwnership;
};
The second solution seems much easier and allows us to use normal pointer syntax (BookManager * as opposed to std::shared_ptr<BookManager>), but it seems less 'clean' than the shared pointer approach.
Another alternative might be to have a typedef in BookManager like this:
class BookManager
{
public:
typedef std::shared_ptr<BookManager> Ptr;
...
};
Which allows us to write this:
BookManager::Ptr bookManager;
Which looks more like the the normal pointer-syntax than the original shared pointer syntax.
Does anyone have experience with either approach?
Any other suggestions?
In C++ if you have shared, un-coordinated access to common objects then the most common approach is some kind of reference counting, which you get from shared_ptr.
The only downside is that it isn't pervasive in C++ (especially libraries), so you sometimes need access to the raw pointer. In those cases, you need to be careful to keep the object alive.
I suggest that if you used shared_ptr -- try to use it everywhere, unless it's impossible. And yes, use the typedef if you want.
You seem to have this pretty well thought through. Looks like an ideal use for shared_ptr to me.
In the interests of adding SOME value for you, make sure you look at the templates for efficient shared_ptr creation here.
There are 2 kinds of "ownership" in C++:
deterministic ownership: at any moment you can point who is responsible for the resource
shared ownership: at any moment there might be several owners, they are hard to pinpoint though
shared_ptr, as the name implies, is for the second case. It is rarely required, especially with the introduction of move semantics and thus unique_ptr, but in those cases where it is actually required, it's invaluable.
Looking at your case, the question I have is: do you actually need shared ownership ?
One typical solution to avoid shared ownership is to use a Factory, which will be the sole owner of all the objects it creates, and guarantees they are alive as long the Factory itself is alive.
It might be "less safe" than using a shared_ptr, but there are very interesting arguments:
the lifetime of the objects is deterministic once more, much easier to debug
there is no risk of creating a cycle of references (and leaking memory) by accident
Unless you are memory constrained (in which case the sooner the objects get deleted the better it is), you might wish to take a BookManagerFactory approach here.
Only when you know exactly where the object is owned and take care to delete it there (!), should you use bare pointers. I have a "stack" of classes that each hold unique parent pointer, in which case the reference count would always be 1 and you can access the deepest element through the last child. It seems like you have a much more complicated setup here, though, go with smart pointers. Remember that even if a bare pointer might seem cleaner or easier, unique_ptr should be recommended, but you might have to battle move vs copy assignment in pre-conversion code and cryptic error messages resulting from the switch.
Lou Franco already gave you the answer, but as a side note: Your second implementation idea is essentially what an auto_ptr does (when take ownership is true).
A better solution might be to have the Book class hold a handle (e.g. array index or hash) to a BookManager that residese in some BookManagerCache class. The Cache class is solely responsible for managing the lifetimes of BookManagers.