C++: Replace raw pointers with shared and weak ptr

I'm facing a design issue in my program.
I have to manage Node objects that are part of a root ChainDescriptor.
Basically it looks like the following:
#include <list>
#include <string>
class Node;  // forward declaration: ChainDescriptor refers to Node before its definition
class ChainDescriptor
{
public:
    ~ChainDescriptor()
    {
        //delete the nodes in nodes...
    }
    void addNode(Node *);
    Node * getNode();
    const std::list<Node *>& getNodes() const;
    std::list<Node *> m_nodes;
};
class Node
{
public:
    Node(Node *parent);
    void addChild(Node *node);
    Node * getChild(const std::string& nodeName);
private:
    Node * m_parent;
    std::list<Node*> m_childs;
};
The ChainDescriptor class owns all the nodes and is responsible for deleting them.
But these classes now need to be used in another program, a GUI with undo/redo capabilities, which raises the problem of ownership.
Before modifying the existing code in depth, I'm considering the different solutions:
using shared_ptr and respective list<shared_ptr<...> >
using weak_ptr and respective list<weak_ptr<...> >
In the example above, I don't really know where to use shared_ptr and weak_ptr properly.
Any suggestion?

You can use shared_ptr for m_childs and weak_ptr for m_parent.
However, it might still be reasonable to retain the raw pointer to the parent Node and not use any weak pointers at all. The safeguard behind this is the invariant that a non-null parent always exists.
Another option is using shared_ptr in ChainDescriptor only and retaining all raw pointers in Node. This approach avoids weak pointers and has a clean ownership policy (parent nodes own their children).
Weak pointers will help you manage the memory automatically, but the downsides are fuzzy ownership logic and performance penalties.

shared_ptr is an owning smart pointer and weak_ptr is a referencing (non-owning) smart pointer.
So in your situation I think the ChainDescriptor should use shared_ptr (it owns the nodes) and Node should use weak_ptr for m_parent (it only references it) and shared_ptr for m_childs (it owns them).

The usual implementation would be for each node to have a strong reference to each of its children (i.e. it keeps them alive), and for each child to have a weak reference back to its parent.
The reason for this is to avoid circular references. If only strong references were used, then you'd have a situation where the parent refcount never drops to zero (because the child has a reference), and the child refcount never drops to zero (because the parent has a reference).
I think your ChainDescriptor class is okay to use strong references here though.
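The strong-down, weak-up arrangement described above might look roughly like the following. This is only a sketch under assumptions: it reuses the question's Node and m_childs names, while the name member, getParent(), the use of std::enable_shared_from_this, and the helper parentExpiresWithOwner() are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <utility>

// Hypothetical rewrite of the question's Node: children are owned
// through shared_ptr, the parent is only observed through weak_ptr.
class Node : public std::enable_shared_from_this<Node>
{
public:
    explicit Node(std::string name) : m_name(std::move(name)) {}

    // Takes shared ownership of the child and records this node as its parent.
    // Assumption: *this is itself already managed by a shared_ptr.
    void addChild(const std::shared_ptr<Node>& child)
    {
        child->m_parent = shared_from_this();
        m_childs.push_back(child);
    }

    // Returns an empty shared_ptr if the parent is gone (or was never set).
    std::shared_ptr<Node> getParent() const { return m_parent.lock(); }

    const std::string& name() const { return m_name; }

private:
    std::string m_name;
    std::weak_ptr<Node> m_parent;               // non-owning back reference
    std::list<std::shared_ptr<Node>> m_childs;  // owning references
};

// Returns true if the child no longer sees its parent once the parent is released.
inline bool parentExpiresWithOwner()
{
    auto root = std::make_shared<Node>("root");
    auto child = std::make_shared<Node>("child");
    root->addChild(child);
    assert(child->getParent() == root);
    root.reset();  // no reference cycle, so the root is destroyed here
    return child->getParent() == nullptr;
}
```

Because the back reference is weak, releasing the last strong reference to the root destroys it even though a child still exists; with a shared_ptr in m_parent the two nodes would keep each other alive.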

Trying to just replace raw pointers with some sort of smart
pointer will in general not work. Smart pointers have
different semantics than raw pointers, and usually, these
special semantics need to be taken into account at a higher
level. The "cleanest" solution here is to add support for copy
in ChainDescriptor, implementing a deep copy. (I'm supposing
here that you can clone Node, and that all of the Node are
always owned by a ChainDescriptor.) Also, for undo, you may
need a deep copy anyway; you don't want modifications in the
active instance to modify the data saved for an undo.
Having said that, your nodes seem to be used to form a tree. In
this case, std::shared_ptr will work, as long as 1) all Node
are always "owned" by either a ChainDescriptor or a parent
Node, and 2) the structure really is a forest, or at least
a collection of DAG (and, of course, you aren't making changes
in any of the saved instances). If the structure is such that
cycles may occur, then you cannot use shared_ptr at this
level. You might be able to abstract the list of nodes and the
trees into a separate implementation class, and have
ChainDescriptor keep a shared_ptr to this.
(FWIW: I used a reference counted pointer for the nodes in
a parse tree I wrote many years ago, and different instances
could share sub-trees. But I designed it from the
start to use reference counted pointers. And because of how the
tree was constructed, I was guaranteed that there could be no
cycles.)
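The deep copy suggested at the start of this answer could be sketched as follows. It assumes C++11, a simplified Node that exclusively owns its children through std::unique_ptr, and hypothetical members (clone(), rename(), childCount(), parent()) plus a helper snapshotIsIndependent() that exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified Node: each node exclusively owns its children.
class Node
{
public:
    explicit Node(std::string name, Node* parent = nullptr)
        : m_name(std::move(name)), m_parent(parent) {}

    // Creates a child owned by this node and returns a non-owning pointer to it.
    Node* addChild(std::string name)
    {
        m_childs.push_back(std::unique_ptr<Node>(new Node(std::move(name), this)));
        return m_childs.back().get();
    }

    // Deep copy: the clone shares nothing with the original subtree.
    std::unique_ptr<Node> clone(Node* newParent = nullptr) const
    {
        std::unique_ptr<Node> copy(new Node(m_name, newParent));
        for (const auto& child : m_childs)
            copy->m_childs.push_back(child->clone(copy.get()));
        return copy;
    }

    void rename(std::string name) { m_name = std::move(name); }
    const std::string& name() const { return m_name; }
    std::size_t childCount() const { return m_childs.size(); }
    Node* parent() const { return m_parent; }

private:
    std::string m_name;
    Node* m_parent;  // non-owning; a parent outlives its children
    std::vector<std::unique_ptr<Node>> m_childs;
};

// Saves a snapshot, edits the live tree, and checks that the snapshot is unchanged.
inline bool snapshotIsIndependent()
{
    Node live("root");
    live.addChild("a");
    std::unique_ptr<Node> saved = live.clone();  // state kept for undo
    live.rename("edited");
    live.addChild("b");
    return saved->name() == "root" && saved->childCount() == 1;
}
```

An undo stack could then hold such snapshots, at the cost of copying the whole tree for every saved state.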

Related

How to store references to other objects in C++?

This is a more general question that I'm trying to resolve for C++ best practices. Suppose I want to create objects which store references to each other, like a graph. All objects are owned by the same object, like a Graph object to all the Nodes, which is to say the ownership is fixed.
Here's my idea: a class Graph has a std::vector of Nodes, each Node has a std::vector of Nodes representing its list of connections. I'm wondering how best to implement this in terms of smart pointers? To my understanding, ownership is unique so the Graph vector should be std::vector<std::unique_ptr<Node>> nodes and I can populate that as needed. But the connections vector, how can I get each node to store references to its connections? These would only be read-only references, and maybe it would be better to name all the nodes and only store the names, or to store connections in the Graph. But is there a good way of storing references to the connection nodes as if they were const pointers?
Note: this is really about ownership and smart pointers, not about data structures, the graph example is just an example.
When discussing "Best Practices", it's important to consider what your quality-attributes and needs are for the code.
There is no "right" or "wrong" answer in the example of code such as a Graph; there are varying degrees that solve different problems in different ways -- and it depends strongly on the way it's intended to be used.
By far the simplest way to solve such a problem is for the main container (Graph) to have strong ownership through unique_ptr, and for the internal elements (Node) to only view each other through raw pointers, e.g.:
class Graph
{
...
private:
std::vector<std::unique_ptr<Node>> m_nodes;
};
class Node
{
...
private:
std::vector<const Node*> m_connected_nodes;
};
This would work well, since Node cannot mutate its connected nodes, and since Graph assumes that Node will never outlive it.
However, this approach does not work if you ever want Node to outlive Graph, or if you want Node to be used across multiple Graph objects. If it lives between different Graphs, then you may run the risk of a Node referring to a dangling pointer -- and this would be bad.
If this is the case, you might need to consider a different ownership pattern, such as shared_ptr and weak_ptr ownership:
class Graph
{
...
private:
std::vector<std::shared_ptr<Node>> m_nodes;
};
class Node
{
...
private:
std::vector<std::weak_ptr<const Node>> m_connected_nodes;
};
In this case, Nodes only weakly know other Node objects, whereas Graph is the strong owner of them. This prevents the dangling issue, but incurs additional overhead for the shared_ptr's control block, and for having to check whether a node is still alive before accessing it through a weak_ptr.
So the correct answer is: It depends. If you can get away with the former approach, that's probably the cleanest; you always have 1 owner, and thus the logic is simple and easy to follow.
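To make the "check whether it's alive" cost of the second approach concrete, here is a minimal sketch. The id member, liveConnections() and liveAfterRemoval() are hypothetical names, and the members are public only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node
{
    int id;
    std::vector<std::weak_ptr<const Node>> connections;  // non-owning

    // Counts the connections whose target node still exists.
    std::size_t liveConnections() const
    {
        std::size_t count = 0;
        for (const auto& weak : connections)
            if (auto node = weak.lock())  // empty if the target was destroyed
                ++count;
        return count;
    }
};

inline std::size_t liveAfterRemoval()
{
    std::vector<std::shared_ptr<Node>> graph;  // the strong owner
    graph.push_back(std::make_shared<Node>());
    graph.push_back(std::make_shared<Node>());
    graph.push_back(std::make_shared<Node>());
    graph[0]->connections.push_back(graph[1]);
    graph[0]->connections.push_back(graph[2]);
    graph.pop_back();                    // destroys the third node
    return graph[0]->liveConnections();  // only the link to the second node is live
}
```

Every access goes through lock(), which is the price paid for never dereferencing a dangling pointer.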
I'm wondering how best to implement this in terms of smart pointers?
By not using them. Use a vector of nodes for the graph: std::vector<Node>. This is a reasonable default choice until you have a good reason to do otherwise.
But is there a good way of storing references to the connection nodes as if they were const pointers?
Yes. Const pointers are a good way of storing as if they were const pointers. (And by "const pointer", I presume we are actually talking about pointer to const).
A reference wrapper is another choice. Although it has the advantage of not having representation for null, it does have the downside of clumsy syntax.
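As a rough illustration of the reference-wrapper option, the sketch below stores the connections in the Graph (one of the alternatives the question mentions) rather than in each Node. The Graph layout, sumOfNeighbours() and demoNeighbourSum() are assumptions made for this example. Note that references into a std::vector<Node> are invalidated if the vector reallocates, so the nodes are created before any connection is stored.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Node
{
    int value;
};

struct Graph
{
    std::vector<Node> nodes;  // owns the nodes by value
    // connections[i] lists the nodes connected to nodes[i]; never null, read-only
    std::vector<std::vector<std::reference_wrapper<const Node>>> connections;
};

// Sums the values of all nodes connected to nodes[i].
inline int sumOfNeighbours(const Graph& g, std::size_t i)
{
    int sum = 0;
    for (const Node& neighbour : g.connections[i])  // implicit conversion to const Node&
        sum += neighbour.value;
    return sum;
}

inline int demoNeighbourSum()
{
    Graph g;
    g.nodes = {Node{1}, Node{2}, Node{3}};  // size is fixed from here on
    g.connections.resize(g.nodes.size());
    g.connections[0].push_back(g.nodes[1]);
    g.connections[0].push_back(std::cref(g.nodes[2]));
    // Direct member access needs the clumsier .get():
    assert(g.connections[0].front().get().value == 2);
    return sumOfNeighbours(g, 0);  // 2 + 3
}
```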

Getting into smart pointers, how to deal with representing ownership?

i've made a dynamic graph structure where both nodes and arcs are classes (i mean arcs are an actual instance in memory, they are not implied by an adjacency list of nodes to nodes).
Each node has a list of pointers to the arcs it's connected to.
Each arc has 2 pointers to the 2 nodes it's connecting.
Deleting a node calls delete for each of its arcs.
Each arc delete removes its pointer from the arcs lists in the 2 nodes it connects.
Simplified:
~node()
{
while(arcs_list.size())
{
delete arcs_list[arcs_list.size()-1];
}
}
~arc()
{
node_from.remove_arc(this);
node_to.remove_arc(this);
}
If i want to start using smart pointers here, how do i proceed?
Does each arc own 2 nodes, or do 2 nodes share an individual arc's ownership?
I was thinking about a shared_ptr, but shared pointer would only delete the arc when both nodes are deleted. If i delete only one node i would still have to explicitly delete all its arcs if i used shared_ptr. And that totally defeats the point of not using raw pointers in the first place.
Nodes can exist alone; each arc is owned by two nodes and it can only exist as long as these two nodes both exist.
Is there some other kind of smart pointer i should use to handle this?
Or is raw pointer just the plain simple way to go?
Does each arc own 2 nodes, or do 2 nodes share an individual arc's ownership?
You answered this question yourself:
Nodes can exist alone; each arc is owned by two nodes and it can only exist as long as these two nodes both exist.
When object A owns object B, then object A can exist after destroying B, but destroying A implies destroying B. Applied to your case, the two nodes share ownership of the arc.
Is there some other kind of smart pointer i should use to handle this? Or is raw pointer just the plain simple way to go?
Ah, yes. That is the real question. There is no pre-made smart pointer for this situation. However, I would not go with raw pointers in your node and/or arc classes. That would mean those classes would need to implement memory management on top of their primary purpose. (Much better to let each class do one thing well than to try to do many things and fail.) I see a few viable options.
1: Write your own smart pointer
Write a class that can encapsulate the necessary destruction logic. The node and/or arc classes would use your new class instead of standard smart pointers (and instead of raw pointers). Take some time to make sure your design decisions are solid. I'm guessing your new class would want a functional/callable of some sort to tell it how to remove itself from the lists it is in. Or maybe shift some data (like the pointers to the nodes) from the arc class to the new class.
I haven't worked out the details, but this would be a reasonable approach since the situation does not fit any of the standard smart pointers. The key point is to not put this logic directly in your node and arc classes.
2: Flag invalid arcs
If your program can stand not immediately releasing memory, you may be able to take a different approach to resolving an arc deletion. Instead of immediately removing an arc from its nodes' lists, simply flag the arc as no longer valid. When a node needs to access its arcs, it (or better yet, its list) would check each arc it accesses – if the arc is invalid, it can be removed from the list at that time. Once the arc has been removed from both lists, the normal shared_ptr functionality will kick in to delete the arc object.
The usefulness of this approach decreases the less frequently a node iterates over its arcs. So there is a judgement call to be made.
How would an arc be flagged invalid? The naive approach would be to give it a boolean flag. Set the flag to false in the constructors, and to true when the arc should be considered deleted. Effective, but does require a new field. Can this be done without bloating the arc class? Well, presumably, each arc needs pointers to its nodes. Since the arc does not own its nodes, these are probably weak pointers. So one way to define an arc being invalid is to check if either weak pointer is expired(). (Note that the weak pointers could be manually reset() when the arc is being deleted directly, not via a node's deletion. So an expired weak pointer need not mean the associated node is gone, only that the arc no longer points to it.)
In the case where the arc class is sizeable, you might want to discard most of its memory immediately, leaving just a stub behind. You could add a level of indirection to accomplish this. Essentially, the nodes would share a pointer to a unique pointer, and the unique pointer would point to what you currently call your arc class. When the arc is deleted, the unique pointer is reset(), freeing most of the arc's memory. An arc is invalid when this unique pointer is null. (It looks like Davis Herring's answer is another way to get this effect with less memory overhead, if you can accept an object storing a shared_ptr to itself.)
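The weak-pointer variant of option 2 might be sketched like this. It assumes the nodes themselves are held by shared_ptr somewhere (for example by the graph), and the names valid(), invalidate(), liveArcs(), connect() and arcsLeftAfterNodeRemoval() are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node;

struct Arc
{
    std::weak_ptr<Node> from, to;  // an arc does not own its nodes

    // Invalid once either end is gone or has been reset() by hand.
    bool valid() const { return !from.expired() && !to.expired(); }
    void invalidate() { from.reset(); to.reset(); }
};

struct Node
{
    std::vector<std::shared_ptr<Arc>> arcs;  // the two end nodes share the arc

    // Drops invalid arcs lazily, then reports how many remain.
    std::size_t liveArcs()
    {
        arcs.erase(std::remove_if(arcs.begin(), arcs.end(),
                       [](const std::shared_ptr<Arc>& a) { return !a->valid(); }),
                   arcs.end());
        return arcs.size();
    }
};

// Creates an arc between two nodes and registers it with both of them.
inline std::shared_ptr<Arc> connect(const std::shared_ptr<Node>& a,
                                    const std::shared_ptr<Node>& b)
{
    auto arc = std::make_shared<Arc>();
    arc->from = a;
    arc->to = b;
    a->arcs.push_back(arc);
    b->arcs.push_back(arc);
    return arc;
}

inline std::size_t arcsLeftAfterNodeRemoval()
{
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    auto c = std::make_shared<Node>();
    connect(a, b);
    connect(a, c);
    c.reset();             // destroying c invalidates the arc a-c
    return a->liveArcs();  // the stale arc is pruned here; only a-b remains
}
```

The arc between a and c is only freed when a next walks its list, which is the delayed release this option trades for simplicity.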
3: Use Boost.Bimap
If you can use Boost, they have a container that looks like it would solve your problem: Boost.Bimap. But, you ask, didn't I already discount using an adjacency list? Yes, but this Bimap is more than just a way to associate nodes to each other. This container supports having additional information associated with each relation. That is, each relation in the Bimap would represent an arc and it would have an associated object with the arc's information. Seems to fit your situation well, and you would be able to let someone else worry about memory management (always a nice thing, provided you can trust that someone's abilities).
Since nodes can exist alone, they are owned by the graph (which might or might not be a single object), not the arcs (even as shared ownership). The ownership of an arc by its nodes is, as you observed, dual to the usual shared_ptr situation of either owner being sufficient to keep the object alive. You can nonetheless use shared_ptr and weak_ptr here (along with raw, non-owning pointers to the nodes):
struct Node;
struct Arc {
Node *a,*b;
private:
std::shared_ptr<Arc> skyhook{this};
public:
void free() {skyhook.reset();}
};
struct Node {
std::vector<std::weak_ptr<Arc>> arcs;
~Node() {
for(const auto &w : arcs)
if(const auto a=w.lock()) a->free();
}
};
Obviously other Node operations have to check for empty weak pointers and perhaps clean them out periodically.
Note that exception safety (including vs. bad_alloc in constructing the shared_ptr) requires more care in constructing an Arc.

How to properly use shared_ptr in good C++ APIs

I'm currently trying to find out how to properly use the shared_ptr feature of C++11 in C++ APIs. The main area where I need it is in container classes (Like nodes in a scene graph for example which may contain a list of child nodes and a reference to the parent node and stuff like that). Creating copies of the nodes is not an option and using references or pointers is pain in the ass because no one really knows who is responsible for destructing the nodes (And when someone destructs a node which is still referenced by some other node the program will crash).
So I think using shared_ptr may be a good idea here. Let's take a look at the following simplified example (Which demonstrates a child node which must be connected to a parent node):
#include <memory>
#include <iostream>
using namespace std;
class Parent {};
class Child {
private:
shared_ptr<Parent> parent;
public:
Child(const shared_ptr<Parent>& parent) : parent(parent) {}
Parent& getParent() { return *parent.get(); }
};
int main() {
// Create parent
shared_ptr<Parent> parent(new Parent());
// Create child for the parent
Child child(parent);
// Some other code may need to get the parent from the child again like this:
Parent& p = child.getParent();
...
return 0;
}
This API forces the user to use a shared_ptr for creating the actual connection between the child and the parent. But in other methods I want a more simple API, that's why the getParent() method returns a reference to the parent and not the shared_ptr.
My first question is: Is this a correct usage of shared_ptr? Or is there room for improvement?
My second question is: How do I properly react on null-pointers? Because the getParent method returns a reference the user may think it never can return NULL. But that's wrong because it will return NULL when someone passes a shared pointer containing a null-pointer to the constructor. Actually I don't want null pointers. The parent must always be set. How do I properly handle this? By manually checking the shared pointer in the constructor and throwing an exception when it contains NULL? Or is there a better way? Maybe some sort of non-nullable-shared-pointer?
Using shared pointers for the purpose you describe is reasonable and increasingly common in C++11 libraries.
A few points to note:
On an API, taking a shared_ptr as an argument forces the caller to construct a shared_ptr. This is definitely a good move where there is a transfer of ownership of the pointee. In cases where the function merely uses a shared_ptr, it may be acceptable to take a reference to the object or the shared_ptr.
You are using shared_ptr<Parent> to hold a back reference to the parent object whilst using one in the other direction. This will create a retain cycle, resulting in objects that never get deleted. In general, use a shared_ptr when referencing from the top down, and a weak_ptr when referencing up. Watch out in particular for delegate/callback/observer objects - these almost always want a weak_ptr to the callee. You also need to take care around lambdas if they are executing asynchronously. A common pattern is to capture a weak_ptr.
Passing shared pointers by reference rather than value is a stylistic point with arguments for and against. Clearly when passing by reference you are not passing ownership (e.g. increasing the reference count on the object). On the other hand, you are also not taking the overhead either. There is a danger that you under reference objects this way. On a more practical level, with a C++11 compiler and standard library, passing by value should result in a move rather than copy construction and be very nearly free anyway. However, passing by reference makes debugging considerably easier as you won't be repeatedly stepping into shared_ptr's constructor.
Construct your shared_ptr with std::make_shared rather than new() and shared_ptr's constructor
shared_ptr<Parent> parent = std::make_shared<Parent>();
With modern compilers and libraries this can save a call to new().
Both shared_ptr and weak_ptr can contain NULL - just as any other pointer can. You should always get in the habit of checking before dereferencing and probably assert()ing liberally too. For the constructor case, you can always accept NULL pointers and instead throw at the point of use.
You might consider using a typedef for your shared pointer type. One style that is sometimes used is as follows:
typedef std::shared_ptr<Parent> Parent_P;
typedef std::weak_ptr<Parent> Parent_WkP;
typedef std::shared_ptr<Child> Child_P;
typedef std::weak_ptr<Child> Child_WkP;
It's also useful to know that in header files you can forward declare shared_ptr<Type> without having seen a full declaration for Type. This can save a lot of header bloat
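The "capture a weak_ptr" pattern mentioned in the list above could look something like this. It is a sketch only: the notifications counter, makeNotifier() and notifierOutlivesParent() are hypothetical names, and the callback is invoked synchronously here just to show the effect.

```cpp
#include <cassert>
#include <functional>
#include <memory>

struct Parent
{
    int notifications = 0;
};

// Builds a callback that observes the parent without keeping it alive.
inline std::function<bool()> makeNotifier(const std::shared_ptr<Parent>& parent)
{
    std::weak_ptr<Parent> weak = parent;
    return [weak]() -> bool {
        if (auto p = weak.lock()) {  // the parent is still alive
            ++p->notifications;
            return true;
        }
        return false;  // the parent was destroyed; nothing to do
    };
}

inline bool notifierOutlivesParent()
{
    auto parent = std::make_shared<Parent>();
    auto notify = makeNotifier(parent);
    bool deliveredWhileAlive = notify();
    parent.reset();  // the callback does not extend the parent's lifetime
    bool deliveredAfterReset = notify();
    return deliveredWhileAlive && !deliveredAfterReset;
}
```

Had the lambda captured the shared_ptr itself, the Parent would stay alive for as long as the callback object exists.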
The way that you are using shared pointers is correct with 2 caveats.
Your tree of parents and children must genuinely share the lifetime of the pointers with other objects. If your Parent/Child tree will be the sole user of the pointer, please use a unique_ptr. If another object controls the lifetime of the pointer and you only want to reference it, you may be better off using a weak_ptr; if that lifetime is guaranteed to exceed that of your Parent/Child tree, a raw pointer may be suitable. Please remember that with shared_ptr you can get circular references, so it is not a silver bullet.
As for how to control NULL pointers: well, this all comes down to the contract implicit in your API. If the user is not allowed to supply a null pointer, you just need to document this fact. The best way to do this is to include an assert that the pointer is not null. This will crash your application in debug mode (if the pointer is null) but will not incur a runtime penalty on your release binary. If, however, a null pointer is an allowed input for some reason, then you need to provide correct error handling in the case of a null pointer.
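For comparison with the assert-based contract, the exception variant the question asks about might be sketched as follows. It keeps the question's shared_ptr<Parent> member for simplicity; the m_parent name, the choice of std::invalid_argument and the helper rejectsNullParent() are assumptions.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

class Parent {};

class Child
{
public:
    // Rejects a null parent up front, so getParent() can safely return a reference.
    explicit Child(const std::shared_ptr<Parent>& parent) : m_parent(parent)
    {
        if (!m_parent)
            throw std::invalid_argument("Child requires a non-null parent");
    }

    Parent& getParent() const { return *m_parent; }

private:
    std::shared_ptr<Parent> m_parent;
};

// Returns true if constructing a Child from an empty shared_ptr throws.
inline bool rejectsNullParent()
{
    try {
        Child child(std::shared_ptr<Parent>{});
    } catch (const std::invalid_argument&) {
        return true;
    }
    return false;
}
```

Replacing the throw with assert(m_parent) gives the debug-only check described above; which one fits depends on whether a null parent is a programming error or a recoverable input error.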
Children do not own their parents. Rather, it's the other way around. If children need to be able to get their parents, then use a non-owning pointer or reference. Use shared (or better, unique if you can) pointer for parent to child.

Implementing a list with unique_ptr<>?

As I understand it, a unique_ptr signifies exclusive ownership. A singly linked list seems to fit this, with each node owning the next, like (pseudocode alert)
class node{
public:
unique_ptr<node> next;
int value;
};
but I don't understand how to perform operations like traversing the list, where I'm used to doing
here=here->next;
How do you implement data structures using unique_ptr's? Are they the right tool for the job?
When you go through the nodes, you don't need to own the node pointer, which means that
here=here->next;
Is incorrect if here is a unique_ptr.
Owning an object means "being responsible for its life and death", which means the owner is the one who has the code that will destroy the object. If you use another definition of owning, then it's not what unique_ptr means.
In your list node code, you assume that each node is responsible for the next node (if you destroy a node, all the following nodes will be destroyed too). It can be valid behaviour, depending on your needs; just be sure it's what you really want.
What you want is to read the pointer without owning it. Current good practice to do this is to use a raw pointer indicating a "use but don't own" kind of usage to other developers looking at this code (unique_ptr means "if I die, the pointed object dies too"):
node* here = nullptr; // it will not own the pointed nodes (don't call delete with this pointer)
here = &first_node(); // assuming first_node() returns a reference to the first node
here = here->next.get(); // to get the next node without owning it: use get() - true in all smart pointers interface
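Put together, a traversal over a unique_ptr-linked list might look like the sketch below. It assumes C++11; push_front(), sum() and demoListSum() are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct node
{
    int value;
    std::unique_ptr<node> next;  // each node owns its successor
};

// Prepends a value; the old head becomes the new node's successor.
inline void push_front(std::unique_ptr<node>& head, int value)
{
    std::unique_ptr<node> fresh(new node{value, std::move(head)});
    head = std::move(fresh);
}

// Walks the list through a non-owning raw pointer.
inline int sum(const std::unique_ptr<node>& head)
{
    int total = 0;
    for (const node* here = head.get(); here != nullptr; here = here->next.get())
        total += here->value;
    return total;
}

inline int demoListSum()
{
    std::unique_ptr<node> head;
    push_front(head, 3);
    push_front(head, 2);
    push_front(head, 1);
    return sum(head);  // 1 + 2 + 3
}
```

Ownership only changes hands in push_front(), via std::move; the traversal never touches it.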

Are data structures an appropriate place for shared_ptr?

I'm in the process of implementing a binary tree in C++. Traditionally, I'd have a pointer to left and a pointer to right, but manual memory management typically ends in tears. Which leads me to my question...
Are data structures an appropriate place to use shared_ptr?
I think it depends on where you'd be using them. I'm assuming that what you're thinking of doing is something like this:
template <class T>
class BinaryTreeNode
{
//public interface ignored for this example
private:
shared_ptr<BinaryTreeNode<T> > left;
shared_ptr<BinaryTreeNode<T> > right;
T data;
};
This would make perfect sense if you're expecting your data structure to handle dynamically created nodes. However, since that's not the normal design, I think it's inappropriate.
My answer would be that no, it's not an appropriate place to use shared_ptr, as the use of shared_ptr implies that the object is actually shared - however, a node in a binary tree is not ever shared. However, as Martin York pointed out, why reinvent the wheel - there's already a smart pointer type that does what we're trying to do - auto_ptr. So go with something like this:
template <class T>
class BinaryTreeNode
{
//public interface ignored for this example
private:
auto_ptr<BinaryTreeNode<T> > left;
auto_ptr<BinaryTreeNode<T> > right;
T data;
};
If anyone asks why data isn't a shared_ptr, the answer is simple - if copies of the data are good for the client of the library, they pass in the data item, and the tree node makes a copy. If the client decides that copies are a bad idea, then the client code can pass in a shared_ptr, which the tree node can safely copy.
Because left and right are not shared boost::shared_ptr<> is probably not the correct smart pointer.
This would be a good place to try std::auto_ptr<>
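Note that std::auto_ptr was deprecated in C++11 and removed in C++17; std::unique_ptr expresses the same single-owner intent. A minimal sketch of the node with unique_ptr, assuming C++11 or later and with hypothetical insert(), size() and demoTreeSize() added only so the example can be exercised:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

template <class T>
class BinaryTreeNode
{
public:
    explicit BinaryTreeNode(T value) : data(std::move(value)) {}

    // Inserts as in a binary search tree; duplicates go to the right.
    void insert(T value)
    {
        std::unique_ptr<BinaryTreeNode>& slot = (value < data) ? left : right;
        if (slot)
            slot->insert(std::move(value));
        else
            slot.reset(new BinaryTreeNode(std::move(value)));
    }

    // Number of nodes in the subtree rooted at this node.
    std::size_t size() const
    {
        return 1 + (left ? left->size() : 0) + (right ? right->size() : 0);
    }

private:
    std::unique_ptr<BinaryTreeNode> left;   // sole owner of the left subtree
    std::unique_ptr<BinaryTreeNode> right;  // sole owner of the right subtree
    T data;
};

inline std::size_t demoTreeSize()
{
    BinaryTreeNode<int> root(5);
    root.insert(3);
    root.insert(8);
    root.insert(1);
    return root.size();  // root plus three inserted nodes
}
```

Destroying the root releases the whole tree, with no reference counting involved.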
Yes, absolutely.
But be careful if you have a circular data structure. If you have two objects, both with a shared ptr to each other, then they will never be freed without manually clearing the shared ptr. The weak ptr can be used in this case. This, of course, isn't a worry with a binary tree.
Writing memory management manually is not so difficult on those happy occasions where each object has a single owner, which can therefore delete what it owns in its destructor.
Given that a tree by definition consists of nodes which each have a single parent, and therefore an obvious candidate for their single owner, this is just such a happy occasion. Congratulations!
I think it would be well worth* developing such a solution in your case, AND also trying the shared_ptr approach, hiding the differences entirely behind an identical interface, so you switch between the two and compare the difference in performance with some realistic experiments. That's the only sure way to know whether shared_ptr is suitable for your application.
(* for us, if you tell us how it goes.)
Never use shared_ptr for the nodes of a data structure. It can cause the destruction of the node to be suspended or delayed if at any point the ownership was shared. This can cause destructors to be called in the wrong sequence.
It is a good practice in data structures for the constructors of nodes to contain any code that couples with other nodes and the destructors to contain code that de-couples from other nodes. Destructors called in the wrong sequence can break this design.
There is a bit of extra overhead with a shared_ptr, notably in space requirements, but if your elements are individually allocated then shared_ptr would be perfect.
Do you even need pointers? It seems you could use boost::optional<BinaryTreeNode<T> > left, right.