How to store references to other objects in C++? - c++

This is a more general question that I'm trying to resolve for C++ best practices. Suppose I want to create objects which store references to each other, like a graph. All objects are owned by the same object, like a Graph object to all the Nodes, which is to say the ownership is fixed.
Here's my idea: a class Graph has a std::vector of Nodes, each Node has a std::vector of Nodes representing its list of connections. I'm wondering how best to implement this in terms of smart pointers? To my understanding, ownership is unique so the Graph vector should be std::vector<std::unique_ptr<Node>> nodes and I can populate that as needed. But the connections vector, how can I get each node to store references to its connections? These would only be read-only references, and maybe it would be better to name all the nodes and only store the names, or to store connections in the Graph. But is there a good way of storing references to the connection nodes as if they were const pointers?
Note: this is really about ownership and smart pointers, not about data structures, the graph example is just an example.

When discussing "Best Practices", it's important to consider what your quality-attributes and needs are for the code.
There is no "right" or "wrong" answer in the example of code such as a Graph; there are varying degrees that solve different problems in different ways -- and it depends strongly on the way it's intended to be used.
By far the simplest way to solve such a problem is for the main container (Graph) to hold strong ownership of the nodes with unique_ptr, and for the internal elements (Node) to only view the other nodes through raw pointers, e.g.:
class Graph
{
...
private:
std::vector<std::unique_ptr<Node>> m_nodes;
};
class Node
{
...
private:
std::vector<const Node*> m_connected_nodes;
};
This would work well, since Node cannot mutate its connected nodes, and since Graph assumes that Node will never outlive it.
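To make the first approach concrete, here is a minimal sketch of how it could be filled in. The names add_node, connect, connections and demo_unique_graph are made up for illustration and are not part of the snippet above; it also assumes C++14 for std::make_unique.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical Node: a name plus non-owning, read-only links to other nodes.
class Node {
public:
    explicit Node(std::string name) : m_name(std::move(name)) {}
    const std::string& name() const { return m_name; }
    void connect(const Node& other) { m_connected_nodes.push_back(&other); }
    const std::vector<const Node*>& connections() const { return m_connected_nodes; }
private:
    std::string m_name;
    std::vector<const Node*> m_connected_nodes;  // observers only, never deleted here
};

// Hypothetical Graph: the sole owner of every Node.
class Graph {
public:
    Node& add_node(std::string name) {
        m_nodes.push_back(std::make_unique<Node>(std::move(name)));
        return *m_nodes.back();
    }
    std::size_t size() const { return m_nodes.size(); }
private:
    std::vector<std::unique_ptr<Node>> m_nodes;
};

// Builds a two-node graph and returns the name reached through the link a -> b.
inline std::string demo_unique_graph() {
    Graph g;
    Node& a = g.add_node("a");
    Node& b = g.add_node("b");
    a.connect(b);
    return a.connections().front()->name();
}
```

Because every Node is allocated separately, the raw pointers stay valid when m_nodes grows; they only become invalid when the Graph (or the specific Node) is destroyed.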
However, this approach does not work if you ever want Node to outlive Graph, or if you want Node to be used across multiple Graph objects. If it lives between different Graphs, then you may run the risk of a Node referring to a dangling pointer -- and this would be bad.
If this is the case, you might need to consider a different ownership pattern, such as shared_ptr and weak_ptr ownership:
class Graph
{
...
private:
std::vector<std::shared_ptr<Node>> m_nodes;
};
class Node
{
...
private:
std::vector<std::weak_ptr<const Node>> m_connected_nodes;
};
In this case, Nodes only weakly know other Node objects, whereas Graph is the strong owner of them. This prevents the dangling issue, but incurs additional overhead for the shared_ptr's control block, and for having to check whether a node is still alive before accessing it through a weak_ptr.
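A rough sketch of what the "check before use" step looks like, assuming the connections are stored as std::weak_ptr<const Node>. The members connect and live_connections, and the function demo_weak_links, are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

class Node {
public:
    void connect(const std::shared_ptr<const Node>& other) {
        m_connected_nodes.push_back(other);  // stores a weak reference only
    }
    // Counts the connections whose target is still alive.
    std::size_t live_connections() const {
        std::size_t n = 0;
        for (const auto& w : m_connected_nodes)
            if (auto p = w.lock()) ++n;      // lock() yields an empty pointer if the node is gone
        return n;
    }
private:
    std::vector<std::weak_ptr<const Node>> m_connected_nodes;
};

// Connects a to a short-lived b and reports how many links survive b.
inline std::size_t demo_weak_links() {
    auto a = std::make_shared<Node>();
    {
        auto b = std::make_shared<Node>();
        a->connect(b);
        assert(a->live_connections() == 1);
    }   // b is destroyed here; a's link expires instead of dangling
    return a->live_connections();
}
```

The expired entry is still sitting in the vector afterwards, so real code would probably also want to prune such entries from time to time.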
So the correct answer is: It depends. If you can get away with the former approach, that's probably the cleanest; you always have 1 owner, and thus the logic is simple and easy to follow.

I'm wondering how best to implement this in terms of smart pointers?
By not using them. Use a vector of nodes for the graph: std::vector<Node>. This is a reasonable default choice until you have a good reason to do otherwise.
But is there a good way of storing references to the connection nodes as if they were const pointers?
Yes. Const pointers are a good way of storing as if they were const pointers. (And by "const pointer", I presume we are actually talking about pointer to const).
A reference wrapper is another choice. Although it has the advantage of not having representation for null, it does have the downside of clumsy syntax.
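For comparison, a small sketch of both options side by side, assuming a Node with an int id (my own placeholder, as are links, ptr_links and demo_reference_wrapper). One caveat worth stating: pointers and references into a std::vector<Node> are invalidated when the vector reallocates, so this sketch sizes the vector before creating any links.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Node {
    int id = 0;
    std::vector<std::reference_wrapper<const Node>> links;  // cannot be null
    std::vector<const Node*> ptr_links;                      // may be null
};

// Links node 0 to node 1 both ways and reads the target id through each link.
inline int demo_reference_wrapper() {
    std::vector<Node> nodes(2);   // sized up front: growing it later would invalidate the links
    nodes[0].id = 1;
    nodes[1].id = 2;
    nodes[0].links.push_back(std::cref(nodes[1]));
    nodes[0].ptr_links.push_back(&nodes[1]);
    // Member access through a reference_wrapper needs .get(), which is the clumsy part.
    return nodes[0].links.front().get().id + nodes[0].ptr_links.front()->id;
}
```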

Related

Getting into smart pointers, how to deal with representing ownership?

i've made a dynamic graph structure where both nodes and arcs are classes (i mean arcs are an actual instance in memory, they are not implied by an adjacency list of nodes to nodes).
Each node has a list of pointers to the arcs it's connected to.
Each arc has 2 pointers to the 2 nodes it's connecting.
Deleting a node calls delete for each of its arcs.
Each arc delete removes its pointer from the arcs lists in the 2 nodes it connects.
Simplified:
~node()
{
while(arcs_list.size())
{
delete arcs_list[arcs_list.size()-1];
}
}
~arc()
{
node_from.remove_arc(this);
node_to.remove_arc(this);
}
If i want to start using smart pointers here, how do i proceed?
Does each arc own 2 nodes, or do 2 nodes share an individual arc's ownership?
I was thinking about a shared_ptr, but shared pointer would only delete the arc when both nodes are deleted. If i delete only one node i would still have to explicitly delete all its arcs if i used shared_ptr. And that totally defeats the point of not using raw pointers in the first place.
Nodes can exist alone; each arc is owned by two nodes and it can only exist as long as these two nodes both exist.
Is there some other kind of smart pointer i should use to handle this?
Or is raw pointer just the plain simple way to go?
Does each arc own 2 nodes, or do 2 nodes share an individual arc's ownership?
You answered this question yourself:
Nodes can exist alone; each arc is owned by two nodes and it can only exist as long as these two nodes both exist.
When object A owns object B, then object A can exist after destroying B, but destroying A implies destroying B. Applied to your case, the two nodes share ownership of the arc.
Is there some other kind of smart pointer i should use to handle this? Or is raw pointer just the plain simple way to go?
Ah, yes. That is the real question. There is no pre-made smart pointer for this situation. However, I would not go with raw pointers in your node and/or arc classes. That would mean those classes would need to implement memory management on top of their primary purpose. (Much better to let each class do one thing well than to try to do many things and fail.) I see a few viable options.
1: Write your own smart pointer
Write a class that can encapsulate the necessary destruction logic. The node and/or arc classes would use your new class instead of standard smart pointers (and instead of raw pointers). Take some time to make sure your design decisions are solid. I'm guessing your new class would want a functional/callable of some sort to tell it how to remove itself from the lists it is in. Or maybe shift some data (like the pointers to the nodes) from the arc class to the new class.
I haven't worked out the details, but this would be a reasonable approach since the situation does not fit any of the standard smart pointers. The key point is to not put this logic directly in your node and arc classes.
2: Flag invalid arcs
If your program can stand not immediately releasing memory, you may be able to take a different approach to resolving an arc deletion. Instead of immediately removing an arc from its nodes' lists, simply flag the arc as no longer valid. When a node needs to access its arcs, it (or better yet, its list) would check each arc it accesses – if the arc is invalid, it can be removed from the list at that time. Once the arc has been removed from both lists, the normal shared_ptr functionality will kick in to delete the arc object.
The usefulness of this approach decreases the less frequently a node iterates over its arcs. So there is a judgement call to be made.
How would an arc be flagged invalid? The naive approach would be to give it a boolean flag. Set the flag to false in the constructors, and to true when the arc should be considered deleted. Effective, but does require a new field. Can this be done without bloating the arc class? Well, presumably, each arc needs pointers to its nodes. Since the arc does not own its nodes, these are probably weak pointers. So one way to define an arc being invalid is to check if either weak pointer is expired(). (Note that the weak pointers could be manually reset() when the arc is being deleted directly, not via a node's deletion. So an expired weak pointer need not mean the associated node is gone, only that the arc no longer points to it.)
In the case where the arc class is sizeable, you might want to discard most of its memory immediately, leaving just a stub behind. You could add a level of indirection to accomplish this. Essentially, the nodes would share a pointer to a unique pointer, and the unique pointer would point to what you currently call your arc class. When the arc is deleted, the unique pointer is reset(), freeing most of the arc's memory. An arc is invalid when this unique pointer is null. (It looks like Davis Herring's answer is another way to get this effect with less memory overhead, if you can accept an object storing a shared_ptr to itself.)
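Here is a minimal sketch of option 2 using the "expired weak pointer" definition of invalid. It assumes the nodes themselves are held in shared_ptrs somewhere (otherwise an arc could not hold weak pointers to them), and the names valid, invalidate, live_arcs, link and demo_lazy_arcs are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node;

struct Arc {
    std::weak_ptr<Node> from, to;   // non-owning links to the two end nodes
    // An arc counts as invalid once either end is gone or has been reset.
    bool valid() const { return !from.expired() && !to.expired(); }
    void invalidate() { from.reset(); to.reset(); }   // direct deletion of the arc
};

struct Node {
    std::vector<std::shared_ptr<Arc>> arcs;   // ownership shared with the other end node
    // Lazily drops invalid arcs, then reports how many remain.
    std::size_t live_arcs() {
        arcs.erase(std::remove_if(arcs.begin(), arcs.end(),
                       [](const std::shared_ptr<Arc>& a) { return !a->valid(); }),
                   arcs.end());
        return arcs.size();
    }
};

inline std::shared_ptr<Arc> link(const std::shared_ptr<Node>& a,
                                 const std::shared_ptr<Node>& b) {
    auto arc = std::make_shared<Arc>();
    arc->from = a;
    arc->to = b;
    a->arcs.push_back(arc);
    b->arcs.push_back(arc);
    return arc;
}

inline std::size_t demo_lazy_arcs() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    std::weak_ptr<Arc> watch = link(a, b);
    b.reset();                               // destroying b releases its share of the arc
    assert(!watch.expired());                // a still holds the (now invalid) arc
    std::size_t remaining = a->live_arcs();  // pruning drops the last owner
    assert(watch.expired());                 // so the arc is finally freed
    return remaining;
}
```

The arc's memory is only released when the surviving node next walks its list, which is exactly the trade-off described above.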
3: Use Boost.Bimap
If you can use Boost, they have a container that looks like it would solve your problem: Boost.Bimap. But, you ask, didn't I already discount using an adjacency list? Yes, but this Bimap is more than just a way to associate nodes to each other. This container supports having additional information associated with each relation. That is, each relation in the Bimap would represent an arc and it would have an associated object with the arc's information. Seems to fit your situation well, and you would be able to let someone else worry about memory management (always a nice thing, provided you can trust that someone's abilities).
Since nodes can exist alone, they are owned by the graph (which might or might not be a single object), not the arcs (even as shared ownership). The ownership of an arc by its nodes is, as you observed, dual to the usual shared_ptr situation of either owner being sufficient to keep the object alive. You can nonetheless use shared_ptr and weak_ptr here (along with raw, non-owning pointers to the nodes):
struct Node;
struct Arc {
Node *a,*b;
private:
std::shared_ptr<Arc> skyhook{this};
public:
void free() {skyhook.reset();}
};
struct Node {
std::vector<std::weak_ptr<Arc>> arcs;
~Node() {
for(const auto &w : arcs)
if(const auto a=w.lock()) a->free();
}
};
Obviously other Node operations have to check for empty weak pointers and perhaps clean them out periodically.
Note that exception safety (including vs. bad_alloc in constructing the shared_ptr) requires more care in constructing an Arc.

Modern C++ Object Relationships

I have a graph implemented using a struct Node and a struct Edge where:
Each Edge has a start and an end Node
Each Node maintains a list of Edge objects which start from or end at it
The following is one possible implementation:
struct Node;
struct Edge {
Node *st;
Node *en;
int some_data;
};
const int MAX_EDGES = 100;
struct Node {
Edge *edges[MAX_EDGES];
int some_data;
};
While the above structs can represent the graph I have in mind, I would like to do it the "Modern C++" way while satisfying the following requirements:
1. Avoid pointers
2. Use an std::vector for Node::edges
3. Be able to store Node and Edge objects in standard C++ containers
How is this done in Modern C++? Can all of 1-3 be achieved?
Avoid pointers
You can use std::shared_ptr and std::weak_ptr for this. Just decide whether you want nodes to own edges, or vice versa. The non-owning type should use weak_ptr (to avoid cycles).
Unless your graph is acyclic you might still need to be careful about ownership cycles.
std::unique_ptr is not an option, because there is not a one-to-one relationship between nodes and edges, so there cannot be a unique owner of any given object.
Use an std::vector for Node::edges
No problem. Make it a std::vector<std::weak_ptr<Edge>> or std::vector<std::shared_ptr<Edge>> (depending whether edges own nodes or vice versa)
Be able to store Node and Edge objects in standard C++ containers
No problem, just ensure your type can be safely moved/copied without leaking or corrupting memory, i.e. has correct copy/move constructors and assignment operators. That will happen automatically if you use smart pointers and std::vector as suggested above.
Modern C++ eschews the assignment of dynamic memory to a raw pointer. This is because it is all too easy to forget to delete said pointer. Having said that, there is nothing wrong with the use of raw pointers as references to an object, provided you can guarantee that the object's lifetime will be greater than the use of said pointer.
The rules generally are:
1. Use std::unique_ptr if an object has a single owner.
2. Use raw pointers to reference objects created in 1, provided you can guarantee that the object's lifetime will be greater than the use of your reference.
3. Use std::shared_ptr for reference-counted objects.
4. Use std::weak_ptr to refer to a reference-counted object when you do not want to increase the reference count.
So in your case, if the Edge owns the Nodes then use std::unique_ptr; if not, then keep the raw pointers.
In your Node class, if the Node owns the Edges use a std::vector<Edge>, otherwise use a std::vector<Edge*>, although it might be more efficient to link your Edges together in their own intrusive linked list.
Having done some work on complex graphs, I find it might be better to allocate all your Nodes and Edges in a vector outside your graph and then only refer to them internally using raw pointers inside the graph. Remember that memory allocation is slow, so the less of it you do, the faster your algorithm will be.
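A sketch of that pooled layout, under the assumption that the node count and an upper bound on the edge count are known up front. GraphStorage, connect and demo_pooled_graph are hypothetical names; the important detail is that the vectors must not reallocate once pointers into them exist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node;

struct Edge {
    Node* st;
    Node* en;
    int some_data;
};

struct Node {
    std::vector<Edge*> edges;   // non-owning; the pools below own everything
    int some_data = 0;
};

// Hypothetical pool: owns all nodes and edges in two contiguous vectors.
struct GraphStorage {
    std::vector<Node> nodes;
    std::vector<Edge> edges;

    GraphStorage(std::size_t node_count, std::size_t max_edges) : nodes(node_count) {
        edges.reserve(max_edges);   // no reallocation later, so Edge* stay valid
    }
    // A copy would still point into the original, so copying is disabled in this sketch.
    GraphStorage(const GraphStorage&) = delete;
    GraphStorage& operator=(const GraphStorage&) = delete;

    Edge& connect(std::size_t from, std::size_t to, int data) {
        assert(edges.size() < edges.capacity());   // growing would invalidate pointers
        edges.push_back(Edge{&nodes[from], &nodes[to], data});
        Edge& e = edges.back();
        nodes[from].edges.push_back(&e);
        nodes[to].edges.push_back(&e);
        return e;
    }
};

// Sums the data of every edge touching node 1 in a small chain 0 - 1 - 2.
inline int demo_pooled_graph() {
    GraphStorage g(3, 4);
    g.connect(0, 1, 10);
    g.connect(1, 2, 20);
    int sum = 0;
    for (const Edge* e : g.nodes[1].edges) sum += e->some_data;
    return sum;
}
```

If the sizes are not known in advance, storing indices into the vectors instead of raw pointers avoids the invalidation problem at the cost of an extra lookup.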
By using std::shared_ptr or std::unique_ptr
I don't think vector is the right choice here, since a graph usually is not linear (generally speaking; also, in most cases you can't linearize it like you can with a heap).
There is no standard 'general-use' container, but you can use templates here for genericity.
for example, your Element class can look like this:
template <class T>
struct Elem {
std::shared_ptr<Node> st , en;
T some_data;
};
Speaking of modern C++, I don't think struct is encouraged here; you should encapsulate your data.

C++: Replace raw pointers with shared and weak ptr

I'm facing a design issue in my program.
I have to manage Nodes object which are part of a root ChainDescriptor.
Basically it looks like the following:
class ChainDescriptor
{
public:
~ChainDescriptor()
{
//delete the nodes in nodes...
}
void addNode(Node *);
Node * getNode();
const std::list<Node *>& getNodes() const;
std::list<Node *> m_nodes;
};
class Node
{
public:
Node(Node *parent);
void addChild(Node *node);
Node * getChild(const std::string& nodeName);
private:
Node * m_parent;
std::list<Node*> m_childs;
};
The ChainDescriptor class owns all the nodes and is responsible for deleting them.
But these classes now need to be used in another program, a GUI with undo/redo capabilities, which raises the problem of "ownership".
Before modifying the existing code in depth, I'm considering the different solutions:
using shared_ptr and respective list<shared_ptr<...> >
using weak_ptr and respective list<weak_ptr<...> >
In the example above, I don't really know where to use shared_ptr and weak_ptr properly.
Any suggestion?
You can use shared_ptr for m_childs and weak_ptr for m_parent.
However, it might still be reasonable to retain the raw pointer to the parent Node and not use any weak pointers at all. The safeguard behind this is the invariant that a non-null parent always exists.
Another option is using shared_ptr in ChainDescriptor only and retaining all raw pointers in Node. This approach avoids weak pointers and has a clean ownership policy (parent nodes own their children).
Weak pointers will help you manage the memory automatically, but the downsides are fuzzy ownership logic and performance penalties.
shared_ptr is an owning smart pointer and weak_ptr is a referencing smart pointer.
So in your situation I think the ChainDescriptor should use shared_ptr (it owns the nodes) and Node should use weak_ptr for m_parent (it only references it) and shared_ptr for m_childs (it owns them).
The usual implementation would be for each node to have strong reference to its child (i.e. keeps them alive), and each child to have a weak reference back to the parent.
The reason for this is to avoid circular references. If only strong references were used, then you'd have a situation where the parent refcount never drops to zero (because the child has a reference), and the child refcount never drops to zero (because the parent has a reference).
I think your ChainDescriptor class is okay to use strong references here though.
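A minimal sketch of the strong-child / weak-parent arrangement. Deriving from std::enable_shared_from_this is my own assumption, made so that addChild can set the back-reference itself; parent(), name() and demo_parent_child are likewise illustrative names, not part of the question's interface.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <utility>

class Node : public std::enable_shared_from_this<Node> {
public:
    explicit Node(std::string name) : m_name(std::move(name)) {}
    // Requires that *this is already owned by a shared_ptr.
    void addChild(const std::shared_ptr<Node>& child) {
        child->m_parent = shared_from_this();   // weak back-reference, so no ownership cycle
        m_childs.push_back(child);              // strong reference keeps the child alive
    }
    std::shared_ptr<Node> parent() const { return m_parent.lock(); }
    const std::string& name() const { return m_name; }
private:
    std::string m_name;
    std::weak_ptr<Node> m_parent;
    std::list<std::shared_ptr<Node>> m_childs;
};

// Checks that the child sees its parent, and that the link expires once the parent is gone.
inline bool demo_parent_child() {
    auto root = std::make_shared<Node>("root");
    auto child = std::make_shared<Node>("child");
    root->addChild(child);
    bool linked = child->parent() && child->parent()->name() == "root";
    root.reset();   // nothing else owns root, so it is destroyed despite the back-reference
    return linked && child->parent() == nullptr;
}
```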
Trying to just replace raw pointers with some sort of smart
pointer will in general not work. Smart pointers have
different semantics than raw pointers, and usually, these
special semantics need to be taken into account at a higher
level. The "cleanest" solution here is to add support for copy
in ChainDescriptor, implementing a deep copy. (I'm supposing
here that you can clone Node, and that all of the Node are
always owned by a ChainDescriptor.) Also, for undo, you may
need a deep copy anyway; you don't want modifications in the
active instance to modify the data saved for an undo.
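As a rough illustration of the deep-copy idea, here is a sketch in which each Node uniquely owns its children and clone() copies the whole subtree. It leaves ChainDescriptor out entirely, and the members clone, rename, child and parent are hypothetical additions to the Node interface shown in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Node {
public:
    explicit Node(std::string name) : m_name(std::move(name)) {}
    Node& addChild(std::string name) {
        m_childs.push_back(std::make_unique<Node>(std::move(name)));
        m_childs.back()->m_parent = this;
        return *m_childs.back();
    }
    // Recursively copies this node and its subtree; parent links point into the copy.
    std::unique_ptr<Node> clone() const {
        auto copy = std::make_unique<Node>(m_name);
        for (const auto& c : m_childs) {
            copy->m_childs.push_back(c->clone());
            copy->m_childs.back()->m_parent = copy.get();
        }
        return copy;
    }
    void rename(std::string name) { m_name = std::move(name); }
    const std::string& name() const { return m_name; }
    const Node* parent() const { return m_parent; }
    Node& child(std::size_t i) { return *m_childs[i]; }
    const Node& child(std::size_t i) const { return *m_childs[i]; }
private:
    std::string m_name;
    Node* m_parent = nullptr;                      // non-owning back pointer
    std::vector<std::unique_ptr<Node>> m_childs;   // each child has exactly one owner
};

// Takes a snapshot, edits the live tree, and checks the snapshot is unaffected.
inline bool demo_undo_snapshot() {
    Node root("root");
    root.addChild("a");
    std::unique_ptr<Node> snapshot = root.clone();   // saved for undo
    root.child(0).rename("changed");                  // edit the active instance
    return snapshot->child(0).name() == "a"
        && snapshot->child(0).parent() == snapshot.get();
}
```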
Having said that, your nodes seem to be used to form a tree. In
this case, std::shared_ptr will work, as long as 1) all Node
are always "owned" by either a ChainDescriptor or a parent
Node, and 2) the structure really is a forest, or at least
a collection of DAG (and, of course, you aren't making changes
in any of the saved instances). If the structure is such that
cycles may occur, then you cannot use shared_ptr at this
level. You might be able to abstract the list of nodes and the
trees into a separate implementation class, and have
ChainDescriptor keep a shared_ptr to this.
(FWIW: I used a reference counted pointer for the nodes in
a parse tree I wrote many years ago, and different instances
could share sub-trees. But I designed it from the
start to use reference counted pointers. And because of how the
tree was constructed, I was guaranteed that there could be no
cycles.)

C++ Design: Passing pointer/reference to ref-counted object

I have a directed acyclic graph, composed of Node objects. Each node has a list of std::shared_ptrs to other nodes, which are its children in the graph. I have lots of useful methods I need, such as inserting/emplacing/reparenting nodes, testing if a node is an ancestor of another, etc. Some are standard STL-like methods, and some are specific to directed acyclic graphs and specific to my needs.
The question is: when such a method takes a node as a parameter, should it take a reference? Or a weak_ptr? Or a shared_ptr? I tried to examine use cases but it's hard to tell. What's the best design here? I'm new to smart pointers and I'm not sure what's the best choice. Should I treat shared_ptr<Node> as "the representation" of node objects? Or maybe the way to choose is more sophisticated?
Thanks in advance
Only pass a shared_ptr (by value) or copy it when the set of owners is meaningfully extended. It's safe, and preferred, to pass pointers when dealing with nodes as pure information.
Note the std::enable_shared_from_this facility to retrieve the correct std::shared_ptr from any graph object. With that base class, a valid naked pointer and a shared pointer are essentially equivalent. I'm not sure how much, if any, overhead it adds. (It definitely ensures that there will be no additional heap fragmentation, which std::make_shared also does.)
Passing shared_ptr anywhere is just an optimization of functionality elegantly provided by shared_from_this. But when you do, pass them by const reference, since they are just providing information without actively arbitrating ownership.
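A small sketch of how those guidelines might translate into signatures. The functions is_ancestor, add_child, adopt and demo_parameter_passing are hypothetical, and the Node layout is assumed from the question (a list of shared_ptrs to children).

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Node : std::enable_shared_from_this<Node> {
    std::vector<std::shared_ptr<Node>> children;
};

// Pure query: no ownership is involved, so plain const references are enough.
inline bool is_ancestor(const Node& ancestor, const Node& target) {
    for (const auto& c : ancestor.children)
        if (c.get() == &target || is_ancestor(*c, target)) return true;
    return false;
}

// Extends the set of owners (parent becomes an owner of child),
// so the shared_ptr is taken by value and moved into place.
inline void add_child(Node& parent, std::shared_ptr<Node> child) {
    parent.children.push_back(std::move(child));
}

// Recovers an owning pointer from a plain reference via shared_from_this.
// Only valid if child is already managed by a shared_ptr.
inline void adopt(Node& parent, Node& child) {
    add_child(parent, child.shared_from_this());
}

inline bool demo_parameter_passing() {
    auto root = std::make_shared<Node>();
    auto mid = std::make_shared<Node>();
    auto leaf = std::make_shared<Node>();
    add_child(*root, mid);
    adopt(*mid, *leaf);
    return is_ancestor(*root, *leaf) && !is_ancestor(*leaf, *root)
        && leaf.use_count() == 2;   // owned by the local variable and by mid
}
```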

Are data structures an appropriate place for shared_ptr?

I'm in the process of implementing a binary tree in C++. Traditionally, I'd have a pointer to left and a pointer to right, but manual memory management typically ends in tears. Which leads me to my question...
Are data structures an appropriate place to use shared_ptr?
I think it depends on where you'd be using them. I'm assuming that what you're thinking of doing is something like this:
template <class T>
class BinaryTreeNode
{
//public interface ignored for this example
private:
shared_ptr<BinaryTreeNode<T> > left;
shared_ptr<BinaryTreeNode<T> > right;
T data;
};
This would make perfect sense if you're expecting your data structure to handle dynamically created nodes. However, since that's not the normal design, I think it's inappropriate.
My answer would be that no, it's not an appropriate place to use shared_ptr, as the use of shared_ptr implies that the object is actually shared - however, a node in a binary tree is not ever shared. However, as Martin York pointed out, why reinvent the wheel - there's already a smart pointer type that does what we're trying to do - auto_ptr. So go with something like this:
template <class T>
class BinaryTreeNode
{
//public interface ignored for this example
private:
auto_ptr<BinaryTreeNode<T> > left;
auto_ptr<BinaryTreeNode<T> > right;
T data;
};
If anyone asks why data isn't a shared_ptr, the answer is simple - if copies of the data are good for the client of the library, they pass in the data item, and the tree node makes a copy. If the client decides that copies are a bad idea, then the client code can pass in a shared_ptr, which the tree node can safely copy.
Because left and right are not shared boost::shared_ptr<> is probably not the correct smart pointer.
This would be a good place to try std::auto_ptr<>
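Note for readers on current compilers: std::auto_ptr was deprecated in C++11 and removed in C++17, and std::unique_ptr is its single-owner replacement. A minimal sketch of the same node with unique_ptr, assuming C++14; the insert and contains members and demo_unique_tree are made up here just to have something to exercise.

```cpp
#include <cassert>
#include <memory>
#include <utility>

template <class T>
class BinaryTreeNode {
public:
    explicit BinaryTreeNode(T value) : data(std::move(value)) {}
    // Inserts into the subtree using ordinary binary-search-tree ordering.
    void insert(T value) {
        auto& slot = (value < data) ? left : right;
        if (slot) slot->insert(std::move(value));
        else slot = std::make_unique<BinaryTreeNode<T>>(std::move(value));
    }
    bool contains(const T& value) const {
        if (value == data) return true;
        const auto& slot = (value < data) ? left : right;
        return slot && slot->contains(value);
    }
private:
    std::unique_ptr<BinaryTreeNode<T>> left;    // each child has exactly one owner
    std::unique_ptr<BinaryTreeNode<T>> right;
    T data;
};

inline bool demo_unique_tree() {
    BinaryTreeNode<int> root(5);
    root.insert(3);
    root.insert(8);
    return root.contains(8) && root.contains(3) && !root.contains(4);
}
```

Destroying the root releases the whole tree recursively, with no hand-written destructor.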
Yes, absolutely.
But be careful if you have a circular data structure. If you have two objects, both with a shared ptr to each other, then they will never be freed without manually clearing the shared ptr. The weak ptr can be used in this case. This, of course, isn't a worry with a binary tree.
Writing memory management manually is not so difficult on those happy occasions where each object has a single owner, which can therefore delete what it owns in its destructor.
Given that a tree by definition consists of nodes which each have a single parent, and therefore an obvious candidate for their single owner, this is just such a happy occasion. Congratulations!
I think it would be well worth* developing such a solution in your case, AND also trying the shared_ptr approach, hiding the differences entirely behind an identical interface, so you switch between the two and compare the difference in performance with some realistic experiments. That's the only sure way to know whether shared_ptr is suitable for your application.
(* for us, if you tell us how it goes.)
Never use shared_ptr for the nodes of a data structure. It can cause the destruction of the node to be suspended or delayed if at any point the ownership was shared. This can cause destructors to be called in the wrong sequence.
It is a good practice in data structures for the constructors of nodes to contain any code that couples with other nodes and the destructors to contain code that de-couples from other nodes. Destructors called in the wrong sequence can break this design.
There is a bit of extra overhead with a shared_ptr, notably in space requirements, but if your elements are individually allocated then shared_ptr would be perfect.
Do you even need pointers? It seems you could use boost::optional<BinaryTreeNode<T> > left, right.