Reduce number of shared_ptrs in persistent data structure


I'm faced with a design choice for a singly linked list class. The rough idea is this:
template<typename T>
class List {
public:
...
private:
struct Node {
std::shared_ptr<const T> value;
std::shared_ptr<const Node> next;
};
std::shared_ptr<const Node> node_;
};
Yes, I know there are a lot of shared_ptrs wandering around, but that's because List is a functional persistent data structure that needs as much structural sharing as possible. In this implementation, for example, reversing a list does not require copying any elements, and multiple lists can share a common sub-list (by pointing to the same shared_ptr tail).
That being said, I still feel there are perhaps too many shared_ptrs. Is there any way to reduce the number of shared_ptrs used while still enabling structural sharing? Something like combining the two shared_ptrs inside a Node to reduce the overhead of control blocks... I don't know, maybe there isn't a way, or maybe there is. Any idea is welcome, even about redesigning the List class altogether.

You want to share data without structure (the reverse case).
You want to share structure.
Both require shared pointers. However, if you want to reduce control block overhead, this can be done, so long as you entangle lifetimes.
You can make the T's lifetime tied to its node. The reversed node then needs to also make the original node persist. This can cause structure to outlive its needs, but makes the pure-forward case less expensive.
Make the pointer-to-T a raw pointer.
Create a combined struct with a T and a Node in it.
Use make_shared to create it.
Now make the pointer-to-T point at the T in the combined struct.
Next, use the aliasing ctor to create a shared ptr to the Node sharing the control block of the combined struct.
To reverse, create a helper struct with a Node and a shared ptr to Node. Make shared the helper. Point the shared node ptr to the forward node, the T ptr to the T ptr in the forward node, and then use the aliasing ctor of shared ptr to get a shared ptr to Node.
I do not think this is worth it.
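A minimal sketch of the forward (non-reversed) part of the steps above, assuming C++11 or later. The names Cell and cons are hypothetical and not part of the question's List class, and the reversal helper is not shown.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Node as in the question, except the value pointer is now raw (non-owning).
template <typename T>
struct Node {
    const T* value;
    std::shared_ptr<const Node> next;
};

// Hypothetical combined struct: the T and its Node live in one allocation
// and therefore share a single control block.
template <typename T>
struct Cell {
    T value;
    Node<T> node;
};

// Hypothetical helper: prepend a value to a (possibly shared) tail.
template <typename T>
std::shared_ptr<const Node<T>> cons(T value, std::shared_ptr<const Node<T>> tail) {
    auto cell = std::make_shared<Cell<T>>(
        Cell<T>{std::move(value), Node<T>{nullptr, std::move(tail)}});
    cell->node.value = &cell->value;  // point the raw pointer at the T in the same cell
    // Aliasing constructor: shares cell's control block but points at cell->node.
    return std::shared_ptr<const Node<T>>(cell, &cell->node);
}

// Usage: two lists sharing one tail.
inline int demo_shared_tail() {
    auto tail = cons<int>(3, nullptr);
    auto a = cons<int>(1, tail);
    auto b = cons<int>(2, tail);
    assert(a->next == b->next);          // structural sharing of the tail
    return *a->value + *a->next->value;  // 1 + 3
}
```

Each element now costs one allocation and one control block instead of two, at the price of tying the lifetime of the T to the lifetime of its node.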

Related

How to store references to other objects in C++?

This is a more general question that I'm trying to resolve for C++ best practices. Suppose I want to create objects which store references to each other, like a graph. All objects are owned by the same object, like a Graph object to all the Nodes, which is to say the ownership is fixed.
Here's my idea: a class Graph has a std::vector of Nodes, each Node has a std::vector of Nodes representing its list of connections. I'm wondering how best to implement this in terms of smart pointers? To my understanding, ownership is unique so the Graph vector should be std::vector<std::unique_ptr<Node>> nodes and I can populate that as needed. But the connections vector, how can I get each node to store references to its connections? These would only be read-only references, and maybe it would be better to name all the nodes and only store the names, or to store connections in the Graph. But is there a good way of storing references to the connection nodes as if they were const pointers?
Note: this is really about ownership and smart pointers, not about data structures, the graph example is just an example.

When discussing "Best Practices", it's important to consider what your quality-attributes and needs are for the code.
There is no "right" or "wrong" answer for code such as a Graph; there are different approaches that solve different problems in different ways, and the choice depends strongly on the way it's intended to be used.
By far the simplest way to solve such a problem is for the main container (Graph) to have strong ownership with unique_ptr, and for the internal elements (Node) to only view other nodes through raw pointers, e.g.:
class Graph
{
...
private:
std::vector<std::unique_ptr<Node>> m_nodes;
};
class Node
{
...
private:
std::vector<const Node*> m_connected_nodes;
};
This would work well, since Node cannot mutate its connected nodes, and since Graph assumes that Node will never outlive it.
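A short sketch of how the two classes above might be filled in, assuming C++14 for std::make_unique. The member functions add_node, connect and degree are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

class Node {
public:
    // Store a non-owning, read-only view of another node.
    void connect(const Node& other) { m_connected_nodes.push_back(&other); }
    std::size_t degree() const { return m_connected_nodes.size(); }
private:
    std::vector<const Node*> m_connected_nodes;
};

class Graph {
public:
    // The Graph is the sole owner; callers only receive a reference.
    Node& add_node() {
        m_nodes.push_back(std::make_unique<Node>());
        return *m_nodes.back();
    }
private:
    std::vector<std::unique_ptr<Node>> m_nodes;
};

inline std::size_t demo_graph() {
    Graph g;
    Node& a = g.add_node();
    Node& b = g.add_node();  // the vector may reallocate, but the Nodes do not move
    a.connect(b);
    b.connect(a);
    return a.degree() + b.degree();
}
```

Because each Node is heap-allocated behind a unique_ptr, the raw pointers stay valid while the Graph grows; they become invalid only when the Graph removes or destroys the node.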
However, this approach does not work if you ever want Node to outlive Graph, or if you want Node to be used across multiple Graph objects. If it lives between different Graphs, then you may run the risk of a Node referring to a dangling pointer -- and this would be bad.
If this is the case, you might need to consider a different ownership pattern, such as shared_ptr and weak_ptr ownership:
class Graph
{
...
private:
std::vector<std::shared_ptr<Node>> m_nodes;
};
class Node
{
...
private:
std::vector<std::weak_ptr<const Node>> m_connected_nodes;
};
In this case, Nodes only weakly know other Node objects, whereas Graph is the strong owner of them. This prevents the dangling issue, but incurs additional overhead for the shared_ptr's control block, and for having to check whether a node is still alive before accessing it through a weak_ptr.
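The liveness check mentioned above is done with weak_ptr::lock(). A minimal sketch, using a simplified Node struct and a hypothetical live_connections helper:

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Node {
    std::vector<std::weak_ptr<const Node>> connected;  // does not keep peers alive
};

// Count the connections whose target still exists.
inline int live_connections(const Node& n) {
    int alive = 0;
    for (const auto& w : n.connected) {
        if (auto peer = w.lock()) {  // empty shared_ptr if the peer was destroyed
            ++alive;
        }
    }
    return alive;
}

inline int demo_weak() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->connected.push_back(b);
    assert(live_connections(*a) == 1);
    b.reset();                    // last owner of b goes away
    return live_connections(*a);  // the weak_ptr has expired
}
```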
So the correct answer is: It depends. If you can get away with the former approach, that's probably the cleanest; you always have 1 owner, and thus the logic is simple and easy to follow.

I'm wondering how best to implement this in terms of smart pointers?
By not using them. Use a vector of nodes for the graph: std::vector<Node>. This is a reasonable default choice until you have a good reason to do otherwise.
But is there a good way of storing references to the connection nodes as if they were const pointers?
Yes. Const pointers are a good way of storing as if they were const pointers. (And by "const pointer", I presume we are actually talking about pointer to const).
A reference wrapper is another choice. Although it has the advantage of not having representation for null, it does have the downside of clumsy syntax.
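A small sketch of the reference-wrapper variant, assuming the nodes live in a std::vector<Node> as suggested above. The struct layout and the demo_refs function are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Node {
    int id;
    std::vector<std::reference_wrapper<const Node>> connections;
};

inline int demo_refs() {
    std::vector<Node> nodes;
    nodes.reserve(2);  // references into a vector dangle if it reallocates
    nodes.push_back(Node{1, {}});
    nodes.push_back(Node{2, {}});
    nodes[0].connections.push_back(nodes[1]);
    // .get() is the clumsy part: member access needs an explicit unwrap.
    return nodes[0].connections[0].get().id;
}
```

The same invalidation caveat applies to pointers to const: both refer to elements inside the vector, so the vector must not reallocate while connections exist.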

Modern C++ Object Relationships

I have a graph implemented using a struct Node and a struct Edge where:
Each Edge has a start and an end Node
Each Node maintains a list of Edge objects which start from or end at it
The following is one possible implementation:
struct Node;
struct Edge {
Node *st;
Node *en;
int some_data;
};
const int MAX_EDGES = 100;
struct Node {
Edge *edges[MAX_EDGES];
int some_data;
};
While the above structs can represent the graph I have in mind, I would like to do it the "Modern C++" way while satisfying the following requirements:
Avoid pointers
Use an std::vector for Node::edges
Be able to store Node and Edge objects in standard C++ containers
How is this done in Modern C++? Can all of 1-3 be achieved?

Avoid pointers
You can use std::shared_ptr and std::weak_ptr for this. Just decide whether you want nodes to own edges, or vice versa. The non-owning type should use weak_ptr (to avoid cycles).
Unless your graph is acyclic you might still need to be careful about ownership cycles.
std::unique_ptr is not an option, because there is not a one-to-one relationship between nodes and edges, so there cannot be a unique owner of any given object.
Use an std::vector for Node::edges
No problem. Make it a std::vector<std::weak_ptr<Edge>> or std::vector<std::shared_ptr<Edge>> (depending whether edges own nodes or vice versa)
Be able to store Node and Edge objects in standard C++ containers
No problem, just ensure your type can be safely moved/copied without leaking or corrupting memory, i.e. has correct copy/move constructors and assignment operators. That will happen automatically if you use smart pointers and std::vector as suggested above.
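One way this could look, assuming nodes own their edges and edges refer back to nodes weakly. The connect function is a hypothetical helper, and the nodes themselves are owned by an ordinary vector of shared_ptr.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Node;

struct Edge {
    std::weak_ptr<Node> st;  // non-owning, avoids a Node <-> Edge ownership cycle
    std::weak_ptr<Node> en;
    int some_data = 0;
};

struct Node {
    std::vector<std::shared_ptr<Edge>> edges;  // owning
    int some_data = 0;
};

// Hypothetical helper: create an edge and register it with both endpoints.
inline std::shared_ptr<Edge> connect(const std::shared_ptr<Node>& a,
                                     const std::shared_ptr<Node>& b) {
    auto e = std::make_shared<Edge>();
    e->st = a;
    e->en = b;
    a->edges.push_back(e);
    b->edges.push_back(e);
    return e;
}

inline long demo_edges() {
    std::vector<std::shared_ptr<Node>> nodes{std::make_shared<Node>(),
                                             std::make_shared<Node>()};
    auto e = connect(nodes[0], nodes[1]);
    return e.use_count();  // the local e plus one reference from each endpoint
}
```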

Modern C++ eschews the assignment of dynamic memory to a raw pointer. This is because it is all too easy to forget to delete said pointer. Having said that, there is nothing wrong with the use of raw pointers as references to an object, provided you can guarantee that the object's lifetime will be greater than the use of said pointer.
The rules generally are:
Use std::unique_ptr if an object has single owner.
Use raw pointers to reference objects created in 1. provided you can guarantee that the object's lifetime will be greater than the use of your reference.
Use std::shared_ptr for reference counted objects
Use std::weak_ptr to refer to a reference counted object when you do not want to increase the reference count.
So in your case, if the Edge owns the Nodes then use std::unique_ptr; if not, then keep the raw pointers.
In your Node class, if the Node owns the Edges use a std::vector<Edge>, otherwise use a std::vector<Edge*>, although it might be more efficient to link your Edges together in their own intrusive linked list.
Having done some work on complex graphs, I think it might be best to allocate all your Nodes and Edges in a vector outside your graph and then only refer to them internally using raw pointers inside the graph. Remember memory allocation is slow, so the less you do the faster your algorithm will be.
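A rough sketch of that last suggestion, assuming the maximum number of edges is known up front so the backing vector never reallocates. The Graph struct and add_edge are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node;

struct Edge {
    Node* st;
    Node* en;
    int some_data;
};

struct Node {
    std::vector<Edge*> edges;  // non-owning
    int some_data;
};

struct Graph {
    std::vector<Node> nodes;  // backing storage, sized once
    std::vector<Edge> edges;  // backing storage, capacity fixed once

    Graph(std::size_t n, std::size_t max_edges) : nodes(n) { edges.reserve(max_edges); }
    Graph(const Graph&) = delete;             // a copy would point into the original
    Graph& operator=(const Graph&) = delete;

    Edge* add_edge(std::size_t from, std::size_t to) {
        assert(edges.size() < edges.capacity());  // growing would invalidate every Edge*
        edges.push_back(Edge{&nodes[from], &nodes[to], 0});
        Edge* e = &edges.back();
        nodes[from].edges.push_back(e);
        nodes[to].edges.push_back(e);
        return e;
    }
};

inline std::size_t demo_flat() {
    Graph g(3, 2);
    g.add_edge(0, 1);
    g.add_edge(1, 2);
    return g.nodes[1].edges.size();  // node 1 touches both edges
}
```

Storing indices instead of pointers would remove the fixed-capacity restriction, at the cost of an extra lookup on every access.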

By using std::shared_ptr or std::unique_ptr
I don't think vector is the right choice here, since a graph usually is not linear (generally speaking; also, in most cases you can't linearize it like you can with a heap).
There is no standard 'general-use' container, but you can use templates here for genericity.
For example, your Element class can look like this:
template <class T>
struct Elem {
std::shared_ptr<Node> st , en;
T some_data;
};
Speaking of modern C++, I don't think struct is encouraged here; you should encapsulate your data.

C++: Replace raw pointers with shared and weak ptr

I'm facing a design issue in my program.
I have to manage Node objects which are part of a root ChainDescriptor.
Basically it looks like the following:
class ChainDescriptor
{
public:
~ChainDescriptor()
{
//delete the nodes in nodes...
}
void addNode(Node *);
Node * getNode();
const std::list<Node *>& getNodes() const;
std::list<Node *> m_nodes;
};
class Node
{
public:
Node(Node *parent);
void addChild(Node *node);
Node * getChild(const std::string& nodeName);
private:
Node * m_parent;
std::list<Node*> m_childs;
};
The ChainDescriptor class owns all the nodes and is responsible for deleting them.
But these classes now need to be used in another program, a GUI with undo/redo capabilities, which raises the problem of "ownership".
Before modifying the existing code in depth, I'm considering the different solutions:
using shared_ptr and respective list<shared_ptr<...> >
using weak_ptr and respective list<weak_ptr<...> >
In the example above, I don't really know where to use shared_ptr and weak_ptr properly.
Any suggestion?

You can use shared_ptr for m_childs and weak_ptr for m_parent.
However, it might still be reasonable to retain the raw pointer to the parent Node and not use any weak pointers at all. The safeguarding mechanism behind this is the invariant that a non-null parent always exists.
Another option is using shared_ptr in ChainDescriptor only and retaining all raw pointers in Node. This approach avoids weak pointers and has a clean ownership policy (parent nodes own their children).
Weak pointers will help you to manage the memory automatically, but the downsides are fuzzy ownership logic and performance penalties.

shared_ptr is an owning smart pointer and weak_ptr is a referencing smart pointer.
So in your situation I think the ChainDescriptor should use shared_ptr (it owns the nodes) and Node should use weak_ptr for m_parent (it only references it) and shared_ptr for m_childs (it owns them).

The usual implementation would be for each node to have a strong reference to its children (i.e. it keeps them alive), and each child to have a weak reference back to the parent.
The reason for this is to avoid circular references. If only strong references were used, then you'd have a situation where the parent refcount never drops to zero (because the child has a reference), and the child refcount never drops to zero (because the parent has a reference).
I think your ChainDescriptor class is okay to use strong references here though.
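A minimal sketch of the strong-child / weak-parent arrangement described above. The signatures differ from the question's Node class (hypothetical getParent, shared_ptr parameters), and it assumes every Node is created through std::make_shared.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <utility>

class Node : public std::enable_shared_from_this<Node> {
public:
    explicit Node(std::string name) : m_name(std::move(name)) {}

    void addChild(const std::shared_ptr<Node>& child) {
        child->m_parent = shared_from_this();  // requires *this to be owned by a shared_ptr
        m_childs.push_back(child);             // strong: the parent keeps the child alive
    }

    // Empty result if the parent has already been destroyed.
    std::shared_ptr<Node> getParent() const { return m_parent.lock(); }

private:
    std::string m_name;
    std::weak_ptr<Node> m_parent;               // weak: no reference cycle
    std::list<std::shared_ptr<Node>> m_childs;
};

inline bool demo_tree() {
    auto root = std::make_shared<Node>("root");
    auto child = std::make_shared<Node>("child");
    root->addChild(child);
    assert(child->getParent() == root);
    assert(root.use_count() == 1);  // the child does not add to the parent's count
    root.reset();                   // root is destroyed; child survives via the local
    return child->getParent() == nullptr;
}
```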

Trying to just replace raw pointers with some sort of smart
pointer will in general not work. Smart pointers have
different semantics than raw pointers, and usually, these
special semantics need to be taken into account at a higher
level. The "cleanest" solution here is to add support for copy
in ChainDescriptor, implementing a deep copy. (I'm supposing
here that you can clone Node, and that all of the Node are
always owned by a ChainDescriptor.) Also, for undo, you may
need a deep copy anyway; you don't want modifications in the
active instance to modify the data saved for an undo.
Having said that, your nodes seem to be used to form a tree. In
this case, std::shared_ptr will work, as long as 1) all Node
are always "owned" by either a ChainDescriptor or a parent
Node, and 2) the structure really is a forest, or at least
a collection of DAG (and, of course, you aren't making changes
in any of the saved instances). If the structure is such that
cycles may occur, then you cannot use shared_ptr at this
level. You might be able to abstract the list of nodes and the
trees into a separate implementation class, and have
ChainDescriptor keep a shared_ptr to this.
(FWIW: I used a reference counted pointer for the nodes in
a parse tree I wrote many years ago, and different instances
could share sub-trees. But I designed it from the
start to use reference counted pointers. And because of how the
tree was constructed, I was guaranteed that there could be no
cycles.)
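The deep-copy route suggested above could look roughly like this for a tree of nodes. It is a sketch only: clone, childCount and the string-taking addChild are hypothetical, and children are held by unique_ptr rather than the question's raw pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <utility>

class Node {
public:
    explicit Node(std::string name) : m_name(std::move(name)) {}

    Node* addChild(std::string name) {
        m_childs.push_back(std::unique_ptr<Node>(new Node(std::move(name))));
        m_childs.back()->m_parent = this;
        return m_childs.back().get();
    }

    // Deep copy: every descendant is duplicated and re-parented onto the copy.
    std::unique_ptr<Node> clone() const {
        std::unique_ptr<Node> copy(new Node(m_name));
        for (const auto& c : m_childs) {
            std::unique_ptr<Node> cc = c->clone();
            cc->m_parent = copy.get();
            copy->m_childs.push_back(std::move(cc));
        }
        return copy;
    }

    std::size_t childCount() const { return m_childs.size(); }

private:
    std::string m_name;
    Node* m_parent = nullptr;                   // non-owning back pointer
    std::list<std::unique_ptr<Node>> m_childs;  // owning
};

inline bool demo_undo() {
    Node live("root");
    live.addChild("a");
    std::unique_ptr<Node> snapshot = live.clone();  // state saved for undo
    live.addChild("b");                             // later edits do not touch the snapshot
    return live.childCount() == 2 && snapshot->childCount() == 1;
}
```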

C++ Linked List remove all

So this is a bit of a conceptual question. I'm writing a LinkedList in C++, and as Java is my first language, I started to write my removeAll function so that it just joins the head and the tail nodes (I'm using sentinel Nodes btw). But I instantly realized that this won't work in C++ because I have to free the memory for the Nodes!
Is there some way around iterating through the entire list, deleting every element manually?

You can make each node own the next one, i.e. be responsible for destroying it when it is destroyed itself. You can do this by using a smart pointer like std::unique_ptr:
struct node {
// blah blah
std::unique_ptr<node> next;
};
Then you can just destroy the first node and all the others will be accounted for: they will all be destroyed in a chain reaction of unique_ptr destructors.
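A sketch of a list built on this idea, with hypothetical list, push_front and remove_all names and no sentinel nodes. One caveat worth knowing: the chain reaction is recursive, so a very long list can exhaust the stack; the loop below unlinks one node at a time instead.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct node {
    int value;
    std::unique_ptr<node> next;  // each node owns its successor
};

class list {
public:
    void push_front(int v) {
        std::unique_ptr<node> n(new node{v, std::move(head_)});
        head_ = std::move(n);
    }

    // Destroy every node iteratively: the old head is deleted only after its
    // next pointer has been moved out, so destructors never nest.
    void remove_all() {
        while (head_) {
            head_ = std::move(head_->next);
        }
    }

    bool empty() const { return !head_; }

    ~list() { remove_all(); }

private:
    std::unique_ptr<node> head_;
};

inline bool demo_remove_all() {
    list l;
    for (int i = 0; i < 100000; ++i) l.push_front(i);
    l.remove_all();
    return l.empty();
}
```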
If this is a doubly-linked list, you should not use unique_ptrs in both directions, however. That would make each node own the next one, and be owned by the next one! You should make this ownership relation exist only in one direction. In the other use regular non-owning pointers: node* previous;
However, this will not work as is for the sentinel node: it should not be destroyed. How to handle that depends on how the sentinel node is identified and other properties of the list.
If you can tell the sentinel node apart easily, like, for example, checking a boolean member, you can use a custom deleter that avoids deleting the sentinel:
struct delete_if_not_sentinel {
void operator()(node* ptr) const {
if(!ptr->is_sentinel) delete ptr;
}
};
typedef std::unique_ptr<node, delete_if_not_sentinel> node_handle;
struct node {
// blah blah
node_handle next;
};
This stops the chain reaction at the sentinel.

You could do it like Java if you used a C++ garbage collector. Not many do. In any case, it saves you at most a constant factor in running time, as you pay the cost of allocating each element in the list anyway.

Yes. Well, sort of... If you implement your list to use a memory pool then it is responsible for all data in that pool and the entire list can be deleted by deleting the memory pool (which may contain one or more large chunks of memory).
When you use memory pools, you generally have at least one of the following considerations:
limitations on how your objects are created and destroyed;
limitations on what kind of data you can store;
extra memory requirements on each node (to reference the pool);
a simple, intuitive pool versus a complex, confusing pool.
I am no expert on this. Generally when I've needed fast memory management it's been for memory that is populated once, with no need to maintain free-lists etc. Memory pools are much easier to design and implement when you have specific goals and design constraints. If you want some magic bullet that works for all situations, you're probably out of luck.
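A very simple pool along these lines, as a sketch: the nodes live in a std::deque owned by the list, so removing everything is a single clear(). PooledList and its members are hypothetical, and the design shows the first limitation listed above, since individual nodes cannot be freed.

```cpp
#include <cassert>
#include <deque>

struct Node {
    int value;
    Node* next;  // non-owning link into the pool
};

class PooledList {
public:
    PooledList() = default;
    PooledList(const PooledList&) = delete;  // a copy would link into the original pool
    PooledList& operator=(const PooledList&) = delete;

    void push_front(int v) {
        // deque::push_back leaves pointers to existing elements valid.
        pool_.push_back(Node{v, head_});
        head_ = &pool_.back();
    }

    int sum() const {
        int total = 0;
        for (const Node* n = head_; n != nullptr; n = n->next) total += n->value;
        return total;
    }

    // Releases every node at once by dropping the pool.
    void remove_all() {
        pool_.clear();
        head_ = nullptr;
    }

    bool empty() const { return head_ == nullptr; }

private:
    std::deque<Node> pool_;
    Node* head_ = nullptr;
};

inline int demo_pool() {
    PooledList l;
    l.push_front(1);
    l.push_front(2);
    l.push_front(3);
    int total = l.sum();
    l.remove_all();
    return l.empty() ? total : -1;
}
```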

Are data structures an appropriate place for shared_ptr?

I'm in the process of implementing a binary tree in C++. Traditionally, I'd have a pointer to left and a pointer to right, but manual memory management typically ends in tears. Which leads me to my question...
Are data structures an appropriate place to use shared_ptr?

I think it depends on where you'd be using them. I'm assuming that what you're thinking of doing is something like this:
template <class T>
class BinaryTreeNode
{
//public interface ignored for this example
private:
shared_ptr<BinaryTreeNode<T> > left;
shared_ptr<BinaryTreeNode<T> > right;
T data;
};
This would make perfect sense if you're expecting your data structure to handle dynamically created nodes. However, since that's not the normal design, I think it's inappropriate.
My answer would be that no, it's not an appropriate place to use shared_ptr, as the use of shared_ptr implies that the object is actually shared - however, a node in a binary tree is not ever shared. However, as Martin York pointed out, why reinvent the wheel - there's already a smart pointer type that does what we're trying to do - auto_ptr. So go with something like this:
template <class T>
class BinaryTreeNode
{
//public interface ignored for this example
private:
auto_ptr<BinaryTreeNode<T> > left;
auto_ptr<BinaryTreeNode<T> > right;
T data;
};
If anyone asks why data isn't a shared_ptr, the answer is simple - if copies of the data are good for the client of the library, they pass in the data item, and the tree node makes a copy. If the client decides that copies are a bad idea, then the client code can pass in a shared_ptr, which the tree node can safely copy.
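Note that std::auto_ptr was deprecated in C++11 and removed in C++17; std::unique_ptr expresses the same single-owner idea. A sketch of the node with unique_ptr, where insert and size are hypothetical members added so the example can be exercised:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

template <class T>
class BinaryTreeNode {
public:
    explicit BinaryTreeNode(T value) : data(std::move(value)) {}

    // Unbalanced binary-search-tree insertion, for illustration only.
    void insert(T value) {
        std::unique_ptr<BinaryTreeNode>& child = (value < data) ? left : right;
        if (child) {
            child->insert(std::move(value));
        } else {
            child.reset(new BinaryTreeNode(std::move(value)));
        }
    }

    std::size_t size() const {
        return 1 + (left ? left->size() : 0) + (right ? right->size() : 0);
    }

private:
    std::unique_ptr<BinaryTreeNode> left;   // each child has exactly one owner
    std::unique_ptr<BinaryTreeNode> right;
    T data;
};

inline std::size_t demo_tree_size() {
    BinaryTreeNode<int> root(5);
    root.insert(3);
    root.insert(8);
    return root.size();
}
```

Destroying the root destroys the whole tree, with no reference counts involved.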

Because left and right are not shared boost::shared_ptr<> is probably not the correct smart pointer.
This would be a good place to try std::auto_ptr<>

Yes, absolutely.
But be careful if you have a circular data structure. If you have two objects, both with a shared ptr to each other, then they will never be freed without manually clearing the shared ptr. The weak ptr can be used in this case. This, of course, isn't a worry with a binary tree.
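The cycle problem can be observed directly. In this sketch (the Peer struct and demo_cycle are hypothetical), a weak_ptr is used only as an observer to show that the two objects outlive their last external owner until the cycle is cleared by hand.

```cpp
#include <cassert>
#include <memory>

struct Peer {
    std::shared_ptr<Peer> other;  // strong reference in both directions
};

inline bool demo_cycle() {
    std::weak_ptr<Peer> watch;
    {
        auto a = std::make_shared<Peer>();
        auto b = std::make_shared<Peer>();
        a->other = b;
        b->other = a;
        watch = a;
    }  // a and b go out of scope, but each object is still held by the other

    bool kept_alive = !watch.expired();

    // Manually clearing one link breaks the cycle and lets both be freed.
    if (auto a = watch.lock()) {
        a->other.reset();
    }
    return kept_alive && watch.expired();
}
```

Declaring one of the two links as std::weak_ptr<Peer> would avoid the manual step.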

Writing memory management manually is not so difficult on those happy occasions where each object has a single owner, which can therefore delete what it owns in its destructor.
Given that a tree by definition consists of nodes which each have a single parent, and therefore an obvious candidate for their single owner, this is just such a happy occasion. Congratulations!
I think it would be well worth* developing such a solution in your case, AND also trying the shared_ptr approach, hiding the differences entirely behind an identical interface, so you can switch between the two and compare the difference in performance with some realistic experiments. That's the only sure way to know whether shared_ptr is suitable for your application.
(* for us, if you tell us how it goes.)

Never use shared_ptr for the nodes of a data structure. It can cause the destruction of the node to be suspended or delayed if at any point the ownership was shared. This can cause destructors to be called in the wrong sequence.
It is a good practice in data structures for the constructors of nodes to contain any code that couples with other nodes and the destructors to contain code that de-couples from other nodes. Destructors called in the wrong sequence can break this design.

There is a bit of extra overhead with a shared_ptr, notably in space requirements, but if your elements are individually allocated then shared_ptr would be perfect.

Do you even need pointers? It seems you could use boost::optional<BinaryTreeNode<T> > left, right.
