Implementing a list with unique_ptr<>?

As I understand it, a unique_ptr signifies exclusive ownership. A singly linked list seems to fit this, with each node owning the next, like (pseudocode alert)
#include <memory>
class node{
public:
std::unique_ptr<node> next;
int value;
};
but I don't understand how to perform operations like traversing the list, where I'm used to doing
here=here->next;
How do you implement data structures using unique_ptrs? Are they the right tool for the job?

When you go through the nodes, you don't need to own the node pointer, which means that
here=here->next;
Is incorrect if here is a unique_ptr.
Owning an object means "being responsible for its life and death", which means the owner is the one that has the code that will destroy the object. If you use another definition of owning, then it's not what unique_ptr means.
In your list node code, you assume that each node is responsible for the next node (if you destroy a node, all the next nodes will be destroyed too). That can be valid behaviour, depending on your needs; just be sure it's what you really want.
What you want is to read the pointer without owning it. Current good practice to do this is to use a raw pointer indicating a "use but don't own" kind of usage to other developers looking at this code (unique_ptr means "if I die, the pointed object dies too"):
node* here = nullptr; // it will not own the pointed nodes (don't call delete with this pointer)
here = &first_node(); // assuming first_node() returns a reference to the first node
here = here->next.get(); // to get the next node without owning it, use get() (both unique_ptr and shared_ptr provide it)
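A minimal sketch of this idea, assuming the list head is itself a `std::unique_ptr<node>`; `push_front` and `sum` are hypothetical helpers, not from the question. Ownership stays in the chain of `next` pointers, and the traversal only borrows each node through `get()`:

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct node {
    std::unique_ptr<node> next;  // owns the rest of the list
    int value = 0;
};

// Hypothetical helper: the new node takes ownership of the old head.
void push_front(std::unique_ptr<node>& head, int value) {
    auto n = std::make_unique<node>();
    n->value = value;
    n->next = std::move(head);
    head = std::move(n);
}

// Hypothetical helper: walks the list with a non-owning raw pointer,
// so no ownership changes hands while traversing.
int sum(const std::unique_ptr<node>& head) {
    int total = 0;
    for (const node* here = head.get(); here != nullptr; here = here->next.get())
        total += here->value;
    return total;
}
```

Pushing 1, 2 and 3 to the front leaves 3 at the head, and `sum(head)` then returns 6. Because the loop variable is a `const node*`, nothing in the traversal can transfer or destroy a node by accident.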

Related

Getting into smart pointers, how to deal with representing ownership?

I've made a dynamic graph structure where both nodes and arcs are classes (I mean arcs are actual instances in memory; they are not implied by an adjacency list of nodes to nodes).
Each node has a list of pointers to the arcs it's connected to.
Each arc has 2 pointers to the 2 nodes it's connecting.
Deleting a node calls delete for each of its arcs.
Each arc delete removes its pointer from the arcs lists in the 2 nodes it connects.
Simplified:
~node()
{
while(arcs_list.size())
{
delete arcs_list[arcs_list.size()-1];
}
}
~arc()
{
node_from->remove_arc(this);
node_to->remove_arc(this);
}
If I want to start using smart pointers here, how do I proceed?
Does each arc own 2 nodes, or do 2 nodes share an individual arc's ownership?
I was thinking about a shared_ptr, but a shared pointer would only delete the arc when both nodes are deleted. If I delete only one node, I would still have to explicitly delete all its arcs if I used shared_ptr. And that totally defeats the point of not using raw pointers in the first place.
Nodes can exist alone; each arc is owned by two nodes and it can only exist as long as these two nodes both exist.
Is there some other kind of smart pointer I should use to handle this?
Or is raw pointer just the plain simple way to go?

Does each arc own 2 nodes, or do 2 nodes share an individual arc's ownership?
You answered this question yourself:
Nodes can exist alone; each arc is owned by two nodes and it can only exist as long as these two nodes both exist.
When object A owns object B, then object A can exist after destroying B, but destroying A implies destroying B. Applied to your case, the two nodes share ownership of the arc.
Is there some other kind of smart pointer I should use to handle this? Or is raw pointer just the plain simple way to go?
Ah, yes. That is the real question. There is no pre-made smart pointer for this situation. However, I would not go with raw pointers in your node and/or arc classes. That would mean those classes would need to implement memory management on top of their primary purpose. (Much better to let each class do one thing well than to try to do many things and fail.) I see a few viable options.
1: Write your own smart pointer
Write a class that can encapsulate the necessary destruction logic. The node and/or arc classes would use your new class instead of standard smart pointers (and instead of raw pointers). Take some time to make sure your design decisions are solid. I'm guessing your new class would want a functional/callable of some sort to tell it how to remove itself from the lists it is in. Or maybe shift some data (like the pointers to the nodes) from the arc class to the new class.
I haven't worked out the details, but this would be a reasonable approach since the situation does not fit any of the standard smart pointers. The key point is to not put this logic directly in your node and arc classes.
2: Flag invalid arcs
If your program can stand not immediately releasing memory, you may be able to take a different approach to resolving an arc deletion. Instead of immediately removing an arc from its nodes' lists, simply flag the arc as no longer valid. When a node needs to access its arcs, it (or better yet, its list) would check each arc it accesses; if the arc is invalid, it can be removed from the list at that time. Once the arc has been removed from both lists, the normal shared_ptr functionality will kick in to delete the arc object.
The usefulness of this approach decreases the less frequently a node iterates over its arcs. So there is a judgement call to be made.
How would an arc be flagged invalid? The naive approach would be to give it a boolean flag. Set the flag to false in the constructors, and to true when the arc should be considered deleted. Effective, but does require a new field. Can this be done without bloating the arc class? Well, presumably, each arc needs pointers to its nodes. Since the arc does not own its nodes, these are probably weak pointers. So one way to define an arc being invalid is to check if either weak pointer is expired(). (Note that the weak pointers could be manually reset() when the arc is being deleted directly, not via a node's deletion. So an expired weak pointer need not mean the associated node is gone, only that the arc no longer points to it.)
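A rough sketch of this option, assuming the graph keeps each node alive through a `std::shared_ptr<Node>` so that arcs can hold `weak_ptr`s to their endpoints. `valid`, `invalidate`, `purge` and `connect` are hypothetical names; an arc counts as invalid when either endpoint reference has expired, as described above:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node;

// Hypothetical arc: refers to its endpoints without owning them.
struct Arc {
    std::weak_ptr<Node> from, to;
    // Invalid once either endpoint reference has expired (or was reset).
    bool valid() const { return !from.expired() && !to.expired(); }
    // Deleting an arc directly, without deleting a node: just flag it.
    void invalidate() { from.reset(); to.reset(); }
};

// Hypothetical node: shares ownership of each arc with the other endpoint.
struct Node {
    std::vector<std::shared_ptr<Arc>> arcs;

    // Lazily drops invalid arcs; returns how many valid arcs remain.
    std::size_t purge() {
        arcs.erase(std::remove_if(arcs.begin(), arcs.end(),
                                  [](const std::shared_ptr<Arc>& a) { return !a->valid(); }),
                   arcs.end());
        return arcs.size();
    }
};

// Hypothetical helper: both nodes must already be owned by shared_ptrs (e.g. by the graph).
std::shared_ptr<Arc> connect(const std::shared_ptr<Node>& x, const std::shared_ptr<Node>& y) {
    auto arc = std::make_shared<Arc>();
    arc->from = x;
    arc->to = y;
    x->arcs.push_back(arc);
    y->arcs.push_back(arc);
    return arc;
}
```

Nothing runs when a node is deleted: its arcs simply become invalid and are freed the next time the surviving node calls `purge()`, which is the delayed-release trade-off this option accepts.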
In the case where the arc class is sizeable, you might want to discard most of its memory immediately, leaving just a stub behind. You could add a level of indirection to accomplish this. Essentially, the nodes would share a pointer to a unique pointer, and the unique pointer would point to what you currently call your arc class. When the arc is deleted, the unique pointer is reset(), freeing most of the arc's memory. An arc is invalid when this unique pointer is null. (It looks like Davis Herring's answer is another way to get this effect with less memory overhead, if you can accept an object storing a shared_ptr to itself.)
3: Use Boost.Bimap
If you can use Boost, they have a container that looks like it would solve your problem: Boost.Bimap. But, you ask, didn't I already discount using an adjacency list? Yes, but this Bimap is more than just a way to associate nodes to each other. This container supports having additional information associated with each relation. That is, each relation in the Bimap would represent an arc and it would have an associated object with the arc's information. Seems to fit your situation well, and you would be able to let someone else worry about memory management (always a nice thing, provided you can trust that someone's abilities).

Since nodes can exist alone, they are owned by the graph (which might or might not be a single object), not the arcs (even as shared ownership). The ownership of an arc by its nodes is, as you observed, dual to the usual shared_ptr situation of either owner being sufficient to keep the object alive. You can nonetheless use shared_ptr and weak_ptr here (along with raw, non-owning pointers to the nodes):
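#include <memory>
#include <vector>
struct Node;
struct Arc {
Node *a,*b; // non-owning: the graph owns the nodes
private:
std::shared_ptr<Arc> skyhook{this}; // the arc keeps itself alive, so allocate Arcs with new
public:
void free() {skyhook.reset();}
};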
struct Node {
std::vector<std::weak_ptr<Arc>> arcs;
~Node() {
for(const auto &w : arcs)
if(const auto a=w.lock()) a->free();
}
};
Obviously other Node operations have to check for expired weak pointers and perhaps clean them out periodically.
Note that exception safety (including vs. bad_alloc in constructing the shared_ptr) requires more care in constructing an Arc.
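To show how the `skyhook` behaves in practice, here is a self-contained variant of the code above with a hypothetical `connect` helper and a `live_arcs` counter added for illustration. It assumes arcs are always created with `new`, since the self-owning `shared_ptr` will eventually `delete` them; copying is disabled because a copied `skyhook` would still point at the original arc:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node;

struct Arc {
    Node *a, *b;  // non-owning: the graph owns the nodes
    Arc(Node* x, Node* y) : a(x), b(y) {}
    Arc(const Arc&) = delete;
    Arc& operator=(const Arc&) = delete;
    void free() { skyhook.reset(); }                       // drop the self-reference
    std::weak_ptr<Arc> handle() const { return skyhook; }  // non-owning view for nodes
private:
    std::shared_ptr<Arc> skyhook{this};  // the arc keeps itself alive
};

struct Node {
    std::vector<std::weak_ptr<Arc>> arcs;
    ~Node() {
        for (const auto& w : arcs)
            if (const auto a = w.lock()) a->free();  // arc is deleted when `a` goes out of scope
    }
    // Hypothetical helper: counts arcs that still exist (expired entries are stale).
    std::size_t live_arcs() const {
        std::size_t n = 0;
        for (const auto& w : arcs)
            if (!w.expired()) ++n;
        return n;
    }
};

// Hypothetical helper: the Arc must come from new, because its skyhook deletes it.
std::weak_ptr<Arc> connect(Node& x, Node& y) {
    std::weak_ptr<Arc> w = (new Arc(&x, &y))->handle();
    x.arcs.push_back(w);
    y.arcs.push_back(w);
    return w;
}
```

Destroying either endpoint frees the arc, and the surviving node then holds only an expired `weak_ptr`, which is why other Node operations must skip or clean out such entries.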

Can I get the unique_ptr (if any) that the pointer belongs to?

I'm building a tree with a Node class that has unique_ptrs to the left and right children and a Node* pointer to the parent. When I'm deleting a node, I have to check whether it is the right or the left child and then reset the corresponding unique_ptr in the parent. Is there any way to take the pointer, ask whether there is a unique_ptr wrapped around it, and possibly return that unique_ptr?

Is there any way to take the pointer and ask if there is any unique_ptr wrapper around it and possibly return it?
There's no generic way to find the unique_ptr, but you can for example store a reference.
Assuming your tree is binary, you can find the unique_ptr in parent like this:
(parent->left.get() == this ? parent->left : parent->right).reset(); // destroys *this, so don't touch any members afterwards
If the tree isn't binary, you can iterate over all children.
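A small sketch of that lookup for a binary tree, assuming the layout from the question (`unique_ptr` children, raw `parent` pointer); `add_child` and `owner_of` are hypothetical helpers. Returning a reference to the owning `unique_ptr` lets the caller reset it, which deletes the node together with its subtree:

```cpp
#include <cassert>
#include <memory>

// Children are owned; the parent is only observed.
struct Node {
    int value;
    Node* parent = nullptr;
    std::unique_ptr<Node> left, right;
    explicit Node(int v) : value(v) {}
};

// Hypothetical helper: creates a child on the requested side and returns it.
Node* add_child(Node& p, bool on_left, int v) {
    std::unique_ptr<Node>& slot = on_left ? p.left : p.right;
    slot = std::make_unique<Node>(v);
    slot->parent = &p;
    return slot.get();
}

// Hypothetical helper: the unique_ptr in n's parent that owns n (n must not be the root).
std::unique_ptr<Node>& owner_of(Node& n) {
    Node* p = n.parent;
    return p->left.get() == &n ? p->left : p->right;
}
```

After the `reset()`, any raw pointer to the removed node dangles, so it must not be used again.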

In C++, pointers are uni-directional; and unique_ptr, being simply a wrapper class around a pointer, doesn't change that. There is no way to get the unique_ptr from the raw pointer it is pointing to.
A few alternative solutions to your particular issue are possible:
Add a parent pointer to the child object, then you can navigate to the parent to delete the child from there. This may be inefficient if you have a lot of nodes.
Implement the concept of an iterator - an abstraction of a context that carries sufficient information to be able to modify the tree (e.g. delete a node). For example, a tree iterator could contain a pointer to the current node, a pointer to its parent and the flag indicating if it's a left or right child. The downside is that you can't modify a tree simply by having a pointer to its node, you need to have an instance of an iterator.
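One possible variant of the iterator idea, sketched under the assumption that nodes have no parent pointer at all: rather than storing the node, its parent and a left/right flag, a hypothetical `Position` stores the address of the `unique_ptr` that owns the current node. That slot is exactly what has to be reset or assigned to change the tree:

```cpp
#include <cassert>
#include <memory>

// Tree node without a parent pointer.
struct Node {
    int value;
    std::unique_ptr<Node> left, right;
    explicit Node(int v) : value(v) {}
};

// Hypothetical position type: refers to the owning slot, not to the node itself.
class Position {
    std::unique_ptr<Node>* slot_;
public:
    explicit Position(std::unique_ptr<Node>& slot) : slot_(&slot) {}
    Node* node() const { return slot_->get(); }
    // Moving down requires the current node to exist.
    Position left() const { return Position((*slot_)->left); }
    Position right() const { return Position((*slot_)->right); }
    // Removes the subtree rooted here; the position then refers to an empty slot.
    void erase() { slot_->reset(); }
    // Puts a new leaf here, replacing whatever subtree was in this slot.
    void insert(int v) { *slot_ = std::make_unique<Node>(v); }
};
```

Like a container iterator after an erase, a `Position` becomes invalid if the node that holds its slot is destroyed.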

C++: Replace raw pointers with shared and weak ptr

I'm facing a design issue in my program.
I have to manage Node objects that are part of a root ChainDescriptor.
Basically it looks like the following:
class ChainDescriptor
{
public:
~ChainDescriptor()
{
//delete the nodes in nodes...
}
void addNode(Node *);
Node * getNode();
const std::list<Node *>& getNodes() const;
std::list<Node *> m_nodes;
};
class Node
{
public:
Node(Node *parent);
void addChild(Node *node);
Node * getChild(const std::string& nodeName);
private:
Node * m_parent;
std::list<Node*> m_childs;
};
The ChainDescriptor class owns all the nodes and is responsible for deleting them.
But these classes now need to be used in another program, a GUI with undo/redo capabilities, which raises the problem of ownership.
Before modifying the existing code in depth, I'm considering the different solutions:
using shared_ptr and respective list<shared_ptr<...> >
using weak_ptr and respective list<weak_ptr<...> >
In the example above, I don't really know where to use shared_ptr and weak_ptr properly.
Any suggestion?

You can use shared_ptr for m_childs and weak_ptr for m_parent.
However, it might still be reasonable to retain the raw pointer to the parent Node and not use any weak pointers at all. The safeguarding mechanism behind this is the invariant that a non-null parent always exists.
Another option is using shared_ptr in ChainDescriptor only and retaining all raw pointers in Node. This approach avoids weak pointers and has a clean ownership policy (the ChainDescriptor owns every node; nodes only refer to one another).
Weak pointers will help you to manage the memory automatically, but the downside is fuzzier ownership logic and a performance penalty.

shared_ptr is an owning smart pointer and weak_ptr is a referencing (non-owning) smart pointer.
So in your situation I think the ChainDescriptor should use shared_ptr (it owns the nodes) and Node should use weak_ptr for m_parent (it only references it) and shared_ptr for m_childs (it owns them).
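A sketch of that ownership layout, reusing the names from the question where possible (`addChild`, `getChild`, `addNode`, `getNodes`, `m_childs`, `m_nodes`); the string-name constructor and `getParent` are assumptions added for illustration. `enable_shared_from_this` lets a node hand its children a `weak_ptr` to itself:

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <utility>

class Node : public std::enable_shared_from_this<Node> {
public:
    explicit Node(std::string name) : m_name(std::move(name)) {}

    // Only valid on a Node already owned by a shared_ptr (shared_from_this() requires it).
    void addChild(const std::shared_ptr<Node>& node) {
        node->m_parent = shared_from_this();
        m_childs.push_back(node);
    }
    std::shared_ptr<Node> getChild(const std::string& nodeName) const {
        for (const auto& c : m_childs)
            if (c->m_name == nodeName) return c;
        return nullptr;
    }
    // Empty if the parent has already been destroyed.
    std::shared_ptr<Node> getParent() const { return m_parent.lock(); }

private:
    std::string m_name;
    std::weak_ptr<Node> m_parent;               // references the parent
    std::list<std::shared_ptr<Node>> m_childs;  // owns the children
};

class ChainDescriptor {
public:
    void addNode(std::shared_ptr<Node> node) { m_nodes.push_back(std::move(node)); }
    const std::list<std::shared_ptr<Node>>& getNodes() const { return m_nodes; }
private:
    std::list<std::shared_ptr<Node>> m_nodes;  // owns the top-level nodes
};
```

With this layout the ChainDescriptor destructor no longer has to delete anything, and a node stays alive as long as some owner, such as an undo-stack entry, still holds a `shared_ptr` to it.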

The usual implementation would be for each node to have strong references to its children (i.e. keep them alive), and each child to have a weak reference back to the parent.
The reason for this is to avoid circular references. If only strong references were used, then you'd have a situation where the parent refcount never drops to zero (because the child has a reference), and the child refcount never drops to zero (because the parent has a reference).
I think your ChainDescriptor class is okay to use strong references here though.
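The cycle problem can be observed directly with a `weak_ptr` that watches the parent. In this sketch (all type and function names are hypothetical), one pair of nodes uses a strong back pointer and the other a weak one; the strong version keeps the parent alive after its last outside owner is gone, so the sketch breaks the cycle by hand to avoid an actual leak:

```cpp
#include <cassert>
#include <memory>

struct StrongNode {  // child points back with a shared_ptr
    std::shared_ptr<StrongNode> parent;
    std::shared_ptr<StrongNode> child;
};

struct WeakNode {    // child points back with a weak_ptr
    std::weak_ptr<WeakNode> parent;
    std::shared_ptr<WeakNode> child;
};

// True if the parent is still alive after its only outside owner is gone.
bool strong_cycle_keeps_parent_alive() {
    auto p = std::make_shared<StrongNode>();
    p->child = std::make_shared<StrongNode>();
    p->child->parent = p;                      // cycle: parent -> child -> parent
    std::shared_ptr<StrongNode> c = p->child;  // kept only to break the cycle below
    std::weak_ptr<StrongNode> observer = p;
    p.reset();
    const bool alive = !observer.expired();    // the refcount never reached zero
    c->parent.reset();                         // break the cycle by hand
    return alive;
}

// True if the parent is destroyed once its only outside owner is gone.
bool weak_back_pointer_frees_parent() {
    auto p = std::make_shared<WeakNode>();
    p->child = std::make_shared<WeakNode>();
    p->child->parent = p;                      // weak: does not add to the count
    std::weak_ptr<WeakNode> observer = p;
    p.reset();
    return observer.expired();
}
```

Both functions return `true`: the strong back pointer keeps the parent alive, while the weak one lets it be destroyed as soon as nothing else owns it.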

Trying to just replace raw pointers with some sort of smart
pointer will in general not work. Smart pointers have
different semantics than raw pointers, and usually, these
special semantics need to be taken into account at a higher
level. The "cleanest" solution here is to add support for copy
in ChainDescriptor, implementing a deep copy. (I'm supposing
here that you can clone Node, and that all of the Node are
always owned by a ChainDescriptor.) Also, for undo, you may
need a deep copy anyway; you don't want modifications in the
active instance to modify the data saved for an undo.
Having said that, your nodes seem to be used to form a tree. In
this case, std::shared_ptr will work, as long as 1) all Node
are always "owned" by either a ChainDescriptor or a parent
Node, and 2) the structure really is a forest, or at least
a collection of DAGs (and, of course, you aren't making changes
in any of the saved instances). If the structure is such that
cycles may occur, then you cannot use shared_ptr at this
level. You might be able to abstract the list of nodes and the
trees into a separate implementation class, and have
ChainDescriptor keep a shared_ptr to this.
(FWIW: I used a reference counted pointer for the nodes in
a parse tree I wrote many years ago, and different instances
could share sub-trees. But I designed it from the
start to use reference counted pointers. And because of how the
tree was constructed, I was guaranteed that there could be no
cycles.)
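The deep-copy suggestion can be sketched as follows, assuming a simplified layout in which the ChainDescriptor owns the root nodes and each Node owns its children through `unique_ptr` (the question's version keeps every node in one list). `clone`, `addRoot` and `root` are hypothetical names; the key point is that the copied tree gets parent pointers into the copy, never into the original:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node {
    std::string name;
    Node* parent = nullptr;                       // non-owning back pointer
    std::vector<std::unique_ptr<Node>> children;  // owned subtrees

    explicit Node(std::string n) : name(std::move(n)) {}

    Node* addChild(std::string n) {
        children.push_back(std::make_unique<Node>(std::move(n)));
        children.back()->parent = this;
        return children.back().get();
    }
    // Deep copy of this subtree; parent pointers in the copy refer to copied nodes.
    std::unique_ptr<Node> clone() const {
        auto copy = std::make_unique<Node>(name);
        for (const auto& c : children) {
            auto cc = c->clone();
            cc->parent = copy.get();
            copy->children.push_back(std::move(cc));
        }
        return copy;
    }
};

class ChainDescriptor {
public:
    ChainDescriptor() = default;
    ChainDescriptor(const ChainDescriptor& other) {  // deep copy, e.g. for an undo snapshot
        for (const auto& r : other.m_roots) m_roots.push_back(r->clone());
    }
    ChainDescriptor(ChainDescriptor&&) = default;
    ChainDescriptor& operator=(ChainDescriptor other) {  // copy-and-swap
        m_roots.swap(other.m_roots);
        return *this;
    }
    Node* addRoot(std::string n) {
        m_roots.push_back(std::make_unique<Node>(std::move(n)));
        return m_roots.back().get();
    }
    Node* root(std::size_t i) const { return m_roots[i].get(); }
private:
    std::vector<std::unique_ptr<Node>> m_roots;
};
```

Taking a snapshot with a copy and restoring it with an assignment gives a simple undo, at the cost of copying the whole structure each time.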

Using * operator when dealing with abstract data types in C++

While doing a project dealing with graph theory, I used objects like such:
class Node{
vector<Node> nodeList;
};
Node n = new Node();
Node a = new Node();
n.nodeList.push_back(a);
After creating about 20 nodes each with an average of 3 connections to other nodes, my program would basically hang.
To fix that, I changed my object declarations to
class Node{
public:
vector<Node*> nodeList;
};
Node* n = new Node();
Node* a = new Node();
n->nodeList.push_back(a);
And my program ran through 50 nodes with 10 connections instantly.
The second example ran faster because I was just adding pointers to the lists, as opposed to the actual nodes, right?
But the C++ documentation says that the new keyword returns a pointer to the created object. Why is the entire object put into the vector in the first example as opposed to just the pointer?
Is there any reason the standard in C++ is to copy the entire object into a data structure instead of a pointer?
EDIT:
I apologize, you are right, the first example should not compile. I don't have the first example on my drive anymore, and I can't remember exactly how it was. Sorry.

Is there any reason the standard in C++ is to copy the entire object into a data structure instead of a pointer?
Traditionally, all standard library container classes work with Value Semantics as opposed to Reference Semantics.
Value semantics means that containers create internal copies of their elements and return copies of those elements, while reference semantics means that containers contain references to the objects that are their elements. The most obvious way to achieve reference semantics is by using pointers as container elements. The standard library uses value semantics because:
Implementing value semantics is simpler.
Reference semantics can be error prone. One needs to deal with the actual object being valid all the time during the life cycle of the container element.
If one needs explicit reference semantics then they can choose to do so by using pointers as container elements.
The first code example you show just cannot work as it is. Using new yields a pointer, because the object it creates lives on the free store, and a variable of non-pointer type cannot hold that pointer. Probably what you actually have in your code is an assignment of a derived class object to a base class object, resulting in object slicing.
If you do need reference semantics, it is a good idea to use smart pointers as container elements rather than the raw pointers you are using now.
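The difference between the two semantics fits in a few lines. This sketch uses a hypothetical `Item` type instead of `Node`: a `vector<Item>` keeps its own copy, a `vector<Item*>` sees later changes to the original, and a `vector<std::unique_ptr<Item>>` gives reference semantics while still owning (and eventually freeing) its elements:

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Item { int value; };

// Value semantics: push_back stores a copy of the argument.
int value_semantics() {
    Item original{1};
    std::vector<Item> items;
    items.push_back(original);
    original.value = 2;  // the stored copy is unaffected
    return items[0].value;
}

// Reference semantics with a raw pointer: `original` must outlive every use of the pointer.
int raw_pointer_semantics() {
    Item original{1};
    std::vector<Item*> items;
    items.push_back(&original);
    original.value = 2;  // visible through the stored pointer
    return items[0]->value;
}

// Reference semantics with ownership: the vector frees its elements itself.
int unique_ptr_semantics() {
    std::vector<std::unique_ptr<Item>> items;
    items.push_back(std::make_unique<Item>(Item{1}));
    Item* observer = items[0].get();  // non-owning view of the element
    observer->value = 2;
    return items[0]->value;
}
```

The three functions return 1, 2 and 2 respectively.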

But the C++ documentation says that the new keyword returns a pointer
to the created object. Why is the entire object put into the vector in
the first example as opposed to just the pointer?
It doesn't. This code:
class Node{
vector<Node> nodeList;
};
Node n = new Node();
Node a = new Node();
n.nodeList.push_back(a);
simply won't compile; you can't assign a pointer to a value or reference variable.
The vector will hold whatever you specify in the template parameter, be it a pointer or a whole object (by value):
vector<Node> nodeList;
This vector will be a collection of Node objects (by value), and you can only push back Node objects by value.
vector<Node*> nodeList;
This is a vector of pointers to Node, and you can only push back pointers to Node.

But the C++ documentation says that the new keyword returns a pointer to the created object. Why is the entire object put into the vector in the first example as opposed to just the pointer?
It's probably not... the only way Node a = new Node(); can compile is if there's a Node(Node*) or Node(const Node*) constructor, which Steven would presumably have told us about by now given the comments.
So - my recent SO mantra - 'show us the code or it didn't happen'. ;-P
Is there any reason the standard in C++ is to copy the entire object into a data structure instead of a pointer?
Containers are designed to have value semantics. That's very flexible, as you can choose to have raw- or any-of-many smart-pointers in the container if that suits your purposes. If the containers were designed to extract and store raw pointers to the objects being push_back()ed then there would be more decisions and necessary inefficiencies. For example, a container of doubles or ints doesn't want to store each one indirectly in separate (non-contiguous) heap memory - that'd have a large performance hit.

C++ Linked List remove all

So this is a bit of a conceptual question. I'm writing a LinkedList in C++, and as Java is my first language, I started writing my removeAll function so that it just joins the head and the tail nodes (I'm using sentinel Nodes btw). But I instantly realized that this won't work in C++ because I have to free the memory for the Nodes!
Is there some way around iterating through the entire list, deleting every element manually?

You can make each node own the next one, i.e. be responsible for destroying it when it is destroyed itself. You can do this by using a smart pointer like std::unique_ptr:
struct node {
// blah blah
std::unique_ptr<node> next;
};
Then you can just destroy the first node and all the others will be accounted for: they will all be destroyed in a chain reaction of unique_ptr destructors.
If this is a doubly-linked list, you should not use unique_ptrs in both directions, however. That would make each node own the next one, and be owned by the next one! You should make this ownership relation exist only in one direction. In the other use regular non-owning pointers: node* previous;
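Here is a sketch of the doubly-linked layout described above, with a hypothetical `List` class owning the head. One caveat: letting the `unique_ptr` chain reaction destroy the list recurses once per node, so a very long list can overflow the stack. The `clear()` below (standing in for removeAll) therefore unlinks nodes one at a time:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

class List {
    struct node {
        int value;
        std::unique_ptr<node> next;  // owning link
        node* previous = nullptr;    // non-owning back link
        explicit node(int v) : value(v) {}
    };
    std::unique_ptr<node> head;
    node* tail = nullptr;
    std::size_t count = 0;

public:
    List() = default;
    List(const List&) = delete;
    List& operator=(const List&) = delete;
    ~List() { clear(); }

    void push_back(int v) {
        auto n = std::make_unique<node>(v);
        n->previous = tail;
        node* raw = n.get();
        if (tail) tail->next = std::move(n); else head = std::move(n);
        tail = raw;
        ++count;
    }
    // Frees every node iteratively instead of through nested destructor calls.
    void clear() {
        while (head) head = std::move(head->next);
        tail = nullptr;
        count = 0;
    }
    std::size_t size() const { return count; }
    int back() const { return tail->value; }  // list must not be empty
};
```

`head = std::move(head->next)` first takes the rest of the list out of the old head and only then deletes the old head, whose `next` is empty by that point, so each deletion does a constant amount of work.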
However, this will not work as is for the sentinel node: it should not be destroyed. How to handle that depends on how the sentinel node is identified and other properties of the list.
If you can tell the sentinel node apart easily, like, for example, checking a boolean member, you can use a custom deleter that avoids deleting the sentinel:
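#include <memory>
struct node; // forward declaration: the deleter must not look inside node yet
struct delete_if_not_sentinel {
void operator()(node* ptr) const; // defined after node is complete
};
typedef std::unique_ptr<node, delete_if_not_sentinel> node_handle;
struct node {
bool is_sentinel = false;
// blah blah
node_handle next;
};
inline void delete_if_not_sentinel::operator()(node* ptr) const {
if(!ptr->is_sentinel) delete ptr;
}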
This stops the chain reaction at the sentinel.
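A sketch of how this deleter could serve the removeAll from the question, assuming a circular list in which the sentinel links to the first element and the last element links back to the sentinel. `circular_list`, `push_front`, `remove_all` and the `destroyed` counter are hypothetical additions for the demonstration:

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct node;
struct delete_if_not_sentinel {
    void operator()(node* ptr) const;  // defined once node is complete
};
typedef std::unique_ptr<node, delete_if_not_sentinel> node_handle;

struct node {
    int value = 0;
    bool is_sentinel = false;
    node_handle next;
    static int destroyed;  // hypothetical instrumentation for the demo
    ~node() { ++destroyed; }
};
int node::destroyed = 0;

inline void delete_if_not_sentinel::operator()(node* ptr) const {
    if (!ptr->is_sentinel) delete ptr;
}

// Circular list: sentinel -> first -> ... -> last -> sentinel.
class circular_list {
    node sentinel;
public:
    circular_list() {
        sentinel.is_sentinel = true;
        sentinel.next.reset(&sentinel);  // empty list: the sentinel links to itself
    }
    ~circular_list() {
        remove_all();
        sentinel.next.release();  // detach the self-link without calling the deleter
    }
    circular_list(const circular_list&) = delete;
    circular_list& operator=(const circular_list&) = delete;

    void push_front(int v) {
        node_handle n(new node);
        n->value = v;
        n->next = std::move(sentinel.next);
        sentinel.next = std::move(n);
    }
    // One reset() starts the chain reaction, which stops at the sentinel.
    void remove_all() { sentinel.next.reset(&sentinel); }
    bool empty() const { return sentinel.next.get() == &sentinel; }
};
```

The destructor calls `release()` on the sentinel's self-link so that the deleter is never invoked on the sentinel while the sentinel itself is being destroyed.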

You could do it like Java if you used a C++ garbage collector. Not many do. In any case, it saves you at most a constant factor in running time, as you pay the cost to allocate each element in the list anyway.

Yes. Well, sort of... If you implement your list to use a memory pool then it is responsible for all data in that pool and the entire list can be deleted by deleting the memory pool (which may contain one or more large chunks of memory).
When you use memory pools, you generally have at least one of the following considerations:
limitations on how your objects are created and destroyed;
limitations on what kind of data you can store;
extra memory requirements on each node (to reference the pool);
a simple, intuitive pool versus a complex, confusing pool.
I am no expert on this. Generally when I've needed fast memory management it's been for memory that is populated once, with no need to maintain free-lists etc. Memory pools are much easier to design and implement when you have specific goals and design constraints. If you want some magic bullet that works for all situations, you're probably out of luck.
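As a small illustration of the pool idea (a hypothetical `PooledList`, not a general-purpose allocator), `std::deque` can stand in for the pool: it allocates elements in blocks and keeps references to existing elements valid when growing at the back, so nodes can link to each other with raw pointers. Removing everything is then a single `clear()` on the pool:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

class PooledList {
    struct Node { int value; Node* next; };
    std::deque<Node> pool;  // owns the storage of every node
    Node* head = nullptr;
    Node* tail = nullptr;

public:
    PooledList() = default;
    PooledList(const PooledList&) = delete;  // copied links would point into the other pool
    PooledList& operator=(const PooledList&) = delete;

    void push_back(int v) {
        pool.push_back(Node{v, nullptr});
        Node* n = &pool.back();  // stays valid: push_back does not move existing elements
        if (tail) tail->next = n; else head = n;
        tail = n;
    }
    // removeAll: hand everything back to the pool at once instead of walking the links.
    void remove_all() {
        pool.clear();
        head = tail = nullptr;
    }
    std::size_t size() const { return pool.size(); }
    int sum() const {
        int s = 0;
        for (const Node* n = head; n; n = n->next) s += n->value;
        return s;
    }
};
```

This also shows one of the limitations listed above: removing a single element would unlink it, but its memory would not be reused until the whole pool is cleared.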
