N-Ary Tree Design with Smart Pointers - c++

I'm trying to design a tree class in C++, but I'm running into some trouble with node destruction.
If I destroy a node, I don't want to destroy its entire sub-tree, because something else might be pointing to it. So the obvious solution is to use reference counting. I'd have a weak pointer to the parent, and a vector of shared pointers to the child nodes. That way, if a node is destroyed, its children are only destroyed if nothing else is pointing to them.
But I run into another problem here: adding a child to a node. weak_ptr only works if there's already a shared_ptr pointing to an object. And if I'm adding a child to a node, I don't know where to find a shared_ptr that's pointing to that node. So what do I do here?

To expand on David Rodriguez's idea, a skeleton tree might look like this:
#include <memory>
#include <vector>

struct node : std::enable_shared_from_this<node>
{
    std::vector<std::shared_ptr<node>> children;
    std::weak_ptr<node> parent;
    void add_child()
    {
        auto n = std::make_shared<node>();
        n->parent = shared_from_this();  // weak_ptr built from the shared_ptr that owns *this
        children.emplace_back(n);
    }
};
auto root = std::make_shared<node>();
root->add_child();
root->add_child();
root->add_child();
root->children[0]->add_child();
(Of course a real-world node would have a non-trivial constructor with payload values, and add_child would take similar arguments or be a template...)
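The parenthetical above could be fleshed out roughly as follows. This is only a sketch under assumptions: the payload member `value`, the single-argument constructor, and the fact that `add_child` returns the new child are all illustrative choices, not part of the original answer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Sketch: a node carrying a payload of type T (member name `value` is an assumption).
template <class T>
struct node : std::enable_shared_from_this<node<T>>
{
    T value;
    std::vector<std::shared_ptr<node>> children;
    std::weak_ptr<node> parent;

    explicit node(T v) : value(std::move(v)) {}

    // Creates a child holding `v`, links it to this node, and returns it.
    std::shared_ptr<node> add_child(T v)
    {
        auto n = std::make_shared<node>(std::move(v));
        n->parent = this->shared_from_this();  // *this must already be owned by a shared_ptr
        children.push_back(n);
        return n;
    }
};

// Usage sketch.
inline void payload_example()
{
    auto root = std::make_shared<node<std::string>>("root");
    auto leaf = root->add_child("leaf");
    assert(leaf->parent.lock() == root);
}
```

Because the base class depends on the template parameter, `shared_from_this()` has to be called through `this->` here.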

You might want to look into enable_shared_from_this, which allows you to obtain a shared_ptr directly from the object. The object still has to be managed by a shared_ptr, but you don't need to find out who is holding it.
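To illustrate the caveat that the object must already be managed by a shared_ptr, here is a minimal sketch (the `widget` type and the `can_share` helper are hypothetical). Since C++17, calling shared_from_this() on an object that no shared_ptr owns throws std::bad_weak_ptr; before C++17 the behaviour was undefined.

```cpp
#include <cassert>
#include <memory>

struct widget : std::enable_shared_from_this<widget> {};  // hypothetical type

// Returns true if shared_from_this() succeeds for the given object.
inline bool can_share(widget& w)
{
    try {
        auto p = w.shared_from_this();
        return p != nullptr;
    } catch (const std::bad_weak_ptr&) {  // thrown since C++17 when nothing owns w
        return false;
    }
}

// Usage sketch.
inline void share_example()
{
    auto owned = std::make_shared<widget>();
    assert(can_share(*owned));      // managed by a shared_ptr: works

    widget on_stack;
    assert(!can_share(on_stack));   // not managed by any shared_ptr: fails
}
```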

Related

Can we implement a linked list without a head pointer, i.e. by using a plain variable for the head instead of a pointer to the head?

Yes. If you are implementing a circular linked list with a sentinel node, the sentinel node can be the simple variable that also serves as the head.
Alternatively, you could use a std::optional instance to serve as the head.
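The sentinel idea might look roughly like the sketch below. All names (`Link`, `IntNode`, `CircularList`) are made up for illustration, and the list is restricted to `int` to keep it short. The sentinel is a plain member that only holds the two links, so no payload has to be default-constructed for it.

```cpp
#include <cassert>

// Link part shared by the sentinel and the real nodes; it carries no value.
struct Link
{
    Link* prev;
    Link* next;
};

struct IntNode : Link
{
    int value;
};

class CircularList
{
    Link sentinel;  // the head is a plain member variable, not a pointer
public:
    CircularList() { sentinel.prev = sentinel.next = &sentinel; }
    CircularList(const CircularList&) = delete;             // copying left out of this sketch
    CircularList& operator=(const CircularList&) = delete;
    ~CircularList() { while (!empty()) pop_front(); }

    bool empty() const { return sentinel.next == &sentinel; }

    void push_back(int v)
    {
        IntNode* n = new IntNode;
        n->value = v;
        n->prev = sentinel.prev;
        n->next = &sentinel;
        sentinel.prev->next = n;
        sentinel.prev = n;
    }

    void pop_front()  // precondition: !empty()
    {
        Link* first = sentinel.next;
        sentinel.next = first->next;
        first->next->prev = &sentinel;
        delete static_cast<IntNode*>(first);
    }

    int front() const  // precondition: !empty()
    {
        return static_cast<const IntNode*>(sentinel.next)->value;
    }
};

// Usage sketch.
inline void sentinel_example()
{
    CircularList l;
    l.push_back(1);
    l.push_back(2);
    assert(l.front() == 1);
    l.pop_front();
    assert(l.front() == 2);
}
```

Note that an empty list needs no special flag here: the list is empty exactly when the sentinel links to itself.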
In specific cases you could, but in general not. And why would you want to? Here are some reasons I can think of right now. Take for example this code:
template<class T>
class Node
{
private:
T value;
Node<T> *next;
};
template<class T>
class MyLinkedList
{
private:
bool isEmpty; // indicates whether the list is empty or not
Node<T> head; // Head as member
};
But there are several major flaws with this code:
1. You always need to check isEmpty when adding, deleting, or doing anything else with the list.
2. You can't initialize head if T has no default constructor.
3. When deleting the last element, you have to call the destructor of an object that technically remains in scope.
4. When deleting the last element and then destroying the empty list, the destructor of Node::value will be called twice.
I don't know if those are all the reasons, but I think #2 alone is a big enough problem to rule this out.
Of course you could use std::optional, but that is used much like a pointer in a wrapper. It even works without a default constructor, so it could be an alternative. Still, since it would be used in the same way as a (smart) pointer, it's not really "a simple variable of the head".

C++ Linked list using smart pointers

I have only been using raw pointers for linked list with templates. For example, the member data, Node<T>* head; and when I am inserting a node one of the lines would be head = new Node<T>(data);.
However, now I need to use a smart pointer and I am not sure how I would change it to use smart pointers. Would the member data be changed to shared_ptr<Node<T>> head; and the other line would change to
head = shared_ptr<Node<T>>( new Node<T>(data) );?
You do not "need" to use a smart pointer for a linked list, because that statement doesn't make sense. You do not use smart pointers for low-level data structures. You use smart pointers for high-level program logic.
As far as low-level data structures are concerned, you use a standard container class from the C++ standard library, like std::list [*], which solves all your memory-management problems anyway, without using any smart pointers internally.
If you really really need your own highly specialised/optimised custom container class because the entire C++ standard library is unfit for your requirements and you need a replacement for std::list, std::vector, std::unordered_map and other optimised, tested, documented and safe containers – which I very much doubt! –, then you have to manage memory manually anyway, because the point of such a specialised class will almost certainly be the need for techniques like memory pools, copy-on-write or even garbage collection, all of which conflict with a typical smart pointer's rather simplistic deletion logic.
In the words of Herb Sutter:
Never use owning raw pointers and delete, except in rare cases when
implementing your own low-level data structure (and even then keep
that well encapsulated inside a class boundary).
Something along those lines is also expressed in Herb Sutter's and Bjarne Stroustrup's C++ Core Guidelines:
This problem cannot be solved (at scale) by transforming all owning
pointers to unique_ptrs and shared_ptrs, partly because we need/use
owning "raw pointers" as well as simple pointers in the implementation
of our fundamental resource handles. For example, common vector
implementations have one owning pointer and two non-owning pointers.
Writing a linked-list class in C++ with raw pointers can be a useful academic exercise. Writing a linked-list class in C++ with smart pointers is a pointless academic exercise. Using either of these two self-made things in production code is almost automatically wrong.
[*] Or just std::vector, because due to cache locality that will almost always be the better choice anyway.
There are basically two alternatives to set up a smart-pointer enhanced list:
Using std::unique_ptr:
template<typename T>
struct Node
{
Node* _prev;
std::unique_ptr<Node> _next;
T data;
};
std::unique_ptr<Node<T> > root; //inside list
That would be my first choice. The unique_ptr _next ensures that there are no memory leaks, whereas _prev is an observing pointer. However, the copy constructor and similar operations, in case you need them, have to be defined and implemented by hand.
Using shared_ptr:
template<typename T>
struct Node
{
std::weak_ptr<Node> _prev; //or as well Node*
std::shared_ptr<Node> _next;
T data;
};
std::shared_ptr<Node<T> > root; //inside list
This alternative is copyable by design and adds further safety because of the weak_ptr, see below. It is less performant than the unique_ptr variant when it comes to structural changes of the list, such as insertions and removals, e.g. because of the atomic reference counting in shared_ptr's control block.
Yet, traversing the list, i.e. dereferencing the pointers, should be as performant as for the unique_ptr.
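As a rough sketch of the first (unique_ptr) alternative, a list could link its nodes like this. The `List` wrapper, its `push_front` and `front` members, and the `Node` constructor are assumptions added for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

template <typename T>
struct Node
{
    Node* _prev = nullptr;         // observing pointer: never new'd or deleted through
    std::unique_ptr<Node> _next;   // owns the rest of the list
    T data;
    explicit Node(T d) : data(std::move(d)) {}
};

template <typename T>
class List  // hypothetical wrapper; move-only because of the unique_ptr member
{
    std::unique_ptr<Node<T>> root;
public:
    void push_front(T value)
    {
        auto n = std::make_unique<Node<T>>(std::move(value));
        if (root) root->_prev = n.get();  // old first node now observes the new one
        n->_next = std::move(root);       // new node takes ownership of the old chain
        root = std::move(n);
    }
    Node<T>* front() { return root.get(); }  // observing access only
};

// Usage sketch.
inline void unique_list_example()
{
    List<int> li;
    li.push_front(1);
    li.push_front(2);
    assert(li.front()->data == 2);
    assert(li.front()->_next->_prev == li.front());
}
```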
In both approaches the idea is that one node owns the complete remaining list. When a node goes out of scope, there is no danger of the remaining list leaking, because destroying the node destroys its successors as well (the destructor calls nest, so the last node is the first to finish being destroyed).
The _prev pointer is in both options only an observing pointer: its task is not to keep the previous nodes alive, but only to provide a link to visit them.
For that, a Node * is usually sufficient (note: "observing pointer" means you never do memory-related things like new or delete on the pointer).
If you want more safety, you can also use a std::weak_ptr, which guards against things like
std::shared_ptr<Node<T> > n;
{
list<T> li;
//fill the list
n = li.root->_next->_next; //let's say that works for this example
}
n->_prev; //dangling pointer, the previous list does not exist anymore
Using a weak_ptr, you can lock() it and in this way check whether _prev is still valid.
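A minimal sketch of that check, using a simplified non-template stand-in for the shared_ptr node above (the helper `prev_data_or` is hypothetical):

```cpp
#include <cassert>
#include <memory>

struct Node  // simplified stand-in, int payload only
{
    std::weak_ptr<Node> _prev;
    std::shared_ptr<Node> _next;
    int data = 0;
};

// Returns the previous node's data, or `fallback` if that node no longer exists.
inline int prev_data_or(const Node& n, int fallback)
{
    if (auto p = n._prev.lock())  // lock() yields an empty shared_ptr once the node is gone
        return p->data;
    return fallback;
}

// Usage sketch.
inline void weak_prev_example()
{
    auto first = std::make_shared<Node>();
    first->data = 1;
    auto second = std::make_shared<Node>();
    second->data = 2;
    first->_next = second;
    second->_prev = first;

    assert(prev_data_or(*second, -1) == 1);
    first.reset();  // destroys the first node; `second` is still held by our handle
    assert(prev_data_or(*second, -1) == -1);
}
```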
I would look at the interface of std::list, which is the C++ standard library's linked list. It seems that you are approaching the templating of your linked list class the wrong way. Ideally your linked list should not care about ownership semantics (i.e. whether it is instantiated with raw pointers, smart pointers or stack-allocated variables). An example of ownership semantics with STL containers follows. However, there are better examples of STL and ownership from more authoritative sources.
#include <iostream>
#include <list>
#include <memory>
using namespace std;
int main()
{
// Unique ownership.
unique_ptr<int> int_ptr = make_unique<int>(5);
{
// list of uniquely owned integers.
list<unique_ptr<int>> list_unique_integers;
// Transfer of ownership from my parent stack frame to the
// unique_ptr list.
list_unique_integers.push_back(move(int_ptr));
} // list is destroyed and the integers it owns.
// Accessing the integer here is not a good idea.
// cout << *int_ptr << endl;
// You can make a new one though.
int_ptr.reset(new int(6));
// Shared ownership.
// Create a pointer we intend to share.
shared_ptr<int> a_shared_int = make_shared<int>(5);
{
// A list that shares ownership of integers with anyone that has
// copied the shared pointer.
list<shared_ptr<int>> list_shared_integers;
list_shared_integers.push_back(a_shared_int);
// Editing and reading obviously works.
const shared_ptr<int> a_ref_to_int = list_shared_integers.back();
(*a_ref_to_int)++;
cout << *a_ref_to_int << endl;
} // list_shared_integers goes out of scope, but the integer is not as a
// "reference" to it still exists.
// a_shared_int is still accessible.
(*a_shared_int)++;
cout << (*a_shared_int) << endl;
} // now the integer is deallocated because the shared_ptr goes
// out of scope.
A good exercise to understand ownership, memory allocation/deallocation, and shared pointers is to do a tutorial where you implement your own smart pointers. Then you will understand exactly how to use smart pointers and you will have one of those zen moments where you realise how pretty much everything in C++ comes back to RAII (ownership of resources).
So back to the crux of your question. If you want to stick to Nodes of type T, don't wrap the node in a smart pointer. The Node destructor must delete the underlying raw pointer. The raw pointer may point to a smart pointer itself specified as T. When your "LinkedList"'s class destructor is called it iterates through all Nodes with Node::next and calls delete node; after it obtained the pointer to the next node.
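The destructor described above could be sketched as follows. The `push_front` member and the deleted copy operations are assumptions added only to make the sketch usable; the element type in the usage example is a shared_ptr so that the release of each node can be observed.

```cpp
#include <cassert>
#include <memory>

template <class T>
struct Node
{
    T value;
    Node* next;
};

template <class T>
class LinkedList
{
    Node<T>* head = nullptr;
public:
    LinkedList() = default;
    LinkedList(const LinkedList&) = delete;             // copying left out of this sketch
    LinkedList& operator=(const LinkedList&) = delete;

    ~LinkedList()
    {
        Node<T>* node = head;
        while (node) {
            Node<T>* next = node->next;  // fetch the successor first...
            delete node;                 // ...then delete the current node
            node = next;
        }
    }

    void push_front(const T& v) { head = new Node<T>{v, head}; }
};

// Usage sketch: T is itself a smart pointer, the list nodes are raw.
inline void raw_list_example()
{
    auto tracker = std::make_shared<int>(0);
    {
        LinkedList<std::shared_ptr<int>> list;
        list.push_front(tracker);
        list.push_front(tracker);
        assert(tracker.use_count() == 3);
    }  // ~LinkedList deletes both nodes, releasing their shared_ptr copies
    assert(tracker.use_count() == 1);
}
```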
You could create a list where nodes are smart pointers... but this is a very specialised linked list, probably called SharedLinkedList or UniqueLinkedList, with very different semantics for object creation, popping, etc. Just as an example, a UniqueLinkedList would move a node into the return value when popping a value to a caller. Doing metaprogramming for this problem would require partial specialization for the different types of T passed. For example, something like:
template<class T>
struct LinkedList
{
Node<T> *head;
};
// The very start of a LinkedList with shared ownership. In all your access
// methods, etc... you will be returning copies of the appropriate pointer,
// therefore creating another reference to the underlying data.
template<class T>
struct LinkedList<std::shared_ptr<T>>
{
std::shared_ptr<Node<T>> head;
};
Now you start implementing your own STL! You can already see potential for problems as mentioned in the comments to your question with this approach. If nodes have shared_ptr next it will result in a call to that shared Node's destructor, which will call the next shared Node destructor and so forth (stack overflow due to the recursion is possible). So that is why I don't care much for this approach.
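One common way to sidestep the recursive destruction is to unlink the nodes one at a time in the list's destructor. The sketch below uses unique_ptr rather than shared_ptr to keep it short, and the `List` class with `push_front` and `size` is hypothetical; the same loop can be applied to a shared_ptr chain when the list is the only owner of its nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

struct Node
{
    int data = 0;
    std::unique_ptr<Node> next;
};

class List
{
    std::unique_ptr<Node> head;
    std::size_t count = 0;
public:
    List() = default;

    ~List()
    {
        // Detach one node at a time, so every Node destructor sees an empty
        // `next` and the destructor calls never nest.
        while (head) {
            std::unique_ptr<Node> rest = std::move(head->next);
            head = std::move(rest);  // destroys only the old head
        }
    }

    void push_front(int v)
    {
        auto n = std::make_unique<Node>();
        n->data = v;
        n->next = std::move(head);
        head = std::move(n);
        ++count;
    }

    std::size_t size() const { return count; }
};

// Usage sketch: a long list is torn down without deep recursion.
inline void iterative_destroy_example()
{
    List list;
    for (int i = 0; i < 100000; ++i) list.push_front(i);
    assert(list.size() == 100000);
}
```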
Structure will look like
template<typename T> struct Node
{
T data;
shared_ptr<Node<T>> next;
};
Creating a node will look like
shared_ptr<Node<int>> head(new Node<int>);
or
auto head = make_shared<Node<int>>(Node<int>{ 1, nullptr });
Don't use smart pointers in graph-like data structures, because they may cause a stack overflow and performance problems: destructors are called recursively, and the reference count is incremented and decremented over and over, which is wasteful given how DFS and BFS algorithms traverse the structure.

unique_ptr in class how to work with them

I am implementing AVL tree in C++ and using unique_ptr for children.
struct Node
{
const int key;
std::unique_ptr<Node> left, right;
Node* parent;
std::size_t height; ///< for avl tree.
Node(const int key) : key(key), height(0) {}
};
class AVL
{
std::unique_ptr<Node> root;
public:
AVL(int rootKey) : root(std::unique_ptr<Node>(new Node(rootKey))) {
}
void insert(std::unique_ptr<Node> newNode) {
std::unique_ptr<Node> & node = root;
Node* parentWeak;
while(node.get()) {
parentWeak = node->parent;
if (node->key == newNode->key)
throw std::runtime_error("Key already present");
if (node->key < newNode->key)
node = node->right;
else
node = node->left;
}
auto parent = parentWeak;
const int key = newNode->key;
if (parent == nullptr) {
// there is no root
root = std::move(newNode);
} else {
if (parent->key < newNode->key) {
assert(NULL == parent->right.get());
parent->right = std::move(newNode);
} else {
assert(NULL == parent->left.get());
parent->left = std::move(newNode);
}
}
// Now increment the height upto down.
incrementHeight(key);
// balance starting from parent upwards until we find some imbalance in height
balance(parent, key);
}
};
I am getting compiler errors on the line node = node->right;. That is correct, because the assignment is only possible with std::move semantics. But moving would be wrong, because I just want to iterate over the tree; moving would remove the nodes from their parents.
However, I also need the unique_ptr, because it is passed to the function balance, which modifies the pointers and rebalances the tree.
If I use shared_ptr it all works. However, I do not need to share the ownership with others. Or am I misunderstanding ownership?
Your problem seems to be caused by a lack of understanding of how to use unique_ptr in real programs, which is related to the concept of ownership. If something owns an object, it means that this something is responsible for keeping the object alive as long as it keeps owning the object, and is responsible for destroying the object as soon as nothing owns it anymore.
Both unique_ptr and shared_ptr can be used to own objects. The difference, you seem to be aware of, is that an object pointed to by unique_ptr can only have a single owner, while there might be multiple shared_ptr objects sharing ownership of a specific object. If a unique_ptr is destroyed or assigned a different value, by definition it can destroy the object it previously pointed to, as a unique_ptr is the single (unique) owner of an object.
Now you have to think about your tree: You can use shared_ptr for everything, which will likely (seem to) work, as objects are kept alive as long as there are references to them. If there really is the parent member in node which you use in your method but did not declare in the node structure, you would be likely to create reference cycles, though, creating the danger of keeping objects around way too long (or even forever, which is called a memory leak), as shared_ptr in C++ is purely reference-counted. Two objects containing shared_ptrs pointing to each other keep themselves alive forever, even if no other pointer points to them. It seems like in your shared_ptr solution, the parent member was a weak_ptr, which is a sensible way to work around this problem, although possibly not the most efficient one.
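The difference between an owning and a non-owning back-pointer can be observed with a small experiment. The types `Strong` and `Weak` and the helper `parent_survives_scope` below are hypothetical; the helper breaks the cycle again at the end so that the demo itself does not leak.

```cpp
#include <cassert>
#include <memory>

struct Strong  // both directions own: parent and child form a cycle
{
    std::shared_ptr<Strong> parent;
    std::shared_ptr<Strong> child;
};

struct Weak    // the child refers back to its parent without owning it
{
    std::weak_ptr<Weak> parent;
    std::shared_ptr<Weak> child;
};

// Returns true if the parent object is still alive after all outside
// handles to it have gone out of scope.
template <class NodeT>
bool parent_survives_scope()
{
    std::weak_ptr<NodeT> observer;
    {
        auto p = std::make_shared<NodeT>();
        p->child = std::make_shared<NodeT>();
        p->child->parent = p;
        observer = p;
    }
    const bool alive = !observer.expired();
    if (auto p = observer.lock())
        p->child.reset();  // break the cycle so this demo cleans up after itself
    return alive;
}

// Usage sketch.
inline void cycle_example()
{
    assert(parent_survives_scope<Strong>());  // cycle keeps both objects alive
    assert(!parent_survives_scope<Weak>());   // weak back-pointer lets them die
}
```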
You seem to want to improve performance and strictness of your code by using unique_ptr instead of shared_ptr, which is commonly accepted as a very good idea, as it forces you to deal with ownership in much greater detail. Your choice that the tree owns the root node, and each node owns its children, is a sound design. You seem to have removed the parent pointer, because it can not be a unique_ptr: in that case, a node would be owned by its parent and by any children it might have, violating the constraint that an object pointed to by unique_ptr may only have one owner. Also, the parent member can not be a weak_ptr, as weak_ptr can only be used with objects managed by shared_ptr. If you want to translate a design from shared_ptr to unique_ptr, you should consider changing weak_ptrs into raw pointers. A non-owning pointer to an object managed by unique_ptr that detects expiration of that object does not exist (it would not be efficiently implementable with the typical C++ memory management). If you need the property of being able to detect a non-owning pointer to be stale, keep using shared_ptr. The overhead for tracking non-owning pointers is almost as big as full shared-ownership semantics, so there is no middle ground in the standard library.
Finally, let's discuss the insert method. The node variable quite surely is not what you want. You correctly found out (possibly by a compiler error message) that node can not be a unique_ptr, as that would take away ownership from the tree object. In fact, having this variable refer to the root pointer in the tree is the right solution, as you don't want to mess around with ownership at this point, but just want to be able to get a grip on some node. But declaring it as a reference does not fit to the way you want to use it, because in C++ you can't re-seat a reference. What you do is you declare node to be just another name for this->root, so if you assign to node, you are overwriting your root node pointer. I am sure this is not what you intended. Instead, you want node to refer to a different object than it referred to before, so it needs to be something that references the root node and can be made to refer to something else. In C++, this means you want a pointer (as Jarod42 said in the comment). You have two choices at hand for the loop scanning the position where to insert:
Use a raw pointer to node instead of a unique_ptr to node. As you don't need ownership, a raw pointer to node is good enough: You can be sure the owning pointer (this->root) keeps the node alive as long as you need it, so there is no danger of the object disappearing.
Use a raw pointer to unique_ptr to node. This is essentially your approach, fixed to use a pointer instead of a reference.
As you say, you later need the unique_ptr to pass it to the balance function. If the balance function works out as it is now, and needs a unique_ptr argument, the decision is made: Having a copy of the raw pointer in node just doesn't do what you want, so you need the pointer-to-unique_ptr.
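The second choice, a raw pointer to unique_ptr, could look roughly like this. It is a sketch of a plain, unbalanced insertion only: the free function `bst_insert` is hypothetical, and height bookkeeping and rebalancing are left out.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

struct Node
{
    const int key;
    std::unique_ptr<Node> left, right;
    Node* parent = nullptr;  // non-owning back-pointer
    explicit Node(int k) : key(k) {}
};

// Inserts newNode below root and returns an observing pointer to it.
inline Node* bst_insert(std::unique_ptr<Node>& root, std::unique_ptr<Node> newNode)
{
    std::unique_ptr<Node>* link = &root;  // points at the owning slot; can be re-seated
    Node* parent = nullptr;
    while (*link) {
        parent = link->get();
        if (parent->key == newNode->key)
            throw std::runtime_error("Key already present");
        link = (parent->key < newNode->key) ? &parent->right : &parent->left;
    }
    newNode->parent = parent;
    *link = std::move(newNode);  // ownership moves into the empty slot
    return link->get();
}

// Usage sketch.
inline void bst_example()
{
    std::unique_ptr<Node> root;
    bst_insert(root, std::make_unique<Node>(5));
    Node* three = bst_insert(root, std::make_unique<Node>(3));
    assert(root->left.get() == three);
    assert(three->parent == root.get());
}
```

The loop only re-seats `link`; no unique_ptr is ever assigned during traversal, so ownership stays with the tree until the final move into the empty slot.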

Init reference with invalid value

I have a class:
class node
{
public:
node& parent;
};
I want to set the parent value when I know its right value:
node parent;
...
node n; // here n.parent does not have a valid value yet
n.parent = parent;
But I have to set its value in the constructor too. How can I do this?
You can't change what variable a reference references. So if you can't initialize it in the constructor, you don't want a reference. You can use a regular pointer, but it's probably better to use some kind of smart pointer appropriate to your particular use. The correct answer depends primarily on how the lifetime of the referenced object is managed.
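With a regular pointer, the simplest variant might look like the sketch below. The null default and the `has_parent` helper are assumptions, and the caller remains responsible for the parent outliving any node that points to it.

```cpp
#include <cassert>

class node
{
public:
    node* parent = nullptr;  // non-owning; stays null until the parent is known

    bool has_parent() const { return parent != nullptr; }
};

// Usage sketch.
inline void parent_pointer_example()
{
    node parent;
    node n;                  // n.parent is null here, which is a valid "no parent" state
    assert(!n.has_parent());

    n.parent = &parent;      // set later, once the right value is known
    assert(n.has_parent());
    assert(n.parent == &parent);
}
```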
If you want to use references and not pointers because you want to suggest that the class instance does not own the parent node, then you can use std::reference_wrapper from C++11's <functional>.
I would advise against using smart pointers (except maybe std::weak_ptr) if the parent holds references to the children and the children hold references to the parent. Using smart pointers in this case would create a cyclic dependency, which means your objects would never get destroyed.

C++ vector implementation - removing elements

I'm implementing a vector type. I'm not troubled by the algorithms or the data structure at all but I am unsure about a remove method. for instance:
bool Remove(Node* node)
{
/* rearrange all the links and extract the node */
delete node;
}
where node is a pointer to the current node that we are at. But if I delete node then how do I prevent this from happening:
Node* currentNode = MoveToRandNode();
Remove(currentNode);
cout << currentNode->value;
If currentNode were a pointer to a pointer it would be easier but...it's not.
You could add another level of abstraction to your iterator (which now is a raw pointer)
If you do not hand out raw pointers but create some sort of iterator class instead, it is possible to invalidate the iterator, so that any attempt to access it after the node has been removed fails in a controlled way.
class Iterator {
public:
Node& operator*() {
if (node) return *node;
throw Something(); // placeholder for a suitable exception type
}
private:
Node* node; // set to nullptr when the node is removed
};
Of course this wrapping of a pointer will come at a cost of some overhead (checking the pointer on each deref). So you will have to decide how safe you want to play. Either document as suggested by others or wrap for safety.
Step back first. You need to define who "owns" the memory pointed to by the vector. Is it the vector itself, or the code that uses the vector? Once you define this, the answer will be easy - either Remove() method should always delete it or never.
Note that you've just scratched the surface of the possible bugs, and your answer to "who owns it" will help with other possible issues like:
If you copy a vector, do you need to copy the items within it, or just the pointers (i.e. do a shallow or a deep copy)?
When you destroy a vector, should you destroy the items within it?
When you insert an item, should you make a copy of the item, or does the vector take ownership of it?
Well, you cannot do that, but some modifications to your code can improve safety.
Pass the pointer by reference:
bool Remove(Node*& node)
{
/* rearrange all the links and extract the node */
delete node;
node = nullptr;
return true;
}
Check for nullptr:
if(currentNode)
cout << currentNode->value;
You could probably also try std::shared_ptr.
This is similar to "iterator invalidation". E.g., if you have a std::list l and a std::list::iterator it pointing into that list, and you call l.erase(it), then the iterator it is invalidated -- i.e., if you use it in any way then you get undefined behavior.
So following that example, you should include in your documentation of the Remove method something along the lines: "the pointer node is invalidated, and may not be used or dereferenced after this method returns."
(Of course, you could also just use std::list, and not bother to re-invent the wheel.)
For more info on iterator invalidation, see: http://www.angelikalanger.com/Conferences/Slides/CppInvalidIterators-DevConnections-2002.pdf
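For comparison, this is how the invalidation rule is usually handled with std::list itself: erase() returns the iterator following the erased element, so the loop continues with that one instead of the invalidated iterator. The function `remove_even` is just an illustrative name.

```cpp
#include <cassert>
#include <list>

// Removes every even value from the list.
inline void remove_even(std::list<int>& l)
{
    for (auto it = l.begin(); it != l.end(); ) {
        if (*it % 2 == 0)
            it = l.erase(it);  // the old `it` is invalid now; continue with the returned one
        else
            ++it;
    }
}

// Usage sketch.
inline void erase_example()
{
    std::list<int> l{1, 2, 3, 4};
    remove_even(l);
    assert((l == std::list<int>{1, 3}));
}
```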
In addition to what innochenti wrote:
I think you have to decide what the expected/desired behavior of cout << currentNode->value; is:
Error - (as innochenti wrote, node = nullptr)
Default value - create a node default_value (which has some default value for its value), and after delete node; do node = default_value