I am implementing an AVL tree in C++ and using unique_ptr for the children.
struct Node
{
    const int key;
    std::unique_ptr<Node> left, right;
    Node* parent;
    std::size_t height; ///< for avl tree.
    Node(const int key) : key(key), parent(nullptr), height(0) {}
};
class AVL
{
    std::unique_ptr<Node> root;
public:
    AVL(int rootKey) : root(std::unique_ptr<Node>(new Node(rootKey))) {
    }
    void insert(std::unique_ptr<Node> newNode) {
        std::unique_ptr<Node> & node = root;
        Node* parentWeak = nullptr;
        while(node.get()) {
            parentWeak = node->parent;
            if (node->key == newNode->key)
                throw std::runtime_error("Key already present");
            if (node->key < newNode->key)
                node = node->right;
            else
                node = node->left;
        }
        auto parent = parentWeak;
        const int key = newNode->key;
        if (parent == nullptr) {
            // there is no root
            root = std::move(newNode);
        } else {
            if (parent->key < newNode->key) {
                assert(NULL == parent->right.get());
                parent->right = std::move(newNode);
            } else {
                assert(NULL == parent->left.get());
                parent->left = std::move(newNode);
            }
        }
        // Now increment the heights from the top down.
        incrementHeight(key);
        // Balance starting from the parent upwards until we find an imbalance in height.
        balance(parent, key);
    }
};
I am getting a compiler error on the line node = node->right;. That is right, because assigning one unique_ptr to another is only possible with std::move semantics. But moving would be wrong, because I just want to iterate over the tree; otherwise it would remove the nodes from the child list.
However, I also need the unique_ptr, as it will be passed to the function balance, which modifies the pointers to re-balance the tree.
If I use shared_ptr, it all works. However, I do not need to share ownership with others. Or am I misunderstanding ownership?
Your problem seems to be caused by a lack of understanding of how to use unique_ptr in real programs, which is related to the concept of ownership. If something owns an object, it is responsible for keeping the object alive as long as it keeps owning the object, and for destroying the object as soon as nothing owns it anymore.
Both unique_ptr and shared_ptr can be used to own objects. The difference, which you seem to be aware of, is that an object pointed to by unique_ptr can only have a single owner, while multiple shared_ptr objects may share ownership of a specific object. If a unique_ptr is destroyed or assigned a different value, it destroys the object it previously pointed to, because by definition a unique_ptr is the single (unique) owner of that object.
Now you have to think about your tree: you can use shared_ptr for everything, which will likely work (and seems to), as objects are kept alive as long as there are references to them. If there really is a parent member in Node, which you use in your method but did not declare in the node structure, you would be likely to create reference cycles, though, creating the danger of keeping objects around far too long (or even forever, which is called a memory leak), as shared_ptr in C++ is purely reference-counted. Two objects containing shared_ptrs pointing to each other keep each other alive forever, even if no other pointer points to them. It seems that in your shared_ptr solution, the parent member was a weak_ptr, which is a sensible way to work around this problem, although possibly not the most efficient one.
You seem to want to improve the performance and strictness of your code by using unique_ptr instead of shared_ptr, which is commonly accepted as a very good idea, as it forces you to deal with ownership in much greater detail. Your choice that the tree owns the root node, and each node owns its children, is a sound design. You seem to have removed the parent pointer, because it cannot be a unique_ptr: in that case, a node would be owned by its parent and by any children it might have, violating the constraint that an object pointed to by unique_ptr may only have one owner. Also, the parent member cannot be a weak_ptr, as weak_ptr can only be used with objects managed by shared_ptr. If you want to translate a design from shared_ptr to unique_ptr, you should consider changing weak_ptrs into raw pointers. A non-owning pointer to an object managed by unique_ptr that detects expiration of that object does not exist (it could not be implemented efficiently with typical C++ memory management). If you need to be able to detect that a non-owning pointer is stale, keep using shared_ptr. The overhead of tracking non-owning pointers is almost as big as that of full shared-ownership semantics, so there is no middle ground in the standard library.
Finally, let's discuss the insert method. The node variable is quite surely not what you want. You correctly found out (possibly from a compiler error message) that node cannot be a unique_ptr, as that would take ownership away from the tree object. In fact, having this variable refer to the root pointer in the tree is the right idea, as you don't want to mess around with ownership at this point, but just want to get hold of some node. But declaring it as a reference does not fit the way you want to use it, because in C++ you can't re-seat a reference. What you do is declare node to be just another name for this->root, so if you assign to node, you are overwriting your root node pointer. I am sure this is not what you intended. Instead, you want node to refer to a different object than it referred to before, so it needs to be something that references the root node and can be made to refer to something else. In C++, this means you want a pointer (as Jarod42 said in the comment). You have two choices at hand for the loop that scans for the position to insert at:
Use a raw pointer to node instead of a unique_ptr to node. As you don't need ownership, a raw pointer to node is good enough: you can be sure the owning pointer (this->root) keeps the node alive as long as you need it, so there is no danger of the object disappearing.
Use a raw pointer to unique_ptr to node. This is essentially your approach, fixed to use a pointer instead of a reference.
As you say, you later need the unique_ptr to pass it to the balance function. If the balance function stays as it is now and needs a unique_ptr argument, the decision is made: a copy of the raw node pointer just doesn't do what you want, so you need the pointer-to-unique_ptr.
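A minimal sketch of the second option, assuming the question's Node layout (with parent initialized to nullptr) and a default-constructed AVL; the keys() and rootNode() helpers are hypothetical and exist only for checking, and height updates and balance() are left out:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// The question's Node, with parent initialized; parent never owns anything.
struct Node
{
    const int key;
    std::unique_ptr<Node> left, right;
    Node* parent = nullptr;
    std::size_t height = 0;
    explicit Node(int k) : key(k) {}
};

class AVL
{
    std::unique_ptr<Node> root;

    static void inorder(const Node* n, std::vector<int>& out)
    {
        if (!n) return;
        inorder(n->left.get(), out);
        out.push_back(n->key);
        inorder(n->right.get(), out);
    }

public:
    void insert(std::unique_ptr<Node> newNode)
    {
        assert(newNode);
        // `link` is a raw pointer to an owning unique_ptr (root, or some
        // node's left/right). Re-seating it walks the tree without
        // touching ownership.
        std::unique_ptr<Node>* link = &root;
        Node* parent = nullptr;
        while (*link) {
            parent = link->get();
            if (parent->key == newNode->key)
                throw std::runtime_error("Key already present");
            link = (parent->key < newNode->key) ? &parent->right : &parent->left;
        }
        newNode->parent = parent;
        *link = std::move(newNode);  // the only ownership transfer
        // Height updates and balance(...) are omitted in this sketch; *link is
        // the unique_ptr a rebalancing function could take by reference.
    }

    // Hypothetical helpers for checking the result.
    std::vector<int> keys() const
    {
        std::vector<int> out;
        inorder(root.get(), out);
        return out;
    }
    const Node* rootNode() const { return root.get(); }
};
```

Because link points at the owning unique_ptr itself, assigning to *link fills exactly the empty child slot that was found, and nothing along the way is moved or copied.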
I have only been using raw pointers for linked lists with templates. For example, the member data is Node<T>* head;, and when I insert a node, one of the lines is head = new Node<T>(data);.
However, now I need to use a smart pointer, and I am not sure how to change the code to use smart pointers. Would the member data be changed to shared_ptr<Node<T>> head;, and would the other line change to
head = shared_ptr<Node<T>>( new Node<T>(data) );?
You do not "need" to use a smart pointer for a linked list, because that statement doesn't make sense. You do not use smart pointers for low-level data structures. You use smart pointers for high-level program logic.
As far as low-level data structures are concerned, you use a standard container class from the C++ standard library, like std::list [*], which solves all your memory-management problems anyway, without using any smart pointers internally.
If you really, really need your own highly specialised/optimised custom container class because the entire C++ standard library is unfit for your requirements and you need a replacement for std::list, std::vector, std::unordered_map and other optimised, tested, documented and safe containers (which I very much doubt!), then you have to manage memory manually anyway, because the point of such a specialised class will almost certainly be the need for techniques like memory pools, copy-on-write or even garbage collection, all of which conflict with a typical smart pointer's rather simplistic deletion logic.
In the words of Herb Sutter:
Never use owning raw pointers and delete, except in rare cases when implementing your own low-level data structure (and even then keep that well encapsulated inside a class boundary).
Something along those lines is also expressed in Herb Sutter's and Bjarne Stroustrup's C++ Core Guidelines:
This problem cannot be solved (at scale) by transforming all owning pointers to unique_ptrs and shared_ptrs, partly because we need/use owning "raw pointers" as well as simple pointers in the implementation of our fundamental resource handles. For example, common vector implementations have one owning pointer and two non-owning pointers.
Writing a linked-list class in C++ with raw pointers can be a useful academic exercise. Writing a linked-list class in C++ with smart pointers is a pointless academic exercise. Using either of these two self-made things in production code is almost automatically wrong.
[*] Or just std::vector, because due to cache locality that will almost always be the better choice anyway.
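To illustrate the point about standard containers, here is a sketch of the insertions from the question done with std::list<int> (int stands in for T; the function name build_example is hypothetical). No new, delete or smart pointer appears in user code:

```cpp
#include <cassert>
#include <iterator>
#include <list>

// The operations a hand-written list would provide, done with std::list.
inline std::list<int> build_example()
{
    std::list<int> values;
    values.push_front(2);                    // like head = new Node<T>(data)
    values.push_front(1);
    values.push_back(4);                     // O(1) at either end
    auto it = std::next(values.begin(), 2);
    assert(*it == 4);
    values.insert(it, 3);                    // insert before 4: 1 2 3 4
    values.remove(2);                        // erase every element equal to 2
    return values;                           // nodes are freed when the list dies
}
```

The returned list holds 1, 3, 4, and all node memory is released by std::list's destructor.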
There are basically two alternatives to set up a smart-pointer enhanced list:
Using std::unique_ptr:
template<typename T>
struct Node
{
    Node* _prev;
    std::unique_ptr<Node> _next;
    T data;
};
std::unique_ptr<Node<T> > root; //inside list
That would be my first choice. The unique_ptr _next ensures there are no memory leaks, whereas _prev is an observing pointer. However, the copy constructor and similar members (in case you need them) have to be defined and implemented by hand.
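A sketch of the first alternative, assuming a push_back-only interface (List, push_back, front, back and size are hypothetical names). Copying is simply disabled here, and the destructor is written as a loop, so destroying a long list does not nest one destructor call per node:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// The answer's Node layout: _next owns the rest of the list, _prev only observes.
template<typename T>
struct Node
{
    Node* _prev = nullptr;
    std::unique_ptr<Node> _next;
    T data;
    explicit Node(T value) : data(std::move(value)) {}
};

template<typename T>
class List
{
    std::unique_ptr<Node<T>> root;
    Node<T>* tail = nullptr;  // observing pointer to the last node
    std::size_t count = 0;

public:
    List() = default;
    List(const List&) = delete;             // a deep copy would have to be
    List& operator=(const List&) = delete;  // written by hand

    ~List()
    {
        // Detach the rest of the list before the current node is destroyed,
        // so no node's destructor ever has a successor left to destroy.
        while (root) {
            auto next = std::move(root->_next);
            root = std::move(next);
        }
    }

    void push_back(T value)
    {
        auto node = std::make_unique<Node<T>>(std::move(value));
        node->_prev = tail;
        Node<T>* raw = node.get();
        if (tail)
            tail->_next = std::move(node);
        else
            root = std::move(node);
        tail = raw;
        assert(!tail->_next);  // invariant: the tail owns no successor
        ++count;
    }

    std::size_t size() const { return count; }
    const Node<T>* front() const { return root.get(); }
    const Node<T>* back() const { return tail; }
};
```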
Using shared_ptr:
template<typename T>
struct Node
{
    std::weak_ptr<Node> _prev; //or as well Node*
    std::shared_ptr<Node> _next;
    T data;
};
std::shared_ptr<Node<T> > root; //inside list
This alternative is copyable by design and adds further safety because of the weak_ptr (see below). It is less performant than the unique_ptr when it comes to structural changes of the list, such as insertions and removals, e.g. due to the thread-safe (atomic) reference counting in shared_ptr's control block.
Yet, traversing the list, i.e. dereferencing the pointers, should be as performant as for the unique_ptr.
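In both approaches the idea is that each node owns the complete remaining list. When a node goes out of scope, there is no danger that the remaining list becomes a memory leak, as the following nodes are destroyed too. Note, however, that by default this destruction is recursive (each node's destructor destroys its successor, so the last node finishes first), which can exhaust the stack for very long lists.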
The _prev pointer is in both options only an observing pointer: its task is not to keep the previous nodes alive, but only to provide a link to visit them.
For that, a Node* is usually sufficient (note: observing pointer means you never do memory-related things like new or delete on the pointer).
If you want more safety, you can also use a std::weak_ptr, which guards against situations like this:
std::shared_ptr<Node<T> > n;
{
    list<T> li;
    //fill the list
    n = li.root->_next->_next; //let's say that works for this example
}
n->_prev; //dangling pointer: the previous nodes do not exist anymore
Using a weak_ptr, you can lock() it and in this way check whether _prev is still valid.
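A small sketch of the weak_ptr variant, assuming a hypothetical append helper that links a new node after a given tail. It checks that _prev can be locked while the list exists and reports expired once the earlier nodes are gone:

```cpp
#include <cassert>
#include <memory>

template<typename T>
struct Node
{
    std::weak_ptr<Node> _prev;   // observes, does not keep the previous node alive
    std::shared_ptr<Node> _next; // owns the rest of the list
    T data{};
};

// Hypothetical helper: links a new node after `tail` and returns it.
template<typename T>
std::shared_ptr<Node<T>> append(const std::shared_ptr<Node<T>>& tail, T value)
{
    assert(tail && !tail->_next);  // only append at the end
    auto n = std::make_shared<Node<T>>();
    n->data = value;
    n->_prev = tail;
    tail->_next = n;
    return n;
}

inline bool prev_expires_with_list()
{
    std::shared_ptr<Node<int>> third;  // outlives the list below
    bool validWhileAlive = false;
    {
        auto root = std::make_shared<Node<int>>();
        root->data = 1;
        auto second = append(root, 2);
        third = append(second, 3);
        // lock() returns a non-null shared_ptr while the previous node exists.
        if (auto prev = third->_prev.lock())
            validWhileAlive = (prev->data == 2);
    } // root and second are released; the first two nodes are destroyed
    // With a raw Node* _prev, third->_prev would now dangle; the weak_ptr
    // instead reports that its target is gone.
    return validWhileAlive && third->_prev.expired();
}
```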
I would look at the interface of std::list, which is a C++ implementation of linked lists. It seems that you are approaching the templating of your linked list class the wrong way. Ideally, your linked list should not care about ownership semantics (i.e. whether it is instantiated with raw pointers, smart pointers or stack-allocated variables). An example of ownership semantics with STL containers follows. However, there are better examples of STL and ownership from more authoritative sources.
#include <iostream>
#include <list>
#include <memory>
using namespace std;
int main()
{
    // Unique ownership.
    unique_ptr<int> int_ptr = make_unique<int>(5);
    {
        // List of uniquely owned integers.
        list<unique_ptr<int>> list_unique_integers;
        // Transfer of ownership from my parent stack frame to the
        // unique_ptr list.
        list_unique_integers.push_back(std::move(int_ptr));
    } // The list is destroyed, and so are the integers it owns.
    // int_ptr is now empty (it was moved from), so dereferencing it
    // would be undefined behavior:
    // cout << *int_ptr << endl;
    // You can make a new one, though.
    int_ptr.reset(new int(6));
    // Shared ownership.
    // Create a pointer we intend to share.
    shared_ptr<int> a_shared_int = make_shared<int>(5);
    {
        // A list that shares ownership of integers with anyone that has
        // copied the shared pointer.
        list<shared_ptr<int>> list_shared_integers;
        list_shared_integers.push_back(a_shared_int);
        // Editing and reading obviously works.
        const shared_ptr<int> a_ref_to_int = list_shared_integers.back();
        (*a_ref_to_int)++;
        cout << *a_ref_to_int << endl;
    } // list_shared_integers goes out of scope, but the integer is not
      // destroyed, because a "reference" to it still exists.
    // a_shared_int is still accessible.
    (*a_shared_int)++;
    cout << (*a_shared_int) << endl;
} // Now the integer is deallocated because the shared_ptr goes
  // out of scope.
A good exercise to understand ownership, memory allocation/deallocation, and shared pointers is to do a tutorial in which you implement your own smart pointers. Then you will understand exactly how to use smart pointers, and you will have one of those zen moments where you realise how pretty much everything in C++ comes back to RAII (ownership of resources).
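As a starting point for that exercise, here is a deliberately minimal single-owner pointer. The name Owner is hypothetical, and it is not a substitute for std::unique_ptr (no custom deleters, no array support):

```cpp
#include <cassert>
#include <utility>

// Toy single-owner pointer: the destructor releases the resource,
// copying is forbidden, and moving transfers ownership.
template<typename T>
class Owner
{
    T* ptr = nullptr;

public:
    Owner() = default;
    explicit Owner(T* p) : ptr(p) {}
    ~Owner() { delete ptr; }  // RAII: the owner cleans up

    Owner(const Owner&) = delete;             // exactly one owner at a time
    Owner& operator=(const Owner&) = delete;

    Owner(Owner&& other) noexcept : ptr(std::exchange(other.ptr, nullptr)) {}
    Owner& operator=(Owner&& other) noexcept
    {
        if (this != &other) {
            delete ptr;                              // drop what we owned
            ptr = std::exchange(other.ptr, nullptr); // take over theirs
        }
        return *this;
    }

    T* get() const { return ptr; }
    T& operator*() const { assert(ptr); return *ptr; }
    explicit operator bool() const { return ptr != nullptr; }
};
```

Moving leaves the source empty, so the object is deleted exactly once, by whichever Owner holds it last.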
So, back to the crux of your question. If you want to stick to Nodes of type T, don't wrap the node in a smart pointer. The Node destructor must delete the underlying raw pointer; that raw pointer may point to a smart pointer itself, specified as T. When your LinkedList class's destructor is called, it iterates through all Nodes via Node::next and calls delete node; after it has obtained the pointer to the next node.
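A sketch of the raw-pointer approach described above, with hypothetical Node and LinkedList types: the list alone calls delete, and its destructor reads next before deleting each node. T is taken by value and moved, so it can itself be a move-only smart pointer:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

template<typename T>
struct Node
{
    T data;
    Node* next;
};

template<typename T>
class LinkedList
{
    Node<T>* head = nullptr;  // owning raw pointer, encapsulated in the class

public:
    LinkedList() = default;
    LinkedList(const LinkedList&) = delete;            // copying would need
    LinkedList& operator=(const LinkedList&) = delete;  // a deep copy

    ~LinkedList()
    {
        while (head) {
            Node<T>* next = head->next;  // obtain the next pointer first...
            delete head;                 // ...then delete the current node
            head = next;
        }
    }

    void push_front(T data) { head = new Node<T>{std::move(data), head}; }

    std::size_t size() const
    {
        std::size_t n = 0;
        for (const Node<T>* p = head; p; p = p->next) ++n;
        return n;
    }

    const T& front() const { assert(head); return head->data; }
};
```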
You could create a list where nodes are smart pointers... but this is a very specialised linked list, probably called SharedLinkedList or UniqueLinkedList, with very different semantics for object creation, popping, etc. Just as an example, a UniqueLinkedList would move a node into the return value when popping a value for a caller. Doing metaprogramming for this problem would require partial specialization for the different types of T passed. For example, something like:
template<class T>
struct LinkedList
{
    Node<T> *head;
};
// The very start of a LinkedList with shared ownership. In all your access
// methods, etc... you will be returning copies of the appropriate pointer,
// therefore creating another reference to the underlying data.
template<class T>
struct LinkedList<std::shared_ptr<T>>
{
    std::shared_ptr<Node<T>> head;
};
Now you are starting to implement your own STL! You can already see potential problems with this approach, as mentioned in the comments to your question. If nodes hold a shared_ptr next, destroying a node calls the next shared Node's destructor, which calls the following Node's destructor, and so forth (a stack overflow due to this recursion is possible). That is why I don't care much for this approach.
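One way around that recursion, sketched for a Node with a shared_ptr next (the helper release_list is hypothetical): unlink nodes in a loop, and stop as soon as a node is also owned elsewhere, so shared tails stay intact. The use_count() check is only reliable when no other thread copies or releases these pointers concurrently:

```cpp
#include <cassert>
#include <memory>
#include <utility>

template<typename T> struct Node
{
    T data;
    std::shared_ptr<Node<T>> next;
};

// Releases a shared_ptr chain with a loop instead of nested destructor calls.
template<typename T>
void release_list(std::shared_ptr<Node<T>>& head)
{
    // Unlink only while `head` is the sole owner of the current node.
    while (head && head.use_count() == 1) {
        auto next = std::move(head->next);  // detach the tail first
        head = std::move(next);             // old node dies with an empty next
    }
    head.reset();  // a shared node (if any) just loses this one reference
}
```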
Structure will look like
template<typename T> struct Node
{
    T data;
    shared_ptr<Node<T>> next;
};
Creating a node will look like
shared_ptr<Node<int>> head(new Node<int>);
or
auto head = make_shared<Node<int>>(Node<int>{ 1, nullptr });
Don't use smart pointers in graph-like data structures, because they may cause stack overflows (through recursive destructor calls) and many performance issues from incrementing and decrementing reference counts, which is not optimal given how DFS and BFS algorithms work.
I'm trying to design a tree class in C++, but I'm running into some trouble with node destruction.
If I destroy a node, I don't want to destroy its entire subtree, because there might be something else pointing to it. So the obvious solution is to use reference counting. I'd have a weak pointer to the parent and a vector of shared pointers to the child nodes. That way, if a node is destroyed, its children are only destroyed if nothing else is pointing to them.
But I run into another problem here: adding a child to a node. A weak_ptr only works if there's already a shared_ptr pointing to the object. And if I am adding a child to a node, I don't know where to find a shared_ptr that's pointing to it. So what do I do here?
To expand on David Rodriguez's idea, a skeleton tree might look like this:
struct node : std::enable_shared_from_this<node>
{
    std::vector<std::shared_ptr<node>> children;
    std::weak_ptr<node> parent;
    void add_child()
    {
        auto n = std::make_shared<node>();
        n->parent = std::weak_ptr<node>(shared_from_this());
        children.emplace_back(n);
    }
};
auto root = std::make_shared<node>();
root->add_child();
root->add_child();
root->add_child();
root->children[0]->add_child();
(Of course a real-world node would have a non-trivial constructor with payload values, and add_child would take similar arguments or be a template...)
You might want to look into enable_shared_from_this, which allows you to obtain the shared_ptr directly from the object. It still requires that the object is managed by a shared_ptr, but you don't need to find who is holding it.
I'm implementing a vector type. I'm not troubled by the algorithms or the data structure at all, but I am unsure about a remove method. For instance:
bool Remove(Node* node)
{
    /* rearrange all the links and extract the node */
    delete node;
    return true;
}
where node is a pointer to the current node that we are at. But if I delete node then how do I prevent this from happening:
Node* currentNode = MoveToRandNode();
Remove(currentNode);
cout << currentNode->value;
If currentNode were a pointer to a pointer it would be easier but...it's not.
You could add another level of abstraction to your iterator (which is currently a raw pointer).
If you do not use raw pointers but create some sort of iterator class instead, it is possible to invalidate the iterator, and thus fail in a controlled way if anyone tries to access it after its node has been removed.
class Iterator {
public:
    Node& operator*() {
        if (node) return *node;
        else throw Something();
    }
private:
    Node* node;
};
Of course, this wrapping of a pointer comes at the cost of some overhead (checking the pointer on each dereference). So you will have to decide how safely you want to play it: either document the behavior, as suggested by others, or wrap for safety.
Step back first. You need to define who "owns" the memory pointed to by the vector. Is it the vector itself, or the code that uses the vector? Once you define this, the answer will be easy - either Remove() method should always delete it or never.
Note that you've just scratched the surface of the possible bugs, and your answer to "who owns it" will help with other possible issues, like:
If you copy a vector, do you need to copy the items within it, or just the pointers (i.e. do a deep or a shallow copy)?
When you destroy a vector, should you destroy the items within it?
When you insert an item, should you make a copy of the item, or does the vector take ownership of it?
Well, you cannot do that, but some modifications to your code can improve safety.
Add a reference:
bool Remove(Node*& node)
{
    /* rearrange all the links and extract the node */
    delete node;
    node = nullptr;
    return true;
}
Check for nullptr:
if (currentNode)
    cout << currentNode->value;
Probably you need to try std::shared_ptr.
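If you go the shared_ptr route, one possible shape is to own the nodes through shared_ptr and hand out weak_ptr handles, so every copy of a handle can detect that its node was removed, which a plain Node* cannot. The names Container, add, Remove and read are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

struct Node
{
    int value;
};

// Hypothetical container: owns nodes via shared_ptr, hands out weak_ptr.
class Container
{
    std::vector<std::shared_ptr<Node>> nodes;

public:
    std::weak_ptr<Node> add(int value)
    {
        nodes.push_back(std::make_shared<Node>(Node{value}));
        return nodes.back();  // converts to a non-owning weak_ptr
    }

    bool Remove(const std::weak_ptr<Node>& handle)
    {
        auto target = handle.lock();
        if (!target) return false;  // already removed
        auto it = std::find(nodes.begin(), nodes.end(), target);
        if (it == nodes.end()) return false;
        nodes.erase(it);
        return true;  // the node dies when `target` goes out of scope
    }
};

// Reads through a handle, failing in a controlled way if the node is gone.
inline int read(const std::weak_ptr<Node>& handle)
{
    if (auto node = handle.lock())
        return node->value;
    throw std::logic_error("node was removed");
}
```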
This is similar to "iterator invalidation". E.g., if you have a std::list l and a std::list::iterator it pointing into that list, and you call l.erase(it), then the iterator it is invalidated -- i.e., if you use it in any way then you get undefined behavior.
So following that example, you should include in your documentation of the Remove method something along the lines: "the pointer node is invalidated, and may not be used or dereferenced after this method returns."
(Of course, you could also just use std::list, and not bother to re-invent the wheel.)
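For comparison, the standard idiom with std::list: erase(it) invalidates it but returns an iterator to the following element, so the loop never touches the invalidated one (remove_even is a hypothetical example):

```cpp
#include <cassert>
#include <list>

// Removes every even value without ever using an invalidated iterator.
inline void remove_even(std::list<int>& values)
{
    for (auto it = values.begin(); it != values.end(); ) {
        if (*it % 2 == 0)
            it = values.erase(it);  // continue from the returned iterator
        else
            ++it;
    }
}
```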
For more info on iterator invalidation, see: http://www.angelikalanger.com/Conferences/Slides/CppInvalidIterators-DevConnections-2002.pdf
In addition to what innochenti wrote:
I think you have to decide what the expected/desired behavior of cout << currentNode->value; is:
Error: set node = nullptr (as innochenti wrote).
Default value: create a node default_value (which has some default value for its value), and after delete node; do node = &default_value.
I was recently introduced to the existence of auto_ptr and shared_ptr and I have a pretty simple/naive question.
I am trying to implement a data structure in which a Node needs to point to its children, which are more than one and whose number may change. Which is the better alternative, and why?
class Node
{
public:
    // ...
    Node *children;
private:
    //...
};
class Node
{
public:
    // ...
    shared_ptr<Node> children;
private:
    //...
};
I am not sure, but I think auto_ptr does not work for arrays. I am also not sure whether I should use double pointers. Thanks for any help.
You're right that auto_ptr doesn't work for arrays. When it destroys the object it owns, it uses delete object;, so if you used new objects[whatever];, you'll get undefined behavior. Perhaps a bit more subtly, auto_ptr doesn't fit the requirements of "Copyable" (as the standard defines the term) so you can't create a container (vector, deque, list, etc.) of auto_ptr either.
A shared_ptr is for a single object as well. It's for a situation where you have shared ownership and need to delete the object only when all the owners go out of scope. Unless there's something going on that you haven't told us about, chances are pretty good that it doesn't fit your requirements very well either.
You might want to look at yet another class that may be new to you: Boost ptr_vector. At least based on what you've said, it seems to fit your requirements better than either auto_ptr or shared_ptr would.
I have used std::vector<std::shared_ptr<Node> > children successfully in a similar situation.
The main benefit of using a vector of shared_ptrs rather than an array is that all of the resource management is handled for you. This is especially handy in two situations:
1) When the vector is no longer in scope, it automatically destroys all of its elements. Here, that decrements the reference count of each child Node by 1, and if nothing else is referencing a child, delete is called on that object.
2) If you are referencing the Node elsewhere, there is no risk of being left with a dangling pointer to a deleted object. The object will only be deleted when there are no more references to it.
Unless you want behaviour that is substantially more complicated (perhaps there is a reason why an array is necessary), I would suggest this might be a good approach for you.
A simple implementation of the idea:
template<typename T>
class Node {
private:
    T contents;
    std::vector<std::shared_ptr<Node> > children;
public:
    Node(T value) : contents(value) {}
    void add_child(T value) {
        auto p = std::make_shared<Node>(value);
        children.push_back(p);
    }
    std::shared_ptr<Node> get_child(size_t index) {
        // Returning a shared pointer ensures the node isn't deleted
        // while it is still in use.
        return children.at(index);
    }
    void remove_child(size_t index) {
        // The whole branch will be destroyed automatically.
        // If part of the tree is still needed (eg. for undo), the
        // shared pointer will ensure it is not destroyed.
        children.erase(children.begin() + index);
    }
};
auto_ptr is deprecated in favor of std::unique_ptr, and, by the way, std::unique_ptr does work for arrays (as std::unique_ptr<T[]>). You just need C++11 support. There are already lots of resources about smart pointers and move semantics out there. The main difference between auto_ptr and unique_ptr is that auto_ptr performs a move when you call the copy constructor, while unique_ptr forbids the copy constructor but allows a move through the move constructor. Therefore you need C++11 support with move semantics.
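A short sketch of these points, assuming C++14 for std::make_unique: the array form std::unique_ptr<T[]> is freed with delete[], copying a unique_ptr does not compile, and a vector of unique_ptr holds a variable number of singly-owned children (the Node type and demo_unique_ptr are hypothetical):

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Node
{
    int value = 0;
    std::vector<std::unique_ptr<Node>> children;  // each child has one owner
};

inline bool demo_unique_ptr()
{
    auto block = std::make_unique<Node[]>(3);  // unique_ptr<Node[]>, uses delete[]
    block[2].value = 7;

    auto a = std::make_unique<Node>();
    a->value = 1;
    // auto b = a;          // would not compile: unique_ptr cannot be copied
    auto b = std::move(a);  // an explicit move transfers ownership
    assert(!a && b->value == 1);

    b->children.push_back(std::make_unique<Node>());
    return block[2].value == 7 && b->children.size() == 1;
}
```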
Stroustrup discusses the question of "What is an auto_ptr and why isn't there an auto_array" and concludes that there is no need for the latter, since the desired functionality can be accomplished with a vector.
http://www.stroustrup.com/bs_faq2.html#auto_ptr
I have a general question regarding the use of pointers vs. references in this particular scenario.
Let's say that I have a function that is going to do some computation and store the value inside an object for later use by the caller. I can implement this by using either pointers or references.
Although I would prefer using references, because I try to avoid pointers as much as possible, are there any pros/cons of one approach over the other?
The code using Pointers would be as follows:
Node*& computeNode() {
    // Do some computation before creating a node object.
    Node* newNode = new Node;
    newNode->member1 = xyz;
    newNode->member2 = abc;
    // and so on ...
    return newNode;
}
The code using references could do something like this:
void computeNode(Node& newNode) {
    // Do some computation before assigning values to the node object.
    newNode.member1 = xyz;
    newNode.member2 = abc;
    // and so on.
}
The differences that I can see are as follows:
When using the pointer method, the newNode object is allocated on the Heap. So, unless I call delete on it, it is not going to get deleted. However, in the reference method, whether newNode is allocated on the Heap/Stack depends on what the caller did to create the newNode object.
Whenever we use references, the number of arguments needed to pass to the function increases by at least 1. This is fine, only I find it a bit counter-intuitive to pass the return object also to a function call unless I name the function in such a way that it becomes obvious to the API user.
By using references, I can simulate the return of multiple objects. In the pointer method, I think I will have to wrap all the objects in another structure (like a pair class) and then return it. That increases the overhead.
However, I do not know whether one is usually preferred over the other, and whether there are any function naming conventions in C++ that let the developer know that they are supposed to pass the return object as an argument as well.
You could try returning an auto_ptr or shared_ptr. That would eliminate the issues with delete.
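In current C++, auto_ptr's role is taken by std::unique_ptr (auto_ptr was deprecated in C++11 and removed in C++17). A sketch of the question's heap-allocating computeNode rewritten that way, with hypothetical member values standing in for xyz and abc:

```cpp
#include <cassert>
#include <memory>

struct Node
{
    int member1 = 0;
    int member2 = 0;
};

// Returning std::unique_ptr by value hands ownership to the caller,
// so there is no delete to forget.
inline std::unique_ptr<Node> computeNode()
{
    auto newNode = std::make_unique<Node>();
    newNode->member1 = 1;  // stand-ins for xyz and abc
    newNode->member2 = 2;
    return newNode;        // moved out automatically; no std::move needed
}
```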
The second approach is probably preferable because there is no possibility of a memory leak, in the event you forget to delete the returned pointer.
It's usually good practice to code in such a way that each function or object which allocates heap memory also deallocates that memory. Your first example violates that practice, making it the function caller's responsibility to deallocate the memory. This makes memory leaks more likely, because now every time the function is called there is another opportunity to forget to delete the returned pointer.
You may also want to consider returning the object by value (which will return a copy of the object) in cases where the size of the object is not that large. Even though this will require a copy to be created, if the object is not so large it won't impact performance. (Move semantics, introduced in C++11, make this approach even more attractive.)
I think your first option should be returning by value (or perhaps make the constructor compute the members?):
Node computeNode()
{
    Node n;
    n.x = abc;
    n.y = xyz;
    return n;
}
This may look inefficient, but it is quite possible that copying is elided with NRVO.
If the Node needs to be dynamically allocated anyway, you should return the pointer by value (a copy of the pointer):
Node* computeNode();
Otherwise you will be returning a reference to a local variable (pointer).
I prefer using the second approach to send back information (as you said, allows for multiple "returns" without using an extra structure) and generally return an error or a success code.
Also, I declare the purely input arguments as const & to distinguish between input and output variables.
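A sketch of that convention with a hypothetical splitName function: inputs are passed as const&, outputs as non-const references, and the return value reports success:

```cpp
#include <cassert>
#include <string>

// Two "returns" through reference parameters plus a success flag.
inline bool splitName(const std::string& full, std::string& first, std::string& last)
{
    const auto space = full.find(' ');
    if (space == std::string::npos)
        return false;  // outputs are left untouched on failure
    first = full.substr(0, space);
    last = full.substr(space + 1);
    return true;
}
```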
You can return by value and avoid copies in some situations, by using const references like this :
Node computeNode() {
    // Do some computation before creating a node object.
    Node newNode;
    newNode.member1 = xyz;
    newNode.member2 = abc;
    return newNode;
}
const Node &n = computeNode();
The lifetime of the temporary returned by computeNode() is extended to the scope of the reference n.
If the alternatives are really as given, it’s not clear why you need a reference/pointer at all; you could also just return by value:
Node computeNode() {
    // Do some computation before creating a node object.
    Node newNode;
    newNode.member1 = xyz;
    newNode.member2 = abc;
    return newNode;
}
Despite what many people think, this isn’t actually very inefficient because the compiler can (and will!) elide most of the unnecessary copies.
Semantically, this is the solution that you want, unless the node gets stored somewhere else as well and you need to preserve reference identity.
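Since C++17, returning a prvalue such as Node(...) is guaranteed not to copy or move at all, which this sketch demonstrates by deleting both constructors of a hypothetical two-member Node. Returning a named local (NRVO), as above, is still only permitted, not required, and needs an accessible copy or move constructor:

```cpp
#include <cassert>

// Copying and moving are disabled on purpose: this only compiles because
// C++17 guarantees the returned prvalue is built directly in the caller.
struct Node
{
    int member1;
    int member2;
    Node(int a, int b) : member1(a), member2(b) {}
    Node(const Node&) = delete;
    Node(Node&&) = delete;
};

inline Node computeNode()
{
    int xyz = 1, abc = 2;     // stand-ins for "some computation"
    return Node(xyz, abc);    // no copy, no move
}
```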