The C++ STL class std::map implements O(log(n)) look-up using a binary tree. But with trees, it's not immediately obvious how an iterator would work. What does the ++ operator actually mean in a tree structure? Whereas the concept of "next element" has an obvious implementation in an array, for me it's not so obvious in a tree. How would one implement a tree iterator?
For an inorder traversal (probably works for others too), if you have a parent-pointer in your nodes you can do a non-recursive traversal. It should be possible to just store two pointers in your iterator: you need an indication of where you are, and you'll probably (I'm not doing the research now) need something like a "previous" pointer so you can figure out your current movement direction (i.e. do I need to go into the left subtree, or did I just come back from it).
"Previous" will probably be something like "parent", if we've just entered the node; "left" if we're coming back from the left subtree, "right" if we are coming back from the right subtree, and "self" if the last node we returned was our own.
I would like to add my two cents worth as a comment, but since I am not able to I shall have to add an answer. I have been googling and was frustrated because all the answers I found, these excepted, assumed a stack or some other variably-sized data structure. I did find some code. It shows that it can be done without a stack but I found it hard to follow and so decided to attack the problem from first principles.
The first thing to note is that the algorithm is "left-greedy". Thus, when we start at the root we immediately go as far left as possible, since the leftmost node is the one we need first. This means that we never need to consider the left-subtree. It has already been iterated over.
The order of iteration is left subtree, node, right subtree. So if we are positioned at a given node we know that its left subtree and the node itself have been visited and that we should next visit the right subtree, if any, going as far left as possible.
Otherwise, we must go up the tree. If we are going from a left child to its parent then the parent comes next. (Afterwards we will visit its right subtree, as already covered.)
The final case is when we are going from a right child to its parent. The parent has been visited already so we must go up again. In fact we must keep going up until we reach the root of the tree, or find ourselves moving to a parent from its left child. As we have already seen, the parent is the next node in this case. (The root may be indicated by a null parent pointer, as in my code, or some special sentinel node.)
The following code could easily be adapted for an STL-style iterator:
// Go as far left from this node as you can.
// i.e. find the minimum node in this subtree
Node* Leftmost(Node* node)
{
    if (node == nullptr)
        return nullptr;
    while (node->left != nullptr)
        node = node->left;
    return node;
}

// Start iterating from a root node
Node* First(Node* root)
{
    return Leftmost(root);
}

// The iteration is currently at node. Return the next node
// in value order.
Node* Next(Node* node)
{
    // Make sure that the caller hasn't failed to stop.
    assert(node != nullptr);

    // If we have a right subtree we must iterate over it,
    // starting at its leftmost (minimal) node.
    if (node->right != nullptr)
        return Leftmost(node->right);

    // Otherwise we must go up the tree
    Node* parent = node->parent;
    if (parent == nullptr)
        return nullptr;

    // A node comes immediately after its left subtree
    if (node == parent->left)
        return parent;

    // This must be the right subtree!
    assert(node == parent->right);

    // In which case we need to go up again, looking for a node that is
    // its parent's left child.
    while (parent != nullptr && node != parent->left)
    {
        node = parent;
        parent = node->parent;
    }

    // We should be at a left child!
    assert(parent == nullptr || node == parent->left);

    // And, as we know, a node comes immediately after its left subtree
    return parent;
}
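As a rough illustration of that adaptation, here is a minimal sketch of a forward iterator built on the same stepping logic. The Node layout with an int key, the class name TreeIterator and the helper InOrderKeys are assumptions made for this example, and a null node stands in for end().

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical node layout; the answer only requires left, right and parent.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

Node* Leftmost(Node* node) {
    while (node != nullptr && node->left != nullptr)
        node = node->left;
    return node;
}

// Same logic as Next() above, with the two "go up" cases folded into one loop.
Node* Next(Node* node) {
    assert(node != nullptr);
    if (node->right != nullptr)
        return Leftmost(node->right);
    Node* parent = node->parent;
    while (parent != nullptr && node == parent->right) {
        node = parent;
        parent = node->parent;
    }
    return parent;  // null once the last node has been passed
}

// Forward iterator over the keys; a null node plays the role of end().
class TreeIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = const int&;

    explicit TreeIterator(Node* node = nullptr) : node_(node) {}
    reference operator*() const { return node_->key; }
    pointer operator->() const { return &node_->key; }
    TreeIterator& operator++() { node_ = Next(node_); return *this; }
    TreeIterator operator++(int) { TreeIterator old = *this; ++*this; return old; }
    bool operator==(const TreeIterator& other) const { return node_ == other.node_; }
    bool operator!=(const TreeIterator& other) const { return node_ != other.node_; }

private:
    Node* node_;
};

// Collects the keys of a tree in iteration order.
std::vector<int> InOrderKeys(Node* root) {
    std::vector<int> keys;
    for (TreeIterator it(Leftmost(root)), end; it != end; ++it)
        keys.push_back(*it);
    return keys;
}
```

Each increment costs at most the height of the tree, but a full pass touches every edge at most twice, so iterating over all n nodes is O(n) overall.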
Consider the set of all elements in the map that are not less than the current element that are also not the current element. The "next element" is the element from that set of elements that is less than all other elements in that set.
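With std::map this definition can be checked directly: upper_bound(key) returns an iterator to the first element whose key is greater than key, which is the "smallest element of the remaining set" described above. The helper name NextIsSmallestGreater below is made up for this sketch.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

// Returns true if incrementing an iterator to `key` lands on the same element
// as upper_bound(key), i.e. the smallest element greater than `key`.
bool NextIsSmallestGreater(const std::map<int, std::string>& m, int key) {
    auto it = m.find(key);
    if (it == m.end())
        return false;  // key not present, nothing to compare
    return std::next(it) == m.upper_bound(key);
}
```

For the last element both sides are end(), so the comparison still holds.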
In order to use a map, you must have a key. And that key must implement a "less than" operation. This determines the way the map is formed, such that the find, add, remove, increment, and decrement operations are efficient.
Generally the map internally uses a tree of some kind.
For the standard implementation of the map iterator's operator++, look in stl_tree.h (libstdc++):
_Self&
operator++() _GLIBCXX_NOEXCEPT
{
_M_node = _Rb_tree_increment(_M_node);
return *this;
}
_Rb_tree_increment implementation is discussed here
Related
I'm creating a recursive function for a binary search tree that removes the minimum node, which would be the leftmost node in the tree. I start at the root and traverse down from there. I'm trying to understand why I'm getting an invalid read of size 8 error. I'm pretty sure the current node I'm at will never be NULL and I created a conditional for if the tree is empty.
void removeMinimumValue()
{
removeMinimumValue(root);
}
void removeMinimumValue(BSTNode *node)
{
if(root==NULL)
exit(1);
else if (node->leftChild==NULL)
delete node;
else
removeMinimumValue(node->leftChild);
}
I suppose this happens during the second removal of the minimum value?
During the first removal, you delete the lowest node, but its parent still has a reference to it (a "dangling pointer"). On the next call, the function will try to read the deleted node.
I'd also like to add that if the lowest node has a right child, you have to attach it as the left child of its parent node. Otherwise you're losing those nodes. You don't even have to check whether this node has a right child, because if it's NULL, you'd have to write NULL to the left child of the parent of the deleted node anyway.
So change delete node; to
node->parent->leftChild = node->rightChild;
delete node;
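The fix above relies on a parent pointer, and it does not cover the case where the minimum is the root itself (the root has no parent). One possible alternative, sketched here under the assumption that nodes look like the hypothetical BSTNode below, is to pass the link that points at the current subtree by reference, so that overwriting it relinks the tree.

```cpp
#include <cassert>

// Hypothetical node type using the member names from the question.
struct BSTNode {
    int value;
    BSTNode* leftChild = nullptr;
    BSTNode* rightChild = nullptr;
};

// `link` is the pointer that refers to the current subtree: either the
// tree's root pointer or some node's leftChild. Overwriting it relinks the
// tree, so no parent pointer is needed and the root needs no special case.
void removeMinimumValue(BSTNode*& link) {
    if (link == nullptr)
        return;  // empty tree: nothing to remove
    if (link->leftChild != nullptr) {
        removeMinimumValue(link->leftChild);
        return;
    }
    BSTNode* minimum = link;
    link = minimum->rightChild;  // may be nullptr, which is fine
    delete minimum;
}
```

Called as removeMinimumValue(root), this leaves no dangling pointer behind, because the only pointer that referred to the deleted node is the one that gets overwritten.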
I am looking to build my own map class. (Which will behave exactly like the C++ STL) I want to be able to iterate through all the elements in order by key value.
I implemented my map as an unbalanced binary search tree.
So my question is how to do an iterator increment efficiently. One inefficient way is to iterate through every single element in the tree to find the next lowest key. Is there a faster way to do this?
Thank you.
It depends a bit on the implementation details. If the nodes of your unbalanced binary search tree have a "parent" pointer, you could use that to traverse it. Your implementation of ++iterator could look a bit like this:
if (current_node->has_right_child()) {
    // We go to the right subtree of the current node and
    // take the smallest element of that subtree.
    current_node = current_node->right_child();
    while (current_node->has_left_child()) {
        current_node = current_node->left_child();
    }
} else {
    // We have to go up. If the current element is the left child of its parent,
    // the parent is the next element.
    // If it is the right child, we have to go further up.
    while (true) {
        if (!current_node->has_parent()) {
            // We got up to the root without ever coming from a left child,
            // so we are at the end of the iteration.
            current_node = NULL;
            break;
        }
        Node* parent = current_node->parent();
        bool is_left_child = parent->left_child() == current_node;
        current_node = parent;
        if (is_left_child) {
            // If this was the left child, then the parent is the correct next element.
            break;
        }
        // If this was the right child, we have to go further up
        // until we leave this subtree, so we continue iterating.
    }
}
If your binary tree does NOT have parent pointers, you could store the parents in the iterator, i.e. you could maintain a std::vector<Node*> parents; in which you store the parents of the current node up to the root. If this is still needed, I can provide an implementation, but because you edited my "non parent pointer" version with parent pointers, it seems that you have parent pointers, so I'll leave it out.
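For completeness, here is a minimal sketch of the variant without parent pointers. It uses a common simplification of the idea: instead of the full path to the root, the vector holds only the ancestors that still have to be visited. The names Node, InOrderCursor and CollectKeys are assumptions for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node type without a parent pointer.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
};

// In-order cursor that remembers the ancestors it still has to visit.
class InOrderCursor {
public:
    explicit InOrderCursor(Node* root) { PushLeftSpine(root); }

    bool AtEnd() const { return pending_.empty(); }
    int Key() const { return pending_.back()->key; }

    // Moves to the next key in sorted order.
    void Advance() {
        Node* node = pending_.back();
        pending_.pop_back();
        PushLeftSpine(node->right);
    }

private:
    // Pushes node and all of its left descendants; the last one pushed
    // holds the smallest key of that subtree.
    void PushLeftSpine(Node* node) {
        for (; node != nullptr; node = node->left)
            pending_.push_back(node);
    }

    std::vector<Node*> pending_;
};

// Collects all keys in sorted order using the cursor.
std::vector<int> CollectKeys(Node* root) {
    std::vector<int> keys;
    for (InOrderCursor cur(root); !cur.AtEnd(); cur.Advance())
        keys.push_back(cur.Key());
    return keys;
}
```

The trade-off is that the iterator is no longer a fixed-size object: the vector can grow to the height of the tree, which for an unbalanced tree may be as large as the number of nodes.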
I'm working through a Binary Search Tree tutorial, and I found this function destroy_tree(node* leaf). Its behaviour worries me: I can't imagine what the call stack looks like. Can you explain it to me?
void btree::destroy_tree(node* leaf)
{
if (leaf !=NULL)
{
destroy_tree(leaf->left);
destroy_tree(leaf->right);
delete leaf;
}
}
For questions about recursive functions, sometimes it helps to just think of or draw a simple tree and just map out on paper how the function goes through it.
First thing, it's been a while since I used C++, but for the sake of this example I'm going to change your code to:
void btree::destroy_tree(node* leaf)
{
if(leaf !=NULL)
{
if (leaf->left != NULL)
destroy_tree(leaf->left);
if (leaf->right != NULL)
destroy_tree(leaf->right);
delete leaf;
}
}
just so there's less stuff on the stack.
Think about how the logic of this function works recursively through a tree. Take the following tree example which I snagged from Wikipedia
Let's say you call destroy_tree(root). Each call to destroy_tree(leaf) first calls destroy_tree(leaf->left), then destroy_tree(leaf->right), and only then deletes leaf. This means that within any subtree, the entire left side is processed before the right side is touched. Using the numbers in the tree above, the calls are entered in the order 8, 3, 1, 6, 4, 7, 10, 14, 13, while the nodes are actually deleted on the way back up, in the order 1, 4, 7, 6, 3, 13, 14, 10, 8. At every node, the right child is not visited while the left child's subtree still has unvisited nodes.
The call stack mirrors this as the program runs: at any moment it holds the chain of calls from the root down to the node currently being processed. Calling destroy_tree(leaf->left) results in a destroy_tree() call for every node of the left subtree before the right subtree is reached.
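One way to see both orders is to record them while the function runs. The sketch below assumes a node type with an int member named key_value (a guess at the tutorial's layout) and adds a hypothetical Trace parameter and insert helper purely for the demonstration.

```cpp
#include <cassert>
#include <vector>

// Assumed node layout; only left and right appear in the original code.
struct node {
    int key_value;
    node* left;
    node* right;
};

// Records the order in which calls are entered and nodes are deleted.
struct Trace {
    std::vector<int> entered;
    std::vector<int> deleted;
};

void destroy_tree(node* leaf, Trace& trace) {
    if (leaf != nullptr) {
        trace.entered.push_back(leaf->key_value);  // this call is now on the stack
        destroy_tree(leaf->left, trace);
        destroy_tree(leaf->right, trace);
        trace.deleted.push_back(leaf->key_value);  // both subtrees are gone
        delete leaf;
    }
}

// Plain (unbalanced) BST insert, used only to build the example tree.
node* insert(node* root, int key) {
    if (root == nullptr)
        return new node{key, nullptr, nullptr};
    if (key < root->key_value)
        root->left = insert(root->left, key);
    else
        root->right = insert(root->right, key);
    return root;
}

// Builds a tree with the keys from the example and destroys it.
Trace TraceExample() {
    node* root = nullptr;
    for (int key : {8, 3, 10, 1, 6, 14, 4, 7, 13})
        root = insert(root, key);
    Trace trace;
    destroy_tree(root, trace);
    return trace;
}
```

For this tree, entered comes out as 8, 3, 1, 6, 4, 7, 10, 14, 13 and deleted as 1, 4, 7, 6, 3, 13, 14, 10, 8: a node is only deleted after both of its subtrees are gone.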
I'm trying to restore memory allocated in a tree by traversing the tree and deleting the memory as necessary. For example, suppose I have the following tree structure:
struct tree
{
int *value;
tree *left;
tree *right;
};
tree *root; //always points to the root of this tree
I know that we have to visit each value after every recursive call, delete it, then move to the next node (which can be left or right), but the recursive process seems very counterintuitive (particularly the part where we move to the left or move to the right).
I'm trying to follow the rule of "do something with the root, recursively call the left, then recursively call the right," but the way the code functions is confusing to me. How can I preserve the invariant of root? If someone can perhaps explain the concept pictorially that would be great.
as you need the tree for traversal during delete the idea is to delete on return
void del_tree(tree *t) {
if (t->left) {
// there is a left subtree existing. delete it
del_tree(t->left); // first go deeper on left side
// left branch now completely empty
delete t->left; // nothing left behind t->left
t->left=0; // just in case
} else {
// there is no left subtree existing
// we are in a leaf or in an unbalanced node
}
if (t->right) {
// there is a right subtree existing. delete it
del_tree(t->right); // now go deeper on right side
// right branch now completely empty
delete t->right; // nothing left behind t->right
t->right=0; // just in case
} else {
// there is no right subtree existing
// we are in a leaf or this was an unbalanced node
// (before we deleted the left subtree)
}
// both branches are now completely empty
// either they were from the beginning (leaf)
// or we have successfully reduced this node to a leaf
// now do the node visit
if (t->value) {
delete t->value; // tidy up
t->value=0; // just in case
}
// now we are completely clean and empty
// after return t will be deleted
}
int main() {
    tree *my_tree = nullptr;
    // stuff: build the tree so that my_tree points to its root
    if (my_tree) {
        del_tree(my_tree); // delete the whole tree
        delete my_tree; // delete the remaining root node
    }
}
a very important aspect on recursion is when to stop. i assume that a NULL pointer in your struct indicates that there is no subtree.
the strategy is now to go as deep as possible if (t->left) del_tree(t->left);
when we reach a NULL pointer on both, left and right we are stranded in a leaf. we now clean the leaf (deleting value) and return. on return delete t->left; is executed, this node has nothing left on the left subtree and continues on its right subtree.
here i found a nice image of the traversal
the problem of deleting a tree is divided into 3 parts. deleting the left subtree, deleting the right subtree and cleaning up self. the deleting of a subtree (left or right) is very much the same procedure as deleting the tree itself. so you use the same function, this is called recursion.
think of deleting a file system structure. you decide for the strategy to delete the 'left' folder structure first, then you delete the subtree 'right' and finally you delete the file 'value'. when during the execution of this strategy you change into a folder (no matter whether left or right) you notice that the problem looks the same. so you apply this strategy again to any folder in the tree.
what happens is that you change into the directory 'left' repeatedly until there is no more directory in the current one. you delete the file 'value'. then you go back one folder and delete the folder named 'left'. now you look for a folder named 'right', change into it, find no folders, delete file 'value' and return to the previous folder. you delete the now empty 'right' and finally delete the file 'value' as well. next is to do a further return (backtracking). and so on.
you cannot delete non empty folders during going deeper. you have to delete when retreating.
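the three-part strategy described above can also be written more compactly. the following is only a sketch of one possible variant, not the code from this answer: here the function deletes the node itself as well, so the caller does not delete the root afterwards, and a NULL argument is accepted. the returned node count is an addition made purely so the result can be checked.

```cpp
#include <cassert>

struct tree {
    int* value;
    tree* left;
    tree* right;
};

// Frees the whole subtree rooted at t, including t itself, and returns
// the number of nodes released (the count is only there for checking).
int del_tree(tree* t) {
    if (t == nullptr)
        return 0;                    // empty subtree: the recursion stops here
    int freed = del_tree(t->left);   // 1. delete the left subtree
    freed += del_tree(t->right);     // 2. delete the right subtree
    delete t->value;                 // 3. clean up self (deleting a null pointer is a no-op)
    delete t;
    return freed + 1;
}
```

because t is deleted only after both recursive calls have returned, the node is still available for traversal while its subtrees are being removed.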
You can think of your tree as a root with two leaves, where each of the leaves points to the root of another tree.
In fact, this is how you preserve the "invariant of the root", because as soon as you follow the pointer of a leaf, you end up at the root of another tree.
root -> branch -> leaf
|
V
branch -> leaf
can be also considered as
root -> tree1
|
V
tree2
which is in turn
root -> (root -> leaf)
|
V
(root -> leaf)
So when you follow a branch in the original, you end up in a root again.
I am working on a binary search tree in C++ at the moment and I have reached the stage where I have to write the remove/delete function(using recursive approach, x = change(x)). I have two options:
1. to stop at the parent of the node to be deleted;
2. to get to the node to delete and then call a function that will return the parent.
Approach 1: less expensive, more code
Approach 2: less code, more expensive
Which approach is better according to you, and why?
I disagree that those are your only two options.
I think a simpler solution is to ask each node whether it should be deleted. If it decides yes then it is deleted and returns the new node that should replace it. If it decides no then it returns itself.
// pseudo code.
Node* deleteNode(Node* node, int value)
{
if (node == NULL) return node;
if (node->value == value)
{
// This is the node I want to delete.
// So delete it and return the value of the node I want to replace it with.
// Which may involve some shifting of things around.
return doDelete(node);
}
else if (value < node->value)
{
// Not node. But try deleting the node on the left.
// whatever happens a value will be returned that
// is assigned to left and the tree will be correct.
node->left = deleteNode(node->left, value);
}
else
{
// Not node. But try deleting the node on the right.
// whatever happens a value will be returned that
// is assigned to right and the tree will be correct.
node->right = deleteNode(node->right, value);
}
// since this node is not being deleted return it.
// so it can be assigned back into the correct place.
return node;
}
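The pseudo code leaves doDelete open. As a hedged sketch of how it might be filled in, the version below assumes a plain Node with an int value and replaces a node that has two children with its in-order successor; other choices (such as the in-order predecessor) would work just as well.

```cpp
#include <cassert>

// Assumed node layout for this sketch.
struct Node {
    int value;
    Node* left;
    Node* right;
};

// Removes `node` from the tree and returns the subtree that replaces it.
Node* doDelete(Node* node) {
    if (node->left == nullptr || node->right == nullptr) {
        // Zero or one child: the child (possibly null) takes the node's place.
        Node* replacement = (node->left != nullptr) ? node->left : node->right;
        delete node;
        return replacement;
    }
    // Two children: detach the smallest node of the right subtree
    // (the in-order successor) and put it in this position.
    Node** link = &node->right;
    while ((*link)->left != nullptr)
        link = &(*link)->left;
    Node* successor = *link;
    *link = successor->right;
    successor->left = node->left;
    successor->right = node->right;
    delete node;
    return successor;
}

// Same structure as the pseudo code: every call returns the node that
// belongs at this position after the deletion.
Node* deleteNode(Node* node, int value) {
    if (node == nullptr)
        return nullptr;
    if (value == node->value)
        return doDelete(node);
    if (value < node->value)
        node->left = deleteNode(node->left, value);
    else
        node->right = deleteNode(node->right, value);
    return node;
}
```

A caller writes root = deleteNode(root, value); so that deleting the root itself needs no special handling.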
The best approach would be to traverse up to the parent of the node to be deleted, and then delete that child node. With this approach you always visit the child node anyway, since you always have to confirm that the child node is the node you want to delete.
I find that the most efficient form for writing functions for tree data structures in general is the following pseudocode format.
function someActionOnTree() {
return someActionOnTree(root)
}
function someActionOnTree (Node current) {
if (current is null) {
return null
}
if (current is not the node I seek) {
//logic for picking the next node to move to
next node = ...
next node = someActionOnTree(next node)
}
else {
// do whatever you need to do with current
// i.e. give it a child, delete its memory, etc
current = ...
}
return current;
}
This function recurses over the nodes of the data structure. At each step it does one of two things. If the current node is not the one being sought, it picks the next node to recurse on and overwrites the data structure's reference to that node with the result of the recursive call. Otherwise, it modifies or replaces the current node (and possibly performs some other logic). Finally, the function returns a reference to the node that belongs at the current position, which is essential for the overwriting step.
This is generally the most efficient form of code I've found for tree data structures in C++. The concepts apply to other structures as well: you can use recursion of this form, where the return value is always a reference to a fixed point in the planar representation of your data structure (basically, always return whatever is supposed to be at the spot you're looking at).
Here's an application of this style to a binary search tree delete function to illustrate my point.
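To make the "always return whatever is supposed to be at this spot" idea concrete, here is a small sketch of a BST insert written in that style. The Node layout and the decision to ignore duplicate values are assumptions made for the example.

```cpp
#include <cassert>

// Assumed node layout for this sketch.
struct Node {
    int value;
    Node* left;
    Node* right;
};

// Returns whatever node belongs at this position after the insertion.
Node* insert(Node* current, int value) {
    if (current == nullptr)
        return new Node{value, nullptr, nullptr};  // empty spot: fill it
    if (value < current->value)
        current->left = insert(current->left, value);
    else if (value > current->value)
        current->right = insert(current->right, value);
    // Equal values are ignored in this sketch.
    return current;
}
```

The caller uses it as root = insert(root, value); so an empty tree is handled by the same code path as any other empty spot.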
function deleteNodeFromTreeWithValue( value ) {
return deleteNodeFromTree(root, value)
}
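function deleteNodeFromTree(Node current, value) {
    if (current is null) return null
    if (current does not represent value) {
        if (current is greater than my value) {
            current.leftNode = deleteNodeFromTree(current.leftNode, value)
        } else {
            current.rightNode = deleteNodeFromTree(current.rightNode, value)
        }
    }
    else {
        // returning null here is only correct for a leaf; a node with
        // children must hand back the node that takes its place
        replacement = node that should take current's place (null for a leaf)
        free current's memory
        current = replacement
    }
    return current
}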
Obviously, there are many other ways to write this code, but in my experience, this has turned out to be the most effective method. Note that overwriting the pointers doesn't really hurt performance, since the nodes involved have just been read and are most likely already in the cache. If you're looking into improving the performance of your search tree, I'd recommend looking into specialized trees, such as self-balancing ones (AVL trees, red-black trees) or B-trees.