When programming a simple binary search tree data structure (non self-balancing), the advice most resources give when deleting a node with two children is to copy the data out of one of the left child to the node that is being deleted. Is this bad practice? Wouldn't some sort of pointer manipulation provide faster results? Is there a BST rotation algorithm that can generalize this?
Yes, you don't want to copy the node, you just want to "move" it (i.e., change pointers around) to put it into the spot of the one you're deleting. If you're not trying to maintain balance, you don't really need to do any kind of rotation though -- you just pick the right-most node in the left sub-tree (or the left-most node in the right sub-tree). You delete that from its current place, and insert it into the place of the node you need to delete (all strictly by manipulating pointers).
Copying data has O(1) complexity vs. possible O(N) when manipulating pointers: the source node (the right-most node in the left sub-tree or the left-most node in the right sub-tree) may have one child and a sub-tree under it. Unlike a single node insertion, merging sub-trees would be required in this case. To avoid copying overhead, one should store pointers to objects rather than objects in BST.
Related
I am trying to implement a B-tree and from what I understand this is how you split a node:
Attempt to insert a new value V at a leaf node N
If the leaf node has no space, create a new node and pick a middle value of N and anything right of it move to the new node and anything to the left of the middle value leave in the old node, but move it left to free up the right indices and insert V in the appropriate of the now two nodes
Insert the middle value we picked into the parent node of N and also add the newly created node to the list of children of the parent of N (thus making N and the new node siblings)
If the parent of N has no free space, perform the same operation and along with the values also split the children between the two split nodes (so this last part applies only to non-leaf nodes)
Continue trying to insert the previous split's middle point into the parent until you reach the root and potentially split the root itself, making a new root
This brings me to the question - how do I traverse upwards, am I supposed to keep a pointer of the parent?
Because I can only know if I have to split the leaf node when I have reached it for insertion. So once I have to split it, I have to somehow go back to its parent and if I have to split the parent as well, I have to keep going back up.
Otherwise I would have to re-traverse the tree again and again each time to find the next parent.
Here is an example of my node class:
template<typename KEY, typename VALUE, int DEGREE>
struct BNode
{
KEY Keys[DEGREE];
VALUE Values[DEGREE];
BNode<KEY, VALUE, DEGREE>* Children[DEGREE + 1];
BNode<KEY, VALUE, DEGREE>* Parent;
bool IsLeaf;
};
(Maybe I should not have an IsLeaf field and instead just check if it has any children, to save space)
Even if you don't use recursion or an explicit stack while going down the tree, you can still do it without parent pointers if you split nodes a bit sooner with a slightly modified algorithm, which has this key characteristic:
When encountering a node that is full, split it, even when it is not a leaf.
With this pre-emptive splitting algorithm, you only need to keep a reference to the immediate parent (not any other ancestor) to make the split possible, since now it is guaranteed that a split will not lead to another, cascading split more upwards in the tree. This algorithm requires that the maximum degree (number of children) of the B-tree is even (as otherwise one of the two split nodes would have too few keys to be considered valid).
See also Wikipedia which describes this alternative algorithm as follows:
An alternative algorithm supports a single pass down the tree from the root to the node where the insertion will take place, splitting any full nodes encountered on the way preemptively. This prevents the need to recall the parent nodes into memory, which may be expensive if the nodes are on secondary storage. However, to use this algorithm, we must be able to send one element to the parent and split the remaining 𝑈−2 elements into two legal nodes, without adding a new element. This requires 𝑈 = 2𝐿 rather than 𝑈 = 2𝐿−1, which accounts for why some textbooks impose this requirement in defining B-trees.
The same article defines 𝑈 and 𝐿:
Every internal node contains a maximum of 𝑈 children and a minimum of 𝐿 children.
For a comparison with the standard insertion algorithm, see also Will a B-tree with preemptive splitting always have the same height for any input order of keys?
You don't need parent pointers if all your operations start at the root.
I usually code the insert recursively, such that calling node.insert(key) either returns null or a new key to insert at its parent's level. The insert starts with root.insert(key), which finds the appropriate child and calls child.insert(key).
When a leaf node is reached the insert is performed, and non-null is returned if the leaf splits. The parent would then insert the new internal key and return non-null if it splits, etc. If root.insert(key) returns non-null, then it's time to make a new root
I have been wanting to write remove() method for my Binary Search Tree (which happens to be an array representation). But before writing it, I must consider all cases. Omitting all cases (since they are easy) except when the node has two children, in all the explanations I have read so far, most of the cases I see remove an element from an already balanced binary search tree. In the few cases where I have seen an element being removed from an unbalanced binary search tree, I find that they balance it through zigs and zags, and then remove the element.
Is there a way that I can possibly remove an element from an unbalanced binary search tree without having to balance it beforehand?
If not, would it be easier to write an AVL tree (in array representation)?
You don't need to balance it, but you do need to recursively go down the tree performing some swaps here and there so you actually end up with a valid BST.
Deletion of a node with 2 children in an (unbalanced) BST: (from Wikipedia)
Call the node to be deleted N. Do not delete N. Instead, choose either its in-order successor node or its in-order predecessor node, R. Copy the value of R to N, then recursively call delete on R until reaching one of the first two cases.
Deleting a node with two children from a binary search tree. First the rightmost node in the left subtree, the inorder predecessor 6, is identified. Its value is copied into the node being deleted. The inorder predecessor can then be easily deleted because it has at most one child. The same method works symmetrically using the inorder successor labelled 9.
Although, why do you want an unbalanced tree? All operations on it take on it take longer (or at least as long), and the additional overhead to balance doesn't change the asymptotic complexity of any operations. And, if you're using the array representation where the node at index i has children at indices 2i and 2i+1, it may end up fairly sparse, i.e. there will be quite a bit of wasted memory.
How do i implement a Queue using a BST.
Is this the way to do it, keep on inserting the nodes in the tree while maintaining a count value associated with each and every node,but while deletion BST should work like queue(FIFO) so start deleting from BST with the node having lowest count value in the tree.
Did i get the question and solution right? If not,then please explain me the question.
A BST is really an inappropriate data structure to use to back a queue. You really ought to use a linked list instead, because it would be way faster, less complicated, and plain old better.
However, if you insist on using a BST...
You would use the BST as a priority queue, and define a wrapper type that also holds a 'queue index', which is what the items would be sorted by. You would have to define the comparison to take into account the current queue index though, because otherwise you could only ever add as many items as the difference between the highest and lowest values of your index type.
You can have a Queue like this:
BST // to store data
pointer to head; // Points to the head of the Queue
pointer to tail // Points to the tail of the Queue
You add to the nodes structs of BST also a pointer to another node that will represent the order of insertion.
struct Node{
int x;
//left pointer
//right pointer
struct Node *next_queune_element;
}
During the insertion
When you want to add an element, you first access the node that the pointer tail points to and make it point to the new element that you just inserted (the BST node). Then you update the tail pointer to point to the new element.
During the deletion
When you remove an element, you first access the node that the head pointer points to, you store the next_queune_element in an auxiliary temporary variable and remove the node. Finally, make the head pointer to point to the auxiliary temporary variable.
I think a binary tree would be the desired data structure here and not a binary search tree. Using a binary tree for implementing a queue might be usefull when doing functional programming. You can do it using a binary tree which stays height balanced after each push and pop operation so they will always be O(log n). Push and pop look like:
a function to insert an element to the very left of the tree (the push function);
a function to delete an element from the very right of the tree (the pop fuction).
In both cases rebalancing won't violate the insertion order. Both are easy too implement as well. You are in fact using an AVL tree with altered insert functions. A bonus is the elements do not need to be (totally) orderable.
I want to add Bi-Directional Iterator (like Iterator exported by std::set) in my Parametrized BinaryTree class but I'm unable to comeup with any algorithm.
Simply structure of Binary tree node is , it contains three pointers , left , right , parent:
With the given structure you want to proceed like this:
To start the iteration you would find the left-most node.
To go to the next node the operation depends on where you currently are:
If your node has a right child you go to this child and find its left-most successor (if there is no left child the node you are on is the next node).
If you nodes doesn't have a right child you move up the chain of parents until you find a parent for which you used the link to the left node: the next node becomes this node.
To move in the other direction you reverse the roles of left and right.
Effectively, this is implements a stack-less in-order traversal of the tree. If your tree isn't changed while iterating (an unlikely scenario) or you don't have a link to the parent node, you can maintain the stack explicitly in the iterator.
A good approach to this issue may be to first write your recursive pre-order algorithm, without using templates, and then you can from that create a templated version and implement the correct Iterators.
Just a thought.
You can't use recursion to implement an iterator in C++ because your iterator needs to return from all processing before it can return the result.
Only languages like C# and Python, that have a concept of yield can use recursion to create iterators.
Your iterator needs to maintain a stack of yet-to-be-visited nodes.
Of the top of my head, I think the algorithm is something like:
Keep going down and to the left
Every time you come across a right branch, add it to the stack
If at any point you can't go left, pop the first branch off the stack and begin visiting that in the same way.
How do multisets work? If a set can't have a value mapped to a key, does it only hold keys?
Also, how do associative containers work? I mean vector and deque in the memory is located sequentially it means that deleting/removing (except beginning [deque] and end [vector, deque]) are slow if they are large.
And list is a set of pointers which are not sequentially located in the memory which causes longer search but faster delete/remove.
How are sets, maps, multisets and multimaps stored and how do they work?
These 4 containers are typically all implemented using "nodes". A node is an object that stores one element. In the [multi]set case, the element is just the value; in the [multi]map case each node stores one key and its associated value. A node also stores multiple pointers to other nodes. Unlike a list, the nodes in sets and maps form a tree. You'd typically arrange it such that branches on the "left" of a certain node have values less than that node, while branches on the "right" of a certain node have values higher than that node.
Operations like finding a map key/set value are now quite fast. Start at the root node of the tree. If that matches, you're done. If the root is larger, search in the left branch. If the root is smaller than the value you're looking for, follow the pointer to the right branch. Repeat until you find a value or an empty branch.
Inserting an element is done by creating a new node, finding the location in the tree where it should be placed, and then inserting the node there by adjusting the pointers around it. Finally, there is a "rebalancing" operation to prevent your tree from ending up all out of balance. Ideally each right and left branch is about the same size. Rebalancing works by shifting some nodes from the left to the right or vice versa. E.g. if you have values {1 2 3} and your root node would be 1, you'd have 2 and 3 on the left branch and an empty right branch:
1
\
2
\
3
This is rebalanced by picking 2 as the new root node:
2
/ \
1 3
The STL containers use a smarter, faster rebalancing technique but that level of detail should not matter. It's not even specified in the standard which better technique should be used so implementations can differ.
There can be any implementation, as long as they match the standard specifications for those containers.
AFAIK, the associative containers are implemented as binary trees (red-black). More details... depending on the implementation.
All associate container classes(map,multi-map,set,multi-set)are implemented with Red and Black(R-B Tree) tree. So the R-B tree implementation could be similar to this:-
struct Rb_node {
int value;
struct node *left, *right;
int color;
int size;
};