Tree data structure implementation using c++ - c++

I have given below assignment by my teacher to implement but I am not sure what can be effective approach to implement the same. I will appreciate if somebody share pointers to proceed further.
Problem description:
Assume a tree object having a depth of "n" (n is a known number). For the sake of simplicity, it can be assumed that each node in the tree is a simple string. The number of siblings and children for each node and at each level could be a variable number. It can also be assumed that there will be an API which can return us the number of children for each node (A dummy API can be used which returns a random whole number for this exercise). Also, each parent node has a node number range associated with it. The children of that parent will have node numbers which fall in the parent range. In effect:
1. Each node will have a range (whole number) to identify its children
2. Each node will also have a unique node number to identify itself
With these requirements in place, design the following:
1. The tree data structure
2. An efficient numbering algorithm for the tree nodes
The tree should support the following operations:
Insertion of individual nodes at any level
Bulk insertion of nodes at any level (As far as possible, bulk insertion should not result in any renumbering of any parent/sibling node number ranges)

Related

Should B-Tree nodes contain a pointer to their parent (C++ implementation)?

I am trying to implement a B-tree and from what I understand this is how you split a node:
Attempt to insert a new value V at a leaf node N
If the leaf node has no space, create a new node and pick a middle value of N and anything right of it move to the new node and anything to the left of the middle value leave in the old node, but move it left to free up the right indices and insert V in the appropriate of the now two nodes
Insert the middle value we picked into the parent node of N and also add the newly created node to the list of children of the parent of N (thus making N and the new node siblings)
If the parent of N has no free space, perform the same operation and along with the values also split the children between the two split nodes (so this last part applies only to non-leaf nodes)
Continue trying to insert the previous split's middle point into the parent until you reach the root and potentially split the root itself, making a new root
This brings me to the question - how do I traverse upwards, am I supposed to keep a pointer of the parent?
Because I can only know if I have to split the leaf node when I have reached it for insertion. So once I have to split it, I have to somehow go back to its parent and if I have to split the parent as well, I have to keep going back up.
Otherwise I would have to re-traverse the tree again and again each time to find the next parent.
Here is an example of my node class:
template<typename KEY, typename VALUE, int DEGREE>
struct BNode
{
KEY Keys[DEGREE];
VALUE Values[DEGREE];
BNode<KEY, VALUE, DEGREE>* Children[DEGREE + 1];
BNode<KEY, VALUE, DEGREE>* Parent;
bool IsLeaf;
};
(Maybe I should not have an IsLeaf field and instead just check if it has any children, to save space)
Even if you don't use recursion or an explicit stack while going down the tree, you can still do it without parent pointers if you split nodes a bit sooner with a slightly modified algorithm, which has this key characteristic:
When encountering a node that is full, split it, even when it is not a leaf.
With this pre-emptive splitting algorithm, you only need to keep a reference to the immediate parent (not any other ancestor) to make the split possible, since now it is guaranteed that a split will not lead to another, cascading split more upwards in the tree. This algorithm requires that the maximum degree (number of children) of the B-tree is even (as otherwise one of the two split nodes would have too few keys to be considered valid).
See also Wikipedia which describes this alternative algorithm as follows:
An alternative algorithm supports a single pass down the tree from the root to the node where the insertion will take place, splitting any full nodes encountered on the way preemptively. This prevents the need to recall the parent nodes into memory, which may be expensive if the nodes are on secondary storage. However, to use this algorithm, we must be able to send one element to the parent and split the remaining 𝑈−2 elements into two legal nodes, without adding a new element. This requires 𝑈 = 2𝐿 rather than 𝑈 = 2𝐿−1, which accounts for why some textbooks impose this requirement in defining B-trees.
The same article defines 𝑈 and 𝐿:
Every internal node contains a maximum of 𝑈 children and a minimum of 𝐿 children.
For a comparison with the standard insertion algorithm, see also Will a B-tree with preemptive splitting always have the same height for any input order of keys?
You don't need parent pointers if all your operations start at the root.
I usually code the insert recursively, such that calling node.insert(key) either returns null or a new key to insert at its parent's level. The insert starts with root.insert(key), which finds the appropriate child and calls child.insert(key).
When a leaf node is reached the insert is performed, and non-null is returned if the leaf splits. The parent would then insert the new internal key and return non-null if it splits, etc. If root.insert(key) returns non-null, then it's time to make a new root

Is there a way to access non leaf nodes in a C++ Boost rtree

Sorry in advance, this a very specific question and I cannot provide any piece of code as this is for my job, thus confidential.
I am using the Boost R-trees, and an algorithm that I need to implement requires to access the non leaf nodes of the tree. With Boost rtree library, I only can access leaf nodes in an easy way. I noticed that there is a function to print all the nodes including the non leaf nodes (which means they exist, they are computed), with their position, their level in the tree etc, but I cannot access them the same way than the leaf nodes.
For now, the best solution that I have is to implement a visitor for the tree and overload the operator () to gather the nodes (this is what the print method does to access the nodes).
My question is, does anybody know an easier way to access the non leaf nodes ? Because this one does not seem to be efficient, and I'm loosing time each time I want to access a non leaf node. Moreover, I need to replicate the structure of the tree without the points, and I cannot do that if I cannot access the non leaf nodes.
Thank you in advance !
I don't know what would you like to do exactly so this will be a general answer.
In order to access the tree nodes for the first time you have to traverse the tree structure. In Boost.Geometry rtree visitor pattern is used for that. You could do it manually but internally Boost.Variant is used to represent the nodes so you'll end up with variant visitor instead. At this point you have a few options depending what are you going to do with the nodes. Are you going to modify the r-tree? Will the rtree be moved in memory? Will the addresses of nodes change? How many nodes are you going to access? Do you want to store some kind of reference to a node and traverse the tree structure from that point? Do you want to traverse the structure downward or upward?
One option as you noticed is to traverse the tree structure each time. This is a good approach if the tree structure can change. The obvious drawback is that you have to check all child nodes at each node using some condition (whatever you do in order to pick the node of interest).
If the tree structure does not change but the tree is copied to a different place in memory you can represent the node as a path from the root to the node of interest as list of indexes of child nodes. E.g. a list {1, 2, 3} meaning: traverse the tree using child node 1 of root node, then at the next level pick child node 2, then your node will be child node 3 at the next level. In this case you still have to traverse the tree but doesn't have to check conditions again.
If the tree does not change and nodes stays in the same place in memory you can simply use pointers or references.

Is there any data structure in C++ STL for performing insertion, searching and retrieval of kth element in log(n)?

I need a data structure in c++ STL for performing insertion, searching and retrieval of kth element in log(n)
(Note: k is a variable and not a constant)
I have a class like
class myClass{
int id;
//other variables
};
and my comparator is just based on this id and no two elements will have the same id.
Is there a way to do this using STL or I've to write log(n) functions manually to maintain the array in sorted order at any point of time?
Afaik, there is no such datastructure. Of course, std::set is close to this, but not quite. It is a red black tree. If each node of this red black tree was annotated with the tree weight (the number of nodes in the subtree rooted at this node), then a retrieve(k) query would be possible. As there is no such weight annotation (as it takes valuable memory and makes insert/delete more complex as weights have to be updated), it is impossible to answer such a query efficently with any search tree.
If you want to build such a datastructure, use a conventional search tree implementation (red-black,AVL,B-Tree,...) and add a weight field to each node that counts the number of entries in its subtree. Then searching for the k-th entry is quite simple:
Sketch:
Check the weight of the child nodes, and find the child c which has the largest weight (accumulated from left) that is not greater than k
Subtract from k all weights of children that are left of c.
Descend down to c and call this procedure recursively.
In case of a binary search tree, the algorithm is quite simple since each node only has two children. For a B-tree (which is likely more efficient), you have to account as many children as the node contains.
Of course, you must update the weight on insert/delete: Go up the tree from the insert/delete position and increment/decrement the weight of each node up to the root. Also, you must exchange the weights of nodes when you do rotations (or splits/merges in the B-tree case).
Another idea would be a skip-list where the skips are annotated with the number of elements they skip. But this implementation is not trivial, since you have to update the skip length of each skip above an element that is inserted or deleted, so adjusting a binary search tree is less hassle IMHO.
Edit: I found a C implementation of a 2-3-4 tree (B-tree), check out the links at the bottom of this page: http://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html
You can not achieve what you want with simple array or any other of the built-in containers. You can use a more advanced data structure for instance a skip list or a modified red-black tree(the backing datastructure of std::set).
You can get the k-th element of an arbitrary array in linear time and if the array is sorted you can do that in constant time, but still the insert will require shifting all the subsequent elements which is linear in the worst case.
As for std::set you will need additional data to be stored at each node to be able to get the k-th element efficiently and unfortunately you can not modify the node structure.

Remove an element from unbalanced binary search tree

I have been wanting to write remove() method for my Binary Search Tree (which happens to be an array representation). But before writing it, I must consider all cases. Omitting all cases (since they are easy) except when the node has two children, in all the explanations I have read so far, most of the cases I see remove an element from an already balanced binary search tree. In the few cases where I have seen an element being removed from an unbalanced binary search tree, I find that they balance it through zigs and zags, and then remove the element.
Is there a way that I can possibly remove an element from an unbalanced binary search tree without having to balance it beforehand?
If not, would it be easier to write an AVL tree (in array representation)?
You don't need to balance it, but you do need to recursively go down the tree performing some swaps here and there so you actually end up with a valid BST.
Deletion of a node with 2 children in an (unbalanced) BST: (from Wikipedia)
Call the node to be deleted N. Do not delete N. Instead, choose either its in-order successor node or its in-order predecessor node, R. Copy the value of R to N, then recursively call delete on R until reaching one of the first two cases.
Deleting a node with two children from a binary search tree. First the rightmost node in the left subtree, the inorder predecessor 6, is identified. Its value is copied into the node being deleted. The inorder predecessor can then be easily deleted because it has at most one child. The same method works symmetrically using the inorder successor labelled 9.
Although, why do you want an unbalanced tree? All operations on it take on it take longer (or at least as long), and the additional overhead to balance doesn't change the asymptotic complexity of any operations. And, if you're using the array representation where the node at index i has children at indices 2i and 2i+1, it may end up fairly sparse, i.e. there will be quite a bit of wasted memory.

What does "tree" refer to in breadth and depth based searches?

I need some working snippets on C++ code regarding breadth/depth first searches. Also, in the links below, when using the term tree, is it in reference to a binary tree or more specefically a red and black tree? Or is this a more abstract tree of sorts? Does anyone have a link to working code for these searches...along with constructing the tree?
Tree seems to refer to some sort of constuct with in the "graph"? I believe this is some sort of math I have not taken yet.
breadth or depth first search 1
breadth or depth first search 2
The tree in question is the thing they're searching. It's kinda hard to understand search algorithms without knowing what it is they are searching through.
A tree is a type of graph. A graph is a series of nodes (which presumably represent some data) with connections between certain nodes. A tree is a graph where the connections between nodes form a hierarchy. For any given node in the graph, it has exactly one "parent" that points to it, and it points to zero or more child nodes. And the nodes cannot form circles; a parent cannot point to a child who points to that parent.
Basically, like branches on a tree.
The term "tree" refers to any data structure that can be abstractly looked at as a tree.
A "tree" is a data structure in which there are parent nodes and child nodes, and each child has a single parent, with a single "root" node not having a parent.
If a node in your tree has multiple parents, it is called a "graph".
A tree is a special case of a directed acyclic graph (basically a bunch of 'nodes' with arrows ('edges') pointing at each other, such that there cannot be a loop of arrows) in which the following two conditions hold:
No node has more than one incoming edge
There exists a single distinguished node (the 'root') from which all other nodes are reachable.
The nodes reachable via an outgoing edge from some node N are often called N's children.
Breadth-first and depth-first search apply to generic trees (indeed, they apply to all DAGs). However there are some more specific types:
Binary trees are trees in which no node has more than two outgoing edges; outgoing edges are labelled, usually as 'left' and 'right'
Search trees are binary trees in which each node has a key; further, the key in some node N is greater than the child on its left edge (if any) and less than the child on its right edge (if any). This allows for very fast searching for a specific key.
Red-black trees are a specific kind of search tree in which a moderately complex algorithm is used to make sure all keys are approximately the same distance from the root.