How to find the top k largest elements more efficiently - C++

How can I find the k largest elements in a binary search tree faster than O(log N + k)?
I implemented an algorithm with that asymptotic cost, but how can I make it faster?

Extend your tree data structure with the following:
Make your tree threaded or, more simply, add a parent reference to each node, so that you can walk the tree upward as well as downward.
Maintain a reference to the node that has the maximum value (the "rightmost" node). Keep it up to date as nodes are added/removed.
With that information you can avoid the first descent from the root to the rightmost node and start collecting values immediately. If the binary tree is well balanced, the rightmost node will be on (or near) the bottom layer of the tree. Then the walk along the tree in reverse inorder sequence -- for finding the k greatest-valued nodes -- will make you traverse a number of edges that is O(k).
Alternative structures, such as a B+ tree or a skip list, can also provide O(k) access to the k greatest values they store.
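The reverse inorder walk with parent pointers can be sketched as follows. The node type here is a hypothetical illustration (the question's actual node layout is not shown); walking k steps from the rightmost node traverses O(k) edges amortized:

```cpp
#include <vector>

// Hypothetical node type with a parent pointer, as the answer suggests.
struct Node {
    int value = 0;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
};

// Reverse inorder step: the node with the next smaller value, or nullptr.
Node* prevInOrder(Node* n) {
    if (n->left) {                       // predecessor is the rightmost node
        n = n->left;                     // of the left subtree
        while (n->right) n = n->right;
        return n;
    }
    while (n->parent && n->parent->left == n)   // climb while we are a left child
        n = n->parent;
    return n->parent;
}

// Collect the k largest values, starting at the maintained rightmost node.
std::vector<int> kLargest(Node* rightmost, int k) {
    std::vector<int> out;
    for (Node* n = rightmost; n && (int)out.size() < k; n = prevInOrder(n))
        out.push_back(n->value);
    return out;
}
```

Keeping the rightmost pointer up to date on insert/remove is O(log n) per operation in a balanced tree, so it does not change the asymptotic cost of updates.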

Related

Find the median of binary search tree, C++

I was once interviewed by "one well-known company", and the interviewer asked me to find the median of a BST.
int median(treeNode* root)
{
}
I started with the first brute-force solution that came to mind: fill all the data into a std::vector<int> with an inorder traversal (so that everything in the vector is sorted) and take the middle element.
So my algorithm is O(N) for inserting every element into the vector, plus an O(1) query of the middle element, and O(N) memory.
Is there a more efficient way (in terms of memory or complexity) to do the same thing?
Thanks in advance.
It can be done in O(N) time and O(h) space (where h is the tree height, O(log N) for a balanced tree) by doing an inorder traversal and stopping when you reach the N/2-th node. Just carry a counter that tells you how many nodes have already been traversed -- no need to actually populate any vector.
If you can modify your tree into an order-statistic tree (each node also stores the number of nodes in the subtree it roots), you can easily solve it in O(log N) time by simply moving toward the position of the N/2-th element.
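A minimal sketch of the counting inorder traversal. The member names of treeNode are guesses, since the question only shows the function signature:

```cpp
// Assumed shape of the interview's treeNode (member names are guesses).
struct treeNode {
    int val;
    treeNode* left;
    treeNode* right;
};

int countNodes(treeNode* root) {
    return root ? 1 + countNodes(root->left) + countNodes(root->right) : 0;
}

// Inorder walk that stops as soon as the target index is reached.
bool nthInOrder(treeNode* node, int& remaining, int& out) {
    if (!node) return false;
    if (nthInOrder(node->left, remaining, out)) return true;
    if (remaining-- == 0) { out = node->val; return true; }
    return nthInOrder(node->right, remaining, out);
}

// Element at index N/2 of the sorted order (the upper median for even N).
// Two traversals, O(N) time, O(h) stack space, no vector.
int median(treeNode* root) {
    int idx = countNodes(root) / 2;
    int out = 0;
    nthInOrder(root, idx, out);
    return out;
}
```

The second traversal stops early, so on average it visits about half the nodes; the first traversal is needed only because the node count is unknown.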
Since you know that the median is the middle element of a sorted list of elements, you can just take the middle element of your inorder traversal and stop there, without storing the values in a vector. You might need two traversals if you don't know the number of nodes, but it will make the solution use less memory (O(h) where h is the height of your tree; h = O(log n) for balanced search trees).
If you can augment the tree, you can use the solution I gave here to get an O(log n) algorithm.
The binary tree offers a sorted view of your data, but to take advantage of it you need to know how many elements are in each subtree. Without that knowledge, your algorithm is as fast as it can be.
If you know the size of each subtree, you can choose at each step whether to visit the left or the right subtree, which gives an O(log n) algorithm if the binary tree is balanced.

Is there any data structure in the C++ STL that supports insertion, searching and retrieval of the k-th element in O(log n)?

I need a data structure in the C++ STL that supports insertion, searching and retrieval of the k-th element in O(log n).
(Note: k is a variable and not a constant)
I have a class like
class myClass{
int id;
//other variables
};
and my comparator is just based on this id and no two elements will have the same id.
Is there a way to do this using the STL, or do I have to write the O(log n) functions manually to keep the array sorted at all times?
Afaik, there is no such data structure. Of course, std::set comes close, but not quite: it is a red-black tree. If each node of this red-black tree were annotated with its weight (the number of nodes in the subtree rooted at that node), then a retrieve(k) query would be possible. As there is no such weight annotation (it takes valuable memory and makes insert/delete more complex, since weights have to be updated), a standard search tree cannot answer such a query efficiently.
If you want to build such a data structure, use a conventional search tree implementation (red-black, AVL, B-tree, ...) and add a weight field to each node that counts the number of entries in its subtree. Then searching for the k-th entry is quite simple:
Sketch:
Check the weights of the child subtrees, accumulating from the left, and find the child c whose subtree contains the k-th entry (the first child at which the accumulated weight reaches k).
Subtract from k the weights of all children to the left of c.
Descend down to c and call this procedure recursively.
In the case of a binary search tree, the algorithm is quite simple, since each node has only two children. For a B-tree (which is likely more efficient), you have to account for as many children as the node contains.
Of course, you must update the weight on insert/delete: Go up the tree from the insert/delete position and increment/decrement the weight of each node up to the root. Also, you must exchange the weights of nodes when you do rotations (or splits/merges in the B-tree case).
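For the binary case, the sketch above boils down to a short loop. The node type and field names here are illustrative, assuming each node carries its subtree size:

```cpp
// BST node annotated with the size ("weight") of its subtree.
struct WNode {
    int key;
    int weight;          // number of nodes in the subtree rooted here
    WNode* left;
    WNode* right;
};

int sz(const WNode* n) { return n ? n->weight : 0; }

// k-th smallest key, 1-based; requires 1 <= k <= sz(node).
int kth(const WNode* node, int k) {
    for (;;) {
        int leftSize = sz(node->left);
        if (k <= leftSize)
            node = node->left;            // k-th entry is in the left subtree
        else if (k == leftSize + 1)
            return node->key;             // this node is the k-th entry
        else {
            k -= leftSize + 1;            // skip the left subtree and this node
            node = node->right;
        }
    }
}
```

Each iteration descends one level, so the lookup is O(h), i.e. O(log n) in a balanced tree.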
Another idea would be a skip-list where the skips are annotated with the number of elements they skip. But this implementation is not trivial, since you have to update the skip length of each skip above an element that is inserted or deleted, so adjusting a binary search tree is less hassle IMHO.
Edit: I found a C implementation of a 2-3-4 tree (B-tree), check out the links at the bottom of this page: http://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html
You cannot achieve what you want with a simple array or any of the other built-in containers. You can use a more advanced data structure, for instance a skip list or a modified red-black tree (the backing data structure of std::set).
You can get the k-th element of an arbitrary array in linear time (e.g. with quickselect), and if the array is sorted you can do it in constant time, but an insert will still require shifting all the subsequent elements, which is linear in the worst case.
As for std::set, you would need additional data stored at each node to get the k-th element efficiently, and unfortunately you cannot modify the node structure.

Remove an element from unbalanced binary search tree

I have been wanting to write a remove() method for my binary search tree (which happens to use an array representation). Before writing it, I must consider all cases. Omitting the easy cases, consider the case where the node has two children: in all the explanations I have read so far, the element is removed from an already balanced binary search tree. In the few cases where I have seen an element removed from an unbalanced binary search tree, the tree is first balanced through zigs and zags, and only then is the element removed.
Is there a way that I can possibly remove an element from an unbalanced binary search tree without having to balance it beforehand?
If not, would it be easier to write an AVL tree (in array representation)?
You don't need to balance it, but you do need to recursively go down the tree performing some swaps here and there so you actually end up with a valid BST.
Deletion of a node with 2 children in an (unbalanced) BST: (from Wikipedia)
Call the node to be deleted N. Do not delete N. Instead, choose either its in-order successor node or its in-order predecessor node, R. Copy the value of R to N, then recursively call delete on R until reaching one of the first two cases.
(Wikipedia's accompanying figure: deleting a node with two children from a binary search tree. First the rightmost node in the left subtree -- the inorder predecessor, 6 in the figure -- is identified. Its value is copied into the node being deleted. The inorder predecessor can then be easily deleted because it has at most one child. The same method works symmetrically using the inorder successor, labelled 9.)
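For a pointer-based BST, the predecessor-copy deletion can be sketched as follows. The question's array representation would use index arithmetic instead, but the logic is identical; the node type is illustrative:

```cpp
struct BNode {
    int key;
    BNode* left;
    BNode* right;
};

// Recursive delete; returns the (possibly new) root of the subtree.
BNode* removeKey(BNode* root, int key) {
    if (!root) return nullptr;
    if (key < root->key) {
        root->left = removeKey(root->left, key);
    } else if (key > root->key) {
        root->right = removeKey(root->right, key);
    } else if (!root->left) {                    // 0 or 1 child: splice out
        BNode* r = root->right; delete root; return r;
    } else if (!root->right) {
        BNode* l = root->left; delete root; return l;
    } else {                                     // two children
        BNode* pred = root->left;                // inorder predecessor:
        while (pred->right) pred = pred->right;  // rightmost of left subtree
        root->key = pred->key;                   // copy its value up...
        root->left = removeKey(root->left, pred->key);  // ...then delete it
    }
    return root;
}
```

Note that no balancing happens anywhere: the recursion bottoms out at the predecessor, which has at most one child and is trivially spliced out.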
Although, why do you want an unbalanced tree? All operations on it take at least as long as on a balanced one, and the additional overhead of balancing doesn't change the asymptotic complexity of any operation. And if you're using the array representation where the node at index i has children at indices 2i and 2i+1, an unbalanced tree may end up fairly sparse, i.e. there will be quite a bit of wasted memory.

Merging AVL trees using an empty tree (C++ templates)

As part of an AVL template I am working on (C++ templates), I am trying to merge two AVL trees in O(n1+n2), where n1+n2 is the total number of elements in both trees.
I came up with the following algorithm.
Inorder traversal on the 1st tree and build an array/list - O(n1)
Inorder traversal on the 2nd tree and build an array/list - O(n2)
Merge sort of those 2 arrays and build final sorted array/list in the size of n1+n2 - O(n1+n2)
Build an empty almost complete binary tree with n1+n2 vertices - O(n1+n2).
Inorder traversal on that almost complete binary tree, updating the vertices with the elements of the merged array/list.
My question is how do I actually build the empty almost complete binary tree with n1+n2 vertices?
If the nodes issued by the merge sort are stored in a vector, this can be done relatively easily. Your nodes are already sorted, so you can "insert" them in the following fashion:
Build your root node from the element at 1/2 of the array;
Build the root's child nodes using the elements at 1/4 and 3/4 of the array;
Repeat 2 recursively.
This should feel like traversing a binary tree that happens to be represented as a sorted array.
Note that for this to work, you need to build the tree with balancing "turned off". This is most likely going to require you to make this a private method of your class, probably a special constructor.
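The recipe above is the standard middle-element recursion over the sorted array. A sketch with an illustrative node type (in your class this would be a private method, and balancing stays "turned off" because the construction is balanced by design):

```cpp
#include <cstddef>
#include <vector>

struct ANode {
    int key;
    ANode* left;
    ANode* right;
};

// Build a height-balanced BST from sorted[lo, hi): the middle element becomes
// the root, and the two halves become the subtrees, as the recipe describes.
ANode* buildBalanced(const std::vector<int>& sorted, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return nullptr;
    std::size_t mid = lo + (hi - lo) / 2;
    return new ANode{sorted[mid],
                     buildBalanced(sorted, lo, mid),
                     buildBalanced(sorted, mid + 1, hi)};
}
```

Every element is visited exactly once, so the construction is O(n1+n2), matching the target complexity of the overall merge.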

Concatenating/Merging/Joining two AVL trees

Assume that I have two AVL trees and that every element of the first tree is smaller than any element of the second tree. What is the most efficient way to concatenate them into one single AVL tree? I've searched everywhere but haven't found anything useful.
Assuming you may destroy the input trees:
remove the rightmost element from the left tree, and use it to construct a new root node whose left child is the left tree and whose right child is the right tree: O(log n)
determine and set that node's balance factor: O(log n). In (temporary) violation of the invariant, the balance factor may be outside the range {-1, 0, 1}
rotate to get the balance factor back into range: O(log n) rotations, each O(1): O(log n) total
Thus, the entire operation can be performed in O(log n).
Edit: On second thought, it is easier to reason about the rotations in the following algorithm. It is also quite likely faster:
Determine the height of both trees: O(log n).
Assuming that the right tree is taller (the other case is symmetric):
remove the rightmost element from the left tree (rotating and adjusting its computed height if necessary). Let n be that element. O(log n)
In the right tree, navigate left until you reach a node whose subtree is at most 1 taller than the left tree. Let r be that node. O(log n)
replace that node with a new node with value n, and subtrees left and r. O(1)
By construction, the new node is AVL-balanced, and its subtree is 1 taller than r was.
increment its parent's balance accordingly. O(1)
and rebalance like you would after inserting. O(log n)
One ultra-simple solution (which works without any assumptions about the relation between the trees) is this:
Do a merge sort of both trees into one merged array (concurrently iterate both trees).
Build an AVL tree from the array - take the middle element to be the root, and apply recursively to left and right halves.
Both steps are O(n). The major issue with it is that it takes O(n) extra space.
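The traversal-and-merge step can be sketched as follows (node type illustrative); rebuilding the AVL tree from the merged array is then the usual middle-element recursion:

```cpp
#include <cstddef>
#include <vector>

struct MNode {
    int key;
    MNode* left;
    MNode* right;
};

// An inorder traversal appends a tree's keys in ascending order.
void inorder(const MNode* n, std::vector<int>& out) {
    if (!n) return;
    inorder(n->left, out);
    out.push_back(n->key);
    inorder(n->right, out);
}

// Flatten both trees and merge their sorted key sequences in O(n1 + n2).
std::vector<int> mergedKeys(const MNode* t1, const MNode* t2) {
    std::vector<int> a, b, out;
    inorder(t1, a);
    inorder(t2, b);
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
        out.push_back(a[i] <= b[j] ? a[i++] : b[j++]);
    out.insert(out.end(), a.begin() + i, a.end());
    out.insert(out.end(), b.begin() + j, b.end());
    return out;
}
```

Because the two input sequences may interleave arbitrarily, this works even when the trees' key ranges overlap, which is exactly why no assumption about their relation is needed.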
The best solution I have read for this problem can be found here. It is very close to meriton's answer if you correct this issue:
In the third step, the algorithm navigates left until it reaches a node whose subtree has the same height as the left tree. This is not always possible (see the counterexample image). The right way to do this step is to find a subtree of height h or h+1, where h is the height of the left tree.
I suspect that you'll just have to walk one tree (hopefully the smaller) and individually add each of its elements to the other tree. The AVL insert/delete operations are not designed to handle adding a whole subtree at a time.