Concatenating/Merging/Joining two AVL trees - C++

Assume that I have two AVL trees and that each element from the first tree is smaller than any element from the second tree. What is the most efficient way to concatenate them into one single AVL tree? I've searched everywhere but haven't found anything useful.

Assuming you may destroy the input trees:
remove the rightmost element from the left tree, and use it to construct a new root node, whose left child is the left tree, and whose right child is the right tree: O(log n)
determine and set that node's balance factor: O(log n). In (temporary) violation of the invariant, the balance factor may be outside the range {-1, 0, 1}
rotate to get the balance factor back into range: O(log n) rotations: O(log n)
Thus, the entire operation can be performed in O(log n).
Edit: On second thought, it is easier to reason about the rotations in the following algorithm. It is also quite likely faster:
Determine the height of both trees: O(log n).
Assuming that the right tree is taller (the other case is symmetric):
remove the rightmost element from the left tree (rotating and adjusting its computed height if necessary). Let n be that element. O(log n)
In the right tree, navigate left until you reach a node whose subtree is at most 1 taller than left. Let r be that node. O(log n)
replace that node with a new node with value n, and subtrees left and r. O(1)
By construction, the new node is AVL-balanced, and its subtree is 1 taller than r.
increment its parent's balance accordingly. O(1)
and rebalance like you would after inserting. O(log n)
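Here is a rough C++ sketch of the overall join along these lines, written recursively: descend the taller side until the height difference is at most 1, splice in the element detached from the left tree (m below plays the role of the element n in step 3), and rebalance on the way back up. The Node layout and all function names are just for this sketch, not any particular library:

#include <algorithm>

struct Node {
    int key;
    int height;                          // height of the subtree rooted here (leaf = 1)
    Node *left, *right;
    Node(int k) : key(k), height(1), left(nullptr), right(nullptr) {}
};

int h(const Node* t) { return t ? t->height : 0; }
void update(Node* t) { t->height = 1 + std::max(h(t->left), h(t->right)); }

Node* rotateLeft(Node* x) {
    Node* y = x->right;
    x->right = y->left; y->left = x;
    update(x); update(y);
    return y;
}

Node* rotateRight(Node* x) {
    Node* y = x->left;
    x->left = y->right; y->right = x;
    update(x); update(y);
    return y;
}

// Standard AVL rebalancing of a node whose balance factor may be one out of range.
Node* rebalance(Node* t) {
    update(t);
    if (h(t->left) - h(t->right) > 1) {
        if (h(t->left->left) < h(t->left->right)) t->left = rotateLeft(t->left);
        return rotateRight(t);
    }
    if (h(t->right) - h(t->left) > 1) {
        if (h(t->right->right) < h(t->right->left)) t->right = rotateRight(t->right);
        return rotateLeft(t);
    }
    return t;
}

// Join l and r around node m, where all keys in l < m->key < all keys in r.
Node* joinAVL(Node* l, Node* m, Node* r) {
    if (h(l) > h(r) + 1) { l->right = joinAVL(l->right, m, r); return rebalance(l); }
    if (h(r) > h(l) + 1) { r->left  = joinAVL(l, m, r->left);  return rebalance(r); }
    m->left = l; m->right = r; update(m);  // heights within 1: m is AVL-balanced
    return m;
}

// Detach the rightmost node of t (returned through out), rebalancing on the way up.
Node* removeMax(Node* t, Node*& out) {
    if (!t->right) { out = t; return t->left; }
    t->right = removeMax(t->right, out);
    return rebalance(t);
}

// Concatenate: every key of left is smaller than every key of right. O(log n).
Node* concatAVL(Node* left, Node* right) {
    if (!left) return right;
    if (!right) return left;
    Node* m = nullptr;
    left = removeMax(left, m);
    return joinAVL(left, m, right);
}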

One ultra-simple solution (that works without any assumptions about the relation between the trees) is this:
Do a merge sort of both trees into one merged array (concurrently iterate both trees).
Build an AVL tree from the array - take the middle element to be the root, and apply recursively to left and right halves.
Both steps are O(n). The major issue with it is that it takes O(n) extra space.
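A rough C++ sketch of those two steps (Node and the helper names are illustrative only; a tighter version would iterate both trees concurrently with two explicit stacks instead of materializing two intermediate vectors):

#include <algorithm>
#include <vector>

struct Node {
    int key;
    Node *left, *right;
    Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// In-order traversal yields the keys in sorted order: O(n).
void inorder(const Node* t, std::vector<int>& out) {
    if (!t) return;
    inorder(t->left, out);
    out.push_back(t->key);
    inorder(t->right, out);
}

// The middle element becomes the root; recurse on the halves.
// The result is height-balanced, hence a valid AVL tree: O(n).
Node* build(const std::vector<int>& keys, int lo, int hi) {
    if (lo > hi) return nullptr;
    int mid = lo + (hi - lo) / 2;
    Node* root = new Node(keys[mid]);
    root->left  = build(keys, lo, mid - 1);
    root->right = build(keys, mid + 1, hi);
    return root;
}

Node* mergeTrees(const Node* a, const Node* b) {
    std::vector<int> ka, kb;
    inorder(a, ka);
    inorder(b, kb);
    std::vector<int> all(ka.size() + kb.size());
    std::merge(ka.begin(), ka.end(), kb.begin(), kb.end(), all.begin());
    return build(all, 0, (int)all.size() - 1);
}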

The best solution I read to this problem can be found here. It is very close to meriton's answer, once you correct this issue:
In the third step, the algorithm navigates left until you reach the node whose subtree has the same height as the left tree. This is not always possible (see the counterexample image). The right way to do this step is to find a subtree with height h or h+1, where h is the height of the left tree.

I suspect that you'll just have to walk one tree (hopefully the smaller) and individually add each of its elements to the other tree. The AVL insert/delete operations are not designed to handle adding a whole subtree at a time.

Related

How to find the top k largest elements more efficiently

How to find the k largest elements in a binary search tree faster than in O(logN + k)
I implemented the algorithm with the stated asymptotics, but how can I make it faster?
Extend your tree data structure with the following:
Make your tree threaded, i.e. add a parent reference to each node.
Maintain a reference to the node that has the maximum value (the "rightmost" node). Keep it up to date as nodes are added/removed.
With that information you can avoid the first descent from the root to the rightmost node, and start collecting values immediately. If the binary tree is well balanced, then the rightmost node will be on (or near) the bottom layer of the tree. Then the walk along the tree in reversed inorder sequence -- for finding the k greatest valued nodes -- will make you traverse a number of edges that is O(k).
Alternative structures, such as a B+ tree or a skip list, can also provide O(k) access to the k greatest values they store.
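As a C++ sketch, the walk could look like this (the Node layout with a parent pointer and the maintained rightmost pointer are exactly the assumptions from the list above):

#include <vector>

struct Node {
    int key;
    Node *left, *right, *parent;
};

// In-order predecessor via parent links; no stack and no descent from the root.
Node* prevInorder(Node* t) {
    if (t->left) {                              // rightmost node of the left subtree
        t = t->left;
        while (t->right) t = t->right;
        return t;
    }
    while (t->parent && t == t->parent->left)   // climb while we are a left child
        t = t->parent;
    return t->parent;                           // nullptr once the minimum is passed
}

// Collect the k largest keys, starting from the maintained rightmost node.
// Across the whole loop only O(k) edges are traversed (amortized).
std::vector<int> kLargest(Node* rightmost, int k) {
    std::vector<int> out;
    for (Node* t = rightmost; t && k > 0; t = prevInorder(t), --k)
        out.push_back(t->key);
    return out;
}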

Binary Tree Questions

Currently studying for an exam, and whilst reading through some notes, I had a few questions.
I know that the height of a Binary Search Tree is Log(n). Does this mean the depth is also Log(n)?
What is the maximum depth of a node in a full binary tree with n nodes? This relates to the first question; if the height of a binary tree is Log(n), would the maximum depth also be Log(n)?
I know that the time complexity of searching for a node in a binary search tree is O(Log(n)), which I understand. However, I read that the worst case time complexity is O(N). In what scenario would it take O(N) time to find an element?
THIS IS A PRIORITY QUEUE/ HEAP QUESTION. In my lecture notes, it says the following statement:
If we use an array for Priority Queues, en-queuing takes O(1) and de-queuing takes O(n). In a sorted Array, en-queue takes O(N) and de-queue takes O(1).
I'm having a hard time understanding this. Can anyone explain?
Sorry for all the questions, really need some clarity on a few of these topics.
Caveat: I'm a little rusty, but here goes ...
Height and depth of a binary tree are synonymous, more or less. Height is the maximum depth along any path from root to leaf. But, when you traverse a tree, you have a concept of current depth: the root node has depth 0, its children have depth 1, its grandchildren depth 2. If we stop here, the tree has 3 levels (some would call its height 3), but the maximum depth [we visited] is 2. Otherwise, the two terms are often interchanged when talking about the tree overall.
Before we get to some more of your questions, it's important to note that binary trees come in various flavors: balanced or unbalanced. In a perfectly balanced tree, all nodes except those at maximum depth have their left/right links non-null. For example, with n nodes in the tree, let n = 1024. Perfectly balanced, the height is log2(n), which is 10 (since 1024 == 2^10).
When you search a perfectly balanced tree, the search is O(log2(n)) because starting from the root node, you choose to follow either left or right, and each time you do, you eliminate 1/2 of the nodes. In such a tree with 1024 elements, the depth is 10 and you make 10 such left/right decisions.
Most tree algorithms, when you add a new node, will rebalance the tree on the fly (e.g. AVL or RB (red-black) trees). So, you get a perfectly balanced tree, all the time, more or less.
But ...
Let's consider a really bad algorithm. When you add a new node, it just appends it to the left link on the child with the greatest depth [or the new node becomes the new root]. The idea is fast append, and "we'll rebalance later".
If we've added n nodes to this "bad" tree, it looks like a doubly linked list using the parent link and the left link [remember, all right links are NULL]. Searching it is linear time, or O(n).
We did this deliberately, but it can still happen with some tree algorithm and/or combinations of data. That is, the data is such that it gets naturally placed on the left link because that's where it's correct to place it based on the algorithm's placement function.
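You can reproduce this worst case without any contrived algorithm: insert already-sorted keys into a plain, non-rebalancing BST. A minimal sketch (Node and insert are illustrative):

struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
};

// Plain BST insert with no rebalancing.
Node* insert(Node* t, int key) {
    if (!t) return new Node{key};
    if (key < t->key) t->left  = insert(t->left, key);
    else              t->right = insert(t->right, key);
    return t;
}

int main() {
    Node* root = nullptr;
    for (int k = 1; k <= 1024; ++k)   // sorted input: every new key goes right
        root = insert(root, k);
    // The tree is now a right-leaning chain 1024 nodes deep; searching for
    // the largest key visits every node, i.e. the O(n) worst case.
    return 0;
}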
Priority queues are like regular queues except each piece of data has a priority number associated with it.
In an ordinary queue, you just push/append onto the end. When you dequeue, you shift/pop from the front. You never need to insert anything in the middle. Thus, enqueue and dequeue are both O(1) operations.
The O(n) comes from the fact that if you have to do an insertion into the middle of an array, you have to "part the waters" to make space for the element you want to insert. For example, if you need to insert after the first element [which is array[0]], you will be placing the new element at array[1], but first you have to move array[1] to array[2], array[2] to array[3], ... For an array of n, this is O(n) effort.
When removing an element from an array, it is similar, but in reverse. If you want to remove array[1], you grab it, then you must "close the gap" left by your removal by array[1] = array[2], array[2] = array[3], ... Once again, an O(n) operation.
In a sorted array, you just pop off the end. It's the one you want already. Hence O(1). To add an element, it's an insertion into the correct place. If your array is 1,2,3,7,9,12,17 and you want to add 6, that's the new value for array[3], and you have to move 7,9,12,17 out of the way as above.
A priority queue just appends to the array, hence O(1). But to find the correct element to dequeue, you scan the array array[0], array[1], ..., remembering the position of the first element for a given priority; if you find a better priority, you remember that instead. When you hit the end, you know which element you need, say it's j. Now you have to remove j from the array, and that's an O(n) operation as above.
It's slightly more complex than all that, but not by too much.
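To make the lecture-note statement concrete, here is a sketch of the unsorted-array variant (O(1) en-queue, O(n) de-queue); the struct and names are illustrative, and the value itself serves as the priority:

#include <algorithm>
#include <vector>

struct ArrayPQ {
    std::vector<int> a;

    void enqueue(int x) { a.push_back(x); }   // append at the end: O(1)

    // Scan for the best priority, then close the gap: O(n).
    // (Assumes the queue is non-empty.)
    int dequeue() {
        auto it = std::max_element(a.begin(), a.end());
        int best = *it;
        a.erase(it);                          // shifting the tail over the gap is O(n)
        return best;
    }
};

A sorted array flips the two costs: enqueue must shift elements to insert in place (O(n)), while dequeue just pops off the end (O(1)).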

Find the median of binary search tree, C++

Once I was interviewed by "One well known company" and the interviewer asked me to find the median of a BST.
int median(treeNode* root)
{
}
I started to implement the first brute-force solution that I came up with: I filled all the data into a std::vector<int> with an inorder traversal (to get everything sorted in the vector) and took the middle element.
So my algorithm is O(N) for inserting every element into the vector, plus an O(1) query for the middle element, and O(N) of memory.
So, is there a more efficient way (in terms of memory or complexity) to do the same thing?
Thanks in advance.
It can be done in O(n) time and O(log n) space (the recursion stack, for a balanced tree) by doing an in-order traversal and stopping when you reach the n/2-th node; just carry a counter that tells you how many nodes have already been traversed - no need to actually populate any vector.
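A sketch of that counting traversal (the treeNode layout and the known node count n are assumptions of this sketch):

struct treeNode {
    int value;
    treeNode *left, *right;
};

// In-order walk that stops at the i-th visited node; remaining counts down.
// Stack depth is O(h), i.e. O(log n) for a balanced tree; no vector is built.
const treeNode* kthInorder(const treeNode* t, int& remaining) {
    if (!t) return nullptr;
    if (const treeNode* hit = kthInorder(t->left, remaining)) return hit;
    if (--remaining == 0) return t;
    return kthInorder(t->right, remaining);
}

int median(const treeNode* root, int n) {     // n = total number of nodes
    int remaining = (n + 1) / 2;              // middle element of the sorted order
    return kthInorder(root, remaining)->value;
}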
If you can modify your tree into a ranks tree (each node also stores the number of nodes in the subtree it's the root of), you can easily solve it in O(log n) time by simply moving toward the n/2-th element.
Since you know that the median is the middle element of a sorted list of elements, you can just take the middle element of your inorder traversal and stop there, without storing the values in a vector. You might need two traversals if you don't know the number of nodes, but it will make the solution use less memory (O(h) where h is the height of your tree; h = O(log n) for balanced search trees).
If you can augment the tree, you can use the solution I gave here to get an O(log n) algorithm.
The binary tree offers a sorted view for your data but in order to take advantage of it, you need to know how many elements are in each subtree. So without this knowledge your algorithm is fast enough.
If you know the size of each subtree, you select each time to visit the left or the right subtree, and this gives an O(log n) algorithm if the binary tree is balanced.
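A sketch of that rank-augmented lookup (it assumes insert/delete already keep the subtree sizes up to date; names are illustrative):

struct RankNode {
    int value;
    int size;                                 // number of nodes in this subtree
    RankNode *left, *right;
};

int sz(const RankNode* t) { return t ? t->size : 0; }

// i-th smallest (1-based): compare i against the left subtree's size and
// descend one side only, giving O(log n) on a balanced tree.
int select(const RankNode* t, int i) {
    int ls = sz(t->left);
    if (i <= ls) return select(t->left, i);
    if (i == ls + 1) return t->value;
    return select(t->right, i - ls - 1);
}

int median(const RankNode* root) {
    return select(root, (sz(root) + 1) / 2);
}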

Node at a longest distance from another node in a tree

Given a tree as input, we need to answer queries of the following types:
a) given a node of the tree, find the node at the longest distance from that node.
b) remove a particular set of edges from the tree.
I have been trying this for a long time, but the best solution I could come up with was:
For a query of type a, call a DFS function which returns the farthest node in O(N), but I need to do better.
For a query of type b, just update the tree [remove the edge if it exists].
So my solution above is roughly O(K*N), where K is the number of queries and N is the number of nodes.
Since your tree is a general tree, i.e., it has no notion of being balanced or even having a root, the best you can do for a one-off query is O(n). However, I think you can set up the tree once taking O(n) time and then have each following query take constant time.
The idea is to find the "middle" of the tree, which separates the tree into two roughly equal-sized parts, named arbitrarily, e.g., left and right. You then label all nodes with the part they are in, and store a left and a right node which are farthest away from the middle. When you get a query for a node, you just look at the node's label and report the stored node on the other side.
Given the comment [and the unwarranted downvote] it seems the solution requires a bit more explanation. First off, the furthest apart node for a given node is, in general, not unique. Imagine for example a path with exactly three nodes. There are two furthest away nodes for the middle node. Either one of them is a solution. Based on that, the idea is to find a node in the tree which is located in the middle of the path between the two farthest apart nodes in the tree (if the distance between these nodes is odd, a node on either side can be chosen such that the distances differ by just one): if the farthest apart nodes are l nodes apart, the middle node has a path of length l/2 to both of them, or a path of l/2 to one and l/2+1 to the other.
Using this middle node to separate the tree into two halves, arbitrarily called the left and the right half, makes it possible to determine the farthest apart node for any given node if each node knows whether it is in the left or the right half: the longest path will go through the middle node into the other half and from there to the node farthest away from the middle. Let's call the length of the longest path in the left part ll and the length of the longest path in the right part lr. Without loss of generality, have lr < ll (just swap the names around). The respective farthest apart nodes from the middle are called nl and nr. Note that it is OK if there are multiple subtrees leading from the middle node which are considered part of the right part, as long as one of the longest paths (or the longest path, if it is unique) is in the left part.
There are three cases to consider when you want to state the furthest apart node from a node n:
The node n is the middle node. In this case, the furthest apart node is clearly nl.
The node n is in the right part of the tree. The longest path you can construct travels to the middle and then to nl, i.e., the furthest apart node is clearly nl, too.
The node n is in the left part of the tree. Again, the longest path you can construct travels to the middle but from there to nr.
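In code, the per-query work is just this case analysis (a sketch; the side labels and the two stored nodes nl and nr come from the one-time preprocessing described next):

#include <vector>

enum class Side { Left, Right, Middle };

struct FarthestIndex {
    std::vector<Side> side;   // which half each node ended up in
    int nl, nr;               // nodes farthest from the middle in each half

    // O(1) per query: the answer always lies in the opposite half.
    int farthestFrom(int n) const {
        if (side[n] == Side::Middle) return nl;   // case 1
        if (side[n] == Side::Right)  return nl;   // case 2
        return nr;                                // case 3
    }
};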
The only question remaining is how to find the middle node in O(n) time:
Find all leaf nodes and put them into a queue, labeling them with 1 and giving them a distance of 0. This can be done in O(n) time [and space].
Read (but don't extract, yet) the first node from the queue and find all adjacent nodes. If there is a node with a label which is less than its number of adjacent nodes, increment the label. If the label now matches the number of adjacent nodes, add the node to the queue and give it a distance one bigger than the first node from the queue.
If there is only one node in the queue, this node is the middle node and this step terminates.
Otherwise, extract the front node and continue processing the queue (i.e., step 2).
As a final pass, find the adjacent node with the biggest distance label and consider the tree hanging off this node the left tree. While labeling the nodes as left nodes with a BFS keep track of the last node in the queue to find nl. Consider all other subtrees right trees and label them with BFS, too, as right nodes, also finding nr.
I guess the preprocessing of the tree can be done more elegantly, possibly using fewer passes, too, but I'm sure the above approach does work.
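For reference, a compact C++ sketch of the middle-finding pass, written in the equivalent leaf-peeling formulation (strip one layer of leaves at a time until at most two nodes remain); adj is a hypothetical adjacency list over n nodes:

#include <queue>
#include <vector>

// Each node and edge is processed once, so this is O(n).
int findMiddle(const std::vector<std::vector<int>>& adj) {
    int n = (int)adj.size();
    std::vector<int> degree(n);
    std::queue<int> q;
    for (int v = 0; v < n; ++v) {
        degree[v] = (int)adj[v].size();
        if (degree[v] <= 1) q.push(v);        // the leaves seed the queue
    }
    int remaining = n;
    while (remaining > 2) {                   // peel one full layer of leaves
        int layer = (int)q.size();
        remaining -= layer;
        while (layer-- > 0) {
            int v = q.front(); q.pop();
            for (int u : adj[v])
                if (--degree[u] == 1) q.push(u);
        }
    }
    return q.front();                         // one of the (at most two) middle nodes
}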

How to locate immediate predecessor with O(log n) time complexity

First of all, I would like to let everyone know that this is an assignment. I've finished the locate-immediate-predecessor part in O(n), but I would like to do it in O(log n); I know it's possible since the tree is an AVL tree.
The way I've done it in O(n): I divide the tree into 2 based on the key (record) and do a max search on the left tree and a min search on the right tree. I know it's not log n, since after I've narrowed the solution down I still have to process all the nodes in the left or right tree, so at best it's still n/2.
I can see the pattern of the solutions but still can't wrap my mind around it. I'm thinking about using root and node pointers, but I'm still not sure how to implement it.
Any pointers would be appreciated; I've googled and tried to solve this problem, to no avail, for several days now.
Given a node N in an AVL tree, there are three cases:
N has a left child L. Then the immediate predecessor of N must be the right-most descendant of L (or L itself, if it has no right child). To locate it, descend into the subtree of L, taking the right branch at each sub-node. There can be at most log n levels, so this is O(log n).
N has no left child, but is itself the right child of a parent P. Then P must be the immediate predecessor, located in O(1) time.
N has no left child and is the left child of a parent P. Then walk up the tree towards the root until you find a node that is the right child of an ancestor A. If there is no such A, N does not have any predecessor; otherwise A is the immediate predecessor of N. Again, there can be at most log n parent levels to check, so this is also O(log n).
Determining which of the three applies can obviously be done in O(1) time, so the total time complexity is O(log n).
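Put together as a C++ sketch, assuming nodes carry parent pointers (the Node layout is illustrative):

struct Node {
    int key;
    Node *left, *right, *parent;
};

// Immediate predecessor of n, covering the three cases above.
Node* predecessor(Node* n) {
    if (n->left) {                        // case 1: maximum of the left subtree
        Node* p = n->left;
        while (p->right) p = p->right;    // at most O(log n) steps down
        return p;
    }
    Node* p = n->parent;                  // cases 2 and 3: climb until we
    while (p && n == p->left) {           // arrive at a parent from its right child
        n = p;
        p = p->parent;
    }
    return p;                             // nullptr means n has no predecessor
}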
Example AVL tree for reference (this is the same example as given on the Wikipedia page for AVL tree, but I've recreated the graph rather than copying the image; the source can be forked from here if anybody would like to make modifications):
Nodes 17 and 50 are examples of case 1; node 76 is an example of case 2; node 9 is an example of case 3 with no predecessor; node 19 is an example of case 3 with predecessors. If you think through each of the cases looking at examples from the tree above, you'll be able to confirm that the statements are true. This may be easier than going through a formal proof (which nevertheless could be given).
I actually figured out an easier way to solve this problem, without using parent or child pointers.
Here's what I did:
Save each node as I traverse the tree recursively, keeping the nodes whose record is less than the target.
If it's a leaf, then return your temp pointer to the caller.
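That idea fits in a few lines of C++ (a sketch; the Node layout and field names are illustrative). Walking down from the root, save the last node whose record is smaller than the target; when the walk falls off the tree, the saved pointer is the immediate predecessor:

struct Node {
    int record;
    Node *left, *right;
};

// One top-down pass along a single root-to-leaf path: O(log n) on an AVL tree.
Node* predecessor(Node* root, int target) {
    Node* best = nullptr;                 // the "temp pointer" from the description
    for (Node* t = root; t != nullptr; ) {
        if (t->record < target) {         // candidate; a closer one may lie to the right
            best = t;
            t = t->right;
        } else {
            t = t->left;                  // too large; predecessors must be to the left
        }
    }
    return best;                          // nullptr if the target has no predecessor
}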