Find the median of binary search tree, C++ - c++

Once I was interviewed by "One well known company" and the interviewer asked me to find the median of BST.
int median(treeNode* root)
{
}
I started to implement the first brute-force solution that I came up with. I fill all the data into a std::vector<int> with inorder traversal (to get everything sorted in the vector) and got the middle element.
So my algo is O(N) for inserting every element in the vector and query of middle element with O(1), + O(N) of memory.
So is there more effective way (in terms of memory or in terms of complexity) to do the same thing.
Thanks in advance.

It can be done in O(n) time and O(logN) space by doing an in-order traversal and stopping when you reach the n/2th node, just carry a counter that tells you how many nodes have been already traversed - no need to actually populate any vector.
If you can modify your tree to ranks-tree (each node also has information about the number of nodes in the subtree it's a root of) - you can easily solve it in O(logN) time, by simply moving torward the direction of n/2 elements.

Since you know that the median is the middle element of a sorted list of elements, you can just take the middle element of your inorder traversal and stop there, without storing the values in a vector. You might need two traversals if you don't know the number of nodes, but it will make the solution use less memory (O(h) where h is the height of your tree; h = O(log n) for balanced search trees).
If you can augment the tree, you can use the solution I gave here to get an O(log n) algorithm.

The binary tree offers a sorted view for your data but in order to take advantage of it, you need to know how many elements are in each subtree. So without this knowledge your algorithm is fast enough.
If you know the size of each subtree, you select each time to visit the left or the right subtree, and this gives an O(log n) algorithm if the binary tree is balanced.

Related

Are time complexity of pre-order and DFS on a balanced binary tree same?

I read from an answer that pre-order is a type of DFS:link
Yet,for a balanced binary tree,the time complexity for a tree-traversal is O(logn)link and the DFS has a time complexity of O(N)link.
So, is pre-order traversal not a type of DFS or I misunderstood the concept?
Thanks.
The time complexity for a preorder, inorder, or postorder traversal of a binary search tree is always Θ(n), where is the number of nodes in the tree. One way to see this is that in each case, each node is visited once and exactly once, and each edge is visited exactly twice (once descending downward, and once ascending upward).
You had mentioned in your question that the time complexity of a tree traversal on a balanced tree is O(log n). The O(log n) here actually refers to the space complexity (how much auxiliary memory is needed) rather than the time complexity (how many operations will be performed). The reason for this is that all of these tree traversals, in their typical implementation, need to store a stack of the nodes that have been visited so far so that the traversal can back up higher in the tree when necessary. This means that the auxiliary space needed is proportional to the height of the tree, which in a balanced tree is O(log n) and in an arbitrary BST will be O(n).
So in that sense, the best answer to your question is probably "DFS, inorder traversals, preorder traversals, and postorder traversals of a BST always take the same amount of time (Θ(n)), and the space complexity depends on the height of the tree, which can range between Θ(log n) and Θ(n)."

Binary Tree Questions

Currently studying for an exam, and whilst reading through some notes, I had a few questions.
I know that the height of a Binary Search Tree is Log(n). Does this mean the depth is also Log(n)?
What is maximum depth of a node in a full binary tree with n nodes? This related to the first question; if the height of a Binary Tree is Log(n), would the maximum depth also be Log(n)?
I know that the time complexity of searching for a node in a binary search tree is O(Log(n)), which I understand. However, I read that the worst case time complexity is O(N). In what scenario would it take O(N) time to find an element?
THIS IS A PRIORITY QUEUE/ HEAP QUESTION. In my lecture notes, it says the following statement:
If we use an array for Priority Queues, en-queuing takes O(1) and de-queuing takes O(n). In a sorted Array, en-queue takes O(N) and de-queue takes O(1).
I'm having a hard time understanding this. Can anyone explain?
Sorry for all the questions, really need some clarity on a few of these topics.
Caveat: I'm a little rusty, but here goes ...
Height and depth of a binary tree are synonymous--more or less. height is the maximum depth along any path from root to leaf. But, when you traverse a tree, you have a concept of current depth. root node has depth 0, its children have depth 1, its grandchildren depth 2. If we stop here, the height of the tree is 3, but the maximum depth [we visited] is 2. Otherwise, they are often interchanged when talking about the tree overall.
Before we get to some more of your questions, it's important to note that binary trees come in various flavors. Balanced or unbalanced. With a perfectly balanced tree, all nodes except those at maximum height will have their left/right links non-null. For example, with n nodes in the tree, let n = 1024. Perfectly balanced the height is log2(n) which is 10 (e.g. 1024 == 2^10).
When you search a perfectly balanced tree, the search is O(log2(n)) because starting from the root node, you choose to follow either left or right, and each time you do, you eliminate 1/2 of the nodes. In such a tree with 1024 elements, the depth is 10 and you make 10 such left/right decisions.
Most tree algorithms, when you add a new node, will rebalance the tree on the fly (e.g. AVL or RB (red black)) trees. So, you get a perfectly balanced tree, all the time, more or less.
But ...
Let's consider a really bad algorithm. When you add a new node, it just appends it to the left link on the child with the greatest depth [or the new node becomes the new root]. The idea is fast append, and "we'll rebalance later".
If we search this "bad" tree, if we've added n nodes, the tree looks like a doubly linked list using the parent link and the left link [remember all right links are NULL]. This is linear time search or O(n).
We did this deliberately, but it can still happen with some tree algorithm and/or combinations of data. That is, the data is such that it gets naturally placed on the left link because that's where it's correct to place it based on the algorithm's placement function.
Priority queues are like regular queues except each piece of data has a priority number associated with it.
In an ordinary queue, you just push/append onto the end. When you dequeue, you shift/pop from the front. You never need to insert anything in the middle. Thus, enqueue and dequeue are both O(1) operations.
The O(n) comes from the fact that if you have to do an insertion into the middle of an array, you have to "part the waters" to make space for the element you want to insert. For example, if you need to insert after the first element [which is array[0]], you will be placing the new element at array[1], but first you have to move array[1] to array[2], array[2] to array[3], ... For an array of n, this is O(n) effort.
When removing an element from an array, it is similar, but in reverse. If you want to remove array[1], you grab it, then you must "close the gap" left by your removal by array[1] = array[2], array[2] = array[3], ... Once again, an O(n) operation.
In a sorted array, you just pop off the end. It's the one you want already. Hence O(1). To add an element, its an insertion into the correct place. If your array is 1,2,3,7,9,12,17 and you want to add 6, that's new value for array[4], and you have to move 7,9,12,17 out of the way as above.
Priority queue just appends to the array, hence O(1). But to find the correct element to dequeue, you scan the array array[0], array[1], ... remembering the first element position for a given priority, if you find a better priority, you remember that. When you hit the end, you know which element you need, say it's j. Now you have to remove j from array, and that an O(n) operation as above.
It's slightly more complex than all that, but not by two much.

Is there any data structure in C++ STL for performing insertion, searching and retrieval of kth element in log(n)?

I need a data structure in c++ STL for performing insertion, searching and retrieval of kth element in log(n)
(Note: k is a variable and not a constant)
I have a class like
class myClass{
int id;
//other variables
};
and my comparator is just based on this id and no two elements will have the same id.
Is there a way to do this using STL or I've to write log(n) functions manually to maintain the array in sorted order at any point of time?
Afaik, there is no such datastructure. Of course, std::set is close to this, but not quite. It is a red black tree. If each node of this red black tree was annotated with the tree weight (the number of nodes in the subtree rooted at this node), then a retrieve(k) query would be possible. As there is no such weight annotation (as it takes valuable memory and makes insert/delete more complex as weights have to be updated), it is impossible to answer such a query efficently with any search tree.
If you want to build such a datastructure, use a conventional search tree implementation (red-black,AVL,B-Tree,...) and add a weight field to each node that counts the number of entries in its subtree. Then searching for the k-th entry is quite simple:
Sketch:
Check the weight of the child nodes, and find the child c which has the largest weight (accumulated from left) that is not greater than k
Subtract from k all weights of children that are left of c.
Descend down to c and call this procedure recursively.
In case of a binary search tree, the algorithm is quite simple since each node only has two children. For a B-tree (which is likely more efficient), you have to account as many children as the node contains.
Of course, you must update the weight on insert/delete: Go up the tree from the insert/delete position and increment/decrement the weight of each node up to the root. Also, you must exchange the weights of nodes when you do rotations (or splits/merges in the B-tree case).
Another idea would be a skip-list where the skips are annotated with the number of elements they skip. But this implementation is not trivial, since you have to update the skip length of each skip above an element that is inserted or deleted, so adjusting a binary search tree is less hassle IMHO.
Edit: I found a C implementation of a 2-3-4 tree (B-tree), check out the links at the bottom of this page: http://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html
You can not achieve what you want with simple array or any other of the built-in containers. You can use a more advanced data structure for instance a skip list or a modified red-black tree(the backing datastructure of std::set).
You can get the k-th element of an arbitrary array in linear time and if the array is sorted you can do that in constant time, but still the insert will require shifting all the subsequent elements which is linear in the worst case.
As for std::set you will need additional data to be stored at each node to be able to get the k-th element efficiently and unfortunately you can not modify the node structure.

Why does inserting sequential elements in a tree require more time than inserting random elements into a tree?

This is not homework I'm taking a data structures class and we recently finished trees. At the end of class, my professor showed this image.
ConcreteBTree is a binary tree that doesnt self balance. I have a few questions about the times it took to complete these procedures.
Why does it take so much more time to insert 100,000 sequential elements into ConcreteBTree than it takes to insert random elements into it? My intuition would be that since elements are sequential, it should take less time than it takes to insert 1,000,000 random elements.
Why are the times of insert() and find() of ConcreteBTree with random elements so close together? Is it because both have the same time complexity? I thought insert was O(1) and find was O(n)
I'd really like to understand what is going on here, any explanation would be greatly appreciated. Thanks
Inserting sequential items( 1,2,3,4...) to a binary tree will cause it to always add the nodes to the same side( left for example ) .
When you insert random items you will add nodes randomly left and right.
Adding sequentially will cause the list to behave as a ordinary linked list ( for the sequential items) because new items will have to visit every previously added item and that will take O(n) steps , when adding randomly it will take O( log N) steps on average.
Armin's answered Q1.
2.Why are the times of insert() and find() of ConcreteBTree with random elements so close together? Is it because both have the same time complexity? I thought insert was O(1) and find was O(n)
insert and find have to do the same work - they go down through whatever weird tree you've put together looking for that last node under which the value either is linked or would be (and will be in the case of insert), so they do the same number of comparisons and node traversals, taking similar time.
Insertion of random elements in a balanced tree is O(log2N). Your insertions of random values into an tree that doesn't self-rebalance will be a bit but not dramatically worse as some branches will end up considerably longer than others - you'll probably get some kind of bell curve of branch lengths. insert's only O(1) if you already know the node in the tree under which the insert is to be done (i.e. that find step above is normally needed). find's only O(n) if every node in the tree has to be visited, which is only the case for a pathologically unbalanced tree, effectively forming a linked list, as you've already been told you can generate by inserting pre-sorted elements.

Concatenating/Merging/Joining two AVL trees

Assume that I have two AVL trees and that each element from the first tree is smaller then any element from the second tree. What is the most efficient way to concatenate them into one single AVL tree? I've searched everywhere but haven't found anything useful.
Assuming you may destroy the input trees:
remove the rightmost element for the left tree, and use it to construct a new root node, whose left child is the left tree, and whose right child is the right tree: O(log n)
determine and set that node's balance factor: O(log n). In (temporary) violation of the invariant, the balance factor may be outside the range {-1, 0, 1}
rotate to get the balance factor back into range: O(log n) rotations: O(log n)
Thus, the entire operation can be performed in O(log n).
Edit: On second thought, it is easier to reason about the rotations in the following algorithm. It is also quite likely faster:
Determine the height of both trees: O(log n).
Assuming that the right tree is taller (the other case is symmetric):
remove the rightmost element from the left tree (rotating and adjusting its computed height if necessary). Let n be that element. O(log n)
In the right tree, navigate left until you reach a node whose subtree is at most one 1 taller than left. Let r be that node. O(log n)
replace that node with a new node with value n, and subtrees left and r. O(1)
By construction, the new node is AVL-balanced, and its subtree 1 taller than r.
increment its parent's balance accordingly. O(1)
and rebalance like you would after inserting. O(log n)
One ultra simple solution (that works without any assumptions in the relations between the trees) is this:
Do a merge sort of both trees into one merged array (concurrently iterate both trees).
Build an AVL tree from the array - take the middle element to be the root, and apply recursively to left and right halves.
Both steps are O(n). The major issue with it is that it takes O(n) extra space.
The best solution I read to this problem can be found here. Is very close to meriton's answer if you correct this issue:
In the third step of the algorithm navigates left until you reach the node whose sub tree has the same height as the left tree. This is not always possible, (see counterexample image). The right way to do this step is two find for a subtree with height h or h+1 where h is the height of the left tree
I suspect that you'll just have to walk one tree (hopefully the smaller) and individually add each of it's elements to the other tree. The AVL insert/delete operations are not designed to handle adding a whole subtree at a time.