Node at the longest distance from another node in a tree - C++

Given a tree as input, we need to answer queries of the following types:
a) given a node of the tree, find the node that is at the longest distance from it;
b) remove a particular set of edges from the tree.
I have been trying this for a long time, but the best solution I could come up with is:
For a query of type (a), call a DFS function which returns the farthest node in O(N), but I need to do better.
For a query of type (b), just update the tree [remove the edge if it exists].
So my solution above is roughly O(K*N), where K is the number of queries and N is the number of nodes.
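For concreteness, my per-query search looks roughly like this (a minimal sketch over an adjacency list; the identifiers are illustrative):
#include <vector>
// O(N) answer to a type (a) query: depth-first search over the adjacency list
// that keeps track of the deepest node seen so far.
void dfs(int u, int parent, int depth,
         const std::vector<std::vector<int>>& adj,
         int& bestDepth, int& bestNode)
{
    if (depth > bestDepth) { bestDepth = depth; bestNode = u; }
    for (int v : adj[u])
        if (v != parent)
            dfs(v, u, depth + 1, adj, bestDepth, bestNode);
}
int farthestFrom(int start, const std::vector<std::vector<int>>& adj)
{
    int bestDepth = -1, bestNode = start;
    dfs(start, -1, 0, adj, bestDepth, bestNode);
    return bestNode;
}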

Since your tree is a general tree, i.e., it has no notion of being balanced or even having a root, the best you can do for a one-off query is O(n). However, I think you can set up the tree once taking O(n) time and then have each following query take constant time.
The idea is to find the "middle" of the tree, which separates the tree into two roughly equal-sized parts; call the parts arbitrarily, e.g., left and right. You then label all nodes with the part they are in and store a left and a right node which are farthest away from the middle. When you get a query for a node, you just look at the node's label and report the stored node on the other side.
Given the comment [and the unwarranted downvote] it seems the solution requires a bit more explanation. First off, the furthest apart node for a given node is, in general, not unique. Imagine, for example, a path with exactly three nodes: there are two furthest away nodes for the middle node, and either one of them is a solution. Based on that, the idea is to find a node in the tree which is located in the middle of the path between the two farthest apart nodes in the tree (if the distance between these nodes is odd, a node on either side can be chosen such that the distances differ by just one): if the farthest apart nodes are l nodes apart, the middle node has a path of length l/2 to both of them, or a path of l/2 to one and l/2+1 to the other.
Using this middle node to separate the tree into two halves, randomly called the left and the right half, makes it possible to determine the farthest apart node for any given node if each node knows whether it is in the left or the right half: the longest path will go through the middle node into the other half and from there to the node farthest away from the middle. Let's call the length of the longest path in the left part ll and the length of the longest path in the right part lr. Without loss of generality, have lr <= ll (just swap the names around otherwise). The respective farthest apart nodes from the middle are called nl and nr. Note that it is OK if there are multiple subtrees leading from the middle node which are considered part of the right part, as long as one of the longest paths (or the longest path, if it is unique) is in the left part.
There are three cases to consider when you want to state the furthest apart node from a node n:
The node n is the middle node. In this case, the furthest apart node is clearly nl.
The node n is in the right part of the tree. The longest path you can construct travels to the middle and then to nl, i.e., the furthest apart node is clearly nl, too.
The node n is in the left part of the tree. Again, the longest path you can construct travels to the middle but from there to nr.
The only question remaining is how to find the middle node in O(n) time:
Find all leaf nodes and put them into a queue, labeling them with 1 and giving them a distance of 0. This can be done in O(n) time [and space].
Read (but don't extract it, yet) the first node from the queue and find all adjacent nodes. If there is an adjacent node with a label which is less than its number of adjacent nodes, increment its label. If the label now matches the number of adjacent nodes, add that node to the queue and give it a distance one bigger than that of the first node in the queue.
If there is only one node in the queue, this node is the middle node and this step terminates.
Otherwise, extract the front node and continue processing the queue (i.e., step 2).
As a final pass, find the adjacent node with the biggest distance label and consider the tree hanging off this node the left tree. While labeling the nodes as left nodes with a BFS keep track of the last node in the queue to find nl. Consider all other subtrees right trees and label them with BFS, too, as right nodes, also finding nr.
I guess the preprocessing of the tree can be done more elegantly, possibly using fewer passes, too, but I'm sure the above approach works.
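A rough sketch of the preprocessing and the constant-time query could look like the following. All identifiers are illustrative, nodes are assumed to be numbered 0..N-1, and for simplicity this sketch locates the middle node by walking to the midpoint of a diameter found with two BFS passes rather than with the leaf-peeling queue described above; the labelling and the three query cases are as explained.
#include <algorithm>
#include <queue>
#include <vector>
struct FarthestOracle
{
    int middle;               // the "middle" node
    int nl, nr;               // farthest nodes from the middle in each half
    std::vector<char> inLeft; // label: does the node lie in the left half?
    // Plain BFS returning distances (and optionally parents) from s.
    static std::vector<int> bfs(int s, const std::vector<std::vector<int>>& adj,
                                std::vector<int>* parent = nullptr)
    {
        std::vector<int> dist(adj.size(), -1);
        if (parent) parent->assign(adj.size(), -1);
        std::queue<int> q;
        dist[s] = 0; q.push(s);
        while (!q.empty()) {
            int u = q.front(); q.pop();
            for (int v : adj[u])
                if (dist[v] == -1) {
                    dist[v] = dist[u] + 1;
                    if (parent) (*parent)[v] = u;
                    q.push(v);
                }
        }
        return dist;
    }
    explicit FarthestOracle(const std::vector<std::vector<int>>& adj)
    {
        // 1. Two BFS passes find the endpoints u, w of a longest path.
        std::vector<int> d0 = bfs(0, adj);
        int u = std::max_element(d0.begin(), d0.end()) - d0.begin();
        std::vector<int> par;
        std::vector<int> du = bfs(u, adj, &par);
        int w = std::max_element(du.begin(), du.end()) - du.begin();
        // 2. Walk halfway back from w towards u to reach the middle node.
        middle = w;
        for (int step = du[w] / 2; step > 0; --step) middle = par[middle];
        // 3. BFS from the middle, remembering through which neighbour of the
        //    middle each node was first reached (its "branch").
        std::vector<int> dm(adj.size(), -1), branch(adj.size(), -1);
        std::queue<int> q;
        dm[middle] = 0; q.push(middle);
        while (!q.empty()) {
            int x = q.front(); q.pop();
            for (int y : adj[x])
                if (dm[y] == -1) {
                    dm[y] = dm[x] + 1;
                    branch[y] = (x == middle) ? y : branch[x];
                    q.push(y);
                }
        }
        // The deeper endpoint is nl; its branch forms the left half.
        nl = u; nr = w;
        if (dm[w] > dm[u]) std::swap(nl, nr);
        inLeft.assign(adj.size(), 0);
        for (std::size_t x = 0; x < adj.size(); ++x)
            inLeft[x] = (branch[x] != -1 && branch[x] == branch[nl]);
    }
    // Answer a query in O(1), following the three cases above.
    int farthest(int n) const
    {
        if (n == middle) return nl; // case 1
        return inLeft[n] ? nr : nl; // cases 3 and 2
    }
};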

Related


How to find the k largest elements in a binary search tree faster than in O(logN + k)
I implemented the algorithm with the said asymptotics, but how to make it faster?
Extend your tree data structure with the following:
Make your tree threaded, i.e. add a parent reference to each node.
Maintain a reference to the node that has the maximum value (the "rightmost" node). Keep it up to date as nodes are added/removed.
With that information you can avoid the first descent from the root to the rightmost node, and start collecting values immediately. If the binary tree is well balanced, then the rightmost node will be on (or near) the bottom layer of the tree. Then the walk along the tree in reversed inorder sequence -- for finding the k greatest-valued nodes -- will make you traverse a number of edges that is O(k).
Alternative structures, such as a B+ tree or a skip list, can also provide O(k) access to the k greatest values they store.
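A minimal sketch of that reverse-inorder walk, assuming each node carries a parent pointer and that a reference to the rightmost node is maintained elsewhere (the node layout and names are my own):
#include <vector>
struct Node
{
    int value;
    Node *left = nullptr, *right = nullptr, *parent = nullptr;
};
// In-order predecessor (the next smaller value), found via the parent links
// so no descent from the root is needed.
Node* prevInOrder(Node* n)
{
    if (n->left) {                        // largest node in the left subtree
        n = n->left;
        while (n->right) n = n->right;
        return n;
    }
    while (n->parent && n->parent->left == n) n = n->parent;
    return n->parent;                     // may be nullptr: no smaller value
}
// Collect the k largest values, starting from the maintained rightmost node.
std::vector<int> kLargest(Node* rightmost, int k)
{
    std::vector<int> out;
    for (Node* n = rightmost; n && (int)out.size() < k; n = prevInOrder(n))
        out.push_back(n->value);
    return out;
}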

Remove an element from unbalanced binary search tree

I have been wanting to write a remove() method for my Binary Search Tree (which happens to be an array representation). But before writing it, I must consider all cases. Omitting all cases except the one where the node has two children (since the others are easy): in all the explanations I have read so far, most of the cases I see remove an element from an already balanced binary search tree. In the few cases where I have seen an element being removed from an unbalanced binary search tree, I find that they balance it through zigs and zags, and then remove the element.
Is there a way that I can possibly remove an element from an unbalanced binary search tree without having to balance it beforehand?
If not, would it be easier to write an AVL tree (in array representation)?
You don't need to balance it, but you do need to recursively go down the tree performing some swaps here and there so you actually end up with a valid BST.
Deletion of a node with 2 children in an (unbalanced) BST: (from Wikipedia)
Call the node to be deleted N. Do not delete N. Instead, choose either its in-order successor node or its in-order predecessor node, R. Copy the value of R to N, then recursively call delete on R until reaching one of the first two cases.
Deleting a node with two children from a binary search tree. First the rightmost node in the left subtree, the inorder predecessor 6, is identified. Its value is copied into the node being deleted. The inorder predecessor can then be easily deleted because it has at most one child. The same method works symmetrically using the inorder successor labelled 9.
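In pointer terms (the array representation in the question would follow the same recursion, just with index arithmetic), a sketch of this copy-and-recurse deletion might look as follows; the node layout is illustrative and keys are assumed to be distinct:
struct Node
{
    int key;
    Node *left = nullptr, *right = nullptr;
};
// Delete `key` from the (possibly unbalanced) BST rooted at `root` and return
// the new root. No rebalancing is performed.
Node* remove(Node* root, int key)
{
    if (!root) return nullptr;
    if (key < root->key)      root->left  = remove(root->left, key);
    else if (key > root->key) root->right = remove(root->right, key);
    else if (!root->left || !root->right) {         // zero or one child: splice out
        Node* child = root->left ? root->left : root->right;
        delete root;
        return child;
    } else {                                         // two children
        Node* pred = root->left;                     // in-order predecessor:
        while (pred->right) pred = pred->right;      // rightmost node of the left subtree
        root->key = pred->key;                       // copy its value up ...
        root->left = remove(root->left, pred->key);  // ... then delete it below
    }
    return root;
}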
Although, why do you want an unbalanced tree? All operations on it take longer (or at least as long), and the additional overhead of balancing doesn't change the asymptotic complexity of any operation. And, if you're using the array representation where the node at index i has children at indices 2i and 2i+1, it may end up fairly sparse, i.e. there will be quite a bit of wasted memory.

How to indicate preorder of a spanning tree using the algorithm BFS

I'm implementing the BFS algorithm in C++ to find a spanning tree. The output for the spanning tree should be shown in preorder, but I have a doubt about the implementation: how can I build a tree if I don't know exactly how many children each node has? Considering a recursive tree structure, the data structure of the tree can be written as:
typedef struct node
{
    int val;
    struct node *left, *right;
} *tree; // tree has been typedefed as a node pointer.
But I don't think this implementation works, as mentioned before.
This is my function to return the tree in preorder:
void preorder(tree t)
{
    if (t == NULL)
        return;
    printf("%d ", t->val);
    preorder(t->left);
    preorder(t->right);
}
I also wonder if there is any way to do the preorder of the nodes without using a tree structure.
I have seen two concrete questions in the posting:
Is it possible to have a data structure using more than two children in a tree? Of course this is possible. Interestingly, it is even possible with the node structure you posted! Just consider the left pointer to be a pointer to the first child and the right pointer to point to the next sibling. Since breadth first search of a graph implicitly builds up a spanning tree, you can then walk this tree in preorder if you actually represent it somehow.
Can you do a preorder walk without using a tree structure? Yes, this is possible, too. Essentially, DFS and BFS are conceptually no different for this: you just have a data structure maintaining the nodes to be visited next. For DFS this is a stack, for BFS this is a queue. You get a preorder walk of the tree (i.e. you visit all children of a node after the parent) if you emit the node number when you insert it into the data structure maintaining the nodes to be visited.
To expand a bit on the second point: a preorder walk of a tree just means that each node is processed prior to its child nodes. When you do a graph search, you want to traverse a connected component of the graph, visiting each node just once; doing so effectively creates an implicit tree. That is, your start node becomes the root node of the tree. Whenever you visit a node you search for adjacent nodes which haven't been visited, i.e. which aren't marked. If there is such a node, the incident edge becomes a tree edge and you mark the node. Since there is always just one node being actively processed, you need to remember the nodes which aren't processed yet in some data structure, e.g. a stack or a queue (instead of using a stack explicitly you could use recursion, which creates the stack implicitly). Now, if you emit the node number the first time you see a node, you clearly process it prior to its children, i.e. you end up writing the node numbers in the order of a preorder walk.
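A minimal sketch of this emit-on-insert idea (the adjacency-list input and identifiers are my own; the only output is the order in which node numbers are printed):
#include <cstdio>
#include <queue>
#include <vector>
// BFS over an adjacency list. Printing each node number the moment it is
// enqueued means every node is printed after its parent in the implicit
// spanning tree, without ever building the tree explicitly.
void bfsPreorder(int start, const std::vector<std::vector<int>>& adj)
{
    std::vector<char> visited(adj.size(), 0);
    std::queue<int> q;
    visited[start] = 1;
    printf("%d ", start);             // emit when inserted, not when removed
    q.push(start);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u])
            if (!visited[v]) {
                visited[v] = 1;
                printf("%d ", v);     // its parent u was emitted earlier
                q.push(v);
            }
    }
}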
If you don't understand this, please whip out a sheet of paper and draw a graph and a queue:
the nodes with hollow circles and their node number next to them
the edges with thin lines
the queue is just a row of rectangles which is empty at the start
Now choose a node to become the start node of your search, which is the same as the root node of your tree. Write its number into the first empty position in the queue and mark (i.e. fill) the node. Now proceed with the search:
look at the node indicated by front of the queue and find an adjacent node which isn't filled:
append the node at the back of the queue (i.e. right behind the last node in the rectangle)
mark (i.e. fill) the node
make the line connecting the two nodes thicker: it is a tree edge now
if there are no further unmarked adjacent nodes tick the front node in the queue off (i.e. remove it from the queue) and move on to the next node until there are no further nodes
Now the queue rectangle contains a preorder walk of the spanning tree implied by a breadth first search of the graph. The spanning tree is visible using the thicker lines. The algorithm would also work if you treated the rectangle for the queue as a stack but it would be a bit messier because you end up with ticked off nodes between nodes still to be processed: instead of looking at the first unticked node you would look at the last unticked node.
When working with graph algorithms I found it quite helpful to visualize the algorithm. Although it would be nice to have the computer maintain the drawing, the low-tech alternative of drawing things on paper and possibly indicating active nodes by a number of labeled pencils works as well if not better.
Just a comment on the code: whenever you are reading any input, make sure that you successfully read the data. BTW, your code is not valid C++: variable length arrays are not available in C++; in C++ you would use std::vector<int> followOrder(vertexNumber) instead of int followOrder[vertexNumber]. Interestingly, the code isn't C either, because it uses e.g. std::queue<int>.

Please suggest some algorithm to find the node in a tree whose distance to its farthest node is minimum among all the nodes

Please suggest some algorithm to find the node in a tree whose distance to its farthest node is minimum among all the nodes.
It is not a graph and it is not weighted.
Choose an arbitrary node v in the tree T.
Run BFS making v as the root of T.
BFS outputs the distances from v to all the other nodes of T.
Now choose a node u that is farthest from v.
Run BFS again, making u the root.
On the new distance output, find a node w that is farthest from u.
Consider the path between u and w.
This is the longest path in the tree T.
The node in the middle of the path is the center of the tree T.
Note that there may exist two centers in a tree. If so, they are neighbours.
Performance: O(n), where n is the number of nodes of T.
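A compact sketch of these steps, assuming the tree is given as an adjacency list over nodes 0..n-1 (identifiers are illustrative):
#include <queue>
#include <vector>
// BFS from s; fills dist and parent, and returns the farthest node found.
int bfsFarthest(int s, const std::vector<std::vector<int>>& adj,
                std::vector<int>& dist, std::vector<int>& parent)
{
    dist.assign(adj.size(), -1);
    parent.assign(adj.size(), -1);
    std::queue<int> q;
    dist[s] = 0; q.push(s);
    int far = s;
    while (!q.empty()) {
        int x = q.front(); q.pop();
        if (dist[x] > dist[far]) far = x;
        for (int y : adj[x])
            if (dist[y] == -1) {
                dist[y] = dist[x] + 1;
                parent[y] = x;
                q.push(y);
            }
    }
    return far;
}
int treeCenter(const std::vector<std::vector<int>>& adj)
{
    std::vector<int> dist, parent;
    int u = bfsFarthest(0, adj, dist, parent); // farthest from an arbitrary node
    int w = bfsFarthest(u, adj, dist, parent); // farthest from u: other end of the longest path
    int c = w;                                 // walk half the path length back from w;
    for (int step = dist[w] / 2; step > 0; --step)
        c = parent[c];                         // for an odd-length path, its neighbour is the second center
    return c;
}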
Proof
Claim: a leaf (u) that is furthest from some node v lies on the longest path.
If we prove it, then the algorithm is correct, since it first finds u, and, since u is one end of the longest path, uses BFS to find this path itself.
Proof of the claim: Let's use reductio ad absurdum. Assume u---r is the longest path in the tree, and for some node v neither v---u nor v---r is the longest path from v. Instead, the longest path is v---k. We have two cases:
a) u---r and v---k have a common node o. Then v---o---u and v---o---r are shorter than v---o---k. Then o---r is shorter than o---k. Then u---o---r is not the longest path in the graph, because u---o---k is longer. This contradicts our assumption.
b) u---r and v--k don't have common nodes. But since the graph is connected, there are nodes o1 and o2 on each of these paths, such that the path between them o1--o2 doesn't contain any other nodes on these two paths. The contradiction to the assumption is the same as in point a), but with o1--o2 instead of mere o (in fact, point a is just a special case of b, where o1=o2).
This proves the claim and hence the correctness of the algorithm.
(This is a proof written by Pavel Shved; the original author might have a shorter one.)
Remove leaves. If more than 2 nodes left, repeat. The node (or 2 nodes) left will be the node you are looking for.
Why this works:
The node(s) are in the middle of the longest path P in the tree. Their maximum distance to any node is at most half of the length of the path (otherwise it would not be the longest one). Any other node on P will obviously have a greater distance to the farther end of P than the found node(s). Any node n not on P will have its farthest node at distance at least (distance from n to the closest node on P, say c) + (distance from c to the farther end of P), so again more than for the node(s) found by the algorithm.
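A sketch of this leaf-stripping procedure over an adjacency list (identifiers are illustrative); it returns the one or two center nodes:
#include <queue>
#include <vector>
std::vector<int> treeCenters(const std::vector<std::vector<int>>& adj)
{
    int n = adj.size();
    std::vector<int> centers;
    if (n <= 2) {                       // trivially, every node is a center
        for (int v = 0; v < n; ++v) centers.push_back(v);
        return centers;
    }
    std::vector<int> degree(n);
    std::queue<int> leaves;
    for (int v = 0; v < n; ++v) {
        degree[v] = adj[v].size();
        if (degree[v] == 1) leaves.push(v);
    }
    int remaining = n;
    while (remaining > 2) {             // strip one full layer of leaves at a time
        int layer = leaves.size();
        remaining -= layer;
        while (layer--) {
            int v = leaves.front(); leaves.pop();
            for (int u : adj[v])
                if (--degree[u] == 1) leaves.push(u);
        }
    }
    while (!leaves.empty()) { centers.push_back(leaves.front()); leaves.pop(); }
    return centers;
}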
You can use Johnson's algorithm for sparse graphs, but otherwise use the Floyd-Warshall algorithm simply because it is trivial to implement.
Essentially you want to find the distance from every node to every other node, and then just trivially search for the property you want.
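For illustration, a Floyd-Warshall based sketch (identifiers are my own; the input is an unweighted edge list over nodes 0..n-1). It computes all pairwise distances and then picks the node whose farthest node is closest:
#include <algorithm>
#include <limits>
#include <utility>
#include <vector>
int centerByFloydWarshall(int n, const std::vector<std::pair<int, int>>& edges)
{
    const int INF = std::numeric_limits<int>::max() / 2;
    std::vector<std::vector<int>> d(n, std::vector<int>(n, INF));
    for (int i = 0; i < n; ++i) d[i][i] = 0;
    for (const auto& e : edges)
        d[e.first][e.second] = d[e.second][e.first] = 1;    // unweighted edges
    for (int k = 0; k < n; ++k)                              // Floyd-Warshall
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                d[i][j] = std::min(d[i][j], d[i][k] + d[k][j]);
    int best = 0, bestEcc = INF;
    for (int i = 0; i < n; ++i) {                            // minimize the eccentricity
        int ecc = *std::max_element(d[i].begin(), d[i].end());
        if (ecc < bestEcc) { bestEcc = ecc; best = i; }
    }
    return best;
}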
You could use Dijkstra's algorithm (http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm) on each node in turn, to find all the distances from that node to every other node; scan the resulting list to get the distance to the farthest node. Once you've Dijkstra'd every node, another scan will give you the minimum of that maximal distances.
Dijkstra is usually regarded as having runtime O(v^2), where v is the number of nodes; you'd be running it once per node, which will increase the time to O(v^3) in a naive implementation. You may be able to make gains by storing the results of earlier nodes' Dijkstra runs and using them as known values in later runs.
As others have said in comments:
A tree is a graph - an undirected connected acyclic graph to be exact - see "Tree" (Graph theory).

Concatenating/Merging/Joining two AVL trees

Assume that I have two AVL trees and that each element from the first tree is smaller than any element from the second tree. What is the most efficient way to concatenate them into one single AVL tree? I've searched everywhere but haven't found anything useful.
Assuming you may destroy the input trees:
remove the rightmost element from the left tree, and use it to construct a new root node, whose left child is the left tree, and whose right child is the right tree: O(log n)
determine and set that node's balance factor: O(log n). In (temporary) violation of the invariant, the balance factor may be outside the range {-1, 0, 1}
rotate to get the balance factor back into range: O(log n) rotations, each O(1): O(log n)
Thus, the entire operation can be performed in O(log n).
Edit: On second thought, it is easier to reason about the rotations in the following algorithm. It is also quite likely faster:
Determine the height of both trees: O(log n).
Assuming that the right tree is taller (the other case is symmetric):
remove the rightmost element from the left tree (rotating and adjusting its computed height if necessary). Let n be that element. O(log n)
In the right tree, navigate left until you reach a node whose subtree is at most one taller than the left tree. Let r be that node. O(log n)
replace that node with a new node with value n, and subtrees left and r. O(1)
By construction, the new node is AVL-balanced, and its subtree is 1 taller than r.
increment its parent's balance accordingly. O(1)
and rebalance like you would after inserting. O(log n)
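A recursive sketch of these steps for the case the answer describes (right tree taller); the node layout, height bookkeeping, and helper names are my own, and the symmetric case as well as the removal of the left tree's rightmost element are left out:
#include <algorithm>
struct Node
{
    int key;
    int height;                 // height of the subtree rooted here (leaf = 1)
    Node *left, *right;
    Node(int k, Node* l, Node* r) : key(k), left(l), right(r) { update(); }
    void update()
    {
        int hl = left ? left->height : 0, hr = right ? right->height : 0;
        height = 1 + std::max(hl, hr);
    }
};
int h(Node* t) { return t ? t->height : 0; }
Node* rotateLeft(Node* t)       // standard single left rotation
{
    Node* r = t->right;
    t->right = r->left; t->update();
    r->left = t;        r->update();
    return r;
}
Node* rotateRight(Node* t)      // standard single right rotation
{
    Node* l = t->left;
    t->left = l->right; t->update();
    l->right = t;       l->update();
    return l;
}
// Join: every key in L < k < every key in R, and height(R) >= height(L).
// Walk down the left spine of R until the height difference is at most one,
// attach a new node there, and rebalance on the way back up.
Node* joinIntoTaller(Node* L, int k, Node* R)
{
    if (h(R) <= h(L) + 1)
        return new Node(k, L, R);            // already AVL-balanced here
    R->left = joinIntoTaller(L, k, R->left);
    R->update();
    if (h(R->left) - h(R->right) > 1) {      // left-heavy by two: rebalance
        if (h(R->left->right) > h(R->left->left))
            R->left = rotateLeft(R->left);   // double-rotation case
        R = rotateRight(R);
    }
    return R;
}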
One ultra-simple solution (that works without any assumptions about the relation between the trees) is this:
Do a merge sort of both trees into one merged array (concurrently iterate both trees).
Build an AVL tree from the array - take the middle element to be the root, and apply recursively to left and right halves.
Both steps are O(n). The major issue with it is that it takes O(n) extra space.
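A sketch of this approach (reusing the illustrative Node and helpers from the sketch above, or any node with key/left/right): flatten both trees in order, merge the two sorted runs, and rebuild a balanced tree from the array.
#include <algorithm>
#include <vector>
void flatten(Node* t, std::vector<int>& out)    // in-order walk into a sorted run
{
    if (!t) return;
    flatten(t->left, out);
    out.push_back(t->key);
    flatten(t->right, out);
}
Node* build(const std::vector<int>& a, int lo, int hi)  // half-open range [lo, hi)
{
    if (lo >= hi) return nullptr;
    int mid = lo + (hi - lo) / 2;               // middle element becomes the root
    return new Node(a[mid], build(a, lo, mid), build(a, mid + 1, hi));
}
Node* mergeTrees(Node* t1, Node* t2)
{
    std::vector<int> all;
    flatten(t1, all);
    std::size_t split = all.size();
    flatten(t2, all);
    // Each run is sorted; merging them covers the general case (under the
    // question's assumption the second run is already entirely larger).
    std::inplace_merge(all.begin(), all.begin() + split, all.end());
    return build(all, 0, (int)all.size());
}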
The best solution I read to this problem can be found here. It is very close to meriton's answer if you correct this issue:
In the third step, the algorithm navigates left until it reaches a node whose subtree has the same height as the left tree. This is not always possible (see the counterexample image). The right way to do this step is to look for a subtree with height h or h+1, where h is the height of the left tree.
I suspect that you'll just have to walk one tree (hopefully the smaller one) and individually add each of its elements to the other tree. The AVL insert/delete operations are not designed to handle adding a whole subtree at a time.