Height function in Binary Search Tree - c++

The following is the code I used for finding the height in a BST. Although it works perfectly, I wrote this code by trial and error. Can anyone please explain how it works step by step? A dry run example of the code would be much appreciated.
int Tree::height(tree * Height)
{
if(Height->left==NULL && Height->right==NULL)
{
return 0;
}
else
{
l=height(Height->left);
r=height(Height->right);
if (l>r)
{
l=l+1;
return l;
}
else
{
r=r+1;
return r;
}
}
}

I suggest renaming the parameter to "node", which matches its meaning and starts with a lowercase letter.
The code first checks whether the current node has any children; if it has none (it is a leaf), it returns 0.
Otherwise it recursively computes the height of the left subtree and then the height of the right subtree. When a call reaches a leaf, 0 is returned. If both l and r are 0, the else branch runs, r is incremented, and 1 is returned to the caller.
At every node, the call takes the larger of the two subtree heights and adds 1 for the step down to that subtree. When the recursion unwinds back to the root, the returned value is the height of the whole tree, counted in edges.
Note that this method returns the height of the tree but cannot tell you which leaf is the deepest, because whenever l and r are equal it always takes the r branch.
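Since a dry run was requested, here is a minimal sketch of the same logic that can be traced by hand. Two things are my assumptions rather than part of the question: the node struct (the original tree type is not shown) and the NULL check, which returns -1 for an empty subtree so that a node with only one child is never dereferenced through a NULL pointer. The variables l and r are made local here.

```cpp
#include <algorithm>

// Hypothetical minimal node type; the question's own struct is not shown.
struct tree {
    int data;
    tree* left;
    tree* right;
};

// Height counted in edges: an empty subtree is -1, a leaf is 0.
int height(const tree* node) {
    if (node == nullptr) return -1;   // assumption: guard for missing children
    int l = height(node->left);       // height of the left subtree
    int r = height(node->right);      // height of the right subtree
    return std::max(l, r) + 1;        // one more edge down to the taller subtree
}
```

Dry run for a tree where 5 has children 3 and 8, and 3 has a left child 1: height(1) = max(-1, -1) + 1 = 0, height(3) = max(0, -1) + 1 = 1, height(8) = 0, and finally height(5) = max(1, 0) + 1 = 2.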

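Your height() function is a variant of the following getHeight() function, which returns 0 for an empty tree and therefore counts the nodes, rather than the edges, on the longest path: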
int getHeight(tree *p)
{
if (!p)
return 0;
int left = getHeight(p->left);
int right = getHeight(p->right);
return max(left, right) +1;
}
The most important thing to note is that it uses recursion, meaning that the function calls itself (getHeight() is called inside getHeight()).
In your code, getHeight() is named height().
The height of the tree corresponds to the maximum depth of the recursion.
getHeight() calls itself on the left and right child of every node until it runs past a leaf and receives a null pointer, where it returns 0. As the calls return, each node adds 1 to the larger of its two subtree results (max(left, right) + 1), so every level on the longest path contributes exactly 1.
This is how getHeight() determines the height of the Binary Search Tree: the value returned for the root is the number of levels on the longest path.
See the Wikipedia article on recursion: https://en.wikipedia.org/wiki/Recursion_(computer_science)
The arrow operator -> accesses a member of a structure through a pointer. Here p points to the current node of the tree, and p->left and p->right are its outgoing branches.
The critical point to understand is how recursion works. In mathematics, recursion is analogous to self-reference: a function defined in terms of itself.
In functional programming and the lambda calculus (https://en.wikipedia.org/wiki/Functional_programming), recursion takes the place of loops as the basic way to express repetition. Functional programming is usually contrasted with imperative programming.
Alan Turing showed that the lambda calculus and Turing machines are equivalent in computational power, so any program that can be written in an imperative style can also be written in a functional style using recursion.

Related

Maximum number of left nodes/children in a path

I am trying to find a way to output the largest number of left nodes found in a single path.
For example:
The maximum for this Binary Search Tree would be 2 (the path goes 5 -> 3 -> 1, excluding the root).
What is the best way to approach this?
I have seen this thread which is fairly similar to what I am trying to achieve.
Count number of left nodes in BST
but there is one line in the code that I don't understand:
count += countLeftNodes(overallRoot.left, count++);
overallRoot.left
My guess is that it calls a function on the object, but I can't figure out what goes into that function and what it would return.
Any answers to these two questions would be appreciated.
The answer you linked shows how to traverse the tree, but you need a different algorithm to get your count, since as you have noted, that question is trying to solve a slightly different problem.
At any given point in the traversal, you will have the current left count: this is passed down the tree as a second parameter to countLeftNodes(). It starts at zero at the root, is increased by one whenever you go into the left child of a node, and is reset to zero when you enter the right child.
Each call then takes the greatest of three values: its own current left count, the result of the recursive call on the left child, and the result of the recursive call on the right child. That final value is what you return from countLeftNodes().
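Here's a shot at the algorithm @dgnuff described:
// Call as maxLeftNodesInPath(root, 0, &best) with best initialized to 0.
void maxLeftNodesInPath(Node *root, int count, int *best) {
    if (root) {
        // Going left extends the current run of left nodes.
        maxLeftNodesInPath(root->left, count + 1, best);
        // Going right starts a new run.
        maxLeftNodesInPath(root->right, 0, best);
    }
    else if (count - 1 > *best) {
        // count was incremented once more for the null left child, so subtract 1.
        *best = count - 1;
    }
}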
The explanation is pretty much the same: keep accumulating a count while traversing left, reset it when moving to a right child, and update the best when you run off the bottom of the tree.
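The same idea can also be written so that the result comes back as the return value instead of through a pointer, which is closer to the description in the first answer. This is only a sketch: the Node layout and the parameter name leftRun are assumptions made for illustration.

```cpp
#include <algorithm>

// Hypothetical node type for illustration.
struct Node {
    int value;
    Node* left;
    Node* right;
};

// leftRun is the number of consecutive left steps taken to reach `node`.
// Returns the longest run of left steps found anywhere in this subtree.
int countLeftNodes(const Node* node, int leftRun) {
    if (node == nullptr) return 0;
    int viaLeft = countLeftNodes(node->left, leftRun + 1);  // going left extends the run
    int viaRight = countLeftNodes(node->right, 0);          // going right resets it
    return std::max(leftRun, std::max(viaLeft, viaRight));
}
```

Calling countLeftNodes(root, 0) on the example path 5 -> 3 -> 1 gives 2, because the root itself is reached with a run of 0.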

Propagating values from the leaf to the root

I have solved quite a few questions related to trees, however, I still don't feel confident about one particular aspect of trees (recursion in general):
How do you propagate values from the leaf to the root?
For example, consider we have a binary tree wherein we have to find the root to leaf path with the minimum sum. For the tree image here, the sum would be 7 (corresponding to two paths 0-3-2-1-1 or 0-6-1).
I wrote the following code:
struct Node
{
int cost;
vector<Node *> children;
Node *parent;
};
int getCheapestCost( Node *rootNode )
{
if(!rootNode) return 0;
return dfs(rootNode, INT_MAX, 0);
}
int dfs(Node* rootNode, int minVal, int currVal) {
if(!rootNode) return;
currVal+=rootNode->cost;
if(rootNode->children.empty()) {
minVal = min(minVal, currVal);
return minVal;
}
for(auto& neighbor: rootNode->children) {
dfs(neighbor, minVal, currVal);
}
return currVal; //this is incorrect, but what should I return?
}
I know the last return currVal is incorrect - but then what should I return? Technically, I only want to return the value of minVal when I reach the leaf nodes (and no value when I am at the intermediate nodes). So, how do I propagate the minVal from the leaf nodes to the topmost root node?
P.S.: I am preparing for interviews and this is a big pain area for me since I get stuck at this point almost every time. I would highly appreciate any help. Thanks.
Edit: For this particular one, I somehow wrote a solution using pass by reference.
Inside your for loop, keep the smallest minVal returned by any child, and return minVal instead of currVal.
for(auto& neighbor: rootNode->children) {
minVal = min(minVal, dfs(neighbor, minVal, currVal));
}
return minVal;
That way you're always returning the minVal, through the recursion all the way to the first call.
Edit: Explanation
I'll use the tree you provided in your question as an example. We'll start by entering the tree at the root(0). It'll add 0 to the currVal, won't enter the first if, then enter the for. Once it's there, the function will be called again, from the first child.
At the first node (5), it'll add that value, check if it's the end, and go to the next node (4), adds again, currVal is now 9. Then, since (4) has no children, it'll return min(currVal, minVal). At this point, minVal is INT_MAX, so it returns 9.
Once this value is returned, we go back to the function that called it, which was at node(5), exactly at the point when we called (4), and we'll (with my modification) compare whichever value it returned with minVal.
min(minVal, dfs(neighbor, minVal, currVal))
At this point, it's important to notice that the current minVal is still INT_MAX, because it is passed by value rather than by reference, so the assignment made inside the child's call did not change it. As a result, we now set it to 9.
If (5) had other children, we would enter a new instance of dfs for each of them and, once we had a result, compare that value with 9. Since it doesn't, we end the for loop and return minVal, going back to the root node (0).
From there, I believe you can guess what happens: we enter node (3), which branches to (2)->(1)->(1) and (0)->(10), with path sums 7 and 13 respectively. Because the running minVal is passed down into each call, the second branch compares its 13 against the 7 already found and returns 7. Node (6) will finally also return 7 to (0)'s for loop.
In the end, (0) will first compare INT_MAX with 9, then with 7 and finally with 7 again, returning 7 to getCheapestCost.
In short:
Your code will keep entering dfs until it finds a node without children. Once that happens, it returns the minVal it got from that node to the function that called it, which is the parent node.
Once in the parent node, you need to check which child provided the minimum minVal by comparing it with your previous minVal (from other children, other branches, or INT_MAX). After all children are checked, minVal is returned to the next parent, which does the same comparison for its own children, and so on until the root node is reached.
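For comparison, here is a sketch of a slightly different formulation that is not the answer's code: each call returns the cheapest cost of its own subtree, so no minVal or currVal has to be passed down at all. The Node struct is the one from the question; making dfs take a single argument is my change.

```cpp
#include <algorithm>
#include <climits>
#include <vector>

struct Node {
    int cost;
    std::vector<Node*> children;
    Node* parent;
};

// Cheapest root-to-leaf sum within the subtree rooted at `node`.
int dfs(const Node* node) {
    if (node->children.empty()) return node->cost;   // leaf: the path ends here
    int best = INT_MAX;
    for (const Node* child : node->children) {
        best = std::min(best, dfs(child));            // cheapest path below each child
    }
    return node->cost + best;                         // propagate upwards
}

int getCheapestCost(const Node* rootNode) {
    if (!rootNode) return 0;
    return dfs(rootNode);
}
```

The value travels from the leaves to the root purely through return values: a leaf returns its own cost, and every inner node returns its cost plus the smallest value any child returned.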

Segmentation fault in recursive function when using smart pointers

I get a segmentation fault in the call to
auto n1=std::make_shared<Node>(n,n->x+i,n->y+j);
after a few recursive calls. Strange thing is that it's always at the same point in time. Can anyone spot the problem?
This is an implementation for a dynamic programming problem and here I'm accumulating the costs of a path. I have simplified the cost function but in this example the problem still occurs.
void HorizonLineDetector::dp(std::shared_ptr<Node> n)
{
n->cost= 1 + n->prev->cost;
//Check if we reached the last column(done!)
if (n->x==current_edges.cols-1)
{
//Save the info in the last node if it's the cheapest path
if (last_node->cost > n->cost)
{
last_node->cost=n->cost;
last_node->prev=n;
}
}
else
{
//Check for neighboring pixels to see if they are edges, launch dp with all the ones that are
for (int i=0;i<2;i++)
{
for (int j=-1;j<2;j++)
{
if (i==0 && j==0) continue;
if (n->x+i >= current_edges.cols || n->x+i < 0 ||
n->y+j >= current_edges.rows || n->y+j < 0) continue;
if (current_edges.at<char>(n->y+j,n->x+i)!=0)
{
auto n1=std::make_shared<Node>(n,n->x+i,n->y+j);
//n->next.push_back(n1);
nlist.push_back(n1);
dp(n1);
}
}
}
}
}
class Node
{
public:
Node(){}
Node(std::shared_ptr<Node> p,int x_,int y_){prev=p;x=x_;y=y_;lost=0;}
Node(Node &n1){x=n1.x;y=n1.y;cost=n1.cost;lost=n1.lost;prev=n1.prev;}//next=n1.next;}
std::shared_ptr<Node> prev; //Previous and next nodes
int cost; //Total cost until now
int lost; //Number of steps taken without a clear path
int x,y;
Node& operator=(const Node &n1){x=n1.x;y=n1.y;cost=n1.cost;lost=n1.lost;prev=n1.prev;return *this;}//next=n1.next;}
Node& operator=(Node &&n1){x=n1.x;y=n1.y;cost=n1.cost;lost=n1.lost;prev=n1.prev;n1.prev=nullptr;return *this;}//next=n1.next;n1.next.clear();}
};
Your code looks like a pathological path search: it checks almost every path and doesn't keep track of nodes it has already visited, even though many of them can be reached in more than one way.
This builds a recursion depth equal to the length of the longest path, then the next longest path, and so on down to the shortest one, i.e. something like O(# of pixels) depth.
This is bad: call stack depth is limited, so it will crash.
The easy solution is to modify dp into dp_internal, and have dp_internal return a vector of nodes to process next. Then write dp, which calls dp_internal and repeats on its return value.
std::vector<std::shared_ptr<Node>>
HorizonLineDetector::dp_internal(std::shared_ptr<Node> n)
{
std::vector<std::shared_ptr<Node>> retval;
...
if (current_edges.at<char>(n->y+j,n->x+i)!=0)
{
auto n1=std::make_shared<Node>(n,n->x+i,n->y+j);
//n->next.push_back(n1);
nlist.push_back(n1);
retval.push_back(n1);
}
...
return retval;
}
then dp becomes:
void HorizonLineDetector::dp(std::shared_ptr<Node> n)
{
std::vector<std::shared_ptr<Node>> nodes={n};
while (!nodes.empty()) {
auto node = nodes.back();
nodes.pop_back();
auto new_nodes = dp_internal(node);
nodes.insert(nodes.end(), new_nodes.begin(), new_nodes.end());
}
}
but (A) this will probably still crash when the number of queued-up nodes gets ridiculously large, and (B) it only patches over the crash caused by recursion; it doesn't make your algorithm suck less.
Use A*.
This involves keeping track of which nodes you have visited and what nodes to process next with their current path cost.
You then use heuristics to figure out which of the ones to process next you should check first. If you are on a grid of some sort, the heuristic is to use the shortest possible distance if nothing was in the way.
Add the cost to get to the node-to-process, plus the heuristic distance from that node to the destination. Find the node-to-process that has the least total. Process that one: you mark it as visited, and add all of its adjacent nodes to the list of nodes to process.
Never add a node to the list of nodes to process that you have already visited (as that is redundant work).
Once you have a solution, prune the list of nodes to process of any node whose current path cost is greater than or equal to your solution. If you know your heuristic never overestimates (that is, it is impossible to get to the destination faster than it predicts), you can even prune based on the total of heuristic and current cost. Similarly, don't add a node to the list of nodes to process if it would be pruned by this paragraph.
The result is that your algorithm searches in a relatively straight line towards the target, and then expands outwards trying to find a way around any barriers. If there is a relatively direct route, it is used and the rest of the universe isn't even touched.
There are many optimizations on A* you can do, and even alternative solutions that don't rely on heuristics. But start with A*.
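To make the description concrete, here is a minimal A* sketch on a plain grid. It is not tied to HorizonLineDetector or OpenCV: the Grid type, the 4-connected moves, the unit step cost and the Manhattan-distance heuristic are all assumptions chosen to keep the example short, and the start cell is assumed to be inside the grid and passable.

```cpp
#include <cstdlib>
#include <queue>
#include <vector>

// Hypothetical grid: grid[y][x] != 0 means the cell can be entered.
using Grid = std::vector<std::vector<int>>;

struct Entry {
    int priority;  // cost so far + heuristic estimate to the target
    int cost;      // cost so far
    int x, y;
};

struct ByPriority {
    bool operator()(const Entry& a, const Entry& b) const {
        return a.priority > b.priority;  // smallest priority comes out first
    }
};

// Returns the number of steps on a shortest path, or -1 if there is none.
int aStarSteps(const Grid& grid, int sx, int sy, int tx, int ty) {
    const int rows = static_cast<int>(grid.size());
    const int cols = rows ? static_cast<int>(grid[0].size()) : 0;
    if (cols == 0) return -1;
    // Manhattan distance: the path length if nothing were in the way.
    auto heuristic = [&](int x, int y) { return std::abs(x - tx) + std::abs(y - ty); };

    std::vector<std::vector<bool>> visited(rows, std::vector<bool>(cols, false));
    std::priority_queue<Entry, std::vector<Entry>, ByPriority> open;
    open.push({heuristic(sx, sy), 0, sx, sy});

    const int dx[] = {1, -1, 0, 0};
    const int dy[] = {0, 0, 1, -1};
    while (!open.empty()) {
        Entry cur = open.top();
        open.pop();
        if (visited[cur.y][cur.x]) continue;  // already processed via a cheaper entry
        visited[cur.y][cur.x] = true;
        if (cur.x == tx && cur.y == ty) return cur.cost;
        for (int d = 0; d < 4; ++d) {
            int nx = cur.x + dx[d], ny = cur.y + dy[d];
            if (nx < 0 || ny < 0 || nx >= cols || ny >= rows) continue;
            if (grid[ny][nx] == 0 || visited[ny][nx]) continue;  // wall or already done
            open.push({cur.cost + 1 + heuristic(nx, ny), cur.cost + 1, nx, ny});
        }
    }
    return -1;  // the open list ran dry without reaching the target
}
```

There is no recursion here, so the call stack stays flat, and each cell is processed at most once. Adapting it to the original problem would mean replacing the neighbour loop and the cost of 1 with the edge test and cost function from dp.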

Path of the diameter of a binary tree

I have a binary tree and a method for the size of the longest path (the diameter):
int diameter(struct node * tree)
{
if (tree == 0)
return 0;
int lheight = height(tree->left);
int rheight = height(tree->right);
int ldiameter = diameter(tree->left);
int rdiameter = diameter(tree->right);
return max(lheight + rheight + 1, max(ldiameter, rdiameter));
}
I want the function to return also the exact path (list of all the nodes of the diameter).
How can I do it?
Thanks
You have two options:
A) Think.
B) Search. Among the first few google hits you can find this: http://login2win.blogspot.hu/2012/07/print-longest-path-in-binary-tree.html
Choose A) if you want to learn, choose B) if you do not care, only want a quick, albeit not necessarily perfect solution.
There are many possible solutions. Some of them:
1. In a divide and conquer approach you will probably end up maintaining the longest paths found so far on both sides, and keeping only the longer.
2. The quoted solution does two traversals, one for determining the diameter, and the second for printing. This is a nice trick to overcome the problem of not knowing whether we are at the deepest point in approach 1.
3. Instead of a depth first search, do a breadth first one. Use a queue. Proceed level by level, storing the parent for each node. When you reach the last level (no children added to the queue), you can print the whole path easily, because the last visited node is on (one) longest path, and you have the parent links.
Add a property struct node * next to the node struct. Before the return statement, add a line like this tree->next = (ldiameter > rdiameter ? tree->left : tree->right) to get the longer path node as the next node. After calling diameter(root), you should be able to iterate through all of the next nodes from the root to print the largest path.
I think the following may work... compute the diameter as follows in O(N) time.
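// C++ code. Assumes node values are unique, non-negative, and small enough to index parent[].
int findDiameter(node *root, int &max_length, node* &max_dia_node, int parent[], node* parent_of_root){
    if(!root) return 0;
    // The root of the whole tree has no parent; record -1 for it.
    parent[root->val] = parent_of_root ? parent_of_root->val : -1;
    // Pass every argument on to the recursive calls, with root as the children's parent.
    int left = findDiameter(root->left, max_length, max_dia_node, parent, root);
    int right = findDiameter(root->right, max_length, max_dia_node, parent, root);
    if(left+right+1 > max_length){
        max_dia_node = root;
        max_length = left+right+1;
    }
    return 1 + max(left,right);
}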
Several things happen in this function. First, max_length accumulates the diameter of the tree. Along with that, max_dia_node is set to the node at which that maximum was found.
This is the node through which the diameter path passes.
Using this information, we can find the deepest descendant in the left subtree and in the right subtree of max_dia_node. From those two nodes we can recover the actual nodes on the path via the "parent" array.
This takes two traversals of the tree.
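Here is one way the two-traversal idea might look when written out in full. It is a sketch under assumptions, not the answer's exact method: instead of a parent array it collects the values on the deepest path directly, and the names findTop, deepestPath and diameterPath are made up for this example.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct node {
    int val;
    node* left;
    node* right;
};

// First traversal: returns the height in nodes and records in `top` the node
// where left height + right height + 1 is largest.
int findTop(const node* root, int& best, const node*& top) {
    if (!root) return 0;
    int l = findTop(root->left, best, top);
    int r = findTop(root->right, best, top);
    if (l + r + 1 > best) {
        best = l + r + 1;
        top = root;
    }
    return 1 + std::max(l, r);
}

// Values on a longest downward path, ordered from the deepest node up to `root`.
std::vector<int> deepestPath(const node* root) {
    if (!root) return {};
    std::vector<int> l = deepestPath(root->left);
    std::vector<int> r = deepestPath(root->right);
    std::vector<int> path = (l.size() >= r.size()) ? std::move(l) : std::move(r);
    path.push_back(root->val);
    return path;
}

// Second traversal: deepest left leaf ... top ... deepest right leaf.
std::vector<int> diameterPath(const node* root) {
    int best = 0;
    const node* top = nullptr;
    findTop(root, best, top);
    if (!top) return {};
    std::vector<int> path = deepestPath(top->left);
    path.push_back(top->val);
    std::vector<int> right = deepestPath(top->right);
    path.insert(path.end(), right.rbegin(), right.rend());  // reversed: top downwards
    return path;
}
```

When several paths share the maximum length, this returns just one of them; which one depends on the traversal order.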

How to find average of secondary element in a Binary Search Tree node

I'm trying to create a function that finds the average of some data within the nodes of a tree. The problem is, every node contains two pieces of data and unlike other BSTs, the primary data from which it is built is a string. Finding the average of number-based elements in a tree isn't an issue for me, but since each node contains a string (a person's name) and a seemingly random number (the weight of said person), the tree is actually in complete disarray, and I have no idea how to deal with it.
Here is my node so you see what I mean:
struct Node {
string name;
double weight;
Node* leftChild;
Node* rightChild;
};
Node* root;
Here's the function during one of its many stages:
// This isn't what I'm actually using so don't jump to conclusions
double nameTree::averageWeight(double total, double total, int count) const
{
if (parent != NULL)
{ //nonsense, nonsense
averageWeight(parent->leftChild, total, count);
averageWeight(parent->rightChild, total, count);
count++;
total = total + parent->weight;
return total;
}
return (total / count);
}
In an effort to traverse the tree, I tried some recursion, but every time I manage to count and total everything, something goes wrong and it ends up doing return (total / count) each time. I've also tried an array implementation by traversing the tree and adding the weights to the array, but that didn't work because the returns and the recursion interfered with each other, or something.
And just because I know someone is going to ask, yes, this is for a school assignment. However, this is one out of like, 18 functions in a class so it's not like I'm asking anyone to do this for me. I've been on this one function for hours now and I've been up all night and my brain hurts so any help would be vastly appreciated!
You could try something like:
//total number of tree nodes
static int nodeCount=0;
// Calculate the total sum of the weights in the tree
double nameTree::calculateWeight(Node *parent)
{
    double total=0;
    if (parent != NULL)
    {
        //Calculate total weight for left sub-tree
        total+=calculateWeight(parent->leftChild);
        //Calculate weight for right sub-tree
        total+=calculateWeight(parent->rightChild);
        //add current node weight
        total+=parent->weight;
        //count only real nodes, not NULL children
        nodeCount++;
    }
    //an empty sub-tree contributes 0
    return total;
}
//assumes root is a member of nameTree
double nameTree::averageWeight()
{
    //reset the counter so repeated calls do not accumulate
    nodeCount=0;
    double weightSum=calculateWeight(root);
    if(nodeCount!=0)
        return (weightSum/nodeCount);
    else
    {
        cout<<"The tree is empty";
        return 0;
    }
}
I don't have a compiler here but I believe it works.
To calculate the average you need two numbers: the total value and the number of elements in the set. You need to provide a function (recursive is probably the simplest) that will walk the tree and either return a pair<double,int> with those values or else modify some argument passed as reference to store the two values.
As for your code, averageWeight returns a double, but when you call it recursively you ignore (discard) the result. The count argument is passed by copy, which means that modifications made in the recursive calls are not visible to the caller (which then does not know how much parent->weight should weigh towards the result).
This should be enough to get you started.
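As an illustration of the pair<double,int> suggestion, a sketch could look like the following. It uses free functions instead of nameTree members to stay self-contained, and the helper name sumAndCount is made up for this example.

```cpp
#include <string>
#include <utility>

struct Node {
    std::string name;
    double weight;
    Node* leftChild;
    Node* rightChild;
};

// Returns {sum of weights, number of nodes} for the subtree rooted at `node`.
std::pair<double, int> sumAndCount(const Node* node) {
    if (node == nullptr) return {0.0, 0};   // empty subtree contributes nothing
    std::pair<double, int> l = sumAndCount(node->leftChild);
    std::pair<double, int> r = sumAndCount(node->rightChild);
    return {l.first + r.first + node->weight, l.second + r.second + 1};
}

// Divides only once, at the top; an empty tree yields 0.
double averageWeight(const Node* root) {
    std::pair<double, int> result = sumAndCount(root);
    return result.second == 0 ? 0.0 : result.first / result.second;
}
```

Because both numbers travel back through the return value, nothing is lost between recursive calls and no static counter is needed. The ordering of the tree by name does not matter, since every node is visited exactly once.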