C++ - threaded tree, ordered traversal

I've just implemented a threaded tree in C++, and now I'm trying to cout all the elements in order.
The tree was a binary sorted tree (not balanced) before I threaded it.
I've tried doing this:
E min = _min(root); //returns the minimum element of the tree
E max = _max(root); //returns the maximum element of the tree
while (min != max)
{
std::cout << min << ", ";
min = _successor(root, min);
}
std::cout << max << ", ";
std::cout << std::endl;
But since the tree is now threaded, my successor function always returns the minimum of the whole tree (basically, it goes once into the right subtree and then follows left links as many times as possible until it reaches a leaf). So when I call this function, it only prints 1s (because 1 is the minimum value of my tree).
Also, I've tried something else:
E min = _min(root); //returns min element of the tree
E max = _max(root); //returns max element of the tree
Node* tmp = _getNode(root, min); //returns the node of the specified element, therefore the minimum node of the tree
while(tmp->data < max)
{
std::cout << tmp->data << ", ";
tmp = _getNode(root, tmp->data)->rightChild; //gets the right child node of tmp
}
std::cout << tmp->data << ", ";
However, by doing this, there are values that are ignored. (See image below)
(Green links have been added after the threading of the tree.)
For example, node #6 is never visited by the last algorithm, because it's not the right child of any node in the tree...
Here's the output of the previous function:
1, 2, 3, 5, 7, 8, 11, 71
Does anyone have an idea of how I could fix this, or any tips for my problem?
Thanks
EDIT: In the end I just had to traverse the tree from the minimum to the maximum AND modify my _predecessor and _successor methods so they wouldn't descend into subtrees through threaded links. :)
Hope it helps future readers.

Try
Node* n = _min(root);
while (n->right) {
cout << n->val << ", ";
n = _successor(n);
}
cout << n->val << endl;
This is basically your first code (note that I assume, as you do, that the tree is non-empty). It also won't give you a trailing ','.
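The important thing is to get your successor function right. It should look like this:
Node* _successor(Node* n) {
if (is_thread(n, RIGHT)) return n->right; // a thread points straight to the in-order successor
return _min(n->right); // otherwise: leftmost node of the real right subtree
}
And for completeness:
Node* _min(Node* n) {
while (n->left && !is_thread(n, LEFT)) n = n->left; // stop at a null or threaded left link
return n;
}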
For both of these, all the green arrows in your diagram count as threads.
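A minimal, self-contained sketch of the same idea, assuming a right-threaded BST: only right links are threaded, and each node carries a hypothetical `rightThread` flag that is true when `right` points to the in-order successor (or is null for the maximum) rather than to a child. The names `TNode`, `insert`, `leftmost`, `successor` and `inorder` are illustrative, not the asker's API, and nodes are never freed.

```cpp
#include <vector>

// Hypothetical node layout: rightThread == true means `right` is a thread
// to the in-order successor (nullptr for the maximum), not a real child.
struct TNode {
    int val;
    TNode* left = nullptr;      // always a real child or null
    TNode* right = nullptr;
    bool rightThread = true;
    explicit TNode(int v) : val(v) {}
};

// Unbalanced BST insertion that keeps the right threads valid.
TNode* insert(TNode* root, int v) {
    TNode* n = new TNode(v);
    if (!root) return n;
    TNode* cur = root;
    while (true) {
        if (v < cur->val) {
            if (cur->left) { cur = cur->left; continue; }
            n->right = cur;            // successor of a new left child is its parent
            cur->left = n;
            return root;
        }
        if (!cur->rightThread) { cur = cur->right; continue; }
        n->right = cur->right;         // inherit the parent's old successor thread
        cur->right = n;
        cur->rightThread = false;      // the parent now has a real right child
        return root;
    }
}

TNode* leftmost(TNode* n) {
    while (n && n->left) n = n->left;
    return n;
}

// Follow a thread directly, or take the minimum of the real right subtree.
TNode* successor(TNode* n) {
    if (n->rightThread) return n->right;
    return leftmost(n->right);
}

// In-order traversal without recursion or an explicit stack.
std::vector<int> inorder(TNode* root) {
    std::vector<int> out;
    for (TNode* n = leftmost(root); n; n = successor(n)) out.push_back(n->val);
    return out;
}
```

Because the successor of the maximum is a null thread, the loop ends by itself, so there is no need to compare against a separately computed maximum.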

I've never seen threaded trees before, but I'll take a stab at this anyway. To build an inorder traversal, you could approach the root of the tree from two directions at once:
1. Start at the root.
2. Follow left links until you find one that points to null. That element is the tree's minimum value.
3. Follow right links until you reach the root. If you've built the tree correctly, this should visit every element in increasing order.
4. Repeat steps 2 and 3 in the opposite direction (find the max element, then walk backwards).
5. Join these two lists with the root in the middle.
That's probably not the fastest algorithm, but I think it'll produce a correct answer. And you don't have to use recursion, which I guess is the whole point of using a threaded tree.

Related

Complexity of printing all root to leaf paths in binary tree

On https://www.techiedelight.com/print-all-paths-from-root-to-leaf-nodes-binary-tree/, the code for printing the root-to-leaf path of every leaf node is given as below.
They state the algorithm is O(n), but I think it should be O(n log n), where n is the number of nodes. A standard DFS is O(n + E), but printing the paths seems to add a log n factor. Suppose h is the height of a perfect binary tree. There are about n/2 nodes on the last level, hence n/2 paths that we need to print. Each path has h + 1 nodes (let's just say h for mathematical simplicity). So we end up printing h * n/2 nodes when printing all the paths. We know h ≈ log2(n). So h * n/2 = O(n log n)?
Is their answer wrong, or is there something wrong with my analysis here?
#include <iostream>
#include <vector>
using namespace std;
// Data structure to store a binary tree node
struct Node
{
int data;
Node *left, *right;
Node(int data)
{
this->data = data;
this->left = this->right = nullptr;
}
};
// Function to check if a given node is a leaf node or not
bool isLeaf(Node* node) {
return (node->left == nullptr && node->right == nullptr);
}
// Recursive function to find paths from the root node to every leaf node
void printRootToleafPaths(Node* node, vector<int> &path)
{
// base case
if (node == nullptr) {
return;
}
// include the current node to the path
path.push_back(node->data);
// if a leaf node is found, print the path
if (isLeaf(node))
{
for (int data: path) {
cout << data << " ";
}
cout << endl;
}
// recur for the left and right subtree
printRootToleafPaths(node->left, path);
printRootToleafPaths(node->right, path);
// backtrack: remove the current node after the left, and right subtree are done
path.pop_back();
}
// The main function to print paths from the root node to every leaf node
void printRootToleafPaths(Node* node)
{
// vector to store root-to-leaf path
vector<int> path;
printRootToleafPaths(node, path);
}
int main()
{
/* Construct the following tree
1
/ \
/ \
2 3
/ \ / \
4 5 6 7
/ \
8 9
*/
Node* root = new Node(1);
root->left = new Node(2);
root->right = new Node(3);
root->left->left = new Node(4);
root->left->right = new Node(5);
root->right->left = new Node(6);
root->right->right = new Node(7);
root->right->left->left = new Node(8);
root->right->right->right = new Node(9);
// print all root-to-leaf paths
printRootToleafPaths(root);
return 0;
}
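To check the question's count of printed values, here is a small sketch (all helper names hypothetical) that builds a perfect binary tree and sums the lengths of all root-to-leaf paths, which is exactly the number of values the printing loop above outputs.

```cpp
// Minimal node: only the shape matters for counting.
struct PNode {
    PNode* left = nullptr;
    PNode* right = nullptr;
};

// Build a perfect binary tree whose levels are numbered 0..h.
PNode* buildPerfect(int h) {
    PNode* n = new PNode;
    if (h > 0) {
        n->left = buildPerfect(h - 1);
        n->right = buildPerfect(h - 1);
    }
    return n;
}

long long countNodes(const PNode* n) {
    return n ? 1 + countNodes(n->left) + countNodes(n->right) : 0;
}

// Every leaf contributes the length of its root-to-leaf path (depth + 1).
long long printedValues(const PNode* n, int depth = 0) {
    if (!n) return 0;
    if (!n->left && !n->right) return depth + 1;
    return printedValues(n->left, depth + 1) + printedValues(n->right, depth + 1);
}
```

For h = 9 the tree has 1023 nodes, while the printer outputs 512 × 10 = 5120 values, roughly (n/2)·log2(n). That matches the question's O(n log n) estimate for the printing work, even though the traversal itself visits each node only once.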
The time complexity of finding the paths is O(n), since it visits every node once.
The time complexity of printing one path is O(log n) (for a balanced tree).
Printing all paths (n/2 leaves) therefore takes O(n log n).
Then you need to compare the node traversal cost with the path printing cost.
I believe that on most modern systems, printing costs much more than traversing a node.
So the actual time complexity is O(n log n) (because of the printing).
I assume the website ignores the printing cost, which is why it claims the time complexity is O(n).
The complexity is O(n log n) for a balanced binary tree, but for an arbitrary binary tree the worst case is O(n^2).
Consider a tree consisting of:
n/2 nodes in a linked list on their rightChild pointers; and
at the end of that, n/2 nodes arranged into a tree with n/4 leaves.
Since the n/4 leaves are all more than n/2 nodes deep, there are more than n^2/8 total nodes in all the paths, which is quadratic in n.
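The worst-case shape described above can be built and measured directly. This is a sketch under one assumption: the "tree with n/4 leaves" is taken to be a perfect binary tree, and the chain links through `right`. `SNode`, `chainThenPerfect` and the other helpers are hypothetical names.

```cpp
struct SNode {
    SNode* left = nullptr;
    SNode* right = nullptr;
};

// Perfect binary tree with levels 0..h (2^h leaves).
SNode* buildPerfect(int h) {
    SNode* n = new SNode;
    if (h > 0) {
        n->left = buildPerfect(h - 1);
        n->right = buildPerfect(h - 1);
    }
    return n;
}

// A chain of `chainLen` nodes linked through `right`,
// with a perfect tree of height h hanging off its end.
SNode* chainThenPerfect(int chainLen, int h) {
    SNode* root = buildPerfect(h);
    for (int i = 0; i < chainLen; ++i) {
        SNode* n = new SNode;
        n->right = root;   // prepend one chain node above the current root
        root = n;
    }
    return root;
}

long long countNodes(const SNode* n) {
    return n ? 1 + countNodes(n->left) + countNodes(n->right) : 0;
}

// Total values printed: sum of (depth + 1) over all leaves.
long long printedValues(const SNode* n, int depth = 0) {
    if (!n) return 0;
    if (!n->left && !n->right) return depth + 1;
    return printedValues(n->left, depth + 1) + printedValues(n->right, depth + 1);
}
```

chainThenPerfect(32, 4) has 63 nodes and prints 16 × 37 = 592 values; chainThenPerfect(64, 5) has 127 nodes and prints 32 × 70 = 2240. Roughly doubling n roughly quadruples the output, as the quadratic bound predicts.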
The algorithm traverses O(n) nodes. The total prints it does is O(n lg n) for a balanced tree or O(n^2) for an arbitrary tree.
It depends on what operations have cost.
For example, storing or incrementing an n-bit number or pointer is often treated as an O(1) operation. In any physical computer, if you have 2^100 nodes you'll need pointers (or node names) of at least lg(2^100) = 100 bits, which take longer to copy than 64- or 32-bit node names. In a certain sense, copying a pointer should take O(lg n) time!
But we don't care. We implicitly set the price of operations, and give O notation costs in terms of those operations.
Here, it is plausible that they counted printing an entire path as an O(1) operation and counted node traversals, to get an O(n) cost. Maybe they didn't even notice, any more than you noticed the maximum node count implied by 32- or 64-bit pointers. They failed to tell you how they are pricing things.
The same thing happens in the specification of standard library algorithms: it guarantees a maximum number of predicate calls rather than a running time.

Counting the number of nodes in a level of a binary search tree

Like the title says, I want to count the nodes at any given level of the tree. I already know how to write member functions that count all the nodes of the tree; I'm just not sure how to approach a specific level. Here's what I've tried. Any help is appreciated.
The first parameter is a pointer to a character array input by the user. root is a private variable representing the "oldest" node.
int TreeType::GetNodesAtLevel(ItemType* itemArray, int level)
{
TreeNode* p = root;
if (itemArray == NULL)
return;
if (level == 0)
{
cout << p->info << " ";
return;
}
else
{
GetNodesAtLevel(itemarray->left, level); //dereference in one and not the other was just testing
GetNodesAtLevel(*itemarray->right, level); //neither seems to work
}
}
One way to do it is with a queue, i.e. a level-order traversal (BFS). Follow these steps.
Keep two variables: count_level and count_queue (the total number of nodes currently in the queue).
for a tree like this:
A
/ \
B C
/ \ \
K L D
/
E
initially count_level = 0 and count_queue = 0. Now:
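1. Add the root to the queue (here A) and increment count_queue to 1.
2. Whenever count_level is 0, set count_level = count_queue. This is the number of nodes on the level you are about to process.
3. Dequeue nodes one at a time, decrementing count_level and count_queue for each, and enqueue each dequeued node's children (incrementing count_queue for each child). When count_level reaches 0, the level is finished; go back to step 2, which gives you the number of nodes on the next level down.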
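A minimal sketch of the steps above, with one simplification: instead of maintaining count_queue by hand, it reads `std::queue::size()` at the start of each level, which holds the same value. `LNode` and `nodesAtLevel` are hypothetical names, not the asker's `TreeType` API; the root is level 0.

```cpp
#include <queue>

struct LNode {
    char key;
    LNode* left = nullptr;
    LNode* right = nullptr;
    explicit LNode(char k) : key(k) {}
};

// Count the nodes on `level` with a level-order traversal.
int nodesAtLevel(LNode* root, int level) {
    if (!root || level < 0) return 0;
    std::queue<LNode*> q;
    q.push(root);
    for (int depth = 0; !q.empty(); ++depth) {
        int countLevel = static_cast<int>(q.size());  // width of this level
        if (depth == level) return countLevel;
        while (countLevel-- > 0) {                     // dequeue exactly one level
            LNode* n = q.front();
            q.pop();
            if (n->left) q.push(n->left);
            if (n->right) q.push(n->right);
        }
    }
    return 0;  // the tree is shallower than `level`
}
```

For the example tree (reading E as L's left child), this gives 1, 2, 3 and 1 nodes for levels 0 to 3, and 0 for any deeper level.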

Topological Sorting Algorithm not working right

OK, I have to implement a topological sorting algorithm on a graph. I need to find a node with in-degree 0 and queue it, then print it and remove all the edges going out of it. I remove the edges by decrementing the in-degree counts in the countList map. I have the adjacency list as a map, and the in-degree count for each node as another map. My algorithm only ever accesses the first element of the adjacency list, so my output queue keeps displaying the first key of the adjacency list map, over and over. I stopped the while loop at 25 iterations so it wouldn't run forever.
string z = "";
string a = "";
cout << "Queue: ";
do{
for(it = countList.begin(); it!=countList.end(); ++it){
if(it->second == 0){
Q.push(it->first);
countList.at(it->first)--;
z = adjList[it->first];
cout <<"z: " << z <<endl;
//remove edges
for(int i = 0; i< z.length(); i++){
a = z.at(i);
cout << "z at " <<i << " : " <<a <<endl;
countList.at(a)--;
}//end for
}//end if
//cout << Q.front() << ", ";
//Q.pop();
}//end for
cout << Q.front() << ", ";
Q.pop();
}while(!Q.empty());
Can someone help me with understanding why it is not iterating through the countList and is only staying on the first element?
Thank you.
So I changed countList.at(a)-+1 to countList.at(a)-- so that it actually decrements.
Now the output is more than just the first vertex that was 0 in degree. But the output is still wrong.
Here is the whole thing.
My variable declarations
vector<string> E;
map<string, string> adjList;
map<string, int>countList;
map<string, int>::iterator it;
queue<string> Q;
I don't want to post the code for the adjacencyList or countList, but here is how they look.
//These represent the edges between the two paired nodes
AdjacencyList: (1,2) (1,4) (1,3) (2,4) (2,5) (3,6) (4,6) (4,7) (4,3) (5,4) (5,7) (7,6)
//The first is the node name and the second element is how many edges come into that node.
countList: (1,0) (2,1) (3,2) (4,3) (5,1) (6,3) (7,2)
My output should be either:
Queue: 1,2,5,4,3,7,6
//or
Queue: 1,2,5,4,7,3,6
OK I added
countList.at(it->first)--;
after I push the vertex onto the queue. So that should decrement the count of that vertex to -1.
This narrowed my output a lot.
OK IT WORKS NOW!!!
I changed the while loop to stop after the queue is empty and printed the queue in the while loop and it fixed the problem.
My output now is:
Queue: 1, 2, 5, 4, 7, 3, 6,
OK, this code only works if the node names are single characters.
How would I change the adjList mapping so that node names can be longer than a single character?
Perhaps a linked list pointed to by the key? And if so, how would I do that?
Ok, now we're getting somewhere.
The very first version (before your edit) did decrement the incoming edge count incorrectly.
Now there is another issue: In each iteration, you repeatedly take nodes that have already been taken (node #1 is a good example) because they still have zero count (number of incoming edges). By decrementing their ancestors again and again, some of the counts will drop below zero (such as for node #2).
You have to somehow mark the nodes that have already been used so you don't use them again in each cycle. This can be achieved a) by setting a flag on the node, b) by using a set of used nodes, c) by removing the node from the list, or (probably the simplest) d) by setting its edge count to a negative number (for instance, -1) after putting it into the output queue.
After your second edit, the algorithm as such should work (it works for me after some minor tweaks). However, the way you use adjList is pretty strange: how exactly do you store multiple edges for one node in a map?
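One way to answer both follow-up questions (multi-character node names and multiple edges per node) is to map each node name to a `std::vector<std::string>` of successors instead of a single `string`. The sketch below uses the standard queue-based formulation (Kahn's algorithm): a node is pushed only at the moment its in-degree drops to zero, so no node is ever taken twice and no count goes negative. `topoSort` is a hypothetical helper; the edges are passed in as (from, to) pairs.

```cpp
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> topoSort(
        const std::vector<std::pair<std::string, std::string>>& edges) {
    std::map<std::string, std::vector<std::string>> adjList;  // node -> successors
    std::map<std::string, int> countList;                     // node -> in-degree
    for (const auto& e : edges) {
        adjList[e.first].push_back(e.second);
        countList.emplace(e.first, 0);  // ensure sources get an entry (no overwrite)
        ++countList[e.second];
    }
    std::queue<std::string> Q;
    for (const auto& kv : countList)
        if (kv.second == 0) Q.push(kv.first);
    std::vector<std::string> order;
    while (!Q.empty()) {
        std::string u = Q.front();
        Q.pop();
        order.push_back(u);
        for (const auto& v : adjList[u])
            if (--countList[v] == 0) Q.push(v);  // queued exactly once
    }
    return order;  // shorter than countList.size() if the graph has a cycle
}
```

With the edges from the question, in the order listed, this produces 1, 2, 5, 4, 7, 3, 6, which is one of the two expected outputs.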

binary tree -print the elements according to the level

This question was asked to me in an interview:
Let's say we have the binary tree above. How can I produce output like this:
2 7 5 2 6 9 5 11 4
I answered that maybe we could keep a level-count variable and print all the elements sequentially by checking each node's level count.
I was probably wrong.
Can anybody give me an idea of how to achieve that?
You need to do a breadth-first traversal of the tree. Here it is described as follows:
Breadth-first traversal: Depth-first is not the only way to go through the elements of a tree. Another way is to go through them level-by-level.
For example, each element exists at a certain level (or depth) in the tree:
tree
----
j <-- level 0
/ \
f k <-- level 1
/ \ \
a h z <-- level 2
\
d <-- level 3
(People like to number things starting with 0.)
So, if we want to visit the elements level-by-level (and left-to-right, as usual), we would start at level 0 with j, then go to level 1 for f and k, then go to level 2 for a, h and z, and finally go to level 3 for d.
This level-by-level traversal is called a breadth-first traversal because we explore the breadth, i.e., the full width of the tree at a given level, before going deeper.
The traversal in your question is called a level-order traversal and this is how it's done (very simple/clean code snippet I found).
You basically use a queue and the order of operations will look something like this:
enqueue F
dequeue F
enqueue B G
dequeue B
enqueue A D
dequeue G
enqueue I
dequeue A
dequeue D
enqueue C E
dequeue I
enqueue H
dequeue C
dequeue E
dequeue H
For this tree (straight from Wikipedia):
The term for that is level-order traversal. Wikipedia describes an algorithm for that using a queue:
levelorder(root)
q = empty queue
q.enqueue(root)
while not q.empty do
node := q.dequeue()
visit(node)
if node.left ≠ null
q.enqueue(node.left)
if node.right ≠ null
q.enqueue(node.right)
BFS:
std::queue<Node const *> q;
q.push(&root);
while (!q.empty()) {
Node const *n = q.front();
q.pop();
std::cout << n->data << std::endl;
if (n->left)
q.push(n->left);
if (n->right)
q.push(n->right);
}
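A runnable version of the queue-based traversal above. The tree built in the usage below is an assumption, chosen so that its level order is exactly the output asked for in the question (2 7 5 2 6 9 5 11 4); `BNode` and `levelOrder` are hypothetical names, and the result is returned as a space-separated string instead of printed.

```cpp
#include <queue>
#include <sstream>
#include <string>

struct BNode {
    int data;
    BNode* left = nullptr;
    BNode* right = nullptr;
    explicit BNode(int d) : data(d) {}
};

// Breadth-first (level-order) traversal: level by level, left to right.
std::string levelOrder(const BNode* root) {
    std::ostringstream out;
    std::queue<const BNode*> q;
    if (root) q.push(root);
    bool first = true;
    while (!q.empty()) {
        const BNode* n = q.front();
        q.pop();
        if (!first) out << ' ';
        out << n->data;
        first = false;
        if (n->left) q.push(n->left);
        if (n->right) q.push(n->right);
    }
    return out.str();
}
```

Usage: with root 2, children 7 and 5, grandchildren 2 and 6 (under 7) and 9 (right of 5), then 5 and 11 under 6 and 4 as 9's left child, levelOrder returns "2 7 5 2 6 9 5 11 4".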
Iterative deepening would also work and saves memory use, but at the expense of computing time.
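A sketch of the iterative-deepening variant: a depth-limited DFS is run for depth 0, 1, 2, ... and each run collects the nodes at exactly that depth. Apart from the output itself it needs only O(height) stack space instead of a queue as wide as the widest level, but the upper levels are re-walked on every pass. `INode`, `collectLevel` and `levelOrderIDDFS` are hypothetical names.

```cpp
#include <vector>

struct INode {
    int data;
    INode* left = nullptr;
    INode* right = nullptr;
    explicit INode(int d) : data(d) {}
};

// Append the values exactly `depth` levels below `n`, left to right.
// Returns true if at least one node exists at that depth.
bool collectLevel(const INode* n, int depth, std::vector<int>& out) {
    if (!n) return false;
    if (depth == 0) { out.push_back(n->data); return true; }
    bool left = collectLevel(n->left, depth - 1, out);
    bool right = collectLevel(n->right, depth - 1, out);
    return left || right;
}

// Depth-limited DFS for increasing limits until a level comes back empty.
std::vector<int> levelOrderIDDFS(const INode* root) {
    std::vector<int> out;
    for (int d = 0; collectLevel(root, d, out); ++d) {}
    return out;
}
```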
If we can fetch the next element on the same level, we are done. As we already know, we can access these elements using a breadth-first traversal.
The only remaining problem is how to tell whether we are at the last element of a level. For this, we append a delimiter (NULL in this case) to mark the end of each level.
Algorithm:
1. Put root in queue.
2. Put NULL in queue.
3. While Queue is not empty
4. x = fetch first element from queue
5. If x is not NULL
6. x->rpeer <= top element of queue.
7. put left and right child of x in queue
8. else
9. if queue is not empty
10. put NULL in queue
11. end if
12. end while
13. return
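#include <iostream>
#include <queue>
using namespace std;
// Prints the tree one level per line; a NULL entry in the queue marks the end of a level.
// Assumes a struct tree with members val, left and right.
void print(tree* root)
{
queue<tree*> que;
if (!root)
return;
tree *tmp, *l, *r;
que.push(root);
que.push(NULL);
while( !que.empty() )
{
tmp = que.front();
que.pop();
if(tmp != NULL)
{
cout << tmp->val << " "; //print value
l = tmp->left;
r = tmp->right;
if(l) que.push(l);
if(r) que.push(r);
}
else
{
cout << endl; // end of the current level
if (!que.empty())
que.push(NULL);
}
}
return;
}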
I would use a collection, e.g. std::list, to store all elements of the currently printed level:
Collect pointers to all nodes in the current level in the container
Print the nodes listed in the container
Make a new container, add the subnodes of all nodes in the container
Overwrite the old container with the new container
Repeat until the container is empty
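The container-swapping steps above can be sketched as follows. It uses `std::vector` rather than `std::list` (either works here), collects each level into a row instead of printing it, and `CNode` and `levels` are hypothetical names.

```cpp
#include <vector>

struct CNode {
    int data;
    CNode* left = nullptr;
    CNode* right = nullptr;
    explicit CNode(int d) : data(d) {}
};

std::vector<std::vector<int>> levels(const CNode* root) {
    std::vector<std::vector<int>> result;
    std::vector<const CNode*> current;            // nodes of the current level
    if (root) current.push_back(root);
    while (!current.empty()) {
        std::vector<const CNode*> next;           // new container for the children
        result.emplace_back();
        for (const CNode* n : current) {
            result.back().push_back(n->data);     // "print" the current level
            if (n->left) next.push_back(n->left);
            if (n->right) next.push_back(n->right);
        }
        current.swap(next);                       // overwrite old with new
    }
    return result;
}
```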
As an example of what you can do in an interview if you don't remember (or don't know) the "official" algorithm, my first idea was to traverse the tree in regular pre-order, carrying a level counter along and maintaining a vector of linked lists of node pointers, one list per level, e.g.
levels[level].push_back(&node);
and in the end print the list of each level.
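A sketch of that pre-order idea, assuming a simple node type `VNode` (hypothetical, as are the function names). Because pre-order visits a left subtree before the right one, each per-level list ends up in left-to-right order.

```cpp
#include <cstddef>
#include <list>
#include <vector>

struct VNode {
    int data;
    VNode* left = nullptr;
    VNode* right = nullptr;
    explicit VNode(int d) : data(d) {}
};

// Pre-order walk that files each node into the list for its level.
void collect(const VNode* n, std::size_t level,
             std::vector<std::list<const VNode*>>& levels) {
    if (!n) return;
    if (levels.size() <= level) levels.resize(level + 1);
    levels[level].push_back(n);
    collect(n->left, level + 1, levels);
    collect(n->right, level + 1, levels);
}

// Concatenate the per-level lists to get the level order.
std::vector<int> levelOrderByPreorder(const VNode* root) {
    std::vector<std::list<const VNode*>> levels;
    collect(root, 0, levels);
    std::vector<int> out;
    for (const auto& lvl : levels)
        for (const VNode* n : lvl) out.push_back(n->data);
    return out;
}
```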

Top 10 Frequencies in a Hash Table with Linked Lists

The code below prints the highest frequency it can find in my hash table (which is an array of linked lists) 10 times. I need my code to print the top 10 frequencies in my hash table. I do not know how to do this (code examples would be great; plain English logic/pseudocode is just as good).
I create a temporary pointer called 'tmp' which points into my hash table 'hashtable'.
A while loop then goes through the list and looks for the highest frequency, which is an int 'tmp->freq'.
The loop keeps copying the highest frequency it finds into the variable 'topfreq' until it reaches the end of the linked lists in the hash table.
My 'node' is a struct consisting of the variables 'freq' (int) and 'word' (128 chars). When the loop has nothing else to search, it prints these two values on screen.
The problem is, I can't wrap my head around figuring out how to find the next lowest number from the number I've just found (and this can include another node with the same freq value, so I have to check that the word is not the same too).
void toptenwords()
{
int topfreq = 0;
int minfreq = 0;
char topword[SIZEOFWORD];
for(int p = 0; p < 10; p++) // We need the top 10 frequencies... so we do this 10 times
{
for(int m = 0; m < HASHTABLESIZE; m++) // Go through the entire hash table
{
node* tmp;
tmp = hashtable[m];
while(tmp != NULL) // Walk through the entire linked list
{
if(tmp->freq > topfreq) // If the frequency on hand is larger than the one found, store...
{
topfreq = tmp->freq;
strcpy(topword, tmp->word);
}
tmp = tmp->next;
}
}
cout << topfreq << "\t" << topword << endl;
}
}
Any and all help would be GREATLY appreciated :)
Keep an array of 11 node pointers and insert each node into it, keeping the array sorted by descending frequency. Only the first 10 entries matter; the eleventh slot is overwritten on each iteration and holds junk.
void toptenwords()
{
node *topwords[11]; // top 10 so far, sorted by descending freq; slot 10 is scratch space
int current_topwords = 0;
for(int m = 0; m < HASHTABLESIZE; m++) // Go through the entire hash table
{
node* tmp;
tmp = hashtable[m];
while(tmp != NULL) // Walk through the entire linked list
{
// Append the node, then bubble it up into its sorted position
topwords[current_topwords] = tmp;
current_topwords++;
for(int i = current_topwords - 1; i > 0; i--)
{
if(topwords[i]->freq > topwords[i - 1]->freq)
{
node *temp = topwords[i - 1];
topwords[i - 1] = topwords[i];
topwords[i] = temp;
}
else break;
}
// Whatever ended up in slot 10 falls off the list
if(current_topwords > 10) current_topwords = 10;
tmp = tmp->next;
}
}
for(int i = 0; i < current_topwords; i++) // Print the top 10, highest first
cout << topwords[i]->freq << "\t" << topwords[i]->word << endl;
}
I would maintain a set of words already used and change the inner-most if condition to test for frequency greater than previous top frequency AND tmp->word not in list of words already used.
When iterating over the hash table (and over each linked list in it), keep a self-balancing binary tree (std::set) as a "result" list. As you come across each frequency, insert it into the set; then, if the set has more than 10 entries, remove the smallest. Store (frequency, word) pairs, or use std::multiset, so that equal frequencies are not collapsed into one entry. When you finish, you'll have a set (a sorted list) of the top ten frequencies, which you can manipulate as you wish.
There may be performance gains to be had by using sets instead of linked lists in the hash table itself, but you can work that out for yourself.
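A minimal sketch of the std::set approach, assuming the hash table's contents have already been flattened into a vector of a hypothetical `WordFreq` struct (the asker's `node` and `hashtable` are not reproduced here). Pairing each frequency with its word keeps ties distinct; `topK` is an illustrative name.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the asker's node.
struct WordFreq {
    std::string word;
    int freq;
};

// Keep only the k largest (freq, word) pairs; std::set stays sorted ascending.
std::vector<std::pair<int, std::string>> topK(const std::vector<WordFreq>& items,
                                              std::size_t k = 10) {
    std::set<std::pair<int, std::string>> best;
    for (const WordFreq& wf : items) {
        best.insert({wf.freq, wf.word});
        if (best.size() > k) best.erase(best.begin());  // drop the smallest
    }
    // Highest frequency first.
    return std::vector<std::pair<int, std::string>>(best.rbegin(), best.rend());
}
```

Each insertion and erase costs O(log k), so the whole pass is O(n log k) for n words.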
Step 1 (Inefficient):
Move the vector into a sorted container via insertion sort, but insert into a container (e.g. a linked list or vector) of size 10, and drop any elements that fall off the bottom of the list.
Step 2 (Efficient):
Same as step 1, but keep track of the size of the item at the bottom of the list, and skip the insertion step entirely if the current item is too small.
Suppose there are n words in total, and we need the most-frequent k words (here, k = 10).
If n is much larger than k, the most efficient way I know of is to maintain a min-heap (i.e. the top element has the minimum frequency of all elements in the heap). On each iteration, you insert the next frequency into the heap, and if the heap now contains k+1 elements, you remove the smallest. This way, the heap is maintained at a size of k elements throughout, containing at any time the k highest-frequency elements seen so far. At the end of processing, read out the k highest-frequency elements in increasing order.
Time complexity: For each of n words, we do two things: insert into a heap of size at most k, and remove the minimum element. Each operation costs O(log k) time, so the entire loop takes O(n log k) time. Finally, we read out the k elements from a heap of size at most k, taking O(k log k) time, for a total time of O((n + k) log k). Since we know that k < n, O(k log k) is at worst O(n log k), so this simplifies to just O(n log k).
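The bounded min-heap described above maps directly onto `std::priority_queue` with `std::greater`, which makes `top()` the smallest kept entry. This sketch assumes the (frequency, word) pairs are already collected in a vector; `Entry` and `topKHeap` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;  // (frequency, word)

std::vector<Entry> topKHeap(const std::vector<Entry>& entries, std::size_t k) {
    // Min-heap: std::greater puts the smallest entry at top().
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (const Entry& e : entries) {
        heap.push(e);                      // O(log k)
        if (heap.size() > k) heap.pop();   // remove the minimum, O(log k)
    }
    std::vector<Entry> out;                // read out in increasing order
    while (!heap.empty()) {
        out.push_back(heap.top());
        heap.pop();
    }
    return out;
}
```

For the top 10 words, call topKHeap(entries, 10) and iterate the result backwards to get the highest frequency first.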
A hash table containing linked lists of words seems like a peculiar data structure to use if the goal is to accumulate word frequencies.
Nonetheless, the efficient way to get the ten highest-frequency nodes is to insert each into a priority queue/heap, such as the Fibonacci heap, which has O(1) amortized insertion time and O(log n) amortized delete-min time. Assuming that iteration over the hash table is fast, this method has a runtime of O(n×O(1) + 10×O(log n)) ≡ O(n).
The absolute fastest way to do this would be to use a soft heap. Using a soft heap, you can find the top 10 items in O(n) time, whereas the sorting-based solutions posted here take O(n lg n) time.
http://en.wikipedia.org/wiki/Soft_heap
This Wikipedia article shows how to find the median in O(n) time using a soft heap, and finding the top 10 is just another instance of the same selection problem. You can then sort the top 10 items if you need them in order; since you're sorting at most 10 items, it's still O(n) time.