Finding the Most Common Words in a Binary Search Tree - c++

I'm writing a function to output the most common word in a binary search tree, but it outputs the word that comes first alphabetically rather than the most common one.
For example:
Input: abc abc abc abc xyz xyz xyz xyz xyz xyz
Output: abc
I really don't know what the issue is. Any help would be greatly appreciated.
void WordAnalyzer::findCommon(TreeNode* root) {
    if (root != NULL) {
        findCommon(root->left);
        if (prev != NULL) {
            if (root->data == prev->data) {
                currCount++;
            }
            else {
                currCount = 1;
            }
        }
        if (currCount > maxCount) {
            maxCount = currCount;
            maxWord = root->data;
        }
        prev = root;
        findCommon(root->right);
    }
}
string WordAnalyzer::getMostCommonWord() {
    findCommon(root);
    return maxWord;
}

It is not clear from the code how and where currCount is initialized, but if it is not initialized explicitly before this code runs, you have undefined behavior.
When you visit the first (leftmost) element in your BST, you set prev = root but you do not set currCount. When you then visit the next element, you may increase currCount by 1, but it was never given an initial value, so it can contain basically any "garbage" value.
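As a rough sketch of what explicit initialization could look like, the traversal state can be kept in a small helper whose members all have initial values. The TreeNode layout and the CommonWordFinder name are assumptions (the question does not show its class definitions), and the sketch only works if repeated words are stored as separate nodes in the tree.

```cpp
#include <string>

// Hypothetical node type; the question does not show its definition.
struct TreeNode {
    std::string data;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
};

// Hypothetical helper: every counter has an explicit initial value.
struct CommonWordFinder {
    const TreeNode* prev = nullptr;  // previously visited node
    int currCount = 0;               // length of the current run of equal words
    int maxCount = 0;                // longest run seen so far
    std::string maxWord;             // word with the longest run

    // In-order traversal: equal words are visited consecutively.
    void visit(const TreeNode* node) {
        if (node == nullptr) return;
        visit(node->left);
        if (prev != nullptr && prev->data == node->data) {
            ++currCount;             // same word as the previous node
        } else {
            currCount = 1;           // first node, or a new word starts
        }
        if (currCount > maxCount) {
            maxCount = currCount;
            maxWord = node->data;
        }
        prev = node;
        visit(node->right);
    }
};
```

Creating a fresh CommonWordFinder for each query also avoids stale values if the most common word is requested more than once.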

Just create a map with the word as key and its count as value. Traverse the BST using any traversal (such as in-order) and increment the count for the corresponding word in the map. When the whole traversal is over, find the word with the highest count in the map (for example, in your main function).
Here is a quick reference if you are not familiar with the map data structure.
http://www.cplusplus.com/reference/map/map/insert/
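A minimal sketch of the map-based idea, assuming a simple TreeNode with a string payload (the node layout and the function names countWords and mostCommon are made up for illustration):

```cpp
#include <map>
#include <string>

// Hypothetical node type; the question does not show its definition.
struct TreeNode {
    std::string data;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
};

// In-order traversal that tallies every word it visits.
void countWords(const TreeNode* node, std::map<std::string, int>& counts) {
    if (node == nullptr) return;
    countWords(node->left, counts);
    ++counts[node->data];  // operator[] creates the entry with 0 if missing
    countWords(node->right, counts);
}

// Returns the word with the highest count, or an empty string for an empty map.
std::string mostCommon(const std::map<std::string, int>& counts) {
    std::string best;
    int bestCount = 0;
    for (const auto& entry : counts) {
        if (entry.second > bestCount) {
            bestCount = entry.second;
            best = entry.first;
        }
    }
    return best;
}
```

This does not depend on equal words being adjacent in the traversal, at the cost of extra memory for the map.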

Related

Construct binary tree from s-expression in c++

empty tree ::= ()
tree ::= empty tree | (w tree tree)
ex:
()
empty tree
(99(5()())(35(-5()())()))
    99
   /  \
  5    35
      /
    -5
class Node
{
public:
int weight; // weight can be negative!
Node *left, *right;
Node():weight(0),left(NULL),right(NULL){}
Node(int d):weight(d),left(NULL),right(NULL){}
};
Construct a binary tree from the given condition.
I have a problem constructing it: my program crashes and I have no idea why. Below is my code, with some debug output. Taking (99(5()())(35(-5()())())) as a test case, it prints 99(5( and then crashes. I think the problem may be where I handle ) and return a node that is NULL, but I can't find it. By the way, each tree is expected to have HUNDREDS of nodes, and each test case contains up to TEN-THOUSAND trees. Will this program run out of time, and if so, what should I do? Thanks for your time.
Node* MyBinaryTreeOps::constructTree(Node *root, std::string treeStr)const
{
int idex = 1;//always look at the treeStr[1]
Node *cur=NULL;//use to pass in recursive call
if(treeStr[idex]!='('&&treeStr[idex]!=')'){//meet number create new node
stringstream ss;
while(treeStr[idex]!='('){
ss<<treeStr[idex];
if(treeStr.size()>1){//if size > 1 then remove the treeStr[1],to let treeStr[1] become next char in treeStr
treeStr.erase(1,1);
}
}
int num=0;
ss>>num;
std::cout<<num<<std::endl;//print out just for debug
std::cout<<treeStr[idex]<<std::endl;//print out just for debug
root = new Node(num);
}
if(treeStr[idex]==')'){//meet ')' return subtree constructed
if(treeStr.size()>1){
treeStr.erase(1,1);
}
return root;
}
if(treeStr[idex]=='('){//meet first '(' then construct left subtree
if(treeStr.size()>1){
treeStr.erase(1,1);
}
root->left = constructTree(cur,treeStr);
}
if(treeStr[idex]=='('){ //meet second '(' then construct right subtree
if(treeStr.size()>1){
treeStr.erase(1,1);
}
root->right = constructTree(cur,treeStr);
}
if(treeStr[idex]==')'){ //meet ')' return subtree constructed
if(treeStr.size()>1){
treeStr.erase(1,1);
}
return root;
}
}
I tried this problem myself, and this is the function I wrote.
Steps of the algorithm:
1. Find the part of the sequence that represents the weight of the current node. Convert it to int and assign it to the node.
2. Slice the string to remove the weight and the starting and ending braces.
3. Iterate over the sequence to find the point between two braces that divides the child nodes.
4. Split the children string into two sequences (we can slice the starting string and reuse it as the sequence of one of the children).
5. If a child node has a weight (the length of its sequence is larger than 2), create a new node and recurse.
Here is the construction function from my program:
Node* constructTree(Node* root, std::string& treeString) {
    // Find the weight of this node.
    auto weightLeft = treeString.find_first_of("(") + 1;
    auto weightRight = treeString.find_first_of("()", weightLeft);
    auto weightString = treeString.substr(weightLeft, weightRight - weightLeft);
    // Optional: check whether there is any weight; if there is none, keep the
    // zero weight set by the constructor.
    // Works for something like this: ((1)(2)) -> (0(1)(2))
    if (weightString.length() > 0) {
        root->weight = std::stoi(weightString);
    }
    // Slice the string so it contains only the children sequences.
    treeString.erase(0, weightRight);
    treeString.erase(treeString.length() - 1, 1);
    // Look for the index where the left child ends and the right child starts.
    // This point is where the counts of left braces and right braces
    // are the same and the counts are not zero.
    int splitPoint = -1;
    int leftBraces = 0, rightBraces = 0;
    for (std::size_t index = 0; index < treeString.length(); index++) {
        char c = treeString[index];
        if (c == '(') {
            ++leftBraces;
        }
        if (c == ')') {
            ++rightBraces;
        }
        if (leftBraces > 0 && leftBraces == rightBraces) {
            splitPoint = static_cast<int>(index) + 1;
            break;
        }
    }
    // If a split point has been found, this node has children.
    if (splitPoint != -1) {
        auto leftChildString = treeString.substr(0, splitPoint);
        auto rightChildString = treeString.erase(0, splitPoint);
        // Check the length so construction stops if there is no child.
        if (leftChildString.length() > 2) {
            root->left = new Node();
            constructTree(root->left, leftChildString);
        }
        if (rightChildString.length() > 2) {
            root->right = new Node();
            constructTree(root->right, rightChildString);
        }
    }
    return root;
}
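Regarding the running-time concern in the question: both versions above erase from or copy the string repeatedly, which costs extra work per node. One possible alternative, shown here only as a sketch, is a single-pass recursive-descent parser that walks the string once with an index. The function name parseTree and the error handling are assumptions, and Node is a simplified stand-in for the class in the question.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Simplified stand-in for the Node class from the question.
struct Node {
    int weight;
    Node* left;
    Node* right;
    explicit Node(int w) : weight(w), left(nullptr), right(nullptr) {}
};

// Parses one tree starting at s[pos] and advances pos past it.
// Returns nullptr for the empty tree "()".
// Simplification: nodes already allocated are not freed if parsing throws.
Node* parseTree(const std::string& s, std::size_t& pos) {
    if (pos >= s.size() || s[pos] != '(') {
        throw std::invalid_argument("expected '('");
    }
    ++pos;  // consume '('
    if (pos < s.size() && s[pos] == ')') {
        ++pos;  // consume ')' of the empty tree
        return nullptr;
    }
    // Read the weight: an optional minus sign followed by digits.
    std::size_t start = pos;
    if (pos < s.size() && s[pos] == '-') ++pos;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
    // std::stoi throws std::invalid_argument if no number was read.
    Node* node = new Node(std::stoi(s.substr(start, pos - start)));
    node->left = parseTree(s, pos);   // grammar: (w tree tree)
    node->right = parseTree(s, pos);
    if (pos >= s.size() || s[pos] != ')') {
        throw std::invalid_argument("expected ')'");
    }
    ++pos;  // consume ')'
    return node;
}
```

Each character is examined once, so the work per tree is linear in the length of its string. A caller would start with pos set to 0 and can check afterwards that pos equals the string length.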

How to find a certain element of BST, given key?

So I'm trying to implement a TREE_SUCCESSOR(X) function for BST where X is the key of the node that I'm trying to find the successor of. So far I have this:
int BinarySearchTree::TREE_SUCCESSOR(node* x)
{
//i dont need a new node, I just need a pointer/reference to x.
node* y = NULL;
//node* parent = NULL;
if (x->right != NULL)
{
return FIND_MIN(x->right);
}
else
{
y = x->parent;
while (y != NULL && x == y->right)
{
x = y;
y = y->parent;
}
return y->key;
}
}
My problem is in the main function:
int main()
{
BinarySearchTree bst;
int num = 0;
cout << "Enter number you want to find the successor of: " <<endl;
cin >> num;
if(BST.root->key == num) //if i'm trying to find the successor of the root
{ TREE_SUCCESSOR(BST.root); }
else
{
while(BST.root->key != num) //if the user input does not equal the root key value
{
????
}
}
I want to find out how to traverse the BST until I reach the node whose key equals num. For example, if the tree had nodes 3, 4, 5, then TREE_SUCCESSOR(4) should return 5. How would I do this?
EDIT
So I decided to use a TREE_SEARCH(key) function that finds the node with a certain key and returns it, and then pass that node into TREE_SUCCESSOR(X).
Do an in-order traversal.
After finding the element continue the traversal, the next element is the one you need.
You don't need a special case for finding the successor of the root, but you do need to handle the case where the element is the last one in the traversal, i.e. the largest one.
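Alternatively, the search-then-successor approach from the question's edit could look roughly like the sketch below. The node layout and the function names treeSearch, treeMinimum and treeSuccessor are assumptions based on the code in the question. Returning a pointer instead of a key lets the caller detect the "largest element" case as a null result.

```cpp
#include <cstddef>

// Assumed node layout, modeled on the fields used in the question.
struct node {
    int key;
    node* left;
    node* right;
    node* parent;
};

// Walks down from the root; returns nullptr if the key is not in the tree.
node* treeSearch(node* root, int key) {
    node* cur = root;
    while (cur != nullptr && cur->key != key) {
        cur = (key < cur->key) ? cur->left : cur->right;
    }
    return cur;
}

// Leftmost node of the subtree rooted at x (x must not be null).
node* treeMinimum(node* x) {
    while (x->left != nullptr) x = x->left;
    return x;
}

// Returns the in-order successor of x, or nullptr if x holds the largest key.
node* treeSuccessor(node* x) {
    if (x->right != nullptr) return treeMinimum(x->right);
    node* y = x->parent;
    while (y != nullptr && x == y->right) {
        x = y;
        y = y->parent;
    }
    return y;
}
```

The caller should check the result of treeSearch for null before passing it to treeSuccessor, and check the successor for null before reading its key.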
My first approach would be to search for examples on the internet "binary search tree successor".
But if I have a big enough ego, I may want to develop my own algorithm. I would draw a binary search tree. Next I would pick a node and figure out the steps to get to the successor. After I have the steps, I would go through them using different nodes on the tree and adjust the algorithm (steps) as necessary.
After I had the algorithm, I would code it up.
But you're not me, so you would want to search the internet for "c++ binary search tree successor function".

Understanding the logic to extract frequency from a binary file to create huffman tree

I have to calculate symbol frequencies from a binary file.
What I have in mind is to read the characters present in the file and then calculate the frequency as the number of times each character repeats.
I do so using this code, and it works fine:
struct Node
{
unsigned char symbol;
int appear;
struct Node *link;
struct Node * left,*right;
};
Node * head;
Somewhere in main I have this to read the file:
ch = fgetc(fp);
while (fread(&ch,sizeof(ch),1,fp))
{
add_symbol(ch);
}
fclose(fp);
where the add_symbol function is shown below. I am not able to understand the logic of this code. Could anyone please explain the questions I have asked?
void add_symbol(unsigned char sym)
{
    Node *pt, *prev, *temp;
    int is_there = 0;
    pt = prev = head;
    while (pt != NULL)
    {
        if (pt->symbol == sym)
        {
            pt->appear++;
            is_there = 1;
            break;
        }
        // prev trails pt so a new node can be linked after the last node
        prev = pt;
        pt = pt->link;
    }
    if (!is_there)
    {
        // printf("\n is_there2 : %d\n",!is_there);
        printf("sym2 : %d\n", sym);
        temp = (Node *) malloc(sizeof(Node));
        temp->symbol = sym;
        temp->appear = 1;
        temp->left = NULL;
        temp->right = NULL;
        temp->link = NULL;
        if (head == NULL)
        {
            head = temp;
        }
        else
        {
            prev->link = temp;
        }
    }
}
To find the frequency, we first need to store all the data somewhere.
(1) Where is that done?
(2) Do we need to compare each symbol to see whether it has appeared before?
(3) Please explain the code a bit more. The logic is the same in C and C++, so either language is fine.
My doubt, by way of example:
Suppose 1 2 1 3 3 1 2 are the symbols in the binary file.
On the first execution of add_symbol we call add_symbol(1); now we store the "1" so that we know whether any other "1" comes later?
So we check pt->symbol, and if it equals "1" again we increase the frequency by one.
But on the second execution we call add_symbol(2), which is not equal to "1", so we repeat.
On the third execution I get add_symbol(1); this time the "1" equals the "1" stored previously, so the frequency increases by 1.
What about the previous "2"? We read the file only once, by doing
while (fread(&ch,sizeof(ch),1,fp))
{
add_symbol(ch);
}
and if the "2" has already passed, we will not be able to count it. How does this code keep this "2" and also find its frequency? Please do not hesitate to ask if you still don't understand my question.
The code does not store all the data, it only stores the symbols and counts in a linked list.
The code reads one symbol at a time, calling add_symbol() for each. The add_symbol function starts by looking up the symbol in its linked list. If the symbol is there, the function will just increment its count; otherwise, it will add the symbol to the tail of the list, and with a count of 1.
Edit: By request, here's how it would look if it were more decomposed:
void Huffman::add_symbol(unsigned char sym)
{
Node * foundNode = find_node_in_linked_list(sym);
if(foundNode != NULL)
foundNode->freq++;
else
add_freq1_node_at_end_of_list(sym);
}
Node* Huffman::find_node_in_linked_list(unsigned char sym)
{
Node* pCur = Start;
while(pCur != NULL)
{
if(pCur->symbol == sym)
return pCur;
pCur = pCur->next;
}
return NULL;
}
void Huffman::add_freq1_node_at_end_of_list(unsigned char sym)
{
//Get tail of list
Node* pTail = NULL;
Node* pCur = Start;
while(pCur != NULL)
{
pTail = pCur;
pCur = pCur->next;
}
//Now, pTail is either the last element, or NULL if the list is empty.
//Create the new object
//(should use the new keyword instead, but since the deletion code was not posted...)
Node* pNew = static_cast< Node* >(malloc(sizeof *pNew));
if(pNew == NULL)
return;
pNew->symbol = sym;
pNew->freq = 1;
pNew->left = NULL;
pNew->right = NULL;
pNew->next = NULL;
pNew->is_processed = 0;
//Add the new node at the tail
if(pTail != NULL)
pTail->next = pNew;
else
Start = pNew;
}
Note that it's less efficient than the big function because it goes through the list twice when the symbol is not found (once to try and find the symbol, once to find the tail).
In fact, there's no reason to specifically add at the tail rather than insert at the head.
Quite frankly, a linked list is not the most time-efficient way of storing the counts for up to 256 symbols. Personally, I'd recommend using a lookup table instead (a plain vector of 256 structures, or even a dedicated histogram object that would just be a vector of 256 integers).
Some advice on your general design:
Step #1: In order to count the symbols, you can use a simple histogram:
#include <limits.h>
int histogram[1<<CHAR_BIT] = {0};
unsigned char ch;
while (fread(&ch,sizeof(ch),1,fp))
histogram[ch]++;
Step #2: Now you need to use the histogram in order to build a Huffman tree:
1. Create an array of Node pointers, one for each entry in histogram with a value greater than 0.
2. Take this array and build a binary heap with the minimal value at the top.
3. Run the following algorithm until there is one element left in the heap:
   - Extract the first two Node elements from the heap.
   - Create a new Node whose children are these two Node elements and whose count is the sum of their counts.
   - Insert the new Node back into the heap.
Step #3: Now that you have a Huffman tree, please note the following:
In order to encode the file, you need to use the leaves of the tree (given in the array of Node pointers created at the beginning of the previous step).
In order to decode the file, you need to use the root of the tree (which is the last element left in the heap at the end of the previous step).
You can see a full example at:
http://planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=9737&lngWId=3.
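As an illustration of Step #2, here is a minimal sketch that uses std::priority_queue as the binary heap. The names HuffNode, HigherFreq and buildHuffmanTree are made up for this example, and the histogram is assumed to be a vector with one count per possible byte value. Cleanup of the allocated nodes is left out.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical tree node for this sketch.
struct HuffNode {
    unsigned char symbol;  // meaningful only for leaves
    long freq;             // count of the symbol, or sum of the children
    HuffNode* left;
    HuffNode* right;
};

// Orders the heap so that the node with the smallest count is on top.
struct HigherFreq {
    bool operator()(const HuffNode* a, const HuffNode* b) const {
        return a->freq > b->freq;
    }
};

// Builds a Huffman tree from a histogram indexed by byte value.
// Returns nullptr if every count is zero.
HuffNode* buildHuffmanTree(const std::vector<long>& histogram) {
    std::priority_queue<HuffNode*, std::vector<HuffNode*>, HigherFreq> heap;
    for (std::size_t i = 0; i < histogram.size(); ++i) {
        if (histogram[i] > 0) {
            heap.push(new HuffNode{static_cast<unsigned char>(i),
                                   histogram[i], nullptr, nullptr});
        }
    }
    if (heap.empty()) return nullptr;
    while (heap.size() > 1) {
        HuffNode* a = heap.top(); heap.pop();  // smallest count
        HuffNode* b = heap.top(); heap.pop();  // next smallest count
        heap.push(new HuffNode{0, a->freq + b->freq, a, b});
    }
    return heap.top();  // root of the tree
}
```

For the symbols 1 2 1 3 3 1 2 from the question, the counts are 3, 2 and 2, so the root ends up with a count of 7.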

Print binary tree nodes

I'm programming a BinaryTree project. I finished all (insert, delete, create, find) but one function, the printing operation. I'm supposed to print it like this:
5
46
X557
XXX6XXX9
Basically print all the nodes, but print an X if the node is empty. I've been trying to figure out how to do this and I keep hitting a dead end. Would this be something like inorder-traversal?? Thank you
Use a Level-Order traversal (Breadth First Search) printing each node as you go through a level, with a newline at the end of each level.
You can find BFS pseudo-code here
You can use BFS but with a slight modification:
In simple BFS, after visiting a node you add its children to the queue. If no children,
nothing is added.
For your problem, if there are no children for a node that is visited, add a special node to the queue with its value as "x" so that it will print the "X" in your output correspondingly. Print a newline after each level.
As Dream Lane said, BFS would work here. Here is my own Java implementation for reference.
public static void printBST(Node root) {
// empty tree
if (root == null)
return;
Queue<Node> que = new LinkedList<Node>();
que.add(root);
boolean allChildNull = false;// end condition
while (que.size() > 0 && !allChildNull) {
allChildNull = true;
Queue<Node> childQue = new LinkedList<Node>();
for (Node n : que) {
// print out node value, X for null
if (n == null)
System.out.printf("%1$s", "X");
else
System.out.printf("%1$s", n.value);
// add next level child nodes
if (n == null) {
childQue.add(null);
childQue.add(null);
} else {
childQue.add(n.left);
childQue.add(n.right);
if (n.left != null || n.right != null)
allChildNull = false;
}
}
System.out.printf("\n");// newline
que = childQue;
}
}

Tree search function

Any node can have any number of children. To search this tree i wrote something like this
function Search(key, nodes){
    for (var i = 0; i < nodes.length; i++) {
        if (nodes[i].key == key) {
            return nodes[i];
        }
        if (nodes[i].hasOwnProperty('children')) {
            return this.Search(key, nodes[i].children);
        }
    }
}
which doesn't quite work...any input?
You only recursively search the first node that has children.
You should rewrite that last conditional to something like this:
if (nodes[i].hasOwnProperty('children')) {
var node = this.Search(key, nodes[i].children);
if(node != null)
return node;
}
You also need to add a case for if the node is not found - for example, a return null at the very bottom of the function.
You seem to be missing a base case. What happens when you encounter a node that has no children and also is not the node you're looking for?
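If this is JavaScript, the this in this.Search may be what's giving you the problem. Inside a plain function call, this refers to the global object (or is undefined in strict mode), not to the function itself, so this.Search may not resolve to your function. Try replacing this.Search with just Search.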