Construct binary tree from s-expression in C++

empty tree ::= ()
tree ::= empty tree | (w tree tree)
Examples:
()
empty tree
(99(5()())(35(-5()())()))
      99
     /  \
    5    35
        /
      -5
class Node
{
public:
int weight; // weight can be negative!
Node *left, *right;
Node():weight(0),left(NULL),right(NULL){}
Node(int d):weight(d),left(NULL),right(NULL){}
};
Task: construct a binary tree from a string that follows the grammar above.
I have a problem constructing it: my program crashes and I have no idea why. My code is below, with some output printed for debugging. With (99(5()())(35(-5()())())) as the test case, it prints 99(5( and then crashes. I suspect the problem is in how I handle ), where I return a node that is NULL, but I can't find it. Also, each tree is expected to have hundreds of nodes, and each test case contains up to ten thousand trees. Will this program run out of time, and if so, what should I do instead? Thanks for your time.
Node* MyBinaryTreeOps::constructTree(Node *root, std::string treeStr)const
{
int idex = 1;//always look at the treeStr[1]
Node *cur=NULL;//use to pass in recursive call
if(treeStr[idex]!='('&&treeStr[idex]!=')'){//meet number create new node
stringstream ss;
while(treeStr[idex]!='('){
ss<<treeStr[idex];
if(treeStr.size()>1){//if size > 1 then remove the treeStr[1],to let treeStr[1] become next char in treeStr
treeStr.erase(1,1);
}
}
int num=0;
ss>>num;
std::cout<<num<<std::endl;//print out just for debug
std::cout<<treeStr[idex]<<std::endl;//print out just for debug
root = new Node(num);
}
if(treeStr[idex]==')'){//meet ')' return subtree constructed
if(treeStr.size()>1){
treeStr.erase(1,1);
}
return root;
}
if(treeStr[idex]=='('){//meet first '(' then construct left subtree
if(treeStr.size()>1){
treeStr.erase(1,1);
}
root->left = constructTree(cur,treeStr);
}
if(treeStr[idex]=='('){ //meet second '(' then construct right subtree
if(treeStr.size()>1){
treeStr.erase(1,1);
}
root->right = constructTree(cur,treeStr);
}
if(treeStr[idex]==')'){ //meet ')' return subtree constructed
if(treeStr.size()>1){
treeStr.erase(1,1);
}
return root;
}
}

I tried this problem myself, and this is the function I wrote.
Steps of the algorithm:
Find the part of the sequence that represents the weight of the current node, convert it to int, and assign it to the node.
Slice the string to remove the weight and the opening and closing braces.
Iterate over the remaining sequence to find the point between two braces that separates the two children.
Split the children string into two sequences (the original string can be sliced and reused as the sequence of one of the children).
If a child has a weight (its sequence is longer than 2 characters), create a new node and recurse.
Here is the function from my program (which also has some test examples and a slightly extended Node class):
Node* constructTree(Node* root, std::string& treeString) {
// Find the weight of this node.
auto weightLeft = treeString.find_first_of("(") + 1;
auto weightRight = treeString.find_first_of("()", weightLeft);
auto weightString = treeString.substr(weightLeft, weightRight - weightLeft);
// Optional, we check if there is any weight, if there is not we leave zero
// weight from constructor.
// Works for something like that: ((1)(2)) -> (0(1)(2))
if (weightString.length() > 0) {
root->weight = std::stoi(weightString);
}
// Slice string to contain only children sequences.
treeString.erase(0, weightRight);
treeString.erase(treeString.length() - 1, 1);
// Look for the index where the left child ends and the right child starts.
// That is the first index at which the counts of left and right braces
// are equal (and non-zero).
int splitPoint = -1;
int leftBraces = 0, rightBraces = 0;
for (int index = 0; index < treeString.length(); index++) {
char c = treeString[index];
if (c == '(') {
++leftBraces;
}
if (c == ')') {
++rightBraces;
}
if (leftBraces == rightBraces) {
splitPoint = index + 1;
break;
}
}
// If split point has been found then it means that this node has children.
if (splitPoint != -1) {
auto leftChildString = treeString.substr(0, splitPoint);
auto rightChildString = treeString.erase(0, splitPoint);
// Check for length so construct will stop if there is no child.
if (leftChildString.length() > 2) {
root->left = new Node();
constructTree(root->left, leftChildString);
}
if (rightChildString.length() > 2) {
root->right = new Node();
constructTree(root->right, rightChildString);
}
}
return root;
}
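On the performance question: both versions above repeatedly erase from or copy the string, and each of those operations is linear in the string length. Note also that in the question's version treeStr is passed by value, so characters erased inside a recursive call are not erased in the caller's copy. A possible alternative, given here only as a minimal sketch, is a single-pass recursive descent parser that reads the string through an index and never modifies it. The names parseTree, buildTree and destroyTree are made up for this example, the Node struct is a simplified stand-in for the class in the question, and the input is assumed to be well-formed according to the grammar.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Simplified stand-in for the Node class in the question.
struct Node {
    int weight;
    Node *left, *right;
    explicit Node(int w) : weight(w), left(nullptr), right(nullptr) {}
};

// Parses one tree starting at s[pos] and advances pos past it.
// Assumes s is well-formed: tree ::= () | (w tree tree).
Node* parseTree(const std::string& s, std::size_t& pos) {
    assert(pos < s.size() && s[pos] == '(');
    ++pos;                                  // consume '('
    if (s[pos] == ')') {                    // "()" is the empty tree
        ++pos;
        return nullptr;
    }
    // Read an optionally negative integer weight.
    std::size_t start = pos;
    if (s[pos] == '-') ++pos;
    while (std::isdigit(static_cast<unsigned char>(s[pos]))) ++pos;
    Node* node = new Node(std::stoi(s.substr(start, pos - start)));
    node->left = parseTree(s, pos);         // left subtree follows the weight
    node->right = parseTree(s, pos);        // right subtree follows the left one
    ++pos;                                  // consume ')'
    return node;
}

// Convenience wrapper: parses a whole tree string.
Node* buildTree(const std::string& s) {
    std::size_t pos = 0;
    return parseTree(s, pos);
}

// Frees every node of the tree.
void destroyTree(Node* n) {
    if (!n) return;
    destroyTree(n->left);
    destroyTree(n->right);
    delete n;
}
```

Every character is visited once, so the work per tree is linear in the length of its string, which should be comfortable for ten thousand trees of a few hundred nodes each.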

Related

Recursive function to calculate time complexity

I'm learning to calculate time complexity. I can handle simple cases, but I wonder how to calculate the time complexity of a recursive function. For an ordinary function it is clear: I count the comparisons, swaps, and so on. For a recursive function I don't know how to count all the cases that can occur, because when I write one I rely on "thinking recursively", which assumes the function already works, so I always get confused when calculating its running time. Here is an example function that I wrote:
#define MAX 26 // represent 26 letter in alphabet
//--------- TRIE STRUCTURE-----------
struct Trie {
Trie* m_arr[MAX]; // leaves
bool m_isend; // check that leaf is the end of a word
string m_path; // store path from root to that leaf
Trie() { //constructor
m_path = ""; // path <empty>
for (int i = 0; i < MAX; i++) { // initialize leaves <NULL>
m_arr[i] = NULL;
}
m_isend = false; // check the end of word <false>
}
};
//-------------------
//-------INIT NODE---------
Trie* getNode() { // Dynamic a node
return(new Trie);
}
//-------------------
//----------INSERT WORD-------------
void insertWord(Trie* root, string str, string path = "") { // insert a word into trie (root<trie>, str<string want to insert>, path<store path of parent>)
if (str.length() == 0) { // if data is inputed has length equal to 0 -> return
return;
}
if (!root) { // if the trie that we want to add new word is empty, we will init it with NULL leaves
root = getNode();
}
if (!root->m_arr[str[0] - 'a']) { // the leaf that we want to jump to is NULL we have to init it
root->m_arr[str[0] - 'a'] = getNode();
}
if (root->m_arr[str[0] - 'a']->m_path.length() == 0) { // the leaf that we want to jump to has empty path -
// we have to add its parent's path to its path
root->m_arr[str[0] - 'a']->m_path += path;
root->m_arr[str[0] - 'a']->m_path.push_back(str[0]);
}
if (str.length() > 1) { // if the word has more than one character left, continue with the corresponding child
insertWord(root->m_arr[str[0] - 'a'], str.substr(1), root->m_arr[str[0] - 'a']->m_path);
}
else { //else finish process by turn flag (end of a word)
root->m_arr[str[0] - 'a']->m_isend = true;
}
}
//---------------
I hope you can explain how to calculate the time complexity of the function above.
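One common way to approach this, given here as a sketch rather than a definitive analysis, is to write down the cost of a single call and sum it over all recursive calls. For a word of length L, insertWord calls itself once per character, so there are L calls. In the call at depth d, str.substr(1) copies about L - d characters and passing path by value copies about d characters; everything else (the checks and initializing a 26-slot node) is bounded by a constant. The constants c_0, c_1 and c_2 below are placeholders, and the sum assumes that copying a std::string takes time linear in its length.

```latex
T(L) \;=\; \sum_{d=0}^{L-1} \bigl( c_0 + c_1 (L - d) + c_2\, d \bigr) \;=\; O(L^2)
```

Under the same assumption, passing the word by const reference together with an index, instead of calling substr and copying path, would leave only the constant term per call and give O(L).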

Adding a node to a complete tree

I'm trying to make complete tree from scratch in C++:
1st node = root
2nd node = root->left
3rd node = root->right
4th node = root->left->left
5th node = root->left->right
6th node = root->right->left
7th node = root->right->right
where the tree would look something like this:
NODE
/ \
NODE NODE
/ \ / \
NODE NODE NODE NODE
/
NEXT NODE HERE
How would I go about detecting where the next node would go so that I can just use one function to add new nodes? For instance, the 8th node would be placed at root->left->left->left
The goal is to fit 100 nodes into the tree with a simple for loop with insert(Node *newnode) in it rather than doing one at a time. It would turn into something ugly like:
100th node = root->right->left->left->right->left->left
Use a queue data structure to accomplish building a complete binary tree. STL provides std::queue.
Example code, where the function would be used in a loop as you request. I assume that the queue is already created (i.e. memory is allocated for it):
// Pass double pointer for root, to preserve changes
void insert(struct node **root, int data, std::queue<node*>& q)
{
// New 'data' node
struct node *tmp = createNode(data);
// Empty tree, initialize it with 'tmp'
if (!*root)
*root = tmp;
else
{
// Get the front node of the queue.
struct node* front = q.front();
// If the left child of this front node doesn’t exist, set the
// left child as the new node.
if (!front->left)
front->left = tmp;
// If the right child of this front node doesn’t exist, set the
// right child as the new node.
else if (!front->right)
front->right = tmp;
// If the front node has both the left child and right child, pop it.
if (front && front->left && front->right)
q.pop();
}
// Enqueue() the new node for later insertions
q.push(tmp);
}
Suppose root is node#1, root's children are node#2 and node#3, and so on. Then the path to node#k can be found with the following algorithm:
Represent k as a binary value, k = { k_{n-1}, ..., k_0 }, where each k_i is 1 bit, i = {n-1} ... 0.
It takes n-1 steps to move from root to node#k, directed by the values of k_{n-2}, ..., k_0, where
if k_i = 0 then go left
if k_i = 1 then go right
For example, to insert node#11 (binary 1011) in a complete tree, you would insert it as root->left->right->right (as directed by 011 of the binary 1011).
Using the algorithm above, it should be straightforward to write a function that, given any k, inserts node#k at the right location in a complete tree. The nodes don't even need to be inserted in order, as long as new nodes are created and attached properly (i.e. as the correct left or right children, respectively).
Assuming the tree is always complete, we can use the following recursion. It does not give the best performance, but it is easy to understand:
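The bit-directed walk described above could be sketched as follows. This is only an illustration: insertAt and the minimal Node struct are names invented for the example, and the function assumes that the parent of node#k already exists in the tree.

```cpp
#include <cassert>

// Minimal node type assumed for this sketch.
struct Node {
    int data;
    Node *left, *right;
    explicit Node(int d) : data(d), left(nullptr), right(nullptr) {}
};

// Attaches newNode as node #k (1-based, level order) of the tree at root.
// Assumes the parent of node #k is already present.
void insertAt(Node*& root, unsigned k, Node* newNode) {
    assert(k >= 1);
    if (k == 1) {               // node #1 is the root itself
        root = newNode;
        return;
    }
    // Find the highest set bit of k; the bits below it encode the path.
    unsigned mask = 1;
    while (mask <= k / 2) mask <<= 1;
    Node* cur = root;
    // Follow every path bit except the lowest one to reach the parent:
    // 0 means go left, 1 means go right.
    for (mask >>= 1; mask > 1; mask >>= 1) {
        cur = (k & mask) ? cur->right : cur->left;
    }
    // The lowest bit selects which child slot receives the new node.
    if (k & 1) cur->right = newNode;
    else       cur->left = newNode;
}
```

Called in a loop for k = 1 to 100, this fills the tree level by level, and node#11 ends up at root->left->right->right as in the example above.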
Node* root;
Node*& getPtr(int index){
if(index==0){
return root;
}
if(index%2==1){
return (getPtr( (index-1)/2))->left;
}
else{
return (getPtr( (index-2)/2))->right;
}
}
and then you use it like
for(int i = 0; i<100; ++i){
getPtr(i) = new Node( generatevalue(i) );
}
Node* addRecursive(Node* current, int value) {
    if (current == nullptr) {
        return new Node(value);
    }
    if (value < current->value) {
        current->left = addRecursive(current->left, value);
    } else if (value > current->value) {
        current->right = addRecursive(current->right, value);
    }
    // if the value already exists, the tree is left unchanged
    return current;
}
I don't know whether your Node has a value member, but with this code you get a sorted binary tree, starting from the root.
If the new node's value is lower than the current node's, we go to the left child. If the new node's value is greater than the current node's, we go to the right child. When the current node is null, we have moved past a leaf and can insert the new node at that position.

Build minimum height BST from a sorted std::list<float> with C++

I'm having trouble writing the code to build minimum height BST from a sorted std::list.
For the node class:
class cBTNode
{
private:
cBTNode* m_LeftChild;
cBTNode* m_RightChild;
float m_Data;
};
For the BST class:
class cBTNodeTree
{
private:
cBTNode* m_Root;
public:
void LoadBalancedMain(std::list<float>& ls);
void LoadBalanced(std::list<float>& ls, cBTNode* root);
};
Implementation: (basically my method is to find the middle element of the list ls, put that into the root, put all the elements smaller than the middle element into ls_left, and all the elements bigger than it into ls_right. Then recursively build up the left and right subtree by recursively calling the same function on ls_left and ls_right)
void cBTNodeTree::LoadBalancedMain(std::list<float>& ls)
{
LoadBalanced(ls, m_Root); // m_Root is the root of the tree
}
void cBTNodeTree::LoadBalanced(std::list<float>& ls, cBTNode* root)
{
// Stopping Condition I:
if (ls.size() <= 0)
{
root = nullptr;
return;
}
// Stopping Condition II:
if (ls.size() == 1)
{
root = new cBTNode(ls.front());
return;
}
// When we have at least 2 elements in the list
// Step 1: Locate the middle element
if (ls.size() % 2 == 0)
{
// Only consider the case of even numbers for the moment
int middle = ls.size() / 2;
std::list<float> ls_left;
std::list<float> ls_right;
int index = 0;
// Obtain ls_left consisting elements smaller than the middle one
while (index < middle)
{
ls_left.push_back(ls.front());
ls.pop_front();
index += 1;
}
// Now we reach the middle element
root = new cBTNode(ls.front());
ls.pop_front();
// The rest is actually ls_right
while (ls.size() > 0)
{
ls_right.push_back(ls.front());
ls.pop_front();
}
// Now we have the root and two lists
cBTNode* left = root->GetLeftChild();
cBTNode* right = root->GetRightChild();
if (ls_left.size() > 0)
{
LoadBalanced(ls_left, left);
root->SetLeftChild(left);
}
else
{
left = nullptr;
}
if (ls_right.size() > 0)
{
LoadBalanced(ls_right, right);
root->SetRightChild(left);
}
else
{
right = nullptr;
}
}
}
My question: I found that none of the elements is actually inserted into the tree. For example, if I check m_Root, the root of the tree, I get an error because it is still nullptr. I'm not sure where I went wrong. I hope it's some stupid pointer mistake because I haven't slept well. (I'm pretty sure the 'new cBTNode(ls.front())' line works.)
BTW, although I have written a dozen functions for the BST, I'm still struggling with BST recursion. I noticed that in all the textbooks I have read, insertion in the linked version of a BST ALWAYS needs a helper function that returns a pointer to a node. I am beginning to feel that I don't actually understand what is going on behind the recursion...
1:
void cBTNodeTree::LoadBalanced(std::list<float>& ls, cBTNode* root)
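Here cBTNode* root is passed by value, so assigning to root inside the function changes only a local copy; the caller's pointer (for example m_Root) is never updated.
Instead, you should pass it by reference (cBTNode*&) or as a pointer to a pointer (cBTNode**).
Passing by reference is simpler: you won't need to change anything except the function signature.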
void cBTNodeTree::LoadBalanced(std::list<float>& ls, cBTNode*& root)
Notice the & before root in the signature above.
2:
if (ls_right.size() > 0)
{
LoadBalanced(ls_right, right);
root->SetRightChild(left);
}
You are setting the right child to left, which is not what you want; it should be right.
3:
cBTNode* left = root->GetLeftChild();
cBTNode* right = root->GetRightChild();
These are unnecessary.
4:
if (ls.size() % 2 == 0)
No need for two separate cases.
You can achieve this by just appropriately setting middle:
int middle = (ls.size()-1) / 2;
You pass the pointer to the root by value. Pass it by reference instead by changing the signature of LoadBalanced() appropriately.
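On the remark about helper functions that return a node pointer: a variant in that style is sketched below, as one possible approach and not the asker's class. BTNode, buildRange, buildBalanced and height are names assumed for the example, and the node is a plain struct with public members rather than cBTNode with its getters and setters. Instead of splitting the list into ls_left and ls_right, the helper walks the sorted list once with an iterator: it builds the left subtree from the first half, takes the next element as the root, then builds the right subtree from the rest.

```cpp
#include <cstddef>
#include <list>

// Simplified node assumed for this sketch (public members, no accessors).
struct BTNode {
    float data;
    BTNode *left, *right;
    explicit BTNode(float d) : data(d), left(nullptr), right(nullptr) {}
};

// Builds a balanced subtree from the next n elements and advances it past them.
BTNode* buildRange(std::list<float>::const_iterator& it, std::size_t n) {
    if (n == 0) return nullptr;
    std::size_t leftCount = n / 2;
    BTNode* left = buildRange(it, leftCount);   // consumes leftCount elements
    BTNode* root = new BTNode(*it);             // next element is the middle one
    ++it;
    root->left = left;
    root->right = buildRange(it, n - leftCount - 1);
    return root;                                // the caller stores the returned pointer
}

// Entry point: the list must already be sorted in ascending order.
BTNode* buildBalanced(const std::list<float>& sorted) {
    std::list<float>::const_iterator it = sorted.begin();
    return buildRange(it, sorted.size());
}

// Height in nodes; used only to check the result.
int height(const BTNode* n) {
    if (!n) return 0;
    int l = height(n->left), r = height(n->right);
    return 1 + (l > r ? l : r);
}
```

Because each call returns the subtree it built, the caller assigns the result (for example m_Root = buildBalanced(ls);) and no pointer has to be passed by reference.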

Huffman Tree, recursive function crashes (pointers are not relayed correctly)

struct node {
float weight;
char value;
node* left_child;
node* right_child;
};
void get_codes(node tree, std::string code, std::map<char, std::string> &codes)
{
if(!tree.left_child && !tree.right_child) // leaf node
codes[tree.value] = code;
else
{
get_codes(*tree.left_child, code + "0", codes);
get_codes(*tree.right_child, code + "1", codes);
}
}
int main()
{
std::string test {"this is an example of a huffman tree"};
std::vector<char> alphabet = get_alphabet(test);
std::vector<float> weights = get_weights(test, alphabet);
std::priority_queue<node, std::vector<node>, is_node_greater> heap;
for(int i=0; i<alphabet.size(); i++)
{
node x;
x.weight = weights[i];
x.value = alphabet[i];
x.left_child = nullptr;
x.right_child = nullptr;
heap.push(x);
}
while(heap.size() > 1) {
node fg = heap.top(); heap.pop();
node fd = heap.top(); heap.pop();
node parent;
parent.weight = fg.weight + fd.weight;
parent.left_child = &fg;
parent.right_child = &fd;
heap.push(parent);
}
node tree = heap.top(); // our huffman tree
std::map<char, std::string> codes;
get_codes(tree, "", codes);
}
In the first loop, I build a heap (a priority queue) containing all the leaf nodes, i.e. nodes with no left child and no right child (nullptr).
In the second loop, while the heap contains more than one node, I take the two with the smallest weights and I create a parent node with these two nodes as children. The parent node's weight is the sum of the two children's.
Then I have my Huffman tree, and I have to get the Huffman codes. That is, I need a binary code for each leaf node, where bit '0' means following the left child and bit '1' means following the right child.
That's what my function get_codes should do, and that is where the crash occurs. It never enters the 'if' statement, so the recursion never stops. I think either it never reaches the leaf nodes (but it should, because each call is made on a child tree), or the leaf nodes/nullptr values have been lost. I'm new to C++ and not very experienced with pointers, but this is how I would write the function in another language.

Understanding the logic to extract frequency from a binary file to create huffman tree

I have to calculate symbol frequencies from a binary file.
What I have in mind is to read the characters in the file and count how many times each character occurs.
I do so using this code, and it works fine:
struct Node
{
unsigned char symbol;
int appear;
struct Node *link;
struct Node * left,*right;
};
Node *head;
Somewhere in main I read the file like this:
ch = fgetc(fp);
while (fread(&ch,sizeof(ch),1,fp))
{
    add_symbol(ch);
}
fclose(fp);
where the add_symbol function is shown below.
But I am not able to understand the logic of this code. Could anyone please answer the questions I ask after the code?
void add_symbol(unsigned char sym)
{
    Node *pt, *prev, *temp;
    int is_there = 0;
    pt = prev = head;
    while (pt != NULL)
    {
        if (pt->symbol == sym)
        {
            pt->appear++;
            is_there = 1;
            break;
        }
        prev = pt;          // remember the last node visited
        pt = pt->link;
    }
    if (!is_there)
    {
        // printf("\n is_there2 : %d\n",!is_there);
        printf("sym2 : %d\n", sym);
        temp = (Node *) malloc(sizeof(Node));
        temp->symbol = sym;
        temp->appear = 1;
        temp->left = NULL;
        temp->right = NULL;
        temp->link = NULL;
        if (head == NULL)
        {
            head = temp;
        }
        else
        {
            prev->link = temp;   // append at the tail of the list
        }
    }
}
To find the frequencies, we first need to store all the data somewhere.
(1) Where is that done?
(2) How do we check whether a symbol has appeared before?
(3) Please explain the code a bit more. The logic is the same in C and C++, so either language is fine.
My doubt is this:
Suppose 1 2 1 3 3 1 2 are the symbols in the binary file.
On the first call we do add_symbol(1); we store the "1" so that we know whether another "1" comes later.
So if pt->symbol equals "1" again, we increase its frequency by one.
On the second call we do add_symbol(2); this is not equal to "1", so we repeat the process.
On the third call we get add_symbol(1); this "1" equals the "1" stored previously, so its frequency is increased by one.
What about the previous "2"? We read the file only once, with
while (fread(&ch,sizeof(ch),1,fp))
{
add_symbol(ch);
}
and once the "2" has gone by, we will not be able to count it. How does this code remember the "2" and also find its frequency? Please do not hesitate to ask if you don't understand my question.
The code does not store all the data, it only stores the symbols and counts in a linked list.
The code reads one symbol at a time, calling add_symbol() for each. The add_symbol function starts by looking up the symbol in its linked list. If the symbol is there, the function will just increment its count; otherwise, it will add the symbol to the tail of the list, and with a count of 1.
Edit: By request, here's how it would look if it were more decomposed:
void Huffman::add_symbol(unsigned char sym)
{
Node * foundNode = find_node_in_linked_list(sym);
if(foundNode != NULL)
foundNode->freq++;
else
add_freq1_node_at_end_of_list(sym);
}
Node* Huffman::find_node_in_linked_list(unsigned char sym)
{
Node* pCur = Start;
while(pCur != NULL)
{
if(pCur->symbol == sym)
return pCur;
pCur = pCur->next;
}
return NULL;
}
void Huffman::add_freq1_node_at_end_of_list(unsigned char sym)
{
//Get tail of list
Node* pTail = NULL;
Node* pCur = Start;
while(pCur != NULL)
{
pTail = pCur;
pCur = pCur->next;
}
//Now, pTail is either the last element, or NULL if the list is empty.
//Create the new object
//(should use the new keyword instead, but since the deletion code was not posted...
Node* pNew = static_cast< Node* >(malloc(sizeof *pNew));
if(pNew == NULL)
return;
pNew->symbol = sym;
pNew->freq = 1;
pNew->left = NULL;
pNew->right = NULL;
pNew->next = NULL;
pNew->is_processed = 0;
//Add the new node at the tail
if(pTail != NULL)
pTail->next = pNew;
else
Start = pNew;
}
Note that it's less efficient than the big function because it goes through the list twice when the symbol is not found (once to try and find the symbol, once to find the tail).
In fact, there's no reason to specifically add at the tail rather than insert at the head.
Quite frankly, a linked list is not the most time-efficient way of storing the counts for up to 256 symbols. Personally, I'd recommend using a lookup table instead (a plain vector of 256 structures, or even a dedicated histogram object that is just a vector of 256 integers).
A few pieces of advice on your general design:
Step #1: In order to count the symbols, you can use a simple histogram:
#include <limits.h>
int histogram[1<<CHAR_BIT] = {0};
unsigned char ch;
while (fread(&ch,sizeof(ch),1,fp))
histogram[ch]++;
Step #2: Now you need to use the histogram in order to build a Huffman tree:
Create an array of Node pointers, one for each entry in histogram with a value greater than 0.
Take this array and build a binary heap with the minimal value at the top.
Run the following algorithm, until there is one element left in the heap:
Extract the first two Node elements from the heap.
Create a new Node whose children are these two Node elements.
Insert the new Node back into the heap.
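The heap loop in Step #2 could be sketched as below, using std::priority_queue as the binary heap. This is a minimal illustration under assumptions: HNode, ByFreqGreater and buildHuffman are names invented for the example, and the histogram is passed as a std::vector<long> in place of the int array from Step #1. The heap stores pointers to heap-allocated nodes, so the child pointers stay valid after each loop iteration ends.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Node type assumed for this sketch; symbol is meaningful only for leaves.
struct HNode {
    unsigned char symbol;
    long freq;
    HNode *left, *right;
};

// Orders the priority queue so that the smallest frequency is on top.
struct ByFreqGreater {
    bool operator()(const HNode* a, const HNode* b) const {
        return a->freq > b->freq;
    }
};

// Builds a Huffman tree from a histogram indexed by symbol value.
// Returns nullptr if every count is zero.
HNode* buildHuffman(const std::vector<long>& histogram) {
    std::priority_queue<HNode*, std::vector<HNode*>, ByFreqGreater> heap;
    // One leaf per symbol that actually occurs.
    for (std::size_t s = 0; s < histogram.size(); ++s) {
        if (histogram[s] > 0) {
            heap.push(new HNode{static_cast<unsigned char>(s), histogram[s],
                                nullptr, nullptr});
        }
    }
    // Repeatedly merge the two least frequent nodes under a new parent.
    while (heap.size() > 1) {
        HNode* a = heap.top(); heap.pop();
        HNode* b = heap.top(); heap.pop();
        heap.push(new HNode{0, a->freq + b->freq, a, b});
    }
    return heap.empty() ? nullptr : heap.top();
}
```

The returned pointer is the root used for decoding; the codes for encoding can then be collected by walking from the root down to each leaf.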
Step #3: Now that you have a Huffman tree, please note the following:
In order to encode the file, you need to use the leaves of the tree (given in the array of Node pointers created at the beginning of the previous step).
In order to decode the file, you need to use the root of the tree (which is the last element left in the heap at the end of the previous step).
You can see a full example at:
http://planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=9737&lngWId=3.