Using a Huffman tree to decode binary text - C++

The snippet below is not returning the correct text. The code takes in a pointer to the root node of a Huffman code tree and a binary text, which it then converts. However, every time it returns a single letter repeated.
string decode(Node *root, string code) {
string d = ""; char c; Node *node = root;
for (int i = 0; i < code.size(); i++) {
node = (code[i] == '0') ? node->left_child : node->right_child;
if ((c = node->value) < 128) {
d += c;
node = root;
}
}
return d;
}
The code for the Node object:
class Node {
public:
Node(int i, Node *l = nullptr, Node *r = nullptr) {
value = i;
left_child = l;
right_child = r;
}
int value;
Node *left_child;
Node *right_child;
};
The code for building the tree:
Node* buildTree(vector<int> in, vector<int> post, int in_left, int in_right, int *post_index) {
Node *node = new Node(post[*post_index]);
(*post_index)--;
if (in_left == in_right) {
return node;
}
int in_index;
for (int i = in_left; i <= in_right; i++) {
if (in[i] == node->value) {
in_index = i;
break;
}
}
node->right_child = buildTree(in, post, in_index + 1, in_right, post_index);
node->left_child = buildTree(in, post, in_left, in_index - 1, post_index);
return node;
}
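Below is a sketch of buildTree() with two small changes of my own (the asker's version does not have them). First, the vectors are passed by const reference, because the original copies both vectors on every recursive call. Second, an empty inorder range returns nullptr instead of reading an uninitialized in_index. The post_index out-parameter also becomes a reference, and the sketch assumes distinct node values. For the example tree below, the inorder traversal is 66 129 76 128 77 130 65 and the postorder traversal is 66 76 77 128 129 65 130.

```cpp
#include <vector>

// Node layout copied from the question.
class Node {
public:
    Node(int i, Node *l = nullptr, Node *r = nullptr)
        : value(i), left_child(l), right_child(r) {}
    int value;
    Node *left_child;
    Node *right_child;
};

// Rebuilds the subtree whose inorder traversal is in[in_left..in_right].
// post_index walks the postorder sequence backwards (root, right, left).
// Assumes all node values are distinct.
Node *buildTree(const std::vector<int> &in, const std::vector<int> &post,
                int in_left, int in_right, int &post_index) {
    if (in_left > in_right) return nullptr;      // empty range: no subtree
    Node *node = new Node(post[post_index--]);    // next postorder value is this subtree's root
    if (in_left == in_right) return node;         // single value: a leaf

    // Find the root in the inorder range; values left of it form the left subtree.
    int in_index = in_left;
    while (in_index < in_right && in[in_index] != node->value) ++in_index;

    // Right subtree first, because postorder read backwards visits it before the left.
    node->right_child = buildTree(in, post, in_index + 1, in_right, post_index);
    node->left_child = buildTree(in, post, in_left, in_index - 1, post_index);
    return node;
}

// Hypothetical convenience overload that builds the whole tree.
Node *buildTree(const std::vector<int> &in, const std::vector<int> &post) {
    int post_index = static_cast<int>(post.size()) - 1;
    return buildTree(in, post, 0, static_cast<int>(in.size()) - 1, post_index);
}
```

Calling the two-argument overload with those traversals should rebuild the example tree, with 130 at the root, 129 and 65 as its children, and 76 and 77 under 128.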
Example tree:
       130
      /   \
    129    65
   /   \
  66   128
      /   \
     76    77
Example I/O:
Input: 101010010111
Output: A�A�A��A�AAA
The diamond characters are the numbers greater than 128.

You are putting the value in a char, which is signed on most C++ compilers, though not on all: whether plain char is signed or unsigned is implementation-defined. A signed char is in the range -128 to 127, so it is always less than 128. An internal-node value such as 129 typically ends up as -127 and passes your leaf test. (Your compiler should have warned you about that.)
You need to use int c; instead of char c; in decode(), and do d += (char)c;. Then your first code snippet will correctly return ALABAMA.
By the way, decode() also needs an error check verifying that the loop exits with node equal to root. Otherwise, the input ended in the middle of a code, and those trailing bits were not decoded to a symbol.
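Putting both suggestions together gives the minimal sketch of decode() below. c is declared as int, and a trailing partial code is detected. Reporting that error by throwing std::runtime_error is my assumption, since the answer only says a check is needed. The sketch also assumes that leaves hold values below 128 and internal nodes hold 128 and above, as in the example tree.

```cpp
#include <stdexcept>
#include <string>

// Node layout copied from the question.
class Node {
public:
    Node(int i, Node *l = nullptr, Node *r = nullptr)
        : value(i), left_child(l), right_child(r) {}
    int value;
    Node *left_child;
    Node *right_child;
};

// Decodes a string of '0'/'1' characters using the Huffman tree at root.
// Assumes leaves hold character codes below 128 and internal nodes hold
// values of 128 or more, as in the question's example tree.
std::string decode(Node *root, const std::string &code) {
    std::string d;
    Node *node = root;
    for (char bit : code) {
        node = (bit == '0') ? node->left_child : node->right_child;
        int c = node->value;        // int, so 129 and 130 stay >= 128
        if (c < 128) {              // leaf: emit the symbol, restart at the root
            d += static_cast<char>(c);
            node = root;
        }
    }
    // Ending anywhere but the root means the last code was cut off.
    if (node != root)
        throw std::runtime_error("input ends in the middle of a code");
    return d;
}
```

You can build the example tree by hand as new Node(130, new Node(129, new Node(66), new Node(128, new Node(76), new Node(77))), new Node(65)). With that tree, decode(root, "101010010111") returns "ALABAMA". decode(root, "10") throws, because its second bit stops at node 129.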

Related

Why does random extra code improve performance?

struct Node {
Node *N[SIZE];
int value;
};
struct Trie {
Node root;
Node* findNode(Key *key) {
Node *C = &root;
char u;
while (1) {
u = key->next();
if (u < 0) return C;
// if (C->N[0] == C->N[0]); // this line will speed up execution significantly
C = C->N[u];
if (C == 0) return 0;
}
}
void addNode(Key *key, int value){...};
};
In this implementation of a prefix tree (aka trie), I found that 90% of findNode()'s execution time is spent in a single operation: C=C->N[u];
In an attempt to speed up this code, I randomly added the line that is commented out in the snippet above, and the code became 30% faster! Why is that?
UPDATE
Here is the complete program.
#include "stdio.h"
#include "sys/time.h"
long time1000() {
timeval val;
gettimeofday(&val, 0);
val.tv_sec &= 0xffff;
return val.tv_sec * 1000 + val.tv_usec / 1000;
}
struct BitScanner {
void *p;
int count, pos;
BitScanner (void *p, int count) {
this->p = p;
this->count = count;
pos = 0;
}
int next() {
int bpos = pos >> 1;
if (bpos >= count) return -1;
unsigned char b = ((unsigned char*)p)[bpos];
if (pos++ & 1) return (b >>= 4);
return b & 0xf;
}
};
struct Node {
Node *N[16];
__int64_t value;
Node() : N(), value(-1) { }
};
struct Trie16 {
Node root;
bool add(void *key, int count, __int64_t value) {
Node *C = &root;
BitScanner B(key, count);
while (true) {
int u = B.next();
if (u < 0) {
if (C->value == -1) {
C->value = value;
return true; // value added
}
C->value = value;
return false; // value replaced
}
Node *Q = C->N[u];
if (Q) {
C = Q;
} else {
C = C->N[u] = new Node;
}
}
}
Node* findNode(void *key, int count) {
Node *C = &root;
BitScanner B(key, count);
while (true) {
char u = B.next();
if (u < 0) return C;
// if (C->N[0] == C->N[1]);
C = C->N[0+u];
if (C == 0) return 0;
}
}
};
int main() {
int T = time1000();
Trie16 trie;
__int64_t STEPS = 100000, STEP = 500000000, key;
key = 0;
for (int i = 0; i < STEPS; i++) {
key += STEP;
bool ok = trie.add(&key, 8, key+222);
}
printf("insert time:%i\n",time1000() - T); T = time1000();
int err = 0;
key = 0;
for (int i = 0; i < STEPS; i++) {
key += STEP;
Node *N = trie.findNode(&key, 8);
if (N==0 || N->value != key+222) err++;
}
printf("find time:%i\n",time1000() - T); T = time1000();
printf("errors:%i\n", err);
}
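One way to see why C = C->N[u] dominates is to count how often it runs. The BitScanner in the program yields one 4-bit digit per call, taking the low nibble of each byte first. Each 8-byte key therefore walks 16 levels of the trie, and every step needs the pointer loaded by the previous one. The sketch below copies BitScanner and adds a hypothetical helper, triePath(), that lists those digits.

```cpp
#include <vector>

// BitScanner as in the program above: one 4-bit digit per call,
// low nibble of each byte first, -1 once the key is exhausted.
struct BitScanner {
    void *p;
    int count, pos;
    BitScanner(void *p, int count) : p(p), count(count), pos(0) {}
    int next() {
        int bpos = pos >> 1;
        if (bpos >= count) return -1;
        unsigned char b = ((unsigned char *)p)[bpos];
        if (pos++ & 1) return b >> 4;
        return b & 0xf;
    }
};

// Hypothetical helper: the child indexes findNode() would follow for a key.
// Each element is one C = C->N[u] step, and each step's load depends on the
// pointer produced by the step before it.
std::vector<int> triePath(void *key, int count) {
    std::vector<int> path;
    BitScanner B(key, count);
    for (int u = B.next(); u >= 0; u = B.next()) path.push_back(u);
    return path;
}
```

For the two bytes 0x12, 0xAB, triePath returns 2, 1, 11, 10. For an 8-byte key it returns 16 digits, which is 16 chained pointer loads per lookup.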
This is largely a guess, but from what I have read about CPU data prefetchers, they only prefetch when they see multiple accesses to the same memory region that match a prefetch trigger, for example an access pattern that looks like a scan. In your case, if there is only a single access to C->N, the prefetcher has no reason to act. However, if there are multiple accesses and it can predict that a later access lies further into the same block of memory, it may prefetch more than one cache line.
If that is what is happening, then C->N[u] does not have to wait for memory to arrive from RAM and is therefore faster.
It looks like what you are doing is preventing processor stalls by delaying the execution of code until the data is available locally.
Doing it this way is very error-prone and unlikely to keep working consistently. The better way is to get the compiler to do this. By default, most compilers generate code for a generic processor family. BUT if you look at the available flags, you can usually find ones that specify your exact processor, so the compiler can generate more specific code (such as prefetches and code that avoids stalls).
See "GCC: how is march different from mtune?"; the second answer goes into some detail: https://stackoverflow.com/a/23267520/14065
Each write operation is costlier than a read.
Look at this line:
C = C->N[u]; means the CPU executes a write to the variable C on every iteration.
But when you perform if (C->N[0] == C->N[1]) dummy++;, the write to dummy is executed only if C->N[0] == C->N[1]. So you have saved many CPU write instructions by using the if condition.

Why does this code fail to run?

I want to generate a tree of siblings like this:
     ABCD
   /  |  |  \
  A   B  C   D
ABCD has four child nodes, so I have used an array, *next[], for them. The code produces the expected sequence, but it does not run successfully. I have written code in main() that passes characters to the enqueue function, e.g. str.at(x), where x is the for-loop variable.
struct node
{
string info;
struct node *next[];
}*root,*child;
string str, goal;
int dept=0,bnod=0,cl,z=0;
void enqueue(string n);
void enqueue(string n)
{
node *p, *temp;
p=new node[sizeof(str.length())];
p->info=n;
for (int x=0;x<str.length();x++)
p->next[x]=NULL;
if(root==NULL)
{
root=p;
child=p;
}
else
{
cout<<" cl="<<cl<<endl;
if(cl<str.length())
{
child->next[cl]=p;
temp=child->next[cl];
cout<<"chile-info "<<temp->info<<endl;
}
else
cout<<" clif="<<cl<<endl;
}
}
OUTPUT
Enter String: sham
cl=0
chile-info s
cl=1
chile-info h
cl=2
chile-info a
cl=3
chile-info m
RUN FAILED (exit value 1, total time: 2s)
Firstly, where does "RUN FAILED" come from? Is that specific to your compiler?
Secondly, the line p=new node[sizeof(str.length())]; probably doesn't give you what you want. It takes the sizeof of an unsigned integer (std::string::size_type), which is a fixed value such as 4 or 8 depending on your platform, regardless of the string's length. That is not what you're after: you want the actual length of the string.
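There is a quick way to confirm this. sizeof looks only at the type of its operand, and the operand is not even evaluated, so the result does not change with the string. The two helper functions below are hypothetical and exist only for the comparison.

```cpp
#include <cstddef>
#include <string>

// sizeof inspects only the type of its operand (str.length() is not even
// called), so this is sizeof(std::string::size_type) for every string.
std::size_t sizeofLength(const std::string &str) { return sizeof(str.length()); }

// What the allocation actually needs: the number of characters.
std::size_t actualLength(const std::string &str) { return str.length(); }
```

For "sham", actualLength returns 4. sizeofLength returns the same platform-dependent constant for "sham" as for any longer string.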
So, since you're already using std::string, why not use std::vector as well? Your code would look a lot friendlier :-)
If I take the first couple of lines as your desired output (sorry, the code you posted is very hard to decipher, and I don't think it compiles either, so I'm ignoring it ;-) ), would something like this work better for you?
#include <iostream>
#include <vector>
#include <string>
typedef struct node
{
std::string info;
std::vector<struct node*> children;
}Node;
Node * enqueue(std::string str)
{
Node * root;
root = new Node();
root->info = str;
for (int x = 0; x < str.length(); x++)
{
Node * temp = new Node();
temp->info = str[x];
root->children.push_back(temp);
}
return root;
}
int main()
{
Node * myRoot = enqueue("ABCD");
std::cout << myRoot->info << "\n";
for( int i = 0; i < myRoot->children.size(); i++)
{
std::cout << myRoot->children[i]->info << ", ";
}
char c;
std::cin >> c;
return 0;
}
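The example above never frees the nodes it allocates. If that matters, a variant (my own, not part of the answer) can let each node own its children through std::unique_ptr. The whole tree is then released when the root goes out of scope. It needs C++14 for std::make_unique.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Same shape as the answer's node, but children are owned by std::unique_ptr.
struct Node {
    std::string info;
    std::vector<std::unique_ptr<Node>> children;
};

// Root labelled with the whole string, plus one child per character.
std::unique_ptr<Node> enqueue(const std::string &str) {
    auto root = std::make_unique<Node>();
    root->info = str;
    for (char ch : str) {
        auto child = std::make_unique<Node>();
        child->info = std::string(1, ch);
        root->children.push_back(std::move(child));
    }
    return root;    // destroying root later frees every child as well
}
```

enqueue("ABCD") returns a root whose info is "ABCD" and whose four children hold "A", "B", "C" and "D". Printing works the same way as in main() above, using myRoot->children[i]->info.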
Your code seems incomplete.
At least the line
p=new node[sizeof(str.length())];
looks wrong.
I guess enqueue should be something similar to the following:
struct node
{
string info;
struct node **next; // pointer to an array of child pointers ([] is not needed here)
}*root,*child;
string str, goal;
int dept=0,bnod=0,cl,z=0;
void enqueue(string n)
{
node *p, *temp;
p = new node;
p->next = new node*[str.length()];
p->info=n;
for (int x=0;x<str.length();x++)
{
p->next[x] = new node;
p->next[x]->next = 0;
p->next[x]->info = str[x];
}
if(root==NULL)
{
root=p;
child=p;
}
}
Please provide more information so I can give a more accurate answer.

Trie Implementation in C++

I am trying to implement the trie as shown on the TopCoder page, modified slightly to store the users' phone numbers. I am getting a segmentation fault. Can someone please point out the error?
#include<iostream>
#include<stdlib.h>
using namespace std;
struct node{
int words;
int prefix;
long phone;
struct node* children[26];
};
struct node* initialize(struct node* root) {
root = new (struct node);
for(int i=0;i<26;i++){
root->children[i] = NULL;
}
root->words = 0;
root->prefix = 0;
return root;
}
int getIndex(char l) {
if(l>='A' && l<='Z'){
return l-'A';
}else if(l>='a' && l<='z'){
return l-'a';
}
}
void add(struct node* root, char * name, int data) {
if(*(name)== '\0') {
root->words = root->words+1;
root->phone = data;
} else {
root->prefix = root->prefix + 1;
char ch = *name;
int index = getIndex(ch);
if(root->children[ch]==NULL) {
struct node* temp = NULL;
root->children[ch] = initialize(temp);
}
add(root->children[ch],name++, data);
}
}
int main(){
struct node* root = NULL;
root = initialize(root);
add(root,(char *)"test",1111111111);
add(root,(char *)"teser",2222222222);
cout<<root->prefix<<endl;
return 0;
}
Added a new function after making suggested changes:
void getPhone(struct node* root, char* name){
while(*(name) != '\0' || root!=NULL) {
char ch = *name;
int index = getIndex(ch);
root = root->children[ch];
++name;
}
if(*(name) == '\0'){
cout<<root->phone<<endl;
}
}
Change this:
add(root->children[ch], name++, data);
// ---------------------^^^^^^
To this:
add(root->children[ch], ++name, data);
// ---------------------^^^^^^
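The remainder of the issues in this code I leave to you, but that is the cause of your runaway call stack: name++ passes the old pointer value, so every recursive call sees the same first character and the recursion never reaches the '\0' terminator.
EDIT: The OP asked for further analysis, and while I normally don't do so, this was a fairly simple application to expand on.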
This is done in several places:
int index = getIndex(ch);
root = root->children[ch];
... etc. continue using ch instead of index
It raises the question: "Why did we just ask for an index that we promptly ignore, and then use the char anyway?" This is done in add() and getPhone(). After computing index, use it for all lookups in the children[] array, because a ch value such as 't' (116) is far outside its 26 slots.
Also, the initialize() function needs to be either revamped or thrown out entirely in favor of a constructor-based solution, which is where that code truly belongs. Finally, suppose this trie is supposed to track usage counts of the words generated and of the prefixes each level participates in. It's not clear to me why you need both words and prefix counters. In either case, to update the counters, your recursive descent in add() should bump them up on the back-recurse.
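Here is a minimal sketch that applies the fixes discussed in this answer together:

- index is used instead of ch.
- The pointer advances to the next character before the recursive call (written name + 1, equivalent to ++name here).
- Default member initializers take the place of initialize().
- getIndex() returns -1 for non-letters instead of falling off the end.
- getPhone()'s loop uses && instead of ||.

Two choices are my own assumptions. phone becomes long long, because 2222222222 does not fit in a 32-bit long. getPhone() also returns -1 for a missing name rather than printing. The counters are updated the same way as in the question.

```cpp
// Trie node; default member initializers replace initialize().
struct node {
    int words = 0;
    int prefix = 0;
    long long phone = 0;            // long may be 32 bits; 2222222222 needs more
    node *children[26] = {};        // all children start out null
};

// Maps a letter (either case) to 0..25, or -1 for any other character.
int getIndex(char l) {
    if (l >= 'A' && l <= 'Z') return l - 'A';
    if (l >= 'a' && l <= 'z') return l - 'a';
    return -1;
}

// Inserts name with its phone number; returns false on a non-letter.
bool add(node *root, const char *name, long long data) {
    if (*name == '\0') {
        root->words += 1;
        root->phone = data;
        return true;
    }
    int index = getIndex(*name);    // use index, not the raw char, below
    if (index < 0) return false;
    root->prefix += 1;
    if (root->children[index] == nullptr)
        root->children[index] = new node;
    return add(root->children[index], name + 1, data);   // advance to the next char
}

// Looks name up; returns -1 if it was never added.
long long getPhone(const node *root, const char *name) {
    while (*name != '\0' && root != nullptr) {     // && stops on either condition
        int index = getIndex(*name);
        if (index < 0) return -1;
        root = root->children[index];
        ++name;
    }
    if (root == nullptr || root->words == 0) return -1;
    return root->phone;
}
```

Start with root = new node, then call add(root, "test", 1111111111) and add(root, "teser", 2222222222). After that, root->prefix is 2, as in the question's main(), and getPhone(root, "teser") returns 2222222222.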

Creating an n array with a linked list of ints

I recently made a 26-array trie (N26) and tried to simulate a dictionary.
I can't figure out how to do the following. I've tried passing in a linked list of ints instead of a string. My current code creates 26 nodes (a-z), and each of those nodes has 26 nodes (a-z). I would like to do the same with ints, say 1-26. These int nodes will represent items, and the linked list of ints I pass in will contain a set of ints that I want represented in the tree, similar to a string.
Example: pass in the set {1, 6, 8} instead of a string such as "hello".
#include <iostream>
using namespace std;
class N26
{
private:
struct N26Node
{
bool isEnd;
struct N26Node *children[26];
}*head;
public:
N26();
~N26();
void insert(string word);
bool isExists(string word);
void printPath(char searchKey);
};
N26::N26()
{
head = new N26Node();
head->isEnd = false;
}
N26::~N26()
{
}
void N26::insert(string word)
{
N26Node *current = head;
for(int i = 0; i < word.length(); i++)
{
int letter = (int)word[i] - (int)'a';
if(current->children[letter] == NULL)
{
current->children[letter] = new N26Node();
}
current = current->children[letter];
}
current->isEnd = true;
}
/* Pre: A search key
* Post: True if the search key is found in the tree, otherwise false
* Purpose: To determine if given data exists in the tree or not
******************************************************************************/
bool N26::isExists(string word)
{
N26Node *current = head;
for(int i=0; i<word.length(); i++)
{
if(current->children[((int)word[i]-(int)'a')] == NULL)
{
return false;
}
current = current->children[((int)word[i]-(int)'a')];
}
return current->isEnd;
}
class N26
{
private:
N26Node newNode(void);
N26Node *mRootNode;
...
};
N26Node *newNode(void)
{
N26Node *mRootNode = new N26Node;
mRootNode = NULL;
mRootNode->mData = NULL;
for ( int i = 0; i < 26; i++ )
mRootNode->mAlphabet[i] = NULL;
return mRootNode;
}
Ah! My eyes!
Seriously, you are attempting something much too advanced. Your code is full of bugs and cannot work as intended. Tinkering will not help, you must go back to basics of pointers and linked lists. Study the basics and do not attempt anything like a linked list of linked lists until you understand what is wrong with the code above.
I'll give you some hints: "memory leak", "dangling pointer", "type mismatch", "undefined behavior".
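For reference, here is one way newNode() could be written once the problems hinted at above are removed. It returns a pointer, matching what the function actually produces. It does not overwrite the freshly allocated pointer with NULL, which would leak the node and make the next line dereference a null pointer. It initializes the node through that pointer. Because mData and mAlphabet are never declared, this sketch assumes the isEnd and children members from the first listing.

```cpp
// Node layout from the first listing.
struct N26Node {
    bool isEnd;
    N26Node *children[26];
};

// Allocates a node and initializes every field through the new pointer.
// The return type is a pointer, and the pointer is never reset to NULL
// before use.
N26Node *newNode() {
    N26Node *node = new N26Node;
    node->isEnd = false;
    for (int i = 0; i < 26; i++)
        node->children[i] = nullptr;
    return node;
}
```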
I didn't quite use linked lists, but I managed to get it working using arrays.
/* *** Author: Jamie Roland
* Class: CSI 281
* Institute: Champlain College
* Last Update: October 31, 2012
*
* Description:
* This class is to implement an n26 trie. The
* operations available for this implementation are:
*
* 1. insert
* 2. isEmpty
* 3. isExists
* 4. remove
* 5. showInOrder
* 6. showPreOrder
* 7. showPostOrder
*
* Certification of Authenticity:
* I certify that this assignment is entirely my own work.
**********************************************************************/
#include <iostream>
using namespace std;
class N26
{
private:
struct N26Node
{
bool isEnd;
struct N26Node *children[26];
}*head;
public:
N26();
~N26();
void insert(const int word[], int size);
bool isExists(const int word[], int size);
void printPath(char searchKey);
};
N26::N26()
{
head = new N26Node();
head->isEnd = false;
}
N26::~N26()
{
}
// An int[] parameter is really an int*, so sizeof word/sizeof(int) gives
// sizeof(int*)/sizeof(int) (for example 2 on a typical 64-bit platform),
// not the number of elements. The caller therefore passes the length.
void N26::insert(const int word[], int size)
{
N26Node *current = head;
for(int i = 0; i < size; i++)
{
int letter = word[i] - 1;
if(current->children[letter] == NULL)
{
current->children[letter] = new N26Node();
}
current = current->children[letter];
}
current->isEnd = true;
}
/* Pre: A search key and its length
* Post: True if the search key is found in the tree, otherwise false
* Purpose: To determine if given data exists in the tree or not
******************************************************************************/
bool N26::isExists(const int word[], int size)
{
N26Node *current = head;
for(int i=0; i<size; i++)
{
if(current->children[(word[i]-1)] == NULL)
{
return false;
}
current = current->children[(word[i]-1)];
}
return current->isEnd;
}
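The question asked about passing a linked list of ints, so here is a minimal sketch of that interface. It assumes std::list<int> as the list type and maps values 1 to 26 to child slots 0 to 25, with 1 standing in for 'a'. Taking the container directly means the length comes from the list itself rather than from sizeof, which cannot recover the length of an array passed as an int[] parameter. The destructor frees the nodes, which the versions above do not. The range checks are my own addition.

```cpp
#include <list>
#include <stdexcept>

// 26-way trie keyed by sequences of ints in 1..26 (1 plays the role of 'a').
class N26 {
    struct N26Node {
        bool isEnd = false;
        N26Node *children[26] = {};
    };
    N26Node *head = new N26Node;

    // Maps 1..26 to a child slot 0..25, or -1 if the value is out of range.
    static int slot(int v) { return (v >= 1 && v <= 26) ? v - 1 : -1; }

    // Frees a node and everything below it.
    static void destroy(N26Node *n) {
        if (n == nullptr) return;
        for (N26Node *c : n->children) destroy(c);
        delete n;
    }

public:
    N26() = default;
    ~N26() { destroy(head); }
    N26(const N26 &) = delete;              // a copy would free head twice
    N26 &operator=(const N26 &) = delete;

    void insert(const std::list<int> &word) {
        N26Node *current = head;
        for (int v : word) {
            int s = slot(v);
            if (s < 0) throw std::out_of_range("values must be in 1..26");
            if (current->children[s] == nullptr)
                current->children[s] = new N26Node;
            current = current->children[s];
        }
        current->isEnd = true;
    }

    bool isExists(const std::list<int> &word) const {
        const N26Node *current = head;
        for (int v : word) {
            int s = slot(v);
            if (s < 0) return false;
            current = current->children[s];
            if (current == nullptr) return false;
        }
        return current->isEnd;
    }
};
```

After trie.insert({1, 6, 8}), isExists({1, 6, 8}) is true, while the prefix {1, 6} is not reported as a stored set.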

Exponential tree implementation

I am trying to implement an exponential tree from its description in a paper, but there is one place in the code where it is not clear to me how to implement it:
#include<iostream>
using namespace std;
struct node
{
int level;
int count;
node **child;
int data[];
};
int binary_search(node *ptr,int element)
{
if(element>ptr->data[ptr->count-1]) return ptr->count;
int start=0;
int end=ptr->count-1;
int mid=start+(end-start)/2;
while(start<end)
{
if(element>ptr->data[mid]) { start=mid+1;}
else
{
end=mid;
}
mid=start+(end-start)/2;
}
return mid;
}
void insert(node *root,int element)
{
node *ptr=root,*parent=NULL;
int i=0;
while(ptr!=NULL)
{
int level=ptr->level,count=ptr->count;
i=binary_search(ptr,element);
if(count<level){
for(int j=count;j<=i-1;j--)
ptr->data[j]=ptr->data[j-1];
}
ptr->data[i]=element;
ptr->count=count+1;
return ;
}
parent=ptr,ptr=ptr->child[i];
//Create a new Exponential Node at ith child of parent and
//insert element in that
return ;
}
int main()
{
return 0;
}
Here is a link to the paper I'm referring to:
http://www.ijcaonline.org/volume24/number3/pxc3873876.pdf
The unclear place is marked with a comment: how can I create a new exponential node at level i? Like this?
parent->child[i]=new node;
insert(parent,element);
The presence of the empty array at the end of the structure indicates this is C-style code rather than C++ (it's a C hack for flexible arrays). I'll continue with C-style code, since idiomatic C++ code would use standard containers for the child and data members.
Some notes and comments on the following code:
There were a number of issues with the pseudo-code in the linked paper, to the point where it is better to ignore it and develop the code from scratch. The indentation makes it unclear where loops end, the loop indexes are all wrong, the check for finding an insertion point is incorrect, etc.
I didn't include any code for deleting the allocated memory, so the code will leak as is.
Zero-sized arrays (flexible array members) are a C99 feature and are not part of standard C++, so not all C++ compilers support them. For example, VS2010 gives me warning C4200 saying it will not generate the default copy/assignment methods.
I added the createNode() function which gives the answer to your original question of how to allocate a node at a given level.
A very basic test was added and appears to work but more thorough tests are needed before I would be comfortable with the code.
Besides the incorrect pseudo-code, the paper has a number of other errors, or at least questionable content. For example, concerning Figure 2 it says "which clearly depicts that the slope of graph is linear", whereas the graph is clearly not linear. Even if the author meant "approaching linear", it is at least stretching the truth. I would also be interested in the set of integers they used for testing, which doesn't appear to be mentioned at all. I assumed they used a random set, but I would like to see at least several sets of random numbers as well as several predefined sets, such as an already sorted or inversely sorted set.
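#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc, rand */
#include <string.h>  /* memset */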
int binary_search(node *ptr, int element)
{
if (ptr->count == 0) return 0;
if (element > ptr->data[ptr->count-1]) return ptr->count;
int start = 0;
int end = ptr->count - 1;
int mid = start + (end - start)/2;
while (start < end)
{
if (element > ptr->data[mid])
start = mid + 1;
else
end = mid;
mid = start + (end - start)/2;
}
return mid;
}
node* createNode (const int level)
{
if (level <= 0) return NULL;
/* Allocate node with 2**(level-1) integers */
node* pNewNode = (node *) malloc(sizeof(node) + sizeof(int)*(1 << (level - 1)));
memset(pNewNode->data, 0, sizeof(int) * (1 << (level - 1 )));
/* Allocate 2**level child node pointers */
pNewNode->child = (node **) malloc(sizeof(node *)* (1 << level));
memset(pNewNode->child, 0, sizeof(node *) * (1 << level));
pNewNode->count = 0;
pNewNode->level = level;
return pNewNode;
}
void insert(node *root, int element)
{
node *ptr = root;
node *parent = NULL;
int i = 0;
while (ptr != NULL)
{
int level = ptr->level;
int count = ptr->count;
i = binary_search(ptr, element);
if (count < (1 << (level-1)))
{
for(int j = count; j >= i+1; --j)
ptr->data[j] = ptr->data[j-1];
ptr->data[i] = element;
++ptr->count;
return;
}
parent = ptr;
ptr = ptr->child[i];
}
parent->child[i] = createNode(parent->level + 1);
insert(parent->child[i], element);
}
void InOrderTrace(node *root)
{
if (root == NULL) return;
for (int i = 0; i < root->count; ++i)
{
if (root->child[i]) InOrderTrace(root->child[i]);
printf ("%d\n", root->data[i]);
}
if (root->child[root->count]) InOrderTrace(root->child[root->count]);
}
void testdata (void)
{
node* pRoot = createNode(1);
for (int i = 0; i < 10000; ++i)
{
insert(pRoot, rand());
}
InOrderTrace(pRoot);
}
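The answer notes that idiomatic C++ would use standard containers for the child and data members. Below is a sketch of that variant. It keeps the same capacities (2^(level-1) values and 2^level child slots per node) and the same descent as insert():

- std::vector replaces malloc/memset and the flexible array member.
- std::unique_ptr frees the tree automatically.
- std::lower_bound stands in for the hand-written binary_search(), which returns the same position.

ExpNode and inOrder are names made up for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Exponential-tree node built from standard containers.
// A node at `level` holds up to 2^(level-1) sorted values and 2^level child slots.
struct ExpNode {
    int level;
    std::vector<int> data;                          // sorted values
    std::vector<std::unique_ptr<ExpNode>> child;    // 2^level slots, initially null

    explicit ExpNode(int lvl) : level(lvl), child(std::size_t(1) << lvl) {}
    std::size_t capacity() const { return std::size_t(1) << (level - 1); }
};

// Same descent as the answer's insert(): store the value in the first node on
// its path that still has room, creating a child one level deeper if needed.
void insert(ExpNode &root, int element) {
    ExpNode *ptr = &root;
    for (;;) {
        // Same position the answer's binary_search() returns.
        auto pos = std::lower_bound(ptr->data.begin(), ptr->data.end(), element);
        std::size_t i = static_cast<std::size_t>(pos - ptr->data.begin());
        if (ptr->data.size() < ptr->capacity()) {
            ptr->data.insert(pos, element);
            return;
        }
        if (!ptr->child[i])
            ptr->child[i] = std::make_unique<ExpNode>(ptr->level + 1);
        ptr = ptr->child[i].get();
    }
}

// In-order walk: child[0], data[0], child[1], data[1], ..., child[count].
void inOrder(const ExpNode &n, std::vector<int> &out) {
    for (std::size_t i = 0; i < n.data.size(); ++i) {
        if (n.child[i]) inOrder(*n.child[i], out);
        out.push_back(n.data[i]);
    }
    if (n.child[n.data.size()]) inOrder(*n.child[n.data.size()], out);
}
```

As with InOrderTrace(), an in-order walk starting from ExpNode root(1) should list the inserted values in sorted order.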