How to display our encoded text in a file using Huffman Encoding - c++

In my Huffman Algorithm project, so far I have generated the codes for each character of the input file. I have also stored the characters and their corresponding codes in an unordered map. Now, I want to read our input string, and print the corresponding codes of each character in the output file. However, printing the codes in string format will not compress the file. I want to convert my string code to bits format. I know that we need to use a byte buffer, but I do not know how I will apply this concept to my code. Any help will be very much appreciated.
#include<iostream>
#include<string>
#include<queue>
#include<vector>
#include<bitset>
#include<fstream>
#include<unordered_map>
#include<map>
using namespace std;
struct node
{
char c; //character in the string
int f; //Frequency of character in the string
node* next;
node* left, * right; //left and right child of binary tree respectively
node()
{
f = 0;
left = NULL;
right = NULL;
c = NULL;
next = NULL;
}
};
struct compare {
public:
bool operator()(node* a, node* b) // overloading both operators
{
return a->f > b->f; //To maintain the order of min heap priority queue
}
};
class Huffman
{
string filename; //The name of the file we want to encode
string text; //The text that will be encoded
priority_queue<node*, vector<node*>, compare> pq; //Priority queue that will contian characters of our string and their frequency
string encoded;
unordered_map <char, string> um;
public:
Huffman()
{
text = "";
encoded = "";
}
void FileRead()
{
cout << "Enter the name of the file you want to encode:";
cin >> filename;
fstream readfile(filename, fstream::in);
getline(readfile, text, '\0');
cout << text << endl;
readfile.close();
}
//Function which will calculate the frequency of characters in the string entered by the user
void CharacterFrequency()
{
for (int i = 0; i < text.length(); i++)
{
int sum = 0;
for (int j = 0; j < text.length(); j++)
{
if (j < i and text[i] == text[j])
{
break;
}
if (text[i] == text[j])
{
sum++;
}
}
if (sum != 0)
{
PriorityQueue(text[i], sum);
}
}
}
// This will push our characters and their frequencies into our STL min heap priority queue
void PriorityQueue(char ch, int freq)
{
node* n=new node; //pointer of type node is created
n->c = ch; //Pointer stores character
n->f = freq; //Pointer stores frequency of the character
pq.push(n); //The node is pushed into the priority queue
}
//Will display the whole priority queue. All of the elements will be popped from it as a result.
void PriorityQueueDisplay()
{
while (!pq.empty())
{
cout << (pq.top())->c<<" "<<(pq.top())->f << endl;
pq.pop();
}
}
//This function will create our Huffman Tree from a priority queue
void HuffmanTree()
{
node* n1, * n2; //The nodes that will be popped each time from the priority queue
//This loop will continue to pop out two nodes from the priority queue until only one nodes is left
//in the priority queue
while (pq.size()!=1)
{
n1 = pq.top();
pq.pop();
n2 = pq.top();
pq.pop();
node* z = new node; //Creation of new node of Huffman tree
z->left = n1;
z->right = n2;
z->f = (n1->f) + (n2->f); //Storing sum of the two popped nodes in Huffman tree node
z->c = '&'; //Assigning the new node a character that is not used in formal speech
pq.push(z); //Pushing the node into the priority queue again
}
node* root = pq.top(); //Making the last node the root node
EncodeAndPrintCodes(root,encoded); //Passing the root node and a string that will encode each character of our inputted string
}
//This function will recursively search for a character in the string, and will print it's corresponding code.
//It will do this for all our characters
void EncodeAndPrintCodes(node* root,string en)
{
if (root == NULL)
{
return ;
}
if (root->c != '&')
{
//cout << root->c << ":" << en;
StoreinMap(root->c, en);
}
EncodeAndPrintCodes(root->left, en + "0");
EncodeAndPrintCodes(root->right, en + "1");
}
//Will convert our code in string to bitstream and then store it in a text file
void CompressedFile(char ch, string code)
{
ofstream compressed;
compressed.open("CompressedFile.txt", ios::app | ios::out);
}
void StoreinMap(char ch, string code)
{
um.emplace(pair<char, string>(ch,code));
}
/*void DisplayEncoded()
{
cout << encoded;
}*/
//Displays the size of the priority queue
void DisplaySize()
{
cout<<pq.size();
}
};
int main()
{
Huffman obj;
obj.FileRead();
obj.CharacterFrequency();
//obj.PriorityQueueDisplay();
obj.HuffmanTree();
//obj.DisplaySize();
//obj.DisplayEncoded();
//obj.CompressedFile();
return 0;
}

Copied from another answer, this is a way to write bits to a file of bytes. You have a bit buffer consisting of:
unsigned long bitBuffer = 0;
int bitCount = 0;
To add the low bits bits of value to the buffer (bits is the length of the code, value holds the code itself):
bitBuffer |= (unsigned long)value << bitCount;
bitCount += bits;
To write and remove available bytes:
while (bitCount >= 8) {
writeByte(bitBuffer & 0xff);
bitBuffer >>= 8;
bitCount -= 8;
}
At the end, you need to write any bits remaining in the bit buffer. When decoding, you need to take care to not interpret any filler bits in the last byte as data that it is not. For that, you'll need an end marker in your data.
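The fragments above might be assembled roughly as follows. This is only a sketch under assumptions: the BitWriter class, its member names, and the in-memory std::vector output are hypothetical and not part of the original answer, and each code is taken as a string of '0'/'1' characters, as stored in the question's unordered_map.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): packs bits into bytes,
// least significant bit first, following the snippets above.
class BitWriter {
    std::uint64_t bitBuffer = 0;      // pending bits, lowest bit is the oldest
    int bitCount = 0;                 // number of valid bits in bitBuffer
    std::vector<unsigned char> out;   // completed bytes
public:
    // Append one Huffman code given as a string of '0' and '1' characters.
    void putCode(const std::string& code) {
        for (char ch : code) {
            bitBuffer |= static_cast<std::uint64_t>(ch == '1') << bitCount;
            ++bitCount;
            while (bitCount >= 8) {   // write and remove available bytes
                out.push_back(static_cast<unsigned char>(bitBuffer & 0xff));
                bitBuffer >>= 8;
                bitCount -= 8;
            }
        }
    }
    // Write any remaining bits; the unused high bits are zero filler.
    void flush() {
        if (bitCount > 0) {
            out.push_back(static_cast<unsigned char>(bitBuffer & 0xff));
            bitBuffer = 0;
            bitCount = 0;
        }
    }
    const std::vector<unsigned char>& bytes() const { return out; }
};
```

A caller could loop over the input text, call putCode() with the code of each character, call flush() once at the end, and then write bytes() to a stream opened in binary mode. The filler bits added by flush() are the reason an end marker or a symbol count is still needed for decoding.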
Some side comments:
Somehow you turned the O(n) calculation of character frequencies into an O(n^2) calculation! You should think about that some more.
Don't define special character values. You should be able to compress any sequence of bytes.
Your use of getline() stops reading the input if it gets to a zero byte. Use rdbuf().
Your use of & as an indicator of an internal node, because you think the character is "not used in formal speech", is wrong. (It is commonly used in writing.) If there is an ampersand in the input, your program will crash, trying to access an uninitialized pointer. Use left as your indicator of whether this is an internal node or a leaf, by setting it to nullptr if it's a leaf.
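Two of the side comments, reading the whole file through rdbuf() and counting frequencies in a single pass, could look something like the sketch below. The function names readAll and countBytes are made up for illustration, and the 256-entry array assumes the input is treated as raw bytes.

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Read an entire file, zero bytes included, by streaming its buffer.
// Returns an empty string if the file cannot be opened.
std::string readAll(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}

// Count every byte value in one pass over the data: O(n), not O(n^2).
std::array<std::size_t, 256> countBytes(const std::string& data) {
    std::array<std::size_t, 256> freq{};   // all counters start at zero
    for (unsigned char b : data) {
        ++freq[b];
    }
    return freq;
}
```

Each nonzero entry of the returned array can then be pushed into the priority queue once, which replaces the nested loops in CharacterFrequency().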

Related

My compressed file has a larger file size than the original file

I was able to write code for Huffman coding using only the queue library. But when I save my file for compression, it has a larger byte size than the original file.
For example:
filesize.txt is 17 bytes and contains the string "Stressed-desserts", while
compressedfile.bin is 44 bytes and contains the Huffman codes of the original file, "01111011000011110001001100100011110010010111".
This is my whole code:
#include <iostream>
#include <queue>
#include <fstream>
using namespace std;
struct HuffNode{
int my_Frequency;
char my_Char;
string my_Code;
HuffNode* my_Left;
HuffNode* my_Right;
};
//global variables
int freq[256] = {0};
string encoded = "";
string filename;
//Comparing the frequency in the priority queue
struct compare_freq {
bool operator()(HuffNode* l, HuffNode* r) {
return l->my_Frequency > r->my_Frequency;
}
};
priority_queue <HuffNode*, vector<HuffNode*>, compare_freq> freq_queue;
//get the file from user
string get_file_name()
{
cout << "Input file name to compress: ";
cin >> filename;
return filename;
}
//Scan the file to be compressed and tally all the occurence of all characters.
void file_getter()
{
fstream fp;
char c;
fp.open(get_file_name(), ios::in);
if(!fp)
{
cout << "Error: Couldn't open file " << endl;
system("pause");
}
else
{
while(!fp.eof())
{
c = fp.get();
freq[c]++;
}
}
fp.close();
}
//HuffNode to create a newNode for queue containing the letter and the frequency
HuffNode* set_Node(char ch, int count)
{
HuffNode* newNode = new HuffNode;
newNode->my_Frequency = count;
newNode->my_Char = ch;
newNode->my_Code = "";
newNode->my_Right = nullptr;
newNode->my_Left = nullptr;
return newNode;
}
//Sort or Prioritize characters based on numbers of occurences in text.
void insert_Node(char ch, int count)
{
//pass the ch and count to the newNodes for queing
freq_queue.push(set_Node(ch, count));
}
void create_Huffman_Tree()
{
HuffNode* root;
file_getter();
//insert the characters in the their frequencies into the priority queue
for(int i = 0; i < 256; i++)
{
if(freq[i] > 0)
{
insert_Node(char(i), freq[i]);
}
}
//build the huffman tree
while(freq_queue.size() > 1)
{
//get the two highest priority nodes
HuffNode* for_Left = freq_queue.top();
freq_queue.pop();
HuffNode* for_Right = freq_queue.top();
freq_queue.pop();
//Create a new HuffNode with the combined frequency of the left and right children
int freq = for_Left->my_Frequency + for_Right->my_Frequency;
char ch = '$';
root = set_Node(ch, freq);
root->my_Left = for_Left;
root->my_Right = for_Right;
//Insert the new node into the priority_queue.
freq_queue.push(root);
}
// The remaining HuffmanNode in the queue is the root of the Huffman tree
root = freq_queue.top();
}
void preOrderTraverse(HuffNode* root, char c, string code)
{
if (root == nullptr) {
// If the tree is empty, return
return;
}
if (root->my_Char == c)
{
// If the current HuffmanNode is a leaf HuffmanNode, print the code for the character.
root->my_Code = code;
encoded += code;
return;
}
// Otherwise, recurse on the left and right children
preOrderTraverse(root->my_Left, c, code + "0");
preOrderTraverse(root->my_Right, c, code + "1");
}
void encode_File(string ccode)
{
HuffNode* root = freq_queue.top();
for(int i = 0; i < ccode.length(); i++)
{
char c = ccode[i];
string code = "";
preOrderTraverse(root, c, code);
}
}
void save_Huffman_Code()
{
fstream fp, fp2;
fp.open("Compressed_file.bin", ios::out);
fp2.open(filename, ios::in);
string ccode;
getline(fp2, ccode);
encode_File(ccode);
fp << encoded;
fp.close();
fp2.close();
}
int main()
{
create_Huffman_Tree();
HuffNode* root = freq_queue.top();
save_Huffman_Code();
}
I should get a compressed file that has a smaller byte size than the original. I am trying to write the code without using bit operations, unorderedmap or map. I only use priority_queue for the program.
You are writing eight bits per bit to your output, so it is eight times larger than it's supposed to be. You want to write one bit per bit. To write bits, you need to accumulate them, one by one, into a byte buffer until you have eight, then write that byte. At the end, write the remaining bits. Use the bit operators << and | to put the bits into the byte buffer. E.g. for each bit equal to 0 or 1:
unsigned buf = 0, n = 0;
...
buf |= bit << n;
if (++n == 8) {
fp.put(buf);
buf = n = 0;
}
...
if (n)
fp.put(buf);
There are many other things wrong with your code.
Because c is a signed byte type, freq[c]++; will fail for input that has bytes larger than 127, as c will be negative. You need int c; instead of char c.
Using while(!fp.eof()) will result in getting a -1 as your last character, which is an EOF indication, and again indexing your array with a negative number. Do while ((c = fp.get()) != -1).
You use a series of get()'s the first time you read the file, which is correct. However the second time you read the file, you use a single getline(). This only gets the first line, and it omits the new line character. Read the file the same way both times, with a series of get()'s.
You are only writing the codes. There is no description of the Huffman code preceding them, so there is no way for a decoder to make any sense of the bits you send. Once you fix it to send a bit per bit instead of a byte per bit, your output will be smaller than what the data can actually be compressed to. When you add the tree, the input and output will be about the same length.
You are traversing the entire tree every time you want to encode one character! You need to make a table of codes by traversing the tree once, and then use the table to encode.
There is no way to know how many characters have been encoded, which will result in an ambiguity for any extra bits in the last byte. You need to either send the number of characters ahead of the encoded characters, or include one more symbol when coding for an end-of-stream indicator.
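The point about building a table of codes with a single traversal might be sketched as below. The Node struct is a minimal stand-in for the question's HuffNode, and buildTable and encode are hypothetical names. A plain 256-entry array is used so that no map is needed, and a leaf is recognized by its null children rather than by a special character.

```cpp
#include <array>
#include <string>

// Minimal stand-in for the question's HuffNode (names are illustrative).
struct Node {
    unsigned char symbol;
    Node* left;
    Node* right;
};

// Walk the tree once and record the code of every leaf in the table.
void buildTable(const Node* n, const std::string& prefix,
                std::array<std::string, 256>& table) {
    if (n == nullptr) return;
    if (n->left == nullptr && n->right == nullptr) {
        // A tree holding a single symbol would yield an empty code; use "0".
        table[n->symbol] = prefix.empty() ? "0" : prefix;
        return;
    }
    buildTable(n->left, prefix + "0", table);
    buildTable(n->right, prefix + "1", table);
}

// Encode by table lookup instead of searching the tree for each character.
std::string encode(const std::string& text,
                   const std::array<std::string, 256>& table) {
    std::string bits;
    for (unsigned char c : text) {
        bits += table[c];
    }
    return bits;
}
```

The resulting string of '0' and '1' characters still has to be packed into bytes before writing, as described at the start of this answer.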
What you have in encoded is a string of 0s and 1s, and those are themselves characters.
You may want to convert the string to binary and then store it.
If you use a character (a byte) to store each 0 or 1, it takes more space: one byte per digit instead of one bit. If you pack the data into bits, it should take 44/8 rounded up, i.e. 6 bytes.
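As a rough illustration of that conversion, the sketch below packs a string of '0'/'1' characters into bytes. The name packBits and the least-significant-bit-first order are assumptions; a decoder would have to use the same bit order.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Pack a string of '0'/'1' characters into bytes, eight bits per byte.
// The last byte is padded with zero bits when the length is not a
// multiple of eight.
std::vector<unsigned char> packBits(const std::string& bits) {
    std::vector<unsigned char> out((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i] == '1') {
            out[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
        }
    }
    return out;
}
```

For the 44-character string from the question, this yields 6 bytes, which can be written with a stream opened in binary mode.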

How can I read and output the contents of a .txt file using C++?

As a school project, I have been assigned to count the frequency of occurrence of the 26 lowercase letters in a text and then encode them with a Huffman code. The basic requirements are (1) to read the given text file and print its contents on the terminal, and (2) to output the number of occurrences of each lowercase letter and its corresponding Huffman code.
I was able to accomplish task (2), but whenever I also try to output the text on the terminal, my code no longer counts the occurrences of the lowercase letters or outputs their Huffman codes.
[Screenshot: terminal showing only the text content]
[Screenshot: output without the inclusion of code for reading the text on terminal]
Here is my code for the assignment:
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <bits/stdc++.h>
int ch;
FILE *file;
using namespace std;
struct Node
{
// data members of each node of Huffman tree - alphabet and its frequency
char data;
unsigned freq;
// pointers to left and right childs
Node *left, *right;
// constructor to initialize the data members of the Huffman tree node
Node(char d, unsigned f)
{
data = d;
freq = f;
left = NULL;
right = NULL;
}
};
struct compare
{
// overloading () to compare which of left,right childs have higher frequency
bool operator()(Node* l, Node* r)
{
return (l->freq > r->freq);
}
};
// recursive function to display huffman codes
void display(struct Node* root, string str)
{
// if tree is empty return
if (!root)
return;
if (root->data != '$')
cout << root->data << ": " << str << "\n";
// recursively call left, right childs
display(root->left, str + "0");
display(root->right, str + "1");
}
// build huffman tree
void createHuffmanTree(char data[], int freq[], int size)
{
struct Node *left, *right, *top;
// Create a min heap using STL
priority_queue<Node*, vector<Node*>, compare> minHeap;
// push alphabets, frequency into minheap as leaf nodes
for (int i = 0; i < size; ++i)
minHeap.push(new Node(data[i], freq[i]));
// repeat until heap size becomes one
while (minHeap.size() != 1)
{
// Extract two least frequent items from minheap
left = minHeap.top();
minHeap.pop();
right = minHeap.top();
minHeap.pop();
// Create new internal node with two least frequent items as child nodes
// frequency of internal node is equal to sum of frequency of child nodes
top = new Node('$', left->freq + right->freq);
top->left = left;
top->right = right;
// push newly created internal node into minheap
minHeap.push(top);
}
// building Huffman tree is done
// display Huffman codes for alphabets
cout << "The Huffman codes for lower case characters: " << endl;
display(minHeap.top(), "");
}
// reading text file
void readTxt() {
FILE * txt;
char ch;
if (txt != NULL)
{
// open file to read english article
txt = fopen("D:\\data3.txt", "r");
do
{
ch = fgetc(txt);
putchar(ch);
} while(ch != EOF);
//fclose(txt);
} else {
cout << "There is no such .txt file in the system." << endl;
exit(0);
}
}
int main()
{
int charCount = 0;
int freq[26] = {0};
readTxt();
// reads english article from file and counts frequency of each lowercase alphabet
while (1)
{
ch = fgetc(file);
if (ch == EOF)
break;
if (ch >= 'a' && ch <= 'z')
{
freq[ch - 'a']++;
// count total number of lowercase alphabets in file
charCount++;
}
}
fclose(file);
// display total number of characters in file
cout << "Number of lowercase alphabets in file: " << charCount << endl;
char arr[] = { 'a','b','c','d','e','f',
'g','h','i','j','k','l',
'm','n','o','p','q','r',
's','t','u','v','w','x',
'y','z' };
// display frequency of lowercase characters
cout << "Frequency of lowercase characters: " << endl;
int freqCount = 0;
for(int i = 0; i < 26; i++)
{
if(freq[i]!=0)
{
cout << arr[i] << " : " << freq[i] << endl;
// count number of lowercase alphabets with frequency greater than zero
freqCount++;
}
}
char arr1[freqCount];
int k = 0, freq1[freqCount];
// copy lowercase alphabets with frequency greater than zero into arr1, freq1
for(int i = 0; i < 26; i++)
{
if(freq[i]!=0)
{
arr1[k] = arr[i];
freq1[k++] = freq[i];
}
}
// call method to create Huffman tree
createHuffmanTree(arr1, freq1, freqCount);
return 1;
}
It will be a great help if you guide me through my mistake.
TIA

How can I print the BFS path itself rather than the length of the path from this word ladder?

I received the starter code/algorithm for a word ladder that implements breadth-first search. The program takes a dictionary of words, but I modified it to take an input file. The algorithm I was given prints the length of the path from a source word to a target word. For example, if it takes 4 transformations to reach the target word, it will print 4. I want to print the path itself. For example, if the source word is "TOON" and the target word is "PLEA", it should print "TOON -> POON -> POIN -> PLIN -> PLIA -> PLEA".
So far I've tried to add a loop that appends the words in a queue to a vector, then returns the vector, but I am getting an error that I don't understand.
main.cpp:42:18: error: no matching member function for
call to 'push_back'
transformation.push_back(Q.front());
I have been stumped by this for a couple of days, so any help will be appreciated. I'm fairly new to C++, so forgive me for any errors.
Here is the code
#include<bits/stdc++.h>
#include <iostream>
using namespace std;
// To check if strings differ by exactly one character
bool nextWord(string & a, string & b) {
int count = 0; // counts how many differeces there
int n = a.length();
// Iterator that loops through all characters and returns false if there is more than one different letter
for (int i = 0; i < n; i++) {
if (a[i] != b[i]) {
count++;
}
if (count > 1) {
return false;
}
}
return count == 1 ? true : false;
}
// A queue item to store the words
struct QItem {
string word;
};
// Returns length of shortest chain to reach 'target' from 'start'
// using minimum number of adjacent moves. D is dictionary
int wordLadder(string & start, string & target, set < string > & D) {
vector < string > transformation;
// Create a queue for BFS and insert 'start' as source vertex
queue < QItem > Q;
QItem item = {
start
}; // Chain length for start word is 1
Q.push(item);
transformation.push_back(Q.front());
// While queue is not empty
while (!Q.empty()) {
// Take the front word
QItem curr = Q.front();
Q.pop();
// Go through all words of dictionary
for (set < string > ::iterator it = D.begin(); it != D.end(); it++) {
// Proccess the next word according to BFS
string temp = * it;
if (nextWord(curr.word, temp)) {
// Add this word to queue from the dictionary
item.word = temp;
Q.push(item);
// Pop from dictionary so that this word is not repeated
D.erase(temp);
// If we reached target
if (temp == target) {
return 0;
}
}
}
}
return 0;
}
string start;
string target;
// Driver program
int main() {
// make dictionary
std::ifstream file("english-words.txt");
set < string > D;
copy(istream_iterator < string > (file),
istream_iterator < string > (),
inserter(D, D.end()));
cout << endl;
cout << "Enter Start Word" << endl;
cin >> start;
cout << "Enter Target Word" << endl;
cin >> target;
cout << wordLadder(start, target, D);
return 0;
}
You're trying to append the wrong object to the vector<string>
Change
transformation.push_back(Q.front());
to
transformation.push_back(Q.front().word);
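That change resolves the compile error, but pushing queue items into a vector does not by itself give the path. A common approach, sketched here under assumptions, is to record each word's predecessor during the BFS and walk back from the target at the end. The names oneApart and ladderPath and the use of std::map are illustrative choices, not part of the starter code.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// True if a and b have equal length and differ in exactly one position.
bool oneApart(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    int diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i] && ++diff > 1) return false;
    }
    return diff == 1;
}

// BFS that remembers where each word was reached from, then rebuilds the
// path backwards from the target. Returns an empty vector if no ladder exists.
std::vector<std::string> ladderPath(const std::string& start,
                                    const std::string& target,
                                    const std::set<std::string>& dict) {
    std::map<std::string, std::string> parent;   // word -> predecessor
    std::queue<std::string> q;
    parent[start] = "";                          // empty string marks the start
    q.push(start);
    while (!q.empty()) {
        std::string cur = q.front();
        q.pop();
        if (cur == target) break;
        for (const std::string& w : dict) {
            if (parent.count(w) == 0 && oneApart(cur, w)) {
                parent[w] = cur;                 // first visit is the shortest
                q.push(w);
            }
        }
    }
    std::vector<std::string> path;
    if (parent.count(target) == 0) return path;
    for (std::string w = target; !w.empty(); w = parent[w]) {
        path.push_back(w);
    }
    std::reverse(path.begin(), path.end());
    return path;
}
```

The returned words can be printed with " -> " between them. Unlike the original, this version leaves the dictionary unmodified and uses the parent map to avoid revisiting words.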

Trying to update an array with huffman string code

My current program creates a Huffman tree filled with nodes of ASCII characters that are read from a text file, along with the number of times they appear in the text file (frequency). In addition, it outputs a unique code for each character read, based on frequency; this is done using my traverse function.
My problem: I have this string array that can hold codes for all 256 ascii values in my huffman function that by default is set to an empty string for each element. I have been trying to update my array by passing a parameter to my traverse function but it gives me an error.
Code E0413 - "No suitable conversion from std::string to char exists"
Parts of my code below along with some explanation of what the variables in my traverse function are:
'character' is the char that has been found in the text file
'frequency' gives the number of times a character is read in the text file
'traversecode' is the huffman code being generated by the traverse function
I have also commented out lines in my traverse function where I get the error.
struct node {
int frequency;
char character;
const node *child0;
const node *child1;
node(unsigned char c = 0, int i = -1) {
character = c;
frequency = i;
child0 = 0;
child1 = 0;
}
node(const node* c0, const node *c1) {
character = 0;
frequency = c0->frequency + c1->frequency;
child0 = c0;
child1 = c1;
}
bool operator<(const node &a) const {
return frequency > a.frequency;
}
void traverse(string codearray[256], string traversecode = "") const {
if (child0) {
child0->traverse(traversecode + '0'); // one line throwing the error
child1->traverse(traversecode + '1'); // second line that throws me the error
}
else {
codearray[int(character)] = traversecode;
cout << " " << character << " ";
cout << frequency;
cout << " " << traversecode << endl;
}
}
};
huffman function (function that contains array I would like to get updated)
void huffman(string code[256], const unsigned long long frequency[256]) {
priority_queue < node > q;
for (unsigned i = 0; i < 256; i++) {
if (frequency[i] == 0) {
code[i] = "";
}
}
for (int i = 0; i < 256; i++)
if (frequency[i])
q.push(node(i, frequency[i]));
while (q.size() > 1) {
node *child0 = new node(q.top());
q.pop();
node *child1 = new node(q.top());
q.pop();
q.push(node(child0, child1));
}
cout << "CHAR FREQUENCY HUFFMAN CODE" << endl;
q.top().traverse(code);
}
When you make the recursive call to traverse, you need to provide both parameters.
child0->traverse(codearray, traversecode + '0');
You're currently trying to pass what should be the second parameter as the first.
One other possible issue is that your code assumes that char is unsigned. If a char is signed, the access to codearray[int(character)] will access outside the bounds of codearray if the character is "negative" (or in the upper half of the ASCII table when using unsigned characters).
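One way to sidestep the signedness issue, shown as a small sketch, is to convert through unsigned char before indexing. The helper name byteIndex is hypothetical.

```cpp
#include <cstddef>

// Map a char to a table index in [0, 255], whether or not plain char
// is signed on the current platform.
inline std::size_t byteIndex(char c) {
    return static_cast<unsigned char>(c);
}
```

In traverse, the assignment would then read codearray[byteIndex(character)] = traversecode; instead of indexing with int(character).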

Trying to print out the line numbers of a file, using an int variable

I was working on a CS project in C++. We had to open a file, read the words, print them out in alphabetical order, and next to each word print the line it was found on. I used recursive functions to do this, but for some reason the counter I was using to mark the line numbers will not update. I tried using a pointer for it, but still nothing. I might have done the pointer wrong, but I made the int variable global, so that should have handled it, but still nothing. I already turned in the assignment, but I want to know why the counter never worked. There are a few hacks in this code, like converting the string with c_str(), but that was just to try to get my OR arguments to work; please ignore them.
Any advice?
#include<iostream>
#include<algorithm> //This is to do a comparison of ASCII characters.
#include<cctype> //This is to convert capital letters to lowercase.
#include<string> //This is to work with strings
#include<fstream> //This is to work with getline().
#include<cstring>
using namespace std;
// Node
struct node {
int line;
string word;
node* left;
node* right;
};
// MakeNode Function creates nodes
node* makeNode(string word, int line) {
node* newNode = new node();
newNode->word = word;
newNode->left = newNode->right = NULL;
return newNode;
delete newNode;
}
// Function to insert new nodes into the tree.
node* Insert(node* root,string word, int line) {
if(root == NULL) { // empty tree
root = makeNode(word, line);
}
// If word is less then root-word
else if(word <= root->word) {
root->left = Insert(root->left,word, line);
root->line = line;
}
// If word is greater then root-word
else {
root->right = Insert(root->right,word, line);
root->line = line;
}
return root;
}
// Print function to print tree
void printTree(node* root)
{
if (root == NULL){// If tree doesn't exist
return;
}
else{
printTree(root->left);
cout<<root->word<<"\t"<<root->line<<endl;
printTree(root->right);
}
}
int main() {
int lineNum = 1; // Set line equal to one.
// cout<<lineNum<<endl;
node* root = NULL; // Creating an empty tree
string word; // Var to hold word
ifstream quote ("quote.txt"); // Opens the text file
getline(quote,word,' '); // Gets the words from the text file.
root = Insert(root, word, lineNum);
while(!quote.eof() || word != "#"){// While loop to read all words from text file.
int * line = lineNum;
*line++;
cout<<"This is the word right now: "<<word<<endl;
char *newLine = new char[word.length()+1];
strcpy(newLine, word.c_str());
cout<<"This is the newLine[0] value right now: "<<newLine[0]<<endl;
cout<<"This is lineNum after the if"<<*line<<endl;
if(newLine[0] == '\n'){// Increments line by keeping track of \n line char.
cout<<"This is lineNum after the if"<<*line<<endl;
}
unsigned wordSize = word.size();
if(wordSize > 10){// Shortens word if it is longer then ten chars.
word.resize(10);
}
root = Insert(root, word, *line);
quote>>word;
delete[] newLine;
}// End of While
quote.close();
printTree(root);
}
/***********************************************************************
this is the content of the quote file:
civilization of science
science is knowledge
knowledge is our destiny
#// this hash is to mark the end of the paragraph.
/**************************************************************************
This is a scaled down version of what the problem is, as you can see the
*ptr variable is not updating.*/
int main()
{
int num = 1;
int *ptr = &num;
cout<<*ptr<<endl;
int key = 0;
while(key < 5){
cout<<"This is *prt: "<<*ptr<<endl;
cout<<"This is key: "<<key<<endl;
key++;
*ptr++;
}
return 0;
}
While Nathan Oliver is correct in his assessment you probably might also want to take a look at this:
int * line = lineNum;
I'm not sure why you're using a pointer but given the rest of your code I'm not sure A) it is doing what you think it is doing and B) why it isn't crashing. Were I a gambling man I'd wager this is your error right here.
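For reference, here is a sketch of what the pointer version was presumably meant to do. Two details matter: a pointer has to be initialized with an address (&lineNum), since standard C++ does not allow initializing an int* from a plain int variable, and *ptr++ parses as *(ptr++), which advances the pointer rather than the value it points to. The function names bump and countNewlines are made up for illustration.

```cpp
// Increment the int that p points to. The parentheses matter:
// *p++ would parse as *(p++), which moves the pointer, not the value.
inline void bump(int* p) {
    (*p)++;
}

// Count newline characters through a pointer to the counter.
inline int countNewlines(const char* text) {
    int lines = 0;
    int* line = &lines;   // take the address; `int* line = lines;` is an error
    for (const char* s = text; *s != '\0'; ++s) {
        if (*s == '\n') {
            bump(line);
        }
    }
    return lines;
}
```

In the scaled-down example from the question, replacing *ptr++; with (*ptr)++; (or simply ++num;) makes the printed value increase on each iteration.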