Huffman Decoding Function Uncompressing One Character Repeatedly - c++

I have a program that builds a Huffman tree from the frequencies of the ASCII characters read from a text input file. The Huffman codes are stored in a string array of 256 elements, with an empty string for each character that does not appear. The program also encodes and compresses the input into an output file.
I am now trying to decompress and decode that output file, which I open as an input file; a new output file should then contain a decoded message identical to the original text input file.
My plan for this part of the assignment is to rebuild the tree from the Huffman codes, then read the input 8 bits at a time and traverse the tree until I reach a leaf node, append that leaf's character to an initially empty string (string answer), and write the string to my output file.
My problem: after writing this function, I see that only one character from my original input file is output, over and over. I am confused about why, because I expect the output file to be identical to the original input file.
Any guidance or solution to this problem is appreciated.
(For the encodeOutput function, fileName is the input file parameter and fileName2 is the output file parameter.)
(For the decodeOutput function, fileName2 is the input file parameter and fileName3 is the output file parameter.)
code[256] is a parameter of both functions and holds the Huffman code for each unique character read from the original input file. For example, if the character 'H' is read from the input file, code[72] might hold the code "111" when the array is passed to the functions.
freq[256] holds the frequency of each ASCII character read, or 0 if the character does not appear in the original input file.
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
ifstream ifile;
ifile.open(fileName, ios::binary);
if (!ifile)
{
die("Can't read again");
}
ofstream ofile;
ofile.open(fileName2, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
int read;
read = ifile.get();
char buffer = 0, bit_count = 0;
while (read != -1) {
for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
buffer <<= 1;
buffer |= code[read][b] != '0';
bit_count++;
if (bit_count == 8) {
ofile << buffer;
buffer = 0;
bit_count = 0;
}
}
read = ifile.get();
}
if (bit_count != 0)
ofile << (buffer << (8 - bit_count));
ifile.close();
ofile.close();
}
// Work in progress
void decodeOutput(const string & fileName2, const string & fileName3, string code[256], const unsigned long long freq[256]) {
ifstream ifile;
ifile.open(fileName2, ios::binary);
if (!ifile)
{
die("Can't read again");
}
ofstream ofile;
ofile.open(fileName3, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
priority_queue < node > q;
for (unsigned i = 0; i < 256; i++) {
if (freq[i] == 0) {
code[i] = "";
}
}
for (unsigned i = 0; i < 256; i++)
if (freq[i])
q.push(node(unsigned(i), freq[i]));
if (q.size() < 1) {
die("no data");
}
while (q.size() > 1) {
node *child0 = new node(q.top());
q.pop();
node *child1 = new node(q.top());
q.pop();
q.push(node(child0, child1));
} // created the tree
string answer = "";
const node * temp = &q.top(); // root
for (int c; (c = ifile.get()) != EOF;) {
for (unsigned p = 8; p--;) { //reading 8 bits at a time
if ((c >> p & 1) == '0') { // if bit is a 0
temp = temp->child0; // go left
}
else { // if bit is a 1
temp = temp->child1; // go right
}
if (temp->child0 == NULL && temp->child1 == NULL) // leaf node
{
answer += temp->value;
temp = &q.top();
}
ofile << answer;
}
}
}

(c >> p & 1) == '0'
will only be true when (c >> p & 1) equals 48, the ASCII code of the character '0'. Since c >> p & 1 is always 0 or 1, your if statement always takes the else branch. The correct comparison is:
(c >> p & 1) == 0
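For illustration, here is a minimal sketch of the corrected tree walk, using the integer comparison above. The question's own node class is not shown, so Node, buildTree and decodeBytes are hypothetical names; the sketch rebuilds the tree directly from the codebook (one path per code string) instead of from a priority queue, and it appends the decoded text to a string that is written once after the loop rather than on every bit.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal node type (the question's node class is not shown).
struct Node {
    int value = -1;                        // symbol stored at a leaf, -1 for internal nodes
    std::unique_ptr<Node> child0, child1;  // child0 = bit 0 (left), child1 = bit 1 (right)
};

// Rebuild a decoding tree from the codebook: each code string is a path from the root.
std::unique_ptr<Node> buildTree(const std::vector<std::string>& code) {
    auto root = std::make_unique<Node>();
    for (int sym = 0; sym < static_cast<int>(code.size()); ++sym) {
        if (code[sym].empty()) continue;  // character did not appear in the input
        Node* n = root.get();
        for (char bit : code[sym]) {
            auto& next = (bit == '0') ? n->child0 : n->child1;
            if (!next) next = std::make_unique<Node>();
            n = next.get();
        }
        n->value = sym;
    }
    return root;
}

// Walk the tree bit by bit, most significant bit first.
std::string decodeBytes(const std::string& bytes, const Node* root) {
    std::string answer;
    const Node* temp = root;
    for (unsigned char c : bytes) {
        for (int p = 7; p >= 0; --p) {
            if (((c >> p) & 1) == 0) temp = temp->child0.get();  // bit is 0: go left
            else                      temp = temp->child1.get(); // bit is 1: go right
            if (!temp) return answer;                           // path not in the codebook
            if (!temp->child0 && !temp->child1) {                // leaf node
                answer += static_cast<char>(temp->value);
                temp = root;                                      // restart at the root
            }
        }
    }
    return answer;  // write this to the output file once, after decoding
}
```

With the codes a = "0", b = "10", c = "11", the single byte 0x58 (01011000) decodes to "abcaaa": "abca" plus two extra 'a' characters produced by the padding bits of the last byte, which is the problem the related questions below deal with.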

Related

Handling extra bytes in huffman compression/decompression

I have a program that builds a Huffman tree from the frequencies of the ASCII characters read from a text input file. The Huffman codes are stored in a string array of 256 elements, with an empty string for each character that does not appear. It also encodes and compresses an output file and currently has partial support for decompression and decoding.
In summary, my program takes an input file, compresses and encodes it into an output file, closes the output file, reopens the encoding as an input file, and writes a new output file that is supposed to contain a decoded message identical to the original text input file.
My problem is that in my test run, compressing produces 3 extra bytes, and when I decompress and decode the encoded file, these 3 extra bytes are decoded into my output file. Depending on the amount of text in the original input file, my other tests also output such extra bytes.
My research has led me to a few suggestions, such as making the first 8 bytes of the encoded output file the 64 bits of an unsigned long long that gives the number of bytes in the file, or using a pseudo-EOF. I am stuck on how to implement either one, and on which of the two is a sensible choice given the code I have already written, or whether either one is.
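The first option, a fixed 8-byte count at the start of the encoded file, can be sketched as follows. This is a minimal sketch under assumptions: writeCountHeader and readCountHeader are hypothetical names, and the least-significant-byte-first order is an arbitrary choice that only has to match between encoder and decoder.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write the number of original bytes as a fixed 8-byte header,
// least significant byte first (an arbitrary but fixed choice).
void writeCountHeader(std::ostream& out, std::uint64_t count) {
    for (int i = 0; i < 8; ++i)
        out.put(static_cast<char>((count >> (8 * i)) & 0xFF));
}

// Read the header back. The decoder would then stop after emitting
// exactly `count` characters, so the padding bits are never decoded.
std::uint64_t readCountHeader(std::istream& in) {
    std::uint64_t count = 0;
    for (int i = 0; i < 8; ++i) {
        int byte = in.get();  // 0..255, or EOF
        if (byte == std::char_traits<char>::eof()) return 0;  // truncated header (sketch-level handling)
        count |= static_cast<std::uint64_t>(byte) << (8 * i);
    }
    return count;
}
```

Since freq[] counts every byte of the original file, the encoder could compute the count as the sum of freq[0] through freq[255] before writing the compressed bits.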
Any guidance or solution to this problem is appreciated.
(For the encodeOutput function, fileName is the input file parameter and fileName2 is the output file parameter.)
(For the decodeOutput function, fileName2 is the input file parameter and fileName3 is the output file parameter.)
code[256] is a parameter of both functions and holds the Huffman code for each unique character read from the original input file. For example, if the character 'H' is read from the input file, code[72] might hold the code "111" when the array is passed to the functions.
freq[256] holds the frequency of each ASCII character read, or 0 if the character does not appear in the original input file.
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
ifstream ifile; //to read file
ifile.open(fileName, ios::binary);
if (!ifile)//to check if file is open or not
{
die("Can't read again"); // function that exits program if can't open
}
ofstream ofile;
ofile.open(fileName2, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
int read;
read = ifile.get(); //read one char from file and store it in int
char buffer = 0, bit_count = 0;
while (read != -1) { // run this loop until the end of file (-1) is reached
for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
buffer <<= 1;
buffer |= code[read][b] != '0';
bit_count++;
if (bit_count == 8) {
ofile << buffer;
buffer = 0;
bit_count = 0;
}
}
read = ifile.get();
}
if (bit_count != 0)
ofile << (buffer << (8 - bit_count));
ifile.close();
ofile.close();
}
void decodeOutput(const string & fileName2, const string & fileName3, string code[256], const unsigned long long freq[256]) {
ifstream ifile;
ifile.open(fileName2, ios::binary);
if (!ifile)
{
die("Can't read again");
}
ofstream ofile;
ofile.open(fileName3, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
priority_queue < node > q;
for (unsigned i = 0; i < 256; i++) {
if (freq[i] == 0) {
code[i] = "";
}
}
for (unsigned i = 0; i < 256; i++)
if (freq[i])
q.push(node(unsigned(i), freq[i]));
if (q.size() < 1) {
die("no data");
}
while (q.size() > 1) {
node *child0 = new node(q.top());
q.pop();
node *child1 = new node(q.top());
q.pop();
q.push(node(child0, child1));
} // created the tree
string answer = "";
const node * temp = &q.top(); // root
for (int c; (c = ifile.get()) != EOF;) {
for (unsigned p = 8; p--;) { //reading 8 bits at a time
if ((c >> p & 1) == '0') { // if bit is a 0
temp = temp->child0; // go left
}
else { // if bit is a 1
temp = temp->child1; // go right
}
if (temp->child0 == NULL && temp->child1 == NULL) // leaf node
{
answer += temp->value;
temp = &q.top();
}
}
}
ofile << answer;
}

Because of integral promotion, (buffer << (8 - bit_count)) is an int expression, so operator<< writes it as formatted decimal text (several digit characters) instead of a single byte. To write only one byte, cast the result to char:
ofile << char(buffer << (8 - bit_count));
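To see the difference concretely, the sketch below writes the same final byte both ways into an in-memory stream instead of a file. lastByteAsInt and lastByteAsChar are hypothetical helper names; buffer and bit_count have the same meaning as in the question's encodeOutput.

```cpp
#include <sstream>
#include <string>

// Original line: buffer << n is promoted to int, so operator<< formats it as decimal text.
std::string lastByteAsInt(char buffer, int bit_count) {
    std::ostringstream out;
    out << (buffer << (8 - bit_count));
    return out.str();
}

// Fixed line: converting back to char makes operator<< write exactly one raw byte.
// out.put(...) would work equally well.
std::string lastByteAsChar(char buffer, int bit_count) {
    std::ostringstream out;
    out << char(buffer << (8 - bit_count));
    return out.str();
}
```

For a buffer holding the 3 bits 101 (value 5), the first version writes the three characters "160", while the second writes the single byte 0xA0.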

Handling last byte in huffman compression/decompression

I have a program that builds a Huffman tree from the frequencies of the ASCII characters read from a text input file. The Huffman codes are stored in a string array of 256 elements, with an empty string for each character that does not appear. It also encodes and compresses an output file, and can then take the compressed file as an input file and decompress and decode it.
In summary, my program takes an input file, compresses and encodes it into an output file, closes the output file, reopens the encoding as an input file, and writes a new output file that is supposed to contain a decoded message identical to the original text input file.
My current problem: when decoding the compressed file, I get one or more extra characters that are not in the original input file. From what I understand, this is caused by the padding ("trash") bits in the last byte. Through research I found that one solution may be to use a pseudo-EOF character to stop decoding before the trash bits are read, but I am not sure how to implement this in my current encoding and decoding functions, so all guidance and help is much appreciated.
My end goal is to use this program to completely decode the encoded file without sending the trash bits to the output file.
Below are my two functions, encodeOutput and decodeOutput, which handle compression and decompression.
(For the encodeOutput function, fileName is the input file parameter and fileName2 is the output file parameter.)
(For the decodeOutput function, fileName2 is the input file parameter and fileName3 is the output file parameter.)
code[256] is a parameter of both functions and holds the Huffman code for each unique character read from the original input file. For example, if the character 'H' is read from the input file, code[72] might hold the code "111" when the array is passed to the functions.
freq[256] holds the frequency of each ASCII character read, or 0 if the character does not appear in the original input file.
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
ifstream ifile; //to read file
ifile.open(fileName, ios::binary);
if (!ifile)//to check if file is open or not
{
die("Can't read again"); // function that exits program if can't open
}
ofstream ofile;
ofile.open(fileName2, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
int read;
read = ifile.get(); //read one char from file and store it in int
char buffer = 0, bit_count = 0;
while (read != -1) { // run this loop until the end of file (-1) is reached
for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
buffer <<= 1;
buffer |= code[read][b] != '0';
bit_count++;
if (bit_count == 8) {
ofile << buffer;
buffer = 0;
bit_count = 0;
}
}
read = ifile.get();
}
if (bit_count != 0)
ofile << char(buffer << (8 - bit_count));
ifile.close();
ofile.close();
}
void decodeOutput(const string & fileName2, const string & fileName3, string code[256], const unsigned long long freq[256]) {
ifstream ifile;
ifile.open(fileName2, ios::binary);
if (!ifile)
{
die("Can't read again");
}
ofstream ofile;
ofile.open(fileName3, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
priority_queue < node > q;
for (unsigned i = 0; i < 256; i++) {
if (freq[i] == 0) {
code[i] = "";
}
}
for (unsigned i = 0; i < 256; i++)
if (freq[i])
q.push(node(unsigned(i), freq[i]));
if (q.size() < 1) {
die("no data");
}
while (q.size() > 1) {
node *child0 = new node(q.top());
q.pop();
node *child1 = new node(q.top());
q.pop();
q.push(node(child0, child1));
} // created the tree
string answer = "";
const node * temp = &q.top(); // root
for (int c; (c = ifile.get()) != EOF;) {
for (unsigned p = 8; p--;) { //reading 8 bits at a time
if ((c >> p & 1) == '0') { // if bit is a 0
temp = temp->child0; // go left
}
else { // if bit is a 1
temp = temp->child1; // go right
}
if (temp->child0 == NULL && temp->child1 == NULL) // leaf node
{
answer += temp->value;
temp = &q.top();
}
}
}
ofile << answer;
}

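Change the arrays to freq[257] and code[257], and set freq[256] to 1. Symbol 256 is then your pseudo-EOF: it appears exactly once in the stream, at the end. Because ifile.get() only returns values from 0 to 255 (or EOF), symbol 256 can never collide with a real character. After encoding all input characters, send the code for symbol 256. When you decode symbol 256, stop.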
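A minimal sketch of this scheme, assuming a 257-entry prefix-free codebook stored as a std::vector<std::string> instead of a fixed array. encodeWithEof and decodeWithEof are hypothetical names, and the decoder matches accumulated bits against the codebook by linear search for brevity; the tree walk from the first question would be faster in practice.

```cpp
#include <cstddef>
#include <string>
#include <vector>

const std::size_t kPseudoEof = 256;  // symbol 256 = pseudo-EOF, as in the answer

// Encode: append each character's code, then the pseudo-EOF code, and pack into bytes.
std::string encodeWithEof(const std::string& text, const std::vector<std::string>& code) {
    std::string bits;
    for (unsigned char ch : text) bits += code[ch];
    bits += code[kPseudoEof];
    std::string out;
    unsigned char buffer = 0;
    int bitCount = 0;
    for (char b : bits) {
        buffer = static_cast<unsigned char>((buffer << 1) | (b != '0'));
        if (++bitCount == 8) { out += static_cast<char>(buffer); buffer = 0; bitCount = 0; }
    }
    if (bitCount != 0) out += static_cast<char>(buffer << (8 - bitCount));  // pad with 0 bits
    return out;
}

// Decode: collect bits until they match a codeword; stop at the pseudo-EOF,
// so any padding bits after it are never decoded.
std::string decodeWithEof(const std::string& bytes, const std::vector<std::string>& code) {
    std::string out, current;
    for (unsigned char c : bytes) {
        for (int p = 7; p >= 0; --p) {
            current += (((c >> p) & 1) ? '1' : '0');
            for (std::size_t sym = 0; sym < code.size(); ++sym) {
                if (code[sym] == current) {  // prefix-free codes: at most one match
                    if (sym == kPseudoEof) return out;
                    out += static_cast<char>(sym);
                    current.clear();
                    break;
                }
            }
        }
    }
    return out;  // no pseudo-EOF found: stream was truncated
}
```

With the hypothetical codes a = "0", b = "10" and EOF = "11", "aab" encodes to the bits 001011 plus two padding zeros (one byte, 0x2C). Without the pseudo-EOF, those two padding zeros would decode as two extra 'a' characters.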

Huffman Decoding Compressed File

I have a program that builds a Huffman tree from the frequencies of the ASCII characters read from a text input file. The Huffman codes are stored in a string array of 256 elements, with an empty string for each character that does not appear. The program also encodes and compresses the input into an output file.
I am now trying to decompress and decode that output file, which I open as an input file; a new output file should then contain a decoded message identical to the original text input file.
My plan for this part of my assignment is to work backwards from my encoding function: read 8 bits at a time and decode the message by updating a variable (string n), initially empty, while recursing through the Huffman tree until I get a code to write to the output file.
I have started the function, but I am stuck and would like some guidance on writing decodeOutput. All help is appreciated.
My completed encodeOutput function and my work-in-progress decodeOutput function are below:
(For the encodeOutput function, fileName is the input file parameter and fileName2 is the output file parameter.)
(For the decodeOutput function, fileName2 is the input file parameter and fileName3 is the output file parameter.)
code[256] is a parameter of both functions and holds the Huffman code for each unique character read from the original input file. For example, if the character 'H' is read from the input file, code[72] might hold the code "111" when the array is passed to the functions.
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
ifstream ifile;//to read file
ifile.open(fileName, ios::binary);
if (!ifile) //to check if file is open or not
{
die("Can't read again");
}
ofstream ofile;
ofile.open(fileName2, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
int read;
read = ifile.get();//read one char from file and store it in int
char buffer = 0, bit_count = 0;
while (read != -1) {
for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
buffer <<= 1;
buffer |= code[read][b] != '0';
bit_count++;
if (bit_count == 8) {
ofile << buffer;
buffer = 0;
bit_count = 0;
}
}
read = ifile.get();
}
if (bit_count != 0)
ofile << (buffer << (8 - bit_count));
ifile.close();
ofile.close();
}
//Work in progress
void decodeOutput(const string & fileName2, const string & fileName3, string code[256]) {
ifstream ifile;
ifile.open(fileName2, ios::binary);
if (!ifile)
{
die("Can't read again");
}
ofstream ofile;
ofile.open(fileName3, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
string n = "";
for (int c; (c = ifile.get()) != EOF;) {
for (unsigned p = 8; p--;) {
if ((c >> p & 1) == '0') { // if bit is a 0
}
else if ((c >> p & 1) == '1') { // if bit is a 1
}
else { // Output string n (decoded character) to output file
ofile << n;
}
}
}
}

Decoding would be easier if you had the original Huffman tree used to construct the codebook. But suppose you only have the codebook (i.e., string code[256]) and not the original Huffman tree. Then you can do the following:
1. Partition the codebook into groups of codewords by length. Say the codebook contains codewords of n different lengths: L_0 < L_1 < ... < L_(n-1).
2. Read (but do not consume yet) k bits from the input file, with k increasing from L_0 up to L_(n-1), until the k input bits match a codeword of length k = L_i for some i.
3. Output the 8-bit character corresponding to the matching codeword, and consume the k bits from the input file.
4. Repeat until all bits from the input file are consumed.
If the codebook was constructed correctly and you always look up codewords in order of increasing length, you should never encounter a sequence of input bits for which you cannot find a matching codeword.
In terms of the equivalent Huffman tree, each time you compare k input bits with the group of codewords of length k, you are checking whether a leaf at tree level k holds a codeword matching the input; each time you move on to the next longer group of codewords, you are walking further down the tree (with the root at level 0).
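The procedure above can be sketched like this. decodeByCodeLength is a hypothetical name, and for readability the sketch assumes the compressed bytes have already been unpacked into a string of '0' and '1' characters; a real decoder would peek bits from the file instead.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// code: codebook as in the question ("" = character not present).
// bits: compressed data unpacked to '0'/'1' characters (assumption for clarity).
std::string decodeByCodeLength(const std::string& bits, const std::vector<std::string>& code) {
    // Group codewords by length: length -> (codeword -> symbol).
    // std::map iterates its keys in ascending order, giving L_0 < L_1 < ... < L_(n-1).
    std::map<std::size_t, std::map<std::string, int>> byLength;
    for (std::size_t sym = 0; sym < code.size(); ++sym)
        if (!code[sym].empty()) byLength[code[sym].size()][code[sym]] = static_cast<int>(sym);

    std::string out;
    std::size_t pos = 0;  // number of bits consumed so far
    while (pos < bits.size()) {
        bool matched = false;
        for (const auto& group : byLength) {
            std::size_t k = group.first;
            if (pos + k > bits.size()) break;                  // fewer than k bits left
            auto it = group.second.find(bits.substr(pos, k));  // peek k bits
            if (it != group.second.end()) {
                out += static_cast<char>(it->second);          // output the character
                pos += k;                                      // consume the k bits
                matched = true;
                break;
            }
        }
        if (!matched) break;  // leftover bits (e.g. last-byte padding) match no codeword
    }
    return out;
}
```

With a = "0", b = "10" and c = "11", the bits 01011 decode to "abc", and 0111 decodes to "ac" with the final lone 1 left unmatched.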

C++ peek giving value 'ÿ' (ifstream)

First, here is my code:
int GetHighScore(string name)
{
int highScore = 0;
ifstream fin;
char textInFile[50];
fin.open(name + ".txt", ios::in);
if (fin.fail())
{
// Old piece of code
highScore = 0;
}
else
{
while (fin.good())
{
fin >> textInFile;
for each (char var in textInFile)
{
if (var == '#')
{
char c = fin.peek();
if (c == '1')
{
char score = fin.peek();
highScoreLvl1 = (int)score;
}
else if (c == '2')
{
char score = fin.peek();
highScoreLvl2 = (int)score;
}
else if (c == '3')
{
char score = fin.peek();
highScoreLvl3 = (int)score;
}
}
}
}
//fin >> highScore;
}
// Return the high score found in the file
return highScoreLvl1;
}
It detects the '#', but then c is assigned the value 'ÿ' when it performs the peek operation. It should be the digit '1', '2' or '3' (as a char), but it isn't, and I can't see why... :/
Here's what the file looks like:
level#12level#22level#32
The first number represents the level, and the second number is the score achieved on that level.

If your file contains only the string 'level#12level#22level#32', then the whole string is read into textInFile by fin >> textInFile. When you meet the '#' character in that string, you try to peek the next character from the file stream, but there is nothing left to peek, so peek() returns EOF (-1). Stored in a char, -1 becomes the byte 0xFF, which is displayed as 'ÿ' in Latin-1/Windows-1252.
To fix this, take the next characters from the textInFile string, not from the file. Here is example code:
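The behaviour described above can be reproduced with an in-memory stream in place of the file. peekAfterFirstWord is a hypothetical helper that mimics the question: it reads one whitespace-delimited word and then peeks.

```cpp
#include <sstream>
#include <string>

// Read one whitespace-delimited word, then peek at what remains in the stream.
int peekAfterFirstWord(const std::string& content) {
    std::istringstream fin(content);
    std::string word;
    fin >> word;        // with no whitespace in the content, this consumes everything
    return fin.peek();  // nothing left to peek: returns EOF (-1)
}
```

For "level#12level#22level#32" this returns EOF, and converting that -1 to char gives the byte 0xFF shown as 'ÿ'. For "a b" it returns ' ', because operator>> stops at whitespace without consuming it.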
int GetHighScore(string name)
{
int highScore = 0;
ifstream fin;
char textInFile[50];
fin.open(name + ".txt", ios::in);
int highScoreLvl1 = 0, highScoreLvl2 = 0, highScoreLvl3 = 0;
if (fin.fail())
{
// Old piece of code
highScore = 0;
}
else
{
while (fin.good())
{
fin >> textInFile;
bool bPrevIsHash = false;
size_t nLength = strlen(textInFile);
for (size_t i = 0; i + 2 < nLength; ++i)
{
if (textInFile[i] == '#')
{
if (textInFile[i + 1] == '1')
{
highScoreLvl1 = (int)textInFile[i + 2];
}
else if (textInFile[i + 1] == '2')
{
highScoreLvl2 = (int)textInFile[i + 2];
}
else if (textInFile[i + 1] == '3')
{
highScoreLvl3 = (int)textInFile[i + 2];
}
}
}
}
}
// Return the high score found in the file
return highScoreLvl1;
}
And there are several other issues with your code:
You return the value of highScoreLvl1, which could be left uninitialized if there is no '#' in the string. You probably also mean to return the maximum of highScoreLvl1, highScoreLvl2 and highScoreLvl3.
You're assigning the value of a char converted to int. That does not give you 1, 2, etc.; you get the character's ASCII code, e.g. 0x31 (49) for '1', 0x32 (50) for '2', and so on. If you need the digit's value, subtract '0': highScoreLvl1 = textInFile[i + 2] - '0';
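Putting both points together, a minimal sketch that parses a line in the sample format might look like this. parseScores is a hypothetical name, and it assumes single-digit levels (1 to 3) and single-digit scores, as in the sample file; the scores start at 0, so nothing is returned uninitialized.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>

// Parses "level#<L><S>" entries where L and S are single digits (assumption from the sample file).
// Returns scores indexed by level 1..3; index 0 is unused; missing levels stay 0.
std::array<int, 4> parseScores(const std::string& text) {
    std::array<int, 4> scores{};  // zero-initialized
    for (std::size_t i = 0; i + 2 < text.size(); ++i) {
        if (text[i] != '#') continue;
        char level = text[i + 1], score = text[i + 2];
        if (level >= '1' && level <= '3' && std::isdigit(static_cast<unsigned char>(score)))
            scores[level - '0'] = score - '0';  // digit characters -> numeric values
    }
    return scores;
}
```

The highest score is then the largest of scores[1], scores[2] and scores[3].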

Alternating between reading and writing repeatedly

My objective is to read a file line by line, check whether each line contains some number, and if so, rewrite that line, then continue reading the file.
I've successfully been able to do this for one line, but I can't figure out how to continue reading the rest of the file.
Here's how I replace one line (every line is a known fixed size):
while(getline(fs, line)){
if(condition){
pos = fs.tellg(); //gets current read position (end of the line I want to change)
pos -= line.length()+1; //position of the beginning of the line
fs.clear(); //switch to write mode
fs.seekp(pos); //seek to beginning of line
fs << new_data; //overwrite old data with new data (also fixed size)
fs.close(); //Done.
continue;
}
}
How do I switch back to read and continue the getline loop?

I had the same problem: TB-scale files where I wanted to modify some header information at the beginning of the file.
Obviously, you have to leave enough room for any new content when you first create the file, because there is no way to increase the file size in place (besides appending to it), and the new line must have exactly the same length as the original one.
Here is a simplification of my code:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
bool CreateDummy()
{
ofstream out;
out.open("Dummy.txt");
// skip: test if open
out<<"Some Header"<<endl;
out<<"REPLACE1 12345678901234567890"<<endl;
out<<"REPLACE2 12345678901234567890"<<endl;
out<<"Now ~1 TB of data follows..."<<endl;
out.close();
return true;
}
int main()
{
CreateDummy(); // skip: test if successful
fstream inout;
inout.open("Dummy.txt", ios::in | ios::out);
// skip test if open
bool FoundFirst = false;
string FirstText = "REPLACE1";
string FirstReplacement = "Replaced first!!!";
bool FoundSecond = false;
string SecondText = "REPLACE2";
string SecondReplacement = "Replaced second!!!";
string Line;
size_t LastPos = inout.tellg();
while (getline(inout, Line)) {
if (FoundFirst == false && Line.compare(0, FirstText.size(), FirstText) == 0) {
// skip: check if Line.size() >= FirstReplacement.size()
while (FirstReplacement.size() < Line.size()) FirstReplacement += " ";
FirstReplacement += '\n';
inout.seekp(LastPos);
inout.write(FirstReplacement.c_str(), FirstReplacement.size());
FoundFirst = true;
} else if (FoundSecond == false && Line.compare(0, SecondText.size(), SecondText) == 0) {
// skip: check if Line.size() >= SecondReplacement.size()
while (SecondReplacement.size() < Line.size()) SecondReplacement += " ";
SecondReplacement += '\n';
inout.seekp(LastPos);
inout.write(SecondReplacement.c_str(), SecondReplacement.size());
FoundSecond = true;
}
if (FoundFirst == true && FoundSecond == true) break;
LastPos = inout.tellg();
}
inout.close();
return 0;
}
The input is:
Some Header
REPLACE1 12345678901234567890
REPLACE2 12345678901234567890
Now ~1 TB of data follows...
The output is:
Some Header
Replaced first!!!
Replaced second!!!
Now ~1 TB of data follows...
