Why does writing to a binary file produce an extra byte out of nowhere? - C++

I have a class for writing bits to a binary file:
class BITWRITER {
public:
    ofstream OFD;
    char var;
    int x;

    BITWRITER(char* pot) {
        OFD.open(pot);
        x = 0;
        var = 0;
    }

    void WRITE(bool b) {
        var ^= (-b ^ var) & (1 << x);
        x++;
        if (x == 7) {
            OFD.write(&var, 1);
            x = 0;
            var = 0;
        }
    }
};
And my sample code:
string bitCode = "0001010";
bool BitIsOne = false;
BITWRITER *write = new BITWRITER("out.bin");
for (int i = bitCode.length() - 1; i >= 0; i--) {
    if (bitCode[i] == '1')
        BitIsOne = true;
    else
        BitIsOne = false;
    write->WRITE(BitIsOne);
}
delete write;
What I don't get is why, when I run this exact code and then read the file back, the binary file contains two bytes instead of only one byte.
In this example the output should be "1010", but before it one random byte ("1101") is somehow created.
Any ideas would be appreciated!

Binary 1010 is 0x0a, which is a newline. You're opening the file without specifying that it should be opened in binary mode.
On Windows, when you write a newline to a file opened in text mode, it is translated to a CR/LF sequence. A carriage return is 0x0d, which is binary 1101.
Specify that the file should be opened in binary mode:
OFD.open(pot, ios::binary);
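For instance, here is a minimal stand-alone sketch (the file name is just an example) showing that, with ios::binary, writing the byte 0x0a puts exactly one byte in the file even on Windows:
#include <fstream>
using namespace std;

int main() {
    ofstream OFD;
    OFD.open("out.bin", ios::binary);   // binary mode: no newline translation
    char var = 0x0a;                    // the byte that was being expanded to 0x0d 0x0a
    OFD.write(&var, 1);                 // exactly one byte ends up in the file
}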

Related

Handling extra bytes in Huffman compression/decompression

I have a program that produces a Huffman tree based on the ASCII character frequencies read from a text input file. The Huffman codes are stored in a string array of 256 elements, with an empty string if the character does not occur. The program then encodes and compresses an output file, and currently has partial functionality for decompression and decoding.
In summary, my program takes an input file, compresses and encodes it into an output file, closes that output file, reopens the encoded file as input, and writes a new output file that is supposed to contain a decoded message identical to the original text input file.
My problem is that in my test run the compressed file has 3 extra bytes, and in turn, when I decompress and decode the encoded file, these 3 extra bytes are decoded into my output file. Depending on the amount of text in the original input file, my other tests also produce such extra bytes.
My research has led me to a few suggestions, such as making the first 8 bytes of the encoded output file the 64 bits of an unsigned long long giving the number of bytes in the file, or using a pseudo-EOF, but I am stuck on how to handle it and on which of the two is the smarter choice given the code I have already written, or whether either is a smart choice at all.
Any guidance or solution to this problem is appreciated.
(For the encodeOutput function, fileName is the input file parameter, fileName2 is the output file parameter.)
(For the decodeOutput function, fileName2 is the input file parameter, fileName3 is the output file parameter.)
code[256] is a parameter of both of these functions and holds the Huffman code for each unique character read from the original input file; for example, the character 'H' read from the input file may have the code "111" stored in code[72] at the time the array is passed to the functions.
freq[256] holds the frequency of each ASCII character read, or 0 if the character is not in the original input file.
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
    ifstream ifile; // to read file
    ifile.open(fileName, ios::binary);
    if (!ifile) // to check if file is open or not
    {
        die("Can't read again"); // function that exits program if can't open
    }
    ofstream ofile;
    ofile.open(fileName2, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    int read;
    read = ifile.get(); // read one char from file and store it in int
    char buffer = 0, bit_count = 0;
    while (read != -1) { // run this loop until end of file (-1) is reached
        for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
            buffer <<= 1;
            buffer |= code[read][b] != '0';
            bit_count++;
            if (bit_count == 8) {
                ofile << buffer;
                buffer = 0;
                bit_count = 0;
            }
        }
        read = ifile.get();
    }
    if (bit_count != 0)
        ofile << (buffer << (8 - bit_count));
    ifile.close();
    ofile.close();
}
void decodeOutput(const string & fileName2, const string & fileName3, string code[256], const unsigned long long freq[256]) {
    ifstream ifile;
    ifile.open(fileName2, ios::binary);
    if (!ifile)
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName3, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    priority_queue<node> q;
    for (unsigned i = 0; i < 256; i++) {
        if (freq[i] == 0) {
            code[i] = "";
        }
    }
    for (unsigned i = 0; i < 256; i++)
        if (freq[i])
            q.push(node(unsigned(i), freq[i]));
    if (q.size() < 1) {
        die("no data");
    }
    while (q.size() > 1) {
        node *child0 = new node(q.top());
        q.pop();
        node *child1 = new node(q.top());
        q.pop();
        q.push(node(child0, child1));
    } // created the tree
    string answer = "";
    const node * temp = &q.top(); // root
    for (int c; (c = ifile.get()) != EOF;) {
        for (unsigned p = 8; p--;) { // reading 8 bits at a time
            if ((c >> p & 1) == '0') { // if bit is a 0
                temp = temp->child0; // go left
            }
            else { // if bit is a 1
                temp = temp->child1; // go right
            }
            if (temp->child0 == NULL && temp->child1 == NULL) // leaf node
            {
                answer += temp->value;
                temp = &q.top();
            }
        }
    }
    ofile << answer;
}
Because of the integral promotion rules, (buffer << (8 - bit_count)) is an int expression, and operator<< formats an int as decimal text, so several characters are written instead of a single byte. To write only one byte, cast the result back to char:
ofile << char(buffer << (8 - bit_count));
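As a minimal stand-alone sketch of the difference (the buffer value and the file name are made up for illustration):
#include <fstream>
using namespace std;

int main() {
    ofstream ofile("demo.bin", ios::binary);
    char buffer = 0x05;                        // three leftover bits: 101
    int bit_count = 3;
    ofile << (buffer << (8 - bit_count));      // int expression: writes the text "160" (3 characters)
    ofile << char(buffer << (8 - bit_count));  // char: writes the single byte 0xa0
}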

Handling last byte in Huffman compression/decompression

I have a program that produces a Huffman tree based on the ASCII character frequencies read from a text input file. The Huffman codes are stored in a string array of 256 elements, with an empty string if the character does not occur. The program then encodes and compresses an output file, and can also take the compressed file as input and perform decompression and decoding.
In summary, my program takes an input file, compresses and encodes it into an output file, closes that output file, reopens the encoded file as input, and writes a new output file that is supposed to contain a decoded message identical to the original text input file.
My current problem with this program: when decoding the compressed file I get an extra character or so that is not in the original input file. From what I know, this is due to the trash bits at the end. My research suggests that one solution may be to use a pseudo-EOF character to stop decoding before the trash bits are read, but I am not sure how to implement this in my current encoding and decoding functions, so all guidance and help is much appreciated.
My end goal is for this program to decode the encoded file completely, without the trash bits being sent to the output file.
Below are the two functions, encodeOutput and decodeOutput, that handle the compression and decompression.
(For the encodeOutput function, fileName is the input file parameter, fileName2 is the output file parameter.)
(For the decodeOutput function, fileName2 is the input file parameter, fileName3 is the output file parameter.)
code[256] is a parameter of both of these functions and holds the Huffman code for each unique character read from the original input file; for example, the character 'H' read from the input file may have the code "111" stored in code[72] at the time the array is passed to the functions.
freq[256] holds the frequency of each ASCII character read, or 0 if the character is not in the original input file.
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
    ifstream ifile; // to read file
    ifile.open(fileName, ios::binary);
    if (!ifile) // to check if file is open or not
    {
        die("Can't read again"); // function that exits program if can't open
    }
    ofstream ofile;
    ofile.open(fileName2, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    int read;
    read = ifile.get(); // read one char from file and store it in int
    char buffer = 0, bit_count = 0;
    while (read != -1) { // run this loop until end of file (-1) is reached
        for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
            buffer <<= 1;
            buffer |= code[read][b] != '0';
            bit_count++;
            if (bit_count == 8) {
                ofile << buffer;
                buffer = 0;
                bit_count = 0;
            }
        }
        read = ifile.get();
    }
    if (bit_count != 0)
        ofile << char(buffer << (8 - bit_count));
    ifile.close();
    ofile.close();
}
void decodeOutput(const string & fileName2, const string & fileName3, string code[256], const unsigned long long freq[256]) {
    ifstream ifile;
    ifile.open(fileName2, ios::binary);
    if (!ifile)
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName3, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    priority_queue<node> q;
    for (unsigned i = 0; i < 256; i++) {
        if (freq[i] == 0) {
            code[i] = "";
        }
    }
    for (unsigned i = 0; i < 256; i++)
        if (freq[i])
            q.push(node(unsigned(i), freq[i]));
    if (q.size() < 1) {
        die("no data");
    }
    while (q.size() > 1) {
        node *child0 = new node(q.top());
        q.pop();
        node *child1 = new node(q.top());
        q.pop();
        q.push(node(child0, child1));
    } // created the tree
    string answer = "";
    const node * temp = &q.top(); // root
    for (int c; (c = ifile.get()) != EOF;) {
        for (unsigned p = 8; p--;) { // reading 8 bits at a time
            if ((c >> p & 1) == '0') { // if bit is a 0
                temp = temp->child0; // go left
            }
            else { // if bit is a 1
                temp = temp->child1; // go right
            }
            if (temp->child0 == NULL && temp->child1 == NULL) // leaf node
            {
                answer += temp->value;
                temp = &q.top();
            }
        }
    }
    ofile << answer;
}
Change it to freq[257] and code[257], and set freq[256] to one. Your EOF is symbol 256, and it will appear once in the stream, at the end. At the end of your encoding, send symbol 256. When you receive symbol 256 while decoding, stop.
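As a self-contained toy illustration of the idea (it uses a tiny hard-coded prefix code instead of your tree and arrays, and names like PSEUDO_EOF and emit are made up for the sketch), the encoder sends symbol 256 last and the decoder stops as soon as it sees it:
#include <iostream>
#include <map>
#include <sstream>
#include <string>
using namespace std;

const int PSEUDO_EOF = 256;                     // one symbol beyond the 256 byte values

int main() {
    // Toy prefix code standing in for code[257]: 'a' -> 0, 'b' -> 10, pseudo-EOF -> 11.
    string code[257];
    code['a'] = "0";
    code['b'] = "10";
    code[PSEUDO_EOF] = "11";

    // Encode "abba", then the pseudo-EOF symbol, then pad the final byte.
    ostringstream packed;
    char buffer = 0;
    int bit_count = 0;
    auto emit = [&](const string &cw) {
        for (char bit : cw) {
            buffer <<= 1;
            buffer |= bit != '0';
            if (++bit_count == 8) { packed.put(buffer); buffer = 0; bit_count = 0; }
        }
    };
    string message = "abba";
    for (unsigned char ch : message) emit(code[ch]);
    emit(code[PSEUDO_EOF]);                     // send symbol 256 last
    if (bit_count != 0) packed.put(char(buffer << (8 - bit_count)));

    // Decode: stop as soon as the pseudo-EOF codeword is recognized.
    map<string, int> lookup;
    for (int s = 0; s < 257; s++)
        if (!code[s].empty()) lookup[code[s]] = s;
    istringstream in(packed.str());
    string bits, out;
    bool done = false;
    for (int c; !done && (c = in.get()) != EOF;) {
        for (int p = 8; p-- && !done;) {
            bits += ((c >> p) & 1) ? '1' : '0';
            auto it = lookup.find(bits);
            if (it == lookup.end()) continue;            // not a complete codeword yet
            if (it->second == PSEUDO_EOF) done = true;   // stop: remaining bits are padding
            else { out += char(it->second); bits.clear(); }
        }
    }
    cout << out << '\n';                        // prints: abba
}
In your code the same idea means widening freq and code to 257 elements, emitting code[256] just before the final flush in encodeOutput, and returning from decodeOutput when the leaf for symbol 256 is reached.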

Huffman Decoding Function Uncompressing One Character Repeatedly

I have a program that produces a Huffman tree based on the ASCII character frequencies read from a text input file. The Huffman codes are stored in a string array of 256 elements, with an empty string if the character does not occur. The program also encodes and compresses an output file.
I am now trying to decompress and decode my current output file, which is opened as an input file, writing a new output file that should contain a decoded message identical to the original text input file.
My thought process for this part of the assignment is to recreate a tree from the Huffman codes and then, while reading 8 bits at a time, traverse the tree until I reach a leaf node, where I update an initially empty string (string answer) and then output it to my output file.
My problem: after writing this function, I see that only one character, in between all of the other characters of my original input file, gets output repeatedly. I am confused as to why this is the case, because I expect the output file to be identical to the original input file.
Any guidance or solution to this problem is appreciated.
(For the encodeOutput function, fileName is the input file parameter, fileName2 is the output file parameter.)
(For the decodeOutput function, fileName2 is the input file parameter, fileName3 is the output file parameter.)
code[256] is a parameter of both of these functions and holds the Huffman code for each unique character read from the original input file; for example, the character 'H' read from the input file may have the code "111" stored in code[72] at the time the array is passed to the functions.
freq[256] holds the frequency of each ASCII character read, or 0 if the character is not in the original input file.
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
    ifstream ifile;
    ifile.open(fileName, ios::binary);
    if (!ifile)
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName2, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    int read;
    read = ifile.get();
    char buffer = 0, bit_count = 0;
    while (read != -1) {
        for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
            buffer <<= 1;
            buffer |= code[read][b] != '0';
            bit_count++;
            if (bit_count == 8) {
                ofile << buffer;
                buffer = 0;
                bit_count = 0;
            }
        }
        read = ifile.get();
    }
    if (bit_count != 0)
        ofile << (buffer << (8 - bit_count));
    ifile.close();
    ofile.close();
}
// Work in progress
void decodeOutput(const string & fileName2, const string & fileName3, string code[256], const unsigned long long freq[256]) {
    ifstream ifile;
    ifile.open(fileName2, ios::binary);
    if (!ifile)
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName3, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    priority_queue<node> q;
    for (unsigned i = 0; i < 256; i++) {
        if (freq[i] == 0) {
            code[i] = "";
        }
    }
    for (unsigned i = 0; i < 256; i++)
        if (freq[i])
            q.push(node(unsigned(i), freq[i]));
    if (q.size() < 1) {
        die("no data");
    }
    while (q.size() > 1) {
        node *child0 = new node(q.top());
        q.pop();
        node *child1 = new node(q.top());
        q.pop();
        q.push(node(child0, child1));
    } // created the tree
    string answer = "";
    const node * temp = &q.top(); // root
    for (int c; (c = ifile.get()) != EOF;) {
        for (unsigned p = 8; p--;) { // reading 8 bits at a time
            if ((c >> p & 1) == '0') { // if bit is a 0
                temp = temp->child0; // go left
            }
            else { // if bit is a 1
                temp = temp->child1; // go right
            }
            if (temp->child0 == NULL && temp->child1 == NULL) // leaf node
            {
                answer += temp->value;
                temp = &q.top();
            }
            ofile << answer;
        }
    }
}
(c >> p & 1) == '0'
will only be true when (c >> p & 1) equals 48, the character code of '0'; since the expression is always 0 or 1, your if statement always takes the else branch. The correct comparison is:
(c >> p & 1) == 0
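A minimal stand-alone check (the byte value is just an example) shows that the extracted bit is always the integer 0 or 1, never the character '0':
#include <iostream>
using namespace std;

int main() {
    int c = 0xB2;                              // one encoded byte, for example
    for (unsigned p = 8; p--;) {
        int bit = (c >> p) & 1;                // always 0 or 1
        cout << bit << " =='0': " << (bit == '0') << " ==0: " << (bit == 0) << '\n';
    }
}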

Huffman Decoding Compressed File

I have a program that produces a Huffman tree based on the ASCII character frequencies read from a text input file. The Huffman codes are stored in a string array of 256 elements, with an empty string if the character does not occur. The program also encodes and compresses an output file.
I am now trying to decompress and decode my current output file, which is opened as an input file, writing a new output file that should contain a decoded message identical to the original text input file.
My thought process for this part of my assignment is to work backwards from the encoding function I have written and read 8 bits at a time, somehow decoding the message by updating a variable (string n), which is empty at first, through recursion over the Huffman tree until I get a character to output to the output file.
I have currently started the function, but I am stuck and am looking for some guidance on writing my decodeOutput function. All help is appreciated.
My completed encodeOutput function and my decodeOutput function so far are below:
(For the encodeOutput function, fileName is the input file parameter, fileName2 is the output file parameter.)
(For the decodeOutput function, fileName2 is the input file parameter, fileName3 is the output file parameter.)
code[256] is a parameter of both of these functions and holds the Huffman code for each unique character read from the original input file; for example, the character 'H' read from the input file may have the code "111" stored in code[72] at the time the array is passed to the functions.
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
    ifstream ifile; // to read file
    ifile.open(fileName, ios::binary);
    if (!ifile) // to check if file is open or not
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName2, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    int read;
    read = ifile.get(); // read one char from file and store it in int
    char buffer = 0, bit_count = 0;
    while (read != -1) {
        for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
            buffer <<= 1;
            buffer |= code[read][b] != '0';
            bit_count++;
            if (bit_count == 8) {
                ofile << buffer;
                buffer = 0;
                bit_count = 0;
            }
        }
        read = ifile.get();
    }
    if (bit_count != 0)
        ofile << (buffer << (8 - bit_count));
    ifile.close();
    ofile.close();
}
// Work in progress
void decodeOutput(const string & fileName2, const string & fileName3, string code[256]) {
    ifstream ifile;
    ifile.open(fileName2, ios::binary);
    if (!ifile)
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName3, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    string n = "";
    for (int c; (c = ifile.get()) != EOF;) {
        for (unsigned p = 8; p--;) {
            if ((c >> p & 1) == '0') { // if bit is a 0
            }
            else if ((c >> p & 1) == '1') { // if bit is a 1
            }
            else { // output string n (decoded character) to output file
                ofile << n;
            }
        }
    }
}
The decoding would be easier if you had the original Huffman tree used to construct the codebook. But suppose you only have the codebook (i.e., the string code[256]) and not the original Huffman tree. What you can do is the following:
1. Partition the codebook into groups of codewords with different lengths. Say the codebook consists of codewords with n different lengths: L0 < L1 < ... < Ln-1.
2. Read (but do not consume yet) k bits from the input file, with k increasing from L0 up to Ln-1, until you find a match between the k input bits and a codeword of length k = Li for some i.
3. Output the 8-bit character corresponding to the matching codeword, and consume the k bits from the input file.
4. Repeat until all bits from the input file are consumed.
If the codebook was constructed correctly, and you always look up codewords in order of increasing length, you should never encounter a sequence of input bits for which no matching codeword can be found.
Effectively, in terms of the equivalent Huffman tree, every time you compare k input bits with the group of codewords of length k, you are checking whether a leaf at tree level k contains a codeword matching the input; every time you increase k to the next longer group of codewords, you are walking down the tree to a deeper level (with level 0 as the root). A rough sketch of this lookup is shown below.
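A minimal sketch of that lookup, assuming already-open streams and the question's code[256] array (the function name decodeWithCodebook and its internals are made up for illustration):
#include <iostream>
#include <map>
#include <set>
#include <string>
using namespace std;

void decodeWithCodebook(istream &in, ostream &out, const string code[256]) {
    map<string, unsigned char> lookup;        // codeword -> original byte
    set<size_t> lengths;                      // distinct codeword lengths, L0 < L1 < ...
    for (int i = 0; i < 256; i++) {
        if (!code[i].empty()) {
            lookup[code[i]] = static_cast<unsigned char>(i);
            lengths.insert(code[i].size());
        }
    }
    string pending;                           // bits read from the file but not consumed yet
    for (int c; (c = in.get()) != EOF;) {
        for (unsigned p = 8; p--;)
            pending += ((c >> p) & 1) ? '1' : '0';
        bool matched = true;
        while (matched) {                     // match codewords at the front, shortest length first
            matched = false;
            for (size_t len : lengths) {
                if (pending.size() < len)
                    break;                    // need more input before trying longer codewords
                auto it = lookup.find(pending.substr(0, len));
                if (it != lookup.end()) {
                    out.put(it->second);      // emit the decoded character
                    pending.erase(0, len);    // consume the matched bits
                    matched = true;
                    break;
                }
            }
        }
    }
    // Padding bits in the final byte can still decode to spurious characters;
    // combine this with a pseudo-EOF symbol or a stored character count to avoid that.
}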

Change a text file in random access

I have a text file and want to change it in some places, for example in the byte range 4030 to 4060, in this way:
if there is a character 'C' or 'c' followed by 'G' or 'g', it must be changed to the character 'B'
The input file is a text file and I want a changed text file as output. There is no random access for text files, so I must open the file in binary mode and make the changes, but then the output file will be binary and I have no idea how to get text output. The code is below:
int main()
{
    string str, cstr;
    ReadTextFile("in", 4030, 4060);
    return 0;
}

string ReadTextFile(string path, int from, int to)
{
    fstream fp(path.c_str(), ios::in | ios::out | ios::binary);
    char *target;
    string res, str;
    target = new char[to - from + 1];
    if (!target)
    {
        cout << "Cannot allocate memory." << endl;
        return "";
    }
    fp.seekg(from);
    fp.read(target, to - from);
    target[to - from] = 0;
    res = target;
    str = changestring(res);
    fp.seekg(from);
    fp.write((char *)&str, to-from);
    return res;
}
string changestring(string str)
{
    int l = str.length();
    l = l - 1;
    for (int i = 0; i <= l; i++)
    {
        if (str[i] == 'C' || str[i] == 'c')
        {
            int j = i + 1;
            if (str[j] == 'G' || str[j] == 'g')
                str[i] = 'B';
        }
    }
    return str;
}
You misunderstand text and binary. It is not true that if you open a file in binary mode then the output will be binary. All binary mode does in practice is stop \n being translated to \r\n on Windows systems. In other words, it affects the line endings in a text file, nothing else.
You are getting binary garbage in your output because you are writing a string object's internal representation (its pointers) to the file instead of its characters. Write the characters to your file instead.
Change
fp.write((char *)&str, to-from);
to
fp.write(str.c_str(), to-from);
This would also work
fp << str;
To get text output, write text. It doesn't matter whether you do that with write or with <<; text is just text. Stop thinking in terms of two modes of output, binary and text. That view is not accurate: it's what you write, not how you write it, that determines whether the output is text or binary.
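For example, a minimal sketch of the corrected read-modify-write using a std::string buffer (the file name and byte range are the ones from the question; the rest of the names are illustrative):
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main() {
    fstream fp("in", ios::in | ios::out | ios::binary);
    if (!fp) { cerr << "Cannot open file.\n"; return 1; }

    const long from = 4030, to = 4060;
    string chunk(to - from, '\0');
    fp.seekg(from);
    fp.read(&chunk[0], chunk.size());              // read the byte range; it is still text

    for (size_t i = 0; i + 1 < chunk.size(); i++)  // apply the C/c followed by G/g rule
        if ((chunk[i] == 'C' || chunk[i] == 'c') &&
            (chunk[i + 1] == 'G' || chunk[i + 1] == 'g'))
            chunk[i] = 'B';

    fp.seekp(from);
    fp.write(chunk.c_str(), chunk.size());         // write the characters, not the string object
}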