Implementing Huffman Tree - c++

I have a program that builds a Huffman tree from the ASCII character frequencies in a text input file. The Huffman codes are stored in a string array of 256 elements, with an empty string for any character that does not appear in the input.
I am now trying to use the tree by writing a function that takes the Huffman codes stored in that string array and writes the encoding of the input file to an output file.
I soon realized that my current approach defeats the purpose of the assignment: I simply copy the code strings to the output file, which makes the encoded output file bigger than the input file.
I am hoping to get help changing my current function so that it writes individual bits to the output file, making the output file smaller than the input file. I am stuck because I am currently only reading and writing whole bytes.
My current function (fileName being the input file parameter, fileName2 being the output file parameter):
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
ifstream ifile;//to read file
ifile.open(fileName, ios::binary);
if (!ifile)//to check if file is open or not
{
die("Can't read again"); // function that exits program if can't open
}
ofstream ofile;
ofile.open(fileName2, ios::binary);
if (!ofile) {
die("Can't open encoding output file");
}
int read;
read = ifile.get();//read one char from file and store it in int
while (read != -1) {//run this loop until reached to end of file(-1)
ofile << code[read]; //put huffman code of character into output file
read = ifile.get();//read next character
}
ifile.close();
ofile.close();
}

You can't just use ofile << code[read]; if what you need is to write bits: the smallest unit ofstream understands is a byte.
To overcome that, you can write your bits to some sort of "bit buffer" (a char will do) and write that out once it holds 8 bits. I don't know exactly what your code strings look like, but assuming they consist of the characters '0' and '1', this should do:
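unsigned char buffer = 0; // bit buffer, filled from the most significant bit down
int bit_count = 0; // number of bits currently held in buffer
while (read != -1) {
    for (size_t b = 0; b < code[read].size(); b++) {
        buffer <<= 1; // make room for the next bit (a plain "buffer << 1;" discards the result)
        buffer |= (code[read][b] != '0'); // assumes the code strings hold the characters '0' and '1'
        bit_count++;
        if (bit_count == 8) {
            ofile.put(static_cast<char>(buffer)); // buffer is full: write exactly one byte
            buffer = 0;
            bit_count = 0;
        }
    }
    read = ifile.get();
}
if (bit_count != 0)
    ofile.put(static_cast<char>(buffer << (8 - bit_count))); // pad the last byte with zero bits on the right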
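To try the bit-buffer idea without touching any files, here is a minimal sketch of the same packing logic as a standalone function. The name packBits is made up for illustration, and it assumes the code strings contain only the characters '0' and '1'.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: packs a string of '0'/'1' characters into bytes,
// most significant bit first. The last byte is padded with zero bits.
std::vector<unsigned char> packBits(const std::string& bits) {
    std::vector<unsigned char> out;
    unsigned char buffer = 0; // bits collected so far
    int bitCount = 0;         // how many bits are in buffer
    for (char c : bits) {
        buffer = static_cast<unsigned char>((buffer << 1) | (c == '1' ? 1 : 0));
        if (++bitCount == 8) { // a full byte is ready
            out.push_back(buffer);
            buffer = 0;
            bitCount = 0;
        }
    }
    if (bitCount != 0) { // flush the partial last byte, left-aligned
        out.push_back(static_cast<unsigned char>(buffer << (8 - bitCount)));
    }
    return out;
}
```

Because the last byte is padded, a decoder cannot tell padding from real code bits by itself, so you will probably also want to store the number of valid bits (or the original character count) somewhere in the output file.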

C++ - Read the bytes of any file into an unsigned char array

I have an assignment where I have to implement the Rijndael Algorithm for AES-128 Encryption. I have the algorithm operational, but I do not have proper file input/output.
The assignment requires us to use parameters passed in from the command line. In this case, the parameter will be the file path to the particular file the user wishes to encrypt.
My problem is, I am lost as to how to read in the bytes of a file and store these bytes inside an array for later encryption.
I have tried using ifstream and ofstream to open, read, write, and close the files and it works fine for plaintext files. However, I need the application to take ANY file as input.
When I tried my method of using fstream with a pdf as input, it would crash my program. So, I now need to learn how to take the bytes of a file, store them inside an unsigned char array for Encryption, and then store them inside another file. This process of encryption and storage of ciphertext needs to occur in 16 byte intervals.
The below implementation is my first attempt to read files in binary mode and then write whatever was read in another file also in binary mode.
The output is readable in a hex reader.
int main(int argc, char* argv[])
{
if (argc < 2)
{
cerr << "Use: " << argv[0] << " SOURCE_FILEPATH" << endl << "Ex. \"C\\Users\\Anthony\\Desktop\\test.txt\"\n";
return 1;
}
// Store the Command Line Parameter inside a string
// In this case, a filepath.
string src_fp = argv[1];
string dst_fp = src_fp.substr(0, src_fp.find('.', 0)) + ".enc";
// Open the filepaths in binary mode
ifstream srcF(src_fp, ios::in | ios::binary);
ofstream dstF(dst_fp, ios::out | ios::binary);
// Buffer to handle the input and output.
unsigned char fBuffer[16];
srcF.seekg(0, ios::beg);
while (!srcF.eof())
{
srcF >> fBuffer;
dstF << fBuffer << endl;
}
dstF.close();
srcF.close();
}
The code implementation does not work as intended.
Any direction on how to solve my dilemma would be greatly appreciated.
Like you, I really struggled to find a way to read a binary file into a byte array in C++ that would produce the same hex values I see in a hex editor. After much trial and error, this seems to be the fastest way to do so without extra casts.
It would go faster without the counter, but then sometimes you end up with wide chars. I haven't found a better way to truly get one byte at a time.
It loads the entire file into memory, but only prints the first 800 bytes.
string Filename = "BinaryFile.bin";
FILE* pFile = fopen(Filename.c_str(), "rb");
if (pFile != NULL) // check the handle before using it
{
    fseek(pFile, 0L, SEEK_END);
    size_t size = ftell(pFile); // file size in bytes
    fseek(pFile, 0L, SEEK_SET);
    uint8_t* ByteArray = new uint8_t[size];
    size_t counter = 0;
    while (counter < size) { // read exactly size bytes; "<=" would write past the end of the array
        ByteArray[counter] = fgetc(pFile);
        counter++;
    }
    fclose(pFile);
    for (size_t i = 0; i < size && i < 800; i++) { // print at most the first 800 bytes
        printf("%02X ", ByteArray[i]);
    }
    delete[] ByteArray;
}
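Since the question asks for 16-byte intervals, here is a rough sketch of reading a file in binary mode one block at a time with ifstream::read and gcount. The names Block and readBlocks are invented for this example, and zero-filling the last partial block is only an assumption; the padding scheme your AES assignment expects may differ.

```cpp
#include <array>
#include <fstream>
#include <string>
#include <vector>

using Block = std::array<unsigned char, 16>; // one 16-byte block

// Hypothetical helper: reads the whole file as 16-byte blocks.
// A short final block is zero-filled (an assumption, not a standard padding).
std::vector<Block> readBlocks(const std::string& path) {
    std::vector<Block> blocks;
    std::ifstream in(path, std::ios::binary); // binary mode: no newline translation
    if (!in) return blocks;
    Block block{}; // starts zeroed
    // read() fails on a short final block, but gcount() still reports the bytes it got
    while (in.read(reinterpret_cast<char*>(block.data()), block.size()) || in.gcount() > 0) {
        blocks.push_back(block);
        block.fill(0); // clear before the next read so leftover bytes are zero
    }
    return blocks;
}
```

Each block can then be encrypted and written out with ofstream::write in binary mode. Note that operator>> and operator<< are formatted operations that skip whitespace and stop at null bytes, which is why they break on non-text files such as PDFs.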

Reading / Writing Strings to Binary Files

I have been experiencing a bug for the past day that I have not been able to solve.
I have my first method which is for saving player data:
bool Player::savePlayerData() {
ofstream writeFile(getName() + ".bin", ios::out | ios::binary | ios::trunc);
string writeData;
writeData = formatEntityData() + "<" + formatLocationData() + "<" + formatInventory();
writeFile.write(writeData.c_str(), writeData.length() + 1);
writeFile.close();
return true;
}
Note: Assume that getName(), formatEntityData(), formatLocationData(), and formatInventory() return strings and are functional.
Then I have my load player data method:
bool Player::loadPlayerData(string name) {
ifstream readFile(name + ".bin", ios::in | ios::binary | ios::_Nocreate);
if (readFile.good() && readFile.is_open()) {
string data;
getline(readFile, data, '\0');
vector<string> str = split(data, '<');
parseEntityData(str.at(0));
parseLocationData(str.at(1));
parseInventory(str.at(2));
readFile.close();
return true;
}
readFile.close();
return false;
}
Note: Assume that parseEntityData(), parseLocationData(), parseInventory() have string param, void returns and are functional
Note: Assume that split(string, char) takes in a string with a delim. char and splits into vector correctly
So, here is what I am trying to accomplish (for purposes of simplicity lets assume getName() return "luke"):
•Create luke.bin
•Save string to luke.bin in binary
•Load data from luke.bin in form of a string
When I run the program, it does not read the player data properly. Instead, it behaves as if the file were empty. What am I doing wrong? Any tips, ideas, or thoughts would be greatly appreciated.
Code on brothers!
"Typically when you open a binary file in notepad++ it gives seemingly random characters"
It depends on the data. The string "Hello world" is the same in binary or text. Numbers will appear as text if they are written as formatted text.
Example of text format:
fout << 1234 << std::endl; //saved as "1234"
Example of binary data:
int i = 1234;
fout.write(reinterpret_cast<const char*>(&i), sizeof(i)); //saved as sizeof(int) raw bytes (typically 4), big-endian or little-endian
ios::binary stops translation of new line characters.
When writing to file, put the exact size:
writeFile.write(writeData.c_str(), writeData.length());
When reading the file, getline(fin, data, '\0'); will stop when it reaches zero or end of file. You should use EOF instead of zero. Better yet, use this method:
std::ifstream f(filename, ios::binary);
if (f.good())
{
    f.seekg(0, ios::end);
    size_t filesize = (size_t)f.tellg();
    f.seekg(0);
    std::string data(filesize, 0);
    f.read(&data[0], filesize);
    cout << data << endl;
    return true;
}
return false;
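Putting the two halves together, a round trip might look roughly like the sketch below. saveString and loadString are hypothetical stand-ins for savePlayerData and loadPlayerData, not your actual methods; the point is only that the writer stores exactly size() bytes and the reader takes the length from the file itself.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical stand-in for the save method: writes exactly data.size() bytes.
bool saveString(const std::string& path, const std::string& data) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    return out.good();
}

// Hypothetical stand-in for the load method: reads the whole file into data.
bool loadString(const std::string& path, std::string& data) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg(); // file length in bytes
    if (size < 0) return false;
    in.seekg(0, std::ios::beg);
    data.assign(static_cast<std::size_t>(size), '\0');
    if (size > 0) in.read(&data[0], size);
    return in.gcount() == size; // true only if every byte was read
}
```

The loaded string can then go through split(data, '<') as before. Checking the return value of the load also tells you whether the file could be opened at all, which helps separate "file not found" from "file is empty".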

Reading a single character from a file returns special characters?

Using fstreams I'm attempting to read single characters from a specified location in a file and append them onto a string. For some reason, reading in these characters returns special characters. I've tried numerous things, but the more curious thing that I found while debugging was that changing the initial value of the char temp; will cause the whole string to change to that value.
int Class::numbers(int number, string& buffer) {
char temp;
if (number < 0 || buffer.length() > size) {
exit(0);
}
string fname = name + ".txt";
int start = number * size;
ifstream readin(fname.c_str());
readin.open(fname.c_str(), ios::in);
readin.seekg(start);
for (int i = 0; i < size; ++i) {
readin.get(temp);
buffer += temp;
}
cout << buffer << endl;
readin.close();
return 0;
}
Here is an example screenshot of the special characters being outputted: http://i.imgur.com/6HCI7TT.png
Could the issue be where I'm starting using seekg? It seems to start in the appropriate position. Another thing I've considered is that maybe I'm reading some invalid place into the stream and it's just giving me junk characters from memory.
Any thoughts?
WORKING SOLUTION:
int Class::numbers(int number, string& buffer) {
char temp;
if (number < 0 || buffer.length() > size) {
exit(0);
}
string fname = name + ".txt";
int start = number * size;
ifstream readin(fname.c_str()); // open the file once; the second readin.open() call is removed
readin.seekg(start);
for (int i = 0; i < size; ++i) {
readin.get(temp);
buffer += temp;
}
cout << buffer << endl;
readin.close();
return 0;
}
Here is the working solution. In my program I had already had this file name open, so opening it twice was likely to cause issues I suppose. I will do some further testing on this in my own time.
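For anyone who wants to check the double-open theory, here is a small sketch. The function names are made up for the example. Calling open() on an ifstream that is already open sets failbit, and a failed get(temp) leaves temp untouched, which would explain why the output followed the initial value of temp.

```cpp
#include <fstream>
#include <string>

// Hypothetical check: does a second open() on an already-open stream fail?
bool secondOpenFails(const std::string& path) {
    std::ifstream in(path); // the constructor already opens the file
    if (!in.is_open()) return false;
    in.open(path);          // the file buffer is already open, so failbit is set
    return in.fail();
}

// Hypothetical reader: returns up to count characters starting at offset,
// stopping as soon as a read fails instead of appending stale values.
std::string readChars(const std::string& path, std::streamoff offset, int count) {
    std::string result;
    std::ifstream in(path); // opened exactly once
    if (!in) return result;
    in.seekg(offset);
    char c;
    for (int i = 0; i < count && in.get(c); ++i) {
        result += c;
    }
    return result;
}
```

Testing the result of get() inside the loop condition also covers the case where the requested range runs past the end of the file.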
For characters with a numeric value greater than 127 (beyond the 7-bit ASCII range), the actual character rendered on screen depends on the code page of the system you are currently using.
What is likely happening is that you are not getting a single "character" as you think you are.
First, to debug this, use your existing code to just open and print out an entire text file. Is your program capable of doing this? If not, it's likely that the "text" file you are opening isn't using ASCII, but possibly UTF or some other form of encoding. That means when you read a "character" (8-bits most likely), you're just reading half of a 16-bit "wide character", and the result is meaningless to you.
For example, the gedit application will automatically render "Hello World" on screen as I'd expect, regardless of character encoding. However, in a hex editor, a UTF8 encoded file looks like:
UTF8 Raw text:
0000000: 4865 6c6c 6f20 776f 726c 642e 0a Hello world..
While UTF16 looks like:
0000000: fffe 4800 6500 6c00 6c00 6f00 2000 7700 ..H.e.l.l.o. .w.
0000010: 6f00 7200 6c00 6400 2e00 0a00 o.r.l.d.....
This is what your program sees. C/C++ expect ASCII encoding by default. If you want to handle other encodings, it's up to your program to accommodate it manually or by using a third-party library.
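One cheap way to see which of the cases above you are dealing with is to look at the first few bytes of the file for a byte order mark, like the ff fe at the start of the UTF-16 dump. This is only a sketch: the Bom enum and detectBom are names invented here, and many files (plain UTF-8 in particular) carry no mark at all.

```cpp
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Hypothetical helper: inspects the leading bytes of a buffer for a byte order mark.
// A result of None does not prove the data is ASCII; it only means no mark was found.
Bom detectBom(const std::string& bytes) {
    auto b = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };
    if (bytes.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF) return Bom::Utf8;
    if (bytes.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE) return Bom::Utf16LE; // UTF-32LE also starts this way
    if (bytes.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF) return Bom::Utf16BE;
    return Bom::None;
}
```

Read the file in binary mode before passing its contents in, so the bytes reach the function unchanged.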
Also, you aren't testing to see if you've exceeded the length of the file. You could just be grabbing random garbage.
Using a simple text file just containing the string "Hello World", can your program do this:
Code Listing
// read a file into memory
#include <iostream> // std::cout
#include <fstream> // std::ifstream
#include <string.h>
int main () {
std::ifstream is ("test.txt", std::ifstream::binary);
if (is) {
// get length of file:
is.seekg (0, is.end);
int length = is.tellg();
is.seekg (0, is.beg);
// allocate memory:
char * buffer = new char [length];
// read data as a block:
is.read (buffer,length);
// print content:
std::cout.write (buffer,length);
std::cout << std::endl;
// repeat at arbitrary locations:
for (int i = 0; i < length; i++ )
{
memset(buffer, 0x00, length);
is.seekg (i, is.beg);
is.read(buffer, length-i);
// print content:
std::cout.write (buffer,length);
std::cout << std::endl;
}
is.close();
delete[] buffer;
}
return 0;
}
Sample Output
Hello World
Hello World
ello World
llo World
lo World
o World
World
World
orld
rld
ld
d

C++ writing to text using buffer returns non ascii character at end of file.

I'm fairly new to C++ and am trying to read and write binary files. I have used the read and write functions to read text from one file and output it to a new file. However, the following characters always appear at the end of the created text file: "ÌÌ". Is a particular character indicating the end of the file being saved in the character buffer?
int main(){
ifstream myfile("example.txt", ios::ate);
ofstream outfile("new.txt");
ifstream::pos_type size;
char buf [1024];
if(myfile.is_open()){
size=myfile.tellg();
cout<<"The file's size is "<<(int) size<<endl;
myfile.seekg(0,ios::beg);
while(!myfile.eof()){
myfile.read(buf, sizeof(buf));
}
outfile.write(buf,size);
}
else
cout<<"Error"<<endl;
myfile.close();
outfile.close();
cin.get();
return 0;
}
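This is not the only problem with your code (try it on a file bigger than 1024 bytes), but since you are doing binary I/O you need to open both files in binary mode. In text mode on Windows, each CRLF pair is read as a single character, so read() fills fewer bytes than the size reported by tellg(), and write(buf, size) then outputs uninitialized bytes from the end of the buffer: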
ifstream myfile("example.txt", ios::ate|ios::binary);
ofstream outfile("new.txt", ios::binary);
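As for files bigger than 1024 bytes, one possible approach is to write each chunk as soon as it is read, using gcount() to find out how many bytes the last read() actually delivered. A minimal sketch (copyFile is just an illustrative name):

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copies src to dst in 1024-byte chunks, both in binary mode.
bool copyFile(const std::string& src, const std::string& dst) {
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out) return false;
    char buf[1024];
    // read() reports failure on the short final chunk, but gcount() still
    // says how many bytes were placed in buf, so write only that many.
    while (in.read(buf, sizeof(buf)) || in.gcount() > 0) {
        out.write(buf, in.gcount());
    }
    return out.good();
}
```

This avoids both looping on eof() and writing more bytes than were read, so no leftover buffer contents end up in the output.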

In c++ seekg seems to include cr chars, but read() drops them

I'm currently trying to read the contents of a file into a char array.
For instance, I have the following text in a char array. 42 bytes:
{
type: "Backup",
name: "BackupJob"
}
This file was created in Windows, and I'm using Visual Studio C++, so there are no OS compatibility issues.
However, executing the following code, at the completion of the for loop, I get Index: 39, with no 13 displayed prior to the 10's.
// Create the file stream and open the file for reading
ifstream fs;
fs.open("task.txt", ifstream::in);
int index = 0;
int ch = fs.get();
while (fs.good()) {
cout << ch << endl;
ch = fs.get();
index++;
}
cout << "----------------------------";
cout << "Index: " << index << endl;
return;
However, when I try to create a char array the length of the file, reading the file size as shown below counts the 3 additional CR chars, so that length is equal to 42, which leaves dodgy bytes at the end of the array.
// Create the file stream and open the file for reading
ifstream fs;
fs.seekg(0, std::ios::end);
length = fs.tellg();
fs.seekg(0, std::ios::beg);
// Create the buffer to read the file
char* buffer = new char[length];
fs.read(buffer, length);
buffer[length] = '\0';
// Close the stream
fs.close();
Using a hex viewer, I have confirmed that file does indeed contain the CRLF (13 10) bytes in the file.
There seems to be a disparity with getting the end of the file, and what the get() and read() methods actually return.
Could anyone please help with this?
Cheers,
Justin
You should open your file in binary mode. This will stop read() from dropping the CR characters.
fs.open("task.txt", ifstream::in|ifstream::binary);