Reading file returns weird results - c++

I am writing a compression program with a huffman tree.
It generates a file filled with a bit of overhead needed to decompress and a bunch of random bits, which are then split into pieces of 8 and turned into the char corresponding to those 8 bits. So essentially random chars, which are then written into a file.
When reading this file two problems occur:
The chars shown when I cout the random chars are different from the ones in the file.
My loop that reads the file stops only a few lines in.
I'm using the following function to read the file:
void Convertor::HuffmanToFile(string outputLocation){
string fileInfo, fileDataPiece;
ifstream inputFile;
ofstream outputFile;
stringstream fileData;
outputFile.open(outputLocation, ofstream::out | ofstream::trunc);
inputFile.open(inputLocation);
if (inputFile.fail()) {
cerr << "Error opening text file" << endl;
exit(1);
}
while (inputFile >> fileDataPiece){
fileData << fileDataPiece;
}
inputFile.close();
Decoder decoder(fileInfo,fileData.str());
outputFile << decoder.decodeInfo();
outputFile.close();
}
If anyone could give me a clue as to where I should look, that would be great!

Be careful when using operator>> to read from an istream into a string: it skips whitespace characters! My guess is that this is causing the differences.
You are loading the whole file at once anyway, and your way is unnecessarily complicated. One good way to do it in C++ is described here: Read whole ASCII file into C++ std::string
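A minimal sketch of reading the whole file without losing any characters, assuming the compressed data should be treated as raw bytes. The helper name readWholeFile is hypothetical and not part of the question's code. It opens the file in binary mode and copies the stream buffer, so whitespace bytes are kept and no newline translation takes place:

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns every byte of the file at `path`,
// including spaces, tabs and newlines that operator>> would skip.
std::string readWholeFile(const std::string& path) {
    // Binary mode: no newline translation, no special end-of-file byte.
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::ostringstream contents;
    contents << in.rdbuf();  // unformatted copy of the whole stream buffer
    return contents.str();
}
```

In the question's function, the result of such a helper could be passed to the Decoder in place of fileData.str(). The file would also need to be written in binary mode for the bytes to match.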

Related

C++: FStream thinks it has reached EOF of binary file

I'm trying to read a binary file with the following format:
-64 bit integer
-3276 32-bit floats
-(Repeat last 2 lines until eof)
This is the block where I interpret the file:
ifstream bbrFile;
ofstream csvFile;
bbrFile.open(inFilename);
csvFile.open(dataFilename);
//yes I did actually check to make sure that the files had opened.
//I omitted it here for brevity
long long int time;
float point;
while (bbrFile)
{
bbrFile.read((char*)&time, sizeof(time));
csvFile << time;
for (int i = 0; i < 3276; i++) {
bbrFile.read((char*)&point, sizeof(point));
csvFile << ',' << point;
}
csvFile << "\n";
}
So far, my code is working fine, except that it thinks it's reached the end of file after reading in about 53 floats, and then just outputs the last float it read until the 'for' loop ends. I've tried using fread and FILE* instead of read and fstream, and gotten identical results. I've also tried replacing
while (bbrFile)
with
while (!bbrFile.eof())
To no avail.
Since the binary file is about 12 megabytes, I'm somewhat at a loss as to why it stops reading here.
To read the file as a binary file, you should add binary to the file mode:
bbrFile.open(inFilename, ios::binary);
otherwise it will be read as a text file, and some codes could be interpreted as an end-of-file mark.
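As a hedged sketch of the same loop with the stream opened in binary mode and every read checked: the names Record and readRecords are illustrative, and the code assumes the file was written on a machine with the same byte order and type sizes. Testing the result of read() directly also avoids printing the last float repeatedly after a failed read:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// One record as described in the question: a 64-bit timestamp followed by
// a fixed number of floats. The struct name is an assumption.
struct Record {
    long long time;
    std::vector<float> points;
};

// Reads complete records until a read comes up short.
std::vector<Record> readRecords(const std::string& path, std::size_t floatsPerRecord) {
    std::vector<Record> records;
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    Record rec;
    rec.points.resize(floatsPerRecord);
    // Both reads must succeed before the record is kept, so stale data
    // from a previous iteration is never stored.
    while (in.read(reinterpret_cast<char*>(&rec.time), sizeof rec.time) &&
           in.read(reinterpret_cast<char*>(rec.points.data()),
                   static_cast<std::streamsize>(floatsPerRecord * sizeof(float)))) {
        records.push_back(rec);
    }
    return records;
}
```

For the file in the question, floatsPerRecord would be 3276.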

What should binary file look like after conversion from text?

Problem:
Split the binary I/O from the example code into two: one program that converts an ordinary text file into binary and one program that reads binary and converts into text. Test these programs by comparing a text file with what you get by converting it to binary and back.
Example code:
#include "std_lib_facilities.h"
int main(){
cout <<"Please enter input file name.\n";
string name;
cin >> name;
// open file to read, with no byte interpretation
ifstream ifs(name.c_str(), ios_base::binary);
if(!ifs) error("Can't open input file: ", name);
cout << "Please enter output file name.\n";
cin >> name;
// open file to write
ofstream ofs(name.c_str(), ios_base::binary);
if(!ofs) error("Can't open output file: ", name);
vector<int> v;
// read from binary file
int i;
while(ifs.read(as_bytes(i), sizeof(int))) v.push_back(i);
// do something with v
// write to binary file
for(int i = 0; i < v.size(); ++i) ofs.write(as_bytes(v[i]), sizeof(int));
return 0;
}
Here is my code, instead of reading and writing int values, I tried with strings:
#include "std_lib_facilities.h"
void textToBinary(string, string);
//--------------------------------------------------------------------------------
int main(){
const string info("This program converts text to binary files.\n");
cout << info;
const string testFile("test.txt");
const string binaryFile("binary.bin");
textToBinary(testFile, binaryFile);
getchar();
return 0;
}
//--------------------------------------------------------------------------------
void textToBinary(string ftest, string fbinary){
// open text file to read
ifstream ift(ftest);
if(!ift) error("Can't open input file: ", ftest);
// copy contents in vector
vector<string>textFile;
string line;
while (getline(ift,line)) textFile.push_back(line);
// open binary file to write
ofstream fb(fbinary, ios::binary);
if(!fb) error("Can't open output file: ", fbinary);
// convert text to binary, by writing the vector contents
for(size_t i = 0; i < textFile.size(); ++i){ fb.write(textFile[i].c_str(), textFile[i].length()); fb <<'\n';}
cout << "Conversion done!\n";
}
Note:
My text file contains Lorem Ipsum, no digits or special punctuation. After I write the text using binary mode, there is a perfect character interpretation and the source text file looks exactly like the destination. (My attention goes to the fact that when using binary mode and the function write(as_bytes(), sizeof()), the content of the text file is translated perfectly and there are no mistakes.)
Question:
What should the binary file look like after I use binary mode (no char interpretation) and the function write(as_bytes(), sizeof()) when writing?
In both Unix-land and Windows a file is primarily just a sequence of bytes.
With the Windows NTFS file system (which is default) you can have more than one sequence of bytes in the same file, but there is always one main sequence which is the one that ordinary tools see. To ordinary tools every file appears as just a single sequence of bytes.
Text mode and binary mode in C++ concern whether the basic i/o machinery should translate to and from an external convention. In Unix-land there is no difference. In Windows text mode translates newlines from internal single byte C convention (namely ASCII linefeed, '\n'), to external double byte Windows convention (namely ASCII carriage return '\r' + linefeed '\n'), and vice versa. Also, on input in Windows, encountering a single byte value 26, a "control Z", is or can be interpreted as end of file.
Regarding the literal question,
"The question is in what format are they written in the binary file, shouldn't they be written in not-interpreted form, i.e. raw bytes?"
the text is written as raw bytes in both cases. The difference is only about how newlines are translated to the external convention for newlines. Since your text 1) doesn't contain any newlines, there's no difference. Edit: Not shown in your code except by scrolling it sideways, there's a fb <<'\n' that outputs a newline to the file opened in binary mode, and if this produces the same bytes as in the original text file, then there is no effective translation, which implies you're not doing this in Windows.
About the extra streams for Windows files, they're used e.g. for Windows (file) Explorer's custom file properties, and they're accessible e.g. via a bug in the Windows command interpreter, like this:
C:\my\forums\so\0306>echo This is the main stream >x.txt
C:\my\forums\so\0306>dir | find "x"
04-Jul-15 08:36 PM 26 x.txt
C:\my\forums\so\0306>echo This is a second byte stream, he he >x.txt:2nd
C:\my\forums\so\0306>dir | find "x"
04-Jul-15 08:37 PM 26 x.txt
C:\my\forums\so\0306>type x.txt
This is the main stream
C:\my\forums\so\0306>type x.txt:2nd
The filename, directory name, or volume label syntax is incorrect.
C:\my\forums\so\0306>find /v "" <x.txt:2nd
This is a second byte stream, he he
C:\my\forums\so\0306>_
I just couldn't resist posting an example. :)
1) You state that “My text file contains Lorem Ipsum, no digits or special punctuation”, which indicates no newlines.
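One way to check what actually ended up in a file is to dump its bytes in hex. The following is only a sketch, with hexDump as a made-up helper name. It reads in binary mode so that the bytes are shown exactly as stored:

```cpp
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>

// Returns the bytes of the file as space-separated two-digit hex values.
std::string hexDump(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    char c;
    bool first = true;
    while (in.get(c)) {
        if (!first) out << ' ';
        first = false;
        // Go through unsigned char so bytes above 0x7f are not sign-extended.
        out << std::setw(2) << static_cast<unsigned>(static_cast<unsigned char>(c));
    }
    return out.str();
}
```

A file holding "a\nb" written in binary mode dumps as 61 0a 62. Written in text mode on Windows, the newline would show up as 0d 0a instead.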

How to extract specific substring from getline function in C++?

I'm fairly new to C++ so please forgive me if my terminology or methodology isn't correct.
I'm trying to write a simple program that:
Opens two input files ("infileicd" and "infilesel").
Opens a single output file "list.txt".
Compares "infilesel" to "infileicd" line by line.
If a line from "infilesel" is found in "infileicd", it writes that line from "infileicd" to "list.txt", effectively making a separate log file.
I am using the getline() function to do this but have run into trouble when trying to compare each file line. I think it might be easier if I could use only the substring of interest to use as a comparison.
The problem is that there are multiple words within the entire getline string and I am only really interested in the second one. Here are two examples:
"1529 nic1_mau_op_mode_3 "8664afm007-01" "1" OUTPUT 1 0 LOGICAL 4 4136"
"1523 pilot_mfd_only_sel "8664afm003-02" "1" OUTPUT 1 0 LOGICAL 4 4112"
"nic1_mau_op_mode_3" and "pilot_mfd_only_sel" are the only substrings of interest.
It would make it a lot easier if I could only use that second substring to compare but I don't know how to extract it specifically from the getline() function. I haven't found anything suggesting it is impossible to do this, but if it is impossible, what would be an alternative method for extracting that substring?
This is a personal project so I'm under no time constraints.
Any assistance is greatly appreciated in advance. Here is my code (so far):
int main()
{
//Open the file to write the selected variables to.
ofstream writer("list.txt");
//Open the selected variables file to be read.
ifstream infilesel;
infilesel.open("varsel.txt");
//Open the icd file to be read.
ifstream infileicd;
infileicd.open("aic_fdk_host.txt");
//Check icd file for errors.
if (infileicd.fail()){
cerr << "Error opening icd.\n" << endl;
return 1;
}
else {
cout << "The icd file has been opened.\n";
}
//Check selected variables file for errors.
if (infilesel.fail()){
cerr << "Error opening selection file.\n" << endl;
return 1;
}
else {
cout << "The selection file has been opened.\n";
}
//Read each infile and copy contents of icd file to the list file.
string namesel;
string nameicd;
while(!infileicd.eof()){
getline(infileicd, nameicd);
getline(infilesel, namesel);
if (nameicd != namesel){ //This is where I would like to extract and compare the two specific strings
infileicd; //Skip to next line if not the same
} else {
writer << nameicd << namesel << endl;
}
}
writer.close();
infilesel.close();
infileicd.close();
return 0;
}
So, based on what we discussed in the comments, you just need to toss the stuff you don't want. So try this:
string namesel;
string nameicd;
string junk;
// Test the reads themselves rather than eof(), so the loop ends as soon as a read fails.
// The condition also reads the first section of each line, which we'll ignore.
while (getline(infileicd, junk, ' ') && getline(infilesel, junk, ' ')) {
    // Get the real data
    getline(infileicd, nameicd, ' ');
    getline(infilesel, namesel, ' ');
    // Get the rest of the line, which we'll ignore
    getline(infileicd, junk);
    getline(infilesel, junk);
    // Compare nameicd and namesel here, as in your original loop
}
Basically, getline takes a delimiter, which by default is a newline. Setting it to a space for the first call gets rid of the first junk section; the same method then reads the part you want; and the final call, with the default delimiter, reads to the end of the line so that the rest is ignored as well.
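A different technique from the delimiter-based getline calls above is to read each full line and split it with std::istringstream. This is a sketch only, and secondField is a hypothetical helper name. It has the small advantage of tolerating runs of spaces or tabs between fields:

```cpp
#include <sstream>
#include <string>

// Returns the second whitespace-separated field of `line`,
// or an empty string if the line has fewer than two fields.
std::string secondField(const std::string& line) {
    std::istringstream fields(line);
    std::string first, second;
    if (fields >> first >> second) {
        return second;
    }
    return std::string();
}
```

With this, each file can be read using plain getline(infileicd, nameicd), and the comparison becomes secondField(nameicd) == secondField(namesel), assuming both files use the same column layout.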

Converting between text files and binary files in C++

To convert an ordinary text file into binary and then convert that binary file back to a text file, so that the first text file equals the last text file, I have written the code below.
But the bintex text file and the final text file aren't equal. I don't know which part of the code is incorrect.
Input sample ("bintex") contains this: 1983 1362
The result ("final") contains this: 959788084
which of course are not equal.
#include <iostream>
#include <fstream>
using namespace std;
int main() try
{
string name1 = "bintex", name2 = "texbin", name3 = "final";
ifstream ifs1(name1.c_str());
if(!ifs1) error("Can't open file for reading.");
vector<int>v1, v2;
int i;
while(ifs1.read(as_bytes(i), sizeof(int)));
v1.push_back(i);
ifs1.close();
ofstream ofs1(name2.c_str(), ios::binary);
if(!ofs1) error("Can't open file for writting.");
for(int i=0; i<v1.size(); i++)
ofs1 << v1[i];
ofs1.close();
ifstream ifs2(name2.c_str(), ios::binary);
if(!ifs2) error("Can't open file for reading.");
while(ifs2.read(as_bytes(i), sizeof(int)));
v2.push_back(i);
ifs2.close();
ofstream ofs2(name3.c_str());
if(!ofs2) error("Can't open file for writting.");
for(int i=0; i<v2.size(); i++)
ofs2 << v2[i];
ofs2.close();
keep_window_open();
return 0;
}
//********************************
catch(exception& e)
{
cerr << e.what() << endl;
keep_window_open();
return 0;
}
What is this?
while(ifs1.read(as_bytes(i), sizeof(int)));
It looks like a loop that reads all input and throws it away. The line afterward suggests that you should be using braces instead of a semicolon there, and doing the write in the block.
Your read and write operations aren't symmetric.
ifs1.read(as_bytes(i), sizeof(int))
grabs 4 bytes, and dumps the values into the char* it's passed.
ofs1 << v1[i];
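outputs the integer in v1[i] as text. Those are very, very different formats.
If you used >> to read you would have a lot more success.
To expound, the first read might look like this {'1','9','8','3'}, which, punned to an int on a typical little-endian machine with 4-byte ints, is 859322673 rather than 1983. Your second read would be {' ','1','3','6'}, likely not what you'd hoped for either. The 959788084 you are seeing is presumably the same effect applied again on the second pass: it is what the bytes {'4','4','5','9'} look like when read as an int.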
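A minimal sketch of a symmetric pair of conversions, assuming the text file holds whitespace-separated integers as in the question. The function names are illustrative, and the raw casts stand in for the as_bytes helper, whose code is not shown. Text is parsed with >> and formatted with <<, while the binary side uses read and write on sizeof(int) bytes:

```cpp
#include <fstream>
#include <string>

// Text -> binary: parse integers with >> and store each as raw bytes.
void textToBinary(const std::string& textPath, const std::string& binPath) {
    std::ifstream in(textPath.c_str());
    std::ofstream out(binPath.c_str(), std::ios::out | std::ios::binary);
    int value;
    while (in >> value) {
        out.write(reinterpret_cast<const char*>(&value), sizeof value);
    }
}

// Binary -> text: read sizeof(int) raw bytes at a time and format with <<.
void binaryToText(const std::string& binPath, const std::string& textPath) {
    std::ifstream in(binPath.c_str(), std::ios::in | std::ios::binary);
    std::ofstream out(textPath.c_str());
    int value;
    bool first = true;
    while (in.read(reinterpret_cast<char*>(&value), sizeof value)) {
        if (!first) out << ' ';  // separator so the values can be parsed again
        first = false;
        out << value;
    }
}
```

Note that the loop bodies use braces, so each value is handled inside the loop rather than once after it. The binary file produced this way is only meaningful on machines with the same int size and byte order.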
It's not clear (to me, at least), what you are trying to do.
When you say that the original file contains 1983 1362, what do
you really mean? That it contains two four byte integers, in
some unspecified format, whose values are 1983 and 1362? If so,
the problem is probably due to your machine not using the same
format. You cannot, in general, just read bytes (using
istream::read) and expect them to mean anything in your
machine's internal format. You have to read the bytes into
a buffer, and unformat them, according to the format with which
they were written.
Of course, opening a stream in binary mode doesn't mean that
the actual data are in some binary format; it just affects
things like how (or more strictly speaking, whether) line
endings are encoded, and how end of file is recognized.
(Strictly speaking, a binary file is not divided into lines. It
is just a sequence of bytes. Of course, some of those bytes
might have values that you, in your program, interpret as new
line characters.) If your file actually contains nine bytes
with characters corresponding to "1983 1362", then you'll have
to parse them as a text format, even if the file is written in
binary. You can do this by reading the entire file into
a string, and using std::istringstream; or, on most common
systems (but not necessarily on all exotics) by using >> to
read, just as you would with a text file.
EDIT:
Just a simple reminder: you don't show the code for as_bytes,
but I'm willing to guess that there's a reinterpret_cast in
it. And any time you have to use a reinterpret cast, you can be
very sure that what you're doing isn't portable, and if it's
supposed to be portable, you're doing it wrong.
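To illustrate what reading the bytes into a buffer and unformatting them can look like, here is a hedged sketch that fixes one possible external format: four bytes, least significant byte first. That format and the function names are assumptions for the example, not something the question specifies. No reinterpret_cast is involved, so the result does not depend on the machine's byte order:

```cpp
#include <cstdint>

// Formats `value` into four bytes, least significant byte first.
void encodeUint32LE(std::uint32_t value, unsigned char out[4]) {
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFFu);
    }
}

// Unformats four bytes laid out by encodeUint32LE back into a value.
std::uint32_t decodeUint32LE(const unsigned char in[4]) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    }
    return value;
}
```

Decoding the four characters '1', '9', '8', '3' this way gives 859322673, which shows why text bytes cannot simply be reinterpreted as an integer.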

C++ edit a binary file with another

Solved! thanks all of you very much. My day has been made!(well morning, its 4am)
I'm trying to write a program in C++ that opens a .dat file in binary and replaces the first 1840 hex characters with that of another .dat file, while leaving the remaining hex values of the first .dat file the same. I have spent about 12 hours on this today and have had little success. I am a beginner programmer, I have taken one semester worth of c++ courses and we did not get to streams.
(it opens a file and everything, but deletes every thing after the new values have been added)
#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <cmath>
#include <cstring>
using namespace std;
int main (){
string filename;
long size;
char* memblock;
cout << " Enter a file to be modded by Mod.dat ";
cin >> filename;
ofstream infile ( filename ,std::ofstream::binary);
//filename: the file that will be opened and changed)
ifstream modFile ("Mod.dat", ifstream::binary);
// (mod.dat is the file that i get the first 1840 hex values from)
modFile.seekg (0,modFile.end);
size = modFile.tellg();
memblock = new char [size];
modFile.seekg (0, ios::beg);
modFile.read (memblock, size);
infile.write(memblock, 1840);
modFile.close();
infile.close();
cout << endl;
return 0;
}
Any help would be greatly appreciated, I hope there is some simple way to do this.
Edit:
You can modify your file in place with something like:
std::fstream s(my_file_path, std::ios_base::in | std::ios_base::out | std::ios_base::binary);
s.seekp(position_of_data_to_overwrite, std::ios_base::beg);
s.write(my_data, size_of_data_to_overwrite);
std::fstream will not truncate your input file as std::ofstream does.
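A small sketch of the in-place variant, with overwriteAt as a hypothetical helper name. The stream is opened with both in and out so that the existing contents are kept, and only the bytes at the given offset are replaced:

```cpp
#include <fstream>
#include <string>

// Overwrites bytes starting at `offset` in an existing file without truncating it.
// Returns false if the file cannot be opened or the write fails.
bool overwriteAt(const std::string& path, std::streamoff offset, const std::string& data) {
    // in | out keeps the existing contents; the file must already exist.
    std::fstream file(path.c_str(), std::ios::in | std::ios::out | std::ios::binary);
    if (!file) {
        return false;
    }
    file.seekp(offset, std::ios::beg);
    file.write(data.data(), static_cast<std::streamsize>(data.size()));
    file.flush();  // push the bytes out so the stream state reflects the write
    return file.good();
}
```

For the question's case, data would hold the first 1840 bytes read from Mod.dat and offset would be 0.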
The other solution is to not use the same file for reading and writing. Use three files :
One for the output file.
One for the First input file.
One for the second input file.
ofstream infile ( filename ,std::ofstream::binary); does not keep the contents of the original file. Opening the file this way truncates it, so everything that was in it is erased.
Thus, you should:
open the output file
open the "Mod" file, read the first 1840 bytes from it, write them into the output file.
open the "main input file" file, move the cursor to 1840, read the remaining data and write it to the output file.
Depending on the "main input file" size, you may want to buffer your read/write operations.
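The three-file steps above can be sketched as follows. The function name patchPrefix and the 4096-byte buffer size are arbitrary choices for the example. If the mod file is shorter than count, only the bytes actually read are used:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Builds `outPath` from the first `count` bytes of `modPath`, followed by
// everything in `mainPath` after that many bytes.
bool patchPrefix(const std::string& mainPath, const std::string& modPath,
                 const std::string& outPath, std::size_t count) {
    std::ifstream mainIn(mainPath.c_str(), std::ios::in | std::ios::binary);
    std::ifstream modIn(modPath.c_str(), std::ios::in | std::ios::binary);
    std::ofstream out(outPath.c_str(), std::ios::out | std::ios::binary);
    if (!mainIn || !modIn || !out) {
        return false;
    }
    // Copy the prefix from the mod file.
    std::vector<char> prefix(count);
    modIn.read(prefix.data(), static_cast<std::streamsize>(count));
    const std::streamsize copied = modIn.gcount();
    out.write(prefix.data(), copied);
    // Skip the same number of bytes in the main file, then copy the rest
    // through a fixed-size buffer.
    mainIn.seekg(copied, std::ios::beg);
    char buffer[4096];
    while (mainIn.read(buffer, sizeof buffer) || mainIn.gcount() > 0) {
        out.write(buffer, mainIn.gcount());
    }
    out.flush();
    return out.good();
}
```

For the question, count would be 1840, and the output file could replace the original once it has been written successfully.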
My preferred fix, although Matthieu Rouget's fix does indeed work, is to just add ofstream::in to the mode used to open infile:
ofstream infile ( filename.c_str(), std::ofstream::binary | ofstream::in);
(I had to use c_str() in my build, as the standard library in my version doesn't take std::string as input; that overload was added in C++11.)
I tested this on my local system (it took a while to realize that mod.dat is actually "Mod.dat"!)
It is probably a good idea to also check that the files actually opened, so something like this after ofstream infile line:
if (!infile)
{
cout << "Couldn't open " << filename << endl;
}
and similar for the modfile line.
And since you go through the effort of figuring out what the first part of the modfile size is, I would suggest that you also USE that for the writing of the file.