I'm trying to read a binary file with the following format:
-64 bit integer
-3276 32-bit floats
-(Repeat last 2 lines until eof)
This is the block where I interpret the file:
ifstream bbrFile;
ofstream csvFile;
bbrFile.open(inFilename);
csvFile.open(dataFilename);
//yes I did actually check to make sure that the files had opened.
//I omitted it here for brevity
long long int time;
float point;
while (bbrFile)
{
bbrFile.read((char*)&time, sizeof(time));
csvFile << time;
for (int i = 0; i < 3276; i++) {
bbrFile.read((char*)&point, sizeof(point));
csvFile << ',' << point;
}
csvFile << "\n";
}
So far, my code is working fine, except that it thinks it's reached the end of file after reading in about 53 floats, and then just outputs the last float it read until the 'for' loop ends. I've tried using fread and FILE* instead of read and fstream, and gotten identical results. I've also tried replacing
while (bbrFile)
with
while (!bbrFile.eof())
To no avail.
Since the binary file is about 12 megabytes, I'm somewhat as a loss as to why it stops reading here.
To read the file as a binary file, you should add binary to the file mode:
bbrFile.open(inFilename, ios::binary);
otherwise it will be read as a text file, and some codes could be interpreted as an end-of-file mark.
Related
I am learning C++ with the "Programming: Principles and Practice Using C++" book from Bjarne Stroustrup. I am currently studying chapter 11 and I found an example on how to read and write binary files of integers (section 11.3.2). I played around with the example and used a .txt file (input.txt) with a sentence which I read and wrote to another file (output.txt) (text_to_binary fnc) and then read and wrote back to the original file (input.txt) (binary_to_text fnc).
#include<fstream>
#include<iostream>
using namespace std;
void text_to_binary(ifstream &ifs, ofstream &ofs)
{
for (int x; ifs.read(as_bytes(x), sizeof(char));)
{
ofs << x << '\n';
}
ofs.close();
ifs.close();
}
void binary_to_text(ifstream &ifs, ofstream &ofs)
{
for (int x; ifs >> x;)
{
ofs.write(as_bytes(x), sizeof(char));
}
ifs.close();
ofs.close();
}
int main()
{
string iname = "./chapter_11/input.txt";
string oname = "./chapter_11/output.txt";
ifstream ifs{iname, ios_base::binary};
ofstream ofs{oname, ios_base::binary};
text_to_binary(ifs, ofs);
ifstream ifs2{oname, ios_base::binary};
ofstream ofs2{iname, ios_base::binary};
binary_to_text(ifs2, ofs2);
return 0;
}
I figured out that I have to use sizeof(char) rather than sizeof(int) in the .read and .write command. If I use the sizeof(int) some chars of the .txt file go missing when I write them back to text. Funnily enough chars only goes missing if
x%4 != 0 (x = nb of chars in .txt file)
example with sizeof(int):
input.txt:
hello this is an amazing test. 1234 is a number everything else doesn't matter..asd
(text_to_binary fnc) results in:
output.txt:
1819043176
1752440943
1763734377
1851859059
1634558240
1735289210
1936028704
824192628
540291890
1629516649
1836412448
544367970
1919252069
1768453241
1696622446
543519596
1936027492
544483182
1953784173
774795877
(binary_to_text fnc) results back in:
input.txt:
hello this is an amazing test. 1234 is a number everything else doesn't matter..
asd went missing.
Now to my question, why does this happen? Is it because int's are saved as 4 bytes?
Bonus question: Out of interest, is there a simpler/more efficient way of doing this?
edit: updated the question with the results to make it hopefully more clear
When you attempt to do a partial read, the read will attempt to go beyond the end of the file and the eof flag will be set for the stream. This makes its use in the loop condition false so the loop ends.
You need to check gcount of the stream after the loop to see if any bytes was actually read into the variable x.
But note that partial reads will only write to parts of the variable x, leaving the rest indeterminate. Exactly which parts depends on the system endianness, and using the variable with its indeterminate bits will lead to undefined behavior.
I am writing a compression program with a huffman tree.
It generates a file filled with a bit of overhead de decompress and a bunch of random bits which are then split into pieces of 8 and turned into the char corresponding with those 8 bits. So essentially random chars. And then they are written into a file.
When reading this file two problems occur:
The chars shown when I cout the random chars is different from the ones in the file.
My loop that reads the file stops only a few lines in.
I'm using the following function to read the file:
void Convertor::HuffmanToFile(string outputLocation){
string fileInfo, fileDataPiece;
ifstream inputFile;
ofstream outputFile;
stringstream fileData;
outputFile.open(outputLocation, ofstream::out | ofstream::trunc);
inputFile.open(inputLocation);
if (inputFile.fail()) {
cerr << "Error opening text file" << endl;
exit(1);
}
while (inputFile >> fileDataPiece){
fileData << fileDataPiece;
}
inputFile.close();
Decoder decoder(fileInfo,fileData.str());
outputFile << decoder.decodeInfo();
outputFile.close();
}
If anyone could hand me a clue as to where I should look into that would be great!
Be careful when using operator>> from istream into string - it will skip over white space characters! My guess is that is causing the differences.
You are loading the whole file at once. Your way is unnecessary complicated. One good way to do it in C++ is described here: Read whole ASCII file into C++ std::string
Problem:
Split the binary I/O from the example code into two: one program that converts an ordinary text file into binary and one program that reads binary and converts into text. Test these programs by comparing a text file with what you get by converting it to binary and back.
Example code:
#include "std_lib_facilities.h"
int main(){
cout <<"Please enter input file name.\n";
string name;
cin >> name;
// open file to read, with no byte interpretation
ifstream ifs(name.c_str(), ios_base::binary);
if(!ifs) error("Can't open input file: ", name);
cout << "Please enter output file name.\n";
cin >> name;
// open file to write
ofstream ofs(name.c_str(), ios_base::binary);
if(!ofs) error("Can't open output file: ", name);
vector<int> v;
// read from binary file
int i;
while(ifs.read(as_bytes(i), sizeof(int))) v.push_back(i);
// do something with v
// write to binary file
for(int i = 0; i < v.size(); ++i) ofs.write(as_bytes(v[i]), sizeof(int));
return 0;
}
Here is my code, instead of reading and writing int values, I tried with strings:
#include "std_lib_facilities.h"
void textToBinary(string, string);
//--------------------------------------------------------------------------------
int main(){
const string info("This program converts text to binary files.\n");
cout << info;
const string testFile("test.txt");
const string binaryFile("binary.bin");
textToBinary(testFile, binaryFile);
getchar();
return 0;
}
//--------------------------------------------------------------------------------
void textToBinary(string ftest, string fbinary){
// open text file to read
ifstream ift(ftest);
if(!ift) error("Can't open input file: ", ftest);
// copy contents in vector
vector<string>textFile;
string line;
while (getline(ift,line)) textFile.push_back(line);
// open binary file to write
ofstream fb(fbinary, ios::binary);
if(!fb) error("Can't open output file: ", fbinary);
// convert text to binary, by writing the vector contents
for(size_t i = 0; i < textFile.size(); ++i){ fb.write(textFile[i].c_str(), textFile[i].length()); fb <<'\n';}
cout << "Conversion done!\n";
}
Note:
My text file contains Lorem Ipsum, no digits or special punctuation. After I write the text using binary mode, there is a perfect character interpretation and the source text file looks exactly like the destination. (My attention goes to the fact that when using binary mode and the function write(as_bytes(), sizeof()), the content of the text file is translated perfectly and there are not mistakes.)
Question:
How should the binary file look like after I use binary mode(no char interpretation) and the function write(as_bytes(), sizeof()) when writing?
In both Unix-land and Windows a file is primarily just a sequence of bytes.
With the Windows NTFS file system (which is default) you can have more than one sequence of bytes in the same file, but there is always one main sequence which is the one that ordinary tools see. To ordinary tools every file appears as just a single sequence of bytes.
Text mode and binary mode in C++ concern whether the basic i/o machinery should translate to and from an external convention. In Unix-land there is no difference. In Windows text mode translates newlines from internal single byte C convention (namely ASCII linefeed, '\n'), to external double byte Windows convention (namely ASCII carriage return '\r' + linefeed '\n'), and vice versa. Also, on input in Windows, encountering a single byte value 26, a "control Z", is or can be interpreted as end of file.
Regarding the literal question,
” The question is in what format are they written in the binary file, shouldn't they be written in not-interpreted form, i.e raw bytes?
the text is written as raw bytes in both cases. The difference is only about how newlines are translated to the external convention for newlines. Since your text 1)doesn't contain any newlines, there's no difference. Edit: Not shown in your code except by scrolling it sideways, there's a fb <<'\n' that outputs a newline to the file opened in binary mode, and if this produces the same bytes as in the original text file, then there is no effective translation, which implies you're not doing this in Windows.
About the extra streams for Windows files, they're used e.g. for Windows (file) Explorer's custom file properties, and they're accessible e.g. via a bug in the Windows command interpreter, like this:
C:\my\forums\so\0306>echo This is the main stream >x.txt
C:\my\forums\so\0306>dir | find "x"
04-Jul-15 08:36 PM 26 x.txt
C:\my\forums\so\0306>echo This is a second byte stream, he he >x.txt:2nd
C:\my\forums\so\0306>dir | find "x"
04-Jul-15 08:37 PM 26 x.txt
C:\my\forums\so\0306>type x.txt
This is the main stream
C:\my\forums\so\0306>type x.txt:2nd
The filename, directory name, or volume label syntax is incorrect.
C:\my\forums\so\0306>find /v "" <x.txt:2nd
This is a second byte stream, he he
C:\my\forums\so\0306>_
I just couldn't resist posting an example. :)
1) You state that “My text file contains Lorem Ipsum, no digits or special punctuation”, which indicates no newlines.
I am trying to read in a binary file and write in chunks to multiple output files. Say if there are 25 4byte numbers in total and chunk size is set to 20, the program will generate two output files one with 20 integers the other with 5. However if I have a input file with 40 integers, my program generates three files, first 2 files both have 20 numbers, however the third file has one number which is the last one from the input file and it is already included in the second output file. How do I force the read position to move forward every time reading a number?
ifstream fin("in.txt", ios::in | ios::binary);
if(fin.is_open())
{
while(!fin.eof()){
//set file name for each output file
fname[0] = 0;
strcpy(fname, "binary_chunk");
index[0] = 0;
sprintf(index, "%d", fcount);
strcat(fname, index);
// open output file to write
fout.open(fname);
for(i = 0; i < chunk; i++)
{
fin.read((char *)(&num), INT_SIZE);
fout << num << "\n";
if(fin.eof())
{
fout.close();
fin.close();
return;
}
}
fcount ++;
fout.close();
}
fout.close();
}
The problem is most likely your use of while (!fin.eof()). The eofbit flag is not set until after you have tried to read from beyond the end of the file. This means that the loop will loop one extra time without you noticing.
Instead you should remember that all input operations returns the stream object, and that stream objects can be used as boolean conditions. That means you can do like this:
while (fin.read(...))
This is safe from the problems with looping while !fin.eof().
And to answer your question in the title: Yes, the file position is moved when you successfully read anything. If you read X bytes, the read-position will be moved forward X bytes as well.
For converting an ordinary text file into binary and then convert that binary file back to a text file so that the first text file equals with the last text file, I have wrote below code.
But the bintex text file and the final text file aren't equal. I don't know which part of code is incorrect.
Input sample ("bintex") contains this: 1983 1362
The result ("final") contains this: 959788084
which of course are not equal.
#include <iostream>
#include <fstream>
using namespace std;
int main() try
{
string name1 = "bintex", name2 = "texbin", name3 = "final";
ifstream ifs1(name1.c_str());
if(!ifs1) error("Can't open file for reading.");
vector<int>v1, v2;
int i;
while(ifs1.read(as_bytes(i), sizeof(int)));
v1.push_back(i);
ifs1.close();
ofstream ofs1(name2.c_str(), ios::binary);
if(!ofs1) error("Can't open file for writting.");
for(int i=0; i<v1.size(); i++)
ofs1 << v1[i];
ofs1.close();
ifstream ifs2(name2.c_str(), ios::binary);
if(!ifs2) error("Can't open file for reading.");
while(ifs2.read(as_bytes(i), sizeof(int)));
v2.push_back(i);
ifs2.close();
ofstream ofs2(name3.c_str());
if(!ofs2) error("Can't open file for writting.");
for(int i=0; i<v2.size(); i++)
ofs2 << v2[i];
ofs2.close();
keep_window_open();
return 0;
}
//********************************
catch(exception& e)
{
cerr << e.what() << endl;
keep_window_open();
return 0;
}
What is this?
while(ifs1.read(as_bytes(i), sizeof(int)));
It looks like a loop that reads all input and throws it away. The line afterward suggests that you should be using braces instead of a semicolon there, and doing the write in the block.
Your read and write operations aren't symmetric.
ifs1.read(as_bytes(i), sizeof(int))
grabs 4 bytes, and dumps the values into the char* its passed.
ofs1 << v1[i];
output the integer in v[i] as text. Those are very very different formats.
If you used >> to read you would have a lot more success.
To expound, the first read might look like this {'1','9','8','3'}, which I would guess would be the 959788084 you are seeing when you pun it to an int. Your second read would be {' ','1','3','6'}, like not what you'd hoped for either.
It's not clear (to me, at least), what you are trying to do.
When you say that the orginal file contains 1983 1262, what do
you really mean? That it contains two four byte integers, in
some unspecified format, whose values are 1983 and 1262? If so,
the problem is probably due to your machine not using the same
format. You cannot, in general, just read bytes (using
istream::read) and expect them to mean anything in your
machine's internal format. You have to read the bytes into
a buffer, and unformat them, according to the format with which
they were written.
Of course, opening a stream in binary mode doesn't mean that
the actual data are in some binary format; it just affects
things like how (or more strictly speaking, whether) line
endings are encoded, and how end of file is recognized.
(Strictly speaking, a binary file is not divided into lines. It
is just a sequence of bytes. Of course, some of those bytes
might have values that you, in your program, interpret and new
line characters.) If your file actually contains nine bytes
with characters corresponding to "1983 1362", then you'll have
to parse them as a text format, even if the file is written in
binary. You can do this by reading the entire file into
a string, and usingstd::istringstream; _or_, on most common
systems (but not necessarily on all exotics) by using>>` to
read, just as you would with a text file.
EDIT:
Just a simple reminder: you don't show the code for as_bytes,
but I'm willing to guess that there's a reinterpret_cast in
it. And any time you have to use a reinterpret cast, you can be
very sure that what you're doing isn't portable, and if it's
supposed to be portable, you're doing it wrong.