Converting between text files and binary files in C++ - c++

For converting an ordinary text file into binary and then convert that binary file back to a text file so that the first text file equals with the last text file, I have wrote below code.
But the bintex text file and the final text file aren't equal. I don't know which part of code is incorrect.
Input sample ("bintex") contains this: 1983 1362
The result ("final") contains this: 959788084
which of course are not equal.
#include <iostream>
#include <fstream>
using namespace std;
int main() try
{
string name1 = "bintex", name2 = "texbin", name3 = "final";
ifstream ifs1(name1.c_str());
if(!ifs1) error("Can't open file for reading.");
vector<int>v1, v2;
int i;
while(ifs1.read(as_bytes(i), sizeof(int)));
v1.push_back(i);
ifs1.close();
ofstream ofs1(name2.c_str(), ios::binary);
if(!ofs1) error("Can't open file for writting.");
for(int i=0; i<v1.size(); i++)
ofs1 << v1[i];
ofs1.close();
ifstream ifs2(name2.c_str(), ios::binary);
if(!ifs2) error("Can't open file for reading.");
while(ifs2.read(as_bytes(i), sizeof(int)));
v2.push_back(i);
ifs2.close();
ofstream ofs2(name3.c_str());
if(!ofs2) error("Can't open file for writting.");
for(int i=0; i<v2.size(); i++)
ofs2 << v2[i];
ofs2.close();
keep_window_open();
return 0;
}
//********************************
catch(exception& e)
{
cerr << e.what() << endl;
keep_window_open();
return 0;
}

What is this?
while(ifs1.read(as_bytes(i), sizeof(int)));
It looks like a loop that reads all input and throws it away. The line afterward suggests that you should be using braces instead of a semicolon there, and doing the write in the block.

Your read and write operations aren't symmetric.
ifs1.read(as_bytes(i), sizeof(int))
grabs 4 bytes, and dumps the values into the char* its passed.
ofs1 << v1[i];
output the integer in v[i] as text. Those are very very different formats.
If you used >> to read you would have a lot more success.
To expound, the first read might look like this {'1','9','8','3'}, which I would guess would be the 959788084 you are seeing when you pun it to an int. Your second read would be {' ','1','3','6'}, like not what you'd hoped for either.

It's not clear (to me, at least), what you are trying to do.
When you say that the orginal file contains 1983 1262, what do
you really mean? That it contains two four byte integers, in
some unspecified format, whose values are 1983 and 1262? If so,
the problem is probably due to your machine not using the same
format. You cannot, in general, just read bytes (using
istream::read) and expect them to mean anything in your
machine's internal format. You have to read the bytes into
a buffer, and unformat them, according to the format with which
they were written.
Of course, opening a stream in binary mode doesn't mean that
the actual data are in some binary format; it just affects
things like how (or more strictly speaking, whether) line
endings are encoded, and how end of file is recognized.
(Strictly speaking, a binary file is not divided into lines. It
is just a sequence of bytes. Of course, some of those bytes
might have values that you, in your program, interpret and new
line characters.) If your file actually contains nine bytes
with characters corresponding to "1983 1362", then you'll have
to parse them as a text format, even if the file is written in
binary. You can do this by reading the entire file into
a string, and usingstd::istringstream; _or_, on most common
systems (but not necessarily on all exotics) by using>>` to
read, just as you would with a text file.
EDIT:
Just a simple reminder: you don't show the code for as_bytes,
but I'm willing to guess that there's a reinterpret_cast in
it. And any time you have to use a reinterpret cast, you can be
very sure that what you're doing isn't portable, and if it's
supposed to be portable, you're doing it wrong.

Related

C++ binary files I/O, data lost when writing

I am learning C++ with the "Programming: Principles and Practice Using C++" book from Bjarne Stroustrup. I am currently studying chapter 11 and I found an example on how to read and write binary files of integers (section 11.3.2). I played around with the example and used a .txt file (input.txt) with a sentence which I read and wrote to another file (output.txt) (text_to_binary fnc) and then read and wrote back to the original file (input.txt) (binary_to_text fnc).
#include<fstream>
#include<iostream>
using namespace std;
void text_to_binary(ifstream &ifs, ofstream &ofs)
{
for (int x; ifs.read(as_bytes(x), sizeof(char));)
{
ofs << x << '\n';
}
ofs.close();
ifs.close();
}
void binary_to_text(ifstream &ifs, ofstream &ofs)
{
for (int x; ifs >> x;)
{
ofs.write(as_bytes(x), sizeof(char));
}
ifs.close();
ofs.close();
}
int main()
{
string iname = "./chapter_11/input.txt";
string oname = "./chapter_11/output.txt";
ifstream ifs{iname, ios_base::binary};
ofstream ofs{oname, ios_base::binary};
text_to_binary(ifs, ofs);
ifstream ifs2{oname, ios_base::binary};
ofstream ofs2{iname, ios_base::binary};
binary_to_text(ifs2, ofs2);
return 0;
}
I figured out that I have to use sizeof(char) rather than sizeof(int) in the .read and .write command. If I use the sizeof(int) some chars of the .txt file go missing when I write them back to text. Funnily enough chars only goes missing if
x%4 != 0 (x = nb of chars in .txt file)
example with sizeof(int):
input.txt:
hello this is an amazing test. 1234 is a number everything else doesn't matter..asd
(text_to_binary fnc) results in:
output.txt:
1819043176
1752440943
1763734377
1851859059
1634558240
1735289210
1936028704
824192628
540291890
1629516649
1836412448
544367970
1919252069
1768453241
1696622446
543519596
1936027492
544483182
1953784173
774795877
(binary_to_text fnc) results back in:
input.txt:
hello this is an amazing test. 1234 is a number everything else doesn't matter..
asd went missing.
Now to my question, why does this happen? Is it because int's are saved as 4 bytes?
Bonus question: Out of interest, is there a simpler/more efficient way of doing this?
edit: updated the question with the results to make it hopefully more clear
When you attempt to do a partial read, the read will attempt to go beyond the end of the file and the eof flag will be set for the stream. This makes its use in the loop condition false so the loop ends.
You need to check gcount of the stream after the loop to see if any bytes was actually read into the variable x.
But note that partial reads will only write to parts of the variable x, leaving the rest indeterminate. Exactly which parts depends on the system endianness, and using the variable with its indeterminate bits will lead to undefined behavior.

C++: FStream thinks it has reached EOF of binary file

I'm trying to read a binary file with the following format:
-64 bit integer
-3276 32-bit floats
-(Repeat last 2 lines until eof)
This is the block where I interpret the file:
ifstream bbrFile;
ofstream csvFile;
bbrFile.open(inFilename);
csvFile.open(dataFilename);
//yes I did actually check to make sure that the files had opened.
//I omitted it here for brevity
long long int time;
float point;
while (bbrFile)
{
bbrFile.read((char*)&time, sizeof(time));
csvFile << time;
for (int i = 0; i < 3276; i++) {
bbrFile.read((char*)&point, sizeof(point));
csvFile << ',' << point;
}
csvFile << "\n";
}
So far, my code is working fine, except that it thinks it's reached the end of file after reading in about 53 floats, and then just outputs the last float it read until the 'for' loop ends. I've tried using fread and FILE* instead of read and fstream, and gotten identical results. I've also tried replacing
while (bbrFile)
with
while (!bbrFile.eof())
To no avail.
Since the binary file is about 12 megabytes, I'm somewhat as a loss as to why it stops reading here.
To read the file as a binary file, you should add binary to the file mode:
bbrFile.open(inFilename, ios::binary);
otherwise it will be read as a text file, and some codes could be interpreted as an end-of-file mark.

Editing bmp pixel by pixel [C++]

I want to make a program that will take an image and replace the Blue component of every pixel with 0.
So I wrote this. I have one bmp image in the folder and a copy of it and as the input file i put in the name of the original and as the output name i write the copy. But when i try to open the second one after the program works it doesnt open properly. Could anyone help?
#include <iostream>
#include <fstream>
#include <cstdlib>
using namespace std;
int main()
{
ifstream ifs;
ofstream ofs;
char input[80];
char output[80];
cout<<"Input file name"<<endl;
cin>>input;
ifs.open(input, ios::binary);
if(!ifs)
{
cout<<"Error in opening file"<<endl;
system("pause");
return 0;
}
cout<<"Output file name"<<endl;
cin>>output;
ofs.open(output, ios::binary);
ifs.seekg(2);
int file_size;
ifs.read((char*)&file_size, sizeof(int));
ofs<<"Bitmap size: "<<file_size<<"\r\n";
ifs.seekg(10);
int beg;
ifs.read((char*)&beg, sizeof(int));
ofs<<"Beggining of image: "<<beg<<"\r\n";
ifs.seekg(18);
int columns;
ifs.read((char*)&columns, sizeof(int));
ofs<<"Column number: "<<columns<<"\r\n";
ifs.seekg(22);
int rows;
ifs.read((char*)&rows, sizeof(int));
ofs<<"Row number: "<<rows<<"\r\n";
int image_size=0;
columns+=(3*columns)%4;
image_size=3*columns*rows;
ofs<<"Size of image"<<image_size<<"\r\n";
ifs.seekg(beg);
unsigned char R,G,B;
for(int i=beg; i<image_size+beg; i+=3)
{
ofs.seekp(i);
ofs<<char(0);
}
system("pause");
return 0;
}
There is no image file format that contains text like "Bitmap size: ", "Beginning of image: ", "Column number: ", "Row number: ", and "Size of image".
Even if there was such a file format, you are not writing "Beginning", you are writing "Beggining", and that would not work. Computers tend to be very partial to proper spelling.
Even if there was such a file format, it would not be the same as the file format that you are reading, because you are reading an int at offset 2 and interpreting it as some kind of file size, but you are not writing any size at offset 2 of your output file.
So, to cut a long story short, you have to have a very clear specification of the file format you are reading, (which you have told us nothing about,) and you also have to follow this exact same specification in writing the file.
Making up your own file format specification as you go along will not work.
Also, in the future, keep in mind that on stackoverflow, phrases like "it doesnt open properly" are not considered valid descriptions of technical issues. Be specific about precisely what is happening.
Hint: you appear to be trying to edit the file in-place, by seeking to individual bytes and overwriting them with zero. That won't work on an empty file. Copy the entire contents of the original file to the new filename, and then go seeking and overwriting bytes on the copy.
So I'll post my comment as answer:
I don't know much about BMP format, but... does it really contains strings such a "Size of image" or "Row number: "?
If not, remove ofs<<"Beggining of image: "<<beg<<"\r\n"; etc., I think that you meant cout instead of ofs.
Ok so in stead of reading a file to modify I just copy the whole content of the original file to the output file name and modify that. Thanks guys, and thanks Mike, I did that at your suggestion.

What should binary file look like after conversion from text?

Problem:
Split the binary I/O from the example code into two: one program that converts an ordinary text file into binary and one program that reads binary and converts into text. Test these programs by comparing a text file with what you get by converting it to binary and back.
Example code:
#include "std_lib_facilities.h"
int main(){
cout <<"Please enter input file name.\n";
string name;
cin >> name;
// open file to read, with no byte interpretation
ifstream ifs(name.c_str(), ios_base::binary);
if(!ifs) error("Can't open input file: ", name);
cout << "Please enter output file name.\n";
cin >> name;
// open file to write
ofstream ofs(name.c_str(), ios_base::binary);
if(!ofs) error("Can't open output file: ", name);
vector<int> v;
// read from binary file
int i;
while(ifs.read(as_bytes(i), sizeof(int))) v.push_back(i);
// do something with v
// write to binary file
for(int i = 0; i < v.size(); ++i) ofs.write(as_bytes(v[i]), sizeof(int));
return 0;
}
Here is my code, instead of reading and writing int values, I tried with strings:
#include "std_lib_facilities.h"
void textToBinary(string, string);
//--------------------------------------------------------------------------------
int main(){
const string info("This program converts text to binary files.\n");
cout << info;
const string testFile("test.txt");
const string binaryFile("binary.bin");
textToBinary(testFile, binaryFile);
getchar();
return 0;
}
//--------------------------------------------------------------------------------
void textToBinary(string ftest, string fbinary){
// open text file to read
ifstream ift(ftest);
if(!ift) error("Can't open input file: ", ftest);
// copy contents in vector
vector<string>textFile;
string line;
while (getline(ift,line)) textFile.push_back(line);
// open binary file to write
ofstream fb(fbinary, ios::binary);
if(!fb) error("Can't open output file: ", fbinary);
// convert text to binary, by writing the vector contents
for(size_t i = 0; i < textFile.size(); ++i){ fb.write(textFile[i].c_str(), textFile[i].length()); fb <<'\n';}
cout << "Conversion done!\n";
}
Note:
My text file contains Lorem Ipsum, no digits or special punctuation. After I write the text using binary mode, there is a perfect character interpretation and the source text file looks exactly like the destination. (My attention goes to the fact that when using binary mode and the function write(as_bytes(), sizeof()), the content of the text file is translated perfectly and there are not mistakes.)
Question:
How should the binary file look like after I use binary mode(no char interpretation) and the function write(as_bytes(), sizeof()) when writing?
In both Unix-land and Windows a file is primarily just a sequence of bytes.
With the Windows NTFS file system (which is default) you can have more than one sequence of bytes in the same file, but there is always one main sequence which is the one that ordinary tools see. To ordinary tools every file appears as just a single sequence of bytes.
Text mode and binary mode in C++ concern whether the basic i/o machinery should translate to and from an external convention. In Unix-land there is no difference. In Windows text mode translates newlines from internal single byte C convention (namely ASCII linefeed, '\n'), to external double byte Windows convention (namely ASCII carriage return '\r' + linefeed '\n'), and vice versa. Also, on input in Windows, encountering a single byte value 26, a "control Z", is or can be interpreted as end of file.
Regarding the literal question,
” The question is in what format are they written in the binary file, shouldn't they be written in not-interpreted form, i.e raw bytes?
the text is written as raw bytes in both cases. The difference is only about how newlines are translated to the external convention for newlines. Since your text 1)doesn't contain any newlines, there's no difference. Edit: Not shown in your code except by scrolling it sideways, there's a fb <<'\n' that outputs a newline to the file opened in binary mode, and if this produces the same bytes as in the original text file, then there is no effective translation, which implies you're not doing this in Windows.
About the extra streams for Windows files, they're used e.g. for Windows (file) Explorer's custom file properties, and they're accessible e.g. via a bug in the Windows command interpreter, like this:
C:\my\forums\so\0306>echo This is the main stream >x.txt
C:\my\forums\so\0306>dir | find "x"
04-Jul-15 08:36 PM 26 x.txt
C:\my\forums\so\0306>echo This is a second byte stream, he he >x.txt:2nd
C:\my\forums\so\0306>dir | find "x"
04-Jul-15 08:37 PM 26 x.txt
C:\my\forums\so\0306>type x.txt
This is the main stream
C:\my\forums\so\0306>type x.txt:2nd
The filename, directory name, or volume label syntax is incorrect.
C:\my\forums\so\0306>find /v "" <x.txt:2nd
This is a second byte stream, he he
C:\my\forums\so\0306>_
I just couldn't resist posting an example. :)
1) You state that “My text file contains Lorem Ipsum, no digits or special punctuation”, which indicates no newlines.

Binary file I/O issues

Edit: I'm trying to convert a text file into bytes. I'm not sure if the code is turning it into bytes or not. Here is the link to the header so you can see the as_bytes function.
link
#include "std_lib_facilities.h"
int main()
{
cout << "Enter input file name.\n";
string file;
cin >> file;
ifstream in(file.c_str(), ios::binary);
int i;
vector<int> bin;
while(in.read(as_bytes(i), sizeof(int)))
bin.push_back(i);
ofstream out(file.c_str(), ios::out);
for(int i = 0; i < bin.size(); ++i)
out << bin[i];
keep_window_open();
}
Note that now the out stream just outputs the contents of the vector. It doesn't use the write function or the binary mode. This converts the file to a large line of numbers - is this what I'm looking for?
Here is an example of the second code's file conversion:
that guy likes to eat lots of pie (not sure if this was exact text)
turns to
543518319544825700191924850016351970295432362115448292821701667182186922608417526375411952522351186935715718643976841768956006
The reason your first method didn't change the file is because all files are stored in the same way. The only "difference" between text files and binary files is that text files contain only bytes that can be shown as ASCII characters, while binary files* have a much more random variety and order of bytes. So you are reading bytes in as bytes and then outputting them as bytes again!
*I'm including Unicode text files as binary, since they can have multiple bytes to denote one character point, depending on the character point and the encoding used.
The second method is also fairly simple. You are reading in the bytes, as before, and storing them in integers (which are probably 4 bytes long). Then you are just printing out the integers as if they are integers, so you are seeing a string of numbers.
As for why your first method cut off some of the bytes, you're right in that it's probably some bug in your code. I thought it was more important to explain what the ideas are in this case, rather than debug some test code.