Binary file I/O issues - c++

Edit: I'm trying to convert a text file into bytes. I'm not sure if the code is turning it into bytes or not. Here is the link to the header so you can see the as_bytes function.
link
#include "std_lib_facilities.h"
int main()
{
cout << "Enter input file name.\n";
string file;
cin >> file;
ifstream in(file.c_str(), ios::binary);
int i;
vector<int> bin;
while(in.read(as_bytes(i), sizeof(int)))
bin.push_back(i);
ofstream out(file.c_str(), ios::out);
for(int i = 0; i < bin.size(); ++i)
out << bin[i];
keep_window_open();
}
Note that now the out stream just outputs the contents of the vector. It doesn't use the write function or the binary mode. This converts the file to a large line of numbers - is this what I'm looking for?
Here is an example of the second code's file conversion:
that guy likes to eat lots of pie (not sure if this was exact text)
turns to
543518319544825700191924850016351970295432362115448292821701667182186922608417526375411952522351186935715718643976841768956006

The reason your first method didn't change the file is because all files are stored in the same way. The only "difference" between text files and binary files is that text files contain only bytes that can be shown as ASCII characters, while binary files* have a much more random variety and order of bytes. So you are reading bytes in as bytes and then outputting them as bytes again!
*I'm including Unicode text files as binary, since they can have multiple bytes to denote one character point, depending on the character point and the encoding used.
The second method is also fairly simple. You are reading in the bytes, as before, and storing them in integers (which are probably 4 bytes long). Then you are just printing out the integers as if they are integers, so you are seeing a string of numbers.
As for why your first method cut off some of the bytes, you're right in that it's probably some bug in your code. I thought it was more important to explain what the ideas are in this case, rather than debug some test code.

Related

C++ ifstream will read some values then stop

I am trying to write a program that reads 940 4-byte long values of binary data [hex] from a bin file, and output the values to console. I have ifstream::read, cout and seekg operations in a loop.
It will work for the first 10 or so iterations, and then in one iteration skip the read and write operations, preform the seekg operation, and continue on reading and writing. Also the last 200 lines or so are coming out the same value.
It will work properly for 12 iterations, then it will start outputting the wrong numbers. At this point it goes from address 0x230 to 0x28B when it should be at 0x260. It looks like read and cout are not called in this particular iteration.
The last correct value reads 3f4fc938. The next value should be 3ef646c1.
Does anyone know why this would fail? Any help is appreciated.
This is the program:
int main(int argc, char* argv[]) {
fstream in;
uint32_t buffer;
in.open(argv[1]);
in.seekg(0x6500,in.beg);
for(int i = 0; i < 940; i++) {
in.read(reinterpret_cast<char*> (&buffer),4);
cout << hex << buffer << endl;
in.seekg(0x2c,in.cur);
}
}
You have opened your file in text mode. Text mode means that operations on the file will interpret a Byte sequence that matches the platform-specific representation of a newline as a single '\n' character. If you're on Windows, for example, newlines are represented as the Byte sequence 0D 0A. So on Windows, whatever you do in your file will work well up to the point where your file happens to have a Byte with value 13 followed by a Byte with value 10. Once you reach that point, that 13 followed by 10 will be interpreted as a single character. Essentially, text mode will just swallow any Byte with value 13 if it happens to appear right before a Byte with value 10. Your application will never see the 13 and anything beyond the point where the 13 appeared will end up "shifted" by one Byte. On other platforms, other newline representations are common. If you wanna work with binary data, you will generally want to open your file in binary mode, for example
fstream in(argv[1], std::ios::binary);
or
in.open(argv[1], std::ios::binary);

Binary file not holding data properly

Im currently trying to replace a text based file in my application with a binary one. Im just doing some early tests so the code isn't exactly safe but I'm having problems with the data.
When trying to read out the data it gets about half way before it starts coming back with incorrect results.
Im creating the file in c++ and my client application is c#. I think the problem is in my c++ (which I haven't used very much)
Where the problem is at the moment is I have a vector of a struct that is called DoubleVector3 which consists of 3 doubles
struct DoubleVector3 {
double x, y, z;
DoubleVector3(std::string line);
};
Im currently writing the variables individually to the file
void ObjElement::WriteToFile(std::string file) {
std::ofstream fileStream;
fileStream.open(file); //, ios::out | ios::binary);
// ^^problem was this line. it should be
// fileStream.open(file, std::ios_base::out | std::ios_base::binary);
fileStream << this->name << '\0';
fileStream << this->materialName << '\0';
int size = this->vertices.size();
fileStream.write((char*)&size,sizeof(size));
//i have another int written here
for (int i=0; i<this->vertices.size(); i++) {
fileStream.write((char*)&this->vertices[i].x, 8);
fileStream.write((char*)&this->vertices[i].y, 8);
fileStream.write((char*)&this->vertices[i].z, 8);
}
fileStream.close();
}
When I read the file in c# the first 6 sets of 3 doubles are all correct but then I start getting 0s and minus infinities
Am I doing anything obviously wrong in my WriteToFile code?
I have the file uploaded on mega if anyone needs to look at it
https://mega.co.nz/#!XEpHTSYR!87ihtCfnGXJJNn13iE6GIpeRhlhbabQHFfN88kr_BAk
(im writing the name and material in first then the number of vertices before the actual list of vertices)
Small side question - Should I delimit these doubles or just add them in one after the other?
To store binary data in a stream, you must add std::ios_base::binary to the stream's flags when opening it. Without this, the stream is opened in text mode and line-ending conversions can happen.
On Windows, line-ending conversions mean inserting a byte 0x0D (ASCII for carriage-return) before each 0x0A byte (ASCII for line-feed). Needless to say, this corrupts binary data.

visual studio c++ cin big string from command line

When I run the following program and paste 50000 symbols to the command line, the program gets 4096 symbols only. Could you please suggest me what to do in order to get the full list of symbols?
#include <iostream>
#include <string>
using namespace std;
int main()
{
char temp[50001];
while (cin.getline(temp, 50001, '\n'))
{
string s(temp);
cout << s.size() << endl;
}
return 0;
}
P.S.
When I read the symbols from file using fstream, it's OK
I'm taking a leap jump here but since many powershell terminals have 4096 truncation limits (take a look at the Out-File documentation), this is likely a Windows command line limitation rather than a getline limitation.
The same problem has been encountered previously by others: https://github.com/Discordia/large-std-input/blob/master/LargeStdInput/Main.cpp
I don't understand why you are reading into a character array, then transferring it into a string.
In any case, your issue may be with repeated allocations.
Reading into std::string directly
Two simple lines:
std::string s;
getline(cin, s, '\n');
Reading into an array first
Yes, there is a simpler method:
#define BUFFER_SIZE 8196 // Very important, named constant
char temp[BUFFER_SIZE];
cin.getline(temp, BUFFER_SIZE, '\n');
// Get the number of characters actually read
unsigned int chars_read = cin.gcount();
std::string s(temp, chars_read); // Here's how to transfer the characters.
Using a debugger, you need to view the value in chars_read to verify that the quantity of characters read is valid.
Binary reading
Some platforms provide translations between the data read and your program. For example, Windows uses Ctrl-Z as an EOF character; Linux uses Ctrl-D.
The input data may use UTF encoding and contain values outside the range of ASCII printable set.
So, the preferred method is to read from a stream opened in binary mode. Unfortunately, cin cannot be opened easily in binary mode.
See Open cin in binary
The preferred method, if possible, is to put the text into a file and read from the file.

Converting between text files and binary files in C++

For converting an ordinary text file into binary and then convert that binary file back to a text file so that the first text file equals with the last text file, I have wrote below code.
But the bintex text file and the final text file aren't equal. I don't know which part of code is incorrect.
Input sample ("bintex") contains this: 1983 1362
The result ("final") contains this: 959788084
which of course are not equal.
#include <iostream>
#include <fstream>
using namespace std;
int main() try
{
string name1 = "bintex", name2 = "texbin", name3 = "final";
ifstream ifs1(name1.c_str());
if(!ifs1) error("Can't open file for reading.");
vector<int>v1, v2;
int i;
while(ifs1.read(as_bytes(i), sizeof(int)));
v1.push_back(i);
ifs1.close();
ofstream ofs1(name2.c_str(), ios::binary);
if(!ofs1) error("Can't open file for writting.");
for(int i=0; i<v1.size(); i++)
ofs1 << v1[i];
ofs1.close();
ifstream ifs2(name2.c_str(), ios::binary);
if(!ifs2) error("Can't open file for reading.");
while(ifs2.read(as_bytes(i), sizeof(int)));
v2.push_back(i);
ifs2.close();
ofstream ofs2(name3.c_str());
if(!ofs2) error("Can't open file for writting.");
for(int i=0; i<v2.size(); i++)
ofs2 << v2[i];
ofs2.close();
keep_window_open();
return 0;
}
//********************************
catch(exception& e)
{
cerr << e.what() << endl;
keep_window_open();
return 0;
}
What is this?
while(ifs1.read(as_bytes(i), sizeof(int)));
It looks like a loop that reads all input and throws it away. The line afterward suggests that you should be using braces instead of a semicolon there, and doing the write in the block.
Your read and write operations aren't symmetric.
ifs1.read(as_bytes(i), sizeof(int))
grabs 4 bytes, and dumps the values into the char* its passed.
ofs1 << v1[i];
output the integer in v[i] as text. Those are very very different formats.
If you used >> to read you would have a lot more success.
To expound, the first read might look like this {'1','9','8','3'}, which I would guess would be the 959788084 you are seeing when you pun it to an int. Your second read would be {' ','1','3','6'}, like not what you'd hoped for either.
It's not clear (to me, at least), what you are trying to do.
When you say that the orginal file contains 1983 1262, what do
you really mean? That it contains two four byte integers, in
some unspecified format, whose values are 1983 and 1262? If so,
the problem is probably due to your machine not using the same
format. You cannot, in general, just read bytes (using
istream::read) and expect them to mean anything in your
machine's internal format. You have to read the bytes into
a buffer, and unformat them, according to the format with which
they were written.
Of course, opening a stream in binary mode doesn't mean that
the actual data are in some binary format; it just affects
things like how (or more strictly speaking, whether) line
endings are encoded, and how end of file is recognized.
(Strictly speaking, a binary file is not divided into lines. It
is just a sequence of bytes. Of course, some of those bytes
might have values that you, in your program, interpret and new
line characters.) If your file actually contains nine bytes
with characters corresponding to "1983 1362", then you'll have
to parse them as a text format, even if the file is written in
binary. You can do this by reading the entire file into
a string, and usingstd::istringstream; _or_, on most common
systems (but not necessarily on all exotics) by using>>` to
read, just as you would with a text file.
EDIT:
Just a simple reminder: you don't show the code for as_bytes,
but I'm willing to guess that there's a reinterpret_cast in
it. And any time you have to use a reinterpret cast, you can be
very sure that what you're doing isn't portable, and if it's
supposed to be portable, you're doing it wrong.

Reading file byte by byte with ifstream::get

I wrote this binary reader after a tutorial on the internet. (I'm trying to find the link...)
The code reads the file byte by byte and the first 4 bytes are together the magic word. (Let's say MAGI!) My code looks like this:
std::ifstream in(fileName, std::ios::in | std::ios::binary);
char *magic = new char[4];
while( !in.eof() ){
// read the first 4 bytes
for (int i=0; i<4; i++){
in.get(magic[i]);
}
// compare it with the magic word "MAGI"
if (strcmp(magic, "MAGI") != 0){
std::cerr << "Something is wrong with the magic word: "
<< magic << ", couldn't read the file further! "
<< std::endl;
exit(1);
}
// read the rest ...
}
Now here comes the problem, when I open my file, I get this error output:
Something is wrong with the magic word: MAGI?, couldn't read the file further! So there is always one (mostly random) character after the word MAGI, like in this example the character ?!
I do think that it has something to do with how a string in C++ is stored and compared with each other. Am I right and how can I avoid this?
PS: this implementation is included in another program and works totally fine ... weird.
strcmp assumes that both strings are nul-terminated (end with a nul-character). When you want to compare strings which are not terminated, like in this case, you need to use strncmp and tell it how many characters to compare (4 in this case).
if (strncmp(magic, "MAGI", 4) != 0){
When you try to use strcmp to compare not null-terminated char arrays, it can't tell how long the arrays are (you can't tell the length of an array in C/C++ just by looking at the array itself - you need to know the length it was allocated with. The standard library is not exempt from this limitation). So it reads any data which happens to be stored in memory after the char array until it hits a 0-byte.
By the way: Note the comment to your question by Lightness Races in Orbit, which is unrelated to the issue you are having now, but which hints a different bug which might cause you some problems later on.