Weird behavior writing/reading simple binary file - c++

I'm writing and reading on a binary file. I'm getting small errors when outputting the reads.
The strings are there but with little snippets like: (I"�U) (�U) appended to the end of ~30% of them
I'm using g++ compiler on Ubuntu
Simplified code:
struct Db_connection
{
public:
string name;
int write_config();
};
int Db_connection::write_config()
{
ofstream config_f("config.dat", std::ios_base::binary | std::ios_base::out); //open file
string str = name;
int size = str.length();
config_f.write(reinterpret_cast<char *>(&size), sizeof(int)); // write size of string in int size chunk
config_f.write(str.c_str(), size); //write string
config_f.close();
return 0;
}
Db_connection read_config()
{
ifstream config_f("config.dat", std::ios_base::binary | std::ios_base::in);
Db_connection return_obj;
int size;
string data;
config_f.read(reinterpret_cast<char *>(&size), sizeof(int)); // read string size
char buffer[size];
config_f.read(buffer, size); // read string
data.assign(buffer);
return_obj.name = data;
return return_obj;
}
Is there anything obvious I am messing up? Does this have to do with endianness? I tried to minimize the code to its absolute essentials.
The actual code is more complex. I have a class holding vectors of 2 structs. One struct has four string members and the other has a string and a bool. These functions are actually a member of and return (respectively) that class. The functions loop through the vectors, writing struct members sequentially.
Two oddities:
To debug, I added outputs of the size and data variables on each iteration in both the read and write functions. size comes out accurate and consistent on both sides. data is accurate on the write side but with the weird special characters on the read side. I'm looking at outputs like:
Read Size: 12
Data: random addy2�U //the 12 human readable chars are there but with 2 extra symbols
The final chunk of data (a bool) comes out fine every time, so I don't think there is a file pointer issue. If it's relevant: every bool and int is fine. It's just a portion of the strings.
Hopefully I'm making a bonehead mistake and this minimized code can be critiqued. The actual example would be too long.

Big thanks to WhozCraig,
The following edit did, indeed, work:
Db_connection read_config()
{
ifstream config_f("config.dat", std::ios_base::binary | std::ios_base::in);
Db_connection return_obj;
int size;
string data;
config_f.read(reinterpret_cast<char *>(&size), sizeof(int)); // read string size
vector<char> buff(size);
config_f.read(buff.data(), size);
data = string(buff.begin(), buff.end());
return_obj.name = data;
return return_obj;
}
As paddy pointed out directly and WhozCraig alluded to, this code still needs to implement a standardized, portable data type for recording the integer properly into binary and the write function needs to be rethought as well.
Thank you very much to the both of you. I read like 5-8 top search results for "cpp binary i/o" before writing my code and still ended up with that mess. You guys saved me hours/days of my life.
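For reference, the original version failed because data.assign(buffer) treats buffer as a NUL-terminated C string, while read() copies exactly size bytes and never appends a terminator, so assign kept reading past the end of the array. The following is a minimal sketch of a length-prefixed round trip that avoids the temporary buffer and uses a fixed-width length. The helper names write_string and read_string are made up for illustration, and the length is still stored in host byte order, so this is not yet a portable format.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: writes a 32-bit length prefix followed by the raw bytes.
// The prefix is stored in host byte order (an assumption of this sketch).
void write_string(std::ostream& out, const std::string& s)
{
    assert(s.size() <= std::numeric_limits<std::uint32_t>::max());
    const std::uint32_t size = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&size), sizeof size);
    out.write(s.data(), size);
}

// Hypothetical helper: reads back what write_string produced.
// Returns false if the stream runs out of data.
bool read_string(std::istream& in, std::string& s)
{
    std::uint32_t size = 0;
    if (!in.read(reinterpret_cast<char*>(&size), sizeof size))
        return false;
    std::string tmp(size, '\0');            // the string owns its length; no terminator needed
    if (size != 0 && !in.read(&tmp[0], size))
        return false;
    s.swap(tmp);
    return true;
}
```

Because the string is constructed with an explicit length, embedded zero bytes and missing terminators are both harmless. The same two functions work with an ofstream/ifstream opened with std::ios_base::binary.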

Related

Why some data in binary file is shown as it is and other is shown in a strange way

I have code, which writes vector of such structures to a binary file:
struct reader{
char name[50];
int card_num;
char title[100];
};
Everything works actually fine but when I, for example, write to file structure {One,1,One} and open .txt file, where it is stored, I see this:
One ММММММММММММММММММММММММММММММММММММММММММММММММ One ММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММММ
So I was asked why it is displayed like this and what it depends on, but I couldn't give a good answer to that question.
EDITED:
Added code which I use to write to file
void Write_to_File(vector<reader>& vec){
cin.clear(); // clearing
fflush(stdin);// input stream
const char* pointer = reinterpret_cast<const char*>(&vec[0]);
size_t bytes = vec.size() * sizeof(vec[0]);
fstream f("D:\\temp.txt", ios::out);
f.close();
ofstream file("D:\\temp.txt", ios::in | ios::binary);
file.write(pointer, bytes);
file.close();
remove("D:\\lab.txt");
rename("D:\\temp.txt", "D:\\lab.txt");
cout << "\n*** Successfully written data ***\n\n";
}
P.S. When I read from file everything is ok
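You write at least 154 bytes per record to the file (more if the compiler adds padding), and only the characters of the two "One" strings are text, so your text editor tries to interpret every byte as a character and shows mostly garbage. You are writing binary data, so you should not expect the file to be readable.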
Why some data in binary file is shown as it is and other is shown in a strange way
It seems that you are trying to read the binary data as if it contained character encoded data. Some of it does - but not all. Perhaps this is why you think that it seems strange. Other than that, the output seems perfectly reasonable.
why is it displayed so
Because that is the textual representation of the data that the object contains in the character encoding that your reader uses.
what it depends on
It depends on the values that you have initialized the memory to have. For example, the first character is displayed as O because you have initialized name[0] with the value 'O'. Some of the data is padding between members, which cannot be initialized directly. The values of those padding bytes are unspecified.
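The repeated filler characters in the dump are most likely the bytes of name and title that were never written to. As a rough sketch (the helper names fill_reader and padding_after_name are invented here, and the amount of padding is compiler-dependent), zeroing the whole object before filling it makes every byte that reaches the file predictable:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct reader {
    char name[50];
    int card_num;
    char title[100];
};

// Hypothetical helper: zeroes every byte of r (padding and unused array
// elements included) before filling in the fields.
void fill_reader(reader& r, const char* name, int card_num, const char* title)
{
    assert(name != nullptr && title != nullptr);
    std::memset(&r, 0, sizeof r);
    std::strncpy(r.name, name, sizeof r.name - 1);    // last byte stays '\0'
    r.card_num = card_num;
    std::strncpy(r.title, title, sizeof r.title - 1);
}

// Bytes the compiler inserted between name and card_num
// (commonly 2 so that the int is 4-byte aligned, but not guaranteed).
std::size_t padding_after_name()
{
    return offsetof(reader, card_num) - sizeof(reader::name);
}
```

With a record filled this way, a text editor would show "One" followed by NUL bytes instead of whatever happened to be in memory, although the int in the middle is still binary.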

Writing struct of vector to a binary file in c++

I have a struct and I would like to write it to a binary file (c++ / visual studio 2008).
The struct is:
struct DataItem
{
std::string tag;
std::vector<int> data_block;
DataItem(): data_block(1024 * 1024){}
};
I am filling the data_block vector with random values:
DataItem createSampleData ()
{
DataItem data;
std::srand(std::time(NULL));
std::generate(data.data_block.begin(), data.data_block.end(), std::rand);
data.tag = "test";
return data;
}
And trying to write the struct to file:
void writeData (DataItem data, long fileName)
{
ostringstream ss;
ss << fileName;
string s(ss.str());
s += ".bin";
char szPathedFileName[MAX_PATH] = {0};
strcat(szPathedFileName,ROOT_DIR);
strcat(szPathedFileName,s.c_str());
ofstream f(szPathedFileName, ios::out | ios::binary | ios::app);
// ******* first I tried to write this way then one by one
//f.write(reinterpret_cast<char *>(&data), sizeof(data));
// *******************************************************
f.write(reinterpret_cast<const char *>(&data.tag), sizeof(data.tag));
f.write(reinterpret_cast<const char *>(&data.data_block), sizeof(data.data_block));
f.close();
}
And the main is:
int main()
{
DataItem data = createSampleData();
for (int i=0; i<5; i++) {
writeData(data,i);
}
}
So I expect a file size of at least (1024 * 1024) * 4 (for the vector) + 48 (for the tag), but it just writes the tag to the file and creates a 1KB file on the hard drive.
I can see the contents while I'm debugging, but they are not written to the file...
What's wrong with this code? Why can't I write the struct with the vector to the file? Is there a better, faster, or more efficient way to write it?
Do I have to serialize the data?
Thanks...
Casting a std::string to char * will not produce the result you expect. Neither will using sizeof on it. The same for a std::vector.
For the vector you need to use either the std::vector::data method, or using e.g. &data.data_block[0]. As for the size, use data.data_block.size() * sizeof(int).
Writing the string is another matter though, especially if it can be of variable length. You either have to write it as a fixed-length string, or write the length (in a fixed-size format) followed by the actual string, or write a terminator at the end of the string. To get a C-style pointer to the string use std::string::c_str.
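Putting those two points together, a minimal sketch of the length-prefixed approach might look like the following. The function names write_item and read_item and the on-disk layout are assumptions made for this example, it needs C++11 for std::vector::data and <cstdint>, and the integers are written in host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct DataItem {
    std::string tag;
    std::vector<int> data_block;
};

// Layout chosen for this sketch only:
// [u32 tag length][tag bytes][u32 element count][raw ints]
void write_item(std::ostream& out, const DataItem& item)
{
    const std::uint32_t tag_len = static_cast<std::uint32_t>(item.tag.size());
    out.write(reinterpret_cast<const char*>(&tag_len), sizeof tag_len);
    out.write(item.tag.data(), tag_len);

    const std::uint32_t count = static_cast<std::uint32_t>(item.data_block.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    // Write the elements the vector owns, not the vector object itself.
    out.write(reinterpret_cast<const char*>(item.data_block.data()),
              static_cast<std::streamsize>(count * sizeof(int)));
}

// Reads one item back; returns false if the stream ends early.
bool read_item(std::istream& in, DataItem& item)
{
    std::uint32_t tag_len = 0;
    if (!in.read(reinterpret_cast<char*>(&tag_len), sizeof tag_len))
        return false;
    item.tag.resize(tag_len);
    if (tag_len != 0 && !in.read(&item.tag[0], tag_len))
        return false;

    std::uint32_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof count))
        return false;
    item.data_block.resize(count);
    if (count != 0 &&
        !in.read(reinterpret_cast<char*>(item.data_block.data()),
                 static_cast<std::streamsize>(count * sizeof(int))))
        return false;
    return true;
}
```

With the 1024 * 1024 element vector from the question, the output would be roughly 4 MB plus a few bytes of header, which matches the size the asker expected.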
Welcome to the merry world of C++ std::
Basically, vectors are meant to be used as opaque containers.
You can forget about applying reinterpret_cast to the container object itself right away.
Trying to shut the compiler up will allow you to create an executable, but it will produce silly results.
Basically, you can forget about most of the std::vector syntactic sugar that has to do with iterators, since your fstream will not access binary data through them (it would output a textual representation of your data).
But all is not lost.
You can access the vector underlying array using the newly (C++11) introduced .data() method, though that defeats the point of using an opaque type.
const int * raw_ptr = data.data_block.data();
that will gain you 100 points of cool factor instead of using the puny
const int * raw_ptr = &data.data_block[0];
You could also use the even more cryptic &data.data_block.front() for a cool factor bonus of 50 points.
You can then write your glob of ints in one go:
f.write (reinterpret_cast<const char *>(raw_ptr), sizeof (raw_ptr[0])*data.data_block.size());
Now if you want to do something really too simple, try this:
for (size_t i = 0 ; i != data.data_block.size() ; i++)
f.write (reinterpret_cast<const char *>(&data.data_block[i]), sizeof (data.data_block[i]));
This will consume a few more microseconds, which will be lost in background noise since the disk I/O will take much more time to complete the write.
Totally not cool, though.

Read .part files and concatenate them all

So I am writing my own custom FTP client for a school project. I managed to get everything to work with the swarming FTP client and am down to one last small part...reading the .part files into the main file. I need to do two things. (1) Get this to read each file and write to the final file properly (2) The command to delete the part files after I am done with each one.
Can someone please help me to fix my concatenate function I wrote below? I thought I had it right to read each file until the EOF and then go on to the next.
In this case *numOfThreads is 17. Ended up with a file of 4742442 bytes instead of 594542592 bytes. Thanks and I am happy to provide any other useful information.
EDIT: Modified code for comment below.
std::string s = "Fedora-15-x86_64-Live-Desktop.iso";
std::ofstream out;
out.open(s.c_str(), std::ios::out);
for (int i = 0; i < 17; ++i)
{
std::ifstream in;
std::ostringstream convert;
convert << i;
std::string t = s + ".part" + convert.str();
in.open(t.c_str(), std::ios::in | std::ios::binary);
int size = 32*1024;
char *tempBuffer = new char[size];
if (in.good())
{
while (in.read(tempBuffer, size))
out.write(tempBuffer, in.gcount());
}
delete [] tempBuffer;
in.close();
}
out.close();
return 0;
Almost everything in your copying loop has problems.
while (!in.eof())
This is broken. Not much more to say than that.
bzero(tempBuffer, size);
This is fairly harmless, but utterly pointless.
in.read(tempBuffer, size);
This is the "almost" part -- i.e., the one piece that isn't obviously broken.
out.write(tempBuffer, strlen(tempBuffer));
You don't want to use strlen to determine the length -- it's intended only for NUL-terminated (C-style) strings. If (as is apparently the case) the data you read may contain zero-bytes (rather than using zero-bytes only to signal the end of a string), this will simply produce the wrong size.
What you normally want to do is a loop something like:
while (read(some_amount) == succeeded)
write(amount that was read);
In C++ that will typically be something like:
while (infile.read(buffer, buffer_size) || infile.gcount() > 0) // gcount() catches a short final block
outfile.write(buffer, infile.gcount());
It's probably also worth noting that since you're allocating memory for the buffer using new, but never using delete, your function is leaking memory. Probably better to do without new for this -- an array or vector would be obvious alternatives here.
Edit: as for why while (infile.read(...)) works, the read returns a reference to the stream. The stream in turn provides a conversion to bool (in C++11) or void * (in C++03) that can be interpreted as a Boolean. That conversion operator returns the state of the stream, so if reading failed, it will be interpreted as false, but as long as it succeeded, it will be interpreted as true.
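One detail worth spelling out: read() reports failure when it reaches end-of-file before filling the buffer, even though gcount() bytes were extracted, so the final short block has to be written as well. Below is a small sketch of a copy helper that does this. The name append_stream is made up for this example, and the streams passed in are assumed to have been opened with std::ios::binary.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <vector>

// Hypothetical helper: appends everything left in `in` to `out`
// and returns the number of bytes copied.
std::size_t append_stream(std::istream& in, std::ostream& out,
                          std::size_t buffer_size = 32 * 1024)
{
    assert(buffer_size > 0);
    std::vector<char> buffer(buffer_size);   // freed automatically, no new/delete
    std::size_t total = 0;
    // read() fails on the last, partially filled block, so gcount() is
    // checked as well to avoid dropping those bytes.
    while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size())) ||
           in.gcount() > 0) {
        out.write(buffer.data(), in.gcount());
        total += static_cast<std::size_t>(in.gcount());
    }
    return total;
}
```

For the .part files, the idea would be to open one output stream, then call the helper once per part file in order, and compare the returned totals against the expected sizes.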

Object loading segfault under GCC

These methods are supposed to save and load the entirety of the object they're associated with. When I compile the program under Linux through gcc, the save seems to work but it segfaults when loading. When I compile it under Windows through the Visual Studio compiler, it works like a dream. I am not sure what the differences are, but I've got a hunch that it involves some gcc oddity.
The two methods:
void User::SaveToFile()
{
ofstream outFile;
string datafile_name = username + "_data";
outFile.open(datafile_name.c_str(), ios::binary);
outFile.write((char*)this, sizeof(*this));
}
void User::LoadFromFile(string filename)
{
ifstream inFile;
inFile.open(filename.c_str(), ios::binary);
inFile.read((char*)this, sizeof(*this));
}
The declaration:
class User
{
private:
string username;
string realname;
string password;
string hint;
double gpa;
vector<Course> courses;
public:
double PredictGPA();
void ChangePassword();
void SaveToFile();
void LoadFromFile(string filename);
void SetUsername(string _username){username = _username;}
string GetUsername(){return username;}
void SetRealname(string _realname){realname = _realname;}
string GetRealname(){return realname;}
void SetPass(string _password){password = _password;}
string GetPass(){return password;}
void SetHint(string _hint){hint = _hint;}
string GetHint(){return hint;}
};
Your class User is not a POD type, i.e. it is not a Plain Old Data type (as C structs are). You cannot just read and write its memory bitwise and expect it to work. Neither string nor vector is a POD; both keep pointers to their dynamically allocated data. When reading those back, attempts to access invalid memory will result in a segfault. What's more, the contents of both the string and the vector are not actually being saved at all, since they are not within the memory layout of the object (it may sometimes work for a string that uses the small buffer optimization, but that is pure chance and still undefined behaviour).
You would need a way to serialize and deserialize your class; your class can't magically become an object when you read it in like that.
Instead you would need to supply two functions that you call when loading/saving your class and that store the class in some format of your choosing, e.g. XML.
so instead of
outFile.write((char*)this, sizeof(*this));
have some member function to convert it to a string with some format that you easily can parse when you load it (or some binary format whatever you find easier), then save it.
outFile.write(this->myserialize(), mysize);
You can't write into string like that. For one thing it usually stores its data dynamically, i.e. not inside the object at all, and for another you shall not rely on any particular layout of it.
There are similar issues with vectors, and you don't appear to have considered endianness and padding at all.
Put simply, you're making assumptions that do not hold.
In general, do not mess with complex (non-POD) objects on the byte level. Serialise with some text format instead, using the objects' public member functions to extract and restore their state.
Have you considered JSON?
Things like strings etc may contain pointers - in which case your method can go horribly wrong.
You need to serialise the data - I.e. convert it to a series of bytes.
Then when reading the data you just read the bytes and then create the object from that. The new pointers will be correct.
If you stay with this route, I would write the length of the string instead of null-terminating it. That makes it easier to allocate on loading. There is a lot to consider in a binary format. Each field should have some type of ID so it can be found if it is in the wrong spot or was written by a different version of your program. Also, at the beginning of your file, write what endianness you are using, the size of your integers, etc. Or decide on a standard size and endianness for everything. I used to write code like this all the time for networking and file storage. There are much better modern approaches. Also consider using a buffer and creating a Serialize() function.
Good modern alternatives include: SQLite3, XML, JSON
Untested Example:
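#include <fstream>
#include <string>
class object
{
public:
void Load()
{
std::ifstream inFile("filename", std::ios::binary);
int size = 0;
inFile.read(reinterpret_cast<char *>(&size), sizeof(size)); // length of stringA
stringA.resize(size);
inFile.read(&stringA[0], size); // contents of stringA
inFile.read(reinterpret_cast<char *>(&size), sizeof(size)); // length of stringB
stringB.resize(size);
inFile.read(&stringB[0], size); // contents of stringB
inFile.close(); //don't forget to close your files
}
void Save()
{
std::ofstream outFile("filename", std::ios::binary);
int size = static_cast<int>(stringA.size());
outFile.write(reinterpret_cast<const char *>(&size), sizeof(size)); // length prefix
outFile.write(stringA.data(), size);
size = static_cast<int>(stringB.size());
outFile.write(reinterpret_cast<const char *>(&size), sizeof(size)); // length prefix
outFile.write(stringB.data(), size);
outFile.close();
}
private:
std::string stringA;
std::string stringB;
};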
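The advice above about deciding on a standard size and endianness for everything can be sketched as follows. This is only an illustration: the function names write_u32_le and read_u32_le are invented, and little-endian is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>

// Stores v as four bytes, least significant first, regardless of the
// byte order of the machine running the code.
void write_u32_le(std::ostream& out, std::uint32_t v)
{
    const char bytes[4] = {
        static_cast<char>(v & 0xFF),
        static_cast<char>((v >> 8) & 0xFF),
        static_cast<char>((v >> 16) & 0xFF),
        static_cast<char>((v >> 24) & 0xFF)
    };
    out.write(bytes, sizeof bytes);
}

// Reassembles a value written by write_u32_le; returns false on a short read.
bool read_u32_le(std::istream& in, std::uint32_t& v)
{
    unsigned char bytes[4];
    if (!in.read(reinterpret_cast<char*>(bytes), sizeof bytes))
        return false;
    v = static_cast<std::uint32_t>(bytes[0])
      | (static_cast<std::uint32_t>(bytes[1]) << 8)
      | (static_cast<std::uint32_t>(bytes[2]) << 16)
      | (static_cast<std::uint32_t>(bytes[3]) << 24);
    return true;
}
```

Using these for every length prefix, instead of writing an int directly, would make the file readable on machines with a different int size or byte order.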

Parsing binary data from file

Thank you in advance for your help!
I am in the process of learning C++. My first project is to write a parser for a binary-file format we use at my lab. I was able to get a parser working fairly easily in Matlab using "fread", and it looks like that may work for what I am trying to do in C++. But from what I've read, it seems that using an ifstream is the recommended way.
My question is two-fold. First, what, exactly, are the advantages of using ifstream over fread?
Second, how can I use ifstream to solve my problem? Here's what I'm trying to do. I have a binary file containing a structured set of ints, floats, and 64-bit ints. There are 8 data fields all told, and I'd like to read each into its own array.
The structure of the data is as follows, in repeated 288-byte blocks:
Bytes 0-3: int
Bytes 4-7: int
Bytes 8-11: float
Bytes 12-15: float
Bytes 16-19: float
Bytes 20-23: float
Bytes 24-31: int64
Bytes 32-287: 64x float
I am able to read the file into memory as a char * array, with the fstream read command:
char * buffer;
ifstream datafile (filename,ios::in|ios::binary|ios::ate);
datafile.read (buffer, filesize); // Filesize in bytes
So, from what I understand, I now have a pointer to an array called "buffer". If I were to call buffer[0], I should get a 1-byte memory address, right? (Instead, I'm getting a seg fault.)
What I now need to do really ought to be very simple. After executing the above ifstream code, I should have a fairly long buffer populated with a number of 1's and 0's. I just want to be able to read this stuff from memory, 32-bits at a time, casting as integers or floats depending on which 4-byte block I'm currently working on.
For example, if the binary file contained N 288-byte blocks of data, each array I extract should have N members each. (With the exception of the last array, which will have 64N members.)
Since I have the binary data in memory, I basically just want to read from buffer, one 32-bit number at a time, and place the resulting value in the appropriate array.
Lastly - can I access multiple array positions at a time, a la Matlab? (e.g. array(3:5) -> [1,2,1] for array = [3,4,1,2,1])
Firstly, the advantage of using iostreams, and in particular file streams, relates to resource management. Automatic file stream variables will be closed and cleaned up when they go out of scope, rather than having to manually clean them up with fclose. This is important if other code in the same scope can throw exceptions.
Secondly, one possible way to address this type of problem is to simply define the stream insertion and extraction operators in an appropriate manner. In this case, because you have a composite type, you need to help the compiler by telling it not to add padding bytes inside the type. The following code should work on gcc and microsoft compilers.
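#include <cstdint>
#include <iostream>
#pragma pack(push, 1) // no padding bytes between members
struct MyData
{
int i0;
int i1;
float f0;
float f1;
float f2;
float f3;
uint64_t ui0;
float f4[64];
};
#pragma pack(pop) // restore the previous packing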
std::istream& operator>>( std::istream& is, MyData& data ) {
is.read( reinterpret_cast<char*>(&data), sizeof(data) );
return is;
}
std::ostream& operator<<( std::ostream& os, const MyData& data ) {
os.write( reinterpret_cast<const char*>(&data), sizeof(data) );
return os;
}
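As a complement, the record layout can be checked at compile time instead of being trusted. The sketch below uses fixed-width types and a static_assert on the 288-byte block size; with natural alignment this particular layout usually needs no padding at all. The names Record, parse_records and first_ints are made up for this example, and it assumes the file uses the same byte order and 4-byte float format as the machine reading it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Mirrors the 288-byte block described in the question.
struct Record {
    std::int32_t i0;
    std::int32_t i1;
    float f0, f1, f2, f3;
    std::int64_t t0;
    float samples[64];
};
static_assert(sizeof(Record) == 288, "Record must match the 288-byte on-disk block");

// Copies every complete 288-byte block of raw into a Record.
// memcpy avoids the alignment problems of casting a char pointer.
std::vector<Record> parse_records(const std::vector<char>& raw)
{
    std::vector<Record> records(raw.size() / sizeof(Record));
    if (!records.empty())
        std::memcpy(records.data(), raw.data(), records.size() * sizeof(Record));
    return records;
}

// Pulls one field out into its own array, as the question asks.
std::vector<std::int32_t> first_ints(const std::vector<Record>& records)
{
    std::vector<std::int32_t> out;
    out.reserve(records.size());
    for (std::size_t i = 0; i < records.size(); ++i)
        out.push_back(records[i].i0);
    return out;
}
```

The other seven fields can be extracted the same way, with the last one yielding 64 values per record.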
char * buffer;
ifstream datafile (filename,ios::in|ios::binary|ios::ate);
datafile.read (buffer, filesize); // Filesize in bytes
You need to allocate a buffer before you read into it:
buffer = new char[filesize];
datafile.seekg(0, ios::beg); // the file was opened with ios::ate, so rewind before reading
datafile.read (buffer, filesize);
As to the advantages of ifstream, well, it is a matter of abstraction. You can abstract the contents of your file in a more convenient way. You then do not have to work with buffers but instead can create the structure using classes and then hide the details about how it is stored in the file by overloading the >> and << operators, for instance.
You might perhaps look for serialization libraries for C++. Perhaps s11n might be useful.
This question shows how you can convert data from a buffer to a certain type. In general, you should prefer using a std::vector<char> as your buffer. This would then look like this:
#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>
int main() {
std::ifstream input("your_file.dat", std::ios::binary); // binary mode: no newline translation
std::vector<char> buffer;
std::copy(std::istreambuf_iterator<char>(input),
std::istreambuf_iterator<char>(),
std::back_inserter(buffer));
}
This code will read the entire file into your buffer. The next thing you'd want to do is to write your data into valarrays (for the selection you want). valarray is constant in size, so you have to be able to calculate the required size of your array up-front. This should do it for your format:
std::valarray<int> array1(buffer.size()/288); // each entry takes up 288 bytes
Then you'd use a normal for-loop to insert the elements into your arrays:
for(std::size_t i = 0; i < buffer.size()/288; i++) {
// std::memcpy (from <cstring>) copies the bytes without alignment or aliasing problems
std::memcpy(&array1[i], &buffer[i*288], sizeof(int)); // first position
std::memcpy(&array2[i], &buffer[i*288 + 4], sizeof(int)); // second position
}
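Note that this assumes an int takes up 4 bytes. That holds on mainstream 32-bit and 64-bit platforms but is not guaranteed by the standard, so fixed-width types such as std::int32_t from <cstdint> are the safer choice. This question explains a bit about C++ and sizes of types.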
The selection you describe there can be achieved using valarray.
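For instance, the MATLAB expression array(3:5) from the question could be sketched with std::slice like this. The wrapper name matlab_range is invented for the example; std::slice itself takes a 0-based start, a length, and a stride.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Returns elements first..last of v using MATLAB-style 1-based, inclusive bounds.
std::valarray<int> matlab_range(const std::valarray<int>& v,
                                std::size_t first, std::size_t last)
{
    assert(first >= 1 && first <= last && last <= v.size());
    // std::slice(start, length, stride) uses 0-based indexing.
    return v[std::slice(first - 1, last - first + 1, 1)];
}
```

Calling matlab_range on a valarray holding 3, 4, 1, 2, 1 with bounds 3 and 5 yields a three-element valarray holding 1, 2, 1, matching the example in the question.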