Strings to binary files - c++

My problem goes like this: I have a class called 'Register'. It has a string attribute called 'trainName' and its setter:
class Register {
private:
string trainName;
public:
string getTrainName();
};
As a matter of fact, it is longer but I want to make this simpler.
In other class, I copy several Register objects into a binary file, previously setting trainName.
Register auxRegister = Register();
auxRegister.setName("name");
for(int i = 0; i < 10; i++) {
file.write(reinterpret_cast<char*>(&auxRegister),sizeof(Register));
}
Later on, I try to retrieve the register from the binary file:
Register auxRegister = Register();
while(!file.eof()) { //I know this is not right. Which is the right way?
file.read(reinterpret_cast<char*>(&auxRegister), sizeof(Register));
}
It turns out this does not work. Register does, in fact, have more attributes (they are int) and I retrieve them OK, but that's not the case with the string.
Am I doing something wrong? Should I take something into consideration when working with binary files and strings?
Thank you very much.

The std::string class contains a pointer to a buffer where the string is stored (along with other member variables). The string buffer itself is not a part of the class. So writing out the contents of an instance of the class is not going to work, since the string will never be part of what you dump into the file, if you do it that way. You need to get a pointer to the string and write that.
Register auxRegister = Register();
auxRegister.setName("name");
// Assumes getTrainName() returns the string that setName() stored
const std::string name = auxRegister.getTrainName();
auto length = name.size();
for(int i = 0; i < 10; i++) {
    file.write( name.c_str(), length );
    // You'll need to multiply length by sizeof(CharType) if you
    // use a wstring instead of string
}
Later on, to read the string, you'll have to keep track of the number of bytes that were written to the file; or maybe fetch that information from the file itself, depending on the file format.
std::unique_ptr<char[]> buffer( new char[length + 1] );
file.read( buffer.get(), length ); // read() needs the raw char*, not the smart pointer
buffer[length] = '\0'; // NULL terminate the string
Register auxRegister = Register();
auxRegister.setName( buffer.get() );

You cannot write a string this way, as it almost certainly contains pointers to some structs and other binary stuff that cannot be serialized at all.
You need to write your own serializing function and write the string length + bytes (for example), or use a complete library such as protobuf, which can solve the serializing problem for you.
edit: see praetorian's answer. much better than mine (even with lower score at time of this edit).

Related

C++: Overwrite std::string in Cache

I have a string variable (it contains a passphrase) and would like to overwrite its value with a sequence of '0' before the variable is released. I thought about doing something like:
void overwrite(std::string &toOverwrite){
if(toOverwrite.empty())
return;
else{
std::string removeString;
size_t length = toOverwrite.size();
for(int i = 0; i < length; i++){
removeString += "0";
}
toOverwrite = removeString;
}
}
But somehow this doesn't feel right.
First because it seems to produce much overhead in the for loop.
Moreover I'm not sure if the last line would really overwrite the string. I know that e.g. in Java strings are immutable and therefore can not be overwritten at all. They are not immutable in C++ (at least not std::string) but would toOverwrite = removeString really replace toOverwrite or just make that the "pointer" of toOverwrite will point to removeString?
Is it possible that my compiler will optimize the code and remove this overwriting?
Maybe I should use the std::string::replace method or change the datatype to char* / byte[]?
Chances are that will just swap and free pointers, leaving the passphrase somewhere in memory which is no longer pointed to. If you want to overwrite the string data, do:
std::fill(toOverwrite.begin(), toOverwrite.end(), '0');
And you don't need a test for an empty string either.
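As for the worry that the compiler might optimize the overwrite away: a variant that writes through a volatile pointer is sketched below. This is an assumption-laden sketch, not a guarantee. Writing through volatile is a widely used way to discourage the compiler from dropping the stores, but the standard does not promise it here, and copies of the passphrase left behind by earlier reallocations of the string are not touched.

```cpp
#include <cstddef>
#include <string>

// Overwrites the characters in place and keeps the length unchanged.
// An empty string needs no special case: the loop simply runs zero times.
void overwrite(std::string& toOverwrite) {
    volatile char* p = &toOverwrite[0];
    for (std::size_t i = 0; i < toOverwrite.size(); ++i) {
        p[i] = '0';
    }
}
```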

Writing struct of vector to a binary file in c++

I have a struct and I would like to write it to a binary file (c++ / visual studio 2008).
The struct is:
struct DataItem
{
std::string tag;
std::vector<int> data_block;
DataItem(): data_block(1024 * 1024){}
};
I am filling the data_block vector with random values:
DataItem createSampleData ()
{
DataItem data;
std::srand(std::time(NULL));
std::generate(data.data_block.begin(), data.data_block.end(), std::rand);
data.tag = "test";
return data;
}
And trying to write the struct to file:
void writeData (DataItem data, long fileName)
{
ostringstream ss;
ss << fileName;
string s(ss.str());
s += ".bin";
char szPathedFileName[MAX_PATH] = {0};
strcat(szPathedFileName,ROOT_DIR);
strcat(szPathedFileName,s.c_str());
ofstream f(szPathedFileName, ios::out | ios::binary | ios::app);
// ******* first I tried to write this way then one by one
//f.write(reinterpret_cast<char *>(&data), sizeof(data));
// *******************************************************
f.write(reinterpret_cast<const char *>(&data.tag), sizeof(data.tag));
f.write(reinterpret_cast<const char *>(&data.data_block), sizeof(data.data_block));
f.close();
}
And the main is:
int main()
{
DataItem data = createSampleData();
for (int i=0; i<5; i++) {
writeData(data,i);
}
}
So I expect a file size at least (1024 * 1024) * 4 (for vector)+ 48 (for tag) but it just writes the tag to the file and creates 1KB file to hard drive.
I can see the contents while I'm debugging but it doesn't write them to the file...
What's wrong with this code, why can't I write the struct with the vector to a file? Is there a better/faster or more efficient way to write it?
Do I have to serialize the data?
Thanks...
Casting a std::string to char * will not produce the result you expect. Neither will using sizeof on it. The same for a std::vector.
For the vector you need to use either the std::vector::data method (C++11), or e.g. &data.data_block[0]. As for the size, use data.data_block.size() * sizeof(int).
Writing the string is another matter though, especially if it can be of variable length. You either have to write it as a fixed-length string, or write the length (in a fixed-size format) followed by the actual string, or write a terminator at the end of the string. To get a C-style pointer to the string use std::string::c_str.
Welcome to the merry world of C++ std::
Basically, vectors are meant to be used as opaque containers.
You can forget about applying reinterpret_cast to the vector object itself right away.
Trying to shut the compiler up will allow you to create an executable, but it will produce silly results.
Basically, you can forget about most of the std::vector syntactic sugar that has to do with iterators, since your fstream will not access binary data through them (it would output a textual representation of your data).
But all is not lost.
You can access the vector underlying array using the newly (C++11) introduced .data() method, though that defeats the point of using an opaque type.
const int * raw_ptr = data.data_block.data();
that will gain you 100 points of cool factor instead of using the puny
const int * raw_ptr = &data.data_block[0];
You could also use the even more cryptic &data.data_block.front() for a cool factor bonus of 50 points.
You can then write your glob of ints in one go:
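// write() expects a const char *, so cast the pointer to the ints (not the vector object)
f.write (reinterpret_cast<const char *>(raw_ptr), sizeof (raw_ptr[0])*data.data_block.size());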
Now if you want to do something really too simple, try this:
for (size_t i = 0 ; i != data.data_block.size() ; i++)
    f.write (reinterpret_cast<const char *>(&data.data_block[i]), sizeof (data.data_block[i]));
This will consume a few more microseconds, which will be lost in background noise since the disk I/O will take much more time to complete the write.
Totally not cool, though.

Use C++ strings in file handling

How do I use C++ strings in file handling? I created a class that has a C++ string as one of its private data members, but that gives an error while reading from the file, even though I am not manipulating the string at that moment and it was initialised with a default value in the constructor. There is no problem while writing to the file. It works fine if I use a C string instead, but I don't want to. Is there a way to solve this?
class budget
{
float balance;
string due_name,loan_name; //string objects
int year,month;
float due_pay,loan_given;
public:
budget()
{
balance=0;
month=1;
due_name="NO BODY"; //default values
loan_name="SAFE";
year=0;
balance = 0;
due_pay=0;
loan_given=0;
}
.
.
.
};
void read_balance() //PROBLEM AFTER ENTERING THIS FUNCTION
{
system("cls");
budget b;
ifstream f1;
f1.open("balance.dat",ios::in|ios::binary);
while(f1.read((char*)&b,sizeof(b)))
{ b.show_data();
}
system("cls");
cout<<"No More Records To Display!!";
getch();
f1.close();
}
std::string is a non-POD data type. You cannot read or write a string object with the raw read/write functions.
basic_istream<charT,traits>& read(char_type* s, streamsize n);
30 Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1). After constructing a sentry object, if !good() calls setstate(failbit) which may throw an exception, and return. Otherwise extracts characters and stores them into successive locations of an array whose first element is designated by s. Characters are extracted and stored until either of the following occurs:
- n characters are stored;
- end-of-file occurs on the input sequence (in which case the function calls setstate(failbit | eofbit), which may throw ios_base::failure (27.5.5.4)).
31 Returns: *this.
Nothing in there says how the members of std::string are laid out. Have a look at, or use, boost::serialization: http://www.boost.org/doc/libs/1_50_0/libs/serialization/doc/index.html Of course, you can also write the size of the string followed by its data and, when reading, read the size, allocate an array of that size, read the data into the array and then create the string from it. But using boost is better.
While reading the string members (due_name,loan_name) of your class budget your code literally fills them byte by byte. While it makes sense for floats and ints it won't work for strings.
Strings are designed to keep an 'unlimited' amount of text, therefore their constructors, copy constructors, concatenations and so on must make sure to allocate the actual piece of memory that stores the text and expand it if necessary (and delete it upon destruction). Filling strings this way from disk will result in invalid pointers inside your string objects (not pointing to the actual memory which contains the text); in fact, no text will be read this way at all.
The easiest way to solve this is to not use C++ strings in that class. Work out the maximum length for each of the strings you will be storing, and make a char array that is one byte longer (to allow for the 0-terminator). Now you can read and write that class as binary without worrying about serialization etc.
If you don't want to do that, you cannot use iostream::read() on your class. You will need member functions that read/write to a stream. This is what serialization is about... But you don't need the complexity of boost. In basic terms, you'd do something like:
// Read with no error checking :-S
istream& budget::read( istream& s )
{
s.read( (char*)&balance, sizeof(balance) );
s.read( (char*)&year, sizeof(year) );
s.read( (char*)&month, sizeof(month) );
s.read( (char*)&due_pay, sizeof(due_pay) );
s.read( (char*)&loan_given, sizeof(loan_given) );
size_t length;
char *tempstr;
// Read due_name
s.read( (char*)&length, sizeof(length) );
tempstr = new char[length];
s.read( tempstr, length );
due_name.assign(tempstr, length);
delete [] tempstr;
// Read loan_name
s.read( (char*)&length, sizeof(length) );
tempstr = new char[length];
s.read( tempstr, length );
loan_name.assign(tempstr, length);
delete [] tempstr;
return s;
}
ostream& budget::write( ostream& s )
{
// etc...
}
Notice above that we've serialized the strings by writing a size value first, and then that many characters after.

Object loading segfault under GCC

These methods are supposed to save and load the entirety of the object they're associated with. When I compile the program under Linux through gcc, the save seems to work but it segfaults when loading. When I compile it under Windows through the Visual Studio compiler, it works like a dream. I am not sure what the differences are, but I've got a hunch that it involves some gcc oddity.
The two methods:
void User::SaveToFile()
{
ofstream outFile;
string datafile_name = username + "_data";
outFile.open(datafile_name.c_str(), ios::binary);
outFile.write((char*)this, sizeof(*this));
}
void User::LoadFromFile(string filename)
{
ifstream inFile;
inFile.open(filename.c_str(), ios::binary);
inFile.read((char*)this, sizeof(*this));
}
The declaration:
class User
{
private:
string username;
string realname;
string password;
string hint;
double gpa;
vector<Course> courses;
public:
double PredictGPA();
void ChangePassword();
void SaveToFile();
void LoadFromFile(string filename);
void SetUsername(string _username){username = _username;}
string GetUsername(){return username;}
void SetRealname(string _realname){realname = _realname;}
string GetRealname(){return realname;}
void SetPass(string _password){password = _password;}
string GetPass(){return password;}
void SetHint(string _hint){hint = _hint;}
string GetHint(){return hint;}
};
Your class User is not a POD type, i.e. it's not a Plain Old Data type (as C structs are). You cannot just read and write its memory bitwise and expect it to work. Neither string nor vector is a POD; both keep pointers to their dynamically allocated data. When reading those back, attempts to access invalid memory will result in a segfault. What's more, the contents of both the string and the vector are not actually being saved at all, since they are not within the memory layout of the object (it may sometimes appear to work for a string with SBO, but that's just by chance and it is still undefined to do it).
You would need a way to serialize and deserialize your class; your class can't magically become an object when you read it in like that.
Instead you would need to supply functions that you call when loading/saving your class and that store the class in some format of your choosing, e.g. XML.
so instead of
outFile.write((char*)this, sizeof(*this));
have some member function to convert it to a string with some format that you easily can parse when you load it (or some binary format whatever you find easier), then save it.
outFile.write(this->myserialize(), mysize);
You can't write into string like that. For one thing it usually stores its data dynamically, i.e. not inside the object at all, and for another you shall not rely on any particular layout of it.
There are similar issues with vectors, and you don't appear to have considered endianness and padding at all.
Put simply, you're making assumptions that do not hold.
In general, do not mess with complex (non-POD) objects on the byte level. Serialise with some text format instead, using the objects' public member functions to extract and restore their state.
Have you considered JSON?
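A plain-text variant that goes only through public member functions might look like the sketch below. It assumes C++14 (for std::quoted) and uses a cut-down, hypothetical version of the User class with just three fields; SetGpa, GetGpa and the free functions save and load are invented for the example.

```cpp
#include <iomanip>
#include <istream>
#include <ostream>
#include <string>

// Hypothetical cut-down version of the User class from the question.
class User {
public:
    void SetUsername(const std::string& v) { username = v; }
    std::string GetUsername() const { return username; }
    void SetHint(const std::string& v) { hint = v; }
    std::string GetHint() const { return hint; }
    void SetGpa(double v) { gpa = v; }
    double GetGpa() const { return gpa; }
private:
    std::string username;
    std::string hint;
    double gpa = 0.0;
};

// One field per line. std::quoted wraps each string in double quotes and
// escapes embedded quotes, so spaces inside a field survive the round trip.
// The double is written with the stream's default precision.
void save(std::ostream& out, const User& u) {
    out << std::quoted(u.GetUsername()) << '\n'
        << std::quoted(u.GetHint()) << '\n'
        << u.GetGpa() << '\n';
}

// Returns false and leaves u untouched if any field fails to parse.
bool load(std::istream& in, User& u) {
    std::string username, hint;
    double gpa = 0.0;
    if (!(in >> std::quoted(username) >> std::quoted(hint) >> gpa)) {
        return false;
    }
    u.SetUsername(username);
    u.SetHint(hint);
    u.SetGpa(gpa);
    return true;
}
```

Because nothing here depends on the memory layout of std::string, the same file can be read by a build from a different compiler.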
Things like strings etc may contain pointers - in which case your method can go horribly wrong.
You need to serialise the data - I.e. convert it to a series of bytes.
Then when reading the data you just read the bytes and then create the object from that. The new pointers will be correct.
If you stay with this route I would write the length of the string instead of null-terminating it. That makes it easier to allocate on loading. There is a lot to consider in a binary format. Each field should have some type of ID so it can be found if it is in the wrong spot or was written by a different version of your program. Also, at the beginning of your file write what endianness you are using and the size of your integers etc. Or decide on a standard size and endianness for everything. I used to write code like this all the time for networking and file storage. There are much better modern approaches. Also consider using a buffer and creating a Serialize() function.
Good modern alternatives include: SQLite3, XML, JSON
Untested Example:
class object
{
public:
    void Load()
    {
        ifstream inFile;
        int size;
        inFile.open("filename", ios::binary);
        // read() takes a char*, so the address of the int has to be cast
        inFile.read(reinterpret_cast<char*>(&size), sizeof(size));
        stringA.resize(size);
        inFile.read(&stringA[0], size);
        inFile.read(reinterpret_cast<char*>(&size), sizeof(size));
        stringB.resize(size);
        inFile.read(&stringB[0], size);
        inFile.close(); //don't forget to close your files
    }
    void Save()
    {
        ofstream outFile;
        int size;
        outFile.open("filename", ios::binary);
        size = stringA.size();
        outFile.write(reinterpret_cast<const char*>(&size), sizeof(size));
        outFile.write(stringA.data(), size);
        size = stringB.size();
        outFile.write(reinterpret_cast<const char*>(&size), sizeof(size));
        outFile.write(stringB.data(), size);
        outFile.close();
    }
private:
    std::string stringA;
    std::string stringB;
};

Serializing struct containing char*

I'm getting an error when serializing a char* string: error C2228: left of '.serialize' must have class/struct/union. I could use a std::string and then get a const char* from it, but I require the char* string.
The error message says it all, there's no support in boost serialization to serialize pointers to primitive types.
You can do something like this in the store code:
int len = strlen(string) + 1;
ar & len;
ar & boost::serialization::make_binary_object(string, len);
and in the load code:
int len;
ar & len;
string = new char[len]; //Don't forget to deallocate the old string
ar & boost::serialization::make_binary_object(string, len);
There is no way to serialize a bare pointer to something in boost::serialization (and I suspect there is no real way to do that in general). A pointer is just a memory address; such addresses are generally specific to one instance of an object and, what's really important, the address carries no information about where the serialization should stop.
You can't just say to your serializer: "Hey, take something out from this pointer and serialize this something. I don't care what size does it have, just do it..."
First and the optimal solution for your problem is wrapping your char* using std::string or your own string implementation. The second would mean writing special serializing routine for char* and, I suspect, will generally do the same as the first method does.
Try this:
struct Example
{
    int i;
    char c;
    char * text; // Prefer std::string to char *
    void Serialize(std::ostream& output)
    {
        output << i << "\n";
        output << c << "\n";
        // Output the length of the text member,
        // followed by the actual text.
        size_t text_length = 0;
        if (text)
        {
            text_length = strlen(text);
        }
        output << text_length << "\n";
        if (text_length)
        {
            output.write(text, text_length); // Never stream a null char *.
        }
        output << "\n";
    }
    void Input(std::istream& input)
    {
        input >> i;
        input.ignore(1000, '\n'); // Eat any characters after the integer.
        input >> c;
        input.ignore(1000, '\n');
        // Read the size of the text data.
        size_t text_length = 0;
        input >> text_length;
        input.ignore(1000, '\n');
        delete[] text; // Destroy previous contents, if any.
        text = NULL;
        if (text_length)
        {
            text = new char[text_length + 1]; // Room for the terminator.
            input.read(text, text_length);
            text[text_length] = '\0';
        }
        input.ignore(1000, '\n'); // Eat the newline written after the text.
    }
};
Since pointers are not portable, the data must be written instead.
The text is known as a variable length field. Variable length fields are commonly output (serialized) in two data structures: length followed by data OR data followed by terminal character. Specifying the length first allows usage of block reading. With the latter data structure, the data must be read one unit at a time until the terminal character is read. Note: the latter data structure also implies that the terminal character cannot be part of the set of data items.
Some important issues to think about for serialization:
1. Use a format that is platform independent, such as ASCII text for numbers.
2. If a platform method is not available or allowed, define the exact specification for numbers, including Endianness and maximum length.
3. For floating point numbers, the specification should treat the components of a floating point number as individual numbers that have to abide by the specification for a number (i.e. exponent, magnitude and mantissa).
4. Prefer fixed length records to variable length records.
5. Prefer serializing to a buffer. Users of the object can then create a buffer of one or more objects and write the buffer as one block (using one operation). Likewise for input.
6. Prefer using a database to serializing. Although this may not be possible for networking, try every effort to have a database manage the data. The database may be able to send the data over the network.