Defining the structure of a binary file in C++11

Since I have to work with binary files a lot, I would like a more abstract way to do it. I have to perform the same steps over and over:
write a header
write different kinds of chunks (with different sets of values) in a given order
write an optional closing header
Now I would like to break this problem down into small building blocks. Imagine being able to write something like what a DTD is for XML: a definition of what can appear after a given chunk or inside a given semantic unit, so I can think about my files in terms of building blocks instead of hex values. The code would also be much more "idiomatic" and less cryptic.
In the end, is there something in the language that can help me with binary files from this perspective?

I'm not sure about C++11-specific features, but for C++ in general, streams make file I/O much easier to work with. You can overload the stream insertion (<<) and stream extraction (>>) operators to accomplish your goals. If you're not very familiar with operator overloading, chapter 9 of this site explains it well, with numerous examples. Here's the particular page for overloading the << and >> operators in the context of streams.
Allow me to illustrate what I mean. Suppose we define a few classes:
BinaryFileStream - which represents the file you are trying to write to and (possibly) read from.
BinaryFileStreamHeader - which represents the file header.
BinaryFileStreamChunk - which represents one chunk.
BinaryFileStreamClosingHeader - which represents the closing header.
Then, you can overload the stream insertion and extraction operators in your BinaryFileStream to write and read the file (or any other istream or ostream).
...
#include <iostream> // I/O stream definitions, you can specify your overloads for
// ifstream and ofstream, but doing so for istream and ostream is
// more general
#include <vector> // For holding the chunks
class BinaryFileStream
{
public:
...
// Write binary stream
friend std::ostream& operator<<( std::ostream& os, const BinaryFileStream& bfs )
{
// Write header
os << bfs.mHeader;
// write chunks
std::vector<BinaryFileStreamChunk>::const_iterator it; // bfs is const, so a const_iterator is required
for( it = bfs.mChunks.begin(); it != bfs.mChunks.end(); ++it )
{
os << (*it);
}
// Write Closing Header
os << bfs.mClosingHeader;
return os;
}
...
private:
BinaryFileStreamHeader mHeader;
std::vector<BinaryFileStreamChunk> mChunks;
BinaryFileStreamClosingHeader mClosingHeader;
};
All you must do then, is have operator overloads for your BinaryFileStreamHeader, BinaryFileStreamChunk and BinaryFileStreamClosingHeader classes that convert their data into the appropriate binary representation.
You can overload the stream extraction operator (>>) in an analogous way, though some extra work may be required for parsing.
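To make the last step concrete, here is a minimal sketch of what one of those per-class overloads could look like. The chunk layout (a 4-byte id, a 4-byte payload size, then the payload, all little-endian) and the helper write_u32_le are assumptions made up for illustration; the real layout depends on your file format.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical chunk layout: 4-byte id, 4-byte payload size, then the payload.
struct BinaryFileStreamChunk
{
    std::uint32_t id;
    std::vector<char> payload;
};

// Write a 32-bit value as four little-endian bytes, whatever the host byte order.
inline void write_u32_le(std::ostream& os, std::uint32_t v)
{
    const char bytes[4] = {
        static_cast<char>(v & 0xFF),
        static_cast<char>((v >> 8) & 0xFF),
        static_cast<char>((v >> 16) & 0xFF),
        static_cast<char>((v >> 24) & 0xFF)
    };
    os.write(bytes, 4);
}

// Unformatted output: raw bytes via write(), not text via the built-in <<.
inline std::ostream& operator<<(std::ostream& os, const BinaryFileStreamChunk& c)
{
    write_u32_le(os, c.id);
    write_u32_le(os, static_cast<std::uint32_t>(c.payload.size()));
    if (!c.payload.empty())
        os.write(c.payload.data(), static_cast<std::streamsize>(c.payload.size()));
    return os;
}

// Usage: serialize one chunk into an in-memory stream and inspect the bytes.
inline void demo_chunk()
{
    std::ostringstream out(std::ios::binary);
    out << BinaryFileStreamChunk{0x01020304u, {'a', 'b'}};
    const std::string s = out.str();
    assert(s.size() == 10);                 // 4 (id) + 4 (size) + 2 (payload)
    assert(s[0] == 0x04 && s[3] == 0x01);   // id is stored low byte first
    assert(s[8] == 'a' && s[9] == 'b');
}
```

When writing to a real file, open the ofstream with std::ios::binary so that no newline translation happens on platforms that perform it.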
Hope this helps.

Related

C++: Unwanted Conversions between ostream and ofstream

I've been working on a personal dictionary application which can help you remember words you learnt. It is operated via the CLI (just don't question this, it's kinda just a test and I got a weird passion for CLI apps). So, of course I am using ostreams for writing information on the CLI. I am used to writing operator<< overloads (for ostreams) for every class so that I can build up a multi-level output system (basically every object can "speak" for itself).
In order to persist a dictionary object, I wanted to use ofstream and write a file with it. Naturally, I wrote operator<< overloads also for ofstream and in the same "layered" structure.
As a result, I have now two operator<< overloads in every class, like in "Dictionary":
ostream& operator<<(ostream&, const Dictionary&);
ofstream& operator<<(ofstream&, const Dictionary&);
(this is just the declaration in the header file)
Notice that it is very important that both of these overloads do different things. I don't want to have some weird persistence-oriented special-format text on the CLI and also not user-friendly plain text in my file.
The problem is that, because of the inheritance structure of ostream and ofstream, ofstream is sometimes implicitly converted to ostream. And when this happens in the middle of my stack full of file output operations, the program suddenly jumps into the wrong overload and prints plain text in the file.
My question is simply: Is there a way to avoid or revert these unwanted implicit conversions in order to let my program jump into the right overloads? Or is there any other good way to fix this problem?
EDIT 1:
Someone pointed out in the comments that this is not an implicit conversion. ofstream is sometimes "seen" as its base class ostream. The problem is that at some point the object "forgets" that it is an ofstream and loses all file-related information. From there on it is only an ostream and that's what I meant with the "conversion".
EDIT 2:
The exact point in the program where the "unwanted conversion" happens can be found here:
ofstream& operator<<(ofstream& of, const Section& s) {
return s.print_ofstream(of);
}
So this operator overload calls "print_ofstream":
ofstream& Section::print_ofstream(ofstream& of) const {
of << "sec" << Util::ID_TO_STRING(section_id) << ":\n";
for (pair<Wordlist, Wordlist> pwl : translations) {
of << '{' << pwl.first << '=' << pwl.second << "}\n";
}
of << "#\n";
return of;
}
Note that "pwl" is a pair of two Wordlists, therefore pwl.first / pwl.second is a Wordlist. So, normally the line of << '{' << pwl.first << '=' << pwl.second << "}\n"; should call the ofstream operator<< overload in Wordlist. But it doesn't. Instead, the other overload method is called:
ostream& operator<<(ostream& o, const Wordlist& wl) {
return wl.print_ostream(o);
}
You have overloaded only the specific operator<< needed for streaming Dictionary, Section, Wordlist, etc objects to a std::ofstream, but std::ofstream inherits MANY other operator<<s from std::ostream, and those operators all take ostream& as input and return ostream& as output. So, for example, of << "sec" will return ostream& even though of is a std::ofstream, and then that ostream& is used for subsequent << calls until ; is reached. Those are the "implicit conversions" you are experiencing.
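A small sketch that reproduces the effect in isolation. The Wordlist type here is an empty stand-in, and last_overload() is a hypothetical helper that only records which overload ran; neither is the asker's actual code.

```cpp
#include <cassert>
#include <fstream>
#include <ostream>
#include <type_traits>
#include <utility>

// The standard inserters are declared for std::ostream, so `of << "sec"`
// has type std::ostream& even when `of` is a std::ofstream.
using ResultOfLiteralInsert = decltype(std::declval<std::ofstream&>() << "sec");
static_assert(std::is_same<ResultOfLiteralInsert, std::ostream&>::value,
              "inserting a string literal yields std::ostream&");

struct Wordlist {};   // empty stand-in for the real class

// Records which overload was called last: 1 = ostream, 2 = ofstream.
inline int& last_overload() { static int v = 0; return v; }

inline std::ostream& operator<<(std::ostream& o, const Wordlist&)
{
    last_overload() = 1;
    return o;
}

inline std::ofstream& operator<<(std::ofstream& of, const Wordlist&)
{
    last_overload() = 2;
    return of;
}

inline void demo_overloads()
{
    std::ofstream of;            // never opened; only overload selection matters here
    of << Wordlist{};            // left operand is an ofstream: ofstream overload
    assert(last_overload() == 2);
    of << '{' << Wordlist{};     // `of << '{'` is an ostream&: ostream overload
    assert(last_overload() == 1);
}
```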
The real question is, WHY do you want operator<< to output different data depending on the type of std::ostream being written to? That goes against C++'s streaming model. If you really want that, you would have to change print_ofstream(ofstream&) to print_ostream(ostream&) and then dynamically detect the actual std::ostream derived type using dynamic_cast. Same with Wordlist, and any other classes that need it.
A simpler and safer option would be to just store a flag inside of your classes to control how their data should be output, regardless of the type of std::ostream being used. Then you can set that flag as needed. Maybe even define some helper I/O manipulators to set those flags while making << calls.
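One way to sketch the flag-plus-manipulator idea is to keep the flag in the stream itself through std::ios_base::xalloc() and iword(), which are the standard hooks for per-stream custom state. This is a variation on the suggestion above (the flag lives in the stream rather than in each object), and the names persistent, readable and format_slot, as well as the simplified Wordlist, are invented for this example.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// One per-stream slot, allocated once, that stores the output mode.
inline int format_slot()
{
    static const int slot = std::ios_base::xalloc();
    return slot;
}

// Manipulators: `os << persistent` or `os << readable` switch the mode.
inline std::ostream& persistent(std::ostream& os) { os.iword(format_slot()) = 1; return os; }
inline std::ostream& readable(std::ostream& os)   { os.iword(format_slot()) = 0; return os; }

struct Wordlist   // hypothetical, simplified stand-in for the asker's class
{
    std::string first, second;
};

// A single overload; the mode stored in the stream decides the format.
inline std::ostream& operator<<(std::ostream& os, const Wordlist& wl)
{
    if (os.iword(format_slot()) == 1)
        return os << '{' << wl.first << '=' << wl.second << "}\n";   // file format
    return os << wl.first << " means " << wl.second << '\n';          // CLI format
}

inline void demo_modes()
{
    const Wordlist wl{"Haus", "house"};
    std::ostringstream cli, file;
    cli << wl;                    // a fresh stream starts in readable mode (slot is 0)
    file << persistent << wl;
    assert(cli.str() == "Haus means house\n");
    assert(file.str() == "{Haus=house}\n");
}
```

Because the mode travels with the stream, it survives the ostream& returned by intermediate << calls, so nested objects pick it up without any casts.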

Are ifstream/ofstream really used for serialization?

I am using ifstream and ostream to serialize my data, but I am surprised to discover that the << operator can't separate two adjacent strings, and separating them would be quite complicated.
class Name
{
string first_name;
string last_name;
friend std::ostream& operator<< (std::ostream& os, const Name& _name)
{
os << _name.first_name << _name.last_name;
return os;
}
friend std::istream& operator>> (std::istream& is, Name& _name)
{
is >> _name.first_name >> _name.last_name;
return is;
}
};
This doesn't work because << and >> don't write null terminator characters, and ifstream reads the whole string into one variable (first_name), which is kinda disappointing. How can I store the two strings separately so I can read them separately as well? I don't understand what the motivation is for << concatenating all the strings in an ostream so we can't read them back separately!?
I don't understand what the motivation is for << concatenating all the strings in an ostream so we can't read them back separately!?
This assumes that the only reason to write them separately is to read them as individual strings. Consider the case where someone has a pair of strings that they want to write to a stream without separators. Or a string followed by a float that they don't want separators for.
If ostreams automatically inserted separators for every << output, then it would be much harder for someone to write text without separators. They'd have to manually concatenate these strings and/or values into a single string, then output that.
And what would they use for this concatenation? They can't use ostringstream like you normally might, because it uses the same facilities as ofstream. So every << would put a separator character in the stream.
In short, the IO streams API writes what you told it to write, not what you may or may not "want" to write. It's not a serialization API; C++ isn't C# or Java. If you want serious serialization features, use Boost.Serialization.
Often times you want to concatenate strings with ostream (commonly stringstream). If you specifically don't want them concatenated it's easy enough to do:
os << _name.first_name << '\n' << _name.last_name;
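For the reading side, a sketch along the same lines: write each string on its own line and read it back with std::getline, which also keeps embedded spaces intact. It assumes the names never contain a newline; the constructor and accessors are added here only so the example can be exercised.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

class Name
{
public:
    Name() {}
    Name(const std::string& first, const std::string& last)
        : first_name(first), last_name(last) {}

    const std::string& first() const { return first_name; }
    const std::string& last() const { return last_name; }

private:
    std::string first_name;
    std::string last_name;

    // One field per line; '\n' is the separator chosen for this sketch.
    friend std::ostream& operator<<(std::ostream& os, const Name& n)
    {
        return os << n.first_name << '\n' << n.last_name << '\n';
    }

    // getline reads up to the separator, so spaces inside a field survive.
    friend std::istream& operator>>(std::istream& is, Name& n)
    {
        std::getline(is, n.first_name);
        std::getline(is, n.last_name);
        return is;
    }
};

inline void demo_round_trip()
{
    std::stringstream ss;
    ss << Name("Mary Ann", "Smith");
    Name n;
    ss >> n;
    assert(n.first() == "Mary Ann");
    assert(n.last() == "Smith");
}
```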
ifstream and ofstream are basically streams, so they have nothing to indicate the limits of the data in them. Think of them as a river: any data can be read from or written to them. This is the true nature of files, so if you need them for serialization you must implement your own serialization mechanism or use a library designed for this purpose, like boost::serialization. In C++ everything is implemented as-is, and because of this you can get maximum performance!! :)

Binary stream or something like this to save class like std::vector to a file?

I'm not good with the IOstream library since I am accustomed to stdio and stuff like this; however, I have a problem that I hoped IOstream would solve, but I find that it probably doesn't. So I'm quite new to the standard C++ libraries but quite comfortable with C++ OOP/classes and so on.
So I can't use code like
fprintf (stream, "...", C);
if C is of an aggregate type, because I can't create new format string options like %mytype. Also, I can't expect proper behavior of
fwrite/fread (&C, sizeof(C), 1, stream)
if C contains fields that are pointers, because fwrite/fread will save/load the value of a pointer but not the value stored in the memory the pointer refers to:
class MyClass
{...
private:
{typename} Tp* Data;
} C;
I don't care much about the first limitation, because I can write a function that converts an object of each of my classes to a text string, but the last one can't be solved easily. For example, I tried to create a function that saves each class to a binary file, but I ran into a lot of problems with stuff like the lack of partial specialization of a template and so on (no matter).
Being tired of making bugs and mistakes while rewriting standard code (like my own string and file holder classes), I hoped that learning (at last!) the standard library (written by clever people and well-tested :) would help me, since I have read a lot that the standard C++ library solves the first issue by using streams. I can overload operator << and operator >> and so on to be sure that my class will be saved to or read from a text file properly. But what about binary files, which are much, much more important for me?
What should I do if I want to save an object of a class like vector, for example, to a binary file? Using << and >> fails completely, since the compiler says that vector has no operators << and >> overloaded, but even if it had, they would produce text data.
Stuff like
vector <MyClass> V;
...
ofstream file ("file.bin", ios::binary);
size_t size1 = V.size();
file.write((const char*)&size1, sizeof(size1));
file.write((const char*)&V[0], V.size() * sizeof(MyClass));
is not suitable (and doesn't differ much from using fwrite), since it saves the value (address) of the pointer field but not the data stored there (also, what if I declare a "two-dimensional" vector as vector<vector<T> >?). So, if there were an overload of vector operator << like
template <class T> class vector
{public:
...
// Hypothetical member: write every element to s through T's own operator <<
ostream& operator << (ostream& s) const
{
for (uint32_t k = 0; k < size(); k++)
s << this->operator[] (k);
return s;
}
private:
T* Data;
};
and if each T::operator << were overloaded in the same way (for MyClass, to provide a stream of the data its Tp* member points to), everything would be saved.
(I know, I know, there should be an iterator, but maybe I made a more serious mistake because of a total misunderstanding of streams? Anyway, I'm just talking about the idea.)
Well, this is a way to convert data to text, not to get the binary data as it is stored in memory, but I know an interface could be written to work with binary data in the same way (maybe using function names instead of << and >>, but it can be done for sure)! The question is: has this been done in the standard C++ library or somewhere else (another open-source library for C++)? Yes, yes, to properly write a vector to a file in one line. (I'll be very surprised if it is not included in standard C++, because how do people save the data they work with to files if they want to use multidimensional dynamic arrays?)
You're looking for the term "serialization", and you might want to use the Boost::Serialization library for that purpose.
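If pulling in a library is not an option, the core idea can be sketched by hand: store a vector as its element count followed by each element, and let overloading handle nesting. The function names save and load are made up for this sketch, it only covers 32-bit integers and vectors of them, and it writes integers in host byte order, so the files are not portable between machines with different endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Write/read one 32-bit integer as raw bytes (host byte order).
inline void save(std::ostream& os, std::int32_t v)
{
    os.write(reinterpret_cast<const char*>(&v), sizeof v);
}
inline void load(std::istream& is, std::int32_t& v)
{
    is.read(reinterpret_cast<char*>(&v), sizeof v);
}

// A vector is stored as its element count followed by each element.
// The inner save/load call resolves to whichever overload fits T,
// so a vector of vectors is handled recursively.
template <class T>
void save(std::ostream& os, const std::vector<T>& v)
{
    save(os, static_cast<std::int32_t>(v.size()));
    for (const T& item : v)
        save(os, item);
}
template <class T>
void load(std::istream& is, std::vector<T>& v)
{
    std::int32_t n = 0;
    load(is, n);
    v.clear();
    for (std::int32_t i = 0; i < n && is; ++i)
    {
        T item{};
        load(is, item);
        v.push_back(item);
    }
}

inline void demo_nested_vector()
{
    const std::vector<std::vector<std::int32_t>> in = {{1, 2, 3}, {}, {42}};
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    save(ss, in);
    std::vector<std::vector<std::int32_t>> out;
    load(ss, out);
    assert(out == in);
}
```

A class that owns data through a pointer would get its own save/load pair that follows the pointer and writes what it points to, which is exactly the part fwrite cannot do.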

How to write an object to file in C++

I have an object with several text strings as members. I want to write this object to the file all at once, instead of writing each string to file. How can I do that?
You can overload operator>> and operator<< to read/write to stream.
Example Entry struct with some values:
struct Entry2
{
string original;
string currency;
Entry2() {}
Entry2(const string& in);
Entry2(const string& original, const string& currency)
: original(original), currency(currency)
{}
};
istream& operator>>(istream& is, Entry2& en);
ostream& operator<<(ostream& os, const Entry2& en);
Implementation:
using namespace std;
istream& operator>>(istream& is, Entry2& en)
{
is >> en.original;
is >> en.currency;
return is;
}
ostream& operator<<(ostream& os, const Entry2& en)
{
os << en.original << " " << en.currency;
return os;
}
Then you open a file stream, and for each object you call:
ifstream in(filename.c_str());
Entry2 e;
in >> e;
//if you want to use read (only valid for trivially copyable types, not for a struct holding std::string):
//in.read(reinterpret_cast<char*>(&e),sizeof(e));
in.close();
Or output:
Entry2 e;
// set values in e
ofstream out(filename.c_str());
out << e;
out.close();
Or, if you want to use the stream's read and write functions, just replace the relevant code in the operator implementations.
When the variables are private inside your struct/class, you need to declare the operators as friend functions.
You can implement any format/separators that you like. When your strings include spaces, use getline(), which takes a stream and a string, instead of >>, because operator>> uses whitespace as the delimiter by default. It depends on your separators.
It's called serialization. There are many serialization threads on SO.
There is also a nice serialization library included in Boost.
http://www.boost.org/doc/libs/1_42_0/libs/serialization/doc/index.html
basically you can do
myFile<<myObject
and
myFile>>myObject
with boost serialization.
If you have:
struct A {
char a[30], b[25], c[15];
int x;
};
then you can write it all just with write(fh, ptr, sizeof(struct A)).
Of course, this isn't portable (because we're not saving the endianness or size of int), but that may not be an issue for you.
If you have:
struct A {
char *a, *b, *c;
int d;
};
then you're not looking to write the object; you're looking to serialize it. Your best bet is to look in the Boost libraries and use their serialization routines, because it's not an easy problem in languages without reflection.
There's not really a simple way, it's C++ after all, not PHP, or JavaScript.
http://www.parashift.com/c++-faq-lite/serialization.html
Boost also has some library for it: http://www.boost.org/doc/libs/release/libs/serialization ... like Tronic already mentioned :)
The better method is to write each field individually along with the string length.
As an alternative, you can create a char array (or std::vector<char>) and write all the members into the buffer, then write the buffer to the output.
The underlying thorn is that a compiler is allowed to insert padding between members in a class or structure. Using memcpy or std::copy on the whole object will result in padding bytes being written to the output.
Just remember that you need to either write the string lengths and the content or the content followed by some terminating character.
Other people will suggest checking out the Boost Serialization library.
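A minimal sketch of the "length plus content" approach for an object with several strings. The Entry type and the function names are hypothetical, and the 32-bit length is written in host byte order, so treat this as an outline rather than a portable format.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct Entry   // hypothetical object with several string members
{
    std::string original;
    std::string currency;
};

// Store a string as a 32-bit length followed by its bytes.
inline void write_string(std::ostream& os, const std::string& s)
{
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    os.write(reinterpret_cast<const char*>(&len), sizeof len);
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Read the length first, then exactly that many bytes.
inline bool read_string(std::istream& is, std::string& s)
{
    std::uint32_t len = 0;
    if (!is.read(reinterpret_cast<char*>(&len), sizeof len))
        return false;
    s.resize(len);
    if (len != 0)
        is.read(&s[0], static_cast<std::streamsize>(len));
    return static_cast<bool>(is);
}

// Each field is written individually, so no padding bytes reach the file.
inline void write_entry(std::ostream& os, const Entry& e)
{
    write_string(os, e.original);
    write_string(os, e.currency);
}

inline bool read_entry(std::istream& is, Entry& e)
{
    return read_string(is, e.original) && read_string(is, e.currency);
}

inline void demo_entry()
{
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_entry(ss, Entry{"12,50 EUR", "EUR"});
    Entry back;
    assert(read_entry(ss, back));
    assert(back.original == "12,50 EUR" && back.currency == "EUR");
}
```

Code that reads files from untrusted sources should also check the length against a sane upper bound before calling resize.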
Unfortunately that is generally not quite possible. If your struct only contains plain data (no pointers or complex objects), you can store it as one chunk, but care must be taken if portability is an issue. Padding, data type size and endianness issues make this problematic.
You can use Boost.Serialization to minimize the amount of code required for proper portable and versionable serialization.
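The padding issue is easy to observe directly. In this sketch, Record is a made-up type; the exact amount of padding is implementation-defined, so the code only computes it rather than assuming a number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Record   // hypothetical example type
{
    char tag;
    std::int32_t value;
};

// Bytes the members need on their own.
constexpr std::size_t member_bytes = sizeof(char) + sizeof(std::int32_t);

// Whatever sizeof reports beyond that is padding inserted by the compiler.
// Writing the object as one chunk puts these bytes into the file as well.
constexpr std::size_t padding_bytes = sizeof(Record) - member_bytes;

inline void demo_padding()
{
    assert(sizeof(Record) == member_bytes + padding_bytes);
    // `value` cannot start before the byte after `tag`; any gap is padding.
    assert(offsetof(Record, value) >= sizeof(char));
}
```

On many common ABIs padding_bytes comes out as 3 here, but a file format should not rely on that.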
Assuming your goal is as stated, to write out the object with a single call to write() or fwrite() or whatever, you'd first need to copy the string and other object data into a single contiguous block of memory. Then you could write() that block of memory out with a single call. Or you might be able to do a vector-write by calling writev(), if that call is available on your platform.
That said, you probably won't gain much by reducing the number of write calls. Especially if you are using fwrite() or similar already, then the C library is already doing buffering for you, so the cost of multiple small calls is minimal anyway. Don't put yourself through a lot of extra pain and code complexity unless it will actually do some good...

When should I concern myself with std::iostream::sentry?

Online references have rather brief and vague descriptions on the purpose of std::iostream::sentry. When should I concern myself with this little critter? If it's only intended to be used internally, why make it public?
It's used whenever you need to extract or output data with a stream. That is, whenever you make an operator>>, the extraction operator, or operator<<, the insertion operator.
Its purpose is to simplify the logic: "Are any fail bits set? Synchronize the buffers. For input streams, optionally get any whitespace out of the way. Okay, ready?"
All extraction stream operators should begin with:
// set the second parameter to true to not skip whitespace, for input that uses it
const std::istream::sentry ok(stream, icareaboutwhitespace);
if (ok)
{
// ...
}
And all insertion stream operators should begin with:
const std::ostream::sentry ok(stream);
if (ok)
{
// ...
}
It's just a cleaner way of doing (something similar to):
if (stream.good())
{
if (stream.tie())
stream.tie()->flush();
// the second parameter
if (!noskipwhitespace && stream.flags() & ios_base::skipws)
{
stream >> std::ws;
}
}
if (stream.good())
{
// ...
}
ostream just skips the whitespace part.
Most people will never write any code that needs to deal with creating sentry objects. A sentry object is needed when/if you extract data from (or insert it into) the stream buffer that underlies the stream object itself.
As long as your insertion/extraction operator uses other iostream members/operators to do its work, it does not have to deal with creating a sentry object (because those other iostream operators will create and destroy sentry objects as needed).
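For contrast, here is a sketch of the case where a sentry is needed: an insertion operator that bypasses the formatted inserters and writes to the stream buffer itself. The Tag type is hypothetical and exists only for this example.

```cpp
#include <cassert>
#include <ios>
#include <ostream>
#include <sstream>
#include <streambuf>

struct Tag   // hypothetical four-character chunk tag
{
    char bytes[4];
};

inline std::ostream& operator<<(std::ostream& os, const Tag& t)
{
    // No other inserter is involved here, so nobody else creates the
    // sentry; this operator has to do it before touching rdbuf().
    const std::ostream::sentry ok(os);
    if (ok)
    {
        if (os.rdbuf()->sputn(t.bytes, 4) != 4)
            os.setstate(std::ios_base::badbit);   // report a short write
    }
    return os;
}

inline void demo_sentry()
{
    std::ostringstream out;
    const Tag t = {{'R', 'I', 'F', 'F'}};
    out << t;
    assert(out.str() == "RIFF");

    out.setstate(std::ios_base::failbit);
    out << t;                       // sentry is false on a failed stream
    assert(out.str() == "RIFF");    // so nothing more was written
}
```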
Formatted input for anything but the basic types (int, double, etc.) doesn't make a lot of sense, and arguably only for them when taken from a non-interactive stream such as an istringstream. So you should probably not be implementing op>> in the first place, and thus not have to worry about sentry objects.