How to write an object to file in C++ - c++

I have an object with several text strings as members. I want to write this object to the file all at once, instead of writing each string to file. How can I do that?

You can overload operator>> and operator<< to read from and write to a stream.
Example Entry struct with some values:
struct Entry2
{
string original;
string currency;
Entry2() {}
Entry2(string& in);
Entry2(string& original, string& currency)
: original(original), currency(currency)
{}
};
istream& operator>>(istream& is, Entry2& en);
ostream& operator<<(ostream& os, const Entry2& en);
Implementation:
using namespace std;
istream& operator>>(istream& is, Entry2& en)
{
is >> en.original;
is >> en.currency;
return is;
}
ostream& operator<<(ostream& os, const Entry2& en)
{
os << en.original << " " << en.currency;
return os;
}
Then you open filestream, and for each object you call:
ifstream in(filename.c_str());
Entry2 e;
in >> e;
// Note: in.read(reinterpret_cast<char*>(&e), sizeof(e)) is only valid for
// trivially copyable types; Entry2 holds std::string members, so don't use it here.
in.close();
Or output:
Entry2 e;
// set values in e
ofstream out(filename.c_str());
out << e;
out.close();
Or, if you want to use the stream's read() and write() instead, just replace the relevant code in the operator implementations.
When the member variables are private inside your struct/class, you need to declare the operators as friend functions.
You can implement any format and separators you like. When your strings include spaces, use getline(), which takes a stream and a string, instead of >>, because operator>> treats whitespace as the delimiter by default. Which one is right depends on your separators.

It's called serialization. There are many serialization threads on SO.
There is also a nice serialization library included in Boost.
http://www.boost.org/doc/libs/1_42_0/libs/serialization/doc/index.html
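Basically, you can do
myFile<<myObject
and
myFile>>myObject
with Boost serialization (in practice the file stream is wrapped in an archive object such as boost::archive::text_oarchive, and the object is streamed into that).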

If you have:
struct A {
char a[30], b[25], c[15];
int x;
};
then you can write it all with a single call such as write(fh, ptr, sizeof(struct A)).
Of course, this isn't portable (because we're not saving the endianness or the size of int), but that may not be an issue for you.
If you have:
struct A {
char *a, *b, *c;
int d;
};
then you're not looking to write the object; you're looking to serialize it. Your best bet is to look in the Boost libraries and use their serialization routines, because it's not an easy problem in languages without reflection.

There's not really a simple way, it's C++ after all, not PHP, or JavaScript.
http://www.parashift.com/c++-faq-lite/serialization.html
Boost also has some library for it: http://www.boost.org/doc/libs/release/libs/serialization ... like Tronic already mentioned :)

The better method is to write each field individually along with the string length.
As an alternative, you can create a char array (or std::vector<char>) and write all the members into the buffer, then write the buffer to the output.
The underlying problem is that a compiler is allowed to insert padding between members of a class or structure. Copying the whole object with memcpy or std::copy will therefore write the padding bytes to the output as well.
Just remember that you need to either write the string lengths and the content or the content followed by some terminating character.
Other people will suggest checking out the Boost Serialization library.

Unfortunately, that is generally not quite possible. If your struct only contains plain data (no pointers or complex objects), you can store it as one chunk, but care must be taken if portability is an issue. Padding, data type size, and endianness issues make this problematic.
You can use Boost.Serialization to minimize the amount of code required for proper portable and versionable serialization.

Assuming your goal is as stated, to write out the object with a single call to write() or fwrite() or whatever, you'd first need to copy the string and other object data into a single contiguous block of memory. Then you could write() that block of memory out with a single call. Or you might be able to do a vector-write by calling writev(), if that call is available on your platform.
That said, you probably won't gain much by reducing the number of write calls. Especially if you are using fwrite() or similar already, then the C library is already doing buffering for you, so the cost of multiple small calls is minimal anyway. Don't put yourself through a lot of extra pain and code complexity unless it will actually do some good...

Related

Need help in completing a function for sorting k sorted streams

I am working on my assignment and need help in completing the following function. I have been provided with the following signature:
void merge(const std::vector<istream>& inputStreams, ostream& o);
The function is supposed to take k integer streams as input and sort them and store the result in an ostream object.
I have completed the function definition but the problem is I cannot test the function by providing it the input (ie: vector of istream objects). If I try to pass the function a vector of istream objects, the compiler throws too many errors for me to debug.
Here is the function definition:
void merge( vector<istream>& inputStreams, ostream& o){
vector<long long int> input_vec;
long long int input_vec_size = inputStreams.size();
for(int i=0; i<input_vec_size;i++)
{
long long int temp;
while(inputStreams[i]>>temp)
{
input_vec.push_back(temp);
}
}
sort(input_vec.begin(),input_vec.end());
for(int i=0;i<input_vec.size();i++)
{
o<<input_vec[i];
}
}
And to pass a vector of istream objects I did the following:
int main()
{
//ifstream a1,a2,a3,a4;
filebuf fb1,fb2,fb3;
fb1.open("fb1.txt",ios::in);
fb2.open("fb2.txt",ios::in);
fb3.open("fb3.txt",ios::out);
istream a1(&fb1);
istream a2(&fb2);
ostream out(&fb3);
vector<istream> inp;
inp.push_back(a1);
inp.push_back(a2);
merge(inp,out);
}
Can anyone help me?
For starters, it's fairly unusual to see the type istream being used as the actual type of an object. The reason for this is that istream is meant to be used as a base class, and those base classes are what more often get used. For example, you'll see variables of type istringstream or type ifstream much more regularly than just plain old istream. It's not wrong, per se, to have a variable that's an honest-to-goodness istream, but it is unusual.
Typically, if you wanted to work with a function that manipulated some sort of input stream, you'd structure it so that it either took in a reference to the istream or a pointer to the istream. That's the general C++ way of handling polymorphic types.
In your case, the fact that you're trying to use a vector<istream>, regardless of whether the code will compile or not, should therefore make you pause a minute to think about whether you're doing the right thing. It's entirely possible that, yes, you indeed do have a bunch of istream objects, and those objects aren't istringstreams or ifstreams. But more probably, what you were aiming to do here was say "I take in some list of input streams, and I don't really care what kind of input streams they are as long as they inherit from istream."
If that's what you're hoping to do, there are several ways you could address this. Perhaps the easiest is to change vector<istream> to vector<istream *> (or perhaps vector<shared_ptr<istream>>, depending on context). That would mean "I'd like to take as input a list of streams, and since I can't say for certain what specific type each of those streams will be, I'll just have the client give me pointers to each of them." That's going to require you to make some changes to your code so that, as you access the elements of the vector, you treat them as pointers rather than as actual, honest-to-goodness istream objects. For example, the line
while (inputStreams[i] >> temp) { ... }
might need to get rewritten as
while (*inputStreams[i] >> temp) { ... }
to explicitly dereference the pointer.
The other question you asked was how to test this code at all, and that's a separate step. Remember, it's fairly unusual to create objects of type istream, so you'd probably want to make either objects of type istringstream or ifstream. Here's an example of how you might make a few streams and then pass them into your function:
istringstream stream1("137 2718");
istringstream stream2("27 182 818");
istringstream stream3("3 14 15 92 653");
merge({ &stream1, &stream2, &stream3 }, cout);
Here, rather than declaring a local variable of type vector<istream *>, we just use a brace-initializer to say "please make me a vector out of these pointers."
From the sample code you've provided it looks like you want to read data from a bunch of files. Here's how you might do that. Rather than making filebuf objects and wrapping them in istreams, which is legal but fairly uncommon, we'll just use ifstream:
ifstream stream1("fb1.txt");
ifstream stream2("fb2.txt");
ifstream stream3("fb3.txt");
vector<istream *> inputs;
inputs.push_back(&stream1);
inputs.push_back(&stream2);
inputs.push_back(&stream3);
merge(inputs, cout);
Hope this helps!
istream is not copyable or moveable, hence you can't make a vector of istreams. Try using std::vector <std::istream *> instead (and modify your code accordingly).
Live demo: https://wandbox.org/permlink/20I2VQqsRI8ofaxP

Are ifstream/ofstream really used for serialization?

I am using ifstream and ostream to serialize my data, but I am surprised to discover that the << operator can't separate two adjacent strings, and separating them would be quite complicated.
class Name
{
string first_name;
string last_name;
friend std::ostream& operator<< (std::ostream& os, const Name& _name)
{
os << _name.first_name << _name.last_name;
return os;
}
friend std::istream& operator>> (std::istream& is, Name& _name)
{
is >> _name.first_name >> _name.last_name;
return is;
}
};
This doesn't work because << and >> don't write null terminator characters, and ifstream reads the whole string into one variable (first_name), which is kinda disappointing. How can I store the two strings separately so I can read them separately as well? I don't understand what is the motivation of << concatenating all the strings in ostream so we can't read them back separately!?
I don't understand what is the motivation of << concatenating all the strings in ostream so we can't read them back separately!?
This assumes that the only reason to write them separately is to read them as individual strings. Consider the case where someone has a pair of strings that they want to write to a stream without separators. Or a string followed by a float that they don't want separators for.
If ostreams automatically inserted separators for every << output, then it would be much harder for someone to write text without separators. They'd have to manually concatenate these strings and/or values into a single string, then output that.
And what would they use for this concatenation? They can't use ostringstream like you normally might, because it uses the same facilities as ofstream. So every << would put a separator character in the stream.
In short, the IO streams API writes what you told it to write, not what you may or may not "want" to write. It's not a serialization API; C++ isn't C# or Java. If you want serious serialization features, use Boost.Serialization.
Often you want to concatenate strings with an ostream (commonly a stringstream). If you specifically don't want them concatenated, it's easy enough to do:
os << _name.first_name << '\n' << _name.last_name;
ifstream and ofstream are basically streams, so they have nothing to mark the boundaries of the data in them. Think of them as a river: any data can be read from or written to them. This is the true nature of files, so if you need them for serialization you must implement your own serialization mechanism or use a library designed for this purpose, like boost::serialization. In C++ everything is implemented as is, and because of this you can get maximum performance!! :)
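If the names may themselves contain spaces, one option (assuming C++14 or later) is std::quoted from <iomanip>, which wraps each string in quotes on output and strips them again on input. The sketch below uses a simplified Name with public members, which is an assumption made to keep the example short.

```cpp
#include <iomanip>
#include <istream>
#include <ostream>
#include <string>

// Simplified version of the Name class from the question (public members).
struct Name
{
    std::string first_name;
    std::string last_name;
};

// Each field is written in double quotes, separated by a single space.
std::ostream& operator<<(std::ostream& os, const Name& n)
{
    return os << std::quoted(n.first_name) << ' ' << std::quoted(n.last_name);
}

// std::quoted reads up to the closing quote, so embedded spaces are preserved.
std::istream& operator>>(std::istream& is, Name& n)
{
    return is >> std::quoted(n.first_name) >> std::quoted(n.last_name);
}
```

Quote characters inside a field are escaped with a backslash on output and unescaped on input, so they survive the round trip as well.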

encrypting and serializing stl string and other containers

I have data in stl containers (vector). Each node in the vector is a structure which also contains stl strings.
struct record
{
string name;
string location;
int salary;
};
vector< record > employees;
I want to serialize employees but I also want to encrypt it before serializing.
my encryption function looks like this:
Encode(const char * inBfr, const int in_size, char ** outBfr, int& out_size )
From searching, it looks like the standard doesn't require the memory of my structure to be contiguous, so I can't just grab the memory of the employees variable. Is there any other smart way that I can use this encoding function with my STL-based structures/containers? It is good for me that the Encode function works on plain char * buffers, so I know exactly what goes in and out, but STL structures are not like that, and I am trying to find a nice way to use STL with this function.
I am also open to using any other STL containers if that helps.
Although the elements in the std::vector<T> are guaranteed to be laid out contiguously, this doesn't really help: the record you have may include padding and, more importantly, will store the std::string's content external to the std::string object (in case the small string optimization is used, the value may be embedded inside the std::string, but the object will also contain a couple of bytes which are not part of the std::string's value). Thus, your best option is to format your record and encrypt the formatted string.
The formatting is straightforward, but personally I would encapsulate the encoding function into a simple std::streambuf so that the encryption can be done by a filtering stream buffer. Given the signature you gave, this could look something like this:
class encryptbuf
: public std::streambuf {
std::streambuf* d_sbuf;
char d_buffer[1024];
public:
encryptbuf(std::streambuf* sbuf)
: d_sbuf(sbuf) {
this->setp(this->d_buffer, this->d_buffer + sizeof(this->d_buffer) - 1);
}
int overflow(int c) {
if (c != std::char_traits<char>::eof()) {
*this->pptr() = std::char_traits<char>::to_char_type(c);
this->pbump(1);
}
return this->pubsync()? std::char_traits<char>::eof(): std::char_traits<char>::not_eof(c);
}
int sync() {
char* out(0);
int size(0);
Encode(this->pbase(), this->pptr() - this->pbase(), &out, size);
this->d_sbuf->sputn(out, size);
delete[] out; // dunno: it seems the output buffer is allocated but how?
this->setp(this->pbase(), this->epptr());
return this->d_sbuf->pubsync();
}
};
int main() {
encryptbuf sbuf(std::cout.rdbuf());
std::ostream eout(&sbuf);
eout << "print something encoded to standard output\n" << std::flush;
}
Now, an output operator for your records that simply prints to a std::ostream can be used to produce encoded output.
It's probably easiest to serialize your structure into a string, then encrypt the string. For example:
std::ostringstream buffer;
buffer << a_record.name << "\n" << a_record.location << "\n" << a_record.salary;
encode(buffer.str().c_str(), buffer.str().length(), /* ... */);
If it were me, I'd probably write encode (or at least a wrapper for it) to take input (and probably produce output) in a vector, string, or stream though.
If you want to get ambitious, there are other possibilities. First of all, @MooingDuck raises a good point that it's often worthwhile to overload operator<< for the class, instead of working with the individual items all the time. This will typically be a small function similar to what's above:
std::ostream &operator<<(std::ostream &os, record const &r) {
return os << r.name << "\n" << r.location << "\n" << r.salary;
}
Using this, you'd just have:
std::ostringstream os;
os << a_record;
encode(os.str().c_str(), os.str().length(), /* ... */);
Second, if you want to get really ambitious, you can put the encryption into (for one example) a codecvt facet, so you can automatically encrypt all the data as you write it to a stream, and decrypt it as you read it back in. Another possibility is to build the encryption into a filtering streambuf object instead. The codecvt facet is probably the method that should theoretically be preferred, but the streambuf is almost certainly easier to implement, with less unrelated "stuff" involved.
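The "format to a string, then encrypt" route, together with a wrapper around the raw-buffer interface, could be sketched like this. The Encode() below is only a stand-in: its void return type, the XOR "cipher", and the use of new[] for the output buffer are all assumptions, since the question does not show the real function. The helper names format_records and encode_string are hypothetical too.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct record
{
    std::string name;
    std::string location;
    int salary;
};

// Stand-in for the real Encode(): XORs every byte with a fixed key and
// allocates the output with new[]. Both behaviours are assumptions.
void Encode(const char* inBfr, const int in_size, char** outBfr, int& out_size)
{
    *outBfr = new char[in_size];
    for (int i = 0; i < in_size; ++i)
        (*outBfr)[i] = static_cast<char>(inBfr[i] ^ 0x5A);
    out_size = in_size;
}

// Format all records as text, one field per line.
std::string format_records(const std::vector<record>& employees)
{
    std::ostringstream os;
    for (const record& r : employees)
        os << r.name << '\n' << r.location << '\n' << r.salary << '\n';
    return os.str();
}

// Wrapper that hides the raw-buffer interface behind std::string.
std::string encode_string(const std::string& plain)
{
    char* out = 0;
    int out_size = 0;
    Encode(plain.data(), static_cast<int>(plain.size()), &out, out_size);
    std::string result(out, out_size);
    delete[] out;   // matches the new[] in the stand-in Encode()
    return result;
}
```

Usage would be along the lines of encode_string(format_records(employees)), with the result written to a file opened in binary mode. How the output buffer has to be released depends on how the real Encode() allocates it.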

Binary stream or something like this to save class like std::vector to a file?

I'm not good with the iostream library, since I am accustomed to stdio and stuff like this; however, I have a problem that I hoped could be solved with iostream, but I find that it probably can't. So I'm quite new to the standard C++ libraries, but quite comfortable with C++ OOP, classes, and so on.
So I can't use code like
fprintf (stream, "...", C);
if C is of an aggregate type, because I can't create new format specifiers like %mytype. Also, I can't expect proper behavior from
fwrite/fread (&C, sizeof(C), 1, stream)
if C contains fields that are pointers, because fwrite/fread will save/load the value of the pointer itself, not the value stored in the memory the pointer refers to:
class MyClass
{...
private:
{typename} Tp* Data;
} C;
I don't care much about the first limitation, because I can write a function that converts an object of each of my classes to a text string, but the second one can't be solved easily. For example, I tried to create a function that saves each class to a binary file, but I ran into a lot of problems with stuff like the lack of partial specialization of a template and so on (no matter).
Being tired of introducing bugs and mistakes while rewriting standard code (like my own string and file-holder classes), I hoped that learning (at last!) the standard library (written by clever people and well tested :) would help me, since I have read a lot that the standard C++ library solves the first issue by using streams. I can overload operator << and operator >> and so on to be sure that my class will be saved to or read from a text file properly. But what about binary files, which are much, much more important for me?
What should I do if I want to save an object of a class like vector, for example, to a binary file? Using << and >> fails outright, since the compiler says that vector has no overloaded operators << and >>, but even if it had, they would produce text data.
Stuff like
vector <MyClass> V;
...
ofstream file ("file.bin", ios::binary);
size_t size1 = V.size();
file.write((const char*)&size1, sizeof(size1));
file.write((const char*)&V[0], V.size() * sizeof(MyClass));
is not suitable (and doesn't differ much from using fwrite), since it saves the value (address) of the pointer field but not the data stored there (also, what if I declare a "two-dimensional" vector as vector<vector<...> >?). So, if there were an overload of vector's operator << like
template <class T> vector
{public:
...
ostream operator << () const
{ostream s;
for (uint32_t k = 0; k < size(); k++)
s << s << this->operator[] (k);
return s;
}
private:
T* Data;
};
and if each T::operator << were overloaded in the same way (for MyClass, to provide a stream of the data stored in MyClass::Tp), it would be saved.
(I know, I know, there should be an iterator, but maybe I made a more serious mistake because of a total misunderstanding of streams? Anyway, I'm just talking about the idea.)
Well, that is a way to convert data to text, not to get binary data as it is stored in memory, but I know an interface could be written to work with binary data in the same way (maybe using function names rather than << and >>, but it can be done for sure)! The question is: has this been done in the standard C++ library or somewhere else (another open-source library for C++)? Yes, yes, to properly write a vector to a file in one line. (I'll be very surprised if it is not included in standard C++, because how do people save the data they work with to files if they want to use multidimensional dynamic arrays?)
You're looking for the term "serialization", and you might want to use the Boost::Serialization library for that purpose.

Structure alignment in C++

struct Vector
{
float x, y, z;
};
func(Vector *vectors) {...}
usage:
float *coords = load(file);
func(coords);
I have a question about the alignment of structures in C++. I will pass a set of points to the function func(). Is it OK to do it in the way shown above, or is this relying on platform-dependent behavior? (It works at least with my current compiler.) Can somebody recommend a good article on the topic?
Or, is it better to directly create a set of points while loading the data from the file?
Thanks
Structure alignment is implementation-dependent. However, most compilers give you a way of specifying that a structure should be "packed" (that is, arranged in memory with no padding bytes between fields). For example:
struct Vector {
float x;
float y;
float z;
} __attribute__((__packed__));
The above code will cause the gcc compiler to pack the structure in memory, making it easier to dump to a file and read back in later. The exact way to do this may be different for your compiler (details should be in your compiler's manual).
I always list members of packed structures on separate lines in order to be clear about the order in which they should appear. For most compilers this should be equivalent to float x, y, z; but I'm not certain if that is implementation-dependent behavior or not. To be safe, I would use one declaration per line.
If you are reading the data from a file, you need to validate the data before passing it to func. No amount of data alignment enforcement will make up for a lack of input validation.
Edit:
After further reading your code, I understand more what you are trying to do. You have a structure that contains three float values, and you are accessing it with a float* as if it were an array of floats. This is very bad practice. You don't know what kind of padding that your compiler might be using at the beginning or end of your structure. Even with a packed structure, it's not safe to treat the structure like an array. If an array is what you want, then use an array. The safest way is to read the data out of the file, store it into a new object of type struct Vector, and pass that to func. If func is defined to take a struct Vector* as an argument and your compiler is allowing you to pass a float* without griping, then this is indeed implementation-dependent behavior that you should not rely on.
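Building real Vector objects from the loaded floats, rather than reinterpreting the float pointer, could be sketched as follows. It assumes the loader yields a flat array holding 3 * count floats in x, y, z order; the function name to_vectors is hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Vector
{
    float x, y, z;
};

// Copy a flat array of 3 * count floats into properly typed Vector objects.
std::vector<Vector> to_vectors(const float* coords, std::size_t count)
{
    std::vector<Vector> result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        Vector v;
        v.x = coords[3 * i];
        v.y = coords[3 * i + 1];
        v.z = coords[3 * i + 2];
        result.push_back(v);
    }
    return result;
}
```

The result can then be handed to func() via the vector's data pointer and size, with no assumptions about padding or packing.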
Use an operator>> extraction overload.
std::istream& operator>>(std::istream& stream, Vector& vec) {
stream >> vec.x;
stream >> vec.y;
stream >> vec.z;
return stream;
}
Now you can do:
std::ifstream MyFile("My Filepath"); // opened for reading by default
Vector vec;
MyFile >> vec;
func(&vec);
Prefer passing by reference to passing by pointer:
void func(Vector& vectors)
{ /*...*/ }
The difference here between a pointer and a reference is that a pointer can be NULL or point to some strange place in memory. A reference refers to an existing object.
As far as alignment goes, don't concern yourself. Compilers handle this automagically (at least alignment in memory).
If you are talking about alignment of binary data in a file, search for the term "serialization".
First of all, your example code is bad:
float *coords = load(file);
func(coords);
You're passing func() a pointer to a float instead of a pointer to a Vector object.
Secondly, Vector's total size is equal to (sizeof(float) * 3), or in other words 12 bytes on typical platforms.
I'd consult my compiler's manual to see how to control the struct's alignment, and just for peace of mind I'd set it to, say, 16 bytes.
That way I'll know that the file, if it contains one vector, is always exactly 16 bytes in size, and I need to read only 16 bytes.
Edit:
Check MSVC9's align capabilities.
Writing binary data is not portable between machines.
About the only portable thing is text (and even that cannot be relied upon, as not all systems use the same text format; luckily most accept the 127 ASCII characters, and hopefully soon we will standardize on something like Unicode (he says with a smile)).
If you want to write data to a file, you must decide on the exact format of the file. Then write code that reads the data from that format and converts it into your specific hardware's representation for that type. This format could be binary or it could be a serialized text format; it does not matter much for performance (as the disk I/O speed will probably be your limiting factor). In terms of compactness, the binary format will probably be more efficient. In terms of ease of writing decoding functions on each platform, the text format is definitely easier, as a lot of it is already built into the streams.
So simple solution:
Read/Write to a serialized text format.
Also no alignment issues.
#include <algorithm>
#include <fstream>
#include <vector>
#include <iterator>
struct Vector
{
float x, y, z;
};
std::ostream& operator<<(std::ostream& stream, Vector const& data)
{
return stream << data.x << " " << data.y << " " << data.z << " ";
}
std::istream& operator>>(std::istream& stream, Vector& data)
{
return stream >> data.x >> data.y >> data.z;
}
int main()
{
// Copy an array to a file
Vector data[] = {{1.0,2.0,3.0}, {2.0,3.0,4.0}, { 3.0,4.0,5.0}};
std::ofstream file("plop");
std::copy(data, data+3, std::ostream_iterator<Vector>(file));
// Read data from a file.
std::vector<Vector> newData; // use a vector as we don't know how big the file is.
std::ifstream input("inputFile");
std::copy(std::istream_iterator<Vector>(input),
std::istream_iterator<Vector>(),
std::back_inserter(newData)
);
}