Save an std::vector<int> to file - c++

I noticed that when using:
std::vector<int> v(100000);
...
std::ofstream outfile("outfile.dat", std::ios::out | std::ofstream::binary);
std::copy(v.begin(), v.end(), std::ostream_iterator<int>(outfile));
outfile.close();
my std::vector<int> is not serialized as raw byte data (4 bytes per int) but as text, i.e. the string representation of each integer is saved to disk, which I don't want.
How to save a std::vector<int> as binary data?
(Note: I'd like to learn it with standard C++03, before learning new methods for it).

To write binary data, use std::ostream::write() instead of std::ostream_iterator (which uses operator<< internally and thus produces formatted output), e.g.:
std::vector<int> v(100000);
...
std::ofstream outfile("outfile.dat", std::ofstream::binary);
outfile.write(reinterpret_cast<const char*>(v.data() /* or &v[0] pre-C++11 */), sizeof(int) * v.size());
outfile.close();
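Reading the data back is symmetric, using std::istream::read(). A minimal sketch, assuming the file is read on the same platform it was written on (so int size and byte order match):
std::ifstream infile("outfile.dat", std::ifstream::binary);
std::vector<int> v2(100000);
infile.read(reinterpret_cast<char*>(&v2[0]), sizeof(int) * v2.size());
infile.close();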

std::ostream_iterator writes values to the stream using its operator<<. Elements are written as if you used outfile << value for each member of the vector, which means converting values to text.
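For instance, with a small vector the difference looks like this (a sketch; the exact bytes of the binary version depend on your platform's int size and endianness):
std::vector<int> v;
v.push_back(1); v.push_back(2); v.push_back(300);
// formatted: the characters "12300" end up in the file (no separator was given)
std::copy(v.begin(), v.end(), std::ostream_iterator<int>(outfile));
// binary: 3 * sizeof(int) raw bytes end up in the file
outfile.write(reinterpret_cast<const char*>(&v[0]), sizeof(int) * v.size());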
So, what you want to do instead is define a class that serializes itself to the stream in a binary representation, e.g.:
std::copy(v.begin(), v.end(), std::ostream_iterator<BinaryInt>(outfile));
                                                    ^^^^^^^^^
Now you have to define the BinaryInt type so that it can be constructed by an int value but serialize itself via operator<< appropriately:
struct BinaryInt
{
int value;
BinaryInt(int v): value(v) {}
friend std::ostream& operator<<(std::ostream& str, BinaryInt const& bi)
{
// convert bi.value into a binary representation.
// Note C++ does not define a specific size for int.
// Nor does it define an endianess.
// Nor does it define a specific representation.
// So to be cross platform/OS/compiler you will need to define these
// and convert the integer into this representation.
//
// return str.write(<data>, <size>);
//
// If this is just a test the following would work
// but is extremely brittle for the long term.
return str.write(reinterpret_cast<const char*>(&bi.value), sizeof(bi.value));
}
};

Might I recommend a saner way of doing it by using Protobufs? I will not type the code, but if you are working on a project, do not reinvent the wheel.
Using a protobuf would allow you to save the "type" of your data along with the data, and it would help you to extend your code with minimal fuss.

Related

Writing struct of vector to a binary file in c++

I have a struct and I would like to write it to a binary file (c++ / visual studio 2008).
The struct is:
struct DataItem
{
std::string tag;
std::vector<int> data_block;
DataItem(): data_block(1024 * 1024){}
};
I am filling the data_block vector with random values:
DataItem createSampleData ()
{
DataItem data;
std::srand(std::time(NULL));
std::generate(data.data_block.begin(), data.data_block.end(), std::rand);
data.tag = "test";
return data;
}
And trying to write the struct to file:
void writeData (DataItem data, long fileName)
{
ostringstream ss;
ss << fileName;
string s(ss.str());
s += ".bin";
char szPathedFileName[MAX_PATH] = {0};
strcat(szPathedFileName,ROOT_DIR);
strcat(szPathedFileName,s.c_str());
ofstream f(szPathedFileName, ios::out | ios::binary | ios::app);
// ******* first I tried to write this way then one by one
//f.write(reinterpret_cast<char *>(&data), sizeof(data));
// *******************************************************
f.write(reinterpret_cast<const char *>(&data.tag), sizeof(data.tag));
f.write(reinterpret_cast<const char *>(&data.data_block), sizeof(data.data_block));
f.close();
}
And the main is:
int main()
{
DataItem data = createSampleData();
for (int i=0; i<5; i++) {
writeData(data,i);
}
}
So I expect a file size of at least (1024 * 1024) * 4 (for the vector) + 48 (for the tag), but it just writes the tag to the file and creates a 1 KB file on the hard drive.
I can see the contents while I'm debugging, but it doesn't write them to the file...
What's wrong with this code? Why can't I write the struct with the vector to a file? Is there a better/faster or more efficient way to write it?
Do I have to serialize the data?
Thanks...
Casting a std::string to char * will not produce the result you expect, and neither will using sizeof on it. The same goes for a std::vector.
For the vector you need to use either the std::vector::data method or e.g. &data.data_block[0]. As for the size, use data.data_block.size() * sizeof(int).
Writing the string is another matter though, especially if it can be of variable length. You either have to write it as a fixed-length string, or write the length (in a fixed-size format) followed by the actual string, or write a terminator at the end of the string. To get a C-style pointer to the string use std::string::c_str.
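Putting that together, a minimal sketch of how writeData could write both members (a length prefix is used for the string; it assumes the file will be read back on the same platform, with the same integer sizes):
// tag: write a length prefix, then the characters themselves
std::size_t tag_len = data.tag.size();
f.write(reinterpret_cast<const char*>(&tag_len), sizeof(tag_len));
f.write(data.tag.c_str(), tag_len);
// data_block: write the raw int data in one go
f.write(reinterpret_cast<const char*>(&data.data_block[0]),
        data.data_block.size() * sizeof(int));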
Welcome to the merry world of C++ std::
Basically, vectors are meant to be used as opaque containers.
You can forget about reinterpret_cast right away.
Trying to shut the compiler up will allow you to create an executable, but it will produce silly results.
Basically, you can forget about most of the std::vector syntactic sugar that has to do with iterators, since your fstream will not access binary data through them (it would output a textual representation of your data).
But all is not lost.
You can access the vector's underlying array using the newly (C++11) introduced .data() method, though that defeats the point of using an opaque type.
const int * raw_ptr = data.data_block.data();
That will gain you 100 points of cool factor instead of using the puny
const int * raw_ptr = &data.data_block[0];
You could also use the even more cryptic &data.data_block.front() for a cool factor bonus of 50 points.
You can then write your glob of ints in one go:
f.write (reinterpret_cast<const char*>(raw_ptr), sizeof (raw_ptr[0]) * data.data_block.size());
Now if you want to do something really too simple, try this:
for (std::size_t i = 0 ; i != data.data_block.size() ; i++)
f.write (reinterpret_cast<const char*>(&data.data_block[i]), sizeof (data.data_block[i]));
This will consume a few more microseconds, which will be lost in background noise since the disk I/O will take much more time to complete the write.
Totally not cool, though.

encrypting and serializing stl string and other containers

I have data in stl containers (vector). Each node in the vector is a structure which also contains stl strings.
struct record
{
string name;
string location;
int salary;
};
vector< record > employees;
I want to serialize employees but I also want to encrypt it before serializing.
my encryption function looks like this:
Encode(const char * inBfr, const int in_size, char ** outBfr, int& out_size )
From searching, it looks like the standard doesn't require the memory of my structure to be contiguous, so I can't just grab the memory of the employees variable. Is there any other smart way that I can use this encoding function with my stl based structures/containers? It is good for me that the Encode function works on plain char * buffers, so I know exactly what goes in and out, but stl structures are not like that, and I am trying to find a nice way to use stl with this function.
I am also opening to using any other stl containers if that helps.
Although the elements in a std::vector<T> are guaranteed to be laid out contiguously, this doesn't really help: the record you have may include padding and, more importantly, will store the std::string's content external to the std::string object (if the small string optimization is used, the value may be embedded inside the std::string, but it will also contain a couple of bytes which are not part of the std::string's value). Thus, your best option is to format your record and encrypt the formatted string.
The formatting is straightforward, but personally I would encapsulate the encoding function in a simple std::streambuf so that the encryption can be done by a filtering stream buffer. Given the signature you provided, this could look something like this:
class encryptbuf
: public std::streambuf {
std::streambuf* d_sbuf;
char d_buffer[1024];
public:
encryptbuf(std::streambuf* sbuf)
: d_sbuf(sbuf) {
this->setp(this->d_buffer, this->d_buffer + sizeof(this->d_buffer) - 1);
}
int overflow(int c) {
if (c != std::char_traits<char>::eof()) {
*this->pptr() = std::char_traits<char>::to_char_type(c);
this->pbump(1);
}
return this->pubsync()? std::char_traits<char>::eof(): std::char_traits<char>::not_eof(c);
}
int sync() {
char* out(0);
int size(0);
Encode(this->pbase(), this->pptr() - this->pbase(), &out, size);
this->d_sbuf->sputn(out, size);
delete[] out; // dunno: it seems the output buffer is allocated but how?
this->setp(this->pbase(), this->epptr());
return this->d_sbuf->pubsync();
}
};
int main() {
encryptbuf sbuf(std::cout.rdbuf());
std::ostream eout(&sbuf);
eout << "print something encoded to standard output\n" << std::flush;
}
Now, creating an output operator for your records that just prints to an std::ostream can be used to produce an encoded version of them.
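For example, a minimal sketch (the member names are the ones from the question, and the '\n' separators are an arbitrary choice):
std::ostream& operator<<(std::ostream& out, record const& r)
{
    return out << r.name << '\n' << r.location << '\n' << r.salary << '\n';
}
// writing through the encrypting stream buffer encodes everything on the way out
encryptbuf sbuf(std::cout.rdbuf());   // or any other destination stream's buffer
std::ostream eout(&sbuf);
for (std::size_t i = 0; i != employees.size(); ++i)
    eout << employees[i];
eout << std::flush;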
It's probably easiest to serialize your structure into a string, then encrypt the string. For example:
std::ostringstream buffer;
buffer << a_record.name << "\n" << a_record.location << "\n" << a_record.salary;
encode(buffer.str().c_str(), buffer.str().length(), /* ... */);
If it were me, I'd probably write encode (or at least a wrapper for it) to take input (and probably produce output) in a vector, string, or stream though.
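A minimal sketch of such a wrapper, assuming the Encode signature from the question and that the output buffer it allocates should be released with delete[] (the question does not say how it is allocated, so treat that as an assumption):
std::string encode_string(const std::string& plain)
{
    char* out = 0;
    int out_size = 0;
    Encode(plain.c_str(), static_cast<int>(plain.size()), &out, out_size);
    std::string result(out, out_size);
    delete[] out;   // assumption: Encode allocates the buffer with new[]
    return result;
}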
If you want to get ambitious, there are other possibilities. First of all, @MooingDuck raises a good point that it's often worthwhile to overload operator<< for the class, instead of working with the individual items all the time. This will typically be a small function similar to what's above:
std::ostream &operator<<(std::ostream &os, record const &r) {
return os << r.name << "\n" << r.location << "\n" << r.salary;
}
Using this, you'd just have:
std::ostringstream os;
os << a_record;
encode(os.str().c_str(), os.str().length(), /* ... */);
Second, if you want to get really ambitious, you can put the encryption into (for one example) a codecvt facet, so you can automatically encrypt all the data as you write it to a stream, and decrypt it as you read it back in. Another possibility is to build the encryption into a filtering streambuf object instead. The codecvt facet is probably the method that should theoretically be preferred, but the streambuf is almost certainly easier to implement, with less unrelated "stuff" involved.

c++ append a struct into a binary file

I have my struct:
struct a
{
int x;
float f;
double d;
char c;
char s[50];
};
and I wish to append it to a binary file each time my timer schedule fires.
// declaration
std::ofstream outFile;
// constructor:
outFile.open( "save.dat", ios::app );
// tick:
outFile << a << endl;
but inside the save.dat appears only this:
0C3A0000..0C3A0000..0C3A0000..0C3A0000..0C3A0000..0C3A0000..0C3A0000..0C3A0000..0C3A0000..
thanks in advance
What you're currently doing is writing the address of the struct definition.
What you want to do is use ostream::write
outFile.write(reinterpret_cast<const char*>(&myStruct), sizeof(a));
This will work as long as your struct is a POD (Plain Old Data) type (which your example is). Roughly speaking, that means all members are plain, fixed-size values with no dynamically allocated contents.
If you on the other hand have variable sized members then you would need to write out each member one by one.
A sensible way to serialize custom objects is to overload your own output stream operator:
std::ostream & operator<<(std::ostream & o, const a & x)
{
o.write(reinterpret_cast<const char*>(&x.x), sizeof(int));
o.write(reinterpret_cast<const char*>(&x.f), sizeof(float));
/* ... */
return o;
}
a x;
std::ofstream ofile("myfile.bin", std::ios::binary | std::ios::app);
ofile << x;
This is still platform-dependent, so to be a bit safer, you should probably use fixed-width data types like int32_t etc.
It might also not be the best idea semantically to use << for binary output, since it's often used for formatted output. Perhaps a slightly safer method would be to write a function void serialize(const a &, std::ostream &);
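A minimal sketch of such a serialize function, using a fixed-width type for the int and assuming native byte order (so it is still not portable across machines with different endianness; <stdint.h> is assumed to be available):
void serialize(const a& obj, std::ostream& os)
{
    int32_t x = obj.x;   // fixed 4-byte integer regardless of the platform's int size
    os.write(reinterpret_cast<const char*>(&x), sizeof(x));
    os.write(reinterpret_cast<const char*>(&obj.f), sizeof(obj.f));
    os.write(reinterpret_cast<const char*>(&obj.d), sizeof(obj.d));
    os.write(&obj.c, 1);
    os.write(obj.s, sizeof(obj.s));   // the char[50] array already has a fixed size
}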

how to iterate into a smaller container (i.e. stride != 1)

There is a question that is very similar in spirit here. Unfortunately that question didn't prompt much response - I thought I would ask a more specific question with the hope that an alternative method can be suggested.
I'm feeding a binary file into std::cin (via tar --to-command=./myprog).
The binary file happens to be a set of floats, and I want to put the data into a std::vector<float> - ideally the C++ way.
I can generate a std::vector<char> very nicely (thanks to this answer)
#include <fstream>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <vector>
int
main (int ac, char **av)
{
std::istream& input = std::cin;
std::vector<char> buffer;
std::copy(
std::istreambuf_iterator<char>(input),
std::istreambuf_iterator<char>( ),
std::back_inserter(buffer)); // copies all data into buffer
}
I now want to transform my std::vector<char> into a std::vector<float>, presumably with std::transform and a function that does the conversion (a char[2] to a float, say). I am struggling however, because my std::vector<float> will have half as many elements as std::vector<char>. If I could iterate with a stride of 2 then I think I would be fine, but from the previous question it seems that I cannot do that (at least not elegantly).
I would write my own class that reads two chars and converts it to float.
struct FloatConverter
{
// When the FloatConverter object is assigned to a float value
// i.e. When put into the vector<float> this method will be called
// to convert the object into a float.
operator float() const { return 1.0f; /* How you convert the 2 chars */ }
friend std::istream& operator>>(std::istream& str, FloatConverter& fc)
{
// You were not exactly clear on what should be read in.
// So I went pedantic and made sure we just read 2 characters.
fc.data[0] = str.get();
fc.data[1] = str.get();
return str;
}
char data[2];
};
Based on comments by GMan:
struct FloatConverterFromBinary
{
// When the FloatConverterFromBinary object is assigned to a float value
// i.e. When put into the vector<float> this method will be called
// to convert the object into a float.
operator float() const { return data; }
friend std::istream& operator>>(std::istream& str, FloatConverterFromBinary& fc)
{
// Use reinterpret_cast to emphasis how dangerous and unportable this is.
str.read(reinterpret_cast<char*>(&fc.data), sizeof(float));
return str;
}
float data;
};
Then use it like this:
int main (int ac, char **av)
{
std::istream& input = std::cin;
std::vector<float> buffer;
// Note: Because the FloatConverter does not drop whitespace while reading
// You can potentially use std::istream_iterator<>
//
std::copy(
std::istream_iterator<FloatConverter>(input),
std::istream_iterator<FloatConverter>( ),
std::back_inserter(buffer));
}
It seems to me that the best answer is to write a pair of your own iterators that parse the file the way that you want. You could also change std::vector<char> to std::vector<float> and switch to std::istream_iterator<float>, provided the input was formatted text with at least one space between values.
Use Boost range adaptors:
boost::copy(istream_range(input)|stride(2),back_inserter(buffer));
you might need to write your own istreambuf_iterator, which is trivial.

Structure alignment in C++

struct Vector
{
float x, y, z;
};
void func(Vector *vectors) { /*...*/ }
usage:
float *coords = load(file);
func(coords);
I have a question about the alignment of structures in C++. I will pass a set of points to the function func(). Is it OK to do it in the way shown above, or is this relying on platform-dependent behavior? (It works at least with my current compiler.) Can somebody recommend a good article on the topic?
Or, is it better to directly create a set of points while loading the data from the file?
Thanks
Structure alignment is implementation-dependent. However, most compilers give you a way of specifying that a structure should be "packed" (that is, arranged in memory with no padding bytes between fields). For example:
struct Vector {
float x;
float y;
float z;
} __attribute__((__packed__));
The above code will cause the gcc compiler to pack the structure in memory, making it easier to dump to a file and read back in later. The exact way to do this may be different for your compiler (details should be in your compiler's manual).
I always list members of packed structures on separate lines in order to be clear about the order in which they should appear. For most compilers this should be equivalent to float x, y, z; but I'm not certain if that is implementation-dependent behavior or not. To be safe, I would use one declaration per line.
If you are reading the data from a file, you need to validate the data before passing it to func. No amount of data alignment enforcement will make up for a lack of input validation.
Edit:
After further reading your code, I understand more what you are trying to do. You have a structure that contains three float values, and you are accessing it with a float* as if it were an array of floats. This is very bad practice. You don't know what kind of padding that your compiler might be using at the beginning or end of your structure. Even with a packed structure, it's not safe to treat the structure like an array. If an array is what you want, then use an array. The safest way is to read the data out of the file, store it into a new object of type struct Vector, and pass that to func. If func is defined to take a struct Vector* as an argument and your compiler is allowing you to pass a float* without griping, then this is indeed implementation-dependent behavior that you should not rely on.
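For illustration, here is a minimal sketch of that safer approach (the file name is a placeholder, and the file is assumed to hold raw x/y/z triples of 4-byte floats written in native byte order on the same platform):
std::ifstream in("points.dat", std::ios::binary);
std::vector<Vector> points;
float xyz[3];
while (in.read(reinterpret_cast<char*>(xyz), sizeof(xyz)))
{
    Vector v;
    v.x = xyz[0];
    v.y = xyz[1];
    v.z = xyz[2];
    points.push_back(v);
}
if (!points.empty())
    func(&points[0]);   // func takes a Vector*, so pass a pointer to the first element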
Use an operator>> extraction overload.
std::istream& operator>>(std::istream& stream, Vector& vec) {
stream >> vec.x;
stream >> vec.y;
stream >> vec.z;
return stream;
}
Now you can do:
std::ifstream MyFile("My Filepath"); // add any open modes you need
Vector vec;
MyFile >> vec;
func(&vec);
Prefer passing by reference than passing by pointer:
void func(Vector& vectors)
{ /*...*/ }
The difference here between a pointer and a reference is that a pointer can be NULL or point to some strange place in memory. A reference refers to an existing object.
As far as alignment goes, don't concern yourself. Compilers handle this automagically (at least alignment in memory).
If you are talking about alignment of binary data in a file, search for the term "serialization".
First of all, your example code is bad:
float *coords = load(file);
func(coords);
You're passing func() a pointer to a float var instead of a pointer to a Vector object.
Secondly, Vector's total size is equal to (sizeof(float) * 3), or in other words 12 bytes (assuming 4-byte floats).
I'd consult my compiler's manual to see how to control the struct's alignment, and just for peace of mind I'd set it to, say, 16 bytes.
That way I'll know that the file, if it contains one vector, is always exactly 16 bytes in size and I only need to read 16 bytes.
Edit:
Check MSVC9's align capabilities.
Writing binary data is non-portable between machines.
About the only portable thing is text (and even that cannot be fully relied on, as not all systems use the same text format; luckily most accept the 127 ASCII characters, and hopefully soon we will standardize on something like Unicode (he says with a smile)).
If you want to write data to a file you must decide the exact format of the file, then write code that will read the data in that format and convert it into your specific hardware's representation for that type. This format could be binary or it could be serialized text; it does not matter much for performance (as the disk I/O speed will probably be your limiting factor). In terms of compactness the binary format will probably be more efficient; in terms of ease of writing decoding functions on each platform the text format is definitely easier, as a lot of it is already built into the streams.
So simple solution:
Read/Write to a serialized text format.
Also no alignment issues.
#include <algorithm>
#include <fstream>
#include <vector>
#include <iterator>
struct Vector
{
float x, y, z;
};
std::ostream& operator<<(std::ostream& stream, Vector const& data)
{
return stream << data.x << " " << data.y << " " << data.z << " ";
}
std::istream& operator>>(std::istream& stream, Vector& data)
{
return stream >> data.x >> data.y >> data.z;
}
int main()
{
// Copy an array to a file
Vector data[] = {{1.0,2.0,3.0}, {2.0,3.0,4.0}, { 3.0,4.0,5.0}};
std::ofstream file("plop");
std::copy(data, data+3, std::ostream_iterator<Vector>(file));
// Read data from a file.
std::vector<Vector> newData; // use a vector as we don't know how big the file is.
std::ifstream input("inputFile");
std::copy(std::istream_iterator<Vector>(input),
std::istream_iterator<Vector>(),
std::back_inserter(newData)
);
}