Binary Serialization of multiple data types - c++

I'm trying to serialize a struct with multiple data types to a binary file in C++ (Visual Studio) and deserialize it all at once, but I'm running into memory allocation problems with strings when reading the data back. I know what the problem is, and I know there are open source libraries I could use, but I'd rather not use them unless it is really necessary. I also know I can write and read the members one by one, but that is too tedious for a struct with a large number of members. I want to perform the write/read in one go without using any open source library.
Here is a struct for example:
struct Frame {
bool isPass{ true };
uint64_t address{ 0 };
uint32_t age{ 0 };
float marks{ 0.0 };
std::string userName;
};
Is there any way to perform the write/read operation in one go in binary format?
Thank you

Not using existing libraries is NEVER good. Still,...
You could, for example, create a pure virtual class like
class Serializable
{
public:
    virtual ~Serializable() = default;
    virtual std::vector<char> serialize() = 0;
};
Then:
1) You implement it for all your own classes that you have
2) You implement serialization methods for all STL and PoD types that you use (std::strings, PoD types and structs with only PoD types) inside of some static class. Basically, during serialization you can put there something like [size][type][data ~ [size][type][data][size][type][data]].
3) Then, when you process a class for serialization, you create a buffer, first put a size into it, then type identifier, then put all bytes from all members serialized by those you have implemented in 1) and 2)
4) When you read anything from such an array, you do the same backwards: read N bytes from an array (first field), determine its actual type (second field), read all members, deserialize all stuff included.
The process is recursive.
But... man, it's really a bad idea. Use protobuf, or boost::serialization. Or anything else - there are a lot of serialization libraries on the internet. Read these precious comments under your question. People are right.
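If you do end up hand-rolling it, the field-by-field scheme could be sketched roughly like this for the Frame struct from the question. The helper names (putRaw, getRaw, serializeFrame, deserializeFrame) are made up for illustration, and the sketch assumes the writer and the reader share the same endianness and type sizes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

struct Frame {
    bool isPass{ true };
    uint64_t address{ 0 };
    uint32_t age{ 0 };
    float marks{ 0.0f };
    std::string userName;
};

// Hypothetical helper: append the raw bytes of a trivially copyable value.
template <class T>
void putRaw(std::vector<char>& out, const T& value) {
    static_assert(std::is_trivially_copyable<T>::value, "raw copy needs a trivial type");
    const char* p = reinterpret_cast<const char*>(&value);
    out.insert(out.end(), p, p + sizeof(T));
}

// Hypothetical helper: read a trivially copyable value back and advance pos.
template <class T>
T getRaw(const std::vector<char>& in, std::size_t& pos) {
    static_assert(std::is_trivially_copyable<T>::value, "raw copy needs a trivial type");
    if (pos > in.size() || in.size() - pos < sizeof(T)) {
        throw std::runtime_error("truncated buffer");
    }
    T value;
    std::memcpy(&value, in.data() + pos, sizeof(T));
    pos += sizeof(T);
    return value;
}

// Fixed-size members are copied as-is; the string is stored as length + bytes.
inline std::vector<char> serializeFrame(const Frame& f) {
    std::vector<char> out;
    putRaw(out, static_cast<uint8_t>(f.isPass));
    putRaw(out, f.address);
    putRaw(out, f.age);
    putRaw(out, f.marks);
    putRaw(out, static_cast<uint64_t>(f.userName.size()));
    out.insert(out.end(), f.userName.begin(), f.userName.end());
    return out;
}

inline Frame deserializeFrame(const std::vector<char>& in) {
    std::size_t pos = 0;
    Frame f;
    f.isPass = getRaw<uint8_t>(in, pos) != 0;
    f.address = getRaw<uint64_t>(in, pos);
    f.age = getRaw<uint32_t>(in, pos);
    f.marks = getRaw<float>(in, pos);
    const uint64_t len = getRaw<uint64_t>(in, pos);
    if (len > in.size() - pos) {
        throw std::runtime_error("truncated string");
    }
    f.userName.assign(in.data() + pos, static_cast<std::size_t>(len));
    return f;
}
```

The resulting vector can then be written to (or read from) a binary file with a single write/read call, which is about as close to "in one go" as a struct containing a std::string can get.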

Assuming you have some other mechanism to keep track of your frame size, rewrite your struct as:
struct Frame {
bool isPass{ true };
uint8_t pad1[3]{};
uint32_t age{ 0 };
uint64_t address{ 0 };
double marks{ 0.0 };
char userName[];
};
If we have a pointer Frame* frame, we can write this using write(fd, frame, frame_size), where frame_size > sizeof(Frame).
Assuming you have read the frame into a buffer, you can access the data using:
auto frame = reinterpret_cast<const Frame*>(buf);
The length of userName will therefore be frame_size - sizeof(Frame). You can now access the elements through your struct.
This is very C-like (the flexible array member char userName[] is a C feature that C++ compilers accept only as an extension), and the approach is limited to a single variable-length element at the end of the struct.

Related

Change endianness of entire struct in C++

I am writing a parser in C++ to parse a well defined binary file. I have declared all the required structs. And since only particular fields are of interest to me, so in my structs I have skipped non-required fields by creating char array of size equal to skipped bytes. So I am just reading the file in char array and casting the char pointer to my struct pointer. Now problem is that all data fields in that binary are in big endian order, so after typecasting I need to change the endianness of all the struct fields. One way is to do it manually for each and every field. But there are various structs with many fields, so it'll be very cumbersome to do it manually. So what's the best way to achieve this. And since I'll be parsing very huge such files (say in TB's), so I require a fast way to do this.
EDIT: I have used __attribute__((packed)), so there is no need to worry about padding.
If you can do misaligned accesses with no penalty, and you don't mind compiler- or platform-specific tricks to control padding, this can work. (I assume you are OK with this since you mention __attribute__((packed))).
In this case the nicest approach is to write value wrappers for your raw data types, and use those instead of the raw type when declaring your struct in the first place. Remember the value wrapper must be trivial/POD-like for this to work. If you have a POSIX platform you can use ntohs/ntohl for the endian conversion; it's likely to be better optimized than whatever you write yourself.
If misaligned accesses are illegal or slow on your platform, you need to deserialize instead. Since we don't have reflection yet, you can do this with the same value wrappers (plus an Ignore<N> placeholder that skips N bytes for fields you're not interested in), and declare them in a tuple instead of a struct - you can iterate over the members in a tuple and tell each to deserialize itself from the message.
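A value wrapper of that kind might look something like the sketch below. BigEndian, Header and parseHeader are hypothetical names, and the conversion uses plain shifts rather than ntohs/ntohl so that it does not depend on POSIX. Because the wrapper stores its bytes as an unsigned char array, it has alignment 1, so this particular variant needs neither packing attributes nor misaligned loads:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical wrapper: keeps the bytes exactly as they appear in the file
// (big-endian) and converts to a host value whenever it is read.
template <class T>
class BigEndian {
    static_assert(std::is_unsigned<T>::value, "sketch covers unsigned integers only");
    unsigned char bytes_[sizeof(T)];
public:
    operator T() const {
        T result = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            // Most significant byte comes first in the file.
            result = static_cast<T>((result << 8) | bytes_[i]);
        }
        return result;
    }
};

// Example layout: a 32-bit length, two bytes we skip, a 16-bit kind.
struct Header {
    BigEndian<std::uint32_t> length;
    unsigned char skipped[2];
    BigEndian<std::uint16_t> kind;
};

static_assert(sizeof(Header) == 8, "wrappers add no padding");
static_assert(std::is_trivially_copyable<Header>::value, "safe to memcpy into");

// Copy the raw bytes into the struct; fields convert lazily on access.
inline Header parseHeader(const unsigned char* raw) {
    Header h;
    std::memcpy(&h, raw, sizeof h);
    return h;
}
```

The conversion cost is paid on every read of a field, which is usually fine when only a few fields of each record are of interest.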
One way to do that is to combine the C preprocessor with C++ operators. Write a couple of C++ classes like this one:
#include "immintrin.h"
class FlippedInt32
{
int value;
public:
inline operator int() const
{
return _bswap( value );
}
};
class FlippedInt64
{
__int64 value;
public:
inline operator __int64() const
{
return _bswap64( value );
}
};
Then,
#define int FlippedInt32
before including the header that define these structures. #undef immediately after the #include.
This will replace all int fields in the structures with FlippedInt32, which has the same size but returns flipped bytes.
If it’s your own structures which you can modify you don’t need the preprocessor part. Just replace the integers with the byte-flipping classes.
If you can come up with a list of offsets (in-bytes, relative to the top of the file) of the fields that need endian-conversion, as well as the size of those fields, then you could do all of the endian-conversion with a single for-loop, directly on the char array. E.g. something like this (pseudocode):
struct EndianRecord {
    size_t offsetFromTop;
    size_t fieldSizeInBytes;
};
std::vector<EndianRecord> todoList;
// [populate the todo list here...]
char * rawData = [pointer to the raw data]
for (size_t i=0; i<todoList.size(); i++)
{
    const EndianRecord & er = todoList[i];
    ByteSwap(&rawData[er.offsetFromTop], er.fieldSizeInBytes);
}
struct MyPackedStruct * data = (struct MyPackedStruct *) rawData;
// Now you can just read the member variables
// as usual because you know they are already
// in the correct endian-format.
... of course the difficult part is coming up with the correct todoList, but since the file format is well-defined, it should be possible to generate it algorithmically (or better yet, create it as a generator with e.g. a GetNextEndianRecord() method that you can call, so that you don't have to store a very large vector in memory)
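For what it's worth, a compilable version of that loop could look roughly like this. ByteSwap is implemented here with std::reverse, and swapFields is a made-up name; the sketch assumes the host is little-endian, so every listed field really does need reversing:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct EndianRecord {
    std::size_t offsetFromTop;
    std::size_t fieldSizeInBytes;
};

// Reverse the bytes of one field in place.
inline void ByteSwap(char* field, std::size_t size) {
    std::reverse(field, field + size);
}

// Apply every record to the buffer. A record that points outside the
// buffer is treated as a programming error.
inline void swapFields(char* rawData, std::size_t rawSize,
                       const std::vector<EndianRecord>& todoList) {
    for (const EndianRecord& er : todoList) {
        assert(er.offsetFromTop <= rawSize &&
               er.fieldSizeInBytes <= rawSize - er.offsetFromTop);
        ByteSwap(rawData + er.offsetFromTop, er.fieldSizeInBytes);
    }
}
```

After swapFields returns, the buffer can be viewed through the packed struct as described above.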

C++ program crashes on exit (binary files) [duplicate]

I have a small hierarchy of objects that I need to serialize and transmit via a socket connection. I need to both serialize the object, then deserialize it based on what type it is. Is there an easy way to do this in C++ (as there is in Java)?
Just to be clear, I'm looking for methods on converting an object into an array of bytes, then back into an object. I can handle the socket transmission.
Talking about serialization, the boost serialization API comes to my mind. As for transmitting the serialized data over the net, I'd either use Berkeley sockets or the asio library.
If you want to serialize your objects to a byte array, you can use the boost serializer in the following way (taken from the tutorial site):
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
class gps_position
{
private:
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
ar & degrees;
ar & minutes;
ar & seconds;
}
int degrees;
int minutes;
float seconds;
public:
gps_position(){};
gps_position(int d, int m, float s) :
degrees(d), minutes(m), seconds(s)
{}
};
Actual serialization is then pretty easy:
#include <fstream>
std::ofstream ofs("filename.dat", std::ios::binary);
// create class instance
const gps_position g(35, 59, 24.567f);
// save data to archive
{
boost::archive::binary_oarchive oa(ofs);
// write class instance to archive
oa << g;
// archive and stream closed when destructors are called
}
Deserialization works in an analogous manner.
There are also mechanisms which let you handle serialization of pointers (complex data structures like trees etc. are no problem) and derived classes, and you can choose between binary and text serialization. Besides, all STL containers are supported out of the box.
There is a generic pattern you can use to serialize objects. The fundamental primitives are these two functions, which write to and read from iterators:
template <class OutputCharIterator>
void putByte(char byte, OutputCharIterator &&it)
{
*it = byte;
++it;
}
template <class InputCharIterator>
char getByte(InputCharIterator &&it, InputCharIterator &&end)
{
if (it == end)
{
throw std::runtime_error{"Unexpected end of stream."};
}
char byte = *it;
++it;
return byte;
}
Then serialization and deserialization functions follow the pattern:
template <class OutputCharIterator>
void serialize(const YourType &obj, OutputCharIterator &&it)
{
// Call putByte or other serialize overloads.
}
template <class InputCharIterator>
void deserialize(YourType &obj, InputCharIterator &&it, InputCharIterator &&end)
{
// Call getByte or other deserialize overloads.
}
For classes you can use the friend function pattern to allow the overload to be found using ADL:
class Foo
{
int internal1, internal2;
// So it can be found using ADL and it accesses private parts.
template <class OutputCharIterator>
friend void serialize(const Foo &obj, OutputCharIterator &&it)
{
// Call putByte or other serialize overloads.
}
// Deserialize similar.
};
Then in your program you can serialize an object into a file like this:
std::ofstream file("savestate.bin", std::ios::binary);
serialize(yourObject, std::ostreambuf_iterator<char>(file));
Then read:
std::ifstream file("savestate.bin", std::ios::binary);
deserialize(yourObject, std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>());
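To make the pattern concrete, here is a small sketch of what a serialize/deserialize overload pair for a 32-bit unsigned integer might look like on top of putByte and getByte. The little-endian byte order is my choice for the example, not something the pattern requires:

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <stdexcept>
#include <vector>

template <class OutputCharIterator>
void putByte(char byte, OutputCharIterator&& it) {
    *it = byte;
    ++it;
}

template <class InputCharIterator>
char getByte(InputCharIterator&& it, InputCharIterator&& end) {
    if (it == end) {
        throw std::runtime_error{"Unexpected end of stream."};
    }
    char byte = *it;
    ++it;
    return byte;
}

// Writes four bytes, least significant first.
template <class OutputCharIterator>
void serialize(std::uint32_t value, OutputCharIterator&& it) {
    for (int shift = 0; shift < 32; shift += 8) {
        putByte(static_cast<char>((value >> shift) & 0xFF), it);
    }
}

// Reads four bytes back and advances the iterator past them.
template <class InputCharIterator>
void deserialize(std::uint32_t& value, InputCharIterator&& it, InputCharIterator&& end) {
    value = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        const unsigned char byte = static_cast<unsigned char>(getByte(it, end));
        value |= static_cast<std::uint32_t>(byte) << shift;
    }
}
```

Overloads for larger types can then be built from these, for example by calling the uint32_t overload once per member.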
My old answer here:
Serialization means turning your object into binary data, while deserialization means recreating an object from that data.
When serializing you are pushing bytes into a uint8_t vector.
When unserializing you are reading bytes from a uint8_t vector.
There are certainly patterns you can employ when serializing stuff.
Each serializable class should have a serialize(std::vector<uint8_t> &binaryData) function, or one with a similar signature, that writes its binary representation into the provided vector. This function may then pass the vector down to its members' serializing functions so they can write their data into it too.
Since the data representation can differ between architectures, you need to settle on a scheme for representing the data.
Let's start from the basics:
Serializing integer data
Just write the bytes in little endian order. Or use varint representation if size matters.
Serialization in little endian order:
data.push_back(integer32 & 0xFF);
data.push_back((integer32 >> 8) & 0xFF);
data.push_back((integer32 >> 16) & 0xFF);
data.push_back((integer32 >> 24) & 0xFF);
Deserialization from little endian order:
integer32 = uint32_t(data[0]) | (uint32_t(data[1]) << 8) | (uint32_t(data[2]) << 16) | (uint32_t(data[3]) << 24);
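For the varint option, a minimal sketch could look like the following. It assumes the common unsigned LEB128-style layout (7 payload bits per byte, high bit set when more bytes follow); putVarint and getVarint are names invented for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Append value as a varint: 7 payload bits per byte, least significant
// group first, high bit set on every byte except the last.
inline void putVarint(std::vector<std::uint8_t>& data, std::uint64_t value) {
    while (value >= 0x80) {
        data.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    data.push_back(static_cast<std::uint8_t>(value));
}

// Read a varint starting at pos and advance pos past it.
inline std::uint64_t getVarint(const std::vector<std::uint8_t>& data, std::size_t& pos) {
    std::uint64_t value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= data.size()) {
            throw std::runtime_error("truncated varint");
        }
        const std::uint8_t byte = data[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            return value;
        }
    }
    throw std::runtime_error("varint too long");
}
```

Small values take a single byte this way, at the cost of up to ten bytes for the largest 64-bit values.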
Serializing floating point data
As far as I know, IEEE 754 has a monopoly here. I don't know of any mainstream architecture that would use something else for floats. The only thing that can be different is the byte order. Some architectures use little endian, others use big endian byte order. This means you need to be careful about the order in which you load the bytes on the receiving end. Another difference can be the handling of denormal, infinity and NaN values. But as long as you avoid these values you should be OK.
Serialization:
uint8_t mem[8];
memcpy(mem, &doubleValue, 8);
data.push_back(mem[0]);
data.push_back(mem[1]);
...
Deserialization is doing it backward. Mind the byte order of your architecture!
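One way to sidestep the host byte order entirely is to copy the double's bits into a 64-bit integer first and then emit that integer in a fixed order. A rough sketch, assuming a 64-bit IEEE 754 double whose byte order matches the host's integer byte order (putDouble and getDouble are made-up names):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::uint64_t), "sketch assumes a 64-bit double");

// Append the double as 8 bytes, least significant byte of its bit pattern first.
inline void putDouble(std::vector<std::uint8_t>& data, double value) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    for (int shift = 0; shift < 64; shift += 8) {
        data.push_back(static_cast<std::uint8_t>((bits >> shift) & 0xFF));
    }
}

// Rebuild the double from 8 bytes written by putDouble.
inline double getDouble(const std::uint8_t* bytes) {
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        bits |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    }
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

The shifts operate on the integer's value rather than on memory, so the bytes come out in the same order on little-endian and big-endian hosts.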
Serializing strings
First you need to agree on an encoding. UTF-8 is common. Then store it in a length-prefixed manner: first you store the length of the string using a method I mentioned above, then write the string byte-by-byte.
Serializing arrays
They are the same as strings. You first serialize an integer representing the size of the array, then serialize each object in it.
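As an illustration of the length-prefixed layout for both strings and arrays, here is a small sketch. It assumes a 32-bit little-endian length prefix, and the helper names (putU32, getU32, putString, getString, putU32Array, getU32Array) are invented for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

inline void putU32(std::vector<std::uint8_t>& data, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        data.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
    }
}

inline std::uint32_t getU32(const std::vector<std::uint8_t>& data, std::size_t& pos) {
    if (pos > data.size() || data.size() - pos < 4) {
        throw std::runtime_error("truncated integer");
    }
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(data[pos + i]) << (8 * i);
    }
    pos += 4;
    return v;
}

// String: length prefix, then the raw (e.g. UTF-8) bytes.
inline void putString(std::vector<std::uint8_t>& data, const std::string& s) {
    putU32(data, static_cast<std::uint32_t>(s.size()));
    data.insert(data.end(), s.begin(), s.end());
}

inline std::string getString(const std::vector<std::uint8_t>& data, std::size_t& pos) {
    const std::uint32_t len = getU32(data, pos);
    if (len > data.size() - pos) {
        throw std::runtime_error("truncated string");
    }
    std::string s(reinterpret_cast<const char*>(data.data()) + pos, len);
    pos += len;
    return s;
}

// Array: element count, then each element serialized in turn.
inline void putU32Array(std::vector<std::uint8_t>& data, const std::vector<std::uint32_t>& values) {
    putU32(data, static_cast<std::uint32_t>(values.size()));
    for (std::uint32_t v : values) {
        putU32(data, v);
    }
}

inline std::vector<std::uint32_t> getU32Array(const std::vector<std::uint8_t>& data, std::size_t& pos) {
    const std::uint32_t count = getU32(data, pos);
    std::vector<std::uint32_t> values;
    for (std::uint32_t i = 0; i < count; ++i) {
        values.push_back(getU32(data, pos));
    }
    return values;
}
```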
Serializing whole objects
As I said before, they should have a serialize method that adds content to a vector.
To unserialize an object, it should have a constructor that takes a byte stream. It can be an istream, but in the simplest case it can be just a reference to a uint8_t pointer. The constructor reads the bytes it wants from the stream and sets up the fields in the object.
If the system is well designed and serializes the fields in object field order, you can just pass the stream to the fields' constructors in an initializer list and have them deserialized in the right order.
Serializing object graphs
First you need to make sure these objects are really something you want to serialize. You don't need to serialize them if instances of these objects are already present on the destination.
Now suppose you found out that you do need to serialize the object pointed to by a pointer.
The problem with pointers is that they are valid only in the program that uses them. You cannot serialize a pointer, so you should stop using them in objects. Instead, create object pools.
An object pool is basically a dynamic array which contains "boxes". These boxes have a reference count. A non-zero reference count indicates a live object, zero indicates an empty slot. Then you create a smart pointer akin to shared_ptr that doesn't store the pointer to the object, but its index in the array. You also need to agree on an index that denotes the null pointer, e.g. -1.
Basically what we did here is replace the pointers with array indexes.
Now when serializing you can serialize this array index as usual. You don't need to worry about where the object will be in memory on the destination system. Just make sure the destination has the same object pool too.
So we need to serialize the object pools. But which ones? Well, when you serialize an object graph you are not serializing just an object, you are serializing an entire system. This means the serialization of the system shouldn't start from parts of the system. Those objects shouldn't worry about the rest of the system; they only need to serialize the array indexes and that's it. You should have a system serializer routine that orchestrates the serialization of the system, walks through the relevant object pools and serializes all of them.
On the receiving end all the arrays and the objects within are deserialized, recreating the desired object graph.
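A stripped-down sketch of the index-instead-of-pointer idea is shown below. It leaves out the reference counting and the smart pointer described above and only shows a linked list whose nodes refer to each other by pool index; Node, NodePool, serializePool and deserializePool are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr std::int32_t kNullIndex = -1;  // plays the role of the null pointer

struct Node {
    std::int32_t value;
    std::int32_t next;  // index into the pool instead of a Node*
};

using NodePool = std::vector<Node>;

inline void putI32(std::vector<std::uint8_t>& out, std::int32_t v) {
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((u >> shift) & 0xFF));
    }
}

inline std::int32_t getI32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < 4) {
        throw std::runtime_error("truncated pool");
    }
    std::uint32_t u = 0;
    for (int i = 0; i < 4; ++i) {
        u |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    }
    pos += 4;
    return static_cast<std::int32_t>(u);
}

// The whole pool is written out; links travel as plain indexes.
inline std::vector<std::uint8_t> serializePool(const NodePool& pool) {
    std::vector<std::uint8_t> out;
    putI32(out, static_cast<std::int32_t>(pool.size()));
    for (const Node& n : pool) {
        putI32(out, n.value);
        putI32(out, n.next);
    }
    return out;
}

inline NodePool deserializePool(const std::vector<std::uint8_t>& in) {
    std::size_t pos = 0;
    const std::int32_t count = getI32(in, pos);
    if (count < 0) {
        throw std::runtime_error("negative node count");
    }
    NodePool pool;
    for (std::int32_t i = 0; i < count; ++i) {
        Node n;
        n.value = getI32(in, pos);
        n.next = getI32(in, pos);
        // Reject links that would point outside the rebuilt pool.
        if (n.next != kNullIndex && (n.next < 0 || n.next >= count)) {
            throw std::runtime_error("dangling index");
        }
        pool.push_back(n);
    }
    return pool;
}
```

Since the links are indexes, the rebuilt pool is valid no matter where the receiving process places it in memory.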
Serializing function pointers
Don't store pointers in the object. Have a static array which contains the pointers to these functions and store the index in the object.
Since both programs have this table compiled into themselves, using just the index should work.
Serializing polymorphic types
Since I said you should avoid pointers in serializable types and you should use array indexes instead, polymorphism just cannot work, because it requires pointers.
You need to work this around with type tags and unions.
Versioning
On top of all the above, you might want different versions of the software to interoperate.
In this case each object should write a version number at the beginning of its serialization.
When loading the object on the other side, newer objects may be able to handle older representations, but older ones cannot handle newer ones, so they should throw an exception.
Each time something changes, you should bump the version number.
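A version byte at the front might be handled along these lines. The Settings struct, its depth field (imagined as added in version 2) and the default of 1 for old data are all hypothetical, chosen only to show the reader accepting an older layout and rejecting a newer one:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr std::uint8_t kCurrentVersion = 2;

struct Settings {
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    std::uint32_t depth = 1;  // hypothetical field added in format version 2
};

inline void putU32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
    }
}

inline std::uint32_t getU32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < 4) {
        throw std::runtime_error("truncated data");
    }
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    }
    pos += 4;
    return v;
}

// Always writes the newest layout, prefixed with its version number.
inline std::vector<std::uint8_t> serializeSettings(const Settings& s) {
    std::vector<std::uint8_t> out;
    out.push_back(kCurrentVersion);
    putU32(out, s.width);
    putU32(out, s.height);
    putU32(out, s.depth);
    return out;
}

// Accepts version 1 and 2; anything newer than this code knows is rejected.
inline Settings deserializeSettings(const std::vector<std::uint8_t>& in) {
    if (in.empty()) {
        throw std::runtime_error("empty data");
    }
    const std::uint8_t version = in[0];
    if (version == 0 || version > kCurrentVersion) {
        throw std::runtime_error("unsupported version");
    }
    std::size_t pos = 1;
    Settings s;
    s.width = getU32(in, pos);
    s.height = getU32(in, pos);
    if (version >= 2) {
        s.depth = getU32(in, pos);
    }
    return s;
}
```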
So to wrap this up, serialization can be complex. But fortunately you don't need to serialize everything in your program, most often only the protocol messages are serialized, which are often plain old structs. So you don't need the complex tricks I mentioned above too often.
In some cases, when dealing with simple types, you can do:
object o;
socket.write(&o, sizeof(o));
That's ok as a proof-of-concept or first-draft, so other members of your team can keep working on other parts.
But sooner or later, usually sooner, this will get you hurt!
You run into issues with:
Virtual pointer tables will be corrupted.
Pointers (to data/members/functions) will be corrupted.
Differences in padding/alignment on different machines.
Big/Little-Endian byte ordering issues.
Variations in the implementation of float/double.
(Plus you need to know what you are unpacking into on the receiving side.)
You can improve upon this by developing your own marshalling/unmarshalling methods for every class. (Ideally virtual, so they can be extended in subclasses.) A few simple macros will let you write out different basic types quite quickly in a big/little-endian-neutral order.
But that sort of grunt work is much better, and more easily, handled via boost's serialization library.
By way of learning I wrote a simple C++11 serializer. I had tried various of the other more heavyweight offerings, but wanted something that I could actually understand when it went wrong or failed to compile with the latest g++ (which happened for me with Cereal; a really nice library but complex, and I could not grok the errors the compiler threw up on upgrade). Anyway, it's header only and handles POD types, containers, maps etc. No versioning, and it will only load files from the same arch it was saved in.
https://github.com/goblinhack/simple-c-plus-plus-serializer
Example usage:
#include "c_plus_plus_serializer.h"
static void serialize (std::ofstream out)
{
char a = 42;
unsigned short b = 65535;
int c = 123456;
float d = std::numeric_limits<float>::max();
double e = std::numeric_limits<double>::max();
std::string f("hello");
out << bits(a) << bits(b) << bits(c) << bits(d);
out << bits(e) << bits(f);
}
static void deserialize (std::ifstream in)
{
char a;
unsigned short b;
int c;
float d;
double e;
std::string f;
in >> bits(a) >> bits(b) >> bits(c) >> bits(d);
in >> bits(e) >> bits(f);
}

Storing dynamic length data 'inside' structure

Problem statement : User provides some data which I have to store inside a structure. This data which I receive come in a data structure which allows user to dynamically add data to it.
Requirement: I need a way to store this data 'inside' the structure, contiguously.
eg. Suppose user can pass me strings which I have to store. So I wrote something like this :
void pushData( string userData )
{
struct
{
string junk;
} data;
data.junk = userData;
}
Problem : When I do this kind of storage, actual data is not really stored 'inside' the structure because string is not POD. Similar problem comes when I receive vector or list.
Then I could do something like this :
void pushData( string userData )
{
struct
{
char junk[100];
} data;
// Copy userdata into array junk
}
This stores the data 'inside' the structure, but then I can't put an upper limit on the size of the string the user can provide.
Can someone suggest an approach?
P.S.: I read something about serializability, but couldn't really make out clearly whether it could be helpful in my case. If it is the way forward, can someone give me an idea of how to proceed with it?
Edit :
No this is not homework.
I have written an implementation which can pass this kind of structure over message queues. It works fine with PODs, but I need to extend it to pass on dynamic data as well.
This is how message queue takes data:
i. Give it a pointer and tell the size till which it should read and transfer data.
ii. For plain old data types, the data is stored inside the structure, so I can easily pass the pointer to this structure to the message queue and on to other processes.
iii. But in case of vector/string/list etc, actual data is not inside the structure and thus if I pass on the pointer of this structure, message queue will not really pass on the actual data, but rather the pointers which would be stored inside this structure.
You can see this and this. I am trying to achieve something similar.
void pushData( string userData )
{
    struct Data
    {
        char junk[1];
    };
    // The cast is required in C++; remember to free(data) when done.
    Data* data = static_cast<Data*>(malloc(sizeof(Data) + userData.size()));
    memcpy(data->junk, userData.data(), userData.size());
    data->junk[userData.size()] = '\0'; // assuming you want null termination
}
Here we use an array of length 1, but we allocate the struct using malloc so it can actually have any size we want.
You ostensibly have some rather artificial constraints, but to answer the question: for a single struct to contain a variable amount of data is not possible... the closest you can come is to have the final member be say char [1], put such a struct at the start of a variably-sized heap region, and use the fact that array indexing is not checked to access memory beyond that character. To learn about this technique, see http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html (or the answer John Zwinck just posted)
Another approach is e.g. template <size_t N> struct X { char data_[N]; };, but each instantiation will be a separate struct type, and you can't pre-instantiate every size you might want at run-time (given you've said you don't want an upper bound). Even if you could, writing code that handles different instantiations as the data grows would be nightmarish, as would the code bloat caused.
Having a structure in one place with a string member with data in another place is almost always preferable to the hackery above.
Taking a hopefully-not-so-wild guess, I assume your interest is in serialising the object based on starting address and size, in some generic binary block read/write...? If so, that's still problematic even if your goal were satisfied, as you need to find out the current data size from somewhere. Writing struct-specific serialisation routines that incorporates the variable-length data on the heap is much more promising.
Simple solution: estimate a maximum data size (e.g. 1000) and use a fixed-size buffer. This avoids repeatedly freeing and reallocating memory (and the fragmentation that can cause) when pushData is called many times.
#define MAX_SIZE 1000
void pushData( string userData )
{
    struct Data
    {
        char junk[MAX_SIZE];
    };
    Data data;
    // Truncate input that does not fit, leaving room for the terminator.
    size_t n = userData.size() < MAX_SIZE - 1 ? userData.size() : MAX_SIZE - 1;
    memcpy(data.junk, userData.data(), n);
    data.junk[n] = '\0'; // assuming you want null termination
}
As mentioned by John Zwinck, you can use dynamic memory allocation to solve your problem.
void pushData( string userData )
{
    struct Data
    {
        char *junk;
    };
    Data *d = static_cast<Data*>(calloc(1, sizeof(Data)));
    d->junk = static_cast<char*>(malloc(userData.size() + 1));
    strcpy(d->junk, userData.c_str());
}

Handling different datatypes in a single structure

I need to send some information on a VxWorks message queue. The information to be sent is decided at runtime and may be of different data types. I am using a structure for this -
struct structData
{
char m_chType; // variable to indicate the data type - long, float or string
long m_lData; // variable to hold long value
float m_fData; // variable to hold float value
string m_strData; // variable to hold string value
};
I am currently sending an array of structData over the message queue.
structData arrStruct[MAX_SIZE];
The problem here is that only one variable in the structure is useful at a time; the other two are useless. The message queue is therefore unnecessarily overloaded.
I can't use unions because the datatype and the value are required.
I tried using templates, but it doesn't solve the problem. I can only send an array of structures of one datatype at a time.
template <typename T>
struct structData
{
char m_chType;
T m_Data;
};
structData<int> arrStruct[MAX_SIZE];
Is there a standard way to hold such information?
I don't see why you cannot use a union. This is the standard way:
struct structData
{
    char m_chType; // variable to indicate the data type - long, float or string
    union
    {
        long m_lData; // variable to hold long value
        float m_fData; // variable to hold float value
        char *m_strData; // variable to hold string value
    };
};
Normally then, you switch on the data type, and then access on the field which is valid for that type.
Note that you cannot put a string into a union, because the string type is a non-POD type. I have changed it to use a pointer, which could be a C zero-terminated string. You must then consider the possibility of allocating and deleting the string data as necessary.
You can use boost::variant for this.
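As a rough illustration of the variant approach, the sketch below uses std::variant, the C++17 standard-library counterpart, instead of boost::variant so that it has no external dependency; with Boost the idea is the same but the spelling differs. Value, Describe and describe are names invented for the example:

```cpp
#include <cassert>
#include <string>
#include <variant>

// Exactly one of the three alternatives is active at any time,
// and the variant itself remembers which one.
using Value = std::variant<long, float, std::string>;

// Visitor with one overload per alternative.
struct Describe {
    std::string operator()(long v) const { return "long:" + std::to_string(v); }
    std::string operator()(float v) const { return "float:" + std::to_string(v); }
    std::string operator()(const std::string& v) const { return "string:" + v; }
};

inline std::string describe(const Value& v) {
    return std::visit(Describe{}, v);
}
```

Note that a variant holding a std::string still owns heap memory, so it cannot be pushed through a message queue as raw bytes any more than the original struct could.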
There are many ways to handle different datatypes. Besides the union solution you can use a generic struct like :
typedef struct
{
char m_type;
void* m_data;
}
structData;
This way you know the type and you can cast the void* pointer into the right type.
Like the union solution, this is more a C than a C++ way of doing things.
The C++ way would be something using inheritance. You define a base "Data" class and use inheritance to specialize the data. You can use RTTI to check for type if needed.
But as you stated, you need to send your data over a VxWorks queue. I'm no specialist, but if those queues are OS real-time queues, all the previous solutions are not good ones. Your problem is that your data has variable length (in particular the string) and you need to send it through a queue that probably asks for something like a fixed-length data struct and the actual length of this data struct.
In my experience, the right way to handle this is to serialize the data into something like a buffer class/struct. This way you can optimize the size (you only serialize what you need) and you can send your buffer through your queue.
To serialize you can use something like 1 byte for type then data. To handle variable length data, you can use 1 to n bytes to encode data length, so you can deserialize the data.
For a string :
1 byte to code the type (0x01 = string, ...)
2 bytes to code the string length (if you need less than 65536 bytes)
n data bytes
So the string "Hello" will be serialized (with the length stored high byte first) as :
0x01 0x00 0x05 0x48 0x65 0x6c 0x6c 0x6f
You need a buffer class and a serializer/deserializer class. Then you do something like :
serialize data
send serialized data into queue
and on the other side
receive data
deserialize data
I hope it helps and that I have not misunderstood your problem. The serialization part is overkill if the VxWorks queues are not what I think ...
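The type/length/data layout for a string could be sketched like this. The type code 0x01 comes from the description above; storing the 2-byte length high byte first is an assumption, and encodeString/decodeString are made-up names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

constexpr std::uint8_t kTypeString = 0x01;

// 1 byte type, 2 bytes length (high byte first), then the string bytes.
inline std::vector<std::uint8_t> encodeString(const std::string& s) {
    if (s.size() > 0xFFFF) {
        throw std::length_error("string too long for a 2-byte length");
    }
    std::vector<std::uint8_t> out;
    out.push_back(kTypeString);
    out.push_back(static_cast<std::uint8_t>(s.size() >> 8));
    out.push_back(static_cast<std::uint8_t>(s.size() & 0xFF));
    out.insert(out.end(), s.begin(), s.end());
    return out;
}

inline std::string decodeString(const std::vector<std::uint8_t>& in) {
    if (in.size() < 3 || in[0] != kTypeString) {
        throw std::runtime_error("not a string record");
    }
    const std::size_t len = (static_cast<std::size_t>(in[1]) << 8) | in[2];
    if (in.size() - 3 < len) {
        throw std::runtime_error("truncated string record");
    }
    return std::string(reinterpret_cast<const char*>(in.data()) + 3, len);
}
```

With this layout encodeString("Hello") yields the eight bytes 01 00 05 48 65 6c 6c 6f, and the resulting buffer's data() and size() are what you would hand to the queue.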
Be very careful with the "string" member in the message queue. Under the hood, it's a pointer to some malloc'd memory that contains the actual string characters, so you're only passing the 'pointer' in your queue, not the real string.
The receiving process may potentially not be able to access the string memory, or -worse - it may have already been destroyed by the time your message reader tries to get it.
+1 for 1800 and Ylisar.
Using a union for this kind of thing is probably the way to go. But, as others pointed out, it has several drawbacks:
inherently error prone.
not safely extensible.
can't handle members with constructors (although you can use pointers).
So unless you can build a nice wrapper, going the boost::variant way is probably safer.
This is a bit offtopic, but this issue is one of the reasons why languages of the ML family have such a strong appeal (at least for me). For example, your issue is elegantly solved in OCaml with:
(*
* LData, FData and StrData are constructors for this sum type,
* they can have any number of arguments
*)
type structData = LData of int | FData of float | StrData of string
(*
* the compiler automatically infers the function signature
* and checks the match exhaustiveness.
*)
let print x =
match x with
| LData(i) -> Printf.printf "%d\n" i
| FData(f) -> Printf.printf "%f\n" f
| StrData(s) -> Printf.printf "%s\n" s
Try QVariant in Qt

How do you serialize an object in C++?

I have a small hierarchy of objects that I need to serialize and transmit via a socket connection. I need to both serialize the object, then deserialize it based on what type it is. Is there an easy way to do this in C++ (as there is in Java)?
Just to be clear, I'm looking for methods on converting an object into an array of bytes, then back into an object. I can handle the socket transmission.
Talking about serialization, the boost serialization API comes to my mind. As for transmitting the serialized data over the net, I'd either use Berkeley sockets or the asio library.
If you want to serialize your objects to a byte array, you can use the boost serializer in the following way (taken from the tutorial site):
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
class gps_position
{
private:
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
ar & degrees;
ar & minutes;
ar & seconds;
}
int degrees;
int minutes;
float seconds;
public:
gps_position(){};
gps_position(int d, int m, float s) :
degrees(d), minutes(m), seconds(s)
{}
};
Actual serialization is then pretty easy:
#include <fstream>
std::ofstream ofs("filename.dat", std::ios::binary);
// create class instance
const gps_position g(35, 59, 24.567f);
// save data to archive
{
boost::archive::binary_oarchive oa(ofs);
// write class instance to archive
oa << g;
// archive and stream closed when destructors are called
}
Deserialization works in an analogous manner.
There are also mechanisms which let you handle serialization of pointers (complex data structures like tress etc are no problem), derived classes and you can choose between binary and text serialization. Besides all STL containers are supported out of the box.
There is a generic pattern you can use to serialize objects. The fundemental primitive is these two functions you can read and write from iterators:
template <class OutputCharIterator>
void putByte(char byte, OutputCharIterator &&it)
{
*it = byte;
++it;
}
template <class InputCharIterator>
char getByte(InputCharIterator &&it, InputCharIterator &&end)
{
if (it == end)
{
throw std::runtime_error{"Unexpected end of stream."};
}
char byte = *it;
++it;
return byte;
}
Then serialization and deserialization functions follow the pattern:
template <class OutputCharIterator>
void serialize(const YourType &obj, OutputCharIterator &&it)
{
// Call putbyte or other serialize overloads.
}
template <class InputCharIterator>
void deserialize(YourType &obj, InputCharIterator &&it, InputCharIterator &&end)
{
// Call getByte or other deserialize overloads.
}
For classes you can use the friend function pattern to allow the overload to be found using ADL:
class Foo
{
int internal1, internal2;
// So it can be found using ADL and it accesses private parts.
template <class OutputCharIterator>
friend void serialize(const Foo &obj, OutputCharIterator &&it)
{
// Call putByte or other serialize overloads.
}
// Deserialize similar.
};
Then in your program you can serialize and object into a file like this:
std::ofstream file("savestate.bin");
serialize(yourObject, std::ostreambuf_iterator<char>(file));
Then read:
std::ifstream file("savestate.bin");
deserialize(yourObject, std::istreamBuf_iterator<char>(file), std::istreamBuf_iterator<char>());
My old answer here:
Serialization means turning your object into binary data. While deserialization means recreating an object from the data.
When serializing you are pushing bytes into an uint8_t vector.
When unserializing you are reading bytes from an uint8_t vector.
There are certainly patterns you can employ when serializing stuff.
Each serializable class should have a serialize(std::vector<uint8_t> &binaryData) or similar signatured function that will write its binary representation into the provided vector. Then this function may pass this vector down to it's member's serializing functions so they can write their stuff into it too.
Since the data representation can be different on different architectures.
You need to find out a scheme how to represent the data.
Let's start from the basics:
Serializing integer data
Just write the bytes in little endian order. Or use varint representation if size matters.
Serialization in little endian order:
data.push_back(integer32 & 0xFF);
data.push_back((integer32 >> 8) & 0xFF);
data.push_back((integer32 >> 16) & 0xFF);
data.push_back((integer32 >> 24) & 0xFF);
Deserialization from little endian order:
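// Widen each byte to uint32_t first so that shifting by 24 cannot overflow a signed int.
integer32 = uint32_t(data[0]) | (uint32_t(data[1]) << 8) | (uint32_t(data[2]) << 16) | (uint32_t(data[3]) << 24);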
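The two snippets above can be wrapped into a pair of helpers. This is only a sketch: the names putU32 and getU32 and the offset parameter are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append a 32-bit value in little-endian order.
void putU32(std::vector<std::uint8_t> &data, std::uint32_t value)
{
    for (int i = 0; i < 4; ++i)
        data.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
}

// Hypothetical helper: read a little-endian 32-bit value starting at offset,
// then advance offset past the bytes that were consumed.
std::uint32_t getU32(const std::vector<std::uint8_t> &data, std::size_t &offset)
{
    assert(offset + 4 <= data.size()); // a real reader should report an error instead
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(data[offset + i]) << (8 * i);
    offset += 4;
    return value;
}
```

Because the byte order is fixed by the shifts rather than by the memory layout, the same code works on both little-endian and big-endian machines.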
Serializing floating point data
As far as I know, IEEE 754 has a monopoly here. I don't know of any mainstream architecture that uses something else for floats. The only thing that can differ is the byte order: some architectures use little endian, others use big endian. This means you need to be careful about the order in which you load the bytes on the receiving end. Another difference can be the handling of denormal, infinity and NaN values, but as long as you avoid these values you should be OK.
Serialization:
uint8_t mem[8];
memcpy(mem, &doubleValue, 8);
data.push_back(mem[0]);
data.push_back(mem[1]);
...
Deserialization does the same in reverse. Mind the byte order of your architecture!
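One way to sidestep the byte order question is to copy the double into a 64-bit integer and then write that integer with shifts. The sketch below assumes 64-bit IEEE 754 doubles and a platform where floats and integers share the same byte order; putDouble and getDouble are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::uint64_t), "sketch assumes a 64-bit double");

// Hypothetical helper: append a double as 8 little-endian bytes.
void putDouble(std::vector<std::uint8_t> &data, double value)
{
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits); // copy the bit pattern, no aliasing tricks
    for (int i = 0; i < 8; ++i)
        data.push_back(static_cast<std::uint8_t>((bits >> (8 * i)) & 0xFF));
}

// Hypothetical helper: read 8 little-endian bytes back into a double.
double getDouble(const std::vector<std::uint8_t> &data, std::size_t &offset)
{
    assert(offset + 8 <= data.size());
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits |= static_cast<std::uint64_t>(data[offset + i]) << (8 * i);
    offset += 8;
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```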
Serializing strings
First you need to agree on an encoding. UTF-8 is common. Then store the string in a length-prefixed manner: first store the length of the string using one of the integer methods above, then write the string byte by byte.
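A minimal sketch of the length-prefixed layout, assuming a 32-bit little-endian length and a string that is already encoded (for example as UTF-8). The names putString and getString are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: 4-byte little-endian length, then the raw bytes.
void putString(std::vector<std::uint8_t> &data, const std::string &text)
{
    const std::uint32_t length = static_cast<std::uint32_t>(text.size()); // assumes < 4 GiB
    for (int i = 0; i < 4; ++i)
        data.push_back(static_cast<std::uint8_t>((length >> (8 * i)) & 0xFF));
    data.insert(data.end(), text.begin(), text.end());
}

// Hypothetical helper: read the length, then exactly that many bytes.
std::string getString(const std::vector<std::uint8_t> &data, std::size_t &offset)
{
    assert(offset + 4 <= data.size());
    std::uint32_t length = 0;
    for (int i = 0; i < 4; ++i)
        length |= static_cast<std::uint32_t>(data[offset + i]) << (8 * i);
    offset += 4;
    assert(length <= data.size() - offset); // never trust a length read from the wire
    std::string text(reinterpret_cast<const char *>(data.data()) + offset, length);
    offset += length;
    return text;
}
```

Reading the length first is what lets the receiver allocate the string itself, which is the step a raw copy of a struct containing std::string cannot do.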
Serializing arrays
They are handled the same way as strings. First serialize an integer representing the size of the array, then serialize each object in it.
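As a sketch, here is the same idea for an array of 32-bit integers. The helper names (putU32, getU32, putArray, getArray) are made up for this example, and the element count is assumed to fit in 32 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

void putU32(std::vector<std::uint8_t> &data, std::uint32_t value)
{
    for (int i = 0; i < 4; ++i)
        data.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
}

std::uint32_t getU32(const std::vector<std::uint8_t> &data, std::size_t &offset)
{
    assert(offset + 4 <= data.size());
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(data[offset + i]) << (8 * i);
    offset += 4;
    return value;
}

// Element count first, then every element in order.
void putArray(std::vector<std::uint8_t> &data, const std::vector<std::uint32_t> &values)
{
    putU32(data, static_cast<std::uint32_t>(values.size()));
    for (std::uint32_t value : values)
        putU32(data, value);
}

std::vector<std::uint32_t> getArray(const std::vector<std::uint8_t> &data, std::size_t &offset)
{
    const std::uint32_t count = getU32(data, offset);
    std::vector<std::uint32_t> values;
    for (std::uint32_t i = 0; i < count; ++i)
        values.push_back(getU32(data, offset));
    return values;
}
```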
Serializing whole objects
As I said before, they should have a serialize method that adds content to a vector.
To deserialize an object, it should have a constructor that takes a byte stream. It can be an istream, but in the simplest case it can be just a reference to a uint8_t pointer. The constructor reads the bytes it wants from the stream and sets up the fields in the object.
If the system is well designed and serializes the fields in the order they are declared, you can just pass the stream to the fields' constructors in an initializer list and have them deserialized in the right order.
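A small sketch of that constructor pattern, using a reference to a uint8_t pointer as the "stream". Point, Segment and the helpers are hypothetical, and the reader does no bounds checking, which real code would need.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

void putU32(std::vector<std::uint8_t> &out, std::uint32_t value)
{
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
}

// Reads 4 little-endian bytes and advances the caller's pointer.
// No bounds checking: a real reader would also carry an end pointer.
std::uint32_t getU32(const std::uint8_t *&in)
{
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(*in++) << (8 * i);
    return value;
}

struct Point {
    std::uint32_t x, y;
    Point(std::uint32_t x_, std::uint32_t y_) : x(x_), y(y_) {}
    // Members are initialized in declaration order, so x is read before y.
    explicit Point(const std::uint8_t *&in) : x(getU32(in)), y(getU32(in)) {}
    void serialize(std::vector<std::uint8_t> &out) const { putU32(out, x); putU32(out, y); }
};

struct Segment {
    Point from, to;
    Segment(Point a, Point b) : from(a), to(b) {}
    // The same pointer is handed to each field's constructor in turn.
    explicit Segment(const std::uint8_t *&in) : from(in), to(in) {}
    void serialize(std::vector<std::uint8_t> &out) const { from.serialize(out); to.serialize(out); }
};

// Usage: serialize a Segment, then rebuild it from the raw bytes.
void segmentDemo()
{
    const Segment original(Point(1, 2), Point(3, 4));
    std::vector<std::uint8_t> bytes;
    original.serialize(bytes);
    const std::uint8_t *cursor = bytes.data();
    const Segment copy(cursor);
    assert(copy.to.y == 4 && cursor == bytes.data() + bytes.size());
}
```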
Serializing object graphs
First you need to make sure these objects are really something you want to serialize. You don't need to serialize them if instances of these objects are already present on the destination.
Now suppose you have found that you do need to serialize the object pointed to by a pointer.
The problem with pointers is that they are valid only in the program that uses them. You cannot serialize a pointer, so you should stop using them in serializable objects. Instead, create object pools.
An object pool is basically a dynamic array which contains "boxes". These boxes have a reference count: a non-zero reference count indicates a live object, zero indicates an empty slot. Then you create a smart pointer akin to shared_ptr that doesn't store a pointer to the object, but its index in the array. You also need to agree on an index that denotes the null pointer, e.g. -1.
Basically what we did here is replaced the pointers with array indexes.
Now when serializing you can serialize this array index as usual. You don't need to worry about where the object will be in memory on the destination system. Just make sure the destination has the same object pool too.
So we need to serialize the object pools. But which ones? Well, when you serialize an object graph you are not serializing just an object, you are serializing an entire system. This means the serialization of the system shouldn't start from parts of the system. Those objects shouldn't worry about the rest of the system; they only need to serialize the array indexes and that's it. You should have a system serializer routine that orchestrates the serialization of the system, walks through the relevant object pools and serializes all of them.
On the receiving end all the arrays and the objects within them are deserialized, recreating the desired object graph.
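A stripped-down sketch of the index idea: a linked list whose nodes live in a pool and refer to each other by index. Reference counting and the smart pointer wrapper are left out to keep it short, and all names (Node, NodePool, kNullIndex, the helpers) are hypothetical. The null index is stored as the largest unsigned value, which plays the role of -1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint32_t kNullIndex = 0xFFFFFFFFu; // stands in for the "-1" null index

struct Node {
    std::uint32_t value;
    std::uint32_t next; // index into the pool instead of a Node*
};
using NodePool = std::vector<Node>;

void putU32(std::vector<std::uint8_t> &out, std::uint32_t v)
{
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<std::uint8_t>((v >> (8 * i)) & 0xFF));
}

std::uint32_t getU32(const std::vector<std::uint8_t> &in, std::size_t &offset)
{
    assert(offset + 4 <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[offset + i]) << (8 * i);
    offset += 4;
    return v;
}

// The "system serializer": writes the whole pool, indexes included.
void serializePool(const NodePool &pool, std::vector<std::uint8_t> &out)
{
    putU32(out, static_cast<std::uint32_t>(pool.size()));
    for (const Node &node : pool) {
        putU32(out, node.value);
        putU32(out, node.next);
    }
}

NodePool deserializePool(const std::vector<std::uint8_t> &in, std::size_t &offset)
{
    const std::uint32_t count = getU32(in, offset);
    NodePool pool;
    for (std::uint32_t i = 0; i < count; ++i) {
        Node node;
        node.value = getU32(in, offset);
        node.next = getU32(in, offset);
        assert(node.next == kNullIndex || node.next < count); // reject dangling indexes
        pool.push_back(node);
    }
    return pool;
}
```

Since the links are plain indexes, the rebuilt pool has the same shape as the original no matter where the receiver places it in memory.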
Serializing function pointers
Don't store pointers in the object. Have a static array which contains the pointers to these functions and store the index in the object.
Since both programs have this table compiled into themselves, using just the index should work.
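A minimal sketch of such a table. The handler functions, the Task struct and run are hypothetical; the only point is that the object carries an index, not a function pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Handler = int (*)(int);

int doubleIt(int x) { return 2 * x; }
int negateIt(int x) { return -x; }

// Both programs must be built with the same table in the same order.
const Handler kHandlers[] = {doubleIt, negateIt};
const std::size_t kHandlerCount = sizeof(kHandlers) / sizeof(kHandlers[0]);

struct Task {
    std::uint8_t handlerIndex; // this byte is what gets serialized
    int argument;
};

int run(const Task &task)
{
    assert(task.handlerIndex < kHandlerCount); // an index from the wire may be out of range
    return kHandlers[task.handlerIndex](task.argument);
}
```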
Serializing polymorphic types
Since I said you should avoid pointers in serializable types and use array indexes instead, polymorphism just cannot work, because it requires pointers.
You need to work around this with type tags and unions.
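As a sketch of the tag-plus-union approach: the tag byte is written first and tells the reader which union member follows. Shape, Circle, Rect and the one-byte fields are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ShapeTag : std::uint8_t { Circle = 0, Rect = 1 };

struct Circle { std::uint8_t radius; };
struct Rect { std::uint8_t width, height; };

struct Shape {
    ShapeTag tag;
    union {
        Circle circle; // valid when tag == ShapeTag::Circle
        Rect rect;     // valid when tag == ShapeTag::Rect
    };
};

void serialize(const Shape &shape, std::vector<std::uint8_t> &out)
{
    out.push_back(static_cast<std::uint8_t>(shape.tag)); // type tag comes first
    switch (shape.tag) {
    case ShapeTag::Circle:
        out.push_back(shape.circle.radius);
        break;
    case ShapeTag::Rect:
        out.push_back(shape.rect.width);
        out.push_back(shape.rect.height);
        break;
    }
}

Shape deserialize(const std::vector<std::uint8_t> &in, std::size_t &offset)
{
    assert(offset < in.size());
    Shape shape;
    shape.tag = static_cast<ShapeTag>(in[offset++]);
    switch (shape.tag) {
    case ShapeTag::Circle:
        assert(offset + 1 <= in.size());
        shape.circle.radius = in[offset++];
        break;
    case ShapeTag::Rect:
        assert(offset + 2 <= in.size());
        shape.rect.width = in[offset++];
        shape.rect.height = in[offset++];
        break;
    default:
        assert(false && "unknown type tag");
    }
    return shape;
}
```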
Versioning
On top of all the above, you might want different versions of the software to interoperate.
In this case each object should write a version number at the beginning of its serialization to indicate the version.
When loading the object on the other side, newer objects may be able to handle the older representations, but older ones cannot handle the newer, so they should throw an exception in that case.
Each time something changes, you should bump the version number.
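A minimal sketch of a version byte in front of the payload. Settings, its fields, the version numbers and the default value 50 are all assumptions made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

const std::uint8_t kCurrentVersion = 2;

struct Settings {
    std::uint8_t volume;
    std::uint8_t brightness; // added in version 2
};

void serialize(const Settings &s, std::vector<std::uint8_t> &out)
{
    out.push_back(kCurrentVersion); // version number goes first
    out.push_back(s.volume);
    out.push_back(s.brightness);
}

Settings deserialize(const std::vector<std::uint8_t> &in)
{
    if (in.empty())
        throw std::runtime_error("empty input");
    const std::uint8_t version = in[0];
    if (version == 0 || version > kCurrentVersion)
        throw std::runtime_error("unsupported version"); // data is newer than this program
    const std::size_t expected = (version >= 2) ? 3 : 2;
    if (in.size() < expected)
        throw std::runtime_error("truncated input");
    Settings s;
    s.volume = in[1];
    // Version 1 data has no brightness, so fall back to a hypothetical default.
    s.brightness = (version >= 2) ? in[2] : std::uint8_t(50);
    return s;
}

// Usage: data written by a version 1 program still loads.
void loadOldData()
{
    const std::vector<std::uint8_t> oldBytes = {1, 10}; // version 1: volume only
    const Settings s = deserialize(oldBytes);
    assert(s.volume == 10 && s.brightness == 50);
}
```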
So to wrap this up, serialization can be complex. But fortunately you don't need to serialize everything in your program, most often only the protocol messages are serialized, which are often plain old structs. So you don't need the complex tricks I mentioned above too often.
In some cases, when dealing with simple types, you can do:
object o;
socket.write(&o, sizeof(o));
That's ok as a proof-of-concept or first-draft, so other members of your team can keep working on other parts.
But sooner or later, usually sooner, this will get you hurt!
You run into issues with:
Virtual pointer tables will be corrupted.
Pointers (to data/members/functions) will be corrupted.
Differences in padding/alignment on different machines.
Big/Little-Endian byte ordering issues.
Variations in the implementation of float/double.
(Plus you need to know what you are unpacking into on the receiving side.)
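If you do take the raw-copy shortcut, a compile-time guard catches at least some of these problems. The sketch below uses hypothetical helpers (rawBytes, fromRawBytes) and a hypothetical Header struct. It refuses types with virtual functions or std::string members, but it cannot detect raw pointers, padding or byte order differences.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: copy an object's raw bytes, only where that is well defined.
template <class T>
std::vector<char> rawBytes(const T &object)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies are only valid for trivially copyable types");
    std::vector<char> bytes(sizeof(T));
    std::memcpy(bytes.data(), &object, sizeof(T));
    return bytes;
}

// Hypothetical helper: rebuild an object from bytes produced on the same platform.
template <class T>
T fromRawBytes(const std::vector<char> &bytes)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies are only valid for trivially copyable types");
    assert(bytes.size() == sizeof(T));
    T object;
    std::memcpy(&object, bytes.data(), sizeof(T));
    return object;
}

struct Header { // hypothetical plain struct, safe to copy on one machine
    std::uint32_t id;
    float scale;
};
```

With this guard, a struct that contains a std::string member fails to compile instead of silently writing a dangling pointer.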
You can improve upon this by developing your own marshalling/unmarshalling methods for every class. (Ideally virtual, so they can be extended in subclasses.) A few simple macros will let you write out different basic types quite quickly in a big/little-endian-neutral order.
But that sort of grunt work is much better, and more easily, handled via boost's serialization library.
By way of learning I wrote a simple C++11 serializer. I had tried various of
the other more heavyweight offerings, but wanted something that I could actually
understand when it went wrong or failed to compile with the latest g++ (which
happened for me with Cereal; a really nice library but complex and I could not
grok the errors the compiler threw up on upgrade.) Anyway, it's header only
and handles POD types, containers, maps etc... No versioning and it will only
load files from the same arch it was saved in.
https://github.com/goblinhack/simple-c-plus-plus-serializer
Example usage:
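#include <fstream>
#include <limits>
#include <string>
#include "c_plus_plus_serializer.h"
// Streams cannot be copied, so take the stream by reference.
static void serialize (std::ofstream &out)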
{
char a = 42;
unsigned short b = 65535;
int c = 123456;
float d = std::numeric_limits<float>::max();
double e = std::numeric_limits<double>::max();
std::string f("hello");
out << bits(a) << bits(b) << bits(c) << bits(d);
out << bits(e) << bits(f);
}
static void deserialize (std::ifstream &in)
{
char a;
unsigned short b;
int c;
float d;
double e;
std::string f;
in >> bits(a) >> bits(b) >> bits(c) >> bits(d);
in >> bits(e) >> bits(f);
}