C++ program crashes on exit (binary files) [duplicate]

I have a small hierarchy of objects that I need to serialize and transmit via a socket connection. I need to both serialize the object, then deserialize it based on what type it is. Is there an easy way to do this in C++ (as there is in Java)?
Just to be clear, I'm looking for methods on converting an object into an array of bytes, then back into an object. I can handle the socket transmission.

When it comes to serialization, the Boost serialization API comes to mind. As for transmitting the serialized data over the network, I'd use either Berkeley sockets or the asio library.
If you want to serialize your objects to a byte array, you can use the boost serializer in the following way (taken from the tutorial site):
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>

class gps_position
{
private:
    friend class boost::serialization::access;

    template<class Archive>
    void serialize(Archive &ar, const unsigned int version)
    {
        ar & degrees;
        ar & minutes;
        ar & seconds;
    }

    int degrees;
    int minutes;
    float seconds;
public:
    gps_position() {}
    gps_position(int d, int m, float s) :
        degrees(d), minutes(m), seconds(s)
    {}
};
Actual serialization is then pretty easy:
#include <fstream>

std::ofstream ofs("filename.dat", std::ios::binary);

// create class instance
const gps_position g(35, 59, 24.567f);

// save data to archive
{
    boost::archive::binary_oarchive oa(ofs);
    // write class instance to archive
    oa << g;
    // archive and stream closed when destructors are called
}
Deserialization works in an analogous manner.
There are also mechanisms that let you handle serialization of pointers (complex data structures like trees are no problem) and derived classes, and you can choose between binary and text serialization. Besides, all STL containers are supported out of the box.

There is a generic pattern you can use to serialize objects. The fundamental primitives are these two functions, which read and write bytes through iterators:
#include <stdexcept>

template <class OutputCharIterator>
void putByte(char byte, OutputCharIterator &&it)
{
    *it = byte;
    ++it;
}

template <class InputCharIterator>
char getByte(InputCharIterator &&it, InputCharIterator &&end)
{
    if (it == end)
    {
        throw std::runtime_error{"Unexpected end of stream."};
    }

    char byte = *it;
    ++it;
    return byte;
}
Then serialization and deserialization functions follow the pattern:
template <class OutputCharIterator>
void serialize(const YourType &obj, OutputCharIterator &&it)
{
    // Call putByte or other serialize overloads.
}

template <class InputCharIterator>
void deserialize(YourType &obj, InputCharIterator &&it, InputCharIterator &&end)
{
    // Call getByte or other deserialize overloads.
}
For classes you can use the friend function pattern to allow the overload to be found using ADL:
class Foo
{
    int internal1, internal2;

    // So it can be found using ADL and it accesses private parts.
    template <class OutputCharIterator>
    friend void serialize(const Foo &obj, OutputCharIterator &&it)
    {
        // Call putByte or other serialize overloads.
    }

    // Deserialize is similar.
};
Then in your program you can serialize an object into a file like this:
std::ofstream file("savestate.bin", std::ios::binary);
serialize(yourObject, std::ostreambuf_iterator<char>(file));
Then read:
std::ifstream file("savestate.bin", std::ios::binary);
deserialize(yourObject, std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>());
My old answer here:
Serialization means turning your object into binary data, while deserialization means recreating an object from that data.
When serializing, you are pushing bytes into a uint8_t vector.
When unserializing, you are reading bytes from a uint8_t vector.
There are certainly patterns you can employ when serializing stuff.
Each serializable class should have a serialize(std::vector<uint8_t> &binaryData), or similarly signatured, function that writes its binary representation into the provided vector. This function may then pass the vector down to its members' serializing functions so they can write their parts into it too.
Since the data representation can differ between architectures, you need to settle on a scheme for representing the data.
Let's start from the basics:
Serializing integer data
Just write the bytes in little endian order. Or use varint representation if size matters.
Serialization in little endian order:
data.push_back(integer32 & 0xFF);
data.push_back((integer32 >> 8) & 0xFF);
data.push_back((integer32 >> 16) & 0xFF);
data.push_back((integer32 >> 24) & 0xFF);
Deserialization from little endian order:
integer32 = data[0] | (data[1] << 8) | (data[2] << 16) | (static_cast<uint32_t>(data[3]) << 24);
Serializing floating point data
As far as I know, IEEE 754 has a monopoly here. I don't know of any mainstream architecture that uses something else for floats. The only thing that can differ is the byte order: some architectures are little endian, others big endian. This means you need to be careful about the order in which you load up the bytes on the receiving end. Another difference can be the handling of denormal, infinity, and NaN values, but as long as you avoid those values you should be OK.
Serialization:
uint8_t mem[sizeof(double)];
memcpy(mem, &doubleValue, sizeof(double));
data.push_back(mem[0]);
data.push_back(mem[1]);
...
Deserialization does the same backwards. Mind the byte order of your architecture!
Serializing strings
First you need to agree on an encoding; UTF-8 is common. Then store it in a length-prefixed manner: first store the length of the string using a method mentioned above, then write the string byte by byte.
Serializing arrays
They are much the same as strings: you first serialize an integer representing the size of the array, then serialize each object in it.
Serializing whole objects
As I said before, they should have a serialize method that adds content to a vector.
To unserialize an object, it should have a constructor that takes a byte stream. It can be an istream, but in the simplest case it can be just a reference to a uint8_t pointer. The constructor reads the bytes it wants from the stream and sets up the fields of the object.
If the system is well designed and serializes the fields in object field order, you can just pass the stream to the fields' constructors in an initializer list and have them deserialized in the right order.
Serializing object graphs
First you need to make sure these objects are really something you want to serialize. You don't need to serialize them if instances are already present on the destination.
Now, suppose you found out you need to serialize an object pointed to by a pointer.
The problem with pointers is that they are valid only in the program that uses them. You cannot serialize a pointer; you should stop using them in serialized objects. Instead, create object pools.
This object pool is basically a dynamic array which contains "boxes". These boxes have a reference count: a non-zero reference count indicates a live object, zero indicates an empty slot. Then you create a smart pointer, akin to shared_ptr, that stores not the pointer to the object but its index in the array. You also need to agree on an index that denotes the null pointer, e.g. -1.
Basically what we did here is replaced the pointers with array indexes.
Now when serializing, you can serialize this array index as usual. You don't need to worry about where the object will be in memory on the destination system. Just make sure it has the same object pool too.
So we need to serialize the object pools. But which ones? Well, when you serialize an object graph you are not serializing just an object, you are serializing an entire system. This means the serialization of the system shouldn't start from parts of the system. Those objects shouldn't worry about the rest of the system; they only need to serialize the array indexes, and that's it. You should have a system serializer routine that orchestrates the serialization, walks through the relevant object pools, and serializes all of them.
On the receiving end, all the arrays and the objects within are deserialized, recreating the desired object graph.
Serializing function pointers
Don't store pointers in the object. Have a static array which contains the pointers to these functions and store the index in the object.
Since both programs have this table compiled into themselves, using just the index should work.
Serializing polymorphic types
Since I said you should avoid pointers in serializable types and use array indexes instead, polymorphism cannot work directly, because it requires pointers.
You need to work around this with type tags and unions.
Versioning
On top of all the above, you might want different versions of the software to interoperate.
In this case, each object should write a version number at the beginning of its serialization.
When loading the object on the other side, newer code may be able to handle older representations, but older code cannot handle newer ones, so it should throw an exception.
Each time something changes, you should bump the version number.
So to wrap this up: serialization can be complex. But fortunately you don't need to serialize everything in your program; most often only the protocol messages are serialized, and they are often plain old structs. So you don't need the complex tricks above too often.

In some cases, when dealing with simple types, you can do:
object o;
socket.write(&o, sizeof(o));
That's ok as a proof-of-concept or first-draft, so other members of your team can keep working on other parts.
But sooner or later, usually sooner, this will get you hurt!
You will run into issues:
Virtual pointer tables will be corrupted.
Pointers (to data/members/functions) will be corrupted.
Differences in padding/alignment on different machines.
Big/Little-Endian byte ordering issues.
Variations in the implementation of float/double.
(Plus you need to know what you are unpacking into on the receiving side.)
You can improve upon this by developing your own marshalling/unmarshalling methods for every class. (Ideally virtual, so they can be extended in subclasses.) A few simple macros will let you write out different basic types quite quickly in a big/little-endian-neutral order.
But that sort of grunt work is much better, and more easily, handled via boost's serialization library.

By way of learning, I wrote a simple C++11 serializer. I had tried various other, more heavyweight offerings, but wanted something that I could actually understand when it went wrong or failed to compile with the latest g++ (which happened for me with Cereal; a really nice library, but complex, and I could not grok the errors the compiler threw up on upgrade). Anyway, it's header-only and handles POD types, containers, maps etc. No versioning, and it will only load files on the same arch they were saved on.
https://github.com/goblinhack/simple-c-plus-plus-serializer
Example usage:
#include "c_plus_plus_serializer.h"
static void serialize (std::ofstream &out)
{
    char a = 42;
    unsigned short b = 65535;
    int c = 123456;
    float d = std::numeric_limits<float>::max();
    double e = std::numeric_limits<double>::max();
    std::string f("hello");

    out << bits(a) << bits(b) << bits(c) << bits(d);
    out << bits(e) << bits(f);
}
static void deserialize (std::ifstream &in)
{
    char a;
    unsigned short b;
    int c;
    float d;
    double e;
    std::string f;

    in >> bits(a) >> bits(b) >> bits(c) >> bits(d);
    in >> bits(e) >> bits(f);
}

Related

Binary Serialization of multiple data types

I'm trying to serialize a struct with multiple data types in C++ (Visual Studio) into a binary file and deserialize it all at once, but I'm facing a problem with memory allocation for strings when reading the data back. I know what the problem is, and that there are open source libraries I could use, but I don't want to use them unless it is really necessary. I also know that I can write/read the data types one by one, but that method is too long for a struct containing a large number of data types. I want to perform the write/read operations in one go without using any open source library.
Here is a struct for example:
struct Frame {
    bool isPass{ true };
    uint64_t address{ 0 };
    uint32_t age{ 0 };
    float marks{ 0.0 };
    std::string userName;
};
Is there any way to perform the write/read operation in one go in binary format?
Thank you
Not using existing libraries is NEVER good. Still,...
You could, for example, create a pure virtual class like:
class Serializable
{
public:
    virtual std::vector<char> serialize() = 0;
};
Then:
You implement it for all your own classes that you have
You implement serialization methods for all STL and PoD types that you use (std::strings, PoD types and structs with only PoD types) inside of some static class. Basically, during serialization you can put there something like [size][type][data ~ [size][type][data][size][type][data]].
Then, when you process a class for serialization, you create a buffer, first put a size into it, then a type identifier, then all the bytes of all the members, serialized by the methods you implemented in 1) and 2).
When you read anything back from such an array, you do the same backwards: read N bytes (first field), determine the actual type (second field), read all the members, and deserialize everything they include.
The process is recursive.
But... man, it's really a bad idea. Use protobuf or boost::serialization. Or anything else; there are a lot of serialization libraries on the internet. Read those precious comments under your question. People are right.
Assuming you have some other mechanism to keep track of your frame size, rewrite your struct as:
struct Frame {
    bool isPass{ true };
    uint8_t pad1[3]{};
    uint32_t age{ 0 };
    uint64_t address{ 0 };
    double marks{ 0.0 };
    char userName[];
};
If we have a pointer Frame* frame, we can write this using write(fd, frame, frame_size) (where frame_size > sizeof(Frame)).
Assuming you have read the frame into a buffer, you can access the data using:
auto frame = reinterpret_cast<const Frame*>(buf); The length of userName will therefore be frame_size - sizeof(Frame). You can now access the elements through your struct.
This is very C-like, and the approach is limited to a single variable-length element at the end of the struct.

Serialization of objects to files - is dumping char* back-and-forth safe?

I would like to write an object to a file, and later be able to read it from the file. It should work on different machines.
One simple option is the following:
struct OBJECT { // The object to be serialized / deserialized
public:
    // Members are serialized / deserialized in the order they are declared. Can use bitpacking as well.
    DATATYPE member1;
    DATATYPE member2;
    DATATYPE member3;
    DATATYPE member4;
};

// Writes the given OBJECT data to the given file name.
void write(const std::string& file_name, OBJECT& data)
{
    std::ofstream out;
    out.open(file_name, std::ios::binary);
    out.write(reinterpret_cast<char*>(&data), sizeof(OBJECT));
    out.close();
}

// Reads the given file and assigns the data to the given OBJECT.
void read(const std::string& file_name, OBJECT& data)
{
    std::ifstream in;
    in.open(file_name, std::ios::binary);
    in.read(reinterpret_cast<char*>(&data), sizeof(OBJECT));
    in.close();
}
For serialization, this approach casts the struct to a char*, and then just writes it to the file according to its sizeof. For deserialization, the opposite is performed.
Please consider the following situation: we run the program on Machine 1 and save an object to a File 1. Later, we copy File 1 to Machine 2 and run the program (which might have been compiled with a different compiler) on Machine 2. We want to be able to read the data from the file.
Is this approach safe? Or is it better to "manually" read individual pieces from the file and copy them into the resulting struct, and vice versa?
This approach is generally safe only if all of the following are true:
The type contains only "inline" data. This means only fundamental types (no pointers or references) and arrays thereof.
Nested objects that also meet these criteria are also OK.
All of the types have the same representation on all systems/compilers where the program will run.
The endian-ness of the system must match.
The size of each type must match.
Any padding is in the same location and of the same size.
However, there are still some drawbacks. One of the most severe is that you are locking yourself in to exactly this structure, without some kind of version mechanism. Other data interchange formats (XML, JSON, etc.), while more verbose, self-document their structure and are significantly more future-proof.

Change endianness of entire struct in C++

I am writing a parser in C++ to parse a well-defined binary file. I have declared all the required structs, and since only particular fields are of interest to me, I have skipped the non-required fields in my structs by adding char arrays of size equal to the skipped bytes. So I just read the file into a char array and cast the char pointer to my struct pointer. The problem is that all data fields in the binary are in big-endian order, so after the typecast I need to change the endianness of all the struct fields. One way is to do it manually for each and every field, but there are various structs with many fields, so that would be very cumbersome. What's the best way to achieve this? And since I'll be parsing very large files (in the TBs), I need a fast way to do it.
EDIT: I have used __attribute__((packed)), so there is no need to worry about padding.
If you can do misaligned accesses with no penalty, and you don't mind compiler- or platform-specific tricks to control padding, this can work. (I assume you are OK with this since you mention __attribute__((packed))).
In this case the nicest approach is to write value wrappers for your raw data types, and use those instead of the raw types when declaring your struct in the first place. Remember that the value wrapper must be trivial/POD-like for this to work. If you have a POSIX platform you can use ntohs/ntohl for the endian conversion; it's likely to be better optimized than whatever you write yourself.
If misaligned accesses are illegal or slow on your platform, you need to deserialize instead. Since we don't have reflection yet, you can do this with the same value wrappers (plus an Ignore<N> placeholder that skips N bytes for fields you're not interested in), and declare them in a tuple instead of a struct; you can iterate over the members of a tuple and tell each to deserialize itself from the message.
One way to do that is to combine the C preprocessor with C++ operators. Write a couple of C++ classes like this one:
#include <immintrin.h>

class FlippedInt32
{
    int value;
public:
    inline operator int() const
    {
        return _bswap( value );
    }
};

class FlippedInt64
{
    __int64 value;
public:
    inline operator __int64() const
    {
        return _bswap64( value );
    }
};
Then,
#define int FlippedInt32
before including the header that defines these structures, and #undef it immediately after the #include.
This will replace all int fields in the structures with FlippedInt32, which has the same size but returns flipped bytes.
If it’s your own structures which you can modify you don’t need the preprocessor part. Just replace the integers with the byte-flipping classes.
If you can come up with a list of offsets (in-bytes, relative to the top of the file) of the fields that need endian-conversion, as well as the size of those fields, then you could do all of the endian-conversion with a single for-loop, directly on the char array. E.g. something like this (pseudocode):
struct EndianRecord {
    size_t offsetFromTop;
    size_t fieldSizeInBytes;
};

std::vector<EndianRecord> todoList;
// [populate the todo list here...]

char * rawData = [pointer to the raw data]
for (size_t i=0; i<todoList.size(); i++)
{
    const EndianRecord & er = todoList[i];
    ByteSwap(&rawData[er.offsetFromTop], er.fieldSizeInBytes);
}

struct MyPackedStruct * data = (struct MyPackedStruct *) rawData;
// Now you can just read the member variables
// as usual because you know they are already
// in the correct endian-format.
... of course the difficult part is coming up with the correct todoList, but since the file format is well-defined, it should be possible to generate it algorithmically (or better yet, implement it as a generator with e.g. a GetNextEndianRecord() method that you can call, so that you don't have to store a very large vector in memory).

Vector of Pointers C++ Boost Serialization Error

namespace featurelib
{
    typedef vector<float*> HOG_List;
}

void saveFeaturesFile(vector<featurelib::HOG_List> &features, string filename)
{
    ofstream out(filename.c_str());
    stringstream ss;
    boost::archive::binary_oarchive oa(ss);
    oa << features;
    out << ss.str();
    out.close();
}
This is my code snippet and I am trying to save vector<featurelib::HOG_List> into a binary file using Boost Serialization.
But this results in an error:
error C4308: negative integral constant converted to unsigned type
But when I remove the pointer i.e
typedef vector<float> HOG_List;
The code compiles and works fine.
However I need to find a way to save vectors of typedef vector<float*> HOG_List; to binary files using Boost
Why would you ever have a vector of float*?
Besides that, pointers are special to serialization because of object tracking: http://www.boost.org/doc/libs/1_60_0/libs/serialization/doc/special.html#objecttracking and potential polymorphism/aliasing.
Because float is a primitive type, polymorphism is not a concern here. However, for the same reason, object tracking would be unwieldy and has been disabled:
By default, data types designated primitive by the Implementation Level class serialization trait are never tracked. If it is desired to track a shared primitive object through a pointer (e.g. a long used as a reference count), It should be wrapped in a class/struct so that it is an identifiable type. The alternative of changing the implementation level of a long would affect all longs serialized in the whole program - probably not what one would intend.
This, in turn, means that the serialization library doesn't know how to deal with float*
I'd suggest you use a ptr_vector, or really think about why you need the pointers.
What do the pointers mean? Replace them with an abstraction that captures the semantics. E.g. if the float* is /really/ the address of the first element of an array like float[32], then redefine your HOG_List as e.g.
typedef std::vector<std::vector<float> > HOG_List;
If you have to cater for "unowned" arrays, you could make do with something like GSL's array_view:
struct float_array_view {
    float* base;
    size_t n;
};

typedef std::vector<float_array_view> HOG_List;
NOTE that in such a case you will have to define a way to deserialize it (serialize using boost::serialization::make_array), because there's no way the library could know how to allocate base.
See also:
Boost serialization with template classes
Can't get C++ Boost Pointer Serialization to work (this does the manual work to allocate for deserialization)
Boost binary serialization - double array of fixed length error
There are more potentially relevant posts here: https://stackoverflow.com/search?q=user%3A85371+boost+make_array
