Serialization in a component-based game engine - C++

I'm implementing serialization in my component-based game engine to enable saving and loading in my game. I'm using Cereal to help with serialization. However, two things are unclear to me:
I have a lot of components, and those components also contain classes, etc. Do I need to write serialization functions for all of them? That would mean that I have to write about 100 serialization functions. Most of them would be the same (just serialize all member variables). Is there a way to reduce the amount of work?
What do I do if I want to serialize a class containing classes from another codebase? For example, I'm using SDL and TinyXml. Would that mean that I have to write serialization functions in those codebases?
I hope I can prevent the grunt work of adding all those serialization functions.

Unfortunately there is no magic. Whatever serialization library you use, be it boost::serialization, s11n or MFC, the problem is that you always have to declare, for each individual class, how to serialize it.
This is inherent to the fact that there is no metadata available on the members of classes that would permit automating the serialization of complex classes based on their members' types.
The only way around this problem is to adopt classes designed on purpose for dynamic self-referencing. But this might come at a cost in terms of performance, or with an overhead at construction instead of at archiving time. Approaches could be a combination of:
archiving-aware base classes.
maps or containers of properties instead of local hard-coded variables.
eventually, self-archiving base types for each serializable member, which register themselves in a kind of archiving worklist at object construction.
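As a rough sketch of the property-registration idea (the names `Serializable`, `registerField` and `Transform` are made up for illustration, not taken from Cereal or any other library): each member registers itself once in the constructor, and a single generic serialize() then walks the registered list.

```cpp
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Illustrative base class: each serializable member is registered once in the
// constructor, so one generic serialize() can walk all of them.
class Serializable {
public:
    void serialize(std::ostream& out) const {
        for (const auto& field : fields_) field(out);
    }
protected:
    // Register a named member; the lambda captures a pointer to it.
    template <typename T>
    void registerField(const std::string& name, const T* value) {
        fields_.push_back([name, value](std::ostream& out) {
            out << name << '=' << *value << '\n';
        });
    }
private:
    std::vector<std::function<void(std::ostream&)>> fields_;
};

// A component only lists its members once, in the constructor.
struct Transform : Serializable {
    float x = 0, y = 0;
    Transform() {
        registerField("x", &x);
        registerField("y", &y);
    }
};
```

The trade-off is exactly the one described above: one registration line per member at construction time, plus a std::function call per field when archiving.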
Another way around would be to design a code generator that could run on your headers and mechanically generate the serialization code. But this is already an ambitious project by itself.
A last thought: all this manual archiving code is for sure an overhead. However, it allows you to handle evolution of your object structure, for example if a newer version of your code adds or removes some members and has to deserialize a file written with an older version. This is something that could not easily be achieved with an automated approach.

Related

C++ compile-time / runtime options and parameters, how to handle?

What is the proper way to handle compile-time and runtime options in a generic library? What is good practice for large software in which there are simply too many options for the user to bother about most of them?
Suppose the task is to write a large library to perform calculations on a number of datasets at the same time. There are numerous ways to perform these calculations, and the library must be highly configurable. Typically, there are options relative to how the calculation is performed as a whole. Then, each dataset has its own set of calculation options. Finally, each calculation has a number of tuning parameters, which must be set as well.
The library itself is generic, but each application which uses that library will use a particular kind of dataset, for which tuning parameters will take on a certain value. Since they will not change throughout the life of the application, I make them known at application compile-time. The way I would implement these tuning parameters in the library is through a Traits class, which contains the tuning parameters as static const elements. Calibration of their final value is part of the development of the application.
The datasets will of course change depending on what the user feeds to the application, and therefore a number of runtime options must be provided as well (with intelligent defaults). Calibration of their default values is also part of the development of the application. I would implement these options as a Config class which contains these options and can be changed on application startup (e.g. by parsing a config text file). It gets passed to the constructor of a lot of the classes in the library. Each class then calls Config::get_x for its specific option x.
The thing I don't really like about this design, is that both Traits and Config classes break encapsulation. Some options relate to some parts of the library. Most of the time, however, they don't. And having them suddenly next to each other annoys me, because they affect separate things in the code, which are often in different abstraction layers.
One solution I was thinking about, is using multiple public inheritance for these different parts. A class which needs to know an option then casts the Config object or calls the relevant Trait parent to access it. Also, this passing along of Config to every class that needs it (or whose members need it) is very inelegant. Maybe Config should be a singleton?
You could have your parameters in a single struct named Config (to keep your words) and make it a singleton.
Encapsulation is important to preserve class consistency, because a class is responsible for itself. But in your case, where the Config class must be accessible to everyone, breaking it is necessary. Furthermore, adding getters and setters to this type of class will only add overhead (in the best case your compiler will probably just inline them).
Also, if you really want a Traits class to implement compile-time parameters, you should probably just have an initialization function (like the constructor of your library).
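A minimal sketch of that combination, with made-up option names (`num_threads`, `tolerance`, `block_size`): a Meyers-singleton Config exposing plain public members instead of getters/setters, plus a Traits class carrying compile-time tuning parameters.

```cpp
#include <string>

// Illustrative singleton holding runtime options as plain public members;
// getters/setters are omitted on purpose, as the answer suggests.
struct Config {
    // Runtime options with defaults; the names are made up for the example.
    int num_threads = 4;
    double tolerance = 1e-6;
    std::string input_format = "csv";

    static Config& instance() {
        static Config cfg;   // constructed on first use, thread-safe since C++11
        return cfg;
    }
    Config(const Config&) = delete;             // a singleton is not copyable
    Config& operator=(const Config&) = delete;
private:
    Config() = default;                         // only instance() can construct it
};

// Compile-time tuning parameters as a Traits class (also illustrative);
// each application fixes these at compile time.
struct DatasetTraits {
    static constexpr int block_size = 256;
    static constexpr bool use_cache = true;
};
```

Classes in the library then read `Config::instance().num_threads` directly instead of having a Config passed through every constructor.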

What is the best way to go about designing a persistence model for many classes in C++ while also ensuring SOLID is followed?

I've scoured the web for specific design information on how to effectively design a persistence model for C++ classes and I've come up short. So I decided to ask it here.
Let's say I have 4 classes:
Class A
Class B
Class C
Class Persist
I want to persist classes A, B, and C to disk using the class "Persist" such that one file contains the config info for each class:
my_module.json
{
  "A": {
    ...
  },
  "B": {
    ...
  },
  "C": {
    ...
  }
}
My question is, what's the best approach to design this such that SOLID principles are followed?
For example, the single responsibility principle suggests that a class should have only one responsibility (or only one reason to change) and the Law of Demeter suggests that classes know as little of each other as possible.
So then:
1) Should a class know how to serialize itself or would that already violate single responsibility?
2) If I use a third party library like "cereal" to serialize "Class A", I will need to add tags to the internal members of "Class A" to show the engine how it should serialize. This increases coupling between "Class A" and the third party engine, which has to be bad right?
3) Should I instead use intermediate classes that translate the object from "Class A" into an object with proper tags that a third party library will understand? This removes any knowledge of serialization from "Class A", but it adds more complexity and more classes.
4) Should the class be aware of the persistence module? Is this realistic when trying to serialize? If so, how should the class notify the persistence module that state has changed and it's time to persist? Or should the persistence module simply poll the objects periodically for fresh configuration?
I'm fairly new to OO but deeply interested. Specifics on interactions between classes will be very helpful. Thanks.
Generally in software engineering, decisions aren’t about principles, they’re about tradeoffs.
1 — it violates single responsibility, but in C++ you don't have much choice. The language doesn't natively support runtime reflection, nor virtual constructors / class factories. Without these two features, it's very hard to implement object serialization without objects serializing themselves. By "very hard" I mean it requires external tooling, or lots of macros / templates, IMO very expensive to support long-term.
2 — it increases coupling and adds a lot of complexity (you'll have to support any 3rd party library you're using), but if you need to support multiple serialization formats at once, e.g. JSON, XML and binary, it could still be a good thing.
3 — depends, but I would say "no". The good format for an intermediate representation is something DOM-tree-like, and you'll waste too much CPU time building that, because of too many small RAM allocations and pointer chasing. If for some reason you want to design and support a well-defined intermediate representation, I'd structure it as a pair of event-driven interfaces (one reader, one writer); see SAX for inspiration.
4 — Most serialization libraries are designed so that save/load is a one-time process. The reason is that most modern generic serialization formats (XML, JSON, most binary formats) don't support updates; you'll have to write out a complete file anyway. Notifications only bring value when the storage format supports partial updates, e.g. if you're saving/loading in an [embedded] database, not just in a JSON file. The common practice is to call save manually from the outside, every time you've finished modifying your objects. With automatic saves it's easy to waste too much IO bandwidth and CPU time writing objects in some inconsistent intermediate state.
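For point 3, a SAX-style pair of event-driven interfaces could be sketched like this (all names, `Writer`, `TextWriter`, `Player`, are hypothetical): objects emit begin/field/end events to a writer instead of building a DOM-like intermediate tree.

```cpp
#include <string>

// Illustrative writer interface: it receives events rather than a DOM tree,
// so no intermediate allocations are needed.
class Writer {
public:
    virtual ~Writer() = default;
    virtual void beginObject(const std::string& name) = 0;
    virtual void field(const std::string& name, const std::string& value) = 0;
    virtual void endObject() = 0;
};

// One concrete writer; a JSON or binary writer would implement the same interface.
class TextWriter : public Writer {
public:
    std::string result;
    void beginObject(const std::string& name) override { result += name + " {\n"; }
    void field(const std::string& n, const std::string& v) override {
        result += "  " + n + "=" + v + "\n";
    }
    void endObject() override { result += "}\n"; }
};

// A class emits events about its state; it never sees the concrete format.
struct Player {
    std::string name;
    int score = 0;
    void save(Writer& w) const {
        w.beginObject("Player");
        w.field("name", name);
        w.field("score", std::to_string(score));
        w.endObject();
    }
};
```

The reader side would be a mirror-image interface of callbacks invoked while parsing, as in SAX.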

(C++) serialize object of class from external library

Is there any way to serialize (or, more generally to save to file) an object from a class that I can't modify?
All of the serialization approaches that I've found so far require some manner of intrusion, at the very least adding methods to the class, adding friends etc.
If the object is from a class in a library that I didn't write myself, I don't want to have to change the classes in that library.
More concretely: I'm developing a module-based application which also involves inter-module communication (which is as simple as: one module, the source, is the owner of some data, and another module, the destination, has a pointer to that data, and there's a simple updating/handshaking protocol between the modules). I need to be able to save the state of the modules (which data was last communicated between the modules). Now, it would be ideal if this communication could handle any type, including ones from external libraries. So I need to save this data when saving the module state.
In general, there is no native serialization in C++, at least according to the Wikipedia article on the topic of serialization.
There's a good FAQ resource on this topic here that might be useful if you haven't found it already. This is just general info on serialization, though; it doesn't address the specific problem you raise.
There are a couple of (very) ugly approaches I can think of. You've likely already thought of and rejected these, but I'm throwing them out there just in case:
Do you have access to all the data members of the class? Worst case, you could read those out into a C-style structure, apply your compiler's version of "packed" to keep the sizes of things predictable, and get a transferable binary blob that way.
You could cast a pointer to the object to a pointer to uint8_t and treat the whole thing like an array. This will get messy if there are references, pointers, or a vtable in there. This approach might work for POD objects, though.
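A hedged sketch of that second approach, restricted to trivially copyable types (the `PodState` type and helper names are made up): copy the object's raw bytes into a buffer and back. This is compiler- and platform-specific and falls apart for anything with pointers, references or a vtable.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// A plain-old-data snapshot; only safe because it contains no pointers,
// references, or virtual functions.
struct PodState {
    int32_t hp;
    float x, y;
};
static_assert(std::is_trivially_copyable<PodState>::value,
              "raw byte copy is only safe for trivially copyable types");

// Copy the object's bytes into a buffer (not portable across compilers/ABIs).
std::vector<uint8_t> toBytes(const PodState& s) {
    std::vector<uint8_t> buf(sizeof(PodState));
    std::memcpy(buf.data(), &s, sizeof(PodState));
    return buf;
}

// Reconstruct the object from the same bytes on the same platform.
PodState fromBytes(const std::vector<uint8_t>& buf) {
    PodState s;
    std::memcpy(&s, buf.data(), sizeof(PodState));
    return s;
}
```

The static_assert is the important part: it makes the compiler reject this trick for any type where it would be undefined behavior.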

Script to binary conversion & Object serialization

All the code I write (C++ or AS3) is heavily scripted (JSON or XML). My problem is that parsing can be very slow at times, especially on less powerful devices like mobile phones.
Here is an example of a Flash script of mine:
<players class="fanlib.gfx.TSprite" vars="x=0|y=-50|visible=Bool:true">
<player0 class="fanlib.gfx.TSprite" vars="x=131|y=138">
<name class="fanlib.text.TTextField" format="Myriad Pro,18,0xffffff,true,,,,,center,,,,0" alignX="center" alignY="bottom" filters="DropShadow,2" vars="background=Bool:false|backgroundColor=0|embedFonts=Bool:true|multiline=Bool:false|mouseEnabled=Bool:false|autoSize=center|text=fae skata|y=-40"/>
<avatar class="fanlib.gfx.FBitmap" alignX="center" alignY="center" image="userDefault.png"/>
<chip class="fanlib.gfx.FBitmap" alignX="center" alignY="center" image="chip1.png" vars="x=87|y=68"/>
<info class="fanlib.text.TTextField" format="Myriad Pro,18,0xffffff,true,,,,,center,,,,0" alignX="center" alignY="top" filters="DropShadow,2" css=".win {color: #40ff40}" vars="y=40|background=Bool:false|backgroundColor=0|embedFonts=Bool:true|multiline=Bool:false|mouseEnabled=Bool:false|autoSize=center"/>
</player0>
<player1 class="Copy:player0" vars="x=430|y=70">
<chip class="Child:chip" image="chip2.png" vars="x=-82|y=102"/>
</player1>
<player2 class="Copy:player0" vars="x=778|y=70">
<chip class="Child:chip" image="chip3.png" vars="x=88|y=103"/>
</player2>
<player3 class="Copy:player0" vars="x=1088|y=137">
<chip class="Child:chip" image="chip4.png" vars="x=-111|y=65"/>
</player3>
<player4 class="Copy:player0" vars="x=1088|y=533">
<chip class="Child:chip" image="chip5.png" vars="x=-88|y=-23"/>
</player4>
<player5 class="Copy:player0" vars="x=585|y=585">
<chip class="Child:chip" image="chip6.png" vars="x=82|y=-54"/>
</player5>
<player6 class="Copy:player0" vars="x=117|y=533">
<chip class="Child:chip" image="chip7.png" vars="x=85|y=-26"/>
</player6>
</players>
The script above creates "native" (as in "non-dynamic") Flash objects. TSprite is a Sprite descendant, FBitmap inherits from Bitmap, etc. At 71KB, it takes tens of seconds to be parsed on my Sony Xperia.
Instead of optimizing the parser (which probably wouldn't gain much anyway), I am contemplating converting my scripts to binaries, so that the scripts are used for debugging and the finalized binaries for release builds.
One question is, how does one handle pointers from one object to another when serializing them? How are pointers translated from memory to a disk-file-friendly format, then back to memory?
Another question is, what about "nested" objects? In Flash for example, an object can be a graphics container of other objects. Could such a state be serialized? Or must objects be saved separately and, when loaded from disk, added to their parents through the nesting functions (i.e. addChild etc...)?
If possible, I would prefer generic guidelines that could apply to languages as different as C++ or AS3.
As far as I understand, your idea is to save some time by replacing creation of the objects from some fixture script (XML/JSON) with deserializing (from binary) previously serialized objects. If that's the case, I believe you've taken the wrong approach to solve this issue.
Since you've asked for general guidelines I'll try to explain my reasoning, not delving too deeply into language-specific details. Note that I'll talk about the common case and there may be exceptions from it. There is no silver bullet and you should analyze your scenario to pick the best solution for it.
From one point of view, creating a set of objects based on a fixture/script is not that different from deserializing objects from binary. In the end they are both about turning some "condensed" state into objects that you can use. Yes, it is true that binaries are usually smaller in size, but formats like JSON don't have that much overhead in the common case (XML is usually more redundant, though). In general you won't save much time/memory by deserializing this state from binary instead of parsing it from a script. Here is a real-world example from something I worked with: Mental Ray is a de-facto standard for rendering 3D scenes/special effects in the movie industry. It uses a textual file format to represent scenes that is somewhat similar to JSON in many aspects. Mental Ray is heavy on computation, and performance is one of the key issues here, yet it lives perfectly fine without a binary scene file format. So, analyzing this aspect, you can say that there is no substantial difference between these 2 approaches.
From another point of view, there is a difference that may come into play. While deserializing an object implies only creating the object and loading state into its fields, creating an object from a script may also include some extra initialization on top of that. So, in some cases there may be a benefit to the deserialization approach.
However, in the end I would argue that it is not a good idea to simply replace your scripted objects with serialized objects, because scripting and serialization are conceptually different things and have different purposes (though they do have something in common). Using the serialization approach you'll lose flexibility in modifying your fixtured state (it is usually much harder for humans to edit binaries than JSON/XML) as well as the ability to do initialization work.
So, think about what you actually need in your scenario and stick with it. That's the best way.
Now, if it happens that you actually need your objects to be scripted but this approach is not fast enough, I would investigate speeding it up in one of 2 ways:
Analyze whether it is possible to restructure your data in a way that takes less time to load. This is not always possible; however, it might be worth trying.
Analyze what else your scripting engine does to init objects besides simply creating them and loading state into their fields, and try to optimize that. This approach actually has the most potential, since this is the only part with a substantial difference in performance (between the scripting and deserialization approaches) and it does not lead to misuse of concepts. Try to see whether you are able to reduce the amount of work needed to init an object. It may be a good idea to tailor something more specific to your needs if you are currently using some generic scripting engine/framework.
Now, answering your original questions...
how does one handle pointers from one object to another when serializing them?
References are a headache that most serialization implementations do not mess with.
One of the approaches is to use something to identify an object during serialization (its pointer, for example), serialize the object preserving this identity, and store references from other objects to this object not as a primitive type but as a reference type (basically saving the identity). When deserializing, keep track of all deserialized objects and reuse them when deserializing a reference-typed field.
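A small sketch of that identity bookkeeping (`IdRegistry` and `Node` are illustrative names): the first time an object is seen it gets a fresh integer id, and references are written out as that id instead of as a raw pointer.

```cpp
#include <unordered_map>

// A node that references another node; the pointer is what we must not
// write to disk directly.
struct Node {
    int value = 0;
    Node* next = nullptr;
};

// Maps each object's address to a stable integer id during serialization.
class IdRegistry {
public:
    // Return the id for this object, assigning a fresh one on first sight.
    int idFor(const Node* obj) {
        if (obj == nullptr) return -1;          // -1 encodes "no reference"
        auto it = ids_.find(obj);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(ids_.size());
        ids_[obj] = id;
        return id;
    }
private:
    std::unordered_map<const Node*, int> ids_;
};
```

Each node is then written out as (id, value, idFor(next)). On load you do the reverse: create all objects first, then patch the reference fields from the stored ids.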
How are pointers translated from memory to disk file-friendly format,then back to memory?
Serialization rarely deals with raw memory. That approach is only good for primitive types and does not work well with pointers/references. Languages that support reflection/introspection usually use it to inspect the values of fields in order to serialize objects. If we talk about pure C++, where reflection support is poor, there is no other reliable way except to make the object itself define the means to serialize itself to a byte stream and use those methods.
Another question is, what about "nested" objects? In Flash for example, an object can be a graphics container of other objects. Could such a state be serialized? Or must objects be saved separately and, when loaded from disk, added to their parents through the nesting functions (i.e. addChild etc...)?
If we talk about deserialization, these should probably be treated like references (see the answer above). Using methods like addChild is not a good idea, since they can contain logic that will mess things up.
Hope this answers your questions.
You should really take a look at Adobe Remote Object.
Typically, using serialization can cause you problems such as:
I have a serialized object from application version 2.3, and in the new version 2.4 it has been modified: a property was removed / a property was added. This makes my serialized object unparsable.
While developing a serialization protocol that supports cross-platform use, you may actually wish to kill yourself while debugging. I remember doing this myself and spending hours finding out that my Flash was using big-endian and my C# little-endian.
Adobe solved those problems for you; they created a nice binary protocol called AMF - Action Message Format. It has many implementations on various platforms that can communicate with your ActionScript.
Here you may find some C++ implementations.

Good Design for C++ Serialization

I'm currently searching for a good OO design to serialize a C++/Qt application.
Imagine the classes of the application organized based on a tree structure, implemented with the Composite-Pattern, like in the following picture.
The two possible approaches I thought of:
1.)
Put the save()/load() functions in every class which has to be serializable.
I have seen this many times, usually implemented with boost.
Somewhere in the class you will find something like this:
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
    ar & m_member1;
}
You could also separate this into save() and load().
But the disadvantage of this approach is:
If you want to change the serialization two months later (to XML, HTML or something more exotic that boost doesn't support), you have to adapt all the thousands of classes.
Which in my opinion is not a good OO design.
And if you want to support different serializations (XML, binary, ASCII, whatever...) at the same time, then 80% of the cpp exists just for serialization functions.
2.)
I know boost also provides a Non intrusive Version of the Serialization
"http://www.boost.org/doc/libs/1_49_0/libs/serialization/doc/tutorial.html"
So another way is to implement an iterator which iterates over the composite tree structure and serializes every object (and one iterator for the deserialization).
(I think this is what the XmlSerializer class of the .NET Framework does, but I'm not really familiar with .NET.)
This sounds better because of the separate save() and load(), and there is only one spot to change if the serialization changes.
So this sounds better, BUT:
- You have to provide a setter() and a getter() for every parameter you want to serialize. (So there is no private data anymore. Is this good/bad?)
- You could have a long inheritance hierarchy (more than 5 classes) hanging on the composite tree.
So how do you call the setters()/getters() of the derived classes
when you can only call an interface function of the base Composite component?
Another way is to serialize the object's data into a separate abstract format,
from which all the possible subsequent serializations (XML, text, whatever is possible) get their data.
One idea was to serialize it to a QDomNode.
But I think the extra abstraction will cost performance.
So my question is:
Does anyone know a good OO design for serialization?
Maybe from other programming languages like Java, Python, C#, whatever...
Thank you.
Beware of serialization.
Serialization is about taking a snapshot of your in-memory representation and restoring it later on.
This is all great, except that it starts fraying at the seams when you think about loading a previously stored snapshot with a newer version of the software (Backward Compatibility) or (god forbid) a recently stored snapshot with an older version of the software (Forward Compatibility).
Many structures can easily deal with backward compatibility; however, forward compatibility requires that your newer format stays very close to its previous iteration: basically, just add/remove some fields but keep the same overall structure.
The problem is that serialization, for performance reasons, tends to tie the on-disk structure to the in-memory representation; changing the in-memory representation then requires either deprecating the old archives or providing a migration utility (or both).
On the other hand, messaging systems (and this is what Google protobuf is) are about decoupling the exchanged message structures from the in-memory representation so that your application remains flexible.
Therefore, you first need to choose whether you will implement serialization or messaging.
Now you are right that you can either write the save/load code within the class or outside it. This is once again a trade-off:
in-class code has immediate access to all members, is usually more efficient and straightforward, but less flexible, so it goes hand in hand with serialization
out-of-class code requires indirect access (getters, visitor hierarchies), is less efficient, but more flexible, so it goes hand in hand with messaging
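The contrast between the two bullets can be sketched as follows (both classes and the free save() function are made-up examples): the in-class member function reads private data directly, while the non-intrusive free function has to go through accessors.

```cpp
#include <ostream>
#include <sstream>

// In-class: the save code is a member and sees private data directly.
class InClass {
public:
    explicit InClass(int hp) : hp_(hp) {}
    void save(std::ostream& out) const { out << hp_; }  // direct member access
private:
    int hp_;
};

// Out-of-class: the class knows nothing about archiving, but must expose a
// getter for every member that needs to be written out.
class OutOfClass {
public:
    explicit OutOfClass(int hp) : hp_(hp) {}
    int hp() const { return hp_; }                      // accessor required
private:
    int hp_;
};

// Non-intrusive save: it lives with the archive format, not the class, so a
// new format means a new free function rather than touching the class.
void save(const OutOfClass& obj, std::ostream& out) { out << obj.hp(); }
```

Both produce the same bytes here; the difference is only where the coupling lives, which is exactly the serialization-vs-messaging trade-off above.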
Note that there is no drawback about hidden state. A class has no (truly) hidden state:
caches (mutable values) are just that, they can be lost without worry
hidden types (think FILE* or other handle) are normally recoverable through other ways (serializing the name of the file for example)
...
Personally I use a mix of both.
Caches are written for the current version of the program and use fast (de)serialization in v1. New code is written to work with both v1 and v2, and writes in v1 by default until the previous version disappears; then it switches to writing v2 (assuming it's easy). Occasionally, a massive refactoring makes backward compatibility too painful, and we drop it on the floor at that point (and increment the major digit).
On the other hand, exchanges with other applications/services and more durable storage (blobs in database or in files) use messaging because I don't want to tie myself down to a particular code structure for the next 10 years.
Note: I am working on server applications, so my advice reflects the particulars of such an environment. I imagine client-side apps have to support old versions forever...