How to perform flexible serialization of a polymorphic inheritance hierarchy? - c++

I have tried to read carefully all the advice given in the C++ FAQ on this subject. I implemented my system according to item 36.8, and now, after a few months (with a lot of data serialized), I want to make changes both to the public interface of some of the classes and to the inheritance structure itself.
class Base
{
public:
Vector field1() const;
Vector field2() const;
Vector field3() const;
std::string name() const {return "Base";}
};
class Derived : public Base
{
public:
std::string name() const {return "Derived";}
};
I would like to know how to make changes such as:
Split Derived into Derived1 and Derived2, while mapping the original Derived into Derived1 for existing data.
Split Base::field1() into Base::field1a() and Base::field1b() while mapping field1 to field1a and having field1b empty for existing data.
I will have to
deserialize all the gigabytes of my old data
convert them to the new inheritance structure
reserialize them in a new and more flexible way.
I would like to know how to make the serialization more flexible, so that when I decide to make some change in the future, I would not be facing conversion hell like now.
I thought of making a system that would use numbers instead of names to serialize my objects. That is for example Base = 1, Derived1 = 2, ... and a separate number-to-name system that would convert numbers to names, so that when I want to change the name of some class, I would do it only in this separate number-to-name system, without changing the data.
The problems with this approach are:
The system would be brittle. That is changing anything in the number-to-name system would possibly change the meaning of gigabytes of data.
The serialized data would lose some of its human readability, since in the serialized data, there would be numbers instead of names.
I am sorry for putting so many issues into one question, but I am inexperienced at programming and the problem I am facing seems so overwhelming that I just do not know where to start.
Any general materials, tutorials, idioms or literature on flexible serialization are most welcome!

It's probably a bit late for that now, but whenever designing
a serialization format, you should provide for versioning.
This can be mangled into the type information in the stream, or
treated as a separate (integer) field. When writing the class
out, you always write the latest version. When reading, you
have to read both the type and the version before you can
construct; if you're using the static map suggested in the FAQ,
then the key would be:
struct DeserializeKey
{
std::string type;
int version;
};
Given the situation you are in now, the solution is probably to
mangle the version into the type name in a clearly recognizable
way, say something along the lines of
type_name__version; if the
type_name isn't followed by two underscores,
then use 0. This isn't the most efficient method, but it's
usually acceptable, and will solve the problem with backwards
compatibility, while providing for evolution in the future.
For your precise questions:
In this case, Derived is just a previous version of
Derived1. You can insert the necessary factory function into
the map under the appropriate key.
This is just classical versioning. Version 0 of Base has
a field1 attribute, and when you deserialize, you use it to
initialize field1a, and you initialize field1b empty.
Version 1 of Base has both.
If you mangle the version into the type name, as I suggest
above, you shouldn't have to convert any existing data. Long
term, of course, either some of the older versions simply
disappear from your data sets, so that you can remove the
support for them, or your program keeps getting bigger, with
support for lots of older versions. In practice, I've usually
seen the latter.
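A minimal sketch of the mangled-name idea, assuming C++14 and simplified stand-ins for Base and Derived1. The names parseTag, factories and create are hypothetical helpers for illustration, not part of the FAQ's code:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Simplified stand-ins for the classes in the question (assumption).
struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};
struct Derived1 : Base {
    std::string name() const override { return "Derived1"; }
};

struct DeserializeKey {
    std::string type;
    int version;
    // Ordering so that the key can be used in a std::map.
    bool operator<(const DeserializeKey& other) const {
        return std::tie(type, version) < std::tie(other.type, other.version);
    }
};

// Split "type_name__version"; a tag without "__" is treated as version 0.
DeserializeKey parseTag(const std::string& tag) {
    const std::string::size_type pos = tag.rfind("__");
    if (pos == std::string::npos) {
        return {tag, 0};
    }
    return {tag.substr(0, pos), std::stoi(tag.substr(pos + 2))};
}

using Factory = std::function<std::unique_ptr<Base>()>;

// The static map from the answer, keyed on (type, version).
const std::map<DeserializeKey, Factory>& factories() {
    static const std::map<DeserializeKey, Factory> table = [] {
        std::map<DeserializeKey, Factory> t;
        t[DeserializeKey{"Base", 0}] = []() -> std::unique_ptr<Base> {
            return std::make_unique<Base>();
        };
        // Old "Derived" records are read as a previous version of Derived1.
        t[DeserializeKey{"Derived", 0}] = []() -> std::unique_ptr<Base> {
            return std::make_unique<Derived1>();
        };
        t[DeserializeKey{"Derived1", 1}] = []() -> std::unique_ptr<Base> {
            return std::make_unique<Derived1>();
        };
        return t;
    }();
    return table;
}

// Returns nullptr when no factory is registered for the tag.
std::unique_ptr<Base> create(const std::string& tag) {
    const auto& table = factories();
    const auto it = table.find(parseTag(tag));
    if (it == table.end()) {
        return nullptr;
    }
    return it->second();
}

void versionedFactoryExample() {
    assert(create("Derived")->name() == "Derived1");     // legacy record
    assert(create("Derived1__1")->name() == "Derived1"); // current record
}
```

A real factory would also take the input stream and read the fields appropriate to the version found in the key; that part is left out here.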

Maybe Thrift can help you do it.

Related

C++ Inheritance & Virtual Functions Where Base Parameters require Replacement with Derived Ones

I have looked high and low for answers to this question - here on this forum and on the general internet. While I have found posts discussing similar topics, I am at a point where I need to make some design choices and am wondering if I am going about it the right way, which is as follows:
In C++ I have created 3 data structures: A linked list, a binary tree and a grid. I want to be able to store different classes in these data structures - one may be a class to manipulate strings, another numbers, etc. Now, each of these classes, assigned to the nodes, has the ability to perform and handle comparison operations for the standard inequality operators.
I thought C++ inheritance would provide the perfect solution to the matter - it would allow for a base "data class" (the abstract class) and all the other data classes, such as JString, to inherit from it. So the data class would have the following inequality method:
virtual bool isGreaterThan(const dataStructure & otherData) const = 0;
Then, JString will inherit from dataStructure and the desire would be to override this method, since isGreaterThan will obviously have a different meaning depending on the class. However, what I need is this:
virtual bool isGreaterThan(const JString & otherData) const;
Which, I know will not work since the parameters are of a different data type and C++ requires this for the overriding of virtual methods. The only solution I could see is doing something like this in JString:
virtual bool isGreaterThan(const dataStructure & otherData) const
{
    return this->isGreaterThanJString(dynamic_cast<const JString&>(otherData));
}
virtual bool isGreaterThanJString(const JString & otherData) const;
In other words, the overriding method just calls the JString equivalent, down-casting otherData to a JString object, since this will always be true and if not, it should fail regardless.
My question is this: does this seem like an acceptable strategy, or am I missing some ability in C++? I have used templates as well, but I am trying to avoid them as I find debugging becomes very difficult. The other option would be to try a void* that can accept any data type, but this comes with issues as well and shifts the burden onto the code, resulting in lengthier classes.
The Liskov substitution principle (LSP) means that operations on a reference to a base class must work, with the same semantics, on both base and derived class instances when those operations are referentially polymorphic.
Your example fails this test. The base isGreaterThan claims to work on all dataStructure, but it does not.
I would make the stored data type a template parameter of your containers. Then you know the concrete type of the stored data.
Look at std::list for an idea of what a linked list template might look like.
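As a rough sketch of the template approach, assuming C++14: the SortedList class and the cut-down JString below are hypothetical, and the only requirement on the stored type is an operator<.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// A singly linked list that keeps its elements in ascending order.
// T only needs operator<; no common base class is involved.
template <typename T>
class SortedList {
    struct Node {
        explicit Node(T v) : value(std::move(v)) {}
        T value;
        std::unique_ptr<Node> next;
    };

public:
    void insert(T value) {
        // Walk to the first link whose node is not less than the new value.
        std::unique_ptr<Node>* link = &head_;
        while (*link && (*link)->value < value) {
            link = &(*link)->next;
        }
        auto node = std::make_unique<Node>(std::move(value));
        node->next = std::move(*link);
        *link = std::move(node);
        ++size_;
    }

    bool empty() const { return size_ == 0; }
    std::size_t size() const { return size_; }
    const T& front() const { return head_->value; }  // precondition: !empty()

private:
    std::unique_ptr<Node> head_;
    std::size_t size_ = 0;
};

// Hypothetical cut-down JString: comparison takes a JString, not a base class.
struct JString {
    std::string text;
    bool operator<(const JString& other) const { return text < other.text; }
};

void sortedListExample() {
    SortedList<JString> words;
    words.insert(JString{"pear"});
    words.insert(JString{"apple"});
    assert(words.front().text == "apple");
}
```

Because the container is instantiated per type, a JString is only ever compared with another JString, so no down-cast is needed.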
I will now go onto complex additional steps you can do in the 0.1% of cases where the above advice is not correct.
If this causes issues, because of template bloat, you could create a polymorphic container that enforces the type of the stored data, either with a thin template wrapper or runtime tests. Once stored, you blindly cast to the known stored type, and store how to copy/compare/etc said type either in a C or C++ style polymorphic method.
Here is an 8 year old fun talk about this approach: https://channel9.msdn.com/Events/GoingNative/2013/Inheritance-Is-The-Base-Class-of-Evil

C++ Class Superclass

I'm currently making a C++ version of python's svg.path. There are multiple types of paths, like a Line, CubicBezier, etc. which are separate classes (with no inheritance, except for Line and Close which are inherited from Linear but that can be removed if necessary). There's also a Path class, which in python has a list of segments. But I'm not sure how to have a vector of segments in C++.
So something like this:
class Line {};
class CubicBezier {};
class Arc {};
class Path {
// Segment should be able to store any type of Segment like Line, Arc, etc.
vector<Segment> segments;
};
Currently the best thing I can think of is where I have a Segment class that stores all the segment types and has various setters and getters for each of them, but that seems tedious and annoying, as well as inefficient.
Also, if there's a better way to do this (and there almost certainly is), please explain how to do that, and I'll try it.
If needed, I can post the python code from svg.path.
Since you're not using inheritance, you'll need a different tool. It appears you want something like std::variant<Line, CubicBezier, Arc>.
The downside of this approach is that you'll need to handle all the different cases yourself, since there's no common base class interface.
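A small sketch of how that could look, assuming C++17. The member fields and the LengthVisitor are made up for illustration and are not taken from svg.path:

```cpp
#include <cassert>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical, heavily simplified segment types.
struct Line { double length; };
struct CubicBezier { double approxLength; };
struct Arc { double radius; double angle; };

// A Segment holds exactly one of the listed types at a time.
using Segment = std::variant<Line, CubicBezier, Arc>;

// One overload per alternative: this is the "handle all cases yourself" part.
struct LengthVisitor {
    double operator()(const Line& l) const { return l.length; }
    double operator()(const CubicBezier& c) const { return c.approxLength; }
    double operator()(const Arc& a) const { return a.radius * a.angle; }
};

class Path {
public:
    void add(Segment s) { segments_.push_back(std::move(s)); }

    double length() const {
        double total = 0.0;
        for (const Segment& s : segments_) {
            total += std::visit(LengthVisitor{}, s);
        }
        return total;
    }

private:
    std::vector<Segment> segments_;
};

void pathExample() {
    Path p;
    p.add(Line{3.0});
    p.add(Arc{2.0, 0.5});
    assert(p.length() == 4.0);  // 3.0 + 2.0 * 0.5
}
```

If a visitor is missing an overload for one of the alternatives, std::visit fails to compile, which at least turns a forgotten case into a build error.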

C++ Model View Design

I am currently struggling with the design of an application for the visualization and manipulation of sensor data. I have a database that contains several STL containers with measured data in them.
std::unordered_map<std::string, std::array<uint16_t, 3366>> data1;
std::unordered_map<std::string, QImage> data2;
std::unordered_map<std::string, std::vector<Point3D>> data3;
I also have different Views (mostly Qt-based) and each view should be associated with a model which is specific to one of the data sets. So data1 is supposed to be processed and manipulated in a class called model1 which is then displayed by means of a class view1 and so forth.
But I can't seem to find a suitable design structure to incorporate this idea. The models grant access to their processed data, but that data is contained in different container structures as given above. That makes it unfeasible to use inheritance with a pure virtual function in the base class like
std::map<...,...> getModelData() = 0;
The initial idea of this inheritance was to avoid code duplication, but that doesn't seem to be the right solution here. I know that Qt, in its "Model-View" concept, makes use of the QVariant class to have maximum flexibility in terms of the types being returned. However, I am wondering what the best solution with standard C++ is here. I have read a lot about striving for loose coupling, code reusability, Dependency Inversion and how to favour composition over inheritance, but I have problems putting this theoretical advice into practice and end up with code bloat and repetitive code most of the time. Can you help me?
Maybe you can give more code but so far, I can give a few hints:
- Can you use QMap instead of std::unordered_map? It is more convenient if you need to work with a UI.
- Maybe make the second template argument of the map a common base type, like this (code not tested, treat as pseudocode):
class BaseDataClass
{
public:
    int getType();
    QImage* getImageData();
    std::array<uint16_t, 3366>& getArray();
    std::vector<Point3D>& getVector();
protected:
    BaseDataClass(); // hidden from outside code but usable by derived classes (or make the class abstract, as you wish)
private:
    int mType;
};
You can avoid code duplication with this. Make three new classes that each inherit from BaseDataClass. You can then write a method that iterates over all BaseDataClass objects, checks the type (e.g. 1 = QImage; 2 = array; 3 = vector), and executes the right method according to the type (get the QImage from all type 1 objects, and so on). You can also cast the pointer to the right type at that point, which becomes even more useful as your derived classes gain more functionality (like sorting or validating data).

Store any value and a value of a specific range of classes

I have a class that looks something like this:
class container{
private:
std::vector<physical_component> physical;
std::vector<storage_component> storage;
--some other stuff not relevant--
public:
--constructors, getters setters, methods to add to the vectors etc--
}
Now I am struggling with writing the physical_component and storage_component classes, since I don't know a proper data type to handle this sort of thing.
Physical_component should be able to:
Store a set amount of types, and fully retaining a type (something I can cast to is good enough)
Should store the objects in a way that makes them independent of the ones passed in (and therefore safe from changes to the original object)
I remember something like that existing in C alongside enum, but I don't know the name. Also, C++ probably has a better way to do that.
Storage_component is supposed to:
Store any type
(optional) remember the original type
I have no idea how to achieve this properly. I saw std::any, but it seems to be rather new, so I don't know if it's a good way to go about this. Also, I can't make storage_component a template, because then I can't store it in a vector.
What is the (proper) way to implement these classes?
Store a set amount of types, and fully retaining a type
You probably want std::variant<Ts...> (or boost::variant<Ts...>). It stores one of Ts... at a particular point in time.
Store any type
If all the types share the same interface, use a traditional virtual + std::unique_ptr polymorphism approach. Otherwise std::any is the right choice here.
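A brief sketch combining both suggestions, assuming C++17. The wheel and engine types and the accessor names are invented for the example; the question does not say what the components actually are:

```cpp
#include <any>
#include <cassert>
#include <string>
#include <typeinfo>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical component types.
struct wheel { int diameter; };
struct engine { int horsepower; };

// A fixed set of types, stored by value, with the type retained.
using physical_component = std::variant<wheel, engine>;
// Any copyable type at all; the original type can be queried at run time.
using storage_component = std::any;

class container {
public:
    void add_physical(physical_component c) { physical.push_back(std::move(c)); }
    void add_storage(storage_component c) { storage.push_back(std::move(c)); }

    const std::vector<physical_component>& physical_parts() const { return physical; }
    const std::vector<storage_component>& stored() const { return storage; }

private:
    std::vector<physical_component> physical;
    std::vector<storage_component> storage;
};

void containerExample() {
    container c;

    wheel w{17};
    c.add_physical(w);  // stores a copy
    w.diameter = 20;    // does not affect the stored copy
    assert(std::holds_alternative<wheel>(c.physical_parts()[0]));
    assert(std::get<wheel>(c.physical_parts()[0]).diameter == 17);

    c.add_storage(std::string("note"));
    assert(c.stored()[0].type() == typeid(std::string));
    assert(std::any_cast<std::string>(c.stored()[0]) == "note");
    // Asking for the wrong type through the pointer overload yields nullptr.
    assert(std::any_cast<int>(&c.stored()[0]) == nullptr);
}
```

Both std::variant and std::any hold their value directly, so the stored objects are independent copies of whatever was passed in.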

Dynamic structures in C++

I am running a simulation in which I have objects of a class which use different models. These models are randomly selected for some objects of the class and specifically decided for some objects too. These objects communicate with each other for which I am using structures (aka struct) in C++ which has some
standard variables and
some additional variables which depends on models which the objects communicating with each other have.
So, how can I do this?
Thanks in advance.
You can hack around with:
the preprocessor;
template meta-programming;
inheritance/polymorphism.
Each gives a different way of producing a different user-defined type, based on different kinds of conditions.
Without knowing what you're trying to accomplish, this is the best I can do.
All instances of a structure or class have the same structure. Luckily, there are some tricks that can be used to 'simulate' what you try to do.
The first trick (which can also be used in C), is to use a union, e.g.:
struct MyStruct
{
int field1;
char field2;
int type;
union
{
int field3a;
char field3b;
double field3c;
} field3;
};
In a union, all members take up the same space in memory. As a programmer you have to be careful: you can only get out of the union what you put in. If you initialize one member of a union but read another member, you will probably get garbage (unless you want to do some low-level hacks, but don't do this unless you are very experienced).
Unions often come together with another field (outside the union) that indicates which member is actually used in the union. You could consider this your 'condition'.
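To illustrate, here is one way the type field could be used together with the struct above. The Field3Kind constants and the helper functions are assumptions made for this sketch:

```cpp
#include <cassert>

// Hypothetical values for the 'type' field.
enum Field3Kind { kInt = 0, kChar = 1, kDouble = 2 };

struct MyStruct {
    int field1;
    char field2;
    int type;  // says which member of field3 is currently valid
    union {
        int field3a;
        char field3b;
        double field3c;
    } field3;
};

// Always set the tag and the matching union member together.
MyStruct makeInt(int v) {
    MyStruct s{};
    s.type = kInt;
    s.field3.field3a = v;
    return s;
}

MyStruct makeDouble(double v) {
    MyStruct s{};
    s.type = kDouble;
    s.field3.field3c = v;
    return s;
}

// Read only the member that the tag says is valid.
double asNumber(const MyStruct& s) {
    switch (s.type) {
        case kInt:    return s.field3.field3a;
        case kChar:   return s.field3.field3b;
        case kDouble: return s.field3.field3c;
        default:      return 0.0;  // unknown tag; a real program might report an error
    }
}

void taggedUnionExample() {
    assert(asNumber(makeInt(7)) == 7.0);
    assert(asNumber(makeDouble(1.5)) == 1.5);
}
```

Keeping construction and reading behind small functions like these means the tag and the union cannot drift apart by accident.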
A second trick is use the 'state' pattern (see http://en.wikipedia.org/wiki/State_pattern). From the outside world, the context class looks always the same, but internally, the different states can contain different kinds of information.
A somewhat simplified approach for state is to use simple inheritance, and to use dynamic casts. Depending on your 'condition', use a different subclass, and perform a dynamic cast to get the specific information.
E.g., suppose that we have a Country class. Some countries have a president, others have a king, others have an emperor. You could do something like this:
class Country
{
...
};
class Republic : public Country
{
public:
const string &getPresident() const;
const string &getVicePresident() const;
};
class Monarchy : public Country
{
public:
const string &getKing() const;
const string &getQueen() const;
};
In your application you could work with pointers to Country, and do a dynamic cast to Republic or Monarchy where the president or king is needed.
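A compact sketch of that usage, assuming C++14. The constructors, the stored names and the headOfState function are additions for the example, and the classes are cut down to one getter each:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class Country {
public:
    // A virtual destructor makes the class polymorphic, which dynamic_cast requires.
    virtual ~Country() = default;
};

class Republic : public Country {
public:
    explicit Republic(std::string president) : president_(std::move(president)) {}
    const std::string& getPresident() const { return president_; }

private:
    std::string president_;
};

class Monarchy : public Country {
public:
    explicit Monarchy(std::string king) : king_(std::move(king)) {}
    const std::string& getKing() const { return king_; }

private:
    std::string king_;
};

// The pointer form of dynamic_cast returns nullptr when the object
// is not of the requested type, so each branch can be tested safely.
std::string headOfState(const Country& country) {
    if (const auto* republic = dynamic_cast<const Republic*>(&country)) {
        return republic->getPresident();
    }
    if (const auto* monarchy = dynamic_cast<const Monarchy*>(&country)) {
        return monarchy->getKing();
    }
    return "unknown";
}

void countryExample() {
    std::unique_ptr<Country> c = std::make_unique<Monarchy>("King A");
    assert(headOfState(*c) == "King A");
}
```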
This example can be easily transformed into one using the 'state' pattern, but I leave this as an exercise for you.
Personally, I would go for the state pattern. I'm not a big fan of dynamic casts; they always seem like a kind of hack to me.
If it's at compile-time, a simple #ifdef or template specialization will serve this purpose just fine. If it's at run-time and you need value semantics, you can use a boost::optional<my_struct_of_optional_members>, and if you're fine with reference semantics, inheritance will solve the problem at hand.
A union and that kind of dirty trick is not necessary.
There are several common approaches for "dynamic" attributes/properties in languages, and a few that tend to work well in C++.
For example, you can make a C++ class called "MyProperties" that has a sparse set of values, and your MyStructureClass would have its well-known members, plus a single MyProperties instance which may have zero-or-more values.
Similarly, languages like Python and Perl make extensive use of Associative Arrays/Dictionaries/Hashes to achieve this: The (string) key uniquely identifies the value. In C++, you can index your MyProperties class with a string or any type you want (after overloading the operator[]()), and the value can be a string, a MyVariant, or any other pointer-or-type that you want to inspect. The values are dynamically added to the parent container as they are assigned (e.g., the class "remembers" the last value it is given, uniquely identified by key).
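For instance, a string-keyed property bag might look roughly like this, assuming C++17. Here MyVariant is taken to be a std::variant of a few value types, and the has() helper and the id member are invented for the sketch:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Assumed value type: one of a few simple alternatives.
using MyVariant = std::variant<int, double, std::string>;

class MyProperties {
public:
    // Creates the entry on first use, like an associative array in Python or Perl.
    MyVariant& operator[](const std::string& key) { return values_[key]; }

    bool has(const std::string& key) const { return values_.count(key) != 0; }

private:
    std::map<std::string, MyVariant> values_;
};

struct MyStructureClass {
    int id = 0;          // well-known member
    MyProperties extra;  // zero or more dynamic values
};

void propertiesExample() {
    MyStructureClass s;
    s.extra["speed"] = 2.5;
    s.extra["label"] = std::string("fast");
    assert(s.extra.has("speed"));
    assert(!s.extra.has("mass"));
    assert(std::get<double>(s.extra["speed"]) == 2.5);

    // Assigning again replaces the value; the class remembers the last one.
    s.extra["speed"] = 3;
    assert(std::get<int>(s.extra["speed"]) == 3);
}
```

Note that operator[] inserts a default value for a missing key (here an int 0, the first alternative), which is why the sketch offers has() for read-only checks.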
Finally, in the "olden days", what you describe was commonly done for distributed application processing: You defined a C-struct with "well-known" (typed) fields/members, and the last field was a char* member. Then, that char* member would identify the start of a serialized stream of bytes that were also part of that struct (you merely serialized that array of chars when you marshalled the struct across systems). In the context of C++, you could similarly extract your values dynamically from that char* stream buffer on-access-demand (which logically should be "owned" by the class). This worked for marshalling across systems because the size of the struct was the size of everything (including the last char* member), but the "allocation" for that struct was much larger (e.g., the size of the struct itself, which was logically a "header", plus a certain number of bytes after that header, which represented the "payload" and which was indexed by the last member, the char* member.) Thus, it was a contiguous-block-of-memory struct, with dynamic size. (This would also work in C++ as long as you passed-by-reference, and never by value.)
Embed a union in your structure, and use a flag to tell which part of the union is valid.
enum struct_type
{
cool,
fine,
bad
};
struct demo
{
struct_type type;
union
{
struct
{
double cool_factor;
} cool_part;
struct
{
int fineness;
} fine_part;
struct
{
char *bad_stuff;
} bad_part;
};
struct
{
int life_is_cool;
} common_part;
};
The pure and simple C++ answer is: use classes.
I can't determine from your question what you are trying to achieve: runtime variation or compile time variation, but either way, I doubt you'll get a workable implementation any other way. (Template metaprogramming aside... which isn't for the faint of heart.)