Using XML to load objects. Which is the best approach? - C++

TinyXML
I have an XML file that holds a bunch of data that is loaded into objects. Right now, I have one giant method that parses the XML file and creates the appropriate objects depending on the contents of the XML file. This function is very large and imports lots of class definitions.
Would it be better for each class type to do its own loading from XML? That way the XML code is dispersed throughout my files rather than in one location. The problem is that I need to pass it the exact node inside the XML file where that function should read from. Is this feasible? I'm using TinyXML, so I imagine each class could be passed the XML stream (an array containing the XML data, actually), and then I'd also pass the root element for that object (\images\fractal\traversal\) so it knows what it should be reading.
Then the saving would work the same way.
Which approach is best and more widely used?
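For illustration, a rough sketch of what the per-class approach could look like with TinyXML (the file name and element names are placeholders based on the \images\fractal\traversal\ path above; error handling is omitted):
#include <string>
#include "tinyxml.h"

class Traversal
{
public:
    // Each class reads only from the element it is handed.
    bool LoadFromXml(const TiXmlElement* elem)
    {
        if (!elem) return false;
        const char* name = elem->Attribute("name");
        if (name) m_name = name;
        return true;
    }
private:
    std::string m_name;
};

int main()
{
    TiXmlDocument doc("scene.xml");
    if (!doc.LoadFile()) return 1;

    // Navigate to the element a given object cares about, then hand it over.
    TiXmlElement* images   = doc.FirstChildElement("images");
    TiXmlElement* fractal  = images  ? images->FirstChildElement("fractal")    : 0;
    TiXmlElement* travNode = fractal ? fractal->FirstChildElement("traversal") : 0;

    Traversal traversal;
    traversal.LoadFromXml(travNode);
}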

I don't know anything about TinyXML, but I have been using that kind of class design with libxml2 for several years now and it has been working fine for me.

Serialization functions should be friends of the classes they serialize. If you want to serialize and deserialize to XML, you should write friend functions that perform this task. You could even write custom ostream & operator<<() functions that do this, but that becomes problematic if you want to aggregate objects. A better strategy is to define a mechanism that turns individual objects into Nodes in a DOM document.
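For instance, a rough sketch of that idea using TinyXML types (the Fractal class and the toXmlElement function are made up for illustration):
#include "tinyxml.h"

class Fractal
{
public:
    explicit Fractal(int depth) : m_depth(depth) {}
private:
    int m_depth;
    friend TiXmlElement* toXmlElement(const Fractal&);   // serializer may see private state
};

TiXmlElement* toXmlElement(const Fractal& f)
{
    TiXmlElement* elem = new TiXmlElement("fractal");    // caller owns (or attaches) the node
    elem->SetAttribute("depth", f.m_depth);
    return elem;
}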

I can think of an approach based on a factory that serves up the objects based on a tag.
The difficulty here is not really how to decouple the deserialization of each object content, but rather to decouple the association of a tag and an object.
For example, let's say you have the following XML
<my_xml>
<bird> ... </bird>
</my_xml>
How do you know that you should build a Bird object with the content of the <bird> tag?
There are 2 approaches there:
1-to-1 mapping, e.g.: <my_xml> represents a single object and thus knows how to deserialize itself.
Collection: <my_xml> is nothing more than a loose collection of objects
The first is quite obvious, you know what to expect and can use a regular constructor.
The problem in C++ is that you have static typing, and that makes the second case more difficult, since you need virtual construction there.
Virtual construction can be achieved using prototypes though.
#include <map>
#include <memory>

// Tag and XmlNode are placeholder types in this sketch

// Base class
class Serializable
{
public:
    virtual ~Serializable() {}
    virtual std::auto_ptr<XmlNode> serialize() const = 0;
    virtual std::auto_ptr<Serializable> deserialize(const XmlNode&) const = 0;
};

// Collection of prototypes
class Deserializer
{
public:
    static void Register(const Tag& tag, const Serializable* item)
    {
        GetMap()[tag] = item;
    }
    static std::auto_ptr<Serializable> Create(const XmlNode& node)
    {
        // a real implementation should check that the tag was registered
        return GetMap()[node.tag()]->deserialize(node);
    }
private:
    typedef std::map<Tag, const Serializable*> prototypes_t;
    static prototypes_t& GetMap()
    {
        static prototypes_t _Map;
        return _Map;
    }
};
// Example
class Bird: public Serializable
{
public:
    virtual std::auto_ptr<XmlNode> serialize() const;
    virtual std::auto_ptr<Serializable> deserialize(const XmlNode& node) const;
};

// In some cpp (bird.cpp is indicated)
const Bird myBirdPrototype;

// Registration has to happen at run time; a dummy static is a common trick
// (this assumes Tag can be constructed from a string literal):
static const bool birdRegistered =
    (Deserializer::Register("bird", &myBirdPrototype), true);
Deserialization is always a bit messy in C++, dynamic typing really helps there :)
Note: it also works with streaming, but is a bit more complicated to put in place safely. The problem of streaming is that you ought to make sure not to read past your data and to read all of your data, so that the stream is in a 'good' state for the next object :)
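For completeness, a hedged sketch of how the prototype registry might be used for the collection case (the firstChild() / isValid() / nextSibling() methods on the placeholder XmlNode type are invented for illustration):
#include <vector>

std::vector<Serializable*> loadAll(const XmlNode& root)
{
    std::vector<Serializable*> objects;
    // walk the children of <my_xml> and let the registry pick the right prototype
    for (XmlNode child = root.firstChild(); child.isValid(); child = child.nextSibling())
        objects.push_back(Deserializer::Create(child).release());
    return objects;
}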

Related

Design pattern for isolating parsing code?

I have C++ class Foo:
class Foo
{
public:
[constructor, methods]
private:
[methods, data members]
};
I want to add to class Foo the possibility for it to be constructed by reading data from a text file. The code for reading such data is complicated enough that it requires, in addition to a new constructor, several new private methods and data members:
class Foo
{
public:
[constructor, methods]
Foo(const std::string& filePath); // new constructor - constructs a Foo from a text file
private:
[methods, data members]
[several methods used for text file parsing] // new methods
[several data members used for text file parsing] // new data members
};
This works, but I feel it would be better to isolate the new parsing code and data members into their own entity.
What would be an adequate design pattern in order to achieve this goal?
I think this would be a good opportunity to use the so-called Method Object pattern. You can read about that pattern on various web sites. The best description I have found, though, is in Chapter 8 of Kent Beck's book Implementation Patterns.
Your use case is unusual in the sense that this pattern would apply to a constructor instead of a regular method, but this is of secondary importance.
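As a rough illustration of the Method Object idea applied to a constructor (FooParser and its members are made-up names, not from the question):
#include <fstream>
#include <sstream>
#include <string>

// All parsing-only state and helpers live here, so Foo itself stays clean.
class FooParser
{
public:
    explicit FooParser(const std::string& filePath)
    {
        std::ifstream in(filePath.c_str());
        std::stringstream buffer;
        buffer << in.rdbuf();
        m_rawText = buffer.str();
        parse();
    }
    int parsedValue() const { return m_value; }
private:
    void parse() { m_value = static_cast<int>(m_rawText.size()); } // stand-in for the real parsing
    std::string m_rawText;   // parsing-only data member
    int m_value;
};

class Foo
{
public:
    explicit Foo(const std::string& filePath)
        : m_value(FooParser(filePath).parsedValue())   // the parser dies once Foo is built
    {}
private:
    int m_value;   // Foo keeps only what it actually needs
};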
This is purely an opinion piece, so I'm surprised it's not closed yet. That being said... To me, it depends upon the format of your input file.
At my company, we use JSON representation for no end of things. We store JSON files. We pass JSON in our REST calls. This is pretty common. I have a virtual base class called JSON_Serializable with a toJSON and fromJSON method, and all the classes that are going to do this implement those.
I consider this 100% reasonable. There's nothing wrong with a class being able to serialize itself.
Do you control the format of your input file? Is it a format you're going to use a lot? If so, there's nothing wrong with making the class smart enough to serialize from a string.
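A minimal sketch of what such a base class might look like (the JSON text is handled as a plain std::string here, since no particular JSON library is named):
#include <string>

class JSON_Serializable
{
public:
    virtual ~JSON_Serializable() {}
    virtual std::string toJSON() const = 0;              // object -> JSON text
    virtual void fromJSON(const std::string& json) = 0;  // JSON text -> object
};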
I wrote an HTTP server which involved parsing the request and response into something the server and client recognized. Both fit the Builder pattern (https://refactoring.guru/design-patterns/builder).
Here's an HTTP example of a request builder:
https://bitbucket.org/ptroen/crossplatformnetwork/src/master/OSI/Session/HTTP/HTTP_Request_Builder.h
There is also a response builder in the same folder.
The use case is similar to yours: building something from or to a text file stream. But depending on the nesting of the data it could be more complicated, so it's best to write requirements first.
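As a rough illustration of the Builder idea for this kind of request (an illustrative sketch, not the actual code from the linked repository):
#include <map>
#include <sstream>
#include <string>

class HttpRequestBuilder
{
public:
    HttpRequestBuilder& method(const std::string& m) { m_method = m; return *this; }
    HttpRequestBuilder& uri(const std::string& u)    { m_uri = u;    return *this; }
    HttpRequestBuilder& header(const std::string& k, const std::string& v)
    {
        m_headers[k] = v;
        return *this;
    }
    std::string build() const   // assemble the final request text in one go
    {
        std::ostringstream out;
        out << m_method << ' ' << m_uri << " HTTP/1.1\r\n";
        for (std::map<std::string, std::string>::const_iterator it = m_headers.begin();
             it != m_headers.end(); ++it)
            out << it->first << ": " << it->second << "\r\n";
        out << "\r\n";
        return out.str();
    }
private:
    std::string m_method;
    std::string m_uri;
    std::map<std::string, std::string> m_headers;
};

// usage: HttpRequestBuilder().method("GET").uri("/index.html").header("Host", "example.com").build();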

Setting a property that I can access in the serialize functions when using Cereal serialization library

I'm using the 'Cereal' serialization library (uscilab.github.io/cereal/) to serialize objects that can have millions of numbers, and the meta-data that describes the numbers. In some instances, I do not need the numbers to be serialized, just the meta-data; other times I would like both in the archive.
The only way I could think to achieve this was to add a boolean property to the OutputArchive class defined in the cereal.hpp file. My thinking is that when I construct the archive, I set this value. Then, when the serialization code runs, any object could access this property and serialize the appropriate values. Most objects would ignore this property, but the objects holding the (potentially) millions of numbers could either ignore the numbers or not, based on the value of this property.
Here is some pseudocode to help explain (derived from the examples on the Cereal website). Creating an archive would look like this:
int main()
{
std::stringstream ss;
{
cereal::BinaryOutputArchive oarchive(ss, true); // I modified the constructor to accept a boolean parameter, and set the property
...
}
...
Then, within the function that serializes my data object (the object that holds metadata and the millions of numbers):
template<class Archive>
void save(Archive& ar) const
{
ar(metadata);
ar(more_meta_data);
bool bArchiveEverything = ar.ArchiveNumbers(); //<<-- this is what I don't know how to accomplish
ar(bArchiveEverything); // put this into the archive, so I know what to expect when deserializing
if (bArchiveEverything) {
ar(bigVectorOfNumbers);
}
}
My questions:
1) Am I going about this all wrong? Is there a simpler more elegant way I'm missing?
2) If not, and this seems reasonable, I'm not sure how I can access my property in the OutputArchive through the 'Archive&' parameter that gets passed into the template functions that Cereal needs for serializing.
Thanks in advance for any help.
I still don't know if this was the best way, so I can't answer my first question.
However, accessing the property didn't end up being that difficult. It turns out that as long as every class that gets passed into the 'save' function as 'ar' provides the same member function, I can call it just like my pseudo-code function "ArchiveNumbers()". So, all I had to do was add that function to the 'OutputArchive' class in Cereal and have it return my property.
I didn't think that would even compile, but I was wrong about that. I'm still trying to wrap my head around template programming. While I got this to work, I certainly can't say this is a 'best practice'.
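To illustrate why that compiles: the templated save() only resolves ar.ArchiveNumbers() when it is instantiated, so it works with any archive type that provides such a member. A toy sketch (ToyArchive merely stands in for a modified cereal archive; it is not real Cereal code):
#include <vector>

// Stand-in for an output archive that has been given the extra flag
struct ToyArchive
{
    explicit ToyArchive(bool archiveNumbers) : m_archiveNumbers(archiveNumbers) {}
    bool ArchiveNumbers() const { return m_archiveNumbers; }
    template <class T> void operator()(const T&) { /* pretend to serialize T */ }
    bool m_archiveNumbers;
};

struct Data
{
    std::vector<double> bigVectorOfNumbers;

    template <class Archive>
    void save(Archive& ar) const
    {
        // resolved at instantiation time, so it compiles for any archive
        // type that actually has an ArchiveNumbers() member
        bool bArchiveEverything = ar.ArchiveNumbers();
        ar(bArchiveEverything);
        if (bArchiveEverything)
            ar(bigVectorOfNumbers);
    }
};

int main()
{
    ToyArchive metaOnly(false), everything(true);
    Data d;
    d.save(metaOnly);     // skips the numbers
    d.save(everything);   // writes the numbers too
}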

Architecture of a director / executive / master / top-level application layer

I have a collection of classes and functions which can interact with one another in rich and complex manners. Now I am devising an architecture for the top-level layer coordinating the interaction of these objects; if this were a word processor (it is not), I am now working on the Document class.
How do you implement the top-level layer of your system?
These are some important requirements:
Stand-alone: this is the one thing that can stand on its own
Serializable: it can be stored into a file and restored from a file
Extensible: I anticipate adding new functionality to the system
These are the options I have considered:
The GOF Mediator Pattern used to define an object that encapsulates how a set of objects interact [...] promotes loose coupling by keeping objects from referring to each other explicitly, and it lets you vary their interaction independently.
The problem I see with mediator is that I would need to subclass every object from a base class capable of communicating with the Mediator. For example:
class Mediator;
class Colleague {
public:
Colleague(Mediator*);
virtual ~Colleague() = default;
virtual void Changed() {
m_mediator->ColleagueChanged(this);
}
private:
Mediator* m_mediator;
};
This alone makes me walk away from Mediator.
The brute force blob class where I simply define an object and all methods which I need on those objects.
class ApplicationBlob {
public:
ApplicationBlob() { }
void SaveTo(const char*);
static ApplicationBlob ReadFrom(const char*);
void DoFoo();
void DoBar();
// other application methods
private:
ClassOne m_cone;
ClassTwo m_ctwo;
ClassThree m_cthree;
std::vector<ClassFour> m_cfours;
std::map<ClassFive, ClassSix> m_cfive_to_csix_map;
// other application variables
};
I am afraid of the Blob class because it seems that every time I need to add behaviour I will need to tag along more and more crap into it. But it may be good enough! I may be over-thinking this.
The complete separation of data and methods, where I isolate the state in a struct-like object (mostly public data) and add new functions taking a reference to such a struct-like object. For example:
struct ApplicationBlob {
ClassOne cone;
ClassTwo ctwo;
ClassThree cthree;
std::vector<ClassFour> cfours;
std::map<ClassFive, ClassSix> cfive_to_csix_map;
};
ApplicationBlob Read(const char*);
void Save(const ApplicationBlob&);
void Foo(const ApplicationBlob&);
void Bar(ApplicationBlob&);
While this approach looks exactly like the blob-class defined above, it allows me to physically separate responsibilities without having to recompile the entire thing every time I add something. It is along the lines (not exactly, but in the same vein) of what Herb Sutter suggests with regard to preferring non-member non-friend functions (of course, everyone is a friend of a struct!).
I am stumped --- I don't want a monolith class, but I feel that at some point or another I need to bring everything together (the whole state of a complex system) and I cannot think of the best way to do it.
Please advise from your own experience (i.e., please tell me how you do it in your application), literature references, or open source projects from which I can take some inspiration.

Need library for binary stream serialization, C++

What I'm looking for is similar to the serialization library built into RakNet (which I cannot use on my current project). I need to be able save/load binary streams into a custom format locally, and also send them over a network. The networking part is solved, but I really don't want to write my own methods for serializing all of my different types of data into binary, especially since it would be inefficient without any compression methods.
Here's some pseudocode similar to how RakNet's bitstreams work, this is along the lines of what I'm looking for:
class Foo
{
public:
void save(BitStream& out)
{
out.write(m_someInt);
out.write(m_someBool);
m_someBar.save(out);
// Alternative syntax
out.write<int>(m_someInt);
// Also, you can define custom serialization for custom types so you can do this...
out.write<Bar>(m_someBar);
// Or this...
out.write(m_someBar);
}
void load(BitStream& in)
{
in.read(m_someInt);
in.read(m_someBool);
in.read(m_someBar);
}
private:
int m_someInt;
bool m_someBool;
Bar m_someBar;
};
Are there any free C++ libraries out there that allow for something like this? I basically just want something to pack my data into binary, and compress it for serialization, and then decompress it back into binary that I can feed back into my data.
EDIT, adding more information:
Unfortunately, neither Google Protocol Buffers nor Boost Serialization will work for my needs. Both expect to serialize object members; I need to simply serialize data. For example, let's say I have a std::vector<Person>, and the class Person has a std::string for the name, and other data in it, but I only want to serialize and deserialize their names. Google Protocol Buffers expects me to give it the Person object as a whole for serialization. I can, however, achieve this with Boost Serialization, but then if I have another scenario where I need the entire Person to be serialized, there is no way to do that: you either have to serialize all of it or none. Basically I need quite a bit of flexibility to craft the binary stream however I see fit; I just want a library to help me manage reading and writing binary data to/from the stream, and compressing/decompressing it.
Google Protocol Buffers
Boost serialization
UPDATE
Looking at the updated question I think it might be easiest to write a small custom library that does exactly what is required. I have a similar one and it is only a few hundred lines of code (without compression). It is extremely easy to write unit tests for this kind of code, so it can be reliable from day one.
To serialize custom types, I have a Persistent base class that has save and load methods:
class Storage {
public:
void writeInt( int i );
void writeString( string s );
int readInt();
string readString();
};
class Persistent {
public:
virtual void save( Storage & storage ) = 0;
virtual void load( Storage & storage ) = 0;
};
class Person : public Persistent {
private:
int height;
string name;
public:
void save( Storage & storage ) {
storage.writeInt( height );
storage.writeString( name );
}
void load( Storage & storage ) {
height = storage.readInt();
name = storage.readString();
}
};
And then there's a simple layer on top of that that stores some type information when saving and uses a Factory to create new objects when loading.
This could be further simplified by using C++'s streams (which I don't like very much, hence the Storage class). Or copying Boost's approach of using the & operator to merge load and save into a single method.
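A minimal sketch of that "&" idea, modelled loosely on Boost.Serialization's convention (WritingStorage is a made-up archive type, not Boost's):
#include <string>

// "Writing" archive: operator& forwards to write calls. A matching "reading"
// archive would forward to read calls instead, so the single serialize()
// method below works for both saving and loading.
class WritingStorage
{
public:
    WritingStorage& operator&(int i)                { /* writeInt(i) */    return *this; }
    WritingStorage& operator&(const std::string& s) { /* writeString(s) */ return *this; }
};

class Person
{
public:
    template <class Archive>
    void serialize(Archive& ar)
    {
        ar & height;   // the archive type decides whether this reads or writes
        ar & name;
    }
private:
    int height;
    std::string name;
};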

STL Metaprogramming - which types of my template class have been created at compile time?

First, the apologies: I'm not sure if my question title even accurately explains what I'm asking - I've had a look through Google, but I'm not sure which terms I need in my search query, so the answer may be out there (or even on Stack Overflow) already.
I have a templated class, which basically looks like the following - it uses the Singleton pattern, hence everything is static. I'm not looking for comments on why I'm storing the keys in a set and using strings etc., unless it actually provides a solution. There's a bit more to the class, but that isn't relevant to the question.
template<typename T>
class MyClass
{
private:
//Constructor and other bits and pieces you don't need to know about
static std::set<std::string> StoredKeys;
public:
static bool GetValue(T &Value, std::string &Key)
{
//implementation
}
static void SetValue(const T Value, std::string &Key)
{
//implementation
StoredKeys.insert(Key);
}
static KeyList GetKeys()
{
return KeyList(StoredKeys);
}
};
Later on in some other part of the application I want to get all the Keys for all of the values - regardless of type.
I am fairly confident that at the moment only 3 or 4 types are being used with the class, so I could write something like:
KeyList Keys = MyClass<bool>::GetKeys();
Keys += MyClass<double>::GetKeys();
Keys += MyClass<char>::GetKeys();
This will need to be updated each time a new type is used. It also has the downside of instantiating the class if it's not used anywhere.
I think (again I could be wrong) that metaprogramming is the answer here, some sort of macro maybe?
We're using boost, so I'm guessing the MPL library could be useful here?
This aspect of C++ templates is a bit new to me, so I'm happy to read up and learn as much as I need, just as soon as I know exactly what it is I need to learn to engineer a solution.
Move StoredKeys into a non-template base class MyClassBase, or add an AllStoredKeys static member to a non-template base class.
Alternatively, create a static init method called from SetValue that adds a pointer to StoredKeys to a static list.
There's no magic. If you need to enumerate all the types used to instantiate MyClass in your program, then you have to enumerate them explicitly, somewhere, somehow. And you have to manually update the list whenever it changes.
With template metaprogramming, the number of places you need to update manually can be reduced down to one, but you do need that one place.
Fortunately, in this particular problem you don't need to enumerate all the types. You just need to store all keys in one set, as opposed to splitting them between several sets. You may create a common non-template base to MyClass and add static std::set<std::string> StoredKeys there (or perhaps make it a multiset if there's a possibility of identical keys in different type-specific sets).
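A minimal sketch of that suggestion (KeyList is the question's own type; the rest is illustrative):
#include <set>
#include <string>

// Non-template base: one set shared by every instantiation of MyClass<T>
class MyClassBase
{
protected:
    static std::set<std::string> StoredKeys;
};

// defined once in a .cpp file
std::set<std::string> MyClassBase::StoredKeys;

template<typename T>
class MyClass : public MyClassBase
{
public:
    static void SetValue(const T Value, const std::string& Key)
    {
        // ... store Value as before ...
        StoredKeys.insert(Key);                                   // every T feeds the same set
    }
    static KeyList GetKeys() { return KeyList(StoredKeys); }      // keys for all types at once
};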
The first answer: it's not possible!
Template classes don't actually have "generics" in common (like in Java) but are separate classes which don't have anything to do with each other.
The second answer: there's a workaround. One can define a base class MyClassBase which defines properties shared by all templated subclasses. The problem is that you have a singleton pattern here, which might make the situation a bit more complicated. I think a solution might look like this:
class MyClassBase {
public:
    static KeyList getAllKeys() {
        KeyList keys;
        // iterate over childs here and call ->GetKeys() on each one
        for (std::size_t i = 0; i < childs.size(); ++i)
            keys += childs[i]->GetKeys();
        return keys;
    }
    virtual KeyList GetKeys() = 0;

    template<typename T>
    static T* instance() {
        T* instance = MyClass<T>::instance();
        // register each instantiation exactly once
        if (std::find(childs.begin(), childs.end(), instance) == childs.end()) {
            childs.push_back(instance);
        }
        return instance;
    }

private:
    static std::vector<MyClassBase*> childs;
};
Please forgive me any syntactic errors; I just typed that in the Stack Overflow editor, but I think it should make my point clear.
Edit:
I just saw that I named the singleton method of the subclasses also instance(). This will probably not work. Give it some other name like privateInstance() or so. Then you must change T* instance = MyClass<T>::instance(); to T* instance = MyClass<T>::privateInstance();