C++ 'wrapper class' for XML library

So I've been attempting to create some classes around the Xerces XML library so I can 'hide' it from the rest of my project, i.e. the underlying XML library stays independent of the rest of the code.
This was supposed to be a fairly easy task, but it seems nearly impossible to hide a library from the rest of a project just by writing some classes around it.
Have I got the wrong approach, or is my 'wrapper' idea completely silly?
I end up with something like this:
DOMElement* root(); // in my 'wrapper' class
However, this DOMElement is part of the Xerces library, so at this point my 'wrapper' is broken: now I have to use the Xerces library everywhere I want to use this function.
Where is my thinking gone wrong?

I would recommend avoiding the wrapper in the first stage. Just make sure that the layers and their borders are clear, i.e. the network layer takes care of serializing/deserializing the XML, and from there on you only use your internal types. If you do this, and at a later stage you need to replace xerces with any other library, just replace the serialization layer. That is, instead of wrapping each XML object, just wrap the overall operation: serialize/deserialize.
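A minimal sketch of that boundary (the type and function names are just placeholders):

#include <string>

// Hypothetical internal type; the rest of the codebase sees only this.
struct Document {
    std::string title;
    std::string body;
};

// The only code that knows about Xerces lives behind these two functions.
// Replacing Xerces later means reimplementing them and nothing else.
std::string serialize(const Document& doc);   // internal type -> XML text
Document deserialize(const std::string& xml); // XML text -> internal type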

Writing your own abstract interface for a library is not a silly idea IF you plan to change the library you are using, or want to keep that possibility open.
You should not let the library's types leak through your wrapper interface. Implement your own structures and your own function interface. It will save a lot of work when you want to change how the XML handling is implemented (e.g. switch libraries).
One example of implementation:
class XmlElement
{
private:
    DOMElement* element; // points to the element from the underlying library

public:
    // Here you define its public interface.
    // There should be enough methods/parameters to interact
    // with any XML library you might use in the future.
    XmlElement getSubElement(const std::string& name)
    {
        // Create the XmlElement
        // Set the wanted DOMElement
        // Return it
    }
};
In your program you will see:
void function()
{
    XmlElement root; // note: 'XmlElement root();' would declare a function
    root.getSubElement("value"); // for example
}
This way, no DOMElement (or any of its functions) appears anywhere else in the project.

As I mentioned in my comments, I would take a slightly different approach. I would not want my codebase to be dependent on the particular messaging format (XML) I am using (what if, for example, you decide to change the XML to something else later?). Instead I would work with a well-defined object model and have a simple encoder/decoder to handle the conversion to an XML string and vice versa. This encoder/decoder would then be the bit I would replace if the underlying wire format changed.
The decoder would take in the data read from the socket and produce a suitable object (with nested objects to represent the request), and the encoder would take a similar object and generate the XML from it. If performance is not a primary concern, I would use a library such as TinyXML, which is quite lightweight - heck, you can strip it down even further and make it even more lightweight...
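For illustration, a minimal sketch of that decoder boundary using TinyXML-2 (the Request type, element name, and attribute name are all made up):

#include <string>
#include <tinyxml2.h>

// Hypothetical internal object model; the rest of the code sees only this.
struct Request {
    std::string action;
};

// The decoder is the only place that knows the wire format is XML.
Request decode(const std::string& wire) {
    tinyxml2::XMLDocument doc;
    doc.Parse(wire.c_str());

    Request req;
    if (const tinyxml2::XMLElement* e = doc.FirstChildElement("Request"))
        if (const char* action = e->Attribute("action"))
            req.action = action;
    return req;
}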

Related

POCO C++ object to JSON string serialization

I wonder how I could serialize an object of a given class (e.g. Person) with its attributes (e.g. name, age) to a JSON string using the POCO C++ libraries.
Maybe I should create my models using Poco::Dynamic and Poco::Dynamic::Var in order to use Poco::JSON::Stringifier? I can't imagine how to do this...
Thanks in advance!
Unlike Java or C#, C++ doesn't have an introspection/reflection feature outside of Run-time type information (RTTI), which has a different focus and is limited to polymorphic objects. That means outside of a non-standard pre-compiler, you'll have to tell the serialisation framework one way or another how your object is structured and how you would eventually like to map it to a hierarchy of int, std::string and other basic data types. I usually differentiate between three different approaches to do so: pre-compiler, inline specification, property conversion.
Pre-compiler: A good example of the pre-compiler approach is Google Protocol Buffers: https://developers.google.com/protocol-buffers/docs/cpptutorial. You define your entities in a separate .proto file, which is transformed by a dedicated compiler into .cc and .h entity classes. These classes can be used like regular POCO entities and can be serialised using Protocol Buffers.
Inline specification: Boost serialization (https://www.boost.org/doc/libs/1_67_0/libs/serialization/doc/index.html), s11n (www.s11n.net) and restc-cpp (https://github.com/jgaa/restc-cpp) are examples of explicitly specifying the structure of your POCOs for the framework inside your own code. The API to do so may be more or less sophisticated, but the principle behind it is always the same: You provide the framework serialise/deserialise implementations for your classes or you register metadata information which allows the framework to generate them. The example below is from restc-cpp:
#include <string>
#include <boost/fusion/include/adapt_struct.hpp>

using std::string;

struct Post {
    int userId = 0;
    int id = 0;
    string title;
    string body;
};

BOOST_FUSION_ADAPT_STRUCT(
    Post,
    (int, userId)
    (int, id)
    (string, title)
    (string, body)
)
Property conversion: The last kind of serialisation that I don't want to miss mentioning is the explicit conversion to a framework-provided intermediate data type. Boost property tree (https://www.boost.org/doc/libs/1_67_0/doc/html/property_tree.html) and JsonCpp (http://open-source-parsers.github.io/jsoncpp-docs/doxygen/index.html) are good examples of this approach. You are responsible for implementing a conversion from your own types to ptree, which Boost can serialise to and from any format you like (XML, JSON).
Having had my share of experience with all three approaches in C++, I would recommend option 3 as your default. It seems to map nicely to POCO C++'s Parser and Var model for JSON. One option is to have all your entity POCO classes implement a to_var or from_var function, or you can keep these serialisation functions in a different namespace for each POCO class, so that you only have to include them when necessary.
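For instance, a to_var-style free function for the Person class from the question might look like this (a sketch only; the field names and helper names are assumptions):

#include <sstream>
#include <string>
#include <Poco/JSON/Object.h>

struct Person {
    std::string name;
    int age = 0;
};

// Explicit conversion of the entity to the framework's intermediate type;
// Poco::JSON::Object stores its values as Poco::Dynamic::Var.
Poco::JSON::Object::Ptr to_var(const Person& p) {
    Poco::JSON::Object::Ptr obj = new Poco::JSON::Object;
    obj->set("name", p.name);
    obj->set("age", p.age);
    return obj;
}

std::string to_json(const Person& p) {
    std::ostringstream out;
    to_var(p)->stringify(out); // serialises to JSON text
    return out.str();
}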
If you are working on projects with a significant number of objects to serialise (e.g. messages in communication libraries), the pre-compiler option may be worth the initial setup effort and additional build complexity, but that depends, as always, on the specific project you're dealing with.

Strategy for Wrapping external library return types

I have been asked to unit test some legacy code.
Currently, the code is tightly coupled with a 3rd party library both in terms of method calls and types used.
I am planning on writing a wrapper around the library in the form of a Façade design pattern which will aid in testability, create a cleaner interface for the rest of the code and allow me to swap out the library in the future if required.
This works fine where the method calls have a void return type, because those library functions are self-contained. But what if the existing code uses library-specific types? Here is an example:
LibrarySpecificType[] myVar = wrappedLibrary.DoX();
Although I have wrapped the library call in the above example, it still returns a library-specific type, so the code is still somewhat coupled.
Does anybody know a way around this?
You can create wrapper classes around the types that are returned and have the wrapped library return those wrapped types instead. This might be quite a lot of work if each of those types also exposes methods which accept and return other library types. Something like this:
WrappedLibrarySpecificType[] myVar = wrappedLibrary.DoX();
The library wrapper then has to call the actual library, wrap the type the library returns, and return the wrapped type.
This ends up being a rabbit hole, though, and you will probably need to wrap every type.
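A sketch of that conversion at the facade boundary, in C++ for illustration (all type names hypothetical):

#include <vector>

// The third-party type we do not control.
namespace thirdparty {
    struct LibrarySpecificType { int value; };
    std::vector<LibrarySpecificType> DoX();
}

// Our own type: copies the data out and exposes only our interface.
class WrappedType {
public:
    explicit WrappedType(const thirdparty::LibrarySpecificType& t)
        : value_(t.value) {}
    int value() const { return value_; }
private:
    int value_;
};

// The facade converts library types to wrapped types at the boundary,
// so no thirdparty:: name escapes into the rest of the codebase.
class WrappedLibrary {
public:
    std::vector<WrappedType> DoX() {
        std::vector<WrappedType> out;
        for (const auto& t : thirdparty::DoX())
            out.emplace_back(t);
        return out;
    }
};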
If this is a large library, you might find some benefit in writing (or using) a tool which can generate the wrappers for you by reflecting over the types in the third-party library.
You might also get some assistance in generating the delegating members, depending on your IDE.

Parsing different XML messages: versions

Say we want to parse XML messages into business objects. We split the process in two parts, namely:
- Parsing the XML messages into XML grammar objects.
- Transforming the XML grammar objects into business objects.
The first part is done automatically, generating a grammar object for each node.
The second part is done following the structure of the XML. Example:
If we have the XML message (simplified):
<Main>
  <ChildA>XYZ</ChildA>
  <ChildB att1="0">
    <InnerChild>YUK</InnerChild>
  </ChildB>
</Main>
We could find the following classes:
DecodeMain (calls DecodeChildA and DecodeChildB)
DecodeChildA
DecodeChildB (calls DecodeInnerChild)
DecodeInnerChild
The main problem arises when we need to handle versions of the same messages. Say we have a new version where only DecodeInnerChild changes (e.g. we need to add an "a" at the end of the value).
It is really important that the solution stays agile for further versions and as clean as possible. I considered the following options:
1) Simple inheritance: create two classes of DecodeInnerChild, one for each version.
Shortcoming: I will need to create different classes for every parent class so they call the right one.
2) Version parameter: add to each method an object carrying the version as a parameter. This way we know what to do within each method according to the version.
Shortcoming: not clean at all; the code of the different versions is mixed together.
3) Inheritance + version parameter: create two classes, with a base class for the common code, for the nodes that directly change (like InnerChild), and add the version as a parameter in each method. When a node calls another class to decode a child object, it uses one class or the other depending on the version parameter.
4) Some kind of executor pattern (I do not know how to do it): define at the start some kind of specification object, where all the methods that are going to be used are indicated, and pass this object to a class that is in charge of executing them.
How would you do it? Other ideas are welcomed.
Thanks in advance. :)
How would you do it? Other ideas are welcomed.
Rather than parse the XML myself, as a first step I would let something like CodeSynthesis XSD generate all the needed classes for me and work with those. Later, when performance or something else becomes an issue, I would possibly start to look around for more efficient parsers, and only if that is not fruitful would I start to design and write my own parser for the specific case.
Edit:
Sorry, I should have been more specific :P, the first part is done
automatically, the whole code is generated from the XML schema.
OK, let's discuss then how to handle the usual situation that, as software evolves, its input will eventually evolve too. I put all the silver bullets and magic wands on the table here; if and what you implement of them is totally up to you.
A version attribute is something I have anyway in most things that I create. It is sane to have it in place before a backward-compatibility issue arises that cannot be solved elegantly. Most importantly, it means that when old software fails to parse newer input, it produces a complaint that immediately makes sense to everybody.
I usually also add some interface for a converter. Old software can then be equipped with a converter from a newer version of the input when it fails to parse it, and new software can use the same converter to parse older input. Plus, it is the place to plug in a converter from totally "alien" input. Win-win-win situation. ;)
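A minimal sketch of such a converter interface (names made up):

#include <string>

// Upgrades older (or "alien") input to the current version before the
// normal decoding chain runs; new software can reuse it for old input.
struct InputConverter {
    virtual ~InputConverter() = default;
    virtual bool canConvert(int fromVersion) const = 0;
    virtual std::string convert(const std::string& input) const = 0;
};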
In the special case of a minor change, I would consider whether it is cheap to make the new DecodeInnerChild internally more flexible, so that it accepts the value with or without that "a" at the end as valid. In the converter I still have to get rid of that "a" when converting for older versions.
Often what actually happens is that InnerChild splits and both versions are used side by side. If there is sufficient behavioural difference between the two InnerChilds, then there is no point avoiding polymorphic InnerChilds. When polymorphism is added then indeed, as you say in your option 1), all containing classes that now have such polymorphic members have to be altered. In such cases the converter should usually either produce a crippled InnerChild or report to the older version that the input is outside its capabilities.
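A sketch of how the inheritance-based options might look for the InnerChild case (all names hypothetical; v2 is the version that appends the "a"):

#include <memory>
#include <string>

// Common base for the one node that changes between versions.
struct InnerChildDecoder {
    virtual ~InnerChildDecoder() = default;
    virtual std::string decode(const std::string& raw) const = 0;
};

struct InnerChildDecoderV1 : InnerChildDecoder {
    std::string decode(const std::string& raw) const override {
        return raw;       // v1: the value is used as-is
    }
};

struct InnerChildDecoderV2 : InnerChildDecoder {
    std::string decode(const std::string& raw) const override {
        return raw + "a"; // v2: an "a" is appended to the value
    }
};

// The parent decoder picks the right child decoder once, from the
// message's version attribute; the rest of the chain stays unchanged.
std::unique_ptr<InnerChildDecoder> makeInnerChildDecoder(int version) {
    if (version >= 2)
        return std::make_unique<InnerChildDecoderV2>();
    return std::make_unique<InnerChildDecoderV1>();
}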

Options for parsing/processing C++ files

So I have a need to be able to parse some relatively simple C++ files with annotations and generate additional source files from that.
As an example, I may have something like this:
//# service
struct MyService
{
    int getVal() const;
};
I will need to find the //# service annotation, and get a description of the structure that follows it.
I am looking at possibly leveraging LLVM/Clang since it seems to have library support for embedding compiler/parsing functionality in third-party applications. But I'm really pretty clueless as far as parsing source code goes, so I'm not sure what exactly I would need to look for, or where to start.
I understand that ASTs are at the core of language representations, and there is library support in Clang for generating an AST from source files. But comments would not really be part of an AST, right? So what would be a good way of finding the representation of a structure that follows a specific comment annotation?
I'm not too worried about handling cases where the annotation would appear in an inappropriate place as it will only be used to parse C++ files that are specifically written for this application. But of course the more robust I can make it, the better.
One way I've been doing this is annotating identifiers of:
classes
base classes
class members
enumerations
enumerators
E.g.:
class /* #ann-class */ MyClass
    : public /* #ann-base-class */ MyBaseClass
{
    int /* #ann-member */ member_;
};
Such annotation makes it easy to write a Python or Perl script that reads the header line by line and extracts the annotations and the associated identifiers.
The annotation and the associated identifier make it possible to generate C++ reflection in the form of function templates that traverse objects, passing base classes and members to a functor, e.g.:
template<class Functor>
void reflect(MyClass& obj, Functor f) {
    f.on_object_start(obj);
    f.on_base_subobject(static_cast<MyBaseClass&>(obj));
    f.on_member(obj.member_);
    f.on_object_end(obj);
}
It is also handy to generate numeric ids (an enumeration) for each base class and member and pass those to the functor, e.g.:
f.on_base_subobject(static_cast<MyBaseClass&>(obj), BaseClassIndex<MyClass>::MyBaseClass);
f.on_member(obj.member_, MemberIndex<MyClass>::member_);
Such reflection code allows you to write functors that serialize and deserialize any object type to/from a number of different formats. Functors use function overloading and/or type deduction to treat different types appropriately.
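For example, a functor for the MyClass example above might look like this (a sketch; it just prints what it is handed):

#include <iostream>

// Overloading on the member type lets the functor treat each
// member appropriately; here everything is simply printed.
struct PrintFunctor {
    void on_object_start(const MyClass&) { std::cout << "{\n"; }
    void on_base_subobject(const MyBaseClass&) { /* recurse if needed */ }
    void on_member(int m) { std::cout << "  member_: " << m << "\n"; }
    void on_object_end(const MyClass&) { std::cout << "}\n"; }
};

// Usage: MyClass obj; reflect(obj, PrintFunctor{});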
Parsing C++ code is an extremely complex task. Leveraging a C++ compiler might help, but it could be beneficial to restrict yourself to a more domain-specific, less powerful format, i.e. to generate the source and additional C++ files from a simpler representation: something like Protocol Buffers' .proto files, SOAP's WSDL, or something even simpler in your specific case.
I did some very similar work recently. The research I did indicated that there wasn't any out-of-the-box solutions available already, so I ended up hand-rolling one.
The other answers are dead-on regarding parsing C++ code. I needed something that could parse ~90% of C++ code correctly; I ended up using srcML. This tool takes C++ or Java source code and converts it to an XML document, which makes it easier for you to parse. It keeps the comments intact. Furthermore, if you need to do a source code transformation, it comes with a reverse tool which takes the XML document and produces source code.
It works in 90% of the cases correctly, but it trips on complicated template metaprogramming and the darkest corners of C++ parsing. Fortunately, my input source code is fairly consistent in design (not a lot of C++ trickery), so it works for us.
Other items to look at include gcc-xml and reflex (which actually uses gcc-xml). I'm not sure if GCC-XML preserves comments or not, but it does preserve GCC attributes and pragmas.
One last item to look at is this blog on writing GCC plugins, written by the author of the CodeSynthesis ODB tool.
Good luck!

What is the best design pattern to register data "chunks"?

I have a library which can save/load to disk "chunks", which are POD structs with constant size and a unique static CHUNK_ID field. So load looks something like this:
void Load(int docId, char* ptr, int type, size_t& size)...
If you want to add a new chunk, you just add a struct with a new CHUNK_ID and use the Save/Load functions with it.
What I want is to force all "chunks" to have functions like PrintHumanReadable, CompareThisTypeOfChunk, etc. (ideally the program should not compile without such functions). I also want to mark/register/enumerate all chunk structs.
I have a few ideas but all of them have problems.
Create a base class with pure virtual functions PrintHumanReadable and CompareThisTypeOfChunk.
Problem: breaks the POD type and requires rewriting the library.
Implement a factory which creates a chunk struct from its CHUNK_ID. Problem: it still compiles when I add a new chunk without the required functions.
Could you recommend an elegant design solution for my problem?
Implement a simple code generator. You can use something like Mako or Cheetah (both Python libraries). Make a text file containing all the class names, then have the generator build the factory method and a series of methods which aren't really used but which refer to the desired methods in all the classes. This will also make it straightforward to enumerate the classes (again, using generated code).
The proper design pattern for this is called "use Boost.Serialization". It's really the best tool for writing objects to a format and then reading them back later. It can write text, binary, and even XML formats (and others if you write a proper stream for them). It can be non-intrusive, so you don't need to modify the objects to serialize them. And so forth.
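A minimal intrusive sketch (Boost.Serialization also has a non-intrusive free-function form; the Chunk fields here are made up):

#include <sstream>
#include <boost/archive/text_oarchive.hpp>

struct Chunk {
    int id = 0;
    double payload = 0.0;

    // Boost.Serialization calls this for both saving and loading.
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & id;
        ar & payload;
    }
};

int main() {
    const Chunk c{42, 3.14};
    std::ostringstream os;
    boost::archive::text_oarchive oa(os);
    oa << c; // writes a text representation of the chunk
}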
Once you're using the proper tool for this job, you can then use whatever class hierarchy or other method you like to ensure that the proper functions for an object exist.
If you can't/won't use Boost.Serialization, then you're pretty much stuck with a runtime solution. And since the solution is runtime rather than compile time, there's no way to ensure at compile time that any particular chunk ID has the requisite functions.
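That said, if registration of new chunk types goes through a function template, the presence of the required functions can still be checked where that template is instantiated. A C++17 sketch (trait and function names are made up):

#include <type_traits>
#include <utility>

// Detects a const member function PrintHumanReadable().
template <typename T, typename = void>
struct has_print : std::false_type {};

template <typename T>
struct has_print<T,
    std::void_t<decltype(std::declval<const T&>().PrintHumanReadable())>>
    : std::true_type {};

// All chunk types are registered through this template, so adding a
// chunk that lacks the required function stops compiling right here.
template <typename Chunk>
void RegisterChunk() {
    static_assert(std::is_trivially_copyable<Chunk>::value,
                  "chunks must stay POD-like");
    static_assert(has_print<Chunk>::value,
                  "chunk must implement PrintHumanReadable() const");
    // ... record Chunk::CHUNK_ID in the runtime registry ...
}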