POCO C++ object to JSON string serialization

I wonder how I could serialize an object of a given class (e.g. Person) with its attributes (e.g. name, age) to a JSON string using the POCO C++ libraries.
Maybe I should create my models using Poco::Dynamic and Poco::Dynamic::Var in order to use Poco::JSON::Stringifier? I can't imagine how to do this...
Thanks in advance!

Unlike Java or C#, C++ doesn't have an introspection/reflection feature beyond run-time type information (RTTI), which has a different focus and is limited to polymorphic objects. That means that, outside of a non-standard pre-compiler, you'll have to tell the serialisation framework one way or another how your object is structured and how you would eventually like to map it to a hierarchy of int, std::string and other basic data types. I usually differentiate between three approaches to doing so: pre-compiler, inline specification, and property conversion.
Pre-compiler: A good example of the pre-compiler approach is Google Protocol Buffers: https://developers.google.com/protocol-buffers/docs/cpptutorial. You define your entities in a separate .proto file, which is transformed by a dedicated compiler into entity classes (.pb.cc and .pb.h files for C++). These classes can be used like regular POCO entities and can be serialised using Protocol Buffers.
Inline specification: Boost Serialization (https://www.boost.org/doc/libs/1_67_0/libs/serialization/doc/index.html), s11n (www.s11n.net) and restc-cpp (https://github.com/jgaa/restc-cpp) are examples of explicitly specifying the structure of your POCOs for the framework inside your own code. The API to do so may be more or less sophisticated, but the principle behind it is always the same: you provide the framework with serialise/deserialise implementations for your classes, or you register metadata which allows the framework to generate them. The example below is from restc-cpp:
#include <string>
#include <boost/fusion/include/adapt_struct.hpp>

using std::string;

struct Post {
    int userId = 0;
    int id = 0;
    string title;
    string body;
};

BOOST_FUSION_ADAPT_STRUCT(
    Post,
    (int, userId)
    (int, id)
    (string, title)
    (string, body)
)
Property conversion: The last kind of serialisation that I don't want to miss mentioning is the explicit conversion to a framework-provided intermediate data type. Boost property tree (https://www.boost.org/doc/libs/1_67_0/doc/html/property_tree.html) and JsonCpp (http://open-source-parsers.github.io/jsoncpp-docs/doxygen/index.html) are good examples of this approach. You are responsible for implementing a conversion from your own types to ptree, which Boost can serialise to and from any format you like (XML, JSON).
Having had my share of experience with all three approaches in C++, I would recommend option 3 as your default. It seems to map nicely to POCO C++'s Parser and Var model for JSON. One option is to have all your entity POCO classes implement a to_var or from_var function, or you can keep these serialisation functions in a different namespace for each POCO class, so that you only have to include them when necessary.
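To make that concrete, here is a minimal sketch of option 3 with POCO's JSON classes, using the Person example from the question (the to_object/from_object/to_json names are just a convention for illustration, not a POCO API):

#include <Poco/JSON/Object.h>
#include <Poco/JSON/Parser.h>
#include <Poco/Dynamic/Var.h>
#include <sstream>
#include <string>

struct Person {
    std::string name;
    int age = 0;
};

// Entity -> intermediate representation (Poco::JSON::Object).
Poco::JSON::Object::Ptr to_object(const Person& p)
{
    Poco::JSON::Object::Ptr obj = new Poco::JSON::Object;
    obj->set("name", p.name);
    obj->set("age", p.age);
    return obj;
}

// Intermediate representation -> entity.
Person from_object(const Poco::JSON::Object::Ptr& obj)
{
    Person p;
    p.name = obj->getValue<std::string>("name");
    p.age = obj->getValue<int>("age");
    return p;
}

// Intermediate representation -> JSON string.
std::string to_json(const Person& p)
{
    std::ostringstream out;
    to_object(p)->stringify(out);
    return out.str();
}

// Parsing goes through Poco::JSON::Parser, which returns a
// Poco::Dynamic::Var holding a Poco::JSON::Object::Ptr.
Person from_json(const std::string& json)
{
    Poco::JSON::Parser parser;
    Poco::Dynamic::Var result = parser.parse(json);
    return from_object(result.extract<Poco::JSON::Object::Ptr>());
}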
If you are working on projects with a significant number of objects to serialise (e.g. messages in communication libraries), the pre-compiler option may be worth the initial setup effort and additional build complexity, but that depends, as always, on the specific project you're dealing with.

Related

How could you recreate UE4 serialization of user-defined types?

The problem
The Unreal Engine 4 Editor allows you to add objects of your own types to the scene.
Doing so requires minimal work from the user - to make a class visible in the editor you only need to add some macros, like UCLASS()
UCLASS()
class MyInputComponent : public UInputComponent // you can instantiate it in the editor!
{
    UPROPERTY(EditAnywhere)
    bool IsSomethingEnabled;
};
This is enough to allow the editor to serialize the created-in-editor object's data (remember: the class is user-defined but the user doesn't have to hardcode loading specific fields. Also note that the UPROPERTY variable can be of user-defined type as well). It is then deserialized while loading the actual game. So how is it handled so painlessly?
My attempt - hardcoded loading for every new class
class Component // abstract class
{
public:
    virtual void LoadFromStream(std::stringstream& str) = 0;
    //virtual void SaveIntoStream(std::stringstream& str) = 0;
};

class UserCreatedComponent : public Component
{
    std::string Name;
    int SomeInteger;
    vec3 SomeVector; // example of user-defined type
public:
    // You have to write a function like this every time you create a new class.
    virtual void LoadFromStream(std::stringstream& str) override
    {
        str >> Name >> SomeInteger >> SomeVector.x >> SomeVector.y >> SomeVector.z;
    }
};
std::vector<Component*> ComponentsFromStream(std::stringstream& str)
{
    std::vector<Component*> components;
    std::string type;
    while (str >> type)
    {
        if (type == "UserCreatedComponent") // do this for every user-defined type...
            components.push_back(new UserCreatedComponent);
        else
            continue;
        components.back()->LoadFromStream(str);
    }
    return components;
}
Example of a UserCreatedComponent object's stream representation:
UserCreatedComponent MyComponent 5 0.707 0.707 0.707
The engine user has to do these things every time they create a new class:
1. Modify ComponentsFromStream by adding another if
2. Add two methods, one which loads from stream and another which saves to stream.
We want to simplify it so the user only has to use a macro like UPROPERTY.
Our goal is to free the user from all this work and create a more extensible solution, like UE4's (described above).
Attempt at simplifying 1: Using type-int mapping
This section is based on the following: https://stackoverflow.com/a/17409442/12703830
The idea is that we map every new class to an integer, so when we create an object we can just pass the integer given in the stream to the factory.
Example of a UserCreatedComponent object's stream representation:
1 MyComponent 5 0.707 0.707 0.707
This solves the problem of working out the type of the created object, but it also seems to create two new problems:
1. How should we map classes to integers? What would happen if we include two libraries containing classes that map themselves to the same number?
2. What will initializing e.g. components that need vectors for construction look like? We don't always use strings and ints for object construction (and streams give us pretty much only that).
So how is it handled so painlessly?
The C++ language does not provide features that would allow implementing such simple de/serialization of class instances the way it works in the Unreal Engine. There are various ways to work around the language limitations; Unreal uses a code generator.
The general idea is the following:
When you start project compilation, a code generator is executed.
The code generator parses your header files and searches for macros which have special meaning, like UCLASS, USTRUCT, UENUM, UPROPERTY, etc.
Based on the collected data, it generates not only code for de/serialization, but also code for other purposes, like reflection (the ability to iterate over certain members), information about inheritance, etc.
After that, your code is finally compiled along with the generated code.
Note: this is also why you have to include "MyClass.generated.h" in all header files which declare UCLASS, USTRUCT and similar.
In other words, someone must write the de/serialization code in some form. The Unreal solution is that the author of such code is an application.
If you want to implement such a system yourself, be aware that it's a lot of work. I'm no expert in this field, so I'll just provide general information:
The primary idea of code generators is to automate repetitive work, nothing more - in other words, there's no other special magic. That means that "how objects are de/serialized" (how they're transformed from memory to file) and "how the code which de/serializes is created" (whether it's written by a person or generated by an application) are two separate topics.
First, it should be established how objects are de/serialized. For example, std::stringstream can be used, or objects can be de/serialized from/to generally known formats like XML, JSON, BSON, YAML, etc., or a custom solution can be defined.
Establish what the source of data for the generated de/serialization code is. In the case of Unreal Engine, it's the user code itself. But that's not the only way - for example, Protobuf uses a simple language which only defines the data structure, and the generator creates code which you can include and use.
If the source of data is to be C++ code itself, do not write your own C++ parser! (The only exceptions to this rule are educational purposes, or if you want to spend the rest of your life working on the parser.) Luckily, there are projects which you can use - for example, there's Clang's AST (see the sketch below).
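To give an idea of the entry point, here is a minimal libclang sketch that lists the classes and structs declared in a file (just an illustration; a real generator would also inspect members and annotations, and filter out declarations coming from included headers):

#include <clang-c/Index.h>
#include <iostream>

int main(int argc, char** argv)
{
    if (argc < 2) return 1;

    CXIndex index = clang_createIndex(0, 0);
    CXTranslationUnit tu = clang_parseTranslationUnit(
        index, argv[1], nullptr, 0, nullptr, 0, CXTranslationUnit_None);
    if (!tu) return 1;

    // Walk the AST and print the name of every class/struct declaration.
    clang_visitChildren(
        clang_getTranslationUnitCursor(tu),
        [](CXCursor cursor, CXCursor /*parent*/, CXClientData /*data*/) -> CXChildVisitResult {
            if (clang_getCursorKind(cursor) == CXCursor_ClassDecl ||
                clang_getCursorKind(cursor) == CXCursor_StructDecl)
            {
                CXString name = clang_getCursorSpelling(cursor);
                std::cout << clang_getCString(name) << '\n';
                clang_disposeString(name);
            }
            return CXChildVisit_Recurse;
        },
        nullptr);

    clang_disposeTranslationUnit(tu);
    clang_disposeIndex(index);
}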
How should we map classes to integers? What would happen if we include two libraries containing classes that map themselves to the same number?
There's one fundamental problem with mapping classes to integers: it's not possible to uniquely map every possible class name to an integer.
Proof: create classes named Foo_[integer] and map each to its [integer], i.e. Foo_0 -> 0, Foo_1 -> 1, Foo_2 -> 2, etc. After you have used the biggest integer value, how do you map Bar_0?
You can start assigning the numbers sequentially as classes are added to a project, but as you correctly pinpointed, what happens if you include a new library? You could start counting from some big number, like 1,000,000, but how do you determine what the first number for each library should be? There's no clear solution.
Some solutions to this problem are:
Define a clear subset of classes which can be de/serialized and assign sequential integers to these classes. The subset can be, for example, "only classes in my project, no library support".
Identify classes with two integers - one for the class, one for the library. This means you have to have some central register which assigns library integers uniquely (e.g. in the order they're registered).
Use a string which uniquely identifies the class, including the library name. This is what Unreal uses (see the sketch after this list).
Generate a hash from the class and library name. There's a risk of hash collision - the better the hash, the lower the risk. For example, git (the version control application) uses SHA-1 (which is considered unsafe today) to identify its objects (files, directories, commits), and the program is used worldwide without major issues.
Generate a UUID, a 128-bit random number (with special rules). There's also a risk of collision, but it's generally considered highly improbable. Used by Java and the Unity game engine.
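As an illustration of the string-identifier variant, here is a minimal sketch of a registry that maps unique names (here with a library prefix) to factory functions; all names are made up for the example:

#include <functional>
#include <map>
#include <memory>
#include <string>

class Component { public: virtual ~Component() = default; };

// Global name -> factory map; the name includes a library prefix to
// avoid collisions between libraries.
std::map<std::string, std::function<std::unique_ptr<Component>()>>& registry()
{
    static std::map<std::string, std::function<std::unique_ptr<Component>()>> r;
    return r;
}

template<class T>
bool registerComponent(const std::string& name)
{
    registry()[name] = [] { return std::make_unique<T>(); };
    return true;
}

std::unique_ptr<Component> createComponent(const std::string& name)
{
    auto it = registry().find(name);
    return it != registry().end() ? it->second() : nullptr;
}

// Usage, e.g. at namespace scope in the library that defines the class:
// static bool registered =
//     registerComponent<UserCreatedComponent>("MyLib.UserCreatedComponent");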
What would happen if we include two libraries containing classes that map themselves to the same number?
That's called a collision. How it's handled depends on the design of the de/serialization code; there are mainly two approaches to this problem:
Detect it. For example, if your class identifier contains a library identifier, don't allow loading/registering a library with an ID that's already registered. In the case of an ID which doesn't include a library ID (e.g. the hash/UUID variants), don't allow registering such classes. Throw an exception or exit the application.
Assume there's no collision. If an actual collision does happen, it's so-called UB, undefined behaviour. The application will probably crash or act weirdly. It might corrupt stored data.
What will initializing e.g. components that need vectors for construction look like? We don't always use strings and ints for object construction (and streams give us pretty much only that).
This depends on what's required from the de/serializing code.
The simplest solution is actually to use a string of values separated by spaces.
For example, let's define the following structure:
struct Person
{
    std::string Name;
    float Age;
};
A vector of Person instances could look like: 3 Adam 22.2 Bob 34.5 Cecil 19.0 (i.e. first serialize the number of items (the vector size), then the individual items).
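A sketch of that scheme (note it assumes names contain no whitespace; see the final notes):

#include <iostream>
#include <string>
#include <vector>

struct Person { std::string Name; float Age; }; // as defined above

// Count-prefixed, space-separated serialization as described above.
void save(std::ostream& out, const std::vector<Person>& persons)
{
    out << persons.size();
    for (const auto& p : persons)
        out << ' ' << p.Name << ' ' << p.Age;
}

std::vector<Person> load(std::istream& in)
{
    std::size_t count = 0;
    in >> count;
    std::vector<Person> persons(count);
    for (auto& p : persons)
        in >> p.Name >> p.Age; // breaks if a name contains whitespace
    return persons;
}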
However, what if you add, remove or rename a member? The serialized data would become unreadable. If you want a more robust solution, it might be better to use more structured data, for example YAML:
persons:
  - name: Adam
    age: 22.2
  - name: Bob
    age: 34.5
  - name: Cecil
    age: 19.0
Final notes
The problem of de/serializing objects (in C++) is actually big, and various systems use various solutions. That's why this answer is so generic and doesn't provide exact code - there's no single silver bullet. Every solution has its advantages and disadvantages. Even a detailed description of just Unreal Engine's serialization system would become a book.
So this answer assumes that the reader is able to search for the various topics mentioned, like the YAML file format, Protobuf, UUIDs, etc.
Every mentioned solution to a sub-problem has lots of its own problems which weren't explored, for example de/serialization of strings with spaces or newlines from/to a simple string stream. If you need to solve such problems, it's recommended to first search for more specialized questions, or to write one if there's nothing to be found.
Also, C++ is constantly evolving. For example, better support for reflection is being added, which might, one day, provide enough features to implement a high-quality de/serializer. However, if it is to be done at compile time, it will depend heavily on templates, which slow down compilation significantly and decrease code readability. That's why code generators might still be considered the better choice.

How to pass a C++ map into an RTI DDS Connext publisher and receive it at an RTI subscriber

I am new to RTI DDS Connext. I tried running some C++ examples (Hello_dynamic, Hello_simple) from RTI and they were working fine. Then I thought of passing a C++ map as a Topic type from publisher to subscriber. But I guess there is no documentation or example code available for doing this. Please help me here!
The C++ standard map type cannot natively be used as a Topic type. DDS can distribute any type that can be expressed in a defined subset of OMG's IDL (Interface Definition Language), and the map type is not among them.
The two code examples that you are referring to are not your typical situation, because they rely on the built-in string type (Hello_simple) or the proprietary dynamic data API (Hello_dynamic). To get a better idea of how you would normally define your own data types, check out the Hello_idl example. It shows a user-defined type, defined in IDL, that is translated into a C++ type to be used by your application.
It would be fairly easy to create a Topic type to achieve functionality similar to a C++ map. Suppose your map items have string keys and long values; then you could use a structure in IDL to express a single item in your map, for example the following type:
struct mapItem {
    unsigned long m_mapId; //@key
    string m_key;          //@key
    long m_value;
};
m_mapId indicates which map this item belongs to. Your map is the collection of all mapItems with the same m_mapId value. The fields m_key and m_value are obviously the key-value pairs.
On the publisher side, your application can write the map elements to DDS one-by-one. Values with the same values for m_mapId and m_key will overwrite each other, resulting in the same behavior as expected for standard maps. On the subscriber side, it is possible to build up the complete map by querying the datareader for all mapItems with the same m_mapId.
Your application code will not be using standard maps when using this approach. In order to achieve that, you will have to create wrapper functions that translate a map(-like) API into the corresponding write and read actions.
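A rough sketch of such a wrapper on the publishing side; WriterT stands in for whatever DataWriter type rtiddsgen generates for mapItem, and ItemT for the generated mapItem type, so the exact calls may differ:

#include <string>

template<class WriterT, class ItemT>
class DistributedMap
{
    WriterT& writer_;
    unsigned long mapId_;
public:
    DistributedMap(WriterT& writer, unsigned long mapId)
        : writer_(writer), mapId_(mapId) {}

    // std::map-like assignment: publishing a sample with the same
    // (m_mapId, m_key) pair overwrites the previous value for subscribers.
    void set(const std::string& key, long value)
    {
        ItemT item;
        item.m_mapId = mapId_;
        item.m_key = key;
        item.m_value = value;
        writer_.write(item); // one DDS sample per map entry
    }
};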
In case you are familiar with regular database design, you will notice the similarity to what you would be doing when designing this in a relational data model. Indeed, DDS can be considered a distributed data management infrastructure with a lot of similarities to regular DBMSes.

Options for parsing/processing C++ files

So I have a need to be able to parse some relatively simple C++ files with annotations and generate additional source files from that.
As an example, I may have something like this:
//# service
struct MyService
{
    int getVal() const;
};
I will need to find the //# service annotation, and get a description of the structure that follows it.
I am looking at possibly leveraging LLVM/Clang since it seems to have library support for embedding compiler/parsing functionality in third-party applications. But I'm really pretty clueless as far as parsing source code goes, so I'm not sure what exactly I would need to look for, or where to start.
I understand that ASTs are at the core of language representations, and there is library support for generating an AST from source files in Clang. But comments would not really be part of an AST, right? So what would be a good way of finding the representation of a structure that follows a specific comment annotation?
I'm not too worried about handling cases where the annotation would appear in an inappropriate place as it will only be used to parse C++ files that are specifically written for this application. But of course the more robust I can make it, the better.
One way I've been doing this is annotating identifiers of:
classes
base classes
class members
enumerations
enumerators
E.g.:
class /* #ann-class */ MyClass
    : /* #ann-base-class */ MyBaseClass
{
    int /* #ann-member */ member_;
};
Such annotation makes it easy to write a Python or Perl script that reads the header line by line and extracts the annotation and the associated identifier.
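For instance, the same line-based extraction could be sketched in C++ (the regex and output format are purely illustrative):

#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main(int argc, char** argv)
{
    if (argc < 2) return 1;
    std::ifstream in(argv[1]);
    // Matches e.g. "/* #ann-class */ MyClass" and captures the annotation
    // kind and the identifier that follows the comment.
    std::regex annotation(R"(/\*\s*#(ann[\w-]*)\s*\*/\s*(\w+))");
    std::string line;
    while (std::getline(in, line))
    {
        std::smatch m;
        if (std::regex_search(line, m, annotation))
            std::cout << m[1] << " -> " << m[2] << '\n';
    }
}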
The annotation and the associated identifier make it possible to generate C++ reflection in the form of function templates that traverse objects, passing base classes and members to a functor, e.g.:
template<class Functor>
void reflect(MyClass& obj, Functor f) {
    f.on_object_start(obj);
    f.on_base_subobject(static_cast<MyBaseClass&>(obj));
    f.on_member(obj.member_);
    f.on_object_end(obj);
}
It is also handy to generate numeric ids (an enumeration) for each base class and member and pass them to the functor, e.g.:
f.on_base_subobject(static_cast<MyBaseClass&>(obj), BaseClassIndex<MyClass>::MyBaseClass);
f.on_member(obj.member_, MemberIndex<MyClass>::member_);
Such reflection code allows you to write functors that serialize and de-serialize any object type to/from a number of different formats. Functors use function overloading and/or type deduction to treat different types appropriately.
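For example, a trivial functor that works with the generated reflect() above, printing members instead of serializing them, just to show the shape:

#include <iostream>

struct PrintVisitor
{
    template<class T> void on_object_start(const T&) { std::cout << "{\n"; }
    template<class T> void on_object_end(const T&)   { std::cout << "}\n"; }
    template<class B> void on_base_subobject(const B& base)
    {
        // A real serializer would recurse here: reflect(base, *this);
    }
    template<class M> void on_member(const M& member)
    {
        std::cout << "  " << member << '\n'; // relies on operator<< for M
    }
};

// Usage: MyClass obj; reflect(obj, PrintVisitor{});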
Parsing C++ code is an extremely complex task. Leveraging a C++ compiler might help, but it could be beneficial to restrict yourself to a more domain-specific, less powerful format, i.e. to generate the source and additional C++ files from a simpler representation, something like Protobuf's .proto files or SOAP's WSDL, or something even simpler in your specific case.
I did some very similar work recently. The research I did indicated that there weren't any out-of-the-box solutions available, so I ended up hand-rolling one.
The other answers are dead-on regarding parsing C++ code. I needed something that could get ~90% of C++ code parsed correctly; I ended up using srcML. This tool takes C++ or Java source code and converts it to an XML document, which makes it easier to parse. It keeps the comments intact. Furthermore, if you need to do a source code transformation, it comes with a reverse tool which will take the XML document and produce source code.
It works correctly in 90% of cases, but it trips on complicated template metaprogramming and the darkest corners of C++ parsing. Fortunately, my input source code is fairly consistent in design (not a lot of C++ trickery), so it works for us.
Other items to look at include GCC-XML and Reflex (which actually uses GCC-XML). I'm not sure whether GCC-XML preserves comments, but it does preserve GCC attributes and pragmas.
One last item to look at is this blog on writing GCC plugins, written by the author of the CodeSynthesis ODB tool.
Good luck!

C++ 'wrapper class' for XML library

So I've been attempting to create some classes around the Xerces XML library so I can 'hide' it from the rest of my project, keeping the underlying XML library independent from the rest of the code.
This was supposed to be a fairly easy task; however, it seems entirely impossible to hide a library from the rest of a project by writing some classes around it.
Have I got the wrong approach or is my 'wrapper' idea completely silly?
I end up with something like this:
DOMElement* root(); // in my 'wrapper' class; however, this DOMElement is part of the
                    // Xerces library, so at this point my 'wrapper' is broken. Now I
                    // have to use the Xerces library everywhere I want to use this
                    // function.
Where has my thinking gone wrong?
I would recommend avoiding the wrapper in the first stage. Just make sure that the layers and their borders are clear, i.e. the network layer takes care of serializing/deserializing the XML, and from there on you only use your internal types. If you do this, and at a later stage you need to replace xerces with any other library, just replace the serialization layer. That is, instead of wrapping each XML object, just wrap the overall operation: serialize/deserialize.
Writing your own abstract interface for a library is not a silly idea IF you plan to change, or want the possibility of changing, the library you are using.
You should not expose your library's objects through your wrapper interface. Implement your own structures and your own function interface. It will save a lot of work when you want to change how the XML handling is implemented (e.g. switching libraries).
One example implementation:
class XmlElement
{
private:
    DOMElement* element; // points to the element of your library
public:
    // Here you define its public interface. There should be enough
    // methods/parameters to interact with any XML interface you will
    // use in the future.
    XmlElement getSubElement(const std::string& name)
    {
        // Create the XmlElement
        // Set the DOMElement wanted
        // Return it
    }
};
In your program you will see:
void function()
{
    XmlElement root;             // note: "XmlElement root();" would declare a function
    root.getSubElement("value"); // for example
}
This way, no DOMElement or any of its functions appear in the project.
As I mentioned in my comments, I would take a slightly different approach. I would not want my codebase to be dependent on the particular messaging format (XML) that I am using (what if, for example, you decide to change the XML to something else later?). Instead, I would work with a well-defined object model and have a simple encoder/decoder to handle the conversion to an XML string and vice versa. This encoder/decoder would then be the bit that I would replace if the underlying wire format changed.
The decoder would take in the data read from the socket and produce a suitable object (with nested objects to represent the request), and the encoder would take a similar object and generate the XML from it. If performance is not a primary concern, I would use a library such as TinyXML, which is quite lightweight - heck, you can strip that down even further and make it even more lightweight...
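A sketch of that boundary (Request is a hypothetical internal type; the XML library stays confined to the translation unit implementing these two functions):

#include <string>

// Hypothetical internal object model, independent of any XML library.
struct Request
{
    std::string action;
    std::string payload;
};

// The only place the XML library appears is behind these two declarations;
// swap their implementation to change the wire format.
std::string encode(const Request& request);   // object -> XML string
Request decode(const std::string& xml);       // XML string -> object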

Parsing huge data with C++

In my job, I need to parse different kinds of data files from different data sources. Sometimes I parse them by writing C++ code directly (with the help of Qt and Boost :D), sometimes manually with a helper program.
I must note that the data types are so different from each other that it is hard to create a common interface for all of them. But I want to do this job in a more generic way. I am planning to write a library to convert them, and it should be easy to add a new parser utility in the future. I am also planning to use the helper programs from inside my program, not manually.
My question is: what kind of architecture or pattern do you suggest? The basic condition is that the library must be extensible via new classes or DLLs, and it must also be configurable.
By the way, the data can be text, ASCII or something like CSV (comma-separated values), and most formats are specific to a certain kind of data.
Not to blow my own trumpet, but my small Open Source utility CSVfix has an extensible architecture based on deriving new C++ classes with a very simple interface. I did consider using a plugin architecture with DLLs, but it seemed like overkill for such a simple utility. If interested, you can get the binaries & sources here.
I'd suggest a 3-part model, where the common data format is a string which should be able to contain every value:
Reader: In this layer the values are read from the source (i.e. a CSV file) using some sort of file-format descriptor. The values are then stored in some sort of intermediate data structure.
Connector/Converter: This layer is responsible for mapping the reader data to the writer fields.
Writer: This layer is responsible for writing a specific data structure to the target (i.e. another file format or a database).
This way you can write different Readers for different input files.
I think the hardest part would be creating the definition of the intermediate storage format/structure so that it is future-proof and flexible.
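A sketch of those three layers; the Record alias (field name -> string value) is just one possible intermediate structure, and all names are illustrative:

#include <map>
#include <string>
#include <vector>

// One possible intermediate structure: field name -> string value.
using Record = std::map<std::string, std::string>;

struct Reader
{
    virtual ~Reader() = default;
    virtual std::vector<Record> read(const std::string& source) = 0;
};

// Maps reader fields to writer fields (renaming, reordering, conversion).
struct Converter
{
    virtual ~Converter() = default;
    virtual Record convert(const Record& in) = 0;
};

struct Writer
{
    virtual ~Writer() = default;
    virtual void write(const std::string& target,
                       const std::vector<Record>& records) = 0;
};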
One method I used for defining the data structure in my data-file read/write classes is std::map<std::string, std::vector<std::string>, string_compare>, where the key is the variable name and the vector of strings is the data. While this is expensive in memory, it does not lock me down to only numeric data. And this method allows for different lengths of data within the same file.
I had the base class implement this generic storage, while the derived classes implemented the reader/writer capability. I then used a factory to get the desired handler, using another class that determined the file format.
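A rough sketch of that arrangement (the custom string_compare comparator is omitted, and CsvFile plus the format detection are illustrative):

#include <map>
#include <memory>
#include <string>
#include <vector>

// Base class holding the generic storage; derived classes add
// format-specific reading/writing.
class DataFile
{
protected:
    std::map<std::string, std::vector<std::string>> data_; // name -> values
public:
    virtual ~DataFile() = default;
    virtual void read(const std::string& path) = 0;
    virtual void write(const std::string& path) = 0;
};

class CsvFile : public DataFile
{
public:
    void read(const std::string& path) override  { /* parse CSV into data_ */ }
    void write(const std::string& path) override { /* emit data_ as CSV */ }
};

// Factory: another class would determine 'format' from the file itself.
std::unique_ptr<DataFile> makeHandler(const std::string& format)
{
    if (format == "csv") return std::make_unique<CsvFile>();
    return nullptr;
}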