boost serialization omit version for a wrapper - c++

How can I tell boost that for a particular structure it should not write/read a class "version" identifier?
I am writing some wrapper classes for serializing some types in a smaller fashion (like a variable length integer). If the wrapper gets a class version written the whole point of the size reduction is lost (it'll end up bigger in most cases).
For example, given integer a I'll be replacing this code:
ar & a;
with this:
ar & wrapper(a);
I see the is_wrapper trait, but I can't really find any docs on what that does, or if it might help.

Add
BOOST_CLASS_IMPLEMENTATION(wrapper, boost::serialization::object_serializable)
It's the documented way.
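To see why a per-object version tag would cancel the saving, here is a minimal sketch of the kind of variable-length integer encoding such a wrapper might use. The names encode_varint and decode_varint are hypothetical and not part of Boost; the scheme is a plain LEB128-style encoding, not whatever the asker actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Boost API): 7 payload bits per byte,
// high bit set on every byte except the last one.
inline void encode_varint(std::uint64_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decodes one value starting at `pos` and advances `pos` past it.
inline std::uint64_t decode_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t value = 0;
    int shift = 0;
    for (;;) {
        assert(pos < in.size() && shift < 64);  // truncated or overlong input
        const std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;  // last byte of this value
        shift += 7;
    }
    return value;
}
```

Values below 128 occupy a single byte here, so any extra per-object bookkeeping written next to them would outweigh the payload itself.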

Related

C++ serialization of data-structures

I'm studying serializations in C++. What's the advantage/difference of boost::serialization if compared to something like:
ifstream_obj.read(reinterpret_cast<char *>(&obj), sizeof(obj)); // read
// or
ofstream_obj.write(reinterpret_cast<char *>(&obj), sizeof(obj)); // write
// ?
and, which one is better to use?
The big advantages of Boost Serialization are:
it actually works for non-trivial (non-POD) data types (C++ is not C)
it allows you to decouple serialization code from archive backend, thereby giving you text, xml, binary serialization
If you use the proper archive you can even have portability (try that with your sample). This means you can send on one machine/OS/version and receive on another without problems.
Lastly, it adds (a) layer(s) of abstraction which make things a lot less error prone. Granted, you could have done the same for your suggested serialization approach without much issue.
Here's an answer that does the kind of serialization you suggest but safely:
How to pass class template argument to boost::variant?
Note that Boost Serialization is fully aware of bitwise serializable types and you can tell it about your own too:
Boost serialization bitwise serializability
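As a rough illustration of the first point, the raw read/write approach from the question can at least be guarded so that it refuses to compile for types where a byte copy is invalid. This is a sketch only: write_raw, read_raw and Point are hypothetical names, and the result is still not portable across machines with different endianness or padding.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Hypothetical helpers: a raw byte copy is only valid for trivially copyable types.
template <class T>
void write_raw(std::ostream& os, const T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte serialization needs a trivially copyable type");
    os.write(reinterpret_cast<const char*>(&obj), sizeof(obj));
}

template <class T>
bool read_raw(std::istream& is, T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte serialization needs a trivially copyable type");
    is.read(reinterpret_cast<char*>(&obj), sizeof(obj));
    return static_cast<bool>(is);
}

struct Point { int x; int y; };  // trivially copyable, so this is accepted
// A struct holding a std::string would be rejected by the static_assert.

// Usage sketch: round-trip a Point through an in-memory binary stream.
inline void raw_round_trip_demo() {
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_raw(ss, Point{3, 4});
    Point p{};
    const bool ok = read_raw(ss, p);
    assert(ok && p.x == 3 && p.y == 4);
    (void)ok;  // silence unused warning when assertions are disabled
}
```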

Serializing C++ objects

I would like to implement a Serialization Class which takes in an object and converts it into a binary stream and stored in a file. Later, the object should be reconstructed from a file.
Though this functionality is provided by BinaryFormatter in C#, I would like to design my own Serialization class from scratch.
Can someone point to some resources ?
Thanks in advance
I would like to give you a negative answer. It is less useful, but it may still help.
I have been using boost serialization for several years and it was one of the greatest strategic mistakes of my company. It produces very large output, it is very slow, it propagates a whole bunch of dependencies making everything impossibly slow to compile, and then it is hard to get out because you have existing serialized formats. Further, it behaves differently on different compilers, so upgrading from VS2005 to 2010 actually forced us to write a compatibility layer, which is also hard because the code is very hard to understand.
Here are 2 solutions for C++ serialization:
Stephan Beal's s11n serialization library
boost serialization library
I personally only have experience with the 1st one and actually only used text-based serializers, but I know that it's easy to define binary serializers for use with s11n.
I have been using boost::serialization library for a while now and I think it is very good. You just need to create the serialization code like this:
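#include <string>
#include <boost/serialization/string.hpp>  // std::string support

class X {
private:
    std::string value_;
public:
    // One function handles both saving and loading.
    template <class Archive>
    void serialize(Archive &ar, const unsigned int /*version*/) {
        ar & value_;
    }
};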
No need to create the de-serialization code ( that's why they used the & operator ). But if you prefer you can still use the << and >> operators.
Also it's possible to write the serialization method for a class with no modification ( ex.: if you need to serialize an object that comes from a library ). In this case, you should do something like:
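namespace boost { namespace serialization {
// Non-intrusive version: a free function instead of a member.
template <class Archive>
void serialize(Archive &ar, X &x, const unsigned int /*version*/) {
    ar & x.getValue();  // getValue() must return a non-const reference to the member
}
}}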
The C++ Middleware Writer may be of interest. It has performance advantages over the serialization library in Boost. It also automates the creation of serialization functions.

Datastructure Storage in Filesystem

I am trying to write a persistent datastructure in C++. However, I feel that I should be able to make it binary compatible with various other implementations of my datastructure readers, and hence my current idea is to declare the datastructure in native memory without any abstraction.
For example, I would specify a linear block of memory as a datastructure (using new keyword) and then describe what the first byte means, what the second byte means and so on. I know I can do this using struct but then, the datastructure would be bound to one language and other languages will have to then use this structure. Also, the implementation might then change from compiler to compiler. I would instead like it as a memory standard.
Is what I am trying to do somewhat sensible? Or I am trying to over-simplify things and should really proceed with a struct data structure? Now onto the C++ part, if you believe that I should be using a struct data structure, then what are the disadvantages of using a full-fledged class?
(I am using a class anyway to wrap around the memory structure and provide functions to it since the datastructure is anyway persistent.)
EDIT
As justin has suggested, I do not need any such advanced interface wrapper around the memory structure, so my last point about the class wrapper is not stated properly. What I mean is I would like to have a class interface for the memory representation; it does not necessarily have to be a wrapper.
Several file formats I have read/worked with do exactly that -- define a memory standard or layout, then typically back it up with a demonstration in C-like pseudo-structure. Sometimes they will provide struct or class representations, and some are completely abstracted by a library. Of course, these formats go on to document all fields, their sizes, the endianness of the data and so on.
I figure endian related issues, padding, complexity (e.g. introduced by variations in the data structures) and proper versioning are the biggest sources of errors. Another issue I find is the use of data structures of yesteryear and inconsistency of data structures used to represent similar functionalities -- You may receive a spec, and realize it contains several different string representations -- all of which are archaic, and somebody has to go on to support all of these (bidirectionally).
Proceeding that route:
You should not commit to a binary representation (or compilable program) if you don't want to support it (and attempts at long-lived formats fail/stumble along the way, as platforms and toolsets change). Just commit to a formal memory standard at first, then build on top of that with tests and example input files to verify that the representation is serialized and deserialized correctly. A very basic test suite will help ensure your model is portable on all the systems you need, and can point out potential pitfalls or platform specific considerations you may not have been aware of.
If you really want to provide a compilable representation, I'd stick with a very compliant struct representation -- clients can take that (in memory) representation and turn it into any C++ abstraction/representation they like. That is to say, a serialized representation should probably not reflect that of a representation in memory, apart from trivially simple representations and the intermediate storage of such a representation (flattened and packed structs).
One of the important parts is that you should have tests which confirm that the in-memory object graph you create with these structs is forward and backward serializable and deserializable, and supports proper versioning -- so it often takes a bit of work to make a complex serialized representation compatible. So you see this approach just introduces one abstraction layer on top of another. In this regard, you may want to give the C++ abstraction the ability to create itself from the packed in-memory representation, and to ensure that the abstraction can also correctly populate the packed structure without data loss.
Beyond that, is there any need to have a more advanced interface? If there is, then you may want to provide that information.
So yes, the memory standard is the part that you must get correct and stable, and to which all implementations should refer and test against -- regardless of platform/architecture differences. IOW, you're on the right track ;)
In C++ there's no practical difference between struct and class (besides the default accessibility being public in struct). Traditionally, struct is used when a type only has (public) member variables and no member functions but this is only a convention, not a rule enforced by the compiler.
I'd certainly use a struct/class to describe the data. If someone wants to write a reader of your data structure, they can either import your header file or implement the data structure in their language of choice - in most programming languages this should be pretty simple.
I recommend you start your structure something like this:
typedef struct
{
int Version; // struct layout version
int ByteSize; // byte size of structure for validation
...
} MYDATA;
This way when your data structure is being passed around, your code can verify that the allocated structure size matches with how many bytes you'd expect for a given version of your structure. You could then easily introduce new versions of your structure by simply updating the version field and checking for the new size.
When you save your data to disk, make sure that you write it out field-by-field, rather than through a single write (using a pointer and sizeof()), to ensure that other languages won't have to deal with potential padding that your C++ compiler may decide to put in. It's possible to manually lay out fields in the structure so that there's no padding, but you have to be very, very careful while doing that and it's easy to make mistakes.
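A field-by-field write might be sketched as follows. Several things here are assumptions for illustration: the fixed-width uint32_t fields, the little-endian byte order on disk, the hypothetical Payload field standing in for the elided members, and the helper names put_u32_le, get_u32_le, save and load.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

struct MyData {
    std::uint32_t Version;   // struct layout version
    std::uint32_t ByteSize;  // serialized size in bytes, for validation
    std::uint32_t Payload;   // hypothetical field standing in for the rest
};

constexpr std::uint32_t kSerializedSize = 12;  // three 4-byte fields, no padding

// Writes one 32-bit value as four bytes, least significant byte first.
inline void put_u32_le(std::ostream& os, std::uint32_t v) {
    const char bytes[4] = {
        static_cast<char>(v & 0xFF),
        static_cast<char>((v >> 8) & 0xFF),
        static_cast<char>((v >> 16) & 0xFF),
        static_cast<char>((v >> 24) & 0xFF)};
    os.write(bytes, 4);
}

inline bool get_u32_le(std::istream& is, std::uint32_t& v) {
    unsigned char bytes[4];
    if (!is.read(reinterpret_cast<char*>(bytes), 4)) return false;
    v = static_cast<std::uint32_t>(bytes[0]) |
        (static_cast<std::uint32_t>(bytes[1]) << 8) |
        (static_cast<std::uint32_t>(bytes[2]) << 16) |
        (static_cast<std::uint32_t>(bytes[3]) << 24);
    return true;
}

// Each field is written on its own, so compiler padding never reaches the file.
inline void save(std::ostream& os, const MyData& d) {
    assert(d.ByteSize == kSerializedSize);
    put_u32_le(os, d.Version);
    put_u32_le(os, d.ByteSize);
    put_u32_le(os, d.Payload);
}

inline bool load(std::istream& is, MyData& d) {
    return get_u32_le(is, d.Version) && get_u32_le(is, d.ByteSize) &&
           d.ByteSize == kSerializedSize && get_u32_le(is, d.Payload);
}
```

Because the byte order is fixed by the helpers rather than by the host CPU, a reader written in another language only needs the field list and the order of bytes.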

Options for parsing/processing C++ files

So I have a need to be able to parse some relatively simple C++ files with annotations and generate additional source files from that.
As an example, I may have something like this:
//# service
struct MyService
{
int getVal() const;
};
I will need to find the //# service annotation, and get a description of the structure that follows it.
I am looking at possibly leveraging LLVM/Clang since it seems to have library support for embedding compiler/parsing functionality in third-party applications. But I'm really pretty clueless as far as parsing source code goes, so I'm not sure what exactly I would need to look for, or where to start.
I understand that ASTs are at the core of language representations, and there is library support for generating an AST from source files in Clang. But comments would not really be part of an AST right? So what would be a good way of finding the representation of a structure that follows a specific comment annotation?
I'm not too worried about handling cases where the annotation would appear in an inappropriate place as it will only be used to parse C++ files that are specifically written for this application. But of course the more robust I can make it, the better.
One way I've been doing this is annotating identifiers of:
classes
base classes
class members
enumerations
enumerators
E.g.:
class /* #ann-class */ MyClass
: /* #ann-base-class */ MyBaseClass
{
int /* #ann-member */ member_;
};
Such annotation makes it easy to write a python or perl script that reads the header line by line and extracts the annotation and the associated identifier.
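The answer suggests a Python or Perl script for this step; purely to stay in one language, the same line-by-line extraction could be sketched in C++ with std::regex. The function name extract_annotations and the exact pattern are assumptions, tuned only to the comment style shown above.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// (annotation, identifier) pairs, e.g. ("ann-class", "MyClass").
using Annotation = std::pair<std::string, std::string>;

// Hypothetical helper: finds every "/* #ann-xxx */ identifier" on one line.
inline std::vector<Annotation> extract_annotations(const std::string& line) {
    static const std::regex pattern(
        R"(/\*\s*#(ann-[a-z-]+)\s*\*/\s*([A-Za-z_]\w*))");
    std::vector<Annotation> found;
    for (std::sregex_iterator it(line.begin(), line.end(), pattern), end;
         it != end; ++it) {
        assert(it->size() == 3);  // whole match plus two capture groups
        found.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return found;
}
```

Feeding it the three annotated lines of the example one at a time would yield the class, base-class and member pairs that a generator needs.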
The annotation and the associated identifier make it possible to generate C++ reflection in the form of function templates that traverse objects passing base classes and members to a functor, e.g:
template<class Functor>
void reflect(MyClass& obj, Functor f) {
f.on_object_start(obj);
f.on_base_subobject(static_cast<MyBaseClass&>(obj));
f.on_member(obj.member_);
f.on_object_end(obj);
}
It is also handy to generate numeric ids (enumeration) for each base class and member and pass that to the functor, e.g:
f.on_base_subobject(static_cast<MyBaseClass&>(obj), BaseClassIndex<MyClass>::MyBaseClass);
f.on_member(obj.member_, MemberIndex<MyClass>::member_);
Such reflection code allows to write functors that serialize and de-serialize any object type to/from a number of different formats. Functors use function overloading and/or type deduction to treat different types appropriately.
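A functor of that kind might look like the sketch below. The reflect overloads are hand-written stand-ins for what the generator would emit, TextWriter and base_value_ are hypothetical, and the functor is passed by reference (unlike the snippet above) so that its output accumulates.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct MyBaseClass { int base_value_ = 0; };      // hypothetical member
struct MyClass : MyBaseClass { int member_ = 0; };

// Stand-ins for the generated reflection functions.
template <class Functor>
void reflect(MyBaseClass& obj, Functor& f) {
    f.on_object_start(obj);
    f.on_member(obj.base_value_);
    f.on_object_end(obj);
}

template <class Functor>
void reflect(MyClass& obj, Functor& f) {
    f.on_object_start(obj);
    f.on_base_subobject(static_cast<MyBaseClass&>(obj));
    f.on_member(obj.member_);
    f.on_object_end(obj);
}

// Hypothetical functor: writes a brace-delimited text form of any reflected object.
struct TextWriter {
    std::ostringstream out;
    int depth = 0;

    template <class T> void on_object_start(T&) { out << "{ "; ++depth; }
    template <class T> void on_object_end(T&) {
        assert(depth > 0);  // every end must match an earlier start
        --depth;
        out << "} ";
    }
    // Base subobjects are traversed recursively through their own reflect().
    template <class Base> void on_base_subobject(Base& base) { reflect(base, *this); }

    // Overloads decide how each member type is written.
    void on_member(int& value) { out << value << ' '; }
    void on_member(std::string& value) { out << '"' << value << "\" "; }
};
```

A matching reader functor would expose the same member function names and assign into the references instead of printing them.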
Parsing C++ code is an extremely complex task. Leveraging a C++ compiler might help, but it could be beneficial to restrict yourself to a less powerful, more domain-specific format: generate both the source and the additional C++ files from a simpler representation, something like protobuf's .proto files or SOAP's WSDL, or something even simpler for your specific case.
I did some very similar work recently. The research I did indicated that there weren't any out-of-the-box solutions available already, so I ended up hand-rolling one.
The other answers are dead-on regarding parsing C++ code. I needed something that could get ~90% of C++ code parsed correctly; I ended up using srcML. This tool takes C++ or Java source code and converts it to an XML document, which makes it easier for you to parse. It keeps the comments intact. Furthermore, if you need to do a source code transformation, it comes with a reverse tool which will take the XML document and produce source code.
It works in 90% of the cases correctly, but it trips on complicated template metaprogramming and the darkest corners of C++ parsing. Fortunately, my input source code is fairly consistent in design (not a lot of C++ trickery), so it works for us.
Other items to look at include gcc-xml and reflex (which actually uses gcc-xml). I'm not sure if GCC-XML preserves comments or not, but it does preserve GCC attributes and pragmas.
One last item to look at is this blog on writing GCC plugins, written by the author of the CodeSynthesis ODB tool.
Good luck!

How to load a serialized boost::variant?

I'm not able to use boost::serialization because it has library dependencies so I'm trying to figure out a way to do it myself. It doesn't matter if that means copying from boost::serialization.
After reading this answer to a similar question, I had a look at boost/serialization/variant.hpp and found the save() function, which is straightforward and understandable for me.
However the load() function looks more complicated: There is a recursion involving load() and variant_impl<types>::load() and a decremented which parameter.
So apparently the code iterates over each type of the variant in order to convert the int which into a type.
The rest is beyond me.
I know that boost has a lot of code to make it portable, so maybe there is a less portable but easier way to do this?
If you were to remove the serialization stuff from a copy of boost/serialization/variant.hpp (apart from the Archive template parameter), i.e. throw your own exception types and change e.g.
ar >> BOOST_SERIALIZATION_NVP(which);
// to:
ar >> which;
Then it looks like you should be able to replace Archive with std::ostream or std::istream in the save/load functions, respectively.
Not tried it, but at a glance it looks like it should work.
I guess it does depend on what you are actually using to serialize the data if not using boost::serialization?
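The recursion the question asks about can be illustrated without Boost. The sketch below swaps in std::variant (so it assumes C++17) and a whitespace-delimited text stream; it is not the code from boost/serialization/variant.hpp, and save, load and load_alternative are hypothetical names. Where Boost walks the type list while decrementing which, this version walks a compile-time index I upward until it equals the stored index.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Save: write the index of the active alternative, then its value.
template <class... Ts>
void save(std::ostream& os, const std::variant<Ts...>& v) {
    os << v.index() << ' ';
    std::visit([&os](const auto& value) { os << value << ' '; }, v);
}

// Load helper: the runtime index `which` is compared against each
// compile-time index I in turn; the match tells us which type to read.
template <std::size_t I = 0, class... Ts>
void load_alternative(std::istream& is, std::size_t which, std::variant<Ts...>& v) {
    if constexpr (I < sizeof...(Ts)) {
        if (which == I) {
            // Every alternative must be default-constructible and readable with >>.
            std::variant_alternative_t<I, std::variant<Ts...>> value{};
            is >> value;
            v.template emplace<I>(std::move(value));
            assert(v.index() == I);
        } else {
            load_alternative<I + 1>(is, which, v);  // try the next type
        }
    } else {
        throw std::runtime_error("variant index out of range");
    }
}

template <class... Ts>
void load(std::istream& is, std::variant<Ts...>& v) {
    std::size_t which = 0;
    if (!(is >> which)) throw std::runtime_error("missing variant index");
    load_alternative(is, which, v);
}
```

The text format is for illustration only: a string containing spaces would not survive the round trip, and a binary format would need its own read and write helpers in place of the stream operators.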