C++ Boost.Serialization vs simple load/save

I am a computational scientist who works with large amounts of simulation data, and oftentimes I find myself saving/loading data to/from disk. For simple tasks, like a vector, this is usually as simple as dumping a bunch of numbers into a file and that's it.
For more complex stuff, like objects and such, I have save/load member functions. Now, I'm not a computer scientist, and thus oftentimes I see terminology here on SO that I just do not understand (but would love to). One of the terms I've come across recently is serialization and the Boost.Serialization library.
From what I understand, serialization is simply the process of converting your objects into something that can be saved to/loaded from disk, or be transmitted over a network and such. Considering that at most I need to save/load my objects to/from disk, is there any reason I should switch from my simple load/save functions to Boost.Serialization? What would Boost.Serialization give me other than what I'm already doing?

That library takes into account many details that might not be apparent from a purely application-level point of view.
For instance: data portability with respect to big/little-endian numeric formats, the lifetime of pointed-to data, structured containers, versioning, non-intrusive extension, and more. Moreover, it handles the interaction with other std or Boost infrastructure the right way, and dictates a way of structuring your code that will reward you with easier maintenance. You will find ready-to-use serializers for many (all?) std and Boost containers.
Also consider that if you need to share your data with someone else, referring to a published, maintained, and debugged scheme will likely make things much easier.
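To give a feel for how little code the intrusive approach takes, here is a minimal sketch (the Snapshot class and its members are invented for illustration) that saves and loads a simulation-style object holding a std::vector through a Boost binary archive:

#include <fstream>
#include <utility>
#include <vector>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/vector.hpp>  // ready-made serializer for std::vector

// Hypothetical simulation type; the class and member names are illustrative only.
class Snapshot {
    friend class boost::serialization::access;

    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & m_time;     // one line per member, used for both saving and loading
        ar & m_values;
    }

    double m_time = 0.0;
    std::vector<double> m_values;

public:
    Snapshot() = default;
    Snapshot(double t, std::vector<double> v) : m_time(t), m_values(std::move(v)) {}
};

int main() {
    {
        const Snapshot s(1.5, {0.1, 0.2, 0.3});
        std::ofstream ofs("snapshot.bin", std::ios::binary);
        boost::archive::binary_oarchive oa(ofs);
        oa << s;                       // save
    }
    Snapshot restored;
    std::ifstream ifs("snapshot.bin", std::ios::binary);
    boost::archive::binary_iarchive ia(ifs);
    ia >> restored;                    // load
}

The same serialize() function drives both saving and loading, and swapping the binary archive classes for the text archive classes changes the on-disk format without touching the class.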

Related

Script to binary conversion & Object serialization

All the code I write (C++ or AS3) is heavily scripted (JSON or XML). My problem is that parsing can be very slow at times, especially with less powerful devices like mobiles.
Here is an example of a Flash script of mine:
<players class="fanlib.gfx.TSprite" vars="x=0|y=-50|visible=Bool:true">
<player0 class="fanlib.gfx.TSprite" vars="x=131|y=138">
<name class="fanlib.text.TTextField" format="Myriad Pro,18,0xffffff,true,,,,,center,,,,0" alignX="center" alignY="bottom" filters="DropShadow,2" vars="background=Bool:false|backgroundColor=0|embedFonts=Bool:true|multiline=Bool:false|mouseEnabled=Bool:false|autoSize=center|text=fae skata|y=-40"/>
<avatar class="fanlib.gfx.FBitmap" alignX="center" alignY="center" image="userDefault.png"/>
<chip class="fanlib.gfx.FBitmap" alignX="center" alignY="center" image="chip1.png" vars="x=87|y=68"/>
<info class="fanlib.text.TTextField" format="Myriad Pro,18,0xffffff,true,,,,,center,,,,0" alignX="center" alignY="top" filters="DropShadow,2" css=".win {color: #40ff40}" vars="y=40|background=Bool:false|backgroundColor=0|embedFonts=Bool:true|multiline=Bool:false|mouseEnabled=Bool:false|autoSize=center"/>
</player0>
<player1 class="Copy:player0" vars="x=430|y=70">
<chip class="Child:chip" image="chip2.png" vars="x=-82|y=102"/>
</player1>
<player2 class="Copy:player0" vars="x=778|y=70">
<chip class="Child:chip" image="chip3.png" vars="x=88|y=103"/>
</player2>
<player3 class="Copy:player0" vars="x=1088|y=137">
<chip class="Child:chip" image="chip4.png" vars="x=-111|y=65"/>
</player3>
<player4 class="Copy:player0" vars="x=1088|y=533">
<chip class="Child:chip" image="chip5.png" vars="x=-88|y=-23"/>
</player4>
<player5 class="Copy:player0" vars="x=585|y=585">
<chip class="Child:chip" image="chip6.png" vars="x=82|y=-54"/>
</player5>
<player6 class="Copy:player0" vars="x=117|y=533">
<chip class="Child:chip" image="chip7.png" vars="x=85|y=-26"/>
</player6>
</players>
The script above creates "native" (as in "non-dynamic") Flash objects. TSprite is a Sprite descendant, FBitmap inherits from Bitmap, etc. At 71 KB, it takes tens of seconds to be parsed on my Sony Xperia.
Instead of optimizing the parser (which probably wouldn't gain much anyway), I am contemplating converting my scripts to binaries, so that the scripts are used for debugging and the finalized binaries for release builds.
One question is, how does one handle pointers from one object to another when serializing them? How are pointers translated from memory to a disk-friendly format, then back to memory?
Another question is, what about "nested" objects? In Flash for example, an object can be a graphics container of other objects. Could such a state be serialized? Or must objects be saved separately and, when loaded from disk, added to their parents through the nesting functions (i.e. addChild etc...)?
If possible, I would prefer generic guidelines that could apply to languages as different as C++ or AS3.
As far as I understand, your idea is to save some time by replacing the creation of objects from some fixture script (XML/JSON) with deserializing (from binary) previously serialized objects. If that's the case, I believe you've taken the wrong approach to this issue.
Since you've asked for general guidelines I'll try to explain my reasoning, not delving too deeply into language-specific details. Note that I'll talk about the common case and there may be exceptions from it. There is no silver bullet and you should analyze your scenario to pick the best solution for it.
From one point of view, creating a set of objects based on a fixture/script is not that different from deserializing objects from binary. In the end, both are about turning some "condensed" state into objects that you can use. Yes, it is true that binaries are usually smaller, but formats like JSON do not have that much overhead in the common case (XML is usually more redundant, though). In general you won't save much time/memory by deserializing this state from binary instead of parsing it from a script. Here is a real-world example from something I worked with: Mental Ray is a de facto standard for rendering 3D scenes/special effects in the movie industry. It uses a textual file format to represent scenes that is somewhat similar to JSON in many aspects. Mental Ray is heavy on computation and performance is one of the key issues there, yet it lives perfectly fine without a binary scene file format. So, analyzing this aspect, you can say there is no substantial difference between these two approaches.
From another point of view, there is a difference that may come into play. While deserializing an object implies only creating the object and loading state into its fields, creating an object from a script may also include some extra initialization on top of that. So, in some cases there may be a benefit to the deserialization approach.
However, in the end I would argue that it is not a good idea to simply replace your scripted objects with serialized objects, because scripting and serialization are conceptually different things with different purposes (though they do have something in common). Using the serialization approach you'll lose flexibility in modifying your fixture state (it is usually much harder for humans to edit binaries than JSON/XML) as well as the ability to do initialization work.
So, think about what you actually need in your scenario and stick with it. That's the best way.
Now, if it happens that you actually need your objects to be scripted but this approach is not fast enough, I would investigate speeding it up in one of two ways:
Analyze whether it is possible to restructure your data so that it takes less time to load. This is not always possible; however, it might be worth trying.
Analyze what else your scripting engine does to initialize objects besides simply creating them and loading state into their fields, and try to optimize that. This approach actually has the most potential, since this is the only part with a substantial performance difference between the scripting and deserialization approaches, and it does not lead to a misuse of concepts. Try to see whether you can reduce the amount of work needed to initialize an object. It may be a good idea to tailor something more specific to your needs if you are currently using some generic scripting engine/framework.
Now, answering your original questions...
how does one handle pointers from one object to another when serializing them?
References are a headache that most serialization implementations do not mess with.
One of the approaches is to use something to identify an object during serialization (its pointer, for example), serialize the object while preserving this identity, and store references from other objects to it not as a primitive type but as a reference type (basically saving the identity). When deserializing, keep track of all deserialized objects and reuse them when deserializing a reference-typed field.
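A minimal C++ sketch of that idea (the Node/SaveContext/LoadContext names are invented here): pointers become small integer identities on the way out, and a table of already-deserialized objects turns the identities back into pointers on the way in.

#include <cstdint>
#include <unordered_map>
#include <vector>

struct Node;  // some serializable type whose instances may point at each other

// While saving: every object address is mapped to a stable integer identity.
// A pointer field is then written as that identity instead of a raw address.
struct SaveContext {
    std::unordered_map<const Node*, std::uint32_t> ids;

    std::uint32_t idOf(const Node* p) {
        auto it = ids.find(p);
        if (it != ids.end()) return it->second;         // already assigned
        std::uint32_t id = static_cast<std::uint32_t>(ids.size());
        ids.emplace(p, id);                             // object body is written once, under this id
        return id;
    }
};

// While loading: every deserialized object is remembered under its identity,
// so a reference-typed field is restored by looking the identity up again.
struct LoadContext {
    std::vector<Node*> objects;                         // index == identity
    Node* resolve(std::uint32_t id) const { return objects.at(id); }
};

For what it's worth, Boost.Serialization performs this kind of object tracking for you automatically when you serialize through pointers.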
How are pointers translated from memory to a disk-friendly format, then back to memory?
It is a rarity for serialization to deal with raw memory. That approach is only good for primitive types and does not work well with pointers/references. Languages that support reflection/introspection usually use it to inspect the values of the fields when serializing objects. If we talk about pure C++, where reflection support is poor, there is no other reliable way except to make the object itself define the means to serialize itself to a byte stream and use those methods.
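In plain C++ that typically ends up as a pair of member functions that write and read each field explicitly; a hand-rolled sketch (not any particular library's API):

#include <iostream>

// Hand-rolled example type: the object itself defines how to stream its fields.
struct Vec2 {
    float x = 0.f;
    float y = 0.f;

    void write(std::ostream& out) const {
        out.write(reinterpret_cast<const char*>(&x), sizeof x);
        out.write(reinterpret_cast<const char*>(&y), sizeof y);
    }

    void read(std::istream& in) {
        in.read(reinterpret_cast<char*>(&x), sizeof x);
        in.read(reinterpret_cast<char*>(&y), sizeof y);
    }
};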
Another question is, what about "nested" objects? In Flash for example, an object can be a graphics container of other objects. Could such a state be serialized? Or must objects be saved separately and, when loaded from disk, added to their parents through the nesting functions (i.e. addChild etc...)?
If we talk about deserialization, these should probably be treated like references (see the answer above). Using methods like addChild is not a good idea, since they can contain logic that will mess things up.
Hope this answers your questions.
You should really take a look at Adobe Remote Object.
Typically using serialization could cost you problems, such as:
I have a serialized object from application version 2.3, and now in the new version 2.4 it has been modified: a property was removed / a property was added. This makes my serialized object unparsable.
While developing a serialization protocol that has to work cross-platform, you may actually wish to kill yourself while debugging. I remember doing this and spending hours to find out that my Flash code was using big-endian and my C# was using little-endian.
Adobe solved those problems for you; they created a nice binary protocol called AMF - Action Message Format. It has many implementations on various platforms that can communicate with your ActionScript.
Here you may find some C++ implementations.

Efficient ways to save and load data from C++ simulation

I would like to know which are the best way to save and load C++ data.
I am mostly interested in saving classes and matrices (not sparse) I use in my simulations.
Now I just save them as txt files, but if I add a member to a class I then have to modify the function that loads the data (it has to parse and check for the value in the txt file),
which I think is not ideal.
What would you recommend in general? (P.S. As I'd like to release my code, I'd really like to use only standard C++ or libraries that can be redistributed.)
In this case, there is no "best." What is best for you is highly dependent upon your situation. But, lets have an example to get you thinking about your details and how deep this rabbit hole can go.
If you absolutely positively must have the fastest save possible without question (and you're willing to pay the price), you can define your own memory management to put all objects into a contiguous array of a common type (such as integers). This allows you to write that array to disk as binary data very rapidly. You might need this in a simulation that uses threads efficiently to load every core/processor to run at real time.
Why is this a rather horrible solution? Because it takes a LOT of work and runs many risks of problems in the name of "optimization."
It requires you to build your own memory management (operator new() and operator delete()), which may need to be thread safe.
If you try to load from this array, you will have to placement-new all objects with a unique non-modifying constructor in order to ensure all virtual pointers are set properly (sketched just below these points). Oh, and you have to track the type at each address to know how to do this.
For portability with other systems and between versions of the binary, you will need utilities to convert from the binary format to something generic enough to be cross-platform (including repopulating pointers to other objects).
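To make the placement-new point concrete, here is a sketch of the trick being described (the type and tag names are made up; whether this is actually well-defined behavior is exactly the kind of risk being warned about):

#include <new>  // placement new

struct RestoreTag {};  // tag type invented for this sketch

struct Particle {
    virtual ~Particle() {}
    Particle() : x(0), y(0), z(0) {}     // normal construction
    explicit Particle(RestoreTag) {}     // deliberately initializes nothing
    double x, y, z;
};

// After the raw block has been read back from disk, the member bytes are
// correct but the virtual-table pointer is stale. Re-running a constructor
// in place resets the vptr while leaving the already-loaded members alone.
void fixUpParticle(void* addressInLoadedBlock) {
    new (addressInLoadedBlock) Particle(RestoreTag{});
}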
I have done this. It was highly unpleasant. I have no doubt there are still problems with it and I have only listed a few here. But, it was very, very fast and very, very, very problematic.
You must design to your needs. Generally, the first need is "Make it work." Don't worry about efficiency; just aim for something that accurately persists and for which you have the necessary information known and accessible at the point where you do it. Also, you should encapsulate the process of saving and loading. Then, if the need "Make it better" steps in, you should be able to change that one bit of code and the rest should still work. You might even make the saving format selectable based on user needs, rather than assuming one set of needs for all users.
Given all the assumptions, pros and cons listed, you should be able to elaborate your particular needs for this question.
Given that performance is not your concern -- which is a critical part of the answer -- the Boost Serialization library is a great answer.
The link in the comment leads to the documentation. Read the tutorial (which is overkill for what you initially want, but well worth it).
Finally, since you have mostly array matrices, try to encapsulate the entire process of save and load, so that should you need to change it later, you are just writing a new implementation and choosing between the existing ones. I expect the added time for the smarts of Boost Serialization would not be great; however, you might find a future requirement moves you to something else, or multiple something elses.
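As a sketch of that encapsulation for the dense-matrix case (the Matrix type and the two function names are placeholders), only these two functions would need rewriting if you later moved away from Boost:

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/vector.hpp>

// Placeholder dense matrix: rows*cols values stored in a flat vector.
struct Matrix {
    std::size_t rows = 0, cols = 0;
    std::vector<double> data;

    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & rows & cols & data;
    }
};

// All knowledge of the on-disk format lives behind these two functions.
void saveMatrix(const Matrix& m, const std::string& path) {
    std::ofstream ofs(path, std::ios::binary);
    boost::archive::binary_oarchive oa(ofs);
    oa << m;
}

Matrix loadMatrix(const std::string& path) {
    Matrix m;
    std::ifstream ifs(path, std::ios::binary);
    boost::archive::binary_iarchive ia(ifs);
    ia >> m;
    return m;
}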
The C++ Middleware Writer automates the creation of marshalling functions. When you add a member to a class, it updates the marshalling functions for you.

Good Design for C++ Serialization

I'm currently searching for a good OO design to serialize a C++/Qt application.
Imagine the classes of the application organized in a tree structure, implemented with the Composite pattern, like in the following picture.
The two possible principles I thought of:
1.)
Put save()/load() functions in every class that has to be serializable.
I have seen this many times, usually implemented with Boost.
Somewhere in the class you will find something like this:
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
    ar & m_member1;
}
You could also separate this into save() and load().
But the disadvantage of this approach is:
If you want to change the serialization two months later (to XML, HTML, or something very curious that Boost does not support), you have to adapt all of the thousands of classes.
Which in my opinion is not a good OO design.
And if you want to support different serializations (XML, binary, ASCII, whatever...) at the same time, then 80% of the cpp exists just for serialization functions.
2.)
I know Boost also provides a non-intrusive version of the serialization:
http://www.boost.org/doc/libs/1_49_0/libs/serialization/doc/tutorial.html
So another way is to implement an iterator which iterates over the composite tree structure and serializes every object (and one iterator for deserialization).
(I think this is what the XmlSerializer class of the .NET Framework does, but I'm not really familiar with .NET.)
This sounds better, because save() and load() are separated from the classes and there is only one spot to change if the serialization changes.
So this sounds better, BUT:
- You have to provide a setter() and a getter() for every parameter you want to serialize. (So there is no private data anymore. Is this good/bad?)
- You could have a long inheritance hierarchy (more than 5 classes) hanging on the composite tree.
So how do you call the setter()/getter() of the derived classes,
when you can only call an interface function of the base Composite component?
Another way is to serialize the object's data into a separate abstract format,
from which all the possible subsequent serializations (XML, text, whatever is possible) get their data.
One idea was to serialize it to a QDomNode.
But I think the extra abstraction will cost performance.
So my question is:
Does anyone know a good OO-Design for serialization?
Maybe from other programming languages like Java, Python, C#, whatever...
Thank you.
Beware of serialization.
Serialization is about taking a snapshot of your in-memory representation and restoring it later on.
This is all great, except that it starts fraying at the seams when you think about loading a previously stored snapshot with a newer version of the software (Backward Compatibility) or (god forbid) a recently stored snapshot with an older version of the software (Forward Compatibility).
Many structures can easily deal with backward compatibility; however, forward compatibility requires that your newer format stays very close to its previous iteration: basically, you just add/remove some fields but keep the same overall structure.
The problem is that serialization, for performance reasons, tends to tie the on-disk structure to the in-memory representation; changing the in-memory representation then requires either deprecating the old archives or providing a migration utility.
On the other hand, messaging systems (and this is what google protobuf is) are about decoupling the exchanged messages structures from the in-memory representation so that your application remains flexible.
Therefore, you first need to choose whether you will implement serialization or messaging.
Now you are right that you can either write the save/load code within the class or outside it. This is once again a trade-off:
in-class code has immediate access to all-members, usually more efficient and straightforward, but less flexible, so it goes hand in hand with serialization
out-of-class code requires indirect access (getters, visitors hierarchy), less efficient, but more flexible, so it goes hand in hand with messaging
Note that there is no drawback about hidden state. A class has no (truly) hidden state:
caches (mutable values) are just that, they can be lost without worry
hidden types (think FILE* or other handle) are normally recoverable through other ways (serializing the name of the file for example)
...
Personally I use a mix of both.
Caches are written for the current version of the program and use fast (de)serialization, starting at v1. New code is written to work with both v1 and v2, and writes v1 by default until the previous version disappears; then it switches to writing v2 (assuming that's easy). Occasionally a massive refactoring makes backward compatibility too painful, and we drop it on the floor at that point (and increment the major digit).
On the other hand, exchanges with other applications/services and more durable storage (blobs in database or in files) use messaging because I don't want to tie myself down to a particular code structure for the next 10 years.
Note: I am working on server applications, so my advice reflects the particulars of such an environment. I imagine client-side apps have to support old versions forever...

Loading and saving a class to a binary file

I don't know if this is possible, but I have a class and I've made an instance of it. I also put things in it. It has vectors and other things. I was wondering if I could save its contents (the instance) to a binary file, then reload it and cast it back in from the file. Thanks.
Yes, sometimes, kinda...
Serialization is a tricky problem. Don't solve it yourself (i.e. don't reinvent the wheel... plenty of smart people have already done this). What you've described works in a constrained environment:
Your reading and writing machines have the same endianness.
Your class contains data only within its footprint (no pointers or objects with pointers).
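Under exactly those two constraints (same endianness, trivially copyable data with no pointers), the naive version really is just writing the object's bytes; a rough sketch with made-up names:

#include <fstream>
#include <type_traits>

struct State {                 // plain data only: no pointers, vectors or virtuals
    int    tick;
    double position[3];
};
static_assert(std::is_trivially_copyable<State>::value,
              "raw dumping only works for trivially copyable data");

void save(const State& s, const char* path) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&s), sizeof s);
}

bool load(State& s, const char* path) {
    std::ifstream in(path, std::ios::binary);
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&s), sizeof s));
}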
This isn't for the real world
the real world usually needs something better
the real world usually wants backward compatibility across changes
the real world usually can't anticipate hardware changes
You probably want to look into different serialization schemes. They have their own pluses and minuses, which you'll find plenty of information about on Stack Overflow.
To get you started, look into Google's protocol buffers, boost serialization and XML.
Whenever there's a C++ question, the answer is likely to be Boost. Check Boost Serialization.
An alternative to Boost, if you want one, is s11n.

boost serialization vs google protocol buffers? [closed]

Does anyone with experience with these libraries have any comment on which one they preferred? Were there any performance differences or difficulties in using?
I've been using Boost Serialization for a long time and just dug into protocol buffers, and I think they don't have the exact same purpose. BS (didn't see that coming) saves your C++ objects to a stream, whereas PB is an interchange format that you read to/from.
PB's data model is way simpler: you get all kinds of ints and floats, strings, arrays, basic structs, and that's pretty much it. BS allows you to directly save all of your objects in one step.
That means with BS you get more data on the wire but you don't have to rebuild all of your object structure, whereas protocol buffers are more compact but there is more work to be done after reading the archive. As the name says, one is for protocols (language-agnostic, space-efficient data passing), the other is for serialization (no-brainer object saving).
So what is more important to you: speed/space efficiency or clean code?
I've played around a little with both systems, nothing serious, just some simple hackish stuff, but I felt that there's a real difference in how you're supposed to use the libraries.
With boost::serialization, you write your own structs/classes first, and then add the archiving methods, but you're still left with some pretty "slim" classes, that can be used as data members, inherited, whatever.
With protocol buffers, the amount of code generated for even a simple structure is pretty substantial, and the structs and code that are generated are more meant to be operated on; you use protocol buffers' functionality to transport data to and from your own internal structures.
There are a couple of additional concerns with boost.serialization that I'll add to the mix. Caveat: I don't have any direct experience with protocol buffers beyond skimming the docs.
Note that while I think boost, and boost.serialization, is great at what it does, I have come to the conclusion that the default archive formats it comes with are not a great choice for a wire format.
It's important to distinguish between versions of your class (as mentioned in other answers, boost.serialization has some support for data versioning) and compatibility between different versions of the serialization library.
Newer versions of boost.serialization may not generate archives that older versions can deserialize. (The reverse is not true: newer versions are always intended to deserialize archives made by older versions.) This has led to the following problems for us:
Both our client & server software create serialized objects that the other consumes, so we can only move to a newer boost.serialization if we upgrade both client and server in lockstep. (This is quite a challenge in an environment where you don't have full control of your clients).
Boost comes bundled as one big library with shared parts, and both the serialization code and the other parts of the Boost library (e.g. shared_ptr) may be in use in the same file, so I can't upgrade any part of Boost because I can't upgrade boost.serialization. I'm not sure if it's possible/safe/sane to attempt to link multiple versions of Boost into a single executable, or whether we have the budget/energy to refactor out the bits that need to remain on an older version of Boost into a separate executable (DLL in our case).
The old version of boost we're stuck on doesn't support the latest version of the compiler we use, so we're stuck on an old version of the compiler too.
Google seem to actually publish the protocol buffers wire format, and Wikipedia describes them as forwards-compatible, backwards-compatible (although I think Wikipedia is referring to data versioning rather than protocol buffer library versioning). Whilst neither of these is a guarantee of forwards-compatibility, it seems like a stronger indication to me.
In summary, I would prefer a well-known, published wire format like protocol buffers when I don't have the ability to upgrade client & server in lockstep.
Footnote: shameless plug for a related answer by me.
Boost Serialisation
is a library for writing data into a stream.
does not compress data.
does not support data versioning automatically.
supports STL containers.
properties of data written depend on streams chosen (e.g. endian, compressed).
Protocol Buffers
generates code from interface description (supports C++, Python and Java by default. C, C# and others by 3rd party).
optionally compresses data.
handles data versioning automatically.
handles endian swapping between platforms.
does not support STL containers.
Boost serialisation is a library for converting an object into a serialised stream of data. Protocol Buffers do the same thing, but also do other work for you (like versioning and endian swapping). Boost serialisation is simpler for "small simple tasks". Protocol Buffers are probably better for "larger infrastructure".
EDIT:24-11-10: Added "automatically" to BS versioning.
I have no experience with boost serialization, but I have used protocol buffers. I like protocol buffers a lot. Keep the following in mind (I say this with no knowledge of boost).
Protocol buffers are very efficient so I don't imagine that being a serious issue vs. boost.
Protocol buffers provide an intermediate representation that works with other languages (Python and Java... and more in the works). If you know you're only using C++, maybe boost is better, but the option to use other languages is nice.
Protocol buffers are more like data containers... there is no object oriented nature, such as inheritance. Think about the structure of what you want to serialize.
Protocol buffers are flexible because you can add "optional" fields. This basically means you can change the structure of protocol buffer without breaking compatibility.
Hope this helps.
boost.serialization just needs the C++ compiler and gives you some syntax sugar like
oarchive << obj_to_save;
// ...
iarchive >> obj_to_load;
for saving and loading. If C++ is the only language you use you should give boost.serialization a serious shot.
I took a quick look at Google protocol buffers. From what I see, I'd say it's not directly comparable to boost.serialization. You have to add a compiler for the .proto files to your toolchain and maintain the .proto files themselves. The API doesn't integrate into C++ the way boost.serialization does.
boost.serialization does the job it's designed for very well: serializing C++ objects :)
OTOH, a query API like the one Google protocol buffers provides gives you more flexibility.
Since I have only used boost.serialization so far, I cannot comment on a performance comparison.
Correction to the above (I guess this is that answer) about Boost Serialization:
It DOES allow supporting data versioning.
If you need compression, use a compressed stream (see the sketch just after this list).
Can handle endian swapping between platforms as encoding can be text, binary or XML.
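On the compression point, the usual recipe is to layer the archive on top of a boost::iostreams filtering stream, so the archive itself never knows about the compression; a sketch (gzip shown, the zlib/bzip2 filters work the same way, and the helper name is made up):

#include <fstream>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>

// Hypothetical helper: serializes any Boost-serializable object through gzip.
template <class T>
void saveCompressed(const T& obj, const char* path) {
    std::ofstream file(path, std::ios::binary);
    boost::iostreams::filtering_ostream out;
    out.push(boost::iostreams::gzip_compressor());  // compression belongs to the stream...
    out.push(file);
    boost::archive::binary_oarchive oa(out);        // ...the archive never knows about it
    oa << obj;
}

Loading mirrors this with filtering_istream, gzip_decompressor and binary_iarchive; linking requires Boost.Iostreams and zlib.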
I never implemented anything using Boost's library, but I found Google's protobuf to be more thought-out, and the code is much cleaner and easier to read. I would suggest having a look at the various languages you want to use it with, having a read through the code and the documentation, and making up your mind.
The one difficulty I had with protobufs was they named a very commonly used function in their generated code GetMessage(), which of course conflicts with the Win32 GetMessage macro.
I would still highly recommend protobufs. They're very useful.
I know that this is an older question now, but I thought I'd throw my 2 pence in!
With Boost you get the opportunity to write some data validation into your classes; this is good because the data definition and the checks for validity are all in one place.
With GPB the best you can do is to put comments in the .proto file and hope against all hope that whoever is using it reads it, pays attention to it, and implements the validity checks themselves.
Needless to say, this is unlikely and unreliable if you're relying on someone else at the other end of a network stream to do this with the same vigour as oneself. Plus, if the constraints on validity change, multiple code changes need to be planned, coordinated and done.
Thus I consider GPB to be inappropriate for developments where there is little opportunity to regularly meet and talk with all team members.
==EDIT==
The kind of thing I mean is this:
message Foo
{
int32 bearing = 1;
}
Now who's to say what the valid range of bearing is? We can have
message Foo
{
int32 bearing = 1; // Valid between 0 and 359
}
But that depends on someone else reading this and writing code for it. For example, if you edit it and the constraint becomes:
message Foo
{
int32 bearing = 1; // Valid between -180 and +180
}
you are completely dependent on everyone who has used this .proto updating their code. That is unreliable and expensive.
At least with Boost serialisation you're distributing a single C++ class, and that can have data validity checks built right into it. If those constraints change, then no one else need do any work other than making sure they're using the same version of the source code as you.
Alternative
There is an alternative: ASN.1. This is ancient, but has some really, really handy things:
Foo ::= SEQUENCE
{
bearing INTEGER (0..359)
}
Note the constraint. So whenever anyone consumes this .asn file and generates code, they end up with code that will automatically check that bearing is somewhere between 0 and 359. If you update the .asn file,
Foo ::= SEQUENCE
{
bearing INTEGER (-180..180)
}
all they need to do is recompile. No other code changes are required.
You can also do:
bearingMin INTEGER ::= 0
bearingMax INTEGER ::= 360
Foo ::= SEQUENCE
{
bearing INTEGER (bearingMin..<bearingMax)
}
Note the <. And also in most tools the bearingMin and bearingMax can appear as constants in the generated code. That's extremely useful.
Constraints can be quite elaborate:
Garr ::= INTEGER (0..10 | 25..32)
Look at Chapter 13 in this PDF; it's amazing what you can do.
Arrays can be constrained too:
Bar ::= SEQUENCE (SIZE(1..5)) OF Foo
Sna ::= SEQUENCE (SIZE(5)) OF Foo
Fee ::= SEQUENCE
{
boo SEQUENCE (SIZE(1..<6)) OF INTEGER (-180<..<180)
}
ASN.1 is old-fashioned, but still actively developed, widely used (your mobile phone uses it a lot), and far more flexible than most other serialisation technologies. About the only deficiency I can see is that there is no decent code generator for Python. If you're using C/C++, C#, Java or ADA, then you are well served by a mixture of free (C/C++, ADA) and commercial (C/C++, C#, Java) tools.
I especially like the wide choice of binary and text based wireformats. This makes it extremely convenient in some projects. The wireformat list currently includes:
BER (binary)
PER (binary, aligned and unaligned; this is ultra bit-efficient. For example, an INTEGER constrained between 0 and 15 will take up only 4 bits on the wire)
OER
DER (another binary)
XML (also XER)
JSON (brand new, tool support is still developing)
plus others.
Note the last two? Yes, you can define data structures in ASN.1, generate code, and emit / consume messages in XML and JSON. Not bad for a technology that started off back in the 1980s.
Versioning is done differently to GPB. You can allow for extensions:
Foo ::= SEQUENCE
{
bearing INTEGER (-180..180),
...
}
This means that at a later date I can add to Foo, and older systems that have this version can still work (but can only access the bearing field).
I rate ASN.1 very highly. It can be a pain to deal with (tools might cost money, the generated code isn't necessarily beautiful, etc.), but the constraints are a truly fantastic feature that has saved me a whole ton of heartache time and time again. It makes developers whinge a lot when the encoders/decoders report that they've generated duff data.
Other links:
Good intro
Open source C/C++ compiler
Open source compiler, does ADA too AFAIK
Commercial, good
Commercial, good
Try it yourself online
Observations
To share data:
Code first approaches (e.g. Boost serialisation) restrict you to the original language (e.g. C++), or force you to do a lot of extra work in another language
Schema first is better, but
A lot of these leave big gaps in the sharing contract (i.e. no constraints). GPB is annoying in this regard, because it is otherwise very good.
Some have constraints (e.g. XSD, JSON), but suffer patchy tool support.
For example, Microsoft's xsd.exe actively ignores constraints in xsd files (MS's excuse is truly feeble). XSD is good (from the constraints point of view), but if you cannot trust the other guy to use a good XSD tool that enforces them for him/her then the worth of XSD is diminished
JSON validators are OK, but they do nothing to help you form the JSON in the first place, and aren't automatically called. There's no guarantee that someone sending you a JSON message has run it through a validator. You have to remember to validate it yourself.
ASN.1 tools all seem to implement the constraints checking.
So for me, ASN.1 does it. It's the one that is least likely to result in someone else making a mistake, because it's the one with the right features, whose tools all seemingly endeavour to fully implement those features, and which is language-neutral enough for most purposes.
To be honest, if GPB added a constraints mechanism, that'd be the winner. XSD is close, but the tools are almost universally rubbish. If there were decent code generators for other languages, JSON schema would be pretty good.
If GPB had constraints added (note: this would not change any of the wire formats), that'd be the one I'd recommend to everyone for almost every purpose. Though ASN.1's uPER is very useful for radio links.
As with almost everything in engineering, my answer is... "it depends."
Both are well-tested, vetted technologies. Both will take your data and turn it into something friendly for sending someplace. Both will probably be fast enough, and if you're really counting a byte here or there, you're probably not going to be happy with either (let's face it, the packets created by both will be a small fraction of the size of XML or JSON).
For me, it really comes down to workflow and whether or not you need something other than C++ on the other end.
If you want to figure out your message contents first and you're building a system from scratch, use Protocol Buffers. You can think of the message in an abstract way and then auto-generate the code in whatever language you want (3rd party plugins are available for just about everything). Also, I find collaboration simplified with Protocol Buffers: I just send over a .proto file and the other team has a clear idea of what data is being transferred. I also don't impose anything on them. If they want to use Java, go ahead!
If I have already built a class in C++ (and this has happened more often than not) and I want to send that data over the wire now, Boost Serialization obviously makes a ton of sense (especially where I already have a Boost dependency somewhere else).
You can use Boost serialization in tight conjunction with your "real" domain objects and serialize the complete object hierarchy (inheritance). Protobuf does not support inheritance, so you will have to use aggregation. People argue that Protobuf should be used for DTOs (data transfer objects) and not for core domain objects themselves. I have used both boost::serialization and protobuf. The performance of boost::serialization should be taken into account; cereal might be an alternative.