XML vs YAML vs JSON for a 2D RPG [duplicate] - c++

I can't figure out whether or not to use XML, YAML, or JSON for a C++ 2D RPG.
Here are my thoughts:
I need something which is simple to save not just player data, but environment data, such as object (x, y) coordinates; load times; dates; graphics configurations, etc.
I need something flexible, easy to use, and definitely lightweight, yet powerful enough to handle the above.
Which is the best choice? I have experience with JSON in JavaScript, but not C++. Are there any good references for parsing JSON in C++ if this is the route to go?
Honestly, if a text file seems like the simplest and most effective solution for something like this (especially if I can just write it to binary), then I'm all ears.
Edit 2
Feel free to provide other suggestions as well.

I would use the simplest thing that satisfies your requirements.
If you don't need hierarchical storage, then flat tabular files are so much easier to deal with than anything else. All you have to do is read lines off disk and split on tab.
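As a minimal sketch of the "read lines and split on tab" idea (the names split_tabs and parse_table are made up for this example, and skipping blank lines is an assumption about the file layout):

```cpp
#include <sstream>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Split one line of a tab-separated file into its fields.
// Empty fields are kept, so "a\t\tb" yields {"a", "", "b"}.
Row split_tabs(const std::string& line) {
    Row fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type tab = line.find('\t', start);
        if (tab == std::string::npos) {
            fields.push_back(line.substr(start));  // last field on the line
            return fields;
        }
        fields.push_back(line.substr(start, tab - start));
        start = tab + 1;
    }
}

// Parse a whole table held in memory; blank lines are skipped.
// For a file, the same loop works on an std::ifstream.
std::vector<Row> parse_table(const std::string& text) {
    std::vector<Row> rows;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) rows.push_back(split_tabs(line));
    }
    return rows;
}
```

A call such as parse_table("tree\t10\t20\n") gives one row with the three fields "tree", "10" and "20"; converting the numeric fields is left to the caller.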
If you are looking at more of a key/value pair type of storage (as opposed to lists of things), then INI files can be reasonable. This format has a lot of flexibility, though reasoning about it can be less approachable when you start doing more complicated things than it was designed for.
If you need hierarchical storage, it's possible that JSON would be simpler. There are JSON libraries in a wide range of languages, and it sounds like you are already familiar with it.
sqlite may be another option. There be dragons in SQL, but with a nice C++ wrapper around sqlite, it can be manageable. The primary benefit would be ACID, in my opinion.
The YAML spec looks somewhat lengthy, so I can guess that it has more kitchen sinks. Just skimming the libyaml docs, the API looks somewhat like SAX interfaces that I've used in the past. I have no a posteriori knowledge of it, but I would be reluctant to start using it without a good reason.
XML sucks to deal with; don't opt in to it. There are lots of reasons for this. The most relevant one in my mind is that it's prone to make the code that uses it more complicated than it would be otherwise. In any system I've seen designed with XML, reasoning about the XML is more complicated than the design interests that it's trying to support. There are valid uses for it, though it's rare that another storage system wouldn't have been just as adequate.
Regardless of which one you choose, write as little code as you can managing it. You really want to write the classes your engine will use first. Then worry about serializing them. If you let your serialization influence your class design, you'll probably regret it. :)
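To illustrate "classes first, serialization second", here is a rough sketch. Entity, to_line and from_line are hypothetical names, and the "name<TAB>x<TAB>y" layout is just an arbitrary choice for the example:

```cpp
#include <sstream>
#include <string>

// Engine-side type: it knows nothing about files or formats.
struct Entity {
    std::string name;
    int x;
    int y;
};

// Serialization lives in free functions, so the storage format can be
// swapped later without touching Entity itself.
std::string to_line(const Entity& e) {
    std::ostringstream out;
    out << e.name << '\t' << e.x << '\t' << e.y;
    return out.str();
}

// Returns false (and leaves e untouched) if the line does not match
// the expected layout.
bool from_line(const std::string& line, Entity& e) {
    std::istringstream in(line);
    Entity parsed{};
    if (!std::getline(in, parsed.name, '\t')) return false;
    if (!(in >> parsed.x >> parsed.y)) return false;
    e = parsed;
    return true;
}
```

If you later move to JSON or sqlite, only these two functions should need to change.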


Sharing data structures between perl and cpp

I have a perl script which generates a very large data structure (which starts life as an array of array references). This is then written to a text file using some weird home-brew serialisation scheme.
The data from the text file is stored as the value in a key-value store db.
A c++ file then retrieves the data, and deserializes it (into a hashmap, although can potentially be flexible on how this data is structured).
What I'm interested in is finding if there are any good ways of sharing a data structure between perl and c++ (something like Storable, but that is meant for perl->perl not perl->c++). The current method is a headache to maintain, and may not have the best performance.
The most important factors are speed of deserialisation, and the size of the serialized structure in that order. Anyone know of something that might do the trick?
Storable is one way to dump and load perl data structures. I wouldn't actually recommend it for general usage though - it's handy in that it's part of core and easy to use.
But for multi-platform (and language) portability, it's far better to use a standard data representation. Which you choose is probably a matter of what sort of data you're holding in your structure, but core contenders are:
JSON - good for arrays and hashes (key-value).
YAML - Excellent for 'config file' style data (but extends in ways similar to JSON)
And if you must, XML - but bear in mind that XML is designed for documents-with-metadata, and so IMO isn't suitable for most of the applications it's used for.
As standards, they've got documented formatting and parsers are widely available. And implementing your own isn't too hard, if that's the route you want to go. Just make sure you follow the spec and you're good.
Note - that because XML and JSON (and I think YAML?) are recursive, you can parse as a stream, rather than a standalone object. (Trap, process and discard as you hit 'close brackets' in JSON, or 'close tags' in XML).
easy job.
I like perl , and I also like C/C++. To make the best of both,
I wrote a github project to solve this issue.
please see:
a short example is here :
Int("a") ; // a= 1024

Tiny C++ YAML reader/writer

I'm writing an embedded C++ program, and need to add serialization/deserialization. The format should be human readable and writeable, and I would much prefer to use (a subset of) a standard format like YAML. I also prefer YAML to JSON since it is more concise.
While yaml-cpp has the exact functionality I'd like, the source code is almost 300K and would almost double my code size, which seems excessive to me just in order to add human readable serialization/deserialization.
Before I start writing my own reader/writer for a subset of YAML, I'd like to first check whether this already exists? I have not been able to find one, but would much prefer to use existing code rather than rolling my own. Are there any C or C++ YAML readers/writers out there of, say, 50K code or less? I only need functionality for the basic data structures (scalar, array, hash), not any advanced stuff.
With many thanks in advance.
The Oops library is doing what you are looking for. It is written for serialization using reflection and supports YAML format as well.

C++ Boost.serialization vs simple load/save

I am a computational scientist who works with large amounts of simulation data, and I often find myself saving/loading data to/from disk. For simple tasks, like a vector, this is usually as simple as dumping a bunch of numbers into a file and that's it.
For more complex stuff, life objects and such, I have save/load member functions. Now, I'm not a computer scientist, and thus I often see terminology here on SO that I just do not understand (but would love to). One of the things I've come across recently is the subject of serialization and the Boost.Serialization library.
From what I understand, serialization is simply the process of converting your objects into something that can be saved to/loaded from disk or transmitted over a network and such. Considering that at most I need to save/load my objects to/from disk, is there any reason I should switch from the simple load/save functions to Boost.Serialization? What would Boost.Serialization give me other than what I'm already doing?
That library takes into account many details that may not be very apparent from a purely 'applicative' point of view.
For instance, data portability WRT big/little endianness, lifetime of pointed-to data, structured containers, versioning, non-intrusive extensions, and more. Moreover, it handles interaction with other std or boost infrastructure the right way, and dictates a way of structuring code that will reward you with easier code maintenance. You will find ready-to-use serializers for many (all std & boost?) containers.
And consider that if you need to share your data with someone else, chances are that referring to a published, maintained, and debugged schema will make things much easier.
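To give a feel for one of those details, versioning, here is a sketch of what a hand-rolled save/load has to do once a field is added to a class. This is not Boost.Serialization code; Particle, the text layout and the default mass of 1.0 are all assumptions made up for the example:

```cpp
#include <iomanip>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

struct Particle {
    double x;
    double y;
    double mass;  // added in format version 2
};

const int kCurrentVersion = 2;

// Write "version x y mass", with enough digits for doubles to round-trip.
void save(std::ostream& out, const Particle& p) {
    out << kCurrentVersion << ' '
        << std::setprecision(std::numeric_limits<double>::max_digits10)
        << p.x << ' ' << p.y << ' ' << p.mass << '\n';
}

// Read either format version; old files get a default mass.
bool load(std::istream& in, Particle& p) {
    int version = 0;
    Particle tmp{};
    if (!(in >> version >> tmp.x >> tmp.y)) return false;
    if (version == 1) {
        tmp.mass = 1.0;  // assumed default for files written before the field existed
    } else if (version == 2) {
        if (!(in >> tmp.mass)) return false;
    } else {
        return false;    // written by a newer program than this one
    }
    p = tmp;
    return true;
}

// Convenience wrapper: serialize into a string.
std::string to_text(const Particle& p) {
    std::ostringstream out;
    save(out, p);
    return out.str();
}
```

Every class you save needs this kind of bookkeeping, and that is before pointers, containers or binary portability enter the picture.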

Converting from XML to a C++ Object

I'm working on a C++ project, and wanted to get some inputs from developers with similar experience.
The task is to connect to a web service which gives the results in an XML form. My role in the task is once I receive the XML form, I need to convert the XML into a C++ object and parse the XML data to the C++ object.
Following are my clarifications.
a) One way is to handcraft the whole thing but I need to do this for around hundreds of web services. I am aware there are simpler tools for C# and Java to do the same.
Is there a tool/utility for C++ too?
Any suggestions, would be helpful.
In the past, I've used TinyXML for my XML parsing needs. My parsing code operated under the assumption that all XML input conforms to a particular XSD schema I wrote. It worked fairly well but the ripple effects were annoying - if I wanted to change the XSD, I had to update all my XML test files as well as my parsing code. While it's not so bad in the case of parsing one schema, I'd hate to have to do it for hundreds of them.
I'm not sure what the common solution is, but CodeSynthesis XSD sounds pretty promising. I haven't used it, but it appears that it generates a data layer, a parser and serialisation code for you. Could save you a lot of time.
If you're asking if there's a way to dynamically create an object representation of an XML data stream (such that you can access it like topLevel.subObject.value), it's not possible. C++ is a statically-typed language, which means all objects need to be defined at compile time. The best you could do is something like: xmlData.getSubObject("objectName").getValue().
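A rough sketch of what such a dynamic interface could look like is below. Node and addChild are hypothetical names for this example; a real XML library would fill the tree from the parsed document rather than by hand:

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// A generic tree node: children are looked up by name at run time,
// because their names are not known at compile time.
class Node {
public:
    explicit Node(std::string value = "") : value_(std::move(value)) {}

    // Add (or replace) a named child and return a reference to it.
    Node& addChild(const std::string& name, const std::string& value = "") {
        std::unique_ptr<Node>& slot = children_[name];
        slot.reset(new Node(value));
        return *slot;
    }

    // Throws std::out_of_range if there is no child with that name.
    const Node& getSubObject(const std::string& name) const {
        const auto it = children_.find(name);
        if (it == children_.end()) throw std::out_of_range("no child named " + name);
        return *it->second;
    }

    const std::string& getValue() const { return value_; }

private:
    std::string value_;
    std::map<std::string, std::unique_ptr<Node>> children_;
};
```

The price of this flexibility is that a misspelled name is only caught at run time, not by the compiler.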
As for toolsets for parsing into something usable dynamically (as per my latter example), there are several. For Windows, for example, you could use the "built-in" MSXML objects. There's nothing in the base C++ libraries to do so, however, as far as I am aware.
Hope that helps.

boost serialization vs google protocol buffers? [closed]

Does anyone with experience with these libraries have any comment on which one they preferred? Were there any performance differences or difficulties in using?
I've been using Boost Serialization for a long time and just dug into protocol buffers, and I think they don't have the exact same purpose. BS (didn't see that coming) saves your C++ objects to a stream, whereas PB is an interchange format that you read to/from.
PB's datamodel is way simpler: you get all kinds of ints and floats, strings, arrays, basic structure and that's pretty much it. BS allows you to directly save all of your objects in one step.
That means with BS you get more data on the wire but you don't have to rebuild all of your objects structure, whereas protocol buffers is more compact but there is more work to be done after reading the archive. As the name says, one is for protocols (language-agnostic, space efficient data passing), the other is for serialization (no-brainer objects saving).
So what is more important to you: speed/space efficiency or clean code?
I've played around a little with both systems, nothing serious, just some simple hackish stuff, but I felt that there's a real difference in how you're supposed to use the libraries.
With boost::serialization, you write your own structs/classes first, and then add the archiving methods, but you're still left with some pretty "slim" classes, that can be used as data members, inherited, whatever.
With protocol buffers, the amount of code generated for even a simple structure is pretty substantial, and the generated structs and code are meant more for operating on; the idea is that you use protocol buffers' functionality to transport data to and from your own internal structures.
There are a couple of additional concerns with boost.serialization that I'll add to the mix. Caveat: I don't have any direct experience with protocol buffers beyond skimming the docs.
Note that while I think boost, and boost.serialization, is great at what it does, I have come to the conclusion that the default archive formats it comes with are not a great choice for a wire format.
It's important to distinguish between versions of your class (as mentioned in other answers, boost.serialization has some support for data versioning) and compatibility between different versions of the serialization library.
Newer versions of boost.serialization may not generate archives that older versions can deserialize. (the reverse is not true: newer versions are always intended to deserialize archives made by older versions). This has led to the following problems for us:
Both our client & server software create serialized objects that the other consumes, so we can only move to a newer boost.serialization if we upgrade both client and server in lockstep. (This is quite a challenge in an environment where you don't have full control of your clients).
Boost comes bundled as one big library with shared parts, and both the serialization code and the other parts of the boost library (e.g. shared_ptr) may be in use in the same file, so I can't upgrade any part of boost because I can't upgrade boost.serialization. I'm not sure if it's possible/safe/sane to attempt to link multiple versions of boost into a single executable, or if we have the budget/energy to refactor out bits that need to remain on an older version of boost into a separate executable (DLL in our case).
The old version of boost we're stuck on doesn't support the latest version of the compiler we use, so we're stuck on an old version of the compiler too.
Google seem to actually publish the protocol buffers wire format, and Wikipedia describes them as forwards-compatible, backwards-compatible (although I think Wikipedia is referring to data versioning rather than protocol buffer library versioning). Whilst neither of these is a guarantee of forwards-compatibility, it seems like a stronger indication to me.
In summary, I would prefer a well-known, published wire format like protocol buffers when I don't have the ability to upgrade client & server in lockstep.
Footnote: shameless plug for a related answer by me.
Boost Serialisation
is a library for writing data into a stream.
does not compress data.
does not support data versioning automatically.
supports STL containers.
properties of data written depend on streams chosen (e.g. endian, compressed).
Protocol Buffers
generates code from interface description (supports C++, Python and Java by default. C, C# and others by 3rd party).
optionally compresses data.
handles data versioning automatically.
handles endian swapping between platforms.
does not support STL containers.
Boost serialisation is a library for converting an object into a serialised stream of data. Protocol Buffers do the same thing, but also do other work for you (like versioning and endian swapping). Boost serialisation is simpler for "small simple tasks". Protocol Buffers are probably better for "larger infrastructure".
EDIT:24-11-10: Added "automatically" to BS versioning.
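The "endian swapping" entries in the lists above boil down to always writing integers in one agreed byte order, whatever the host CPU uses. A minimal sketch, assuming a little-endian wire order (function names are made up for the example; this is not code from either library):

```cpp
#include <array>
#include <cstdint>

// Encode a 32-bit value as 4 bytes, least significant byte first.
// Shifts work on values, not memory, so the result is the same on
// big-endian and little-endian hosts.
std::array<std::uint8_t, 4> to_little_endian(std::uint32_t v) {
    return {{
        static_cast<std::uint8_t>(v & 0xFFu),
        static_cast<std::uint8_t>((v >> 8) & 0xFFu),
        static_cast<std::uint8_t>((v >> 16) & 0xFFu),
        static_cast<std::uint8_t>((v >> 24) & 0xFFu),
    }};
}

// Decode the 4 bytes back into a host-order integer.
std::uint32_t from_little_endian(const std::array<std::uint8_t, 4>& b) {
    return static_cast<std::uint32_t>(b[0]) |
           (static_cast<std::uint32_t>(b[1]) << 8) |
           (static_cast<std::uint32_t>(b[2]) << 16) |
           (static_cast<std::uint32_t>(b[3]) << 24);
}
```

Dumping the raw bytes of an int straight to a file skips this step, which is why such files may not be readable on a machine with a different byte order.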
I have no experience with boost serialization, but I have used protocol buffers. I like protocol buffers a lot. Keep the following in mind (I say this with no knowledge of boost).
Protocol buffers are very efficient so I don't imagine that being a serious issue vs. boost.
Protocol buffers provide an intermediate representation that works with other languages (Python and Java... and more in the works). If you know you're only using C++, maybe boost is better, but the option to use other languages is nice.
Protocol buffers are more like data containers... there is no object oriented nature, such as inheritance. Think about the structure of what you want to serialize.
Protocol buffers are flexible because you can add "optional" fields. This basically means you can change the structure of protocol buffer without breaking compatibility.
Hope this helps.
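The point about "optional" fields can be sketched without the protobuf library at all. Below, a message is modelled as a map from field number to raw value; this is only an illustration of the idea, not the real wire format or generated API, and Record, Player and read_player are names invented for the example:

```cpp
#include <map>
#include <string>

// Toy stand-in for a decoded message: field number -> raw value.
using Record = std::map<int, std::string>;

struct Player {
    std::string name;   // field 1
    std::string guild;  // field 2, optional, added in a later version
    bool has_guild;
};

// A reader picks out the field numbers it knows about. Missing optional
// fields keep their defaults, and unknown field numbers are ignored,
// so old and new programs can read each other's messages.
Player read_player(const Record& r) {
    Player p{};  // has_guild starts out false
    const auto name = r.find(1);
    if (name != r.end()) p.name = name->second;
    const auto guild = r.find(2);
    if (guild != r.end()) {
        p.guild = guild->second;
        p.has_guild = true;
    }
    return p;
}
```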
boost.serialization just needs the C++ compiler and gives you some syntax sugar like
archive << serialize_obj;
// ...
archive >> unserialize_obj;
for saving and loading. If C++ is the only language you use you should give boost.serialization a serious shot.
I took a quick look at google protocol buffers. From what I see, I'd say it's not directly comparable to boost.serialization. You have to add a compiler for the .proto files to your toolchain and maintain the .proto files themselves. The API doesn't integrate into C++ as boost.serialization does.
boost.serialization does the job it's designed for very well: to serialize C++ objects :)
OTOH a query API like the one google protocol buffers has gives you more flexibility.
Since I only used boost.serialization so far I cannot comment on performance comparison.
Correction to above (guess this is that answer) about Boost Serialization :
It DOES allow supporting data versioning.
If you need compression - use a compressed stream.
Can handle endian swapping between platforms as encoding can be text, binary or XML.
I never implemented anything using boost's library, but I found Google's protobufs to be more thought-out, and the code is much cleaner and easier to read. I would suggest having a look at the various languages you want to use it with, reading through the code and the documentation, and making up your mind.
The one difficulty I had with protobufs was they named a very commonly used function in their generated code GetMessage(), which of course conflicts with the Win32 GetMessage macro.
I would still highly recommend protobufs. They're very useful.
I know that this is an older question now, but I thought I'd throw my 2 pence in!
With boost you get the opportunity to write some data validation in your classes; this is good because the data definition and the checks for validity are all in one place.
With GPB the best you can do is to put comments in the .proto file and hope against all hope that whoever is using it reads it, pays attention to it, and implements the validity checks themselves.
Needless to say, this is unlikely and unreliable if you're relying on someone else at the other end of a network stream to do this with the same vigour as oneself. Plus, if the constraints on validity change, multiple code changes need to be planned, coordinated and done.
Thus I consider GPB to be inappropriate for developments where there is little opportunity to regularly meet and talk with all team members.
The kind of thing I mean is this:
message Foo {
    int32 bearing = 1;
}
Now who's to say what the valid range of bearing is? We can have
message Foo {
    int32 bearing = 1; // Valid between 0 and 359
}
But that depends on someone else reading this and writing code for it. For example, if you edit it and the constraint becomes:
message Foo {
    int32 bearing = 1; // Valid between -180 and +180
}
you are completely dependent on everyone who has used this .proto updating their code. That is unreliable and expensive.
At least with Boost serialisation you're distributing a single C++ class, and that can have data validity checks built right into it. If those constraints change, then no one else need do any work other than making sure they're using the same version of the source code as you.
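As a sketch of what "validity checks built right into the class" might look like (Bearing is a hypothetical class for this example, with the 0 to 359 range taken from the comment in the .proto above):

```cpp
#include <stdexcept>

// A bearing in whole degrees that cannot hold an out-of-range value.
// Whoever uses this class gets the check for free; if the valid range
// changes, only kMin and kMax need editing.
class Bearing {
public:
    static constexpr int kMin = 0;
    static constexpr int kMax = 359;

    explicit Bearing(int degrees) : degrees_(checked(degrees)) {}

    int degrees() const { return degrees_; }
    void set(int degrees) { degrees_ = checked(degrees); }

private:
    // Throws std::out_of_range instead of silently storing bad data.
    static int checked(int degrees) {
        if (degrees < kMin || degrees > kMax) {
            throw std::out_of_range("bearing must be between 0 and 359");
        }
        return degrees;
    }

    int degrees_;
};
```

A load function that builds the object through this constructor would reject a serialized bearing of 360 as soon as it is read.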
There is an alternative: ASN.1. This is ancient, but has some really, really, handy things:
bearing INTEGER (0..359)
Note the constraint. So whenever anyone consumes this .asn file, generates code, they end up with code that will automatically check that bearing is somewhere between 0 and 359. If you update the .asn file,
bearing INTEGER (-180..180)
all they need to do is recompile. No other code changes are required.
You can also do:
bearingMin INTEGER ::= 0
bearingMax INTEGER ::= 360
bearing INTEGER (bearingMin..<bearingMax)
Note the <. And also in most tools the bearingMin and bearingMax can appear as constants in the generated code. That's extremely useful.
Constraints can be quite elaborate:
Garr ::= INTEGER (0..10 | 25..32)
Look at Chapter 13 in this PDF; it's amazing what you can do;
Arrays can be constrained too:
Bar ::= SEQUENCE (SIZE(1..5)) OF Foo
Sna ::= SEQUENCE (SIZE(5)) OF Foo
boo SEQUENCE (SIZE(1..<6)) OF INTEGER (-180<..<180)
ASN.1 is old fashioned, but still actively developed, widely used (your mobile phone uses it a lot), and far more flexible than most other serialisation technologies. About the only deficiency that I can see is that there is no decent code generator for Python. If you're using C/C++, C#, Java, ADA then you are well served by a mixture of free (C/C++, ADA) and commercial (C/C++, C#, JAVA) tools.
I especially like the wide choice of binary and text based wireformats. This makes it extremely convenient in some projects. The wireformat list currently includes:
BER (binary)
PER (binary, aligned and unaligned. This is ultra bit efficient. For example, an INTEGER constrained between 0 and 15 will take up only 4 bits on the wire)
DER (another binary)
XML (also XER)
JSON (brand new, tool support is still developing)
plus others.
Note the last two? Yes, you can define data structures in ASN.1, generate code, and emit / consume messages in XML and JSON. Not bad for a technology that started off back in the 1980s.
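The "only 4 bits" remark about PER above follows from simple arithmetic: as I understand unaligned PER, a constrained integer is sent as its offset from the lower bound, using just enough bits to cover the range. A small sketch of that calculation (bits_for_range is a made-up helper, not part of any ASN.1 tool):

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to tell apart every value in [lo, hi],
// i.e. the bit length of (hi - lo). A single-value range needs 0 bits.
unsigned bits_for_range(std::int64_t lo, std::int64_t hi) {
    assert(lo <= hi);
    // Unsigned subtraction avoids overflow for very wide ranges.
    std::uint64_t span =
        static_cast<std::uint64_t>(hi) - static_cast<std::uint64_t>(lo);
    unsigned bits = 0;
    while (span != 0) {
        ++bits;
        span >>= 1;
    }
    return bits;
}
```

So INTEGER (0..15) needs 4 bits, while both INTEGER (0..359) and INTEGER (-180..180) need 9.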
Versioning is done differently to GPB. You can allow for extensions:
Foo ::= SEQUENCE {
    bearing INTEGER (-180..180),
    ...
}
This means that at a later date I can add to Foo, and older systems that have this version can still work (but can only access the bearing field).
I rate ASN.1 very highly. It can be a pain to deal with (tools might cost money, the generated code isn't necessarily beautiful, etc). But the constraints are a truly fantastic feature that has saved me a whole ton of heart ache time and time again. Makes developers whinge a lot when the encoders / decoders report that they've generated duff data.
To share data:
Code first approaches (e.g. Boost serialisation) restrict you to the original language (e.g. C++), or force you to do a lot of extra work in another language
Schema first is better, but
A lot of these leave big gaps in the sharing contract (i.e. no constraints). GPB is annoying in this regard, because it is otherwise very good.
Some have constraints (e.g. XSD, JSON), but suffer patchy tool support.
For example, Microsoft's xsd.exe actively ignores constraints in xsd files (MS's excuse is truly feeble). XSD is good (from the constraints point of view), but if you cannot trust the other guy to use a good XSD tool that enforces them for him/her then the worth of XSD is diminished
JSON validators are ok, but they do nothing to help you form the JSON in the first place, and aren't automatically called. There's no guarantee that someone sending you a JSON message has run it through a validator. You have to remember to validate it yourself.
ASN.1 tools all seem to implement the constraints checking.
So for me, ASN.1 does it. It's the one that is least likely to result in someone else making a mistake, because it's the one with the right features and where the tools all seemingly endeavour to fully implement those features, and it is language neutral enough for most purposes.
To be honest, if GPB added a constraints mechanism, that'd be the winner. XSD is close, but the tools are almost universally rubbish. If there were decent code generators for other languages, JSON schema would be pretty good.
If GPB had constraints added (note: this would not change any of the wire formats), that'd be the one I'd recommend to everyone for almost every purpose. Though ASN.1's uPER is very useful for radio links.
As with almost everything in engineering, my answer is... "it depends."
Both are well-tested, vetted technologies. Both will take your data and turn it into something friendly for sending someplace. Both will probably be fast enough, and if you're really counting a byte here or there, you're probably not going to be happy with either (let's face it, the packets created by either will be a small fraction of the size of XML or JSON).
For me, it really comes down to workflow and whether or not you need something other than C++ on the other end.
If you want to figure out your message contents first and you're building a system from scratch, use Protocol Buffers. You can think of the message in an abstract way and then auto-generate the code in whatever language you want (3rd party plugins are available for just about everything). Also, I find collaboration simplified with Protocol Buffers. I just send over a .proto file and then the other team has a clear idea of what data is being transferred. I also don't impose anything on them. If they want to use Java, go ahead!
If I already have built a class in C++ (and this has happened more often than not) and I want to send that data over the wire now, Boost Serialization obviously makes a ton of sense (especially where I already have a Boost dependency somewhere else).
You can use boost serialization in tight conjunction with your "real" domain objects, and serialize the complete object hierarchy (inheritance). Protobuf does not support inheritance, so you will have to use aggregation. People argue that Protobuf should be used for DTOs (data transfer objects), and not for core domain objects themselves. I have used both boost::serialization and protobuf. The performance of boost::serialization should be taken into account; cereal might be an alternative.