Serialize a data structure in text format in C++/Qt - c++

I need to store some data structure into an SQL database. The data may vary, so I cannot know in advance which fields the database must have. My plan is to encode the data structure into an XML or JSON object and then put that into the SQL database, but that might not work as is, so what I really need is serialization, since the problem is about encoding that structure.
Which library/tool/method can I use to serialize/deserialize a data structure to text? Let's say a data structure composed of some integers, some Unicode strings, and some booleans.
Many thanks in advance!

since the problem is about encoding that structure.
Qt can't automatically serialize things, so you'll have to write some kind of routine to save/load your data. The exact solution depends on your requirements and on how you are going to use the data.
Things you should consider:
Is human readability required? (Loading a text file can be slower, due to parsing.)
Is communication/data exchange (with some other program) required?
What is the least expensive solution to develop?
Recommendation:
If the data is simple, used only by your app, and binary (non-human-readable), then use QDataStream + QByteArray and serialize by hand.
If the data is complex, used only by your app, and binary (non-human-readable), then use QDataStream + QByteArray + QVariantMap (dump the data into a QVariantMap, then serialize it into a QByteArray using QDataStream).
If the data is simple, used only by your app, and text (should be human-readable), then use plain text, JSON, or XML.
If the data is complex, used only by your app, and text (human-readable), then use JSON or XML, depending on whichever is available/whichever is shortest.
If the data is supposed to be read by some 3rd-party tool, then use whatever format that tool uses.
Details:
If you're communicating with something, you're stuck with whatever format the other side uses. That will probably be either JSON or XML, because various scripting languages can easily read both.
If the data is supposed to be human-readable, then you have the following options: plain text, JSON, or XML, depending on the data's format/complexity. (INI won't work in your scenario, because it is handled by QSettings, which doesn't really serialize to an arbitrary in-memory location.) XML and JSON can store tree-like structures, but JSON is available in Qt 4 only via an external dependency, and the XML reader/writer requires some effort to get the hang of.
If you want to store private data (used only by your application) that need not be human-readable, you can use pretty much whatever you want, including a binary format.
The simplest way would be to dump it into a QByteArray via QDataStream manually, because QDataStream guarantees that binary data will be loaded correctly on any platform regardless of endianness (as long as you don't dump it as a raw memory block, that is). That will work fine if the data is a simple array of similar structures with a fixed number of components that are always present.
If the data is tree-like and has optional nodes (i.e. a tree of key-value pairs, where a value can be another tree) and keys may or may not be present, you'll need a library or routine that deals with those key/value-pair trees.
In Qt 4 that's QVariantMap and XML (via QXmlStreamWriter/QXmlStreamReader); Qt 5 also has JSON support. When multiple solutions are available, the built-in one that takes the least effort to implement wins. Reading a named field from a QVariantMap takes one line of code per value plus a helper function of roughly 10 lines, and it supports all Qt 4 types.
QVariantMap has a significant advantage over JSON in that it supports all Qt types natively as values, i.e. you can easily dump a QColor, a QPoint, plus any types you registered yourself. Anything you can store in a QVariant, you can store within a QVariantMap, which can then be serialized to/from QDataStream in a single line of code.
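For example, a minimal sketch of that round trip (the key names are made up for illustration):
#include <QByteArray>
#include <QDataStream>
#include <QIODevice>
#include <QVariantMap>

// Pack values of different types into a QVariantMap, then serialize
// the whole map into a QByteArray in a single stream operation.
QByteArray serialize()
{
    QVariantMap map;
    map["count"] = 42;                 // integer
    map["title"] = QString("hello");   // Unicode string
    map["enabled"] = true;             // boolean

    QByteArray data;
    QDataStream out(&data, QIODevice::WriteOnly);
    out << map;                        // the one-line serialization
    return data;
}

// Read the map back; missing keys simply yield default-constructed QVariants.
QVariantMap deserialize(const QByteArray &data)
{
    QVariantMap map;
    QDataStream in(data);              // read-only stream over the byte array
    in >> map;
    return map;
}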
JSON, on the other hand, has the advantage of being a "standard" data format that can be loaded from scripting languages. That comes at the cost of supporting only six basic types.
JSON is not supported natively by Qt 4, and although there is QJson, adding external dependencies is not always a good idea, because you'll have to babysit them. If you're using Qt 4 and really need JSON support, it might make sense to upgrade to Qt 5.

What do you do with this data in the database? If you want to use it in another program or language, then your data needs to be readable and you can use QXmlStreamWriter/QXmlStreamReader or QJsonDocument (both with QByteArray).
If you don't need to use your data outside your program, you can write it into a QByteArray with QDataStream.
It would be something like the code below.
QByteArray data;
QDataStream stream(&data, QIODevice::WriteOnly); // open the byte array for writing
stream << yourClasses;
sqlQuery.bindValue(":data", data);               // ":data" is the placeholder used in your prepared query
You just need to add stream operators for the classes you want to serialize:
QDataStream &operator<<(QDataStream &, A const&);
QDataStream &operator>>(QDataStream &, A &);
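A minimal sketch of what those operators could look like, assuming a class A with an int, a QString, and a bool (the member names are made up):
#include <QDataStream>
#include <QString>

class A
{
public:
    int count;
    QString name;
    bool enabled;
};

// Write the members in a fixed order...
QDataStream &operator<<(QDataStream &out, A const &a)
{
    out << a.count << a.name << a.enabled;
    return out;
}

// ...and read them back in exactly the same order.
QDataStream &operator>>(QDataStream &in, A &a)
{
    in >> a.count >> a.name >> a.enabled;
    return in;
}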

This is a long and detailed answer, but here goes the point: Qt 5.2 has a "SaveGame" example that shows how you could achieve all this with Qt, including both serialization and deserialization. It basically does that for a non-playable game character, and it also uses QSaveFile for saving the information safely into the desired file. Here is the URL to the example's documentation for your convenience:
http://doc-snapshot.qt-project.org/qdoc/qtcore-savegame-example.html
I solved this very issue a few days ago with Qt 5's JSON parser. I would suggest taking a look at the following classes:
QJsonDocument: http://qt-project.org/doc/qt-5.1/qtcore/qjsondocument.html
QJsonObject: http://qt-project.org/doc/qt-5.1/qtcore/qjsonobject.html
QJsonArray: http://qt-project.org/doc/qt-5.1/qtcore/qjsonarray.html
QJsonParseError: http://qt-project.org/doc/qt-5.1/qtcore/qjsonparseerror.html
QJsonValue: http://qt-project.org/doc/qt-5.1/qtcore/qjsonvalue.html
If you are planning to use Qt 4, you will need to use the QJson library, for instance, but mind you, it is a lot slower than the Qt 5 JSON parser. Here you can find Thiago's benchmark:
https://plus.google.com/108138837678270193032/posts/7EVTACgwtxK
The JSON format supports strings, numbers (which cover your integers), and booleans just as you wish, so that should work for you. Here are the QJsonValue signatures for converting such types:
Serialize (C++ value to QJsonValue):
QJsonValue(bool b)
QJsonValue(double n)
QJsonValue(const QString & s)
Deserialize (QJsonValue back to a C++ value):
bool toBool(bool defaultValue = false) const
double toDouble(double defaultValue = 0) const
QString toString(const QString & defaultValue = QString()) const
Here you can find a summary page about the JSON support in Qt 5:
http://doc-snapshot.qt-project.org/qt5-nosubdir/json.html
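Putting those classes together, a round trip for a structure with an integer, a Unicode string, and a boolean might look roughly like this (the field names are made up):
#include <QByteArray>
#include <QJsonDocument>
#include <QJsonObject>
#include <QString>

// Serialize: build a QJsonObject and render it as UTF-8 JSON text.
QByteArray toJson(int count, const QString &title, bool enabled)
{
    QJsonObject obj;
    obj["count"] = count;
    obj["title"] = title;
    obj["enabled"] = enabled;
    return QJsonDocument(obj).toJson();   // ready to store as text in the database
}

// Deserialize: parse the text back and read the fields with defaults.
void fromJson(const QByteArray &json, int &count, QString &title, bool &enabled)
{
    const QJsonObject obj = QJsonDocument::fromJson(json).object();
    count   = static_cast<int>(obj.value("count").toDouble()); // JSON numbers are doubles
    title   = obj.value("title").toString();
    enabled = obj.value("enabled").toBool();
}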
Note that Qt 5's JSON is stored internally in a very efficient binary representation, so essentially you could also use that rather than the text representation; it depends on your exact scenario. Actually, for Qt 4 you could even backport these classes if needed, since they are not that tightly tied to Qt 5. That may actually have been done for BlackBerry development, since we were struggling with Qt 4 there and badly needed the JSON parser from Qt.

Related

Best practice for mixed binary/text logging in C++

In C++, is there a logging framework for logging both binary data (e.g. POD messages with pre-defined format), as well as text data (for informational purposes)?
As an example, consider we have a POD type
struct EmployeeInfo
{
    unsigned int age;
    char name[80];
};
At startup (say at 00:00:00.001), we may want to emit a text log entry saying "Employee DB app started".
Then at 00:00:00.002, we receive a new EmployeeInfo, so we may want to emit a binary log entry containing the EmployeeInfo data.
There's some benefit to using a single log file for both types of events, in that the ordering relationship among the events is maintained. The format of entries in the log file does not matter (it doesn't need to be human-readable), as long as, given a log file, it's easy to write two separate utility programs: one for processing (e.g. pretty-printing) all the EmployeeInfo records in the file, and one for processing (e.g. printing to cout) all the text entries in it.
It appears that most existing logging frameworks in C++ (e.g. g2log, glog, spdlog etc.) are for generating human readable text log files only, and the usage is typically similar to printf or outputting to a stream, e.g.:
LOGD << "Hello %s!" << "World";
An obvious way to achieve the "one-file" requirement is to design a common record format for both kinds of events, e.g. timestamp + length + type + payload, and then simply write to a binary file. The drawbacks are: 1) the logging statements may not be as natural as in existing logging frameworks, and 2) we need extra code if we want features offered by existing frameworks, such as automatic daily rotation of log files.
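For concreteness, a minimal sketch of such a record layout and writer (the header fields and the helper function are made up for illustration):
#include <cstdint>
#include <fstream>

// Fixed header preceding every record, text or binary.
struct LogRecordHeader
{
    uint64_t timestamp_ns; // nanoseconds since epoch
    uint32_t type;         // e.g. 0 = text message, 1 = EmployeeInfo
    uint32_t length;       // payload size in bytes
};

// Append one record (header + payload) to an already-open binary log file.
void writeRecord(std::ofstream &log, uint64_t timestampNs,
                 uint32_t type, const void *payload, uint32_t length)
{
    LogRecordHeader header{timestampNs, type, length};
    log.write(reinterpret_cast<const char *>(&header), sizeof(header));
    log.write(static_cast<const char *>(payload), length);
}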
I thought mixed binary/text logging should be a relatively common scenario, but I cannot seem to find any existing C++ libraries for it. Any suggestions are welcome. Thanks.
I am not sure that mixed binary & text logging is common; I have never heard of it.
What you might consider is logging only text, but emitting in that text some (printable) "identifiers" (perhaps inspired by UUIDs) which refer to some other binary file (for example, an SQLite database). So you would emit Hello from _9oXtCgAbkqv and store, in another database, some binary data related to _9oXtCgAbkqv. BTW, that "identifier" might even be a file offset inside some other binary file.
BTW, if you emit any kind of binary log-like data, you need a utility to inspect that binary data. (For textual files this is not an issue, since standard utilities like the Linux commands less, grep, awk, tail, head, and split are enough.)
And your issue is not C++-specific (you could have it in OCaml, Python, Rust, Common Lisp, etc.). It is a matter of habits, conventions, operating systems, and so on. Notice that log files are mostly a matter of convention, and that utilities like logrotate can manage several log files.

Emitting avro format from pipes in Hadoop

I have to program in C++ for Hadoop, and I am dealing with a complex structure as the output value.
Unfortunately I can't figure out how to emit this structure in Avro format in MapReduce.
There are writers like DataFileWriter, and they work well for me, but writing files that way doesn't really make sense in terms of HDFS.
How I emit the structure now:
IOSerializer serializer;
context.emit(key, serializer.toString(output));
This custom toString method I wrote myself (sorry for the name, I'm totally from the Java world).
This is just a custom serialization into a string. I really want some interoperability here, so I decided to use Avro.
This is the code to write Avro into the file:
avro::DataFileWriter<fusion_solve::graph> dfw("test.bin", schema);
dfw.write(output);
dfw.close();
What I want to be able to do is something like this:
IOSerializer serializer;
context.emit(serializer.toAvro(key, output));
For the moment I would be happy to get just a plain JSON string as output and convert it later.
The other option for me is writing a custom RecordWriter in Java. But which type of input data should I use in that case, JSON?
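For reference, one rough sketch of getting a JSON string out of avro-cpp (it assumes the same generated fusion_solve::graph type and schema used with DataFileWriter above; the exact calls may differ between Avro versions, so treat this only as an outline):
#include <avro/Encoder.hh>
#include <avro/Specific.hh>
#include <avro/Stream.hh>
#include <avro/ValidSchema.hh>
#include <string>
#include <vector>

// Encode one record as JSON text instead of writing an Avro data file.
std::string toAvroJson(const fusion_solve::graph &output,
                       const avro::ValidSchema &schema)
{
    auto out = avro::memoryOutputStream();          // in-memory sink
    avro::EncoderPtr encoder = avro::jsonEncoder(schema);
    encoder->init(*out);
    avro::encode(*encoder, output);                 // uses the generated codec traits
    encoder->flush();

    auto bytes = avro::snapshot(*out);              // copy of everything written so far
    return std::string(bytes->begin(), bytes->end());
}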

"Best" Input File Formats for C++? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
I am starting work on a new piece of software that will end up needing some robust and expandable file I/O. There are a lot of formats out there: XML, JSON, INI, etc. However, there are always pluses and minuses, so I thought I would ask for some community input.
Here are some rough requirements:
The format is a "standard"... I don't want to reinvent the wheel if I don't have to. It doesn't have to be a formal IEEE standard, but it should be something a new user could Google and get some information on, and it may have some support tools (editors) beyond vi. (Though the software's users will generally be computer-savvy and happy to use vi.)
Easily integrates with C++. I don't want to have to pull along a 100mb library and three different compilers to get it up and running.
Supports tabular input (2d, n-dimensional)
Supports POD types
Can expand as more inputs are required, binds well to variables, etc.
Parsing speed is not terribly important
Ideally, as easy to write (reflect) as it is to read
Works well on Windows and Linux
Supports compositing (one file referencing another file to read, and so on.)
Human Readable
In a perfect world, I would use a header-only library or some clean STL implementation, but I'm fine with leveraging Boost or some small external library if it works well.
So, what are your thoughts on various formats? Drawbacks? Advantages?
Edit
Options to consider? Anything else to add?
XML
YAML
SQLite
Google Protocol Buffers
Boost Serialization
INI
JSON
There is one excellent format that meets all your criteria:
SQLite!
Please read the article about using SQLite as an application file format. Also, please watch the Google Tech Talk by D. Richard Hipp (SQLite's author) on this very topic.
Now, let's see how SQLite meets your requirements:
The format is a "standard"
SQLite has become the format of choice for most mobile environments and for many desktop apps (Firefox, Thunderbird, Google Chrome, Adobe Reader, you name it).
Easily integrates with C++
SQLite has a standard C interface, which is only one source file and one header file. There are C++ wrappers too.
Supports tabular input (2d, n-dimensional)
A SQLite table is as tabular as you could possibly imagine. To represent, say, 3-dimensional data, create a table with columns x, y, z, value and store your data as a set of rows like this:
x1,y1,z1,value1
x2,y2,z2,value2
...
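For instance, from Qt that could look roughly like this (the table and column names are made up, and error handling is omitted):
#include <QSqlDatabase>
#include <QSqlQuery>
#include <QVariant>

// Open (or create) the application's project file as a SQLite database
// and store one (x, y, z) -> value sample in a plain table.
void storeSample(double x, double y, double z, double value)
{
    QSqlDatabase db = QSqlDatabase::addDatabase("QSQLITE");
    db.setDatabaseName("project.db");
    db.open();

    QSqlQuery query(db);
    query.exec("CREATE TABLE IF NOT EXISTS samples (x REAL, y REAL, z REAL, value REAL)");

    query.prepare("INSERT INTO samples (x, y, z, value) VALUES (?, ?, ?, ?)");
    query.addBindValue(x);
    query.addBindValue(y);
    query.addBindValue(z);
    query.addBindValue(value);
    query.exec();
}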
Supports POD types
I assume by POD you meant Plain Old Data, or BLOB. SQLite lets you store BLOB fields as is.
Can expand as more inputs are required, binds well to variables
This is where it really shines.
Parsing speed is not terribly important
But SQLite speed is superb. In fact, parsing is basically transparent.
Ideally, as easy to write (reflect) as it is to read
Just use INSERT to write and SELECT to read - what could be easier?
Works well on Windows and Linux
You bet, and all other platforms as well.
Supports compositing (one file referencing another file to read)
You can ATTACH one database to another.
Human Readable
Not in binary, but there are many excellent SQLite browsers/editors out there. I like SQLite Expert Personal on Windows and sqliteman on Linux. There is also SQLite editor plugin for Firefox.
There are other advantages that SQLite gives you for free:
Data is indexable, which makes it very fast to search. You just cannot do this using XML, JSON, or any other text-only format.
Data can be edited partially, even when the amount of data is very large. You do not have to rewrite a few gigabytes just to edit one value.
SQLite is fully transactional: it guarantees that your data is consistent at all times. Even if your application (or the whole computer) crashes, your data will be automatically restored to the last known consistent state on the next attempt to connect to the database.
SQLite stores your data verbatim: you do not need to worry about escaping junk characters in your data (including zero bytes embedded in your strings); simply always use prepared statements, and that's all it takes to make it transparent. This can be a big and annoying problem when dealing with text data formats, XML in particular.
SQLite stores all strings in Unicode: UTF-8 (default) or UTF-16. In other words, you do not need to worry about text encodings or international support for your data format.
SQLite allows you to process data in small chunks (row by row, in fact), so it works well in low-memory conditions. This can be a problem for any text-based format, because such formats often need to load all the text into memory to parse it. Granted, there are a few efficient stream-based XML parsers out there, but in general any XML parser will be quite memory-greedy compared to SQLite.
Having worked quite a bit with both XML and json, here's my rather subjective opinion of both as extendable serialization formats:
The format is a "standard": Yes for both
Easily integrates with C++: Yes for both. In each case you'll probably wind up with some kind of library to handle it. On Linux, libxml2 is a standard, and libxml++ is a C++ wrapper for it; you should be able to get both of those from your distro's package manager. It will take some small effort to get those working on Windows. There appears to be some support in Boost for json, but I haven't used it; I've always dealt with json using libraries. Really, the library route is not very onerous for either.
Supports tabular input (2d, n-dimensional): Yes for both
Supports POD types: Yes for both
Can expand as more inputs are required: Yes for both - that's one big advantage to both of them.
Binds well to variables: If what you mean is some way inside the file itself to say "This piece of data must be automatically deserialized into this variable in my program", then no for both.
As easy to write (reflect) as it is to read: Depends on the library you use, but in my experience yes for both. (You can actually do a tolerable job of writing json using printf().)
Works well on Windows and Linux: Yes for both, and ditto Mac OS X for that matter.
Supports one file referencing another file to read: If you mean something akin to a C #include, then XML has some ability to do this (e.g. document entities), while json doesn't.
Human readable: Both are typically written in UTF-8, and permit line breaks and indentation, and thus can be human-readable. However, I've just been working with a 479 KB XML file that's all on one line, so I had to run it through a prettyprinter to make sense of it. json can also be pretty unreadable, but in my experience is often formatted better than XML.
When starting new projects, I generally prefer json; it's more compact and more human-readable. The main reason I might select XML over json would be if I were worried about receiving badly-formed documents, since XML supports automated document format validation, while you have to write your own validation code with json.
Check out Google Protocol Buffers. They handle most of your requirements.
From their documentation, the high level steps are:
Define message formats in a .proto file.
Use the protocol buffer compiler.
Use the C++ protocol buffer API to write and read messages.
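As a rough sketch of step 3, assuming protoc has generated a Config class from a hypothetical config.proto with an integer, a string, and a boolean field:
#include <fstream>

#include "config.pb.h"   // generated by protoc from the hypothetical config.proto

// Write a message to a file and read it back using the generated class.
void roundTrip()
{
    demo::Config config;              // "demo" is whatever package the .proto declares
    config.set_count(42);
    config.set_title("example");
    config.set_enabled(true);

    std::ofstream out("config.bin", std::ios::binary);
    config.SerializeToOstream(&out);  // standard protobuf message API
    out.close();

    demo::Config loaded;
    std::ifstream in("config.bin", std::ios::binary);
    loaded.ParseFromIstream(&in);
}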
For my purposes, I think the way to go is XML.
The format is a standard, but allows for modification and flexibility for the schema to change as the program requirements evolve.
There are several library options. Some are larger (Xerces-C), some are smaller (ezxml), but there are many options, so we won't be locked into a single provider or a very specific solution.
It can support tabular input (2D, n-dimensional). This requires more parsing work on "our" end and is likely the weakest point for XML.
Supports POD types: Absolutely.
Can expand as more inputs are required, binds well to variables, etc. through schema modifications and parser modifications.
Parsing speed is not terribly important, so processing a text file or files is not an issue.
XML can be programmatically written just as easily as read.
Works well on Windows and Linux or any other OS that supports C and text files.
Supports compositing (one file referencing another file to read, and so on.)
Human Readable with many text editors (Sublime, vi, etc.) supporting syntax highlighting out of the box. Many web browsers display the data well.
Thanks for all the great feedback! I think if we wanted a purely binary solution, Protocol Buffers or boost::serialization is likely the way that we would go.

XML Serialization/Deserialization in C++

I am using C++ from MinGW, which is the Windows version of GNU C++.
What I want to do is serialize a C++ object into an XML file and deserialize an object from an XML file on the fly. I checked TinyXML. It's pretty useful, and (please correct me if I misunderstand it) it basically adds all the nodes during processing and finally puts them into a file in one chunk using the TiXmlDocument::SaveFile(filename) function.
I am working on real-time processing; how can I write to a file on the fly and append each subsequent result to the file?
Thanks.
Boost has a very nice serialization/deserialization library, Boost.Serialization.
If you stream your objects to a Boost XML archive, it will stream them in XML format.
If XML is too big or too slow, you only need to swap the archive for a text or binary archive to change the streaming format.
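A minimal sketch of that, assuming a small class with an int and a std::string (the class and member names are made up):
#include <fstream>
#include <string>

#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/serialization/string.hpp>

class Record
{
public:
    int count = 0;
    std::string name;

    template <class Archive>
    void serialize(Archive &ar, const unsigned int /*version*/)
    {
        // XML archives require a name-value pair for each field.
        ar & BOOST_SERIALIZATION_NVP(count);
        ar & BOOST_SERIALIZATION_NVP(name);
    }
};

void save(const Record &r)
{
    std::ofstream ofs("record.xml");
    boost::archive::xml_oarchive oa(ofs);
    oa << BOOST_SERIALIZATION_NVP(r);   // writes the fields as nested XML elements
}

void load(Record &r)
{
    std::ifstream ifs("record.xml");
    boost::archive::xml_iarchive ia(ifs);
    ia >> BOOST_SERIALIZATION_NVP(r);
}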
Here is a better example of C++ object serialization:
http://www.codeproject.com/KB/XML/XMLFoundation.aspx
I notice that each TiXmlBase class has a Print method and also supports streaming to strings and streams.
You could walk the new parts of the document in sequence and output those parts as they are added, maybe?
Give it a try.....
Tony
I've been using gSOAP for this purpose. It is probably too powerful for just XML serialization, but knowing it can do much more means I do not have to consider other solutions for more advanced projects since it also supports WSDL, SOAP, XML-RPC, and JSON. Also suitable for embedded and small devices, since XML is simply a transient wire format and not kept in a DOM or something memory intensive.

How do I deal with "Project Files" in my Qt application?

My Qt application should be able to create/open/save a single "Project" at a time. What is a painless way to store the project's settings in a file? Should it be XML or something less horrible?
Of course, the data to be stored in the file is subject to change over time.
What I need is something like QSettings, but bound to a project in my application rather than to the whole application.
You can use QSettings to store data in a specific .ini file.
From the docs:
Sometimes you do want to access settings stored in a specific file or registry path. On all platforms, if you want to read an INI file directly, you can use the QSettings constructor that takes a file name as first argument and pass QSettings::IniFormat as second argument. For example:
QSettings settings("/home/petra/misc/myapp.ini",
                   QSettings::IniFormat);
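The project-specific file can then be read and written like any other QSettings object (the file path and key names here are made up):
QSettings project("/path/to/myproject.ini", QSettings::IniFormat);

// Write project settings...
project.setValue("scene/width", 1920);
project.setValue("scene/title", QString("My project"));

// ...and read them back later, with defaults for missing keys.
int width = project.value("scene/width", 640).toInt();
QString title = project.value("scene/title").toString();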
In order to make it user-editable, I would stick to plain text with one key = value per line, as in most Linux apps.
However, this is only for the settings, not for the complete project data, which I suppose requires more complex structures.
So maybe JSON?
Pro XML:
You can have a look at it in an editor
You can store any kind of string in any language in it (unicode support)
It's simple to learn
More than one program can read the same XML
It's easy to structure your data with XML. When you use key/value lists, for example, you'll run into problems when you need to save tree-like structures.
Contra XML
The result is somewhat bloated
Most programming languages (especially old ones like C++) have no good support for XML. The old XML APIs were designed in such a way that they could be implemented in any language (smallest common denominator). 'nuff said.
You need to understand the concept of "encoding" (or "charset"). While this may look trivial at first glance, there are some hidden issues which can bite you. So always test your code with some umlauts and even kanji (Japanese characters) to make sure you got it right.