A Cross-Platform Localization Resource Format - c++

I want to keep my replacement strings (German, French, etc) in a file format that is standard-ish and useable across Windows and Linux platforms. Thus VC++ resource files are ruled out right away.
What file format do others prefer to use for keeping these l10n resources? Two more features I'd like the format to support are:
the "key" for indexing l10n strings is itself an English string, rather than an enum.
the format can carry a message digest, so I could verify there has been no tampering.
My intent would be to use a function (e.g. wstring foo = GetString(L"I am %1% years old");) that feeds the boost::format or boost::wformat functions. Notice that the key fed to GetString is a string, not an enum.
Obviously I can use whatever XML format (or otherwise) I'd like to dream up. But I'd rather use something that is somewhat standard.

For a standard format, use gettext and msgfmt to make binary .mo files. The format comes from Unix, but is usable cross platform. Audacity, which is Linux/Mac/Windows, uses it.
1) The key is the English string.
2) The standard format doesn't come with an anti-tamper approach, so you will need to cook up your own.
There is also an editor, poEdit, and an emacs mode for working with the translations in the intermediate textual .po format.

We used a library called I18N (I'm sure there are a ton of implementations named the same way all over). The keys and translations were stored in a .txt file, and used some hash for faster lookups. We modified it some to improve the context - it used a context of filename, so you could not use multiple translations of the same string literal in the same file.
Usage was something like Translated = String::Format(I18N("I am %d years old"), years);
We would periodically run a separately executable against our sourcecode to parse out all the various I18N entries, rehash them, and update the file with any new additions.
Unfortunately I can't find any attributes to the author in the source.

Not sure if there is one, but if there is chances are it is mentioned somewhere at lisa.org: http://www.lisa.org/Standards.30.0.html

Related

Qt : Avoid Rebuilding Applications incase strings change

I wanted to know what my options are for storing strings in a QT application.
One of my major requirements in not re-building the entire project or any file in-case one string changes and also have all the strings in one place.In short I would like to have the strings in one place and extract them during Application startup
I've used all of the elements talked about in above answers.
XML, JSON, QSettings w/ Ini files, tr()
All of them can do it just fine. I've put together some notes on the different options:
QTranslator
Qt Linguist and the tr() tags are designed to take your stock language and translate it into another language. Keeping track of multiple versions of the english translation and modifying/releasing without Qt Linguist is almost impossible. Qt Linguist is required to "release" your translations and convert them from a TS file (translation source) to an optimized QM file.
The QM file format is a compact binary format that is used by the localized application. It provides extremely fast lookups for translations.
Here is what using a translation file looks like:
QTranslator translator;
translator.load("hellotr_la");
app.installTranslator(&translator);
http://qt-project.org/doc/qt-4.8/qtranslator.html#details
I think using QTranslator for a few string changes may be a weird use case, unless you are using for localizing your program. But like the docs say, it is optimized for very fast look ups of string replacements.
QXMLStreamReader
The stream reader is "recommended" way to access XML files, or at least with better support. You write your own files for organizing it, or you write code to generate the XML.
<STRING_1>Some string</STRING_1>
Here is what it looks like to navigate into xml.
QXmlStreamReader xml;
...
while (!xml.atEnd()) {
xml.readNext();
... // do processing
}
if (xml.hasError()) {
... // do error handling
}
XML is very similar to Json, but with larger files and the start and end tags are longer. There are a lot more stock readers out there for XML. It is also a lot more human readable in many cases because so many people know html and they are very similar.
QJsonDocument
The JSON suppport in Qt 5 looks really good. I haven't built a project with it quite yet It is as easy as it looks, and as far as accessing and setting, it looks just like using a dictionary or a map or a vector.
UPDATE: You just pass around a pointer into your QJsonDocument or your QJsonObject or your QJsonArray as you are navigating deeper or appending more onto your Json file. And when you are done you can save it as a binary file, or as a clear text, human readable file, with proper indentation and everything!
How to create/read/write JSon files in Qt5
Json seems to be turning into the replacement for XML for many people. I like the example of using Json to save and load the state of a role playing game.
http://qt-project.org/doc/qt-5/qtcore-savegame-example.html
QSettings
QSettings is one of my favorites, just because it has been supported for so long, and it is how most persistent settings should be saved and accessed.
When I use it, to take advantage of the defaults and fall back mechanisms, I put this in my main.cpp:
QCoreApplication::setOrganizationName("MySoft");
QCoreApplication::setOrganizationDomain("mysoft.com");
QCoreApplication::setApplicationName("Star Runner");
And because I sometimes find a need to edit these setting by hand in windows, I use the Ini format.
QSettings::setDefaultFormat(QSettings::IniFormat); // also in main.cpp
Then when I deploy my exe, and I want to have particular value loaded instead of the hardcoded defaults, the installer drops the main fallback into
C:/ProgramData/MySoft/Star Runner.ini
And when the program saves a change at runtime, it gets saved to:
C:/Users/<username>/AppData/Roaming/MySoft/Star Runner.ini
And then throughout my program if I need to get a setting or set a setting, it takes 3 lines of code or less.
// setting the value
QSettings s;
s.setValue("Strings/string_1", "new string");
// getting the value
QString str;
QSettings s;
str = s.value("Strings/string_1", "default string").toString();
And here is what your ini file would look like:
[Strings]
string_1=default string
QSettings is the way to go if you are storing a few strings you want to change on deployment or at runtime. (or if a checkbox is now checked, or your window size and position, or the recent files list or whatever).
QSettings has been optimized quite a bit and is well thought out. The ini support is awesome, with the exception that it sometimes reorders groups and keys (usually alphabetically), and it may drop any comments you put in it. I think ini comments are either started with a ; or a #.
Hope that helps.
One way to do this would be to put it in a shared library. This way you can only recompile the shared library, but not the whole project. Another approach would be to put it in a file or a database and load it at runtime.
And of course you have to check your include dependencies. If you are including the headers everywhere, the compiler will rebuild everything that depends on it, even if the header is not really needed.
Another possible solution is to replace all strings with default ones inside tr() calls and use Qt Linguist to manage all the strings.
You'll also be able to load all the "translations" from external .qm file on startup.
It is simple: Store your volatile strings in QSettings http://qt-project.org/doc/qt-4.8/qsettings.html (you can use ini files or registry) or an XML file which will make you application configurable.
Edit: Well, after thinking about it a few more minutes, Guilherme's comment is right. QSettings will need to be initialized somehow (either manually or from some default values in your code)... and to be honest manual editing registry to change a few strings is not the brightest idea. So I conclude that XML or JSON is definitely better. It has advantages, for example you can keep several config files which allow you for switching languages at runtime etc.

"Best" Input File Formats for C++? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I am starting work on a new piece of software that will end up needing some robust and expandable file IO. There are a lot of formats out there. XML, JSON, INI, etc. However, there are always plusses and minuses so I thought I would ask for some community input.
Here are some rough requirements:
The format is a "standard"...I don't want to reinvent the wheel if I don't have to. It doesn't have to be a formal IEEE standard, but something you could Google and get some information on as a new user, may have some support tools (editors) beyond vi. (Though the software users will generally be computer savvy and happy to use vi.)
Easily integrates with C++. I don't want to have to pull along a 100mb library and three different compilers to get it up and running.
Supports tabular input (2d, n-dimensional)
Supports POD types
Can expand as more inputs are required, binds well to variables, etc.
Parsing speed is not terribly important
Ideally, as easy to write (reflect) as it is to read
Works well on Windows and Linux
Supports compositing (one file referencing another file to read, and so on.)
Human Readable
In a perfect world, I would use a header-only library or some clean STL implementation, but I'm fine with leveraging Boost or some small external library if it works well.
So, what are your thoughts on various formats? Drawbacks? Advantages?
Edit
Options to consider? Anything else to add?
XML
YAML
SQLite
Google Protocol Buffers
Boost Serialization
INI
JSON
There is one excellent format that meets all your criteria:
SQLite!
Please read article about using SQLite as an application file format. Also, please watch Google Tech Talk by D. Richard Hipp (SQLite author) about this very topic.
Now, lets see how SQLite meets your requirements:
The format is a "standard"
SQLite has become format of choice for most mobile environments, and for many desktop apps (Firefox, Thunderbird, Google Chrome, Adobe Reader, you name it).
Easily integrates with C++
SQLite has standard C interface, which is only one source file and one header file. There are C++ wrappers too.
Supports tabular input (2d, n-dimensional)
SQLite table is as tabular as you could possibly imagine. To represent say 3-dimensional data, create table with columns x,y,z,value and store your data as a set of rows like this:
x1,y1,z1,value1
x2,y2,z2,value2
...
Supports POD types
I assume by POD you meant Plain Old Data, or BLOB. SQLite lets you store BLOB fields as is.
Can expand as more inputs are required, binds well to variables
This is where it really shines.
Parsing speed is not terribly important
But SQLite speed is superb. In fact, parsing is basically transparent.
Ideally, as easy to write (reflect) as it is to read
Just use INSERT to write and SELECT to read - what could be easier?
Works well on Windows and Linux
You bet, and all other platforms as well.
Supports compositing (one file referencing another file to read)
You can ATTACH one database to another.
Human Readable
Not in binary, but there are many excellent SQLite browsers/editors out there. I like SQLite Expert Personal on Windows and sqliteman on Linux. There is also SQLite editor plugin for Firefox.
There are other advantages that SQLite gives you for free:
Data is indexable which makes it very fast to search. You just cannot do this using XML, JSON or any other text-only formats.
Data can be edited partially, even when amount of data is very large. You do not have to rewrite few gigabytes just to edit one value.
SQLite is fully transactional: it guarantees that your data is consistent at all times. Even if your application (or whole computer) crashes, your data will be automatically restored to last known consistent state on next first attempt to connect to the database.
SQLite stores your data verbatim: you do not need to worry about escaping junk characters in your data (including zero bytes embedded in your strings) - simply always use prepared statements, that's all it takes to make it transparent. This can be big and annoying problem when dealing with text data formats, XML in particular.
SQLite stores all strings in Unicode: UTF-8 (default) or UTF-16. In other words, you do not need to worry about text encodings or international support for your data format.
SQLite allows you to process data in small chunks (row by row in fact), thus it works well in low memory conditions. This can be a problem for any text based formats, because often they need to load all text into memory to parse it. Granted, there are few efficient stream-based XML parsers out there, but in general any XML parser will be quite memory greedy compared to SQLite.
Having worked quite a bit with both XML and json, here's my rather subjective opinion of both as extendable serialization formats:
The format is a "standard": Yes for both
Easily integrates with C++: Yes for both. In each case you'll probably wind up with some kind of library to handle it. On Linux, libxml2 is a standard, and libxml++ is a C++ wrapper for it; you should be able to get both of those from your distro's package manager. It will take some small effort to get those working on Windows. There appears to be some support in Boost for json, but I haven't used it; I've always dealt with json using libraries. Really, the library route is not very onerous for either.
Supports tabular input (2d, n-dimensional): Yes for both
Supports POD types: Yes for both
Can expand as more inputs are required: Yes for both - that's one big advantage to both of them.
Binds well to variables: If what you mean is some way inside the file itself to say "This piece of data must be automatically deserialized into this variable in my program", then no for both.
As easy to write (reflect) as it is to read: Depends on the library you use, but in my experience yes for both. (You can actually do a tolerable job of writing json using printf().)
Works well on Windows and Linux: Yes for both, and ditto Mac OS X for that matter.
Supports one file referencing another file to read: If you mean something akin to a C #include, then XML has some ability to do this (e.g. document entities), while json doesn't.
Human readable: Both are typically written in UTF-8, and permit line breaks and indentation, and thus can be human-readable. However, I've just been working with a 479 KB XML file that's all on one line, so I had to run it through a prettyprinter to make sense of it. json can also be pretty unreadable, but in my experience is often formatted better than XML.
When starting new projects, I generally prefer json; it's more compact and more human-readable. The main reason I might select XML over json would be if I were worried about receiving badly-formed documents, since XML supports automated document format validation, while you have to write your own validation code with json.
Check out google buffers. This handles most of your requirements.
From their documentation, the high level steps are:
Define message formats in a .proto file.
Use the protocol buffer compiler.
Use the C++ protocol buffer API to write and read messages.
For my purposes, I think the way to go is XML.
The format is a standard, but allows for modification and flexibility for the schema to change as the program requirements evolve.
There are several library options. Some are larger (Xerces-C) some are smaller (ezxml), but there are many options, so we won't be locked in to a single provider or very specific solution.
It can supports tabular input (2d, n-dimensional). This requires more parsing work on "our" end, and is likely the weakest point for XML.
Supports POD types: Absolutely.
Can expand as more inputs are required, binds well to variables, etc. through schema modifications and parser modifications.
Parsing speed is not terribly important, so processing a text file or files is not an issue.
XML can be programmatically written just as easily as read.
Works well on Windows and Linux or any other OS that supports C and text files.
Supports compositing (one file referencing another file to read, and so on.)
Human Readable with many text editors (Sublime, vi, etc.) supporting syntax highlighting out of the box. Many web browsers display the data well.
Thanks for all the great feedback! I think if we wanted a purely binary solution, Protocol Buffers or boost::serialization is likely the way that we would go.

Effective and simple way of storing multiline strings of text?

I'm trying to find a way to do this.
My game is multilingual.
I have English.txt, French.txt, etc..
I'm wondering what would be a good way to store it in the file for example:
<sendbutton.tooltip>
Use this button to send text.
The text can be as long as you like!
</sendbutton.tooltip>
or
sendbutton.tooltip = Use this button to send text.\n\nThe text can be as long as you like!
I then will map these strings to their element name for runtime use.
Other than using a standard like XML, what is usually done to do this?
Thanks
Usually this kind of localization tasks is done with GNU gettext.
It depends when are you going to load the file.
For standard translation stuff, I recommend you take a look at gettext. It provides translation tools and easy way to include it. You can store English text a C strings enclosed with translation macro () or T() or whatever, and gettext would provide you with strings that need translating. It also tracks the translations that need to be updated when original English text changes. You store all translations for specific language in separate files.
Not sure for C++ but maybe resource files where you create a separate file for each language and have key/value pairs for the lookup with each langauge using the same key but message text is in the correct language e.g.
resources.de
LOGOUT, Abmelden
resources.en
LOGOUT, Logout
then depending on users language choice, you load the appropriate resource file to display the correct text. Think you would just store the mutlilines with /n as in your second example.

Methods of storing application data/settings without the registry?

I need some methods of storing and getting data from a file (in WIN32 api c++ application, not MFC or .NET)
e.g. saving the x, y, width and height of the window when you close it, and loading the data when you open the window.
I have tried .ini files, with the functions -- WritePrivateProfileString and ReadPrivateProfileString/Int, but on MSDN it says
"This function is provided only for compatibility with 16-bit Windows-based applications. Applications should store initialization information in the registry."
and when i tried on my Windows7 64bit machine to read a ini file, i got blue screen! (in debug mode with visual studio) O.O
I notice that most other application use XML to store data, but I don't have a clue how to read/write xml data in c++, are there any libraries or windows functions which will allow me to use xml data?
Any other suggestions would be good too, thanks.
There is nothing wrong with .ini files, the only problem with them is where to write them. CIniFile from CodeProject is good enough class. Ini file should be placed in %APPDATA%/<Name Of Your Application> (or %LOCALAPPDATA%\<Same Name Here>, as described below).
EDIT: If we are talking about Windows family of operating systems from Windows 2000 onward then function SHGetFolderPath is portable way to retrieve user specific folder where application configuration files should be stored. To store data in romaing folder use CSIDL_APPDATA with SHGetFolderPath. To store data to local folder use CSIDL_LOCAL_APPDATA.
The difference between local and roaming folder is in the nature of the data to be stored. If data is too large or machine specific then store it in local folder. Your data (coordinates and size of the window) are local in nature (on other machine you may have different resolution), so you should actually use CSIDL_LOCAL_APPDATA.
Windows Vista and later have extended function SHGetKnownFolderPath with its own set of constants, but if you seek compatibility stick to the former SHGetFolderPath.
TinyXML is a popular and simple XML parser for C++.
Apart from that, you can really use any format you want to store your settings, though it's considered good practice to keep settings in text format so that they can be hand-edited if necessary.
It's fairly simple to write your own functions for reading/writing a file in INI or similar format. The format is entirely up to you, as long as it's easily comprehensible to humans. Some possibilities are:
; Comment
# Comment
Key = Value (standard INI format)
Key Value
Key: Value
You could use Boost.PropertyTree for this.
Property trees are versatile data
structures, but are particularly
suited for holding configuration data.
The tree provides its own,
tree-specific interface, and each node
is also an STL-compatible Sequence for
its child nodes.
It supports serialization, and so is well-suited to managing and persisting changeable configuration data. There is an example here on how to load and save using the XML data format that this library supports.
The library uses RapidXML internally but hides the parsing and encoding details, which would save you some implementation time (XML libraries all have their idiosyncracies), while still allowing you to use XML as the data representation on disk.
libxml2. I have seen quite a lot places where it is used. Easy to use and loads of examples to get you started and not a vast library as such. And in C, take it wherever you want.
pugixml is another good (and well documented) XML parser library. And If you like portability XML is a better option.
While INI files may not be the best format, I think you can safely ignore the warning MSDN puts on WritePrivateProfileString and ReadPrivateProfileString.
Those two functions are NEVER going away. It would break THOUSANDS of applications.
That warning has been there for years and I suspect was added when the registry was all the rage and someone naively thought it would one day completely replace INI files.
I might be wrong but it would be very unlike Microsoft to break so many existing apps like this for no good reasons. (Not that they do not occasionally break backwards compatibility, but this would cause huge problems for zero benefit.)
Ohhh My GOD? Have you ever thought of stright-forward solution rather then thinking of Super-Duper-all-can-do framework way?
Sorry...
You want to store two numbers between restarts???
Save: Open a file, write these two numbers, close the file:
std::ifstream out(file_name);
out << x << ' ' << y;
out.close();
Load: Open a file, read these two numbers, close the file:
std::ifstream in(file_name);
if(!in) return error...
in >> x >> y;
if(!in) return error...
in.close();
Libconfig is the best solution in C++ as far as I have tried.
Works multi platform with minimum coding.
You must try that!
I like the TinyXML solution suggested.
But for Windows, I like .ini even more.
So I'll suggest the inih library, free and open source on GitHub here. Very simple and easy to use - 1 header file library iirc.

How do I deal with "Project Files" in my Qt application?

My Qt application should be able to create/open/save a single "Project" at once. What is the painless way to store project's settings in a file? Should it be XML or something less horrible?
Of course data to be stored in a file is a subject to change over time.
What I need is something like QSettings but bounded to a project in my application rather than to the whole application.
You can use QSettings to store data in a specific .ini file.
From the docs:
Sometimes you do want to access settings stored in a specific file or registry path. On all platforms, if you want to read an INI file directly, you can use the QSettings constructor that takes a file name as first argument and pass QSettings::IniFormat as second argument. For example:
QSettings settings("/home/petra/misc/myapp.ini",
QSettings::IniFormat);
I order to make it user editable, I would stick to plain text with one key = values by line, like in most of the Linux apps.
However this is only for the settings, not for the complete project's data which I suppose requires more complex structures.
So maybe JSON ?
Pro XML:
You can have a look at it in an editor
You can store any kind of string in any language in it (unicode support)
It's simple to learn
More than one program can read the same XML
It's easy to structure your data with XML. When you use key/value lists, for example, you'll run into problems when you need to save tree-like structures.
Contra XML
The result is somewhat bloated
Most programming languages (especially old ones like C++) have no good support for XML. The old XML APIs were designed in such a way that they could be implemented in any language (smallest common denominator). 'nuff said.
You need to understand the concept of "encoding" (or "charset"). While this may look trivial at first glance, there are some hidden issues which can bite you. So always test your code with some umlauts and even kanji (Japanese characters) to make sure you got it right.