A long time ago I began a project that wasn't supposed to be shared or translated. To keep things simple, all strings and GUI text in the code were written in my native language, French.
The project then grew in importance, and several translations are now available.
The problem is that all translations start from the French strings in the code. For someone who speaks XXX and wants to translate the software into XXX, using Qt Linguist means opening 2 files:
the translation from French to English,
the translation from French to XXX.
This way it is possible to translate from English to XXX.
Is there a tool somewhere that could replace all French strings in the sources with their corresponding English translations? And another tool that would convert all .ts files so that they translate from English instead of French?
If you open 2 files at the same time with Qt Linguist, you will see the translations for both. So if you provide the complete translation from French to English and the incomplete translation from French to XXX, your translator only needs to open the 2 files at once.
I am testing a Qt application I am internationalizing. I think I have found all the strings that need processing through tr() or ::Translate(), and I think I have handled all occurrences.
I have gotten back the first rounds of translations (done through the Qt Linguist tool).
Through QA testing I want to make sure all strings are translated and if possible I would like to easily identify any that are not translated and why.
What tools or methods are available for this?
I have discovered
lrelease -markuntranslated <prefix>
This takes any strings in the .ts file that are untranslated and prepends the prefix to them.
If the label "do it now" is in the .ts file and it is not translated in Qt Linguist, then if you run lrelease with
-markuntranslated NT-
you will see the label presented as 'NT-do it now' when you run the application.
This makes it obvious to whoever is testing that the string has been processed via ::Translate() or tr(), but has not been translated in Qt Linguist.
Conversely, any string that shows up in the running application in its original form, without the prefix, was never picked up by the Qt translation machinery: either it is not wrapped in tr() or ::Translate(), or it is missing from the .ts file.
See lrelease -help for more info.
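As a complement to running the application with marked strings, untranslated entries can also be listed directly from the .ts file, which is plain XML. The following is only a minimal sketch: the function name find_untranslated is hypothetical, and it assumes the usual TS layout of context, message, source and translation elements, where Qt Linguist marks unfinished entries with type="unfinished".

```python
import xml.etree.ElementTree as ET


def find_untranslated(ts_xml):
    """Return (context name, source text) pairs that still need translating.

    Assumption: the input follows the usual Qt TS layout
    (TS > context > message > source / translation).
    """
    root = ET.fromstring(ts_xml)
    missing = []
    for context in root.iter("context"):
        name = context.findtext("name", default="")
        for message in context.iter("message"):
            source = message.findtext("source", default="")
            translation = message.find("translation")
            if translation is None:
                missing.append((name, source))
                continue
            kind = translation.get("type")
            # Entries no longer present in the sources are not interesting here.
            if kind in ("obsolete", "vanished"):
                continue
            # itertext() also covers plural entries stored in <numerusform> children.
            text = "".join(translation.itertext()).strip()
            if kind == "unfinished" or not text:
                missing.append((name, source))
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python find_untranslated.py app_de.ts
    with open(sys.argv[1], encoding="utf-8") as handle:
        for context_name, source_text in find_untranslated(handle.read()):
            print("{}: {}".format(context_name, source_text))
```

Strings that appear in the application but in neither this list nor the translated set are candidates for a missing tr() call.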
I am using Python 2.7 and Django 1.7.
When I use Google Translator Toolkit to machine translate my .po files to another language (English to German), there are many errors due to the use of different Django template variables in my translation tags.
I understand that machine translation is not so great, but I only want to test my translation strings on my test pages.
Here is an example of a typical error of the machine-translated .po file translated from English (en) to German (de).
#. Translators: {{ site_name_lowercase }} is a variable that does not require translation.
#: .\templates\users\reset_password_email_html.txt:47
#: .\templates\users\reset_password_email_txt.txt:18
#, python-format
msgid ""
"Once you've returned to %(site_name_lowercase)s.com, we will give you "
"instructions to reset your password."
msgstr "Sobald du mit% (site_name_lowercase) s.com zurückgegeben haben, geben wir Ihnen Anweisungen, um Ihr Passwort zurückzusetzen."
The %(site_name_lowercase)s is machine translated to % (site_name_lowercase) s and is often concatenated to the preceding word, as shown above.
I have hundreds of these types of errors and I estimate that a find & replace would take at least 7 hours. Plus, if I run makemessages and then translate the .po file again, I would have to go through the find and replace again.
I am hoping that there is some type of undocumented rule in the Google Translator Toolkit that will allow the machine-translation to ignore the variables. I have read the Google Translator Toolkit docs and searched SO & Google, but did not find anything that would assist me.
Does anyone have any suggestions?
The %(site_name_lowercase)s is machine translated to % (site_name_lowercase) s and is often concatenated to the preceding word, as shown above.
This is caused by tokenization prior to translation, followed by detokenization after translation: Google Translate splits the input into tokens before translating and re-merges them afterwards. The variables you use are composed of characters that tokenizers typically use to detect token boundaries.
To avoid this sort of problem, you can pre-process your file and replace the offending variables with placeholders that do not have this issue. I suggest you try out a couple of things, e.g. _VAR_PLACE_HOLDER_. It is important that you do not use any punctuation characters that may cause the tokenizer to split. After pre-processing, translate your newly generated file, then post-process it by replacing the placeholders with their original values. Typically, your placeholder will be picked up as an Out-Of-Vocabulary (OOV) item and will be preserved during translation. Try including a sequence number (to keep track of your placeholders during post-processing), since word reordering may occur.
There used to be a scientific API for Google Translate that gave you the token alignments. You could use these for post-processing as well.
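The pre-processing and post-processing steps described above could be sketched roughly as follows. This is only an illustration: the function names and the VARPLACEHOLDER<n>X token format are made up for this example (letters and digits only, with a sequence number), and whether a given translation service leaves such a token untouched has to be verified by experiment.

```python
import re

# Matches python-format variables such as %(site_name_lowercase)s
VARIABLE_RE = re.compile(r"%\([A-Za-z_][A-Za-z0-9_]*\)[A-Za-z]")


def protect_variables(text):
    """Replace each variable with a numbered, punctuation-free placeholder.

    Returns the protected text and a mapping used to restore it later.
    """
    mapping = {}

    def substitute(match):
        # The trailing X keeps VARPLACEHOLDER1X from being a prefix of
        # VARPLACEHOLDER10X when restoring with plain string replacement.
        placeholder = "VARPLACEHOLDER{}X".format(len(mapping) + 1)
        mapping[placeholder] = match.group(0)
        return placeholder

    return VARIABLE_RE.sub(substitute, text), mapping


def restore_variables(text, mapping):
    """Put the original variables back after translation."""
    for placeholder, variable in mapping.items():
        text = text.replace(placeholder, variable)
    return text


if __name__ == "__main__":
    msgid = "Once you've returned to %(site_name_lowercase)s.com, we will help."
    protected, mapping = protect_variables(msgid)
    print(protected)  # send this text to the translation service
    print(restore_variables(protected, mapping))  # run this on the translated text
```

Because the mapping is keyed by the numbered placeholder, the restore step still works if the translation reorders the words.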
Note that this procedure will not give you the best translation output possible, as the language model will not recognize the placeholder. You can see this illustrated here (with placeholder, the token "gelezen" is in the wrong place):
https://translate.google.com/#en/nl/I%20have%20read%20SOME_VARIABLE_1%20some%20time%20ago%0AI%20have%20read%20a%20book%20some%20time%20ago
If you just want to test the system for your variables, and you do not care about the translation quality, this is the fastest way to go.
Should you decide to go for a better solution, you can solve this issue yourself by developing your own machine translation system (it's fun, by the way, see http://www.statmt.org/moses/) and applying the procedure explained above, but using, for example, part-of-speech tags to improve the language model. Note that you can use the alignment information as well.
I am creating an OpenGL game and I would like to make it available in more languages than just English, for obvious reasons. From looking around and fiddling with the games installed on my computer, I can see that locales play a big part in this. The .lang files, such as the en-US.lang shipped with Minecraft, are basically text documents in which each line holds a key ("item.iron.ingot", for example), an equals sign, and then the text for that language (for en-US, "Iron Ingot"). So I created a file named en-US.lang with these contents:
item.iron.ingot=Iron Ingot
In my C++ main method I put:
setlocale(LC_ALL, "en-US");
I did this after including the locale header file. The part that confuses me is how to use the locale to read from the .lang file. Please help; some example code would be appreciated.
C++ does not come with built-in support for resource files or internationalization. However, there is a wide variety of solutions.
To support multi-language messages, you should have some basic understanding of how such strings are encoded in files and read into memory. Here is a basic introduction if you are not familiar with the topic:
http://www.joelonsoftware.com/articles/Unicode.html
To store and load the correct text at runtime you need to use a third-party library. GNU gettext (http://www.gnu.org/software/gettext/) is one such example, but there are other solutions out there.
Our main language is English, so we use tr("Some english text") all over the source code.
We also plan to translate it to several different languages - no problem with that.
Our customer wants to get all phrases from the source code and proofread them.
Of course, we should put those phrases back after proofreading.
How can we accomplish that in a proper way? Maybe Qt Linguist allows exporting and importing embedded localizable texts?
I guess the customer can just translate English into English and then we can use that English translation, but it's weird.
I would go with Qt's lupdate utility (it can be found in Qt's bin directory), which will extract all translatable string literals from your sources into an XML (.ts) file. The file can be opened with the Linguist tool.
Note that the utility considers only strings marked for translation, for example those wrapped in tr(). Here is the lupdate description:
lupdate is part of Qt's Linguist tool chain. It extracts translatable
messages from Qt UI files, C++, Java and JavaScript/QtScript source
code. Extracted messages are stored in textual translation source
files (typically Qt TS XML). New and modified messages can be merged
into existing TS files.
UPDATE:
Another alternative is to keep all string literal definitions in a separate source file and update it once the customer has corrected all the strings. I believe this does not happen frequently, maybe even only once, so it would not be worth much effort with translations etc.
Finally, it looks like I will have to update the phrases (embedded in the source code) by hand. Actually, it shouldn't take too much time. If I have time to write a script in Python, I will update this post.
UPDATE
So, I made everything "by hand" with a little help from Sublime Text 3.
Find all matches in the repository folder using the following regular expression: (.*)(tr\((\"(.+?)\")\))(.*)
Copy the search results into a new document.
Using the same regular expression, do the search again and replace each match with \4 - this capture group represents the text in tr("").
After receiving the phrases back from the customer after proofreading, it took 3-5 minutes to find the differences with a diff tool and update the phrases in the code.
Not a true programmer's way of solving the problem, but it worked for me and worked pretty fast!
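For reference, the manual search-and-copy steps above could be approximated by a short script. This is a hedged sketch, not the script mentioned earlier in the post: the function name and the pattern are assumptions for illustration, and the pattern only handles a single double-quoted literal directly inside tr(...), not concatenated literals or other translation macros.

```python
import re

# A tr( call followed by one double-quoted literal; escaped quotes are allowed.
# The \b keeps identifiers that merely end in "tr" (e.g. str) from matching.
TR_CALL_RE = re.compile(r'\btr\(\s*"((?:[^"\\]|\\.)*)"')


def extract_tr_strings(source_code):
    """Return the literal text of every tr("...") call, in order of appearance."""
    return [match.group(1) for match in TR_CALL_RE.finditer(source_code)]


if __name__ == "__main__":
    import sys

    # Usage: python extract_tr.py file1.cpp file2.cpp > phrases.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for phrase in extract_tr_strings(handle.read()):
                print(phrase)
```

The resulting list, one phrase per line, can then be handed to the proofreader and compared with a diff tool in the same way as the manually produced document.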
I am working on Ubuntu 9.10 (Karmic Koala) with the latest version of gcc and Qt 4.6.2. I have installed the French and Hindi fonts for Ubuntu, and I changed the language and keyboard layout accordingly so that I could type in those languages. It worked fine. I then made a sample application and added appropriate translations in Hindi and French. The Linguist tool worked fine then, and I was able to type in Hindi in Linguist. This was around a week ago.
Today I was making a different application with Hindi translations, following the same steps in Qt Linguist as before. But now when I type in Hindi in Qt Linguist, every keypress produces the same single character, which looks like "=" with more space between the two horizontal bars of the "equal to" sign. In the .ts file the translations are displayed perfectly, but on execution the translated text again appears as square characters. I have tried umpteen times, even changing CODECFORTR in the .pro file, but to no effect.
Can somebody point out why Qt Linguist interprets Hindi characters as "=" when the same text typed in other applications, such as OpenOffice Writer and browsers, shows perfect Hindi fonts? I have torn my hair out the whole day over this annoying problem. I didn't try French though :).
Thanks
After banging my head many times on this problem, I decided to try the translations on a newer version of Qt and VOILA, it did the trick. There was probably a bug in the Qt translations module in the version I was using. I notified the Qt developers about it. Hope others will find this information valuable.
Cheers!!!