Proofread strings with Qt Linguist - c++

Our main language is English, so we use tr("Some english text") all over the source code.
We also plan to translate it to several different languages - no problem with that.
Our customer wants to get all phrases from the source code and proofread them.
Of course, we should put those phrases back after proofreading.
How can we accomplish that in a proper way? Maybe Qt Linguist allow to export/import embedded localizable texts?
I guess the customer can just translate English into English and then we can use that English translation, but it's weird.

I would go with Qt's lupdate utility (could be found in Qt's bin directory) that will extract all string literals from your sources into a xml (ts) file. The file can be opened with Linguist tool.
Note, that the utility considers only strings surrounded with tr() macro. Here is luptdate description:
lupdate is part of Qt's Linguist tool chain. It extracts translatable
messages from Qt UI files, C++, Java and JavaScript/QtScript source
code. Extracted messages are stored in textual translation source
files (typically Qt TS XML). New and modified messages can be merged
into existing TS files.
UPDATE:
Another alternative is keeping all string literals definitions in a separate source file and update it once customer has corrected all strings. I believe this happens not so frequently, or even only once, so it would not be worth of much effort with translations etc.

Finally, it looks like I will have to update phrases (embedded into source code) by hand. Actually, it shouldn't take too much time. If I have time to write a script on Python I will update this post.
UPDATE
So, I made everything "by hand" with a little help from Sublime Text 3.
Find all matches in repository folder using the following regular expression (.*)(tr\((\"(.+?)\")\))(.*)
Copy the search results into new document
Using the same regular expression do the search again and replace each match with \4 - this capture group represents text in tr("").
After receiving phrases from the customer after proofreading, it took 3-5 minutes to find differences with diff tool and update phrases in code.
Not a true-programmer way of resolving problems but worked for me and worked pretty fast!

Related

Is is possible to translate texts in xml files used in a c++ program?

I am working on a multilingual project with wxWidgets, coding in c++. For now, the texts I was translating were all in the cpp files using gnu gettext. Now, I want to use xml files for specific configuration parameters. Among those are different strings to be displayed in multilingual.
Current solution
I design the xml as follows. I declare the strings to be displayed, and I also declare the respective translated strings in the xml. I can then upload them in my program while reading the xml and look in the resulting container for a string depending on the language. But I think this solution is not as elegant as with using gettext.
Do you have better ideas? Am I able to somehow use gettext with xml files on specific nodes?
I thank you in advance for your help.

Extract comment written in Chinese and translate them into English using some script

I have a C++ project in which comments of source code are in Chinese language, now I want to convert them into English.
I tried to solve using google translator but got an Issue: Whole CPP files or header didn't get converted, also I have found the name of the struct, class etc gets changed. Sometimes code also gets modified.
Note: Each .cpp or .h file is less than 1000 lines of code.But there are multiple C++ projects each having around 10 files. Thus I have around 50 files for which I need to translate Chinese text to English.
Well, what did you expect? Google Translate doesn't know what a CPP file is and how to treat it. You'll have to write your own program that extracts comments from them (not that hard), runs just those through Google Translate, and then puts them back in.
Mind you, if there is commented out code, or the comments reference variable names, those will get translated too. Detecting and handling these cases is a lot harder already.
Extracting comments is a lexical issue, and mostly a quite simple one.
In a few hours, you could write (e.g. with flex) some simple command line program extracting them. And a good editor (such as GNU emacs) could even be configured to run that filter on selected code chunks.
(handling a few corner cases, such as raw string literals, might be slightly more difficult, but these don't happen often and you might handle them manually)
BTW, if you are assigned to work on that code, you'll need to understand it, and that takes much more time than copy&pasting or editing each comments manually.
At last, I am not sure of the quality of automatic translation of code comments. You might be disappointed. Also, the code names (of functions, of classes, of variables, etc...) matter a lot more.
Perhaps adding your comments in English could be wiser.
Don't forget to use some version control system. You really need one (e.g. git)
(I am not convinced that extracting comments for automatic translation would help your work)
First separate both comment and code part in different file using python script as below,
import sys
file=sys.argv[1]
f=open(file,"r")
lines=f.readlines()
f.close()
comment=open("comment.txt","w+")
code=open("code.txt","w+")
for l in lines:
if "//" in l:
comment.write(l)
code.write("\n")
else:
code.write(l)
comment.write("\n")
comment.close()
code.close()
Now translate comment.txt with google translator and then use
paste code.txt comment_en > source
where comment_en is translated comment in english.

Pentaho reports localization into not supported languages

I am from Slovakia, I wouldn't be surprised if most of you haven't heard about it.
However, that causes me a troubles when it comes to reports. We need to have 3 (soon 4) language versions of each report: Slovak is main language, than, Polish and English.
Since pentaho does not support Polish nor Slovak, it is really pain for me to keep these localized.
What I do is:
Create report in Slovak language
Write down all phrases from report
Send phrases to one of our partners to translate
Create its copy in either pl/en directory
Open it in Report Designer and edit every phrase accordingly
Save as another language version
As you can imagine, the process is very time consuming, and error prone. Plus, every time I add new parameter to report or change its data source (which is BeanShell script), I need to do it in 3 separated files. As a result of this, language mutations are usually out of date, way behind main language version.
I have tried to automate it with OneSky and did a python script that does 2 stages:
Stage 1 (extract and upload):
Change *.prpt files sufix to *.zip
Extract phrases from files: ~/datadefinition.xml, ~/layout.xml, ~/styles.xml, ~/datasources/inline-ds.xml
Put those phrases into *.po file
Export *.po file into OneSky
Stage 2 (download and import):
Change *.prpt files sufix to *.zip
Download translated *.po file from OneSky
Run through ~/datadefinition.xml, ~/layout.xml, ~/styles.xml, ~/datasources/inline-ds.xml files and replace original phrases by translated
While this aproach works fine, it doe not translate everything. There are still flaws of this process. I need to go through it every time I do even slightest change in data source of report or fix small mistakes. Even if I just do a small six in SQL code, I need to do it in 3 files. That of course increases chance to mistake be made.
Soo, I was wondering, how are you guys solving this issue with translating of your reports?
I will share very simple method which we are following.
1)create a properties file with key value format for each language for resource labels(for static values)
2)put it into resources folder(report-designer/resources/)
3)Based on the parameter you can specify which properties file to select and you can specify keys into value field so that it can understand which value to display in which language.
4)if you need to convert the data which are coming from database,you have to design data warehouse specify all the mappings,accordingly it can fetch the data.
5)For converting dates and currency symbols or number format you can use inbuilt functions which can handle all this things,i am using mysql and mysql has translation functions which can handle all such things.
it is difficult to explain entire process here, but if you can get and idea from this it can be useful to you.

File system regular expression search tool

What is the best tool to make complex (multi-line) regular expression file contents searches with good reporting capabilities?
I need to make a report over large Java/JSP code base and I have to make some charts afterward.
Eclipse is rather good at searches, but it does not provide good report of what is found. It just shows the tree of files, but I would like to see a table with columns corresponding to full match, each group, file name, file path, file date, may some version control information etc. Then I can transfer this table to Excel and make some graphs that I want.
Is there some generic file system search tool that has such capabilities? Or maybe there is some Eclispe plugin that can give better reports (note that I'm stuck on eclipse 3.1.2)?
Agent Ransack, TextPad, and UltraEdit allow you to perform regular expression searches against the file system. My favorite is Agent Ransack as you can specify regular expressions for the file names and for the content.
PowerGREP (on Windows) can be used to do (most of) that. You can define the format of your search results quite freely. I haven't tried yet to also add file meta information to the search results, but that should work. Not sure if you can add version control information (where would that come from?) - perhaps if you could be a bit more specific, I could check.
Other than that, why not write a small Python/Ruby/Perl script like JasonTrue suggested?
For searches over code bases with queries that understand the language structure, look at SD Search Engine. This tool indexes larges source base to provide very fast query response.
Queries are stated in terms of langauge elements (identifiers, operators, strings, ...) with constraints over the language elements (including wildcards and regexps on identifiers, strings and comments, as well as range constraints on numbers). Language whitespace and linebreaks (and comments unless you insist) are ignored.
If you want to do a plain regexp search on file character content, you can do that too but you don't get the speed advantage of the index, runs more like regular grep.
The interactive query result is shown in a hit window with other hits; by clicking, you can go to window containg the full source code of a hit.
In logging mode, all hits found are written to a log file with N lines of context, where you configure N. That's probably the report you want.
um... grep -r ?
Or ruby/perl/python, if you want to have more control over the final output; it sounds like what you're after would only be a few lines.

Do you know of a good program for editing/translating resource (.rc) files?

I'm building a C++/MFC program in a multilingual environment. I have one main (national) language and three international languages. Every time I add a feature to the program I have to keep the international languages up-to-date with the national one. The resource editor in Visual Studio is not very helpful because I frequently end up leaving a string, dialog box, etc., untranslated.
I wonder if you guys know of a program that can edit resource (.rc) files and
Build a file that includes only the strings to be translated and their respective IDs and accepts the same (or similar) file in another language (this would be helpful since usually the translation is done by someone else), or
Handle the translations itself, allowing to view the same string in different languages at the same time.
In my experience, internationalization requires a little more than translating strings. Many strings when translated, require more space on a dialog. Because of this it's useful to be able to customize the dialogs for each language. Otherwise you have to create dialogs with extra space for the translated strings which then looks less than optimal when displayed in English.
Quite a while back I was using a translation tool for an MFC application but the company that produced the software stopped selling it. When I tried to find a reasonably priced replacement I did not find one.
Check out Lingobit Localizer. Expensive, but well worth it.
Here's a script I use to generate resource files for testing in different languages. It just parses a response from babelfish so clearly the translation will be about as high quality as that done by a drunken monkey, but it's useful for testing and such
for i in $trfile
do
key=`echo $i | sed 's/^\(.*\)=\(.*\)$/\1/g'`
value=`echo $i | sed 's/^\(.*\)=\(.*\)$/\2/g'`
url="http://babelfish.altavista.com/tr?doit=done&intl=1&tt=urltext&lp=$langs&btnTrTxt=Translate&trtext=$value"
wget -O foo.html -A "$agent" "$url" *&> /dev/null
tx=`grep "<td bgcolor=white class=s><div style=padding:10px;>" foo.html`
tx=`echo $tx | iconv -f latin1 -t utf-8 | sed 's/<td bgcolor=white class=s><div style=padding:10px;>\(.*\)<\/div><\/td>/\1/g'`
echo $key=$tx
done
rm foo.html
Check out appTranslator, its relatively cheap and works rather well. The guy developing it is really responsive to enhancement requests and bug report, so you get really good support.
You might take a look at Sisulizer http://www.sisulizer.com. Expensive though. We're evaluating it for use at my company to manage the headache of ongoing translation. I read on their About page that the company was founded by people who left Multilizer and other similar companies.
If there isn't one, it would be pretty easy to loop through all the strings in a resource a compare them to the international resources. You could probably do this with a simple grid.
In the end we have ended up building our own external tools to manage this. Our devs work in the english string table and every automated build sends our strings that have been added/changed and deleted to translation manager. He can also run a report at anytime from an old build to determine what is required for translation.
Check out RC-WinTrans. Its a commercial tool that my company uses. It basically imports our .RC files (or .resx files) into a database which we send to a different office for translation. The tool can then export a translated .RC file (or .resx file) for each language from the database. It even has a basic dialog box editor so the translator can adjust the size of various controls in the dialog box to be sure the translated text fits.
It also accepts a number of command line arguments and has a COM automation interface so you can integrate it into a build process more easily. It works quite well for us and we literally have thousands and thousands of strings and dialog boxes, etc.
(We currently have version 7 so what I've said might be a little bit different than their latest version 8.)
Also try AppTranslator: http://www.apptranslator.com/. It has a build-in resource editor so that translators can, for example, enlargen a text box when need bo. It has separate versions for developers and translators and much more.
We are using Multilizer (http://www.multilizer.com/) and although sometimes it's a bit tricky to use, at the end with a bit of patient it works pretty well.
We even have a translation web site where translators can download our projects and then upload the translations using Multilizer command-line features.
Managing localization and translations using .rc files and Visual Studio is not a good idea. It's much smarter (though counter-intuitive) to start localization through the exe. Read here why: http://www.apptranslator.com/misconceptions.html
I've written this recently, which integrates into VS:
https://github.com/ekkis/Powershell/blob/master/MT.ps1
largely because I was unsatisfied with the solutions out there. you'll need to get a client id from M$ (but they give you 2M words/month translation free - not bad)
ResxCrunch will be out soon, it will edit multiple resource files in multiple languages in one single table.