Qt internationalization from native language - C++

I am going to write software in Qt. Its string literals should be written in a native (non-English) language, and they should support internationalization. The Qt docs advise using the tr() function for this: http://doc.qt.io/qt-5/i18n-source-translation.html
So I try to write:
edit->setText(tr("Фильтр"));
but I see only question marks in the running app.
I replace it with QString::fromStdWString:
edit->setText(QString::fromStdWString(L"Фильтр"));
and I see the correct text in my language.
So the question is: how should I write non-ASCII string literals so that they display correctly and can be translated using Qt Linguist?
PS: I use UTF-8 encoding for all source files; the compiler is VS2013.
PS2: I have found the QTextCodec::setCodecForTr() function, but it has been removed (I am on Qt 5.4).

I think that the best option is to use some kind of Latin-1 transliteration inside the program source. Then it's possible to implement both the Russian and English versions as normal Qt translations.
BTW, with some additional work it's even possible to use plain numbers as translation placeholders, just like MFC did; a sketch of that idea follows.
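A minimal illustration of the placeholder idea, assuming the edit widget from the question (the key ID_FILTER_LABEL is made up for this example). The string passed to tr() is only an ASCII key, and every language, including the native one, supplies the display text through its own .ts file:

// Hypothetical sketch: the source "string" is an ASCII-only key, so the
// source file's encoding never matters. Qt Linguist maps the key to real
// text per language (e.g. the ru_RU .ts file translates it to "Фильтр").
edit->setText(tr("ID_FILTER_LABEL"));

The obvious cost is that an untranslated build shows the raw keys, so even the native language needs a complete .ts file.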

I found a strange solution to my problem:
By default VS saves files as UTF-8 with a BOM. In File -> Advanced Save Options I chose to save the file as UTF-8 without a BOM, and everything works like a charm:
edit->setText(tr("Фильтр"));
It looks like VS compiler behaviour rather than a bug: with a BOM, MSVC re-encodes narrow string literals into the system ANSI code page, where the Cyrillic letters become '?'; without a BOM it passes the UTF-8 bytes through untouched, and Qt 5's tr(), which assumes UTF-8, displays them correctly. Interestingly, MS claims that its compiler supports Unicode sources only for UTF-8 with a BOM: https://msdn.microsoft.com/en-us/library/xwy0e8f2.aspx
PS: the length of "Фильтр" is 12 bytes, so it really is a UTF-8 string.
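If you cannot control how every compiler reads the source file, a fallback (my own sketch, not part of the fix above) is to spell the UTF-8 bytes out as hex escapes, since Qt 5's tr() treats its const char* argument as UTF-8; verify that lupdate still extracts such escaped literals in your setup.

// Sketch: the 12 UTF-8 bytes of "Фильтр", written as hex escapes so the
// compiler's source-charset handling cannot alter them.
edit->setText(tr("\xD0\xA4\xD0\xB8\xD0\xBB\xD1\x8C\xD1\x82\xD1\x80")); // "Фильтр"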

Related

Qt C++ write to / read from "IBM037 / CP037"

I was looking for a way to write to and read from the IBM037 encoding in Qt. I was able to achieve that in C# by using
Encoding.GetEncoding("IBM037")
However, I am currently porting an application from C# to C++ using Qt, and I wasn't able to find a way to do so.
Thanks in advance.
Edit: I am aware of QTextCodec, but it does not contain a definition for IBM037. Using it returns the text unchanged (not encoded).
You can implement your own class derived from QTextCodec and use tables (like the ones available here) to perform the translation character by character.
As suggested in the comments, check what is stated in the QTextCodec documentation here.
With tables like these you can translate to 8-bit ASCII, then convert the ASCII characters to Unicode using the functions the Qt framework already provides. A sketch of such a codec follows.
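A minimal sketch, assuming Qt 5 (QTextCodec moved to the Qt5Compat module in Qt 6). The class name Cp037Codec and both lookup tables are placeholders; a real implementation must fill the 256-entry CP037-to-Unicode table from a published mapping:

#include <QTextCodec>

// Sketch of a CP037 codec. QTextCodec registers each instance when it is
// constructed, so a single "new Cp037Codec();" at startup makes it
// available via QTextCodec::codecForName("IBM037"). Qt keeps ownership of
// registered codecs; do not delete the instance yourself.
class Cp037Codec : public QTextCodec
{
public:
    QByteArray name() const override { return "IBM037"; }
    int mibEnum() const override { return 2028; } // IANA MIBenum for IBM037

protected:
    QString convertToUnicode(const char *in, int length,
                             ConverterState *) const override
    {
        QString out;
        out.reserve(length);
        for (int i = 0; i < length; ++i)
            out.append(QChar(cp037ToUnicode[quint8(in[i])])); // table lookup
        return out;
    }

    QByteArray convertFromUnicode(const QChar *in, int length,
                                  ConverterState *) const override
    {
        QByteArray out;
        out.reserve(length);
        for (int i = 0; i < length; ++i)
            out.append(unicodeToCp037(in[i].unicode()));
        return out;
    }

private:
    // 256-entry mapping table, to be filled in from a CP037 reference.
    static const ushort cp037ToUnicode[256];

    // Reverse lookup; 0x6F is '?' in CP037, used as the substitution byte.
    static char unicodeToCp037(ushort u)
    {
        for (int b = 0; b < 256; ++b)
            if (cp037ToUnicode[b] == u)
                return char(b);
        return char(0x6F);
    }
};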

Japanese characters are not written correctly when saving to a file

I have a .NET based Excel addin that uses a C++/CLI library to read/write proprietary files. The C++/CLI library links to some core C++ libraries that provide classes to read and write these files. The core classes use std::string and std::i/ofstream to read/write data in proprietary files.
So when saving data, it goes from:
Excel >> .NET AddIn (string) >> C++/CLI Lib (System::String) >> C++ Core Lib (std::string)
All works fine with simple text (ASCII) files. Now I have a text file (ANSI encoding) with some Japanese characters in it, saved on a Japanese machine; I think it uses the SHIFT-JIS encoding by default. This file loads fine (I see the same characters in Excel as in Notepad), but if I save it back unmodified, the characters change to ??. I think it's because the std::string and std::ofstream classes are writing it incorrectly as a simple ASCII stream.
I use the following syntax while reading the file to convert them to .NET strings:
%String(mystring.c_str());
and the following while converting them from .NET strings to std::strings while writing:
msclr::interop::marshal_as<std::string>(mydotnetstring)
The problem seems to be with encoding, but I am not crystal clear on what exactly is happening. I want to understand why the file is read correctly but not written correctly.
I have modified my application to read/write UTF-8 and that solves the problem but I still want to know the underlying problem.
Okay, I think I have found the underlying problem: the msclr::interop::marshal_as<std::string> method internally calls the WideCharToMultiByte API with the CP_THREAD_ACP option, which means the code page of the active thread is used. This .NET add-in runs inside the Excel process, and the current thread has a different code page (932 on a Japanese system) than the default code page (1252). I verified this by checking the return value of the marshal_as call in a sample application versus the .NET add-in on a Japanese machine: the sample application converted a two-character Japanese string to 4 bytes, whereas the add-in converted it to 2 unknown '?' bytes.
SOLUTION
marshal_as does not provide a way to override this code page, so the solution is to marshal .NET strings by calling the WideCharToMultiByte API directly with the CP_ACP option. It worked for me; a sketch follows.
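A minimal sketch of that direct call, assuming C++/CLI; the helper name ToNarrow is made up, and PtrToStringChars comes from <vcclr.h>:

#include <windows.h>
#include <string>
#include <vcclr.h>  // PtrToStringChars

// Convert a .NET string to std::string with an explicit code page,
// bypassing marshal_as's CP_THREAD_ACP behaviour described above.
static std::string ToNarrow(System::String^ s, UINT codePage = CP_ACP)
{
    pin_ptr<const wchar_t> wide = PtrToStringChars(s);
    // First call computes the required buffer size, second call converts.
    int len = WideCharToMultiByte(codePage, 0, wide, s->Length,
                                  nullptr, 0, nullptr, nullptr);
    std::string out(len, '\0');
    if (len > 0)
        WideCharToMultiByte(codePage, 0, wide, s->Length,
                            &out[0], len, nullptr, nullptr);
    return out;
}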

Which steps are needed for a Unicode program in C++

I want to write a C++ program which can support typing Unicode characters in text editors like LibreOffice, MS Office, or Notepad (because I'm Vietnamese and my mother tongue includes Unicode characters such as đ, â, à, ế, ẹ, ẻ, ...). That means when I use a text editor like those above, or any application that supports text editing, such as browsers (in the address bar or search bar) or chat applications like Yahoo or Skype, and I type a key or a group of keys on the keyboard, my C++ program will notice that, convert it into a Unicode character, and send it back to the text editor.
For example, when I type 'e' twice in a text editor, the C++ program will notice that and turn it into 'ê' in the text editor. Please tell me the steps or mechanism needed for such an application. I don't know where to start.
Use a solid library like Qt or wxWidgets, or, if you don't need the extra ballast, plain old ICU.
As far as I understand, you want to write an IME (input method editor). There are plenty of them available already for Vietnamese, supporting various input methods.
You did not specify the platform. However, for both Windows and Linux there are quite a few Vietnamese IMEs available; practically all of them are open source on Linux, and Unikey, which to my knowledge is one of the most popular IMEs for Windows, is also an open source program, and thus would provide an easy start for hacking your own favourite options into an IME.

How to correctly display characters from different languages?

I am finishing an application in Visual C++/Windows API, and I am using the MySQL C Connector.
The whole application code uses ANSI, and the MySQL C Connector is ANSI too.
This program will be used on Polish and German computers with Windows XP/Vista/7/8.
I want to correctly display German umlauts and Polish accented characters in:
DialogBox controls (strings are loaded from language files)
Generated XHTML documents
Strings retrieved from MySql database displayed on controls and in XHTML documents
I have heard about MultiByteToWideChar and the Unicode functions (MessageBoxW etc.), but the application code is nearly finished, and converting it would be a lot of work...
How do I get the character encoding right with the least work and time?
Maybe by changing the system code page for non-Unicode programs?
First, of course: what code set is MySQL returning? Or perhaps: what code set was used when writing the data into the database?
Other than that, I don't think you'll be able to avoid using either wide characters or multibyte characters: for single-byte characters, German would use ISO 8859-1 (code page 1252) or ISO 8859-15, and Polish ISO 8859-2 (code page 1250). But what are you doing with the characters in your own code? You may be able to get away with UTF-8 (code page 65001) without many changes. The real question is where the characters originally come from (although it might not be too difficult to translate them into UTF-8 immediately at the source); I don't think that Windows respects the code page for input.
Although it doesn't help you much to know it, you're dealing with an almost impossible problem, since so much depends on things outside your program: things like the encoding of the display font, or the keyboard driver, for example. In fact, it's not rare for programs to display one thing on the screen and something different when outputting to the printer, or to display one thing on the screen but something different when the data is written to a file and read with another program. The situation is improving: modern Unix and the Internet are gradually (very gradually) standardizing on UTF-8, everywhere and for everything, and Windows normally uses UTF-16 for everything that is pure Windows (but needs to support UTF-8 for the Internet). But even using the platform standard won't help if the human client has installed (and is using) fonts which don't have the characters you need.
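If the data does arrive as UTF-8 (for instance after issuing SET NAMES utf8 on the MySQL connection; an assumption, not something the question states), a small helper that converts only the strings you actually display can avoid a full Unicode port. A sketch:

#include <windows.h>
#include <string>

// Convert a UTF-8 string to UTF-16 so it can be handed to the wide
// Windows APIs, leaving the rest of the ANSI application untouched.
static std::wstring Utf8ToWide(const std::string& utf8)
{
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                  (int)utf8.size(), nullptr, 0);
    std::wstring wide(len, L'\0');
    if (len > 0)
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                            (int)utf8.size(), &wide[0], len);
    return wide;
}

// Usage (IDC_NAME_EDIT is a made-up control ID): call the wide API
// explicitly for the one control that needs it.
// SetWindowTextW(GetDlgItem(hDlg, IDC_NAME_EDIT), Utf8ToWide(textFromDb).c_str());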

Rendering unicode characters correctly on textbox

I am working on a translation application in which users are allowed to give English input, and I need to convert it to a target language and display it in a text box. I am facing problems displaying Unicode characters.
Complex characters are not rendered correctly. I know Windows uses Uniscribe for rendering complex scripts, so do I need to use it explicitly to get correct rendering? What is the equivalent of Uniscribe on Linux and Mac?
I am using C++ with the wxWidgets framework and trying to display Unicode characters in a text box. Any help would be great!
Considering that Uniscribe support in wxWidgets was merely a Google Summer of Code idea this year, it seems unlikely that it's working today.
There's no trivial Linux or Mac equivalent for Uniscribe.
Read up on Pango. It's the library that supports full OpenType rendering on Linux. Mac is another story.