How can I implement multi-language support in my C++ project?

I'll give a small explanation of my project and my problem:
I have a big C++ project (Old project) which is built on WINAPI with MFC/ATL and DirectX 9 (Windows).
My project is split into 8 solutions: 7 servers and a client.
The project uses CString, TCHAR and char*, and is built with the Multi-Byte Character Set (MBCS).
I would like to add RTL and LTR support for multi-language purposes.
My problem:
The source uses a code page.
The source uses the CString class for string manipulation.
There is a chat system which adds one character at a time to a CString.
there are functions like:
void func(TCHAR* str)
{ int x = strlen(str); }
If I type Hebrew inside the client (with the code page set to 1255), the client renders the text as Hebrew, but the string itself shows as
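Functions like func above are exactly what breaks once strings stop being single-byte. A minimal, portable sketch (using wchar_t as a stand-in for a Unicode-build TCHAR) shows why strlen is the wrong tool:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// strlen walks bytes and stops at the first zero byte; a wide string
// contains embedded zero bytes for every ASCII character, so the
// reported length is wrong (typically 1 on little-endian machines).
std::size_t wrongLen(const wchar_t* s) {
    return std::strlen(reinterpret_cast<const char*>(s)); // the bug
}

// In a Unicode build, _tcslen expands to wcslen, which counts
// wide characters correctly.
std::size_t rightLen(const wchar_t* s) {
    return std::wcslen(s);
}
```

In a TCHAR codebase the fix is to use _tcslen (and the other _tcs* functions) everywhere, so the code compiles correctly in both MBCS and Unicode builds.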
My questions:
How can I get rid of the code page so I can use multiple languages freely?
I was able to fix RTL by playing with offsets, but when RTL and LTR are mixed in the same string I have a problem. How can I fix that?
How can I get the string watch in the debugger to display the text in the right language?
Things worth mentioning:
I am aware that this is a difficult and huge task.
I know there is a library for this called ICU, but I don't know whether it can even be used in the current situation. If you can explain how to integrate it, I would appreciate it.
Thank you for helping me!

I was able to fix RTL by playing with offsets, but when RTL and LTR are mixed in the same string I have a problem. How can I fix that?
The RTL setting is applied to the entire control, as far as I know.
One way to mix directions is to render HTML, as described here: https://www.w3.org/International/questions/qa-html-dir.
Or perhaps a RichEdit control?
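As an alternative to per-offset tricks, mixed-direction runs can also be expressed with Unicode bidirectional formatting characters, assuming the control's renderer implements the Unicode bidi algorithm (RichEdit does). A sketch of wrapping an RTL fragment so it keeps its direction inside surrounding LTR text:

```cpp
#include <cassert>
#include <string>

// U+202B RIGHT-TO-LEFT EMBEDDING and U+202C POP DIRECTIONAL FORMATTING
// force the wrapped fragment to lay out right-to-left regardless of the
// surrounding paragraph direction. (UTF-8 narrow strings here; in a
// TCHAR/Unicode build you would use the wide forms L"\u202B" etc.)
std::string embedRtl(const std::string& rtlFragmentUtf8) {
    return "\u202B" + rtlFragmentUtf8 + "\u202C";
}
```

The marks are invisible; only the layout changes. The plain-control caveat from the answer still applies: a control whose renderer ignores bidi controls will not honor them.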

Related

MFC CEdit converts non-ascii characters to ascii

We have an MFC Windows application, written originally in VC++ 6 and updated over the years for newer IDEs; it is currently developed in VS2017.
The application is built with MBCS (not Unicode). Trying to switch to Unicode causes 3806 compile errors, and that is probably just the tip of the iceberg.
However, we want to be able to run the application with a different code page, e.g. 1250 (Central European).
I tried to build a small test application and managed to get it to work with special characters (čćšđž). I did this by setting the dialog font to Microsoft Sans Serif with code page 1250.
The same approach does not work in our application. Note: dialogs in our application are created dynamically, and the font is set using SetFont.
There is a difference in how the special characters are treated in these two applications.
In the test application, the special characters are displayed in the edit control, and GetWindowText retrieves the right bytes. However, trying to type characters from other languages renders them as "????".
In our application, all special characters are rendered properly, but GetWindowText (or WM_GETTEXT) converts the special characters to their closest ASCII counterparts (čćđ -> ccd).
I believe the edit control in our application displays Unicode text, but GetWindowText converts it to ASCII.
Does anyone have any idea what is happening here, and how I might solve it?
Note: I know how to convert the project to Unicode. We are choosing not to commit resources to it at the moment, as it would probably take weeks or months. The question is how I might get it to work with MBCS, and why the edit control converts Č to C.
I believe it is absolutely possible to port the application to other languages/code pages; you only need to modify the .rc (resource) files, basically having one resource file for each language. You may want to do that anyway, as strings in menus and string tables would be in a different language. As far as the application part is concerned, this is actually the only change needed.
The other part is the system you are running it on. A window can be Unicode or non-Unicode. You can see this with the Spy++ (spyxx) utility: it tells you whether a window (procedure) is Unicode or not (Window properties, General tab). While Unicode windows work properly, non-Unicode ones have to convert between Unicode and MBCS when getting or setting the text. The conversion is based on the system (default) code page, which can only be set globally (for the whole machine), not per application or window. And of course, setting the font's code page is not enough (in my opinion it is not needed at all, if you are running the application on a machine with the "correct" code page). That is, for non-Unicode applications, only one code page will work properly; the others won't.
I can see two options:
If you only need to update a small number of controls, it may be possible to change only these controls to Unicode and use the "wide" versions of the get/set window-text functions or messages; you will have to convert the text between Unicode and your desired code page. It requires writing some code, but has the advantage that the conversion is independent of the system default code page, e.g. you can keep the code page in a configuration file, in the registry, or as a command-line option (in the application's shortcut). Some control types can be changed to Unicode and some cannot, so please check the documentation. I used this technique successfully for an MBCS application displaying/editing translated strings in many different languages, but I only had one control, a List-View, which by the way offers the LVM_SETUNICODEFORMAT message, thus allowing for Unicode text even in an MBCS application.
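On Windows the conversion step itself is usually MultiByteToWideChar/WideCharToMultiByte with an explicit code-page argument (1250 here), which is what makes it independent of the system default. Conceptually it reduces to a 256-entry lookup table; a portable sketch with just a few CP1250 entries (values taken from the published CP1250 definition - a real converter carries all 256, or just calls the Win32 functions):

```cpp
#include <cassert>
#include <map>
#include <string>

// Tiny slice of the CP1250 -> Unicode mapping table.
const std::map<unsigned char, char32_t> kCp1250 = {
    {0xE8, U'\u010D'}, // č
    {0xE6, U'\u0107'}, // ć
    {0xF0, U'\u0111'}, // đ
    {0x9A, U'\u0161'}, // š
    {0x9E, U'\u017E'}, // ž
};

std::u32string cp1250ToUnicode(const std::string& in) {
    std::u32string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(c);              // ASCII range maps to itself
        } else {
            auto it = kCp1250.find(c);
            // U+FFFD REPLACEMENT CHARACTER for bytes not in this sketch
            out.push_back(it != kCp1250.end() ? it->second : U'\uFFFD');
        }
    }
    return out;
}
```

The reverse direction (Unicode back to CP1250 bytes for the MBCS side of the application) is the same table inverted.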
The easiest method is simply to run the application as-is, but then it will only work on machines with the proper default code page, as most non-Unicode applications do.
The system default code page can be changed by setting the "Language for non-Unicode programs" option, available in the regional settings on the Administrative tab; it requires a reboot. Changing the Windows UI language will change this option as well, but by setting this option directly you don't need to change the UI language, e.g. you can have an English UI and an East-European code page.
See a very similar post here.
Late to the party:
In our application, all special characters are rendered properly, but GetWindowText (or WM_GETTEXT) converts the special characters to their closest ASCII counterparts (čćđ -> ccd).
That sounds like the ES_OEMCONVERT flag has been set for the control:
Converts text entered in the edit control. The text is converted from the Windows character set to the OEM character set and then back to the Windows character set. This ensures proper character conversion when the application calls the CharToOem function to convert a Windows string in the edit control to OEM characters. This style is most useful for edit controls that contain file names that will be used on file systems that do not support Unicode.
To change this style after the control has been created, use SetWindowLong.

Qt C++ write to / read from "IBM037 / CP037"

I was looking for a way to write to and read from IBM037 encoding in Qt. I was able to achieve that in C# by using
Encoding.GetEncoding("IBM037")
However, I am currently porting an application from C# to C++ using Qt, and I wasn't able to find a way to do so.
Thanks in advance.
Edit: I am aware of QTextCodec, but it does not contain a definition for IBM037. Using it returns the text unchanged (not encoded).
You can implement your own class derived from QTextCodec and use tables (like the ones available here) to perform the translation character by character.
As suggested in the comments, check what is stated in the QTextCodec documentation here.
With tables like these you can translate to 8-bit ASCII, then convert those characters to Unicode using the functions the Qt framework already provides.
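QTextCodec subclassing aside, the table-driven core of that answer is simple. A sketch with just a handful of CP037 (EBCDIC) entries - a real codec would carry the full 256-entry table from the published CP037 definition:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Tiny slice of the CP037 (EBCDIC) -> ASCII mapping.
char ebcdicToAscii(std::uint8_t b) {
    switch (b) {
        case 0x40: return ' ';
        case 0x81: return 'a'; case 0x85: return 'e';
        case 0x93: return 'l'; case 0x96: return 'o';
        case 0xC8: return 'H';
        default:   return '?'; // unmapped in this sketch
    }
}

std::string decodeCp037(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) out.push_back(ebcdicToAscii(b));
    return out;
}
```

Encoding is the inverted table; a QTextCodec subclass would wrap these two loops in convertToUnicode/convertFromUnicode.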

Visual C++/MFC: getting Japanese characters to work without UNICODE

I have software originally developed 20 years ago in Visual C++ using MFC without UNICODE. Currently strings are held either in char[] or CString, and it works on English and Japanese Windows PCs until Japanese characters are used, as these tend to get converted to strange characters or empty boxes.
Setting UNICODE is presumably the way forward but will require a massive code change, whereas quite a lot seems to work simply by setting the system locale to Japan (in Windows' "Language for non-Unicode programs" setting). I have no idea how Windows does this, but some Japanese things now work on my English Windows PC, e.g. I can open and save Japanese filenames with no code changes. And in Japan they set the system locale to English and again much works, but not everything.
I get the impression the problems are due to using a font that doesn’t include Japanese characters. Currently I am using Arial / MS Sans Serif and charset set to ANSI_CHARSET or DEFAULT_CHARSET. Is there a different font I should be using, or can I extend these fonts to include Japanese characters? Or am I barking up the wrong tree in which case what do I do next? Am very new to all this unfortunately …
That's a common question (OK I guess not so common any more in 2015, as MBCS programs luckily are a dying breed - I still maintain several though...)
Either way, I'm afraid that, depending on your definition of 'working', to get this working you'll have to bite the bullet and convert to a Unicode build. If you can't make a business case for that, then you'll have to set the right locale (well, worse, have the user set the 'right' one) and test what works and what doesn't, and ask more specific questions on what doesn't.
If your goal is to make one application that correctly displays strings in various encodings in the 'right' way regardless of the locale settings on the computer, and compatible with every input data set / database content without the user having to be aware of encoding issues, then you're out of luck with an MBCS build.
The font missing characters is most likely not the problem. Before you go any further and/or ask further questions, you should read http://www.joelonsoftware.com/articles/Unicode.html, read it again, sleep on it, read it again, explain to somebody else what the relationship is between 'encoding', 'locale', 'character set', 'font' and 'Unicode code point', because only after you can do that, you can decide on how to progress with your application. Sorry, it's not what you want to hear, but it's the reality if you've been tasked with handling internationalization.

Which steps are needed for a Unicode program in C++

I want to write a C++ program which can support typing Unicode characters in text editors like LibreOffice, MS Office, Notepad, etc. (because I'm Vietnamese and my mother tongue includes characters such as đ, â, à, ế, ẹ, ẻ, ...). That means when I use a text editor like those above, or any application which supports text editing, such as browsers (in the address bar or search bar) or chat applications like Yahoo or Skype, and I type a key or a group of keys on the keyboard, my C++ program will notice that, convert it into a Unicode character and send it back to the text editor.
For example, when I type the 'e' key twice in a text editor, my C++ program will notice that and turn it into 'ê' in the text editor. Please tell me the steps or the mechanism needed to build such an application. I don't know where to start.
Use a solid library like Qt or wxWidgets, or, if you don't need the extra ballast, plain old ICU.
As far as I understand, you want to write an IME (input method editor). There are plenty of them available already for Vietnamese, supporting various input methods.
You did not specify the platform. However, for both Windows and Linux there are quite a few Vietnamese IMEs available - practically all of the Linux ones are open source, and Unikey, which to my knowledge is one of the most popular IMEs for Windows, is also an open-source program and would thus provide an easy start for hacking your own favourite options into an IME.
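The composition rule from the question (a double 'e' becomes 'ê') is the text-transformation half of an IME; the other half, intercepting keystrokes and injecting text system-wide, needs platform IME APIs (e.g. the Text Services Framework on Windows). A sketch of just the composition state, assuming UTF-8 buffers:

```cpp
#include <cassert>
#include <string>

// Telex-style rule: a vowel typed twice composes into its circumflex
// form (ee -> ê, oo -> ô, aa -> â). feedKey consumes one keystroke
// and returns the updated buffer.
std::string feedKey(std::string buffer, char key) {
    static const struct { char plain; const char* composed; } kRules[] = {
        {'e', "\u00EA"}, // ê
        {'o', "\u00F4"}, // ô
        {'a', "\u00E2"}, // â
    };
    for (const auto& r : kRules) {
        if (key == r.plain && !buffer.empty() && buffer.back() == r.plain) {
            buffer.pop_back();    // drop the first plain vowel
            buffer += r.composed; // append the composed character (UTF-8)
            return buffer;
        }
    }
    buffer.push_back(key);
    return buffer;
}
```

A full Telex engine adds tone marks and breve/horn rules on top of the same feed-one-key pattern; Unikey's source shows a complete rule set.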

Rendering unicode characters correctly on textbox

I am working on a translation application in which users give English input, and I need to convert it to a target language and display it in a text box. I am facing problems displaying Unicode characters.
Complex characters are not rendered correctly. I know Windows uses Uniscribe for rendering complex scripts, so do I need to use it explicitly to get correct rendering? What is the equivalent of Uniscribe on Linux and Mac?
I am using C++ with the wxWidgets framework and trying to display Unicode characters in a text box. Any help would be great!
Considering that Uniscribe support in wxWidgets was merely a Google Summer of Code idea this year, it seems unlikely that it works today.
There's no trivial Linux or Mac equivalent of Uniscribe.
Read up on Pango: it's the library that supports full OpenType rendering on Linux. Mac is another story.