"Language for non-Unicode programs" changes INI reading - C++

I have a non-Unicode application that uses the Unicode versions of the INI reading functions, such as GetPrivateProfileSectionW and GetPrivateProfileStringW. The program works correctly when "Language for non-Unicode programs" is set to English.
When I change this setting to Chinese (PRC), GetPrivateProfileSectionW and GetPrivateProfileStringW return nothing.
I must keep this setting at Chinese, because when English is selected for "Language for non-Unicode programs", CComBSTR::LoadString does not work as expected: it loads the Chinese characters in a resource DLL as question marks.
Any ideas?
Thanks.

Michael Kaplan explains this behavior. The solution is to use Unicode (UTF-16) INI files, which do not depend on the "Language for non-Unicode programs" setting.
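A minimal sketch of that approach (the path, section, and key names below are made up for illustration): the profile APIs keep whatever encoding the file already has, so creating the file as UTF-16 LE with a byte order mark before the first write keeps everything Unicode from then on.

#include <windows.h>

// Create the INI file as UTF-16 LE with a BOM before the profile APIs
// ever touch it; WritePrivateProfileStringW preserves the file's
// existing encoding, so all later reads and writes stay Unicode.
void EnsureUnicodeIni(const wchar_t* path)
{
    HANDLE h = CreateFileW(path, GENERIC_WRITE, 0, nullptr,
                           CREATE_NEW, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h != INVALID_HANDLE_VALUE)   // only if the file didn't exist yet
    {
        WORD bom = 0xFEFF;           // stored as FF FE: the UTF-16 LE BOM
        DWORD written = 0;
        WriteFile(h, &bom, sizeof(bom), &written, nullptr);
        CloseHandle(h);
    }
}

int main()
{
    const wchar_t* ini = L"C:\\temp\\settings.ini";  // hypothetical path
    EnsureUnicodeIni(ini);
    WritePrivateProfileStringW(L"General", L"Title", L"\u4F60\u597D", ini);

    wchar_t buf[256] = {};
    GetPrivateProfileStringW(L"General", L"Title", L"", buf,
                             ARRAYSIZE(buf), ini);
    // buf now holds the Chinese text regardless of the
    // "Language for non-Unicode programs" setting.
    return 0;
}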

The "Language for non-Unicode programs" also selects the default code page used for files. US English is usually Windows-1252. Chinese will be something different, like GB2312 or GBK. Open your .INI file with Notepad and save it with the "ANSI" format, which will be whatever Microsoft's default for the non-Unicode language selected.

Related

MFC CEdit converts non-ASCII characters to ASCII

We have an MFC Windows application, originally written in VC++ 6 and updated over the years for newer IDEs; it is currently developed in VS2017.
The application is built with MBCS (not Unicode). Trying to switch to Unicode causes 3806 compile errors, and that is probably just the tip of the iceberg.
However, we want to be able to run the application with a different code page, e.g. 1250 (Central European).
I tried to build a small test application and managed to get it to work with special characters (čćšđž). I did this by setting the dialog font to Microsoft Sans Serif with code page 1250.
The same approach does not work in our application. Note: dialogs in our application are created dynamically, and the font is set using SetFont.
There is a difference in how the special characters are treated in these two applications.
In the test application, the special characters are displayed in the edit control, and GetWindowText retrieves the right bytes. However, trying to type characters from other languages renders them as "????".
In our application, all special characters are rendered properly, but GetWindowText (or WM_GETTEXT) converts the special characters to their closest ASCII counterparts (čćđ -> ccd).
I believe that the edit control in our application displays Unicode text, but GetWindowText converts it to ASCII.
Does anyone have any idea what is happening here, and how I might solve it?
Note: I know how to convert the project to Unicode. We are choosing not to commit resources to that at the moment, as it would probably take weeks or months. The question is how I might get it to work with MBCS, and why the edit control is converting Č to C.
I believe it is absolutely possible to port the application to other languages/code pages; you only need to modify the .rc (resource) files, basically having one resource file for each language. You may want to do that anyway, since strings in menus and string tables would be in a different language. As far as the application part is concerned, this is the only change needed.
The other part is the system you are running it on. A window can be Unicode or non-Unicode. You can see this with the Spy++ (spyxx.exe) utility, which tells you whether a window (procedure) is Unicode or not (Window Properties, General tab). While Unicode windows work properly, non-Unicode ones have to convert between Unicode and MBCS when getting or setting the text. The conversion is based on the system (default) code page, which can only be set globally (for the whole machine), not per application or window. And of course, setting the font's code page is not enough (and in my opinion it is not needed at all if you are running the application on a machine with the "correct" code page). That is, for a non-Unicode application, only one code page will work properly; the others won't.
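What Spy++ shows can also be queried programmatically; a minimal check using the Win32 IsWindowUnicode API:

#include <windows.h>

// Programmatic equivalent of the Spy++ check: reports whether the
// window was created via the Unicode (W) registration/creation APIs.
bool IsUnicodeWindow(HWND hwnd)
{
    return IsWindowUnicode(hwnd) != FALSE;
}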
I can see two options:
If you only need to update a small number of controls, it may be possible to change only these controls to Unicode and use the "wide" versions of the get/set window-text functions or messages; you will have to convert the text between Unicode and your desired code page yourself (a minimal sketch follows after this list). It requires writing some code, but has the advantage that the conversion is independent of the system default code page, e.g. you can keep the code page in a configuration file, in the registry, or as a command-line option (in the application's shortcut). Some control types can be changed to Unicode, some others cannot, so please check the documentation. I used this technique successfully in an MBCS application displaying/editing translated strings in many different languages, but I only had one control, a List-View, which incidentally offers the LVM_SETUNICODEFORMAT message, thus allowing Unicode text even in an MBCS application.
The easiest method is simply to run the application as is, but then it will only work on machines with the proper default code page, as is the case for most non-Unicode applications.
The system default code page can be changed by setting the "Language for non-Unicode programs" option, available in the regional settings on the Administrative tab; this requires a reboot. Changing the Windows UI language changes this option as well, but by setting the option directly you don't need to change the UI language, e.g. you can have an English UI and an East European code page.
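The sketch promised under the first option might look like this (the code page parameter is just an example; read it from wherever your configuration lives). SendMessageW lets even an MBCS application retrieve a control's text as UTF-16, and WideCharToMultiByte then produces bytes in the code page you chose:

#include <windows.h>
#include <string>

// Sketch of option 1: talk to one control with the wide APIs even though
// the application itself is MBCS, then convert the text to a code page
// chosen by the application (e.g. 1250 for Central European).
std::string GetControlTextInCodePage(HWND hCtrl, UINT codePage)
{
    int len = (int)SendMessageW(hCtrl, WM_GETTEXTLENGTH, 0, 0);
    std::wstring wide(len + 1, L'\0');
    SendMessageW(hCtrl, WM_GETTEXT, (WPARAM)(len + 1), (LPARAM)&wide[0]);
    wide.resize(len);

    int bytes = WideCharToMultiByte(codePage, 0, wide.c_str(), len,
                                    nullptr, 0, nullptr, nullptr);
    std::string narrow(bytes, '\0');
    WideCharToMultiByte(codePage, 0, wide.c_str(), len,
                        &narrow[0], bytes, nullptr, nullptr);
    return narrow;
}

// Usage: std::string text = GetControlTextInCodePage(hEdit, 1250);
// A List-View can additionally be switched to Unicode first:
//   SendMessage(hListView, LVM_SETUNICODEFORMAT, TRUE, 0);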
See a very similar post here.
Late to the party:
In our application, all special characters are rendered properly, but GetWindowText (or WM_GETTEXT) converts the special characters to their closest ASCII counterparts (čćđ -> ccd).
That sounds like the ES_OEMCONVERT flag has been set for the control:
Converts text entered in the edit control. The text is converted from the Windows character set to the OEM character set and then back to the Windows character set. This ensures proper character conversion when the application calls the CharToOem function to convert a Windows string in the edit control to OEM characters. This style is most useful for edit controls that contain file names that will be used on file systems that do not support Unicode.
To change this style after the control has been created, use SetWindowLong.
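Assuming that style really is the culprit (worth verifying in Spy++ first), stripping it after the control has been created could look like this:

#include <windows.h>

// Strip ES_OEMCONVERT from an existing edit control so WM_GETTEXT no
// longer round-trips the text through the OEM character set.
void ClearOemConvert(HWND hEdit)
{
    LONG_PTR style = GetWindowLongPtr(hEdit, GWL_STYLE);
    SetWindowLongPtr(hEdit, GWL_STYLE, style & ~ES_OEMCONVERT);
}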

Qt internationalization from native language

I am going to write software in Qt. Its string literals should be written in the native (non-English) language, and they should support internationalization. The Qt docs advise using the tr() function for this: http://doc.qt.io/qt-5/i18n-source-translation.html
So I tried to write:
edit->setText(tr("Фильтр"));
and I can see only question marks in the running app.
I replaced it with QString::fromStdWString:
edit->setText(QString::fromStdWString(L"Фильтр"));
and I can see the correct text in my language.
So the question is: how should I write non-ASCII strings so that they display correctly and can be translated using Qt Linguist?
PS: I use UTF-8 encoding for all source files; the compiler is VS2013.
PS2: I found the QTextCodec::setCodecForTr() function, but it was removed in Qt 5.
I think the best option is to use some kind of Latin-1 transliteration in the program source. Then it's possible to implement both the Russian and English versions as normal Qt translations.
BTW, it's possible, with some additional work, to use even plain numbers as translation placeholders, just like MFC did.
I found a strange solution for my problem:
By default VS saves files as UTF-8 with a BOM. Via File -> Advanced Save Options I chose to save the file as UTF-8 without a BOM, and everything works like a charm:
edit->setText(tr("Фильтр"));
It looks like a VS compiler bug. Interestingly, MS claims that its compiler supports Unicode source only for UTF-8 with a BOM: https://msdn.microsoft.com/en-us/library/xwy0e8f2.aspx
A likely explanation, rather than a bug: with a BOM, the compiler recognizes the source as UTF-8 and converts narrow string literals to the local ANSI code page (mangling the Cyrillic), while without a BOM it passes the UTF-8 bytes through untouched, which is exactly what Qt 5's tr() expects.
PS: the length of "Фильтр" is 12 bytes (6 Cyrillic characters, 2 bytes each), so it really is a UTF-8 string.
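If you need something that works regardless of how the editor saves the file, one workaround is to spell out those 12 UTF-8 bytes in the literal yourself; Qt 5's tr() interprets narrow literals as UTF-8. It is unreadable, so it is only practical for a handful of strings, and it is worth verifying that lupdate extracts such literals cleanly for Qt Linguist:

// "Фильтр" written as explicit UTF-8 bytes, immune to the compiler's
// guess about the source file encoding:
edit->setText(tr("\xD0\xA4\xD0\xB8\xD0\xBB\xD1\x8C\xD1\x82\xD1\x80"));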

Which steps are needed for a Unicode program in C++

I want to write a C++ program that supports typing Unicode characters into text editors like LibreOffice, MS Office, and Notepad (I'm Vietnamese, and my mother tongue includes Unicode characters such as đ, â, à, ế, ẹ, ẻ, ...). That means when I use a text editor like the ones above, or any application that supports text editing, such as browsers (in the address bar or search bar) or chat applications like Yahoo or Skype, and I type a key or a group of keys on the keyboard, my C++ program will notice, convert the input into a Unicode character, and send it back to the text editor.
For example, when I type the 'e' key twice in a text editor, the C++ program will notice that and turn it into 'ê' in the text editor. Please tell me the steps needed, or the mechanism, to build such an application. I don't know where to start.
Use a solid library like Qt or wxWidgets, or, if you don't need the extra ballast, plain old ICU.
As far as I understand, you want to write an IME (input method editor). There are plenty of them available already for Vietnamese, supporting various input methods.
You did not specify the platform. However, for both Windows and Linux there are quite a few Vietnamese IMEs available; practically all of them are open source on Linux, and Unikey, which to my knowledge is one of the most popular IMEs for Windows, is also an open source program and would thus provide an easy starting point for hacking your own favourite options into an IME.

What is the native narrow string encoding on Windows?

The Subversion API has a number of functions for converting from "natively-encoded" strings to strings that are encoded in UTF-8. My question is: what is this native encoding on Windows? Does it depend on locale?
"Natively encoded" strings are strings written in whatever code page the user is using. That is, they are numbers that are translated to the appropriate glyphs based on the correct code page. Assuming the file was saved that way and not as a UTF-8 file.
This is a candidate question for Joel's article on Unicode.
Specifically:
Eventually this OEM free-for-all got codified in the ANSI standard. In the ANSI standard, everybody agreed on what to do below 128, which was pretty much the same as ASCII, but there were lots of different ways to handle the characters from 128 and on up, depending on where you lived. These different systems were called code pages. So for example in Israel DOS used a code page called 862, while Greek users used 737. They were the same below 128 but different from 128 up, where all the funny letters resided. The national versions of MS-DOS had dozens of these code pages, handling everything from English to Icelandic and they even had a few "multilingual" code pages that could do Esperanto and Galician on the same computer! Wow! But getting, say, Hebrew and Greek on the same computer was a complete impossibility unless you wrote your own custom program that displayed everything using bitmapped graphics, because Hebrew and Greek required different code pages with different interpretations of the high numbers.
Windows 1252. Jukka Korpela has an excellent page on character encodings, with an extensive discussion of the Windows character set.
From the header svn_string.h you can see that the relevant svn_string types are just plain old const char* plus a length element.
I would guess that the "natively encoded" svn strings are interpreted according to your system locale (I do not know this for sure, but it is the convention). On Windows 7 you can check your locale via Start -> Control Panel -> Region and Language -> Administrative -> Change system locale, where any value of English would probably entail the character encoding Windows-1252. A different system locale, for example Hebrew (Israel), would entail a different character encoding (Windows-1255 in the case of Hebrew).
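For illustration (this is my reading of what such a conversion has to do on Windows, not Subversion's actual implementation), converting a natively encoded string to UTF-8 means bouncing it through UTF-16 using the ANSI code page that GetACP() reports:

#include <windows.h>
#include <string>

// Sketch: convert a "natively encoded" (system ANSI code page) string to
// UTF-8 via UTF-16. GetACP() reports the code page selected under
// "Language for non-Unicode programs", e.g. 1252 for US English or
// 1255 for Hebrew.
std::string NativeToUtf8(const std::string& native)
{
    UINT acp = GetACP();
    int wlen = MultiByteToWideChar(acp, 0, native.c_str(),
                                   (int)native.size(), nullptr, 0);
    std::wstring wide(wlen, L'\0');
    MultiByteToWideChar(acp, 0, native.c_str(),
                        (int)native.size(), &wide[0], wlen);

    int ulen = WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), wlen,
                                   nullptr, 0, nullptr, nullptr);
    std::string utf8(ulen, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.c_str(), wlen,
                        &utf8[0], ulen, nullptr, nullptr);
    return utf8;
}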
Sadly, the MSVC version of the C library does not support UTF-8 and uses legacy code pages only, but Cygwin provides a UTF-8 locale as part of its emulation layer. If your svn is built on Cygwin, you should be able to use UTF-8 just fine.

How does Windows identify non-Unicode applications?

I am building an MFC C++ application with "Use Unicode Character Set" selected in Visual Studio. I have UNICODE defined, my CStrings are 16-bit, I handle filenames with Japanese characters in them, etc. But when I put Unicode strings containing Japanese characters in a CComboBox (using AddString), they show up as ?????.
I'm running Windows XP Professional x64 (in English). If I use Windows Control Panel Regional and Language Options, Advanced Tab, and set the Language for non-Unicode programs to Japanese, my combo box looks right.
So, I want my combo box to look right, and I want to understand why the "Language for non-Unicode programs" setting is changing the behavior of my Unicode program. Is there something else I should do to tell Windows my application is a Unicode application?
Thanks for any help!
Windows knows the difference between Unicode and non-Unicode programs by the functions they call. Most Windows API functions come in two variants: one ending in A for non-Unicode and one ending in W for Unicode. The include files that declare these functions use the compiler settings to pick one or the other for you automatically.
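Roughly what the SDK headers do for nearly every such function (simplified from <winuser.h>):

BOOL SetWindowTextA(HWND hWnd, LPCSTR lpString);   // ANSI variant
BOOL SetWindowTextW(HWND hWnd, LPCWSTR lpString);  // Unicode variant

#ifdef UNICODE
#define SetWindowText  SetWindowTextW
#else
#define SetWindowText  SetWindowTextA
#endif

With "Use Unicode Character Set" selected, UNICODE is defined, so the W variants are picked and the windows your program creates are registered as Unicode windows.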
The characters might not be coming out properly because the font you've selected as your default UI font doesn't include them.
Where do you get the strings from?
If they are hard-coded in your C sources, then by the time you call AddString they are (most likely) already damaged.
Nothing prevents one from taking a Unicode string, "squeezing" it into a std::string, for instance, and damaging it, even if the application is compiled as Unicode.
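An illustration of that kind of damage (the string here is arbitrary): forcing a wide string through the system ANSI code page replaces anything the code page cannot represent with '?' or a "best fit" look-alike, which matches the ????? symptom above.

#include <windows.h>
#include <string>

int main()
{
    std::wstring wide = L"\u65E5\u672C\u8A9E";  // "Japanese" in Japanese
    char narrow[64] = {};
    // Squeeze the Unicode text through the system ANSI code page.
    WideCharToMultiByte(CP_ACP, 0, wide.c_str(), -1,
                        narrow, sizeof(narrow), nullptr, nullptr);
    // On an English system (ACP 1252), narrow now holds "???";
    // the original characters only survive if the ACP happens to be
    // a Japanese code page such as 932.
    return 0;
}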