Windows active code page - c++

I'm using a library (IBM's WebSphere MQ) with an API that is supposed to return a remote server's character set.
After some debugging, it seems as though the return value of this function call returns the active code page of my machine. I saw this by looking at the return value of the function call and the result of running chcp in a command line - both returned 862. When I changed the language in the Control Panel->Regional and Language Options->Advanced tab to something else, both values changed again, which verified my suspicion.
My question is, what is the value that chcp returns? What Win32 API gets/sets it? How does it relate to locales? (trying to change the global locale in a C++ application using std::locale::global had no impact on it apparently).

CHCP returns the OEM codepage (OEMCP). The API is Get/SetConsoleCP.
You can set the C++ locale to ".OCP" to match this locale.
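A minimal sketch for comparing the values involved, assuming a Windows build with the Microsoft CRT. The function name printCodePages is made up for illustration; the Win32 calls are the standard console and system code page queries, and ".OCP" is the CRT's spelling for "use the OEM code page".

```cpp
#ifdef _WIN32  // Win32-only sketch; the guard keeps the file compiling elsewhere
#include <windows.h>
#include <clocale>
#include <cstdio>

// Prints the code pages that are easy to confuse with each other.
// printCodePages is a hypothetical helper name, not part of any API.
void printCodePages() {
    std::printf("console input CP : %u\n", GetConsoleCP());        // what chcp manipulates
    std::printf("console output CP: %u\n", GetConsoleOutputCP());
    std::printf("system OEM CP    : %u\n", GetOEMCP());            // default for new consoles
    std::printf("system ANSI CP   : %u\n", GetACP());

    // Ask the CRT to use the OEM code page for the C locale (Microsoft CRT syntax).
    const char* crt = std::setlocale(LC_ALL, ".OCP");
    std::printf("CRT locale       : %s\n", crt ? crt : "(setlocale failed)");
}
#endif
```

Running it before and after changing "Language for non-Unicode programs" should show which of the four numbers follows the setting on a given machine.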

Locales mostly identify languages, and since historically there are not that many code pages (many languages' alphabets differ only slightly from the 26-letter Latin one), several languages can be "mapped" to the same code page. As far as I remember there is no direct conversion function, but I did it with a statistical approach:
1. For any given locale, I collected the words of that language that I could obtain from the system (LOCALE_SMONTHNAME1..LOCALE_SMONTHNAME12, LOCALE_SNATIVELANGNAME, etc.) in Unicode.
2. I called WideCharToMultiByte for every string, trying to convert it to the code page's single-byte encoding:
WideCharToMultiByte(CodePage, WC_NO_BEST_FIT_CHARS, ..., &DefChar, &DefUsed);
3. If DefUsed was set during the process, it basically meant that this language is not compatible with this code page.
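The approach above could be sketched roughly as follows. The helper names (fitsCodePage, localeString, localeFitsCodePage) and the choice of fields are assumptions made for illustration, not the answerer's actual code. Note that WideCharToMultiByte rejects the used-default-character output for CP_UTF8 and a few other code pages, in which case this sketch simply reports "does not fit".

```cpp
#ifdef _WIN32  // Win32-only sketch; the guard keeps the file compiling elsewhere
#include <windows.h>
#include <string>

// True if every character of `text` converts to `codePage` without falling
// back to the default character. WC_NO_BEST_FIT_CHARS disables look-alike
// substitutions so that any loss is reported through `usedDefault`.
bool fitsCodePage(UINT codePage, const std::wstring& text) {
    if (text.empty()) return true;
    const int len = static_cast<int>(text.size());
    int bytes = WideCharToMultiByte(codePage, WC_NO_BEST_FIT_CHARS,
                                    text.data(), len, nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return false;                  // invalid code page or flags
    std::string buffer(bytes, '\0');
    BOOL usedDefault = FALSE;
    bytes = WideCharToMultiByte(codePage, WC_NO_BEST_FIT_CHARS,
                                text.data(), len, &buffer[0], bytes,
                                nullptr, &usedDefault);
    return bytes > 0 && !usedDefault;
}

// Fetches one locale string (for example LOCALE_SMONTHNAME1) as UTF-16.
std::wstring localeString(LCID lcid, LCTYPE field) {
    wchar_t buf[128] = {};
    const int n = GetLocaleInfoW(lcid, field, buf, 128);
    return n > 0 ? std::wstring(buf, n - 1) : std::wstring();  // n counts the NUL
}

// Checks a small sample of the locale's own words against the code page.
bool localeFitsCodePage(LCID lcid, UINT codePage) {
    const LCTYPE fields[] = {LOCALE_SMONTHNAME1, LOCALE_SMONTHNAME2,
                             LOCALE_SMONTHNAME3, LOCALE_SNATIVELANGNAME};
    for (LCTYPE field : fields)
        if (!fitsCodePage(codePage, localeString(lcid, field))) return false;
    return true;
}
#endif
```

A caller would loop over candidate code pages and keep those for which localeFitsCodePage returns true; a larger word sample makes the result more reliable.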

Related

Linux wide string to multibyte issue

I know there are many questions already asked on this topic but I am facing a very unusual situation here.
I am working on CentOS. My application reads some data as wchar_t, converts it to multibyte (UTF-8 encoding), fills the char buffer in a Google proto message and sends it to another application.
The other application converts it back to a wide string and displays it to the user. I am using wcstombs for the conversion. My locale is "en_US.UTF-8".
For some strings it works fine. I am facing an issue with one particular wide string (maybe there are several others) for which wcstombs returns -1. errno is set to 84 (Invalid or incomplete multibyte or wide character).
The problem is that when I run my application through Eclipse, the conversion succeeds, but when the application is run as root (as a service), the conversion fails.
The same string converts successfully on Windows using the WideCharToMultiByte API.
I am not able to understand why this is happening.
Hope the experts can help me out.
EDIT
My Wide string is L"\006£æ?Jÿ" which when converted and displayed to user becomes as the image
L"\006" doesn't appear to be a valid unicode string (neither in UTF-16 nor in UTF-32). I agree with wcstombs, there's no corresponding UTF-8 sequence.
I suspect you didn't use WC_ERR_INVALID_CHARS on Windows. That would catch the same error.
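One thing worth checking in a case like this: wcstombs converts according to the LC_CTYPE category of the current C locale, so the same string can convert under one locale and fail under another. Whether the locale actually differs between the Eclipse run and the service run is an assumption, not something the question confirms. The helper name convertUnder is made up for this sketch.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Converts `wide` with wcstombs under the locale named `localeName`.
// Returns false (with `out` empty) if the locale is not available or the
// conversion fails; in the latter case errno is set by wcstombs.
// Note: this changes the process-wide LC_CTYPE setting and does not restore it.
bool convertUnder(const char* localeName, const wchar_t* wide, std::string& out) {
    out.clear();
    if (!std::setlocale(LC_CTYPE, localeName)) return false;   // locale not installed
    char buf[64];
    const std::size_t n = std::wcstombs(buf, wide, sizeof buf);
    if (n == static_cast<std::size_t>(-1)) return false;       // unconvertible character
    out.assign(buf, n);
    return true;
}
```

Calling it once with "C" and once with "en_US.UTF-8" on the failing string shows whether the locale is the deciding factor; logging the result of setlocale(LC_ALL, NULL) at service start-up shows which locale the service really runs under.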

Change Console Code Page in Windows C++

I'm trying to output UTF-8 characters in the Windows command line. I can't seem to get the function SetConsoleOutputCP to work. I also heard that you have to change the font to "Lucida Console" for it to work, but I can't get that working either. Can someone please provide a short example of how to use these functions to correctly output UTF-8 characters to the console?
Also I heard that those functions don't work in Windows XP, is there a better alternative to those functions which will work in Windows XP?
[I know this question is old and was about Windows XP, but it still seemed like a good place to drop this information so I (and maybe others) can find it again in the future.]
Support for Unicode in CMD windows has improved in newer versions of Windows. This program will work on Windows 10.
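#include <iostream>
#include <Windows.h>
// RAII guard: switches the console output code page to UTF-8 and
// restores the previous code page on destruction.
class UTF8CodePage {
public:
    UTF8CodePage() : m_old_code_page(::GetConsoleOutputCP()) {
        ::SetConsoleOutputCP(CP_UTF8);
    }
    ~UTF8CodePage() { ::SetConsoleOutputCP(m_old_code_page); }
private:
    UINT m_old_code_page;
};
int main() {
    UTF8CodePage use_utf8;
    // Note: since C++20 a u8 literal has type const char8_t[], so this
    // line needs a cast (or a pre-C++20 language mode) to compile.
    const char *text = u8"This text is in UTF-8. ¡Olé! 佻\n";
    std::cout << text;
    return 0;
}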
I made an RAII class to ensure the code page is restored because it would be rude to leave the code page changed if the user had purposely selected a specific one. All the Windows-specific code (SetConsoleOutputCP) is contained within that class. The definition of the use_utf8 variable in main changes the code page to UTF-8, and that code page will stay in effect until the variable is destructed at the end of the scope.
Note that I used the u8 prefix on the string literal, which is a newer feature of C++ to ensure that the string is encoded using UTF-8 regardless of the encoding used for the source file. You don't have to use that feature if you have another way to make a string of valid UTF-8 text.
You still have to be sure that the CMD window is using a font that supports the glyphs you need. I don't think there's a way to get font linking automatically.
But this will at least show the replacement character if the font is missing the glyph. For example, on my window, the ¡Olé! looks right but the CJK glyph is shown approximately like �. If the user copies that replacement character, the clipboard will receive the original glyph, so they can paste it into other programs without any loss of fidelity.
Note that command line parameters you get from main's argv will be in the original code page. One way to work around this is to get the unconverted "wide" command line with GetCommandLineW, convert it to UTF-8 with WideCharToMultiByte, and then parse it yourself. Alternatively, you can pass the result of GetCommandLineW to CommandLineToArgvW, which will parse it, and then you'd convert each argument to UTF-8.
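A minimal sketch of the first option, assuming a Windows build. The helper names wideToUtf8 and commandLineUtf8 are made up for illustration; parsing the resulting string into arguments is left out.

```cpp
#ifdef _WIN32  // Win32-only sketch; the guard keeps the file compiling elsewhere
#include <windows.h>
#include <string>

// Converts a UTF-16 string to UTF-8; returns an empty string on failure.
std::string wideToUtf8(const std::wstring& wide) {
    if (wide.empty()) return std::string();
    const int len = static_cast<int>(wide.size());
    // First call: ask how many bytes the UTF-8 form needs.
    const int bytes = WideCharToMultiByte(CP_UTF8, 0, wide.data(), len,
                                          nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return std::string();
    std::string utf8(bytes, '\0');
    // Second call: perform the conversion into the sized buffer.
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), len,
                        &utf8[0], bytes, nullptr, nullptr);
    return utf8;
}

// The whole, still unparsed, command line as UTF-8.
std::string commandLineUtf8() {
    return wideToUtf8(GetCommandLineW());
}
#endif
```

For the second option, the same wideToUtf8 helper can be applied to each element of the array returned by CommandLineToArgvW (declared in shellapi.h, linked from Shell32), which must be released with LocalFree afterwards.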
Finally, note that changing the code page affects only the output. If you input text from the user, it arrives encoded using the original code page (often called the OEM code page).
TODO: Figure out input. SetConsoleCP isn't doing what I think the documentation says it should do.
The Windows console doesn't play nice with Unicode, and particularly not with UTF-8.
Setting a console code page to UTF-8 won't work.
One approach is to use MultiByteToWideChar() (or something else) to convert the text to UTF-16, then WideCharToMultiByte() (or something else) to convert to a localised ISO encoding. Then set the console code page to the ISO code page.
It's ugly, but it sort of works.
In short: SetConsoleOutputCP(CP_UTF8) and cout/wcout don't work together by default.
Though the Windows CRT supports UTF-8 output, a robust way to output UTF-8 characters to the console is to convert them into the console's current code page, especially if you want to use cout/wcout.
The standard high-level functions of basic_ostream do not work properly with UTF-8 by default.
I've seen usage of MultiByteToWideChar and WideCharToMultiByte with CP_OEMCP and CP_UTF8 parameters.
You may setup your application environment, including console font via SetCurrentConsoleFontEx but it works only from Vista and Server 2008.
Also, check this about cout and console.
_setmode and wprintf work together as well, but this may lead to a crash when non-wide character functions are used.
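A rough sketch of the conversion route mentioned above (UTF-8 to UTF-16, then UTF-16 to the console's current code page), assuming a Windows build. The helper names recode and utf8ToConsole are made up for illustration. Characters that the target code page cannot represent are replaced by its default character, so this is lossy by design.

```cpp
#ifdef _WIN32  // Win32-only sketch; the guard keeps the file compiling elsewhere
#include <windows.h>
#include <string>

// Re-encodes `in` from code page `from` to code page `to` via UTF-16.
// Returns an empty string if either conversion step fails.
std::string recode(const std::string& in, UINT from, UINT to) {
    if (in.empty()) return std::string();
    const int inLen = static_cast<int>(in.size());
    // Step 1: source code page -> UTF-16.
    const int wlen = MultiByteToWideChar(from, 0, in.data(), inLen, nullptr, 0);
    if (wlen <= 0) return std::string();
    std::wstring wide(wlen, L'\0');
    MultiByteToWideChar(from, 0, in.data(), inLen, &wide[0], wlen);
    // Step 2: UTF-16 -> target code page.
    const int len = WideCharToMultiByte(to, 0, wide.data(), wlen,
                                        nullptr, 0, nullptr, nullptr);
    if (len <= 0) return std::string();
    std::string out(len, '\0');
    WideCharToMultiByte(to, 0, wide.data(), wlen, &out[0], len, nullptr, nullptr);
    return out;
}

// Converts UTF-8 text to whatever code page the console currently uses.
std::string utf8ToConsole(const std::string& utf8) {
    return recode(utf8, CP_UTF8, GetConsoleOutputCP());
}
#endif
```

With this, std::cout << utf8ToConsole(text) leaves the console code page untouched, at the cost of losing characters outside that code page.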
The problem occurs because the code page that Windows uses in your console differs from the encoding of your source code text file.
Qt uses UTF-8 by default, but another editor may use a different encoding, so you must verify which one you're using.
To change the console to UTF-8, use:
#include <windows.h>
SetConsoleOutputCP(CP_UTF8);

MFC Dialog Data Exchange (DDX) comma instead point as decimal

To initialize the controls in my dialogs and to gather user input, I'm using DDX. How can I change the program to display float numbers with a comma instead of a point (best without changing the locale)?
The program has the "C" locale set, if I change the locale, I have to take care on every atof, sprintf operation (the library for get-/setting the float numbers, in the underlying mysql database, expects strings with the decimal as point).
So far, I only think of changing the locale and then use stringstream with imbue (found here), but maybe there's a chance without changing the locale.
Thanks for your help!
This is a locale-specific thing; you will probably need to handle it by changing the locale.
Note that DDX is for initializing control objects so that your control variable member declarations stay in sync with the values you chose in your resource file or whatever you did when initializing the dialog the controls reside on.
Edit: Some controls like CComboBox and CListBox have a SetLocale method but I've never used it so not sure how well it works and it's not available on all controls.
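For the stringstream-with-imbue idea from the question, a minimal sketch that leaves the global "C" locale untouched: only the individual stream gets a locale whose numpunct facet reports a comma as decimal point. The names CommaDecimal, formatWithComma and parseWithComma are made up for illustration, and wiring the resulting strings into DDX (for example through a CString member) is not shown.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Facet that only swaps the decimal separator; everything else stays as in
// the classic "C" locale (in particular, no digit grouping).
struct CommaDecimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Formats a number for display with a comma as decimal separator.
std::string formatWithComma(double value) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(), new CommaDecimal));
    out << value;
    return out.str();
}

// Parses user input such as "2,5"; returns false if no number could be read.
bool parseWithComma(const std::string& text, double& value) {
    std::istringstream in(text);
    in.imbue(std::locale(std::locale::classic(), new CommaDecimal));
    return static_cast<bool>(in >> value);
}
```

Because nothing global changes, atof, sprintf and the database layer keep seeing a point as decimal separator; only the text shown in, and read from, the dialog uses the comma.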

Is there a way to access the locale used by gettext under windows?

I have a program where i18n is handled by gettext. The program works fine, however for some reason I need to know the name of the locale used by gettext at runtime (something like 'fr_FR') under win32.
I looked into gettext sources, and there is a quite frightening function that computes it on all platforms (gl_locale_name, in a C file called "localename.h/c"). However, this file does not seem to be installed alongside gettext or libintl, so I can't seem to call the function. Is there another function provided by gettext to get this value ? Or in another package (boost, glib, anything ?)
(On a related note, there is a thing called std::locale in the C++ standard library, and according to the doc calling std::locale("") should create a locale with the settings of the system, unless I am mistaken ... but then the name is 'C' under windows. Is it a viable way of getting the locale name ? What I am doing wrong ?)
On Windows, the function typically used is GetUserDefaultLCID, which returns the integer value of the locale identifier. To convert from an LCID to a string like 'fr_FR' you need to map it based on the info from http://msdn.microsoft.com/en-us/library/ms776260
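Instead of a hand-written mapping table, the ISO language and country codes can also be queried from Windows directly. This is a sketch under the assumption that the gettext-style name is simply language code, underscore, country code; gettext may pick a different name in some cases (for example when environment variables such as LANG are set). The helper name localeNameFromLcid is made up for illustration.

```cpp
#ifdef _WIN32  // Win32-only sketch; the guard keeps the file compiling elsewhere
#include <windows.h>
#include <string>

// Builds a name of the form "ll_CC" from an LCID, or returns an empty
// string if Windows cannot supply the ISO codes for it.
std::string localeNameFromLcid(LCID lcid) {
    char language[9] = {};   // ISO 639 language code, e.g. "fr"
    char country[9] = {};    // ISO 3166 country code, e.g. "FR"
    if (!GetLocaleInfoA(lcid, LOCALE_SISO639LANGNAME, language, sizeof language) ||
        !GetLocaleInfoA(lcid, LOCALE_SISO3166CTRYNAME, country, sizeof country))
        return std::string();
    return std::string(language) + "_" + country;
}
#endif
```

Typical use would be localeNameFromLcid(GetUserDefaultLCID()).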
Turns out the "gl_locale_name" function was not part of gettext directly, but rather part of gnulib - http://www.gnu.org/software/gnulib. I just discovered the package today.
So getting the infamous localename.h header in my project was a matter of
gnulib-tool --import localename
Then the gl_locale_name function works just fine when cross-compiling.
Thanks to everyone for the answers !
You can use setlocale(LC_ALL, NULL) to pull the current locale from the CRT. But from Windows, I've got no idea. Also, gettext is a pretty generic name, and you're going to have to be more specific about which gettext you mean.
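A small sketch of querying the CRT, which also explains the 'C' seen in the question: until setlocale(LC_ALL, "") is called, the C runtime stays in the "C" locale. The helper names are made up for illustration. With the Microsoft CRT the reported names look like "French_France.1252" rather than "fr_FR", so this alone does not produce a gettext-style name.

```cpp
#include <clocale>
#include <string>

// Name of the locale currently active in the C runtime; does not modify it.
std::string currentCLocaleName() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// Switches the C runtime to the user's default locale and returns its name
// (empty if the switch failed). This changes process-wide state.
std::string adoptUserLocale() {
    const char* name = std::setlocale(LC_ALL, "");
    return name ? name : "";
}
```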

Multi-byte character set in MFC application

I have a MFC application in which I want to add internationalization support. The project is configured to use the "multi-byte character set" (the "unicode character set" is not an option in my situation).
Now, I would expect the CWnd::OnChar() function to send me multi-byte characters if I set my keyboard to some foreign language, but it doesn't seem to work that way. The OnChar() function always sends me a 1-byte character in its nChar variable.
I thought that the _getmbcp() function would give me the current code page for the application, but this function always return 0.
Any advice would be appreciated.
Any help here? Multibyte Functions in Microsoft C Run-time
As far as changing the default code page:
The default code page for a user (for WinXP - not sure how it is on Vista) is set in the "Regional and Languages options" Control Panel applet on the "Advanced" tab.
The "Language for non-Unicode programs" sets the default code page for the current user. Unfortunately it does not actually tell you the code page number it's configuring; it just gives the language (which might be further specified with a region variant). This makes sense from an end-user perspective, because I think code page numbers have no meaning to 99.999% of end users. You need to reboot for a change to take effect. If you use regmon to determine what changes, you could probably come up with something that specifies the default code page somewhat more easily.
Microsoft also has an unsupported utility, AppLocale, for testing localization that changes the codepage for particular applications: http://www.microsoft.com/globaldev/tools/apploc.mspx
Also you can change the code page for a thread by calling SetThreadLocale() - but you also have to call the C runtime's setlocale() function because some CRT functions don't talk to the Win API locale functions (and vice versa). See "Windows SetThreadLocale and CRT setlocale" by Chris Grimes for details.
As always in non Unicode scenarios, you'll get a reliable result only if the system locale (aka in Control Panel "language for non-unicode applications") is set accordingly. If not, don't expect anything good.
For example, if the system locale is Chinese Traditional, you'll receive 2 successive WM_CHAR messages (one for each byte, assuming the user composed a 2-byte character).
isleadbyte() should help you determine if a 2nd byte is coming soon.
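The pairing logic could look roughly like this. The class name DbcsAssembler is made up, and the lead-byte test is passed in as a predicate so the sketch stays independent of the platform; in an MFC OnChar handler the predicate would wrap isleadbyte() or IsDBCSLeadByte().

```cpp
#include <functional>
#include <string>
#include <utility>

// Collects bytes that arrive one at a time (as with successive WM_CHAR
// messages) into complete single-byte or double-byte characters.
class DbcsAssembler {
public:
    explicit DbcsAssembler(std::function<bool(unsigned char)> isLead)
        : isLead_(std::move(isLead)) {}

    // Feeds one byte. Returns true when `out` holds a complete character,
    // false when the byte was a lead byte and the trail byte is still due.
    bool feed(unsigned char byte, std::string& out) {
        if (pending_) {                      // second half of a 2-byte character
            out.assign(1, static_cast<char>(lead_));
            out.push_back(static_cast<char>(byte));
            pending_ = false;
            return true;
        }
        if (isLead_(byte)) {                 // first half: remember it and wait
            lead_ = byte;
            pending_ = true;
            return false;
        }
        out.assign(1, static_cast<char>(byte));  // ordinary single-byte character
        return true;
    }

private:
    std::function<bool(unsigned char)> isLead_;
    unsigned char lead_ = 0;
    bool pending_ = false;
};
```

OnChar would call feed(static_cast<unsigned char>(nChar), ch) and act only when it returns true.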
If your system locale is NOT set to Chinese, don't expect to receive correct messages even when using a Chinese keyboard/IME. The misleading part is that some scenarios work, e.g. using a Greek keyboard, you'll receive WM_CHAR char values based on the Greek code page even if your system locale is Latin-based. But you should really stay away from trying to cope with such scenarios: success is not guaranteed and will likely vary according to Windows version and locale.
As MikeB wrote, MS AppLocale is your friend to make basic tests.
[ad] and appTranslator is your friend if you need to translate your UI [/ad]
For _getmbcp, MSDN says "A return value of 0 indicates that a single byte code page is in use." That sure makes it not very useful. Try one of these: GetUserDefaultLCID GetSystemDefaultLCID GetACP. (Now why is there no "user" equivalent for GetACP?)
Anyway if you want _getmbcp to return an actual value then set your system default language to Chinese, Japanese, or Korean.
There is actually a very simple (but weird) way to force the OnChar function to send Unicode characters to the application even if it's configured to use the multi-byte character set:
SetWindowLongW( m_hWnd, GWL_WNDPROC, GetWindowLong( m_hWnd, GWL_WNDPROC ) );
Simply by calling the unicode version of "SetWindowLong", it forces the application to receive unicode characters.