In my application I have to query for System and User Locale and return them as strings(like en-US instead of language codes). I can get the LCID for these Languages from GetSystemDefaultLCID() and GetUserDefaultLCID() functions, but what I am struggling with is the conversion from LCID to Language strings.
My app has to run on Windows XP as well so I can't use the LCIDToLocaleName() Win API. The only I have been able to Get the locale name so far is to use the GetLocaleInfo() Win API by passing LOCALE_SYSTEM_DEFAULT and LOCALE_USER_DEFAULT as LCID and LOCALE_SISO639LANGNAME as LCType. My code looks somewhat like this to Query for System Locale
int localeBufferSize = GetLocaleInfo(LOCALE_SYSTEM_DEFAULT, LOCALE_SISO639LANGNAME, NULL, 0);
char *sysLocale = new char[localeBufferSize];
GetLocaleInfo(LOCALE_SYSTEM_DEFAULT, LOCALE_SISO639LANGNAME, sysLocale,localeBufferSize);
But the value I get is only the language name(only en of en-US). To get the full locale I have to call GetLocaleInfo() twice passing LOCALE_SISO3166CTRYNAME as the LCType the second time and then append the two values. Is there any better way to do this?
Sorry to say that, but no. Only the two (or more) letter codes are standardized, the concatenation of them are not. But! There is no need for the dynamic buffer allocation. Quoted from MSDN:
The maximum number of characters allowed for this string is nine,
including a terminating null character.
Reading this, it sounds overkill to dynamically allocate the required buffer, so if you really want to keep it extra-clean, you can do something like this:
char buf[19];
int ccBuf = GetLocaleInfo(LOCALE_SYSTEM_DEFAULT, LOCALE_SISO639LANGNAME, buf, 9);
buf[ccBuf++] = '-';
ccBuf += GetLocaleInfo(LOCALE_SYSTEM_DEFAULT, LOCALE_SISO3166CTRYNAME, buf+ccBuf, 9);
Related
My software supports multiple languages (English, German, Polish, Russian, ...). For this reason I have some language specific files with the dialog texts in the specific language (Encoded as UTF-8).
In my mfc application I open and read those files and insert the text into my AfxMessageBoxes and other UI-Windows.
// Get the codepage number. 65001 = UTF-8
// In the real code this is a parameter in the function I call (just for clarification)
LANGID languageID = 65001;
TCHAR szCodepage[10];
GetLocaleInfo (MAKELCID (languageID, SORT_DEFAULT), LOCALE_IDEFAULTANSICODEPAGE, szCodepage, 10);
int nAnsiCodePage = _ttoi (szCodepage);
// Open the file
CFile file;
CString filename = getName();
if (!file.Open(FileName, CFile::modeRead, NULL))
{
//Check if everything is fine, else break
}
// Read the file
CString inString;
int len = file.GetLength ();
UINT n = file.Read (inString.GetBuffer(len), len);
inString.ReleaseBuffer ();
int size = MultiByteToWideChar (CP_ACP, 0, strAllItems, -1, NULL, 0);
WCHAR *ubuf = new WCHAR[size + 1];
MultiByteToWideChar ((UINT) nAnsiCodePage, (nAnsiCodePage == CP_UTF8 ?
0 : MB_PRECOMPOSED), inString, -1, ubuf, (int) size);
outString = ubuf;
file.Close ();
Result:
This mechanism is working fine for special letters of russian and german, but not for polish. I already checked the utf-8 site (http://www.utf8-chartable.de/unicode-utf8-table.pl?number=1024) and the polish characters are part of it.
I also checked the hex values of my CString and everything seems to be alright, but it is not visualized in the correct way. Just for testing I changed the used codepage from utf-8 to 1250 (Eastern Europe, Polish included) and it also did not work.
What am I doing wrong?
EDIT:
When I use:
MultiByteToWideChar (CP_UTF8 , 0, inString, -1, ubuf, (int) size);
The hex-values are shortend to the "best match" letters. Meaning my result is: mezczyzna
I am using windows 7 with the english language selected.
Well, you have two options:
A. Make your application Unicode. You don't tell us whether it actually is, but I conclude it's not. This is the 'best" solution technically, but it may require a lot of effort, and it may even not be feasible at all (eg use of non-Unicode libraries).
B. If your app is non-Unicode, you have some limitations:
- Your application will only be capable of displaying correctly one codepage using the non-unicode APIs & messages, and this unfortunately cannot be set per application, it's globally set in Windows with the "Language for non-Unicode programs" option, and requires a reboot.
- To display correctly strings containing characters not in the default codepage, you need to convert them to Unicode and use the "wide" versions of APIs & messages explicitly, to display them (eg MessageBoxW()). A little cumbersome, but doable, if the operation concerns only a small number of controls.
The machine you're working on has some western european language as the "Language for non-Unicode programs", and I come to this conclusion because "This mechanism is working fine for special letters of russian and german" and "Using MessageBoxA(0, "mężczyzna", 0, 0) does not work", as you said (though i'm not sure at all about russian, as it's a different codepage).
Apart from this, as IInspectable said, int size = MultiByteToWideChar (CP_ACP, 0, strAllItems, -1, NULL, 0); makes not sense at all, as the string is known to be UTF-8, and not of the default codepage. You may also need to remove the UTF-8 BOM header, if your file contains it.
I am working in Windows,using vc++2010 and MFC, in Unicode.
Following is my code:
CStringW strLowerGreek(L"αβγ");
CStringW strUpperGreek(L"ΑΒΓ");
if (0 == strLowerGreek.CompareNoCase(strUpperGreek))
AfxMessageBox(_T("Two strings are the same"));
else
AfxMessageBox(_T("Two strings are different"));
When I run the code, pop-up message is "Two strings are different".
What I wanna do is to create αβγ.txt in C disk and ΑΒΓ.txt in D disk for example(I can't create αβγ.txt and ΑΒΓ.txt in the same disk), I think αβγ.txt and ΑΒΓ.txt have the same name(not the full path,just file name), then I wanna compare the two files contents, and check them if the same.So I need a function to compare the two files name ingoring case-sensitivity.
My question are:
According to the pop-up message, CString member function CompareNoCase don't support Greek lower case and upper case. So is there something can do the work in MFC or in C++?
Is there function or library support non-English (such as Greek, Russian) upper case and lower case comparison. I wanna ingore the case sensitivity when comparing two strings.
What you have to do is set the locale. Example:
setlocale(LC_ALL, "");
CStringW strLowerGreek(L"αβγ");
CStringW strUpperGreek(L"ΑΒΓ");
int i = strLowerGreek.CompareNoCase(strUpperGreek);
CStringW strText;
strText.Format(L"%s %s %d", strLowerGreek, strUpperGreek, i);
AfxMessageBox(strText);
CStringW strLowerGreek2 = strLowerGreek.MakeUpper();
i = strLowerGreek2.CompareNoCase(strUpperGreek);
strText.Format(L"%s %s %d", strLowerGreek2, strUpperGreek, i);
AfxMessageBox(strText);
I have used the setlocale to set the locale. Initially I set it to the Greek ("ell") locale, but any locale other than the C locale works. So the more generic setlocale(LC_ALL,""); can be used to set the current locale equal to the system locale.
When I leave it with the default locale (which is English):
The return value is 32.
The text is not converted to UPPERCASE.
When I set the locale first to Greek:
The return value is 0.
The text is converted to UPPERCASE.
Result of CompareNoCase:
Result of MakeUpper:
I thought setting the locale only affected things like date formatting. But it also affects string comparisons. See the _wcsicmp reference: Call setlocale with any locale other than the C locale before the call to _wcsicmp. for additional information.
Windows seems to keep track of at least four dimensions of "current locale":
http://www.siao2.com/2005/02/01/364707.aspx
DEFAULT USER LOCALE
DEFAULT SYSTEM LOCALE
DEFAULT USER INTERFACE LANGUAGE
DEFAULT INPUT LOCALE
My brain hurts just trying to keep track of what the hell four separate locale's are useful for...
However, I don't grok the relationship between code page and locale (or LCID, or Language ID), all of which appear to be different (e.g. Japanese (Japan) is LANGID = 0x411 location code 1, but the code page for Japan is 932).
How can I configure our application to use the user's desired language as the default MBCS target when converting between Unicode and narrow strings?
That is to say, we used to be an MBCS application. Then we switched to Unicode. Things work well in English, but fail in Asian languages, apparently because Windows conversion functions WideCharToMultiByte and MultiByteToWideChar take an explicit code page (not a locale ID or language ID), which can be set to CP_ACP (default to ANSI code page), but don't appear to have a value for "default to user's default interface language's code page".
I mean, this is some seriously convoluted twaddle. Four separate dimensions of "current language", three different identifier types, as well as (different) string-identifiers for C library and C++ standard library.
In our previous MBCS builds, disk I/O and user I/O worked correctly: everything remained in the DEFAULT SYSTEM LOCALE (Windows XP term: "Language for non-Unicode Programs"). But now, in our UNICODE builds, everything tries to use "C" as the locale, and file I/O fails to properly transcode UNICODE to user's locale, and vice verse.
We want to have text files written out (when narrow) using the current user's language's code page. And when read in, the current user's language's code page should be converted back to UNICODE.
Help!!!
Clarification: I would ideally like to use the MUI language code page rather than the OS default code page. GetACP() returns the system default code page, but I am unaware of a function that returns the user's chosen MUI language (which auto-reverts to system default if no MUI specified / installed).
I agree with the comments by Jon Trauntvein, the GetACP function does reflect the user's language settings in the control panel. Also, based on the link to the "sorting it all out" blog, that you provided, DEFAULT USER INTERFACE LANGUAGE is the language that the Windows user interface will use, which is not the same as the language to be used by programs.
However, if you really want to use DEFAULT USER INTERFACE LANGUAGE then you get it by calling GetUserDefaultUILanguage and then you can map the language id to a code page, using the following table.
Language Identifiers and Locales
You can also use the GetLocaleInfo function to do the mapping, but first you would have to convert the language id that you got from GetUserDefaultUILanguage into a locale id, and I think you will get the name of the code page instead of a numeric value, but you could try it and see.
If all you want to be able to do is configure a locale object to use the currently selected locale settings, you should be able to do something like this:
std::locale loc = std::locale("");
You can also access the current code page in windows using the Win32 ::GetACP() function. Here is an example that I implemented in a string class to append multi-byte characters to a unicode string:
void StrUni::append_mb(char const *buff, size_t buff_len)
{
UINT current_code_page = ::GetACP();
int space_needed;
if(buff_len == 0)
return;
space_needed = ::MultiByteToWideChar(
current_code_page,
MB_PRECOMPOSED | MB_ERR_INVALID_CHARS,
buff,
buff_len,
0,
0);
if(space_needed > 0)
{
reserve(this->buff_len + space_needed + 1);
MultiByteToWideChar(
current_code_page,
MB_PRECOMPOSED | MB_ERR_INVALID_CHARS,
buff,
buff_len,
storage + this->buff_len,
space_needed);
this->buff_len += space_needed;
terminate();
}
}
Just use CW2A() or CA2W() which will take care of the conversion for you using the current system locale (or language used for non-Unicode applications).
FWIW, this is what I ended up doing:
#define _CONVERSION_DONT_USE_THREAD_LOCALE // force CP_ACP *not* CP_THREAD_ACP for MFC CString auto-conveters!!!
In application startup, construct the desired locale: m_locale(FStringA(".%u", GetACP()).GetString(), LC_CTYPE)
force it to agree with GetACP(): // force C++ and C libraries based on setlocale() to use system locale for narrow strings
m_locale = ::std::locale::global(m_locale); // we store the previous global so we can restore before termination to avoid memory loss
This gives me relatively ideal use of MFC's built-in narrow<->wide conversions in CString to automatically use the user's default language when converting to or from MBCS strings for the current locale.
Note: m_locale is type ::std::locale
I'm getting a value from the registry. This value might have double byte characters in it.
I will later have to transfer this across the network to a C# client to display. C# is all unicode.
The function returns MBCS if you call it non-unicode.
What should I use?
string result = string(cbData);
RegQueryValueExA(h_sub_key, "DisplayName", NULL, NULL, (LPBYTE) &result[0], &cbData)
or
string result = string(cbData);
RegQueryValueExW(h_sub_key, L"DisplayName", NULL, NULL, (LPBYTE) &result[0], &cbData)
Using Unicode whenever possible will make your life easier. The registry contains Unicode natively and converts to MBCS on the fly when you use ReqQueryValueExA, why would you want to do an unneeded conversion?
Converting to UTF-8 from UTF-16 might make sense for information going over the network, but if you control both ends of the connection it wouldn't be necessary.
No, that's not the way it works. The string you get back from the first snippet is encoded according to the current system code page. Could be a double-byte encoding. Could be anything. Big problem of course, the C# code on the other end of that Internet connection has no way to guess what the code page might be.
So do not use the first snippet. The second one gets you the string in utf16, the native encoding used in Windows, result needs to be an std::wstring. Also the encoding used by C# so you could send the binary string. Although that's not typically a good idea, xml is popular. It is up to you to set the wire format.
I need to grab the path from the registry. The following code works except for the last part where I'm storing the path to the string. Running the debugger in Visual Studio 2008 the char array has the path, but every other character is a zero. This results in the string only being assigned the first letter. I've tried changing char res[1024] to char *res = new char[1024] and this just makes it store the first letter in the char array instead of the string. The rest of the program needs the path as a string datatype so it can't stay as a char array. What am I missing here?
unsigned long type=REG_SZ, size=1024;
string path;
char res[1024];
HKEY key;
if (RegOpenKeyEx(HKEY_LOCAL_MACHINE, _T("SOFTWARE\\Classes\\dsn\\shell\\open\\command"), NULL, KEY_READ, &key)==ERROR_SUCCESS){
RegQueryValueEx(key,
NULL,// YOUR value
NULL,
&type,
(LPBYTE)res,
&size);
RegCloseKey(key);
path = string(res);
}
You're getting back a Unicode string, but assigning it to a char-based string.
You could switch path's class to being a 'tstring' or 'wstring', or use RegQueryValueExA (A for ASCII).
You are compiling in Unicode. Go to Project Settings>Configuration Properties>General and change "Character Set" to "Not Set", and rebuild your project.
RegOpenKey is actually a macro defined in the WINAPI headers. If Unicode is enabled, it resolves to RegOpenKeyW, if not then it resolves to RegOpenKeyA. If you want to continue to compile under unicode, then you can just call RetgOpenKeyA directly instead of using the macro.
Otherwise, you'll need to deal with Unicode strings which, if needed, we can help you with also.
For C++, you may prefer to access the Registry using the ATL helper class CRegKey. The method for storing string values is QueryStringValue. There are other (somewhat) typesafe methods for retrieving and setting different registry value types.
It's not the best C++ interface (eg. no std::string support) but a little smoother than native Win32.