How to get current locale of my environment? - c++

I tried the following code on Linux, but it always returns 'C' regardless of the LANG setting.
#include <iostream>
#include <locale.h>
#include <locale>
using namespace std;
int main()
{
cout<<"locale 1: "<<setlocale(LC_ALL, NULL)<<endl;
cout<<"locale 2: "<<setlocale(LC_CTYPE, NULL)<<endl;
locale l;
cout<<"locale 3: "<<l.name()<<endl;
}
$ ./a.out
locale 1: C
locale 2: C
locale 3: C
$
$ export LANG=zh_CN.UTF-8
$ ./a.out
locale 1: C
locale 2: C
locale 3: C
What should I do to get the current locale setting on Linux (e.g. Ubuntu)?
A second question: is the locale obtained the same way on Windows?

From man 3 setlocale (New maxim: "When in doubt, read the entire manpage."):
If locale is "", each part of the locale that should be modified is set according to the environment variables.
So, we can read the environment variables by calling setlocale at the beginning of the program, as follows:
#include <iostream>
#include <locale.h>
using namespace std;
int main()
{
setlocale(LC_ALL, "");
cout << "LC_ALL: " << setlocale(LC_ALL, NULL) << endl;
cout << "LC_CTYPE: " << setlocale(LC_CTYPE, NULL) << endl;
return 0;
}
My system does not support the zh_CN locale, as the following output reveals:
$ ./a.out
LC_ALL: en_US.utf8
LC_CTYPE: en_US.utf8
$ export LANG=zh_CN.UTF-8
$ ./a.out
LC_ALL: C
LC_CTYPE: C
Windows: I have no idea about Windows locales. I suggest starting with an MSDN search, and then opening a separate Stack Overflow question if you still have questions.

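I just figured out how to get the locale in C++: construct std::locale from an empty string "", which picks up the user's environment the same way setlocale(LC_ALL, "") does. Unlike setlocale, constructing the object does not change the global locale.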
locale l("");
cout<<"Locale by C++: "<<l.name()<<endl;
This link describes in detail the differences between the C locale and the C++ locale.

For Windows use the following code:
LCID lcid = GetThreadLocale();
wchar_t name[LOCALE_NAME_MAX_LENGTH];
if (LCIDToLocaleName(lcid, name, LOCALE_NAME_MAX_LENGTH, 0) == 0)
error(GetLastError());
std::wcout << L"Locale name = " << name << std::endl;
This is going to print something like "en-US".
To strip the sublanguage (region) information, use the following:
wchar_t parentLocaleName[LOCALE_NAME_MAX_LENGTH];
if (GetLocaleInfoEx(name, LOCALE_SPARENT, parentLocaleName, LOCALE_NAME_MAX_LENGTH) == 0)
error(GetLastError());
std::wcout << L"parentLocaleName = " << parentLocaleName << std::endl;
This will give you just "en".

A good alternative to consider instead of std::locale is boost::locale which is capable of returning more reliable information - see http://www.boost.org/doc/libs/1_52_0/libs/locale/doc/html/locale_information.html
boost::locale::info has the following member functions:
std::string name() -- the full name of the locale, for example en_US.UTF-8
std::string language() -- the ISO-639 language code of the current locale, for example "en".
std::string country() -- the ISO-3166 country code of the current locale, for example "US".
std::string variant() -- the variant of current locale, for example "euro".
std::string encoding() -- the encoding used for char based strings, for example "UTF-8"
bool utf8() -- a fast way to check whether the encoding is UTF-8.

The default constructor of std::locale creates a copy of the global C++ locale.
So to get the name of the current locale:
std::cout << std::locale().name() << '\n';

Related

Which characters are transformed by toupper() in the default C++ locale?

With the default "C" locale, only a-z are transformed by std::toupper(), as documented, for example, here. Which characters exactly are transformed by std::ctype<CharT>::toupper() with the default C++ locale?
I'm asking because std::toupper(L'ω', std::locale::classic()) returns L'Ω' on Windows and I'm wondering for which other characters the C++ locale also returns an upper case form. In the "C" locale the same character is not transformed: static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(L'ω'))) returns L'ω' as expected.
I used the following program to verify this:
#include <cwctype>
#include <fstream>
#include <locale>
int main()
{
std::wofstream fs("out.txt");
fs.imbue(std::locale("en_US.UTF8"));
fs << L"std::toupper(L'ω', std::locale::classic()): " << std::toupper(L'ω', std::locale::classic()) << std::endl;
fs << L"static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(L'ω'))): "
<< static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(L'ω'))) << std::endl;
return 0;
}
Content of out.txt when compiled with Visual Studio 2019 (save source file with UTF-8 encoding and add compiler switch /utf-8) and executed on Windows 10:
std::toupper(L'ω', std::locale::classic()): Ω
static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(L'ω'))): ω
Output with gcc version 8.4.0 (Ubuntu 8.4.0-1ubuntu1~18.04):
std::toupper(L'ω', std::locale::classic()): ω
static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(L'ω'))): ω

Multiple calls to setlocale

I am trying to figure out how Unicode is supported in C++.
When I want to output multilingual text to console, I call std::setlocale. However I noticed that the result depends on prior calls to setlocale.
Consider the following example. If run without arguments it calls setlocale once, otherwise it makes a prior call to setlocale to get the value of current locale and restore it at the end of the function.
#include <iostream>
#include <locale>
using namespace std;
int main(int argc, char** argv)
{
char *current_locale = 0;
if (argc > 1) {
current_locale = setlocale(LC_ALL, NULL);
wcout << L"Current output locale: " << current_locale << endl;
}
char* new_locale = setlocale(LC_ALL, "ru_RU.UTF8");
if (! new_locale)
wcout << L"failed to set new locale" << endl;
else
wcout << L"new locale: " << new_locale << endl;
wcout << L"Привет!" << endl;
if (current_locale) setlocale(LC_ALL, current_locale);
return 0;
}
The output is different:
:~> ./check_locale
new locale: ru_RU.UTF8
Привет!
:~> ./check_locale 1
Current output locale: C
new locale: ru_RU.UTF8
??????!
Is there something that setlocale(LC_ALL, NULL) does that needs to be taken care of in future setlocale calls?
The compiler is g++ 7.5.0 or clang++ 7.0.1. And the console is a linux console in a graphical terminal.
More details on the system config: OpenSUSE 15.1, linux 4.12, glibc 2.26, libstdc++6-10.2.1
Is there something that setlocale(LC_ALL, NULL) does that needs to be taken care of in future setlocale calls?
No, setlocale(..., NULL) does not modify the current locale. The following code is fine:
setlocale(LC_ALL, NULL);
setlocale(LC_ALL, "ru_RU.UTF8");
wprintf(L"Привет!\n");
However the following code will fail:
wprintf(L"anything"); // or even just `fwide(stdout, 1);`
setlocale(LC_ALL, "ru_RU.UTF8");
wprintf(L"Привет!\n");
The problem is that the stream has its own locale, which is determined at the point the stream orientation is changed to wide.
// here stdout has no orientation and no locale associated with it
wprintf(L"anything");
// `stdout` stream orientation switches to wide stream
// current locale is used - `stdout` has C locale
setlocale(LC_ALL, "ru_RU.UTF8");
wprintf(L"Привет!\n");
// `stdout` is wide oriented
// current locale is ru_RU.UTF-8
// __but__ the locale of `stdout` is still C and cannot be changed!
The only documentation I found for this is the gnu.org page on Streams and I18N (emphasis mine):
Since a stream is created in the unoriented state it has at that point no conversion associated with it. The conversion which will be used is determined by the LC_CTYPE category selected at the time the stream is oriented. If the locales are changed at the runtime this might produce surprising results unless one pays attention. This is just another good reason to orient the stream explicitly as soon as possible, perhaps with a call to fwide.
You can:
Use a separate locale for the C++ stream and the C FILE (see here):
std::ios_base::sync_with_stdio(false);
std::wcout.imbue(std::locale("ru_RU.utf8"));
Reopen stdout:
wprintf(L""); // stdout has C locale
char* new_locale = setlocale(LC_ALL, "ru_RU.UTF8");
freopen("/dev/stdout", "w", stdout); // stdout has no stream orientation
wprintf(L"Привет!\n"); // stdout is wide and ru_RU locale
I think (untested) that in glibc you can even reopen stdout with an explicit coded character set via the ccs= mode option (see GNU opening streams):
freopen("/dev/stdout", "w,ccs=UTF-8", stdout);
std::wcout << L"Привет!\n"; // fine
In any case, try to set locale as soon as possible before doing anything else.

架 (U+67B6) is not graphical with en_US.UTF-8. What's going on?

This is a follow up question to:
std::isgraph asserts, how to fix?
After setting locale to "en_US.UTF-8", std::isgraph no longer asserts.
However, the same function returns false for the Unicode character 架 (U+67B6). What is going on?
It's a Unicode build on the Windows platform.
If you want to test characters that are too large to fit in an unsigned char, you can try the wide-character versions, or a Unicode library as already suggested (which is really the better option for portable code, as it removes any system- or locale-based differences in behavior).
This program:
#include <clocale>
#include <cwctype>
#include <iostream>
int main() {
wchar_t x = L'\u67B6';
char *loc = std::setlocale(LC_CTYPE, "");
std::wcout << "Using locale " << loc << ".\n";
std::wcout << "Character " << x << " is graphical: " << std::boolalpha
<< static_cast<bool>(std::iswgraph(x)) << '\n';
return 0;
}
when compiled and run on my Ubuntu test system, outputs
Using locale en_US.utf8.
Character 架 is graphical: true
You said you're using Windows, but I don't have a Windows computer available for testing, so I can't confirm if this'll work there or not.
std::isgraph is not a Unicode-aware function.
It's an antiquity from C.
From the documentation:
The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF.
It only takes int because .. it's an antiquity from C. Just like std::tolower.
You should be using something like ICU instead.

Get culture info of the system using win32 application

I am working on a Win32 console application. I want to get the current system locale or culture info in my application,
for example en-US or zh-CN.
Does WINAPI provide anything for this?
Sample code would really help.
You should use GetLocaleInfo.
wchar_t szISOLang[5] = { 0 };
wchar_t szISOCountry[5] = { 0 };
::GetLocaleInfo(LOCALE_USER_DEFAULT,
LOCALE_SISO639LANGNAME,
szISOLang,
sizeof(szISOLang) / sizeof(wchar_t));
::GetLocaleInfo(LOCALE_USER_DEFAULT,
LOCALE_SISO3166CTRYNAME,
szISOCountry,
sizeof(szISOCountry) / sizeof(WCHAR));
std::wcout << szISOLang << "_" << szISOCountry << std::endl;
In C or C++, you can create a locale based on a name, so if you supply a name like en-US it will create a matching locale (assuming you use one of the strings it knows about--obviously most libraries aren't going to recognize every possible string).
This has one little-known feature though: if you supply an empty string, it will create a locale that's appropriate for the environment as configured by the user (determined by some means the language doesn't specify).
So, you can retrieve that, and use it. For example:
#include <locale>
#include <iostream>
int main() {
auto loc = std::locale("");
std::cout << loc.name() << "\n";
}
On the machine I'm using at the moment (Linux), this prints out: en_US.UTF-8.

How to change Locale in MinGW

It seems like only the "C" locale is working with MinGW. I tried the example found here and no commas were added even though the system locale is set to Canada.
#include <iostream>
#include <locale>
int main()
{
std::wcout << "User-preferred locale setting is " << std::locale("").name().c_str() << '\n';
// on startup, the global locale is the "C" locale
std::wcout << 1000.01 << '\n';
// replace the C++ global locale as well as the C locale with the user-preferred locale
std::locale::global(std::locale(""));
// use the new global locale for future wide character output
std::wcout.imbue(std::locale());
// output the same number again
std::wcout << 1000.01 << '\n';
}
The output is
User-preferred locale setting is C
100000
100000
I tried std::locale("en-CA") and got "locale::facet::_S_create_c_locale name not valid" at run time. I compile from CMD using g++, on 64-bit Windows 10.
Also I tried compiling this program found in the accepted answer here and got the compiler error 'LOCALE_ALL' was not declared in this scope.
How can I set the locale in MinGW to the system default or something explicit?