Locale support in libc++ and libstdc++ [duplicate] - c++

#include <locale>
#include <iostream>
int main()
{
std::locale::global(std::locale("en_US.utf8"));
std::wcout << L"Həł£ō שøяļđ\n";
return 0;
}
This works as expected with libstdc++ (both gcc and clang), but only prints the first character (which happens to be ASCII) with libc++. I'm using libcxx-0.0_p20140322 on Gentoo Linux.
Is this a known bug in libc++, or just me not knowing how to cook it?
Update 1. I have tried
std::locale::global(std::locale("en_US.utf8"));
std::locale::global(std::locale(""));
std::setlocale(LC_ALL, "en_US.utf8");
std::setlocale(LC_ALL, "");
which all do the same thing.
Update 2. The wide string literal is here for simplicity. The same thing happens when the string is obtained in any other way (converted from UTF-8, read from binary file, ...)

You have to explicitly imbue the output stream with a locale, like so:
std::wcout.imbue(std::locale());
This makes things work as expected. In fact, it is required by the standard:
27.5.3.3 ios_base functions
locale getloc() const;
4 If no locale has been imbued, a copy of the global C++ locale, locale(), in effect at the time of construction.
So when wcout is constructed, it gets a copy of the initial locale imbued in it. The initial locale is "C". My incorrect assumption was that streams which have no locale explicitly imbued use the current global locale always (and not just at the time of construction). This assumption is totally unreasonable if one thinks about it a little.
June 2021 edit: So in theory just imbue should work, however in practice it doesn't in libstdc++. One needs to set the global locale for this to work, which is probably a libstdc++ bug. Imbuing works with other wide-character streams, but not with std::wcout.
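The "locale is captured at construction" rule can be checked without relying on any named locale being installed. The following is a minimal sketch: `comma_decimal`, `stream_keeps_construction_locale` and `format_with_imbue` are names made up for illustration, and narrow string streams stand in for `std::wcout` so the result is easy to inspect.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet, used only to build a locale that differs from the
// current one without depending on a named locale such as "en_US.utf8".
struct comma_decimal : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

// Returns true if a stream keeps the locale that was global when the stream
// was constructed, even after the global locale is replaced.
bool stream_keeps_construction_locale() {
    const std::locale before;                        // copy of the global locale
    std::ostringstream early;                        // captures `before`
    const std::locale custom(before, new comma_decimal);
    std::locale::global(custom);                     // replace the global locale
    const bool kept = early.getloc() == before && early.getloc() != custom;
    std::locale::global(before);                     // restore the old global
    return kept;
}

// Formats 1.5 on a stream that has been explicitly imbued.
std::string format_with_imbue() {
    std::ostringstream out;
    out.imbue(std::locale(out.getloc(), new comma_decimal));
    out << 1.5;
    return out.str();
}
```

For `std::wcout` the same pattern would be to set the global locale first and then call `std::wcout.imbue(std::locale());`, which, per the edit above, may still behave differently between libc++ and libstdc++.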

Related

Inconsistent return from std::isblank between Visual C++ and gcc. Which one is wrong?

I'm seeing inconsistent behavior in a call to std::isblank between Visual C++ on Windows and gcc on Ubuntu and I'm wondering which one is correct.
On both compilers -- when the default locale is the "C" locale -- the following call returns false
std::isblank('\n');
This is what I expect. And it squares with what I see on cppreference.com
In the default C locale, only space (0x20) and horizontal tab (0x09)
are classified as blank characters.
However with C++, we also have the version that takes a std::locale argument
std::isblank('\n', std::locale::classic());
Here I am supplying std::locale::classic. Shouldn't that be the equivalent to the previous call? Because when I call this second version on Windows, it returns true. It considers a newline to be a blank character. Linux still says false.
Is my understanding (about std::locale::classic) correct? And if so, is the Visual C++ version wrong?
Yes, MSVS is wrong. [locale.statics] states:
static const locale& classic();
The "C" locale.
Returns: A locale that implements the classic "C" locale semantics, equivalent to the value locale("C").
Remarks: This locale, its facets, and their member functions, do not change with time.
Thus the following:
std::isblank('\n', std::locale::classic());
Is the same as:
std::isblank('\n');
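provided the global C locale is still "C", which is the default at program startup (that is, nothing such as setlocale(LC_ALL, "") has changed it).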
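A small sketch of the comparison discussed above. The helper names `blank_in_classic` and `blank_in_c_library` are invented for illustration; on a conforming implementation both should agree for a newline while the C locale is still "C".

```cpp
#include <cassert>
#include <cctype>
#include <locale>

// True if `c` is blank according to the ctype facet of the classic "C" locale.
bool blank_in_classic(char c) {
    return std::isblank(c, std::locale::classic());
}

// True if `c` is blank according to the C library's current locale.
// The cast avoids undefined behaviour for negative char values.
bool blank_in_c_library(char c) {
    return std::isblank(static_cast<unsigned char>(c)) != 0;
}
```

If `blank_in_classic('\n')` returns true on some toolchain, that matches the Visual C++ behaviour reported in the question rather than the wording quoted from [locale.statics].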

Creating an std::locale with an empty string ""

Question
Is constructing std::locale with an empty string to get the user-preferred native locale a part of the standard? If yes, could you point out a source which explicitly states that?
Problem description
Example from documentation of std::locale has this line:
std::wcout << "User-preferred locale setting is " << std::locale("").name().c_str()
Which hints that creating a locale with an empty string will return a user-preferred native locale. After quick googling, this article also mentions:
The empty string tells setlocale to use the locale specified by the
user in the environment.
However, the documentation for the std::locale constructors does not mention any special case for an empty string.
Here's the quote:
3-4) Constructs a copy of the system locale with specified std_name
(such as "C", or "POSIX", or "en_US.UTF-8", or "English_US.1251"), if
such locale is supported by the operating system. The locale
constructed in this manner has a name.
The draft standard says in [locale.cons]:
explicit locale(const char* std_name);
Effects:
Constructs a locale using standard C locale names, e.g., "POSIX". The resulting locale implements semantics defined to be associated with that name.
Throws:
runtime_error if the argument is not valid, or is null.
Remarks:
The set of valid string argument values is "C" , "" , and any implementation-defined values.
This says "" is a valid constructor argument, and arguments are standard C locale names.
Then in [c.locale] it explicitly refers to the standard C header <locale.h>.
Quoting from the C standard (C99), 7.11.1.1/3:
A value of "C" for locale specifies the minimal environment for C translation; a value of "" for locale specifies the locale-specific native environment. Other implementation-defined strings may be passed as the second argument to setlocale.
I think this means the answer to your question is "yes": A name of "" refers to the native locale.
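As a minimal sketch of what this means in practice: the helper `preferred_locale_name` below is a made-up name, and the fallback to "C" is an assumption for the case where the environment names a locale the implementation cannot construct (libstdc++, for instance, throws std::runtime_error then).

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Returns the name of the user-preferred (native) locale, or the name of the
// classic locale if the environment requests a locale that is not available.
std::string preferred_locale_name() {
    try {
        return std::locale("").name();
    } catch (const std::runtime_error&) {
        return std::locale::classic().name();
    }
}
```

The returned name is implementation-defined; with no locale-related environment variables set it is typically "C".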

Is there any C/C++ system() function accepting unicode?

Question: In C/C++, is there any system function that accepts Unicode ?
See below for reason:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
// http://stackoverflow.com/questions/3313332/bringing-another-apps-window-to-front-on-mac-in-c
system("osascript -e \"tell application \\\"Address Book\\\" to activate\"");
return EXIT_SUCCESS;
}
Use _wsystem.
_wsystem is a wide-character version of system; the command argument to _wsystem is a wide-character string. These functions behave identically otherwise.
http://msdn.microsoft.com/en-us/library/277bwbdz.aspx
A special function named main is the starting point of execution for all C and C++ programs. If you are writing code that adheres to the Unicode programming model, you can use wmain, which is the wide-character version of main.
http://msdn.microsoft.com/en-us/library/vstudio/6wd819wh.aspx
system() does not care about the encoding as far as I know; it should just pass the bytes through.
Maybe your question is really "how do I type a UTF-8 string literal in C", or "what encoding does osascript expect"?
The portable way to write UTF-8 in C is with \x escape sequences, though if you are willing to rely on C99 or a specific compiler you can often type the UTF-8 directly.
I would guess osascript expects UTF-8, though I have no real idea.
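A short sketch of the \x approach, written in C++ here; the constant `kCafeUtf8` and the helper `byte_length` are illustrative names only. The literal spells "cafe" with an acute accent on the e, whose code point U+00E9 is the two bytes C3 A9 in UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Adjacent string literals are concatenated by the compiler. Splitting the
// literal like this also stops a hex escape from absorbing a following
// character that happens to be a hex digit.
const char* const kCafeUtf8 = "caf" "\xC3\xA9";

// Number of bytes (not characters) before the terminating NUL.
std::size_t byte_length(const char* s) {
    return std::strlen(s);
}
```

Such a string can be passed to system() like any other char array; whether the invoked program interprets the bytes as UTF-8 is outside the language's control.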
Standard C and C++ do not explicitly understand Unicode at all: unlike in Java, none of the standard APIs are defined as accepting or returning Unicode strings. Whether a wide string or multibyte string is actually a Unicode-encoded string is system dependent. So the simple answer is no.

Unicode support in C++0x

I'm trying to use new unicode characters in C++0x.
So I wrote sample code:
#include <fstream>
#include <string>
int main()
{
std::u32string str = U"Hello World";
std::basic_ofstream<char32_t> fout("output.txt");
fout<<str;
return 0;
}
But after executing this program I get an empty output.txt file. Why is it not printing Hello World?
Also, is there something like cout and cin already defined for these types, or do stdin and stdout not support Unicode?
Edit: I'm using g++ and Linux.
EDIT: ATTENTION. I have discovered that the standards committee removed Unicode streams from C++0x, so the previously accepted answer is no longer correct. For more information see my answer!
Unicode string literals support began in GCC 4.5. Maybe that's the problem.
[edit]
After some digging I've found that streams for these new Unicode literals are described in N2035, which was included in a draft of the standard. According to this document you need u32ofstream to output your string, but this class is absent from the GCC 4.5 C++0x library.
As a workaround you can use ordinary fstream:
std::ofstream fout2("output2.txt", std::ios::out | std::ios::binary);
fout2.write((const char *)str.c_str(), str.size() * 4);
This way I've output your string in UTF-32LE on my Intel machine (which is little-endian).
[edit]
I was a little bit wrong about the status of u32ofstream: according to the latest draft on the C++ Standards Committee's web site you have to use std::basic_ofstream<char32_t> as you did. This class would use the codecvt<char32_t,char,typename traits::state_type> class (see the end of §27.9.1.1), which has to be implemented in the standard library (search for codecvt<char32_t in the document), but it's not available in GCC 4.5.
In new C++ standard there will not be Unicode streams.
As @ssmir mentioned, the standards committee was going to add stream support for Unicode in C++0x. However, in later drafts the committee decided to remove stream support for Unicode. For more information see this link.
It seems the only way to output a Unicode string is to convert it to a narrow char string (for example UTF-8) with codecvt.
When creating, the stream tries to obtain a 'codecvt' from the global locale, but fails to get one because the only standard codecvt's are for char and wchar_t.
As a result, _M_codecvt member of the stream object is NULL.
Later, during the attempt to output, your code throws an exception (not visible to user) in facet checking function in basic_ios.h, because the facet is initialized from _M_codecvt.
Add a facet to the locale associated with the stream to do the conversion from char32_t to the correct output encoding.
Imbue the stream with a locale containing a codecvt of the right type.
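As an alternative to supplying a codecvt facet, the string can be converted to a narrow encoding before it reaches an ordinary std::ofstream. The following is a sketch using a hand-written UTF-8 encoder rather than any standard facet; `to_utf8` and `write_utf8_file` are hypothetical helpers, and replacing invalid code points with U+FFFD is a design choice made here, not something the answers above prescribe.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Encodes UTF-32 code points as UTF-8 bytes. Surrogates and values above
// U+10FFFF are not valid code points and are replaced with U+FFFD.
std::string to_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        if (cp < 0x80) {                                   // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                           // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                         // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                           // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Writes `text` to `path` as UTF-8; returns false if the stream failed.
bool write_utf8_file(const char* path, const std::u32string& text) {
    std::ofstream fout(path, std::ios::out | std::ios::binary);
    fout << to_utf8(text);
    return static_cast<bool>(fout);
}
```

With these helpers, `write_utf8_file("output.txt", U"Hello World")` would produce a file containing the eleven ASCII bytes of the text, since ASCII is a subset of UTF-8.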

What std::locale names are available on common windows compilers?

The standard is pretty much silent on what constitutes a valid locale name; only that passing an invalid locale name results in std::runtime_error. What locale names are usable on common windows compilers such as MSVC, MinGW, and ICC?
Ok, there is a difference between C and C++ locales.
Let's start:
MSVC C++ std::locale and C setlocale
Accepts locale names of the form "Language[_Country][.Codepage]", for example "English_United States.1251"; otherwise it throws. Note: the codepage can't be 65001/UTF-8 and should be consistent with the ANSI codepage for this locale (or just omitted).
MSVC C++ std::locale and C setlocale in Vista and 7 should accept locales of the form
[Language][-Script][-Country] like "en-US" using ISO 639 language codes and
ISO 3166 regions and script names.
I tested it with Visual Studio on Windows 7 - it does not work.
MinGW C++ std::locale accepts "C" and "POSIX"; it does not support other locales.
In fact, gcc supports locales only through the GNU C library, which basically means only under Linux.
setlocale is provided by the native Windows C runtime, so it should support everything I mentioned above.
MinGW may support a wider range of locales when used with alternative C++ libraries
like Apache stdcxx or STLport.
ICC - I haven't tested it, but it depends on the standard C++ library it uses. For
example, under Linux it uses GCC's libstdc++, so it supports all the locales gcc
supports. I don't know what standard C++ library it uses under Windows.
If you want "compiler and platform" independent locale support (and actually
much better support), take a look at Boost.Locale.
Artyom
I believe the information you need is here:
locale "lang[_country_region[.code_page]]"
| ".code_page"
| ""
| NULL
This page provides links to:
Language Strings
Country/Region String
Code Pages
Although my answer covers setlocale instead of std::locale, this MSDN page seems to imply that the format is indeed the same:
An object of class locale also stores
a locale name as an object of class
string. Using an invalid locale name
to construct a locale facet or a
locale object throws an object of
class runtime_error. The stored
locale name is "*" if the locale
object cannot be certain that a
C-style locale corresponds exactly to
that represented by the object.
Otherwise, you can establish a matching locale within the Standard C
Library, for the locale object loc, by
calling setlocale(LC_ALL,
loc.name().c_str()).
Also see this page and this thread which tend to show that std::locale internally uses setlocale.
Here's one locale name that's usable pretty much anywhere: "". That is, the empty string. This is in contrast to the "C" locale that you are probably getting by default. The empty string as an argument to std::setlocale() means something like "Use the preferred locale set by the user or environment." If you use this, the downside is that your program won't have the same output everywhere; the upside is that your users might think it works just the way they want.
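Since the set of accepted names differs between toolchains, one practical approach is to probe candidate names at run time. The sketch below relies only on the documented behaviour that constructing std::locale from an invalid name throws std::runtime_error; `locale_available` and `usable_locales` are illustrative names, and the candidate list is whatever the caller chooses to try.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Returns true if the implementation can construct a locale with this name.
bool locale_available(const std::string& name) {
    try {
        (void)std::locale(name);   // throws std::runtime_error if not valid
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// Filters a list of candidate names down to those the implementation accepts.
std::vector<std::string> usable_locales(const std::vector<std::string>& candidates) {
    std::vector<std::string> accepted;
    for (const std::string& name : candidates) {
        if (locale_available(name)) accepted.push_back(name);
    }
    return accepted;
}
```

A caller might pass names in several styles, for example "en_US.UTF-8", "English_United States.1252" and "en-US", and use whichever the current toolchain accepts, falling back to "C".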