What std::locale names are available on common windows compilers? - c++

The standard is pretty much silent on what constitutes a valid locale name; only that passing an invalid locale name results in std::runtime_error. What locale names are usable on common windows compilers such as MSVC, MinGW, and ICC?

Ok, there is a difference between C and C++ locales.
Let's start:
MSVC C++ std::locale and C setlocale
Accepts locale names as "Language[_Country][.Codepage]" for example "English_United States.1251" Otherwise would throws. Note: codepage can't be 65001/UTF-8 and should be consistent with ANSI codepage for this locale (or just omitted)
MSVC C++ std::locale and C setlocale in Vista and 7 should accept locales
[Language][-Script][-Country] like "en-US" using ISO-631 language codes and
ISO 3166 regions and script names.
I tested it with Visual Studio on Windows 7 - it does not work.
MinGW C++ std::locale accepts "C" and "POSIX" it does not support other locales,
actually gcc supports locales only over GNU C library - basically only under Linux.
setlocale is native Windows API call so should support all I mentioned above.
It may support wider range of locales when used with alternative C++ libraries
like Apache stdcxx or STL Port.
ICC - I hadn't tested it but it depends on the standard C++ library it uses. For
example under Linux it used GCC's libstdc++ so it supports all the locales gcc
supports. I don't know what standard C++ library it uses under Windows.
If you want to have "compiler and platform" independent locales support (and actually
much better support) take a look on Boost.Locale
Artyom

I believe the information you need is here :
locale "lang[_country_region[.code_page]]"
| ".code_page"
| ""
| NULL
This page provides links to :
Language Strings
Country/Region String
Code Pages
Although my answers covers setlocale instead of std::locale, this MSDN page seems to imply that the format is indeed the same :
An object of class locale also stores
a locale name as an object of class
string. Using an invalid locale name
to construct a locale facet or a
locale object throws an object of
class runtime_error. The stored
locale name is "*" if the locale
object cannot be certain that a
C-style locale corresponds exactly to
that represented by the object.
Otherwise, you can establish a matching locale within the Standard C
Library, for the locale object loc, by
calling setlocale(LC_ALL,
loc.name.c_str).
Also see this page and this thread which tend to show that std::locale internally uses setlocale.

Here's one locale name that's usable pretty much anywhere: "". That is, the empty string. The is in contrast to the "C" locale that you are probably getting by default. The empty string as an argument to std::setlocale() means something like "Use the preferred locale set by the user or environment." If you use this, the downside is that your program won't have the same output everywhere; the upside is that your users might think it works just the way they want.

Related

Inconsistent return from std::isblank between Visual C++ and gcc. Which one is wrong?

I'm seeing inconsistent behavior in a call to std::isblank between Visual C++ on Windows and gcc on Ubuntu and I'm wondering which one is correct.
On both compilers -- when the default locale is the "C" locale -- the following call returns false
std::isblank('\n');
This is what I expect. And it squares with what I see on cppreference.com
In the default C locale, only space (0x20) and horizontal tab (0x09)
are classified as blank characters.
However with C++, we also have the version that takes a std::locale argument
std::isblank('\n', std::locale::classic());
Here I am supplying std::locale::classic. Shouldn't that be the equivalent to the previous call? Because when I call this second version on Windows, it returns true. It considers a newline to be a blank character. Linux still says false.
Is my understanding (about std::locale::classic) correct? And if so, is the Visual C++ version wrong?
Yes, MSVS is wrong. [locale.statics] states:
static const locale& classic();
The "C" locale.
Returns: A locale that implements the classic "C" locale semantics, equivalent to the value locale("C").
Remarks: This locale, its facets, and their member functions, do not change with time.
Thus the following:
std::isblank('\n', std::locale::classic());
Is the same as:
std::isblank('\n');
Where locale("C") has been called.

"wcs" and "_w" and "_mbs" prefix in Visual Studio

I am a little confused with respect to the difference in the functions which are defined with/without the wcs/_w/_mbs prefix.
For Example:
fopen(),_wfopen()
On msdn it is given that:
The fopen function opens the file that is specified by filename.
_wfopen is a wide-character version of fopen; the arguments to _wfopen are wide-character strings. Otherwise, _wfopen and fopen behave
identically.
I just had a doubt whether there is any platform dependence to windows associated with the addition of the "_w" prefix.
strcpy(),wcscpy(),_mbscpy()
On msdn it is given that:
wcscpy and _mbscpy are, respectively, wide-character and multibyte-character versions of strcpy.
Again there is a doubt if the addition of "wcs" or "_mbs" is platform dependent.
EDIT:
Is WideCharToMultiByte function also platform dependent?
WideCharToMultiByte is not a C Runtime function, it's a Windows
API,hence it is platform dependent
Similarly is wcstombs_s function also platform dependent?
It was nonstandard but was standardized in C11 Annex K.
The wcs* functions like wcscpy are part of the C Standard Library. The _wfopen function and other _w* functions are extensions, as are the multibyte string functions like _mbscpy.
For the most part, Visual C++ C Runtime (CRT) functions that have a leading underscore are extensions; functions that do not have a leading underscore are part of the C Standard Library.
There are two main exceptions, where extensions may not have leading underscores:
There are several extension functions, declared with an underscore prefix, that have prefixless aliases for backwards source compatibility. These aliases are deprecated, and if you try to use them you'll get a suppressable deprecation warning (C4996).
There are _s-suffixed secure alternative functions to some C Standard Library functions, e.g. scanf_s. These are declared by default, but their declarations may be suppressed by defining the macro __STDC_WANT_SECURE_LIB__ to have the value 0.
(These functions were actually added to C11 in the optional Annex K,
but note that there are a few differences between what is specified
in the C Standard and what is implemented by Visual C++. The
differences are due to a historical
accident.)
wcscpy is standard. _mbcscpy is specific to MS VC.
That's why there is an underscore at the start: names with leading underscore are reserved for implementation-specific things.

ISO C++ and the infamous underscore [duplicate]

This MSDN article states that getcwd() has been deprecated and that the ISO C++ compatible _getcwd should be used instead, which raises the question: what makes getcwd() not ISO-compliant?
There is a good discussion about that. P.J. Plauger answers to this
I'm the guy who insisted back in 1983 that the space of
names available to a C program be partitioned into:
a) those defined by the implementation for the benefit of the programmer (such as printf)
b) those reserved to the programmer (such as foo)
c) those reserved to the implementation (such as _unlink)
We knew even then that "the implementation" was too monolithic --
often more than one source supplies bits of the implementation --
but that was the best we could do at the time. Standard C++
has introduced namespaces to help, but they have achieved only
a fraction of their stated goals. (That's what happens when you
standardize a paper tiger.)
In this particular case, Posix supplies a list of category (a) names
(such as unlink) that you should get defined when and only when you
include certain headers. Since the C Standard stole its headers from
Unix, which is the same source as for Posix, some of those headers
overlap historically. Nevertheless, compiler warnings should have
some way of taking into account whether the supported environment
is "pure" Standard C++ (a Platonic ideal) or a mixed C/C++/Posix
environment. The current attempt by Microsoft to help us poor
programmers fails to take that into account. It insists on treating
unlink as a category (b) name, which is myopic.
Well, GCC will not declare POSIX names in strict C mode, at least (though, it still does in C++ mode):
#include <stdio.h>
int main() {
&fdopen;
return 0;
}
Output using -std=c99
test.c: In function 'main':
test.c:4: error: 'fdopen' undeclared (first use in this function)
You will have to tell it explicitly that you are operating in a mixed C/Posix by using feature test macros or not passing any specific standard. It will then default to gnu89 which assumes a mixed environment (man feature_test_macros). Apparently, MSVC does not have that possibility.
Functions not specified in the standard are supposed to be prefixed by an underscore as an indication that they're vendor-specific extensions or adhere to a non-ISO standard. Thus the "compliance" here was for Microsoft to add an underscore to the name of this specific function since it's not part of the ISO standard.
As others have already pointed out, getcwd is not included in ISO C++, but is part of POSIX/IEEE Std 1003.1.
Microsoft has decided to include some of the most commonly used POSIX functions in their C standard library (but prefix these functions with an underscore to essentially discourage their usage).
For the record, getcwd() wasn't deprecated by ISO. It was "deprecated" by Microsoft. Microsoft rewrote many C functions -- often with a little better security in mind (say, string functions that also take a max_length parameter). They then had their compiler spit out these warnings, which I consider bogus because no standards group deprecated any of the functions declared deprecated.
To add on to Dan Olson's post: See ANSI C Compliance page on MSDN
The names of Microsoft-specific functions and global variables begin with a single underscore. These names can be overridden only locally, within the scope of your code. For example, when you include Microsoft run-time header files, you can still locally override the Microsoft-specific function named _open by declaring a local variable of the same name. However, you cannot use this name for your own global function or global variable.
As far as I'm aware getcwd() has never been part of ISO Standard C++. _getcwd() definitely isn't, as standard names will not begin with an underscore.
In fact, the MSDN article links to a man page that says it is declared in direct.h, which is not a Standard C++ header file. The article seems bogus to me.
The MSDN article is somewhat confusing in what a normal person would conclude from just a quick reading (if they don't read it with a very careful lawyer eye).
What the MSDN article says is: getcwd() is not compliant with the ISO C++ standard. To comply with that ISO C++ standard for naming of functions (which is what getcwd violates), Microsoft properly put an _ on the front of the function, so the same function becomes _getcwd(). That is the ISO C++ compliant way of naming the function because getcwd() and _getcwd() are not an ISO C++ standard function, but are a Microsoft (vendor) specific, or implementation specific function.
The article does not indicate what a C++ ISO standard call to get the working directory would be... though thats what folks tend to read at a quick glance.

What new Unicode functions are there in C++0x?

It has been mentioned in several sources that C++0x will include better language-level support for Unicode(including types and literals).
If the language is going to add these new features, it's only natural to assume that the standard library will as well.
However, I am currently unable to find any references to the new standard library. I expected to find out the answer for these answers:
Does the new library provide standard methods to convert UTF-8 to UTF-16, etc.?
Does the new library allowing writing UTF-8 to files, to the console (or from files, from the console). If so, can we use cout or will we need something else?
Does the new library include "basic" functionality such as: discovering the byte count and length of a UTF-8 string, converting to upper-case/lower-case(does this consider the influence of locales?)
Finally, are any of these functions are available in any popular compilers such as GCC or Visual Studio?
I have tried to look for information, but I can't seem to find anything. I am actually starting to think that maybe these things aren't even decided yet(I am aware that C++0x is a work in progress).
Does the new library provide standard methods to convert UTF-8 to UTF-16, etc.?
No. The new library does provide std::codecvt facets which do the conversion for you when dealing with iostream, however. ISO/IEC TR 19769:2004, the C Unicode Technical Report, is included almost verbatim in the new standard.
Does the new library allowing writing UTF-8 to files, to the console (or from files, from the console). If so, can we use cout or will we need something else?
Yes, you'd just imbue cout with the correct codecvt facet. Note however that the console is not required to display those characters correctly
Does the new library include "basic" functionality such as: discovering the byte count and length of a UTF-8 string, converting to upper-case/lower-case(does this consider the influence of locales?)
AFAIK that functionality exists with the existing C++03 standard. std::toupper and std::towupper of course function just as in previous versions of the standard. There aren't any new functions which specifically operate on unicode for this.
If you need these kinds of things, you're still going to have to rely on an external library -- the <iostream> is the primary piece that was retrofitted.
What, specifically, is added for unicode in the new standard?
Unicode literals, via u8"", u"", and U""
std::char_traits classes for UTF-8, UTF-16, and UTF-32
mbrtoc16, c16rtomb, mbrtoc32, and c32rtomb from ISO/IEC TR 19769:2004
std::codecvt facets for the locale library
The std::wstring_convert class template (which uses the codecvt mechanism for code set conversions)
The std::wbuffer_convert, which does the same as wstring_convert except for raw arrays, not strings.

Why is getcwd() not ISO C++ compliant?

This MSDN article states that getcwd() has been deprecated and that the ISO C++ compatible _getcwd should be used instead, which raises the question: what makes getcwd() not ISO-compliant?
There is a good discussion about that. P.J. Plauger answers to this
I'm the guy who insisted back in 1983 that the space of
names available to a C program be partitioned into:
a) those defined by the implementation for the benefit of the programmer (such as printf)
b) those reserved to the programmer (such as foo)
c) those reserved to the implementation (such as _unlink)
We knew even then that "the implementation" was too monolithic --
often more than one source supplies bits of the implementation --
but that was the best we could do at the time. Standard C++
has introduced namespaces to help, but they have achieved only
a fraction of their stated goals. (That's what happens when you
standardize a paper tiger.)
In this particular case, Posix supplies a list of category (a) names
(such as unlink) that you should get defined when and only when you
include certain headers. Since the C Standard stole its headers from
Unix, which is the same source as for Posix, some of those headers
overlap historically. Nevertheless, compiler warnings should have
some way of taking into account whether the supported environment
is "pure" Standard C++ (a Platonic ideal) or a mixed C/C++/Posix
environment. The current attempt by Microsoft to help us poor
programmers fails to take that into account. It insists on treating
unlink as a category (b) name, which is myopic.
Well, GCC will not declare POSIX names in strict C mode, at least (though, it still does in C++ mode):
#include <stdio.h>
int main() {
&fdopen;
return 0;
}
Output using -std=c99
test.c: In function 'main':
test.c:4: error: 'fdopen' undeclared (first use in this function)
You will have to tell it explicitly that you are operating in a mixed C/Posix by using feature test macros or not passing any specific standard. It will then default to gnu89 which assumes a mixed environment (man feature_test_macros). Apparently, MSVC does not have that possibility.
Functions not specified in the standard are supposed to be prefixed by an underscore as an indication that they're vendor-specific extensions or adhere to a non-ISO standard. Thus the "compliance" here was for Microsoft to add an underscore to the name of this specific function since it's not part of the ISO standard.
As others have already pointed out, getcwd is not included in ISO C++, but is part of POSIX/IEEE Std 1003.1.
Microsoft has decided to include some of the most commonly used POSIX functions in their C standard library (but prefix these functions with an underscore to essentially discourage their usage).
For the record, getcwd() wasn't deprecated by ISO. It was "deprecated" by Microsoft. Microsoft rewrote many C functions -- often with a little better security in mind (say, string functions that also take a max_length parameter). They then had their compiler spit out these warnings, which I consider bogus because no standards group deprecated any of the functions declared deprecated.
To add on to Dan Olson's post: See ANSI C Compliance page on MSDN
The names of Microsoft-specific functions and global variables begin with a single underscore. These names can be overridden only locally, within the scope of your code. For example, when you include Microsoft run-time header files, you can still locally override the Microsoft-specific function named _open by declaring a local variable of the same name. However, you cannot use this name for your own global function or global variable.
As far as I'm aware getcwd() has never been part of ISO Standard C++. _getcwd() definitely isn't, as standard names will not begin with an underscore.
In fact, the MSDN article links to a man page that says it is declared in direct.h, which is not a Standard C++ header file. The article seems bogus to me.
The MSDN article is somewhat confusing in what a normal person would conclude from just a quick reading (if they don't read it with a very careful lawyer eye).
What the MSDN article says is: getcwd() is not compliant with the ISO C++ standard. To comply with that ISO C++ standard for naming of functions (which is what getcwd violates), Microsoft properly put an _ on the front of the function, so the same function becomes _getcwd(). That is the ISO C++ compliant way of naming the function because getcwd() and _getcwd() are not an ISO C++ standard function, but are a Microsoft (vendor) specific, or implementation specific function.
The article does not indicate what a C++ ISO standard call to get the working directory would be... though thats what folks tend to read at a quick glance.