C++ decimal points

I have this function:
#include <cstdlib>
#include <clocale>

double SimplifiedExample () {
    setlocale (LC_ALL, "");

    char txt [] = "3.14";
    for (int x = 0; txt [x]; x++)
        if (txt [x] == '.')
            txt [x] = localeconv ()->decimal_point [0];

    char *endptr;
    return strtod (txt, &endptr);
}
This works on every Windows 10 and Linux system I've tried it on. It fails on at least one Windows 7 machine (which reports "," for localeconv ()->decimal_point [0], but uses "." in both sprintf and strtod). Is the use of localeconv to supply the system decimal point to strtod correct, or not?
Note that I can't remove the setlocale() call. The application MUST respect locale settings.

Have you tried "man strtod"? I have Slackware Linux, and its strtod man page mentions that the locale determines the decimal point, and says the function lives in stdlib.h.
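As that man page suggests, strtod (and printf) follow LC_NUMERIC, so the decimal point really does come from the active locale. Here is a minimal sketch; the de_DE.UTF-8 locale name is an assumption and must be installed for setlocale to succeed:
#include <clocale>
#include <cstdio>
#include <cstdlib>

int main ()
{
    // Switch only the numeric category to a comma-decimal locale (assumed installed).
    if (setlocale (LC_NUMERIC, "de_DE.UTF-8")) {
        printf ("%f\n", strtod ("3,14", nullptr));  // parses fully: prints 3,140000 under this locale
        printf ("%f\n", strtod ("3.14", nullptr));  // stops at '.': prints 3,000000
    }
    return 0;
}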

You're falling into a bog of implementation-defined behaviour.
The problem is probably that setlocale on Windows doesn't behave the way the standard proposes, because it is a library wrapper around the OS's idiosyncrasies; for example, UTF-8 is supported by the C++ runtime only on Windows 10. Moreover, some C++ library implementations of sprintf/printf/strtod respect only std::locale::global. In addition, some runtimes support nothing but "C" and "", and format output only as ANSI or per IEEE-754 (hence a dot). Early Windows 10 builds and older versions do not even respect IEEE-754.
For example, an arbitrary MinGW64 build on Windows 7 responds to this code:
auto old_locale = std::locale::global(std::locale(""));
std::cout << old_locale.name() << "\n";
std::cout << std::locale("").name() << "\n";
by reporting the ANSI "C" locale:
C
C
regardless of the system locale, and it doesn't recognize the OEM locale ".OCP" or any locale name actually present on the system. This behaviour is hard-coded into the library linked with the application, not in some system-wide component.
Note that you can check the return value of setlocale:
auto loc = setlocale(LC_ALL, "");
if (loc == NULL) {
    printf ("setlocale failed!\n");
}
else {
    printf ("setlocale %s!\n", loc);
}
In my case it returned "English_United States.1252" when the user locale is set to US English, and "Russian_Russia.1251" for a Russian user locale; neither matches the more usual "en-US" format, since Windows uses underscores and full language names.
There is another problem to solve when interfacing with a database service that runs as a different user. Existing databases already provide support for defining which locale a particular database uses.
Moreover, the output of all the library functions didn't change when the locale changed.
Developers of multiplatform applications are urged to use proper library packages for formatted input/output, which eliminate the struggle with platform- and implementation-specific behaviour.
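As one illustration of that advice (a sketch, not the original answer's code): the standard streams can do the locale-aware parsing themselves, so the string never has to be patched by hand with localeconv. Whether locale("") really reflects the user's settings still depends on the runtime, as discussed above, and the helper name parse_double below is made up for the example:
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parse a double using a given locale's decimal point.
double parse_double (const std::string &text, const std::locale &loc)
{
    std::istringstream in (text);
    in.imbue (loc);        // num_get applies this locale's decimal_point()
    double value = 0.0;
    in >> value;
    return value;
}

// parse_double ("3,14", std::locale ("de_DE.UTF-8"))  -> 3.14 where that locale exists
// parse_double ("3.14", std::locale::classic ())      -> 3.14 everywhere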

Related

How do I use system("chcp 936") in my dialog based project?

The code below is supposed to convert a wstring "！" (a full-width exclamation mark) to a string and output it:
setlocale(LC_ALL, "Chinese_China.936");
//system("chcp 936");
std::wstring ws = L"！";
string as((ws.length()) * sizeof(wchar_t), '-');
auto rs = wcstombs((char*)as.c_str(), ws.c_str(), as.length());
as.resize(rs);
cout << rs << ":" << as << endl;
If you run it without system("chcp 936");, the converted string comes out as "£¡" rather than "！". With system("chcp 936"); the result is correct in a console project.
But in my dialog-based project, system("chcp 936") is useless, and even if it worked I couldn't use it, because it would pop up a console window.
PS: the IDE is Visual Studio 2019, and my source code is stored as UTF-8 with signature.
My operating system language is English and the language for non-Unicode programs is English (United States).
Edit: interestingly, even with the "en-US" locale, "！" can be converted to an ASCII "!".
But I don't get where the "£¡" in the dialog-based project came from.
There are two distinct points to consider with locales:
you must tell the program what charset should be used when converting Unicode characters to plain bytes (this is the role of setlocale)
you must tell the terminal what charset it should render (this is the role of chcp in the Windows console)
The first point depends on the language and, optionally, the libraries that you use in your program (here the C++ language and Standard Library).
The second point depends on the console application and the underlying system. The Windows console uses chcp, and you will find in that other post how to configure xterm on a Unix-like system.
I found the cause: the wstring-to-string conversion is not the problem. The problem was that I used CA2T to convert the Chinese punctuation mark and it failed, so "£¡" ended up being shown in the UI.
Using mbstowcs, the counterpart of wcstombs, it works.
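For reference, a minimal sketch of the round trip with the two C functions, without ATL's CA2T; the locale name is the one from the question and is Windows-specific:
#include <clocale>
#include <cstdlib>
#include <string>

int main ()
{
    setlocale (LC_ALL, "Chinese_China.936");      // Windows locale name from the question

    std::wstring ws = L"！";                      // full-width exclamation mark
    std::string narrow (ws.size () * MB_CUR_MAX, '\0');
    std::size_t n = wcstombs (&narrow[0], ws.c_str (), narrow.size ());
    if (n == (std::size_t)-1) return 1;           // not representable in the current code page
    narrow.resize (n);

    std::wstring wide (narrow.size (), L'\0');
    std::size_t m = mbstowcs (&wide[0], narrow.c_str (), wide.size ());
    if (m == (std::size_t)-1) return 2;
    wide.resize (m);

    return wide == ws ? 0 : 3;                    // 0 if the round trip preserved the character
}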

How can I use std::imbue to set the locale for std::wcout?

I am trying to use the std::locale mechanism in C++11 to count words in different languages. Specifically, I have a std::wstringstream which contains the title of a famous Russian novel ("Crime and Punishment" in English). What I want to do is to use the appropriate locale (ru_RU.utf8 on my Linux machine) to read the stringstream, count the words and print the results. I should also probably note that my system is set to use the en_US.utf8 locale.
The desired result is this:
0: "Преступление"
1: "и"
2: "наказание"
I counted 3 words.
and the last word was "наказание"
That all works when I set the global locale, but not when I attempt to imbue the wcout stream. When I try that, I get this result instead:
0: "????????????"
1: "?"
2: "?????????"
I counted 3 words.
and the last word was "?????????"
Also, when I attempt to use a solution suggested in the comments (which can be activated by changing #define USE_CODECVT 0 to #define USE_CODECVT 1), I get the error mentioned in this other question.
Those interested in experimenting with the code, with the compiler settings, or with both may wish to use this live code.
My questions
Why does that not work? Is it because wcout is already open?
Is there a way to use imbue rather than setting the global locale to do what I want?
If it makes a difference, I'm using g++ 4.8.3. The full code is shown below.
getwords.cpp
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <locale>

#define USE_CODECVT 0
#define USE_IMBUE 1

#if USE_CODECVT
#include <codecvt>
#endif

using namespace std;

int main()
{
#if USE_CODECVT
    locale ru("ru_RU.utf8",
              new codecvt_utf8<wchar_t, 0x10ffff, consume_header>{});
#else
    locale ru("ru_RU.utf8");
#endif

#if USE_IMBUE
    wcout.imbue(ru);
#else
    locale::global(ru);
#endif

    wstringstream in{L"Преступление и наказание"};
    in.imbue(ru);

    wstring word;
    unsigned wordcount = 0;
    while (in >> word) {
        wcout << wordcount << ": \"" << word << "\"\n";
        ++wordcount;
    }
    wcout << "\nI counted " << wordcount << " words.\n"
          << "and the last word was \"" << word << "\"\n";
}
First, I did some more tests using your code, and I can confirm that L"Преступление и наказание" is a correct UTF-16 string. I checked the codes of the individual characters, and they are correctly 0x41f, 0x440, 0x435, 0x441, 0x442, 0x443, 0x43f, 0x43b, 0x435, 0x43d, 0x438, 0x435, 0x20, 0x438, 0x20, 0x43d, 0x430, 0x43a, 0x430, 0x437, 0x430, 0x43d, 0x438, 0x435.
I could not find any reference about it, but it looks like simply calling imbue is not enough. imbue is a method of basic_ios, which is an ancestor of cout and wcout. It does act on numeric conversions, but in all my tests it had no effect on the charset used for output.
By default, the locale used in a C++ (or C) program is the "C" locale, which knows nothing about Unicode. All printable ASCII characters (below 128) are output as is, and the others are replaced with a ?. That is exactly what your program does.
To make it work correctly, you have to select a locale that knows about Unicode characters with setlocale. Once this is done, you can change the numeric conversions by calling imbue, and as you have selected a Unicode charset all will be fine.
So provided your current locale uses a UTF-8 charset, you only have to add
setlocale(LC_ALL, "");
as the first line of your program, and the output will be as expected:
0: "Преступление"
1: "и"
2: "наказание"
I counted 3 words.
and the last word was "наказание"
If your current locale does not use UTF-8, choose one that is installed on your system and that supports it. I used setlocale(LC_ALL, "fr_FR.UTF-8");, or even setlocale(LC_ALL, "en_US.UTF-8");, and both worked.
Edit:
In fact, the best way to correctly output Unicode to the screen is to use setlocale(LC_ALL, "");. It automatically adapts to the current charset. I tested with a stripped-down variant using the Latin-1 charset (my system natively speaks French, not Russian ...):
#include <iostream>
#include <locale>
#include <clocale>

using namespace std;

int main() {
    setlocale(LC_ALL, "");
    wchar_t ws[] = { 0xe8, 0xe9, 0 };
    wcout << ws << endl;
}
I tried it under Linux using the UTF-8 and ISO-8859-1 (Latin-1) charsets (resp. export LANG=fr_FR.UTF-8 and export LANG=fr_FR.ISO-8859-1), and I correctly got èé in the proper charset. I also tried it under Windows XP, with code page 850 (OEM) and 1252 (ANSI) (resp. chcp 850 and chcp 1252 with the Lucida Console font), and got èé on the console too.
Edit 2:
Of course, you can also set a global C++ locale, with locale::global(locale("")); for the default locale or locale::global(locale("ru_RU.UTF-8")); for the Russian locale, but it does more than simply calling setlocale. According to the documentation of the GNU implementation of the C++ Standard Library about locales, there is "only one relation (of the C++ locale mechanism) to the C locale mechanism: the global C locale is modified if a named C++ locale object is set as the global locale", that is, std::locale::global(std::locale("")); affects the C functions as if the following call were made: std::setlocale(LC_ALL, "");. On the other hand, there is no vice versa; calling setlocale has no effect whatsoever on the C++ locale mechanism, in particular on the working of locale("").
So it really looks like there is an underlying C library mechanism that must first be enabled with setlocale for the imbue conversion to work correctly.
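A quick way to see that one-way relation is to ask the C runtime which locale it thinks is active before and after setting the global C++ locale. This is just a sketch; it assumes a glibc-style system where locale("") names something like en_US.UTF-8:
#include <clocale>
#include <iostream>
#include <locale>

int main()
{
    std::cout << setlocale(LC_ALL, nullptr) << "\n";   // "C" at program start
    std::locale::global(std::locale(""));              // named C++ locale: the C locale follows
    std::cout << setlocale(LC_ALL, nullptr) << "\n";   // e.g. "en_US.UTF-8" from the environment
}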
In this answer, I'm taking the questions in reverse order, and adding another (with answer) that came up along the way.
Is there a way to use imbue rather than setting the global locale to do what I want?
Yes. By default, std::wcout is synchronized with the underlying stdout C stream. std::wcout can therefore use imbue if that synchronization is turned off, allowing the C++ stream to operate independently. So, to modify the original code to use imbue and work as intended, only a single line needs to be added, calling std::ios_base::sync_with_stdio:
std::ios_base::sync_with_stdio(false);
std::wcout.imbue(ru);
Why didn't the original version work?
The standard (I'm referring to INCITS/ISO/IEC 14882-2011[2012]) says very little about the tie to the underlying stdio stream, but in 27.4.3 it says
The object wcout controls output to a stream buffer associated with the object stdout, declared in <cstdio>
Further, without explicitly setting a global locale, the locale is the "C" locale, which is US English ASCII, so this appears to imply that stdout will, by default, have an ASCII mapping. Since no Cyrillic characters are represented in ASCII, the underlying stdout is what converts the proper Russian into a series of ? characters.
Why must the sync_with_stdio call precede imbue?
According to 27.5.3.4 of the standard:
If any input or output operation has occurred using the standard streams prior to the call,
the effect is implementation-defined. Otherwise, called with a false argument, it allows the standard streams to operate independently of the standard C streams.
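Putting the two answers together, a trimmed-down sketch of the fixed program might look like this (the same ru_RU.utf8 locale as in the question, which must be installed; the word counting is reduced to the essentials):
#include <iostream>
#include <locale>
#include <sstream>
#include <string>

int main()
{
    std::ios_base::sync_with_stdio(false);      // must precede imbue, as discussed above
    std::locale ru("ru_RU.utf8");               // throws if the locale is not installed
    std::wcout.imbue(ru);

    std::wstringstream in{L"Преступление и наказание"};
    in.imbue(ru);

    std::wstring word;
    unsigned wordcount = 0;
    while (in >> word)
        std::wcout << wordcount++ << L": \"" << word << L"\"\n";

    std::wcout << L"I counted " << wordcount << L" words.\n";
}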
I don't know what languages you're planning on supporting, but there are languages where your algorithm doesn't apply, e.g. Japanese. I suggest checking out the word-boundary iterators in International Components for Unicode: http://userguide.icu-project.org/boundaryanalysis

What is the Windows equivalent for en_US.UTF-8 locale?

If I want to make the following work on Windows, what is the correct locale and how do I detect that it is actually present:
Does this code work universally, or is it just my system?
Although there isn't good support for named locales, Visual Studio 2010 does include the UTF-8 conversion facets required by C++11: std::codecvt_utf8 for UCS2 and std::codecvt_utf8_utf16 for UTF-16:
#include <fstream>
#include <iostream>
#include <string>
#include <locale>
#include <codecvt>

void prepare_file()
{
    // UTF-8 data
    char utf8[] = {'\x7a',                       // latin small letter 'z' U+007a
                   '\xe6','\xb0','\xb4',         // CJK ideograph "water" U+6c34
                   '\xf0','\x9d','\x84','\x8b'}; // musical sign segno U+1d10b
    std::ofstream fout("text.txt");
    fout.write(utf8, sizeof utf8);
}

void test_file_utf16()
{
    std::wifstream fin("text.txt");
    fin.imbue(std::locale(fin.getloc(), new std::codecvt_utf8_utf16<wchar_t>));
    std::cout << "Read from file using UTF-8/UTF-16 codecvt\n";
    for(wchar_t c; fin >> c; )
        std::cout << std::hex << std::showbase << c << '\n';
}

void test_file_ucs2()
{
    std::wifstream fin("text.txt");
    fin.imbue(std::locale(fin.getloc(), new std::codecvt_utf8<wchar_t>));
    std::cout << "Read from file using UTF-8/UCS2 codecvt\n";
    for(wchar_t c; fin >> c; )
        std::cout << std::hex << std::showbase << c << '\n';
}

int main()
{
    prepare_file();
    test_file_utf16();
    test_file_ucs2();
}
This outputs, on my Visual Studio 2010 EE SP1:
Read from file using UTF-8/UTF-16 codecvt
0x7a
0x6c34
0xd834
0xdd0b
Read from file using UTF-8/UCS2 codecvt
0x7a
0x6c34
0xd10b
Press any key to continue . . .
Basically, you are out of luck: http://www.siao2.com/2007/01/03/1392379.aspx
In the past UTF-8 (and some other code pages) wasn't allowed as the system locale because
Microsoft said that a UTF-8 locale might break some functions as they were written to assume multibyte encodings used no more than 2 bytes per character, thus code pages with more bytes such as UTF-8 (and also GB 18030, cp54936) could not be set as the locale.
https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8
However, Microsoft has gradually introduced UTF-8 locale support and has started recommending the ANSI (-A) APIs again, instead of the Unicode (-W) versions it recommended before:
Until recently, Windows has emphasized "Unicode" -W variants over -A APIs. However, recent releases have used the ANSI code page and -A APIs as a means to introduce UTF-8 support to apps. If the ANSI code page is configured for UTF-8, -A APIs operate in UTF-8. This model has the benefit of supporting existing code built with -A APIs without any code changes.
-A vs. -W APIs
First, starting with Windows 10 insider build 17035, they added a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox that sets the locale code page to UTF-8.
To open that dialog box, open the Start menu, type "region" and select Region settings > Additional date, time & regional settings > Change date, time, or number formats > Administrative.
After enabling it you can call setlocale as normal:
Starting in Windows 10 build 17134 (April 2018 Update), the Universal C Runtime supports using a UTF-8 code page. This means that char strings passed to C runtime functions will expect strings in the UTF-8 encoding. To enable UTF-8 mode, use "UTF-8" as the code page when using setlocale. For example, setlocale(LC_ALL, ".utf8") will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page.
UTF-8 Support
You can also use this in older Windows versions
To use this feature on an OS prior to Windows 10, such as Windows 7, you must use app-local deployment or link statically using version 17134 of the Windows SDK or later. For Windows 10 operating systems prior to 17134, only static linking is supported.
Later, in 2019, they added the ability for programs to use a UTF-8 code page without even setting the UTF-8 beta flag above. You can use the /execution-charset:utf-8 or /utf-8 options when compiling with MSVC, or set the ActiveCodePage property in the appxmanifest.
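To make that concrete, here is a minimal sketch of the UCRT UTF-8 mode quoted above; ".utf8" is the code-page suffix from the Microsoft documentation, and on a runtime without this support setlocale is expected to fail and return NULL:
#include <clocale>
#include <cstdio>

int main()
{
    const char *loc = setlocale(LC_ALL, ".utf8");   // current ANSI locale, UTF-8 code page
    if (loc == NULL) {
        puts("UTF-8 code page not supported by this C runtime");
        return 1;
    }
    // char strings passed to the CRT are now treated as UTF-8; whether the
    // console renders them correctly still depends on the console code page and font.
    printf("active locale: %s\n", loc);
    return 0;
}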
Per MSDN, it would be named "english_us.65001". But code page 65001 is somewhat flaky on Windows.

how do I set the proper initial locale for a C++ program on Windows?

I'm fairly new to localized programming, and I'm trying to figure out how to set the proper initial locale for a newly-launched unmanaged C++ application (from within the app).
As far as I can tell, new applications start with the C locale, rather than the proper regional locale (English, German, etc). So what I need to do is call setlocale( LC_ALL, "???" ), but I'm not sure how to get the correct value for the second argument. It will be something like "English" or "German:Germany" - basically whatever locale was set by the user via the Regional and Language Options control panel. Just to be clear, I'm not looking for how to format the locale string, I'm looking for the correct locale string for the computer where the app is running.
I'm guessing that there's some Win32 API that would give me this, or perhaps a registry key that would contain the proper value. Does anybody know what I should be doing?
setlocale() is C, not C++. I vaguely remember seeing interference between the two on VC6, but that was a bug. Normally, setlocale() affects the behavior of the C functions only.
In C++, localization is controlled by the std::locale class. By default, locale-sensitive operations use the global locale, which is obtained by default-constructing a locale object, and can be set with std::locale::global(const std::locale&).
Constructing a locale object with an empty string (std::locale("")) creates a locale corresponding to the program's environment.
At program startup, the global locale is the "C" or "Classic" locale. To set the global locale to the program's environment locale (which I guess is what you're asking), you thus write:
std::locale::global(std::locale(""));
For example, my regional settings are currently set to French(Canada). Running this:
#include <iostream>
#include <locale>

int main(void)
{
    std::cout << std::locale().name() << std::endl;
    std::locale::global(std::locale(""));
    std::cout << std::locale().name() << std::endl;
    std::locale::global(std::locale("C"));
    std::cout << std::locale().name() << std::endl;
    return 0;
}
prints:
C
French_Canada.1252
C
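One caveat worth adding as a sketch (not part of the original answer): streams that already exist, such as std::cout, keep the locale they were constructed with, so after changing the global locale you still need to imbue them if you want their formatting to follow the user's regional settings:
#include <iostream>
#include <locale>

int main()
{
    std::locale user("");             // the user's regional settings
    std::locale::global(user);        // new streams and the C functions now use it
    std::cout.imbue(user);            // existing streams must be imbued explicitly
    std::cout << 1234567 << "\n";     // grouped per the user's locale, e.g. "1 234 567" or "1,234,567"
}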