Stupid bug with fscanf - comma instead of point - C++

I have a stupid bug in one of the C++ source files for a project. This part of the source does I/O operations, and the bug shows up when I print the value read by fscanf.
First, I don't read the correct value; and when I print a float, I get a decimal value with a comma ',' instead of a point '.' between the integer part and the fractional part. Here is the relevant part:
FILE* file3;
file3=fopen("test.dat","r");
float test1;
fscanf(file3," %f ",&test1);
printf("here %f\n",test1);
float test3 = 1.2345;
printf("here %f\n",test3);
fclose(file3);
where the test.dat file contains "1.1234", and at execution I get:
here 1,000000
here 1,234500
So I wrote a simple C test program, compiled with g++:
#include <stdio.h>
#include <stdlib.h>
int main()
{
    FILE* file3;
    float test3;
    file3 = fopen("test.dat", "r");
    fscanf(file3, "%f", &test3);
    printf("here %f\n", test3);
    fclose(file3);
}
and it gives:
here 1.123400
This is the first time I've seen a bug like this. Can anyone see what's wrong?

Is your C++ locale somehow set to use a continental European convention? Those use commas where we use points, and points as thousands separators.
Have a look at the settings of the environment variables
LANG
LC_CTYPE
LC_ALL
Try setting en_GB or en_US. Having established that it is a locale problem, next decide what behaviour makes sense. Is displaying 1224,45 a bug at all? The user has that locale set for a reason.
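To see the effect in isolation, here is a minimal sketch that reproduces the symptom on purpose. It assumes a comma-decimal locale such as fr_FR.UTF-8 is installed; the locale name is an assumption and is platform-specific.

#include <clocale>
#include <cstdio>

int main() {
    float f = 0.0f;

    // Default "C" locale: the point is the decimal separator.
    sscanf("1.1234", "%f", &f);
    printf("C locale:      %f\n", f);        // here 1.123400

    // Switch the numeric category to a comma-decimal locale
    // (assumed installed; e.g. "French_France.1252" on Windows).
    if (setlocale(LC_NUMERIC, "fr_FR.UTF-8") != NULL) {
        sscanf("1.1234", "%f", &f);          // parsing stops at the '.'
        printf("French locale: %f\n", f);    // here 1,000000
    }
}

This shows both symptoms at once: the wrong value read (1 instead of 1.1234) and the comma in the printed output.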

You're using code that uses the locale set in the program's environment. In some locales, such as French-speaking ones, the comma is the decimal separator. So this code is doing what its locale is presumably telling it to do.
In your simple test program you have not initialised locale support, so this does not happen.
Assuming a Unix-like environment, what is the value of the environment variable LANG, and the various LC_* environment variables?
env | grep -e ^LANG -e ^LC_
For some background reading, try some of the GNU Libc manual (Locales and Internationalisation)
http://www.gnu.org/software/libc/manual/html_node/Locales.html#Locales
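If your data files are machine-generated and always use the point, a hedged workaround is to force the numeric category back to "C" around the parsing code and restore the user's setting afterwards. A sketch (strdup is POSIX; the file name test.dat is taken from the question):

#include <clocale>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    // Save the current numeric locale; the returned string may be
    // invalidated by the next setlocale call, so copy it first.
    char *saved = strdup(setlocale(LC_NUMERIC, NULL));

    setlocale(LC_NUMERIC, "C");            // machine files use '.'
    FILE *file3 = fopen("test.dat", "r");
    if (file3 != NULL) {
        float test1 = 0.0f;
        if (fscanf(file3, "%f", &test1) == 1)
            printf("parsed %f\n", test1);  // printed with '.' while "C" is active
        fclose(file3);
    }

    setlocale(LC_NUMERIC, saved);          // restore the user's convention
    free(saved);
}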

My guess is that the application is setting the locale to the user's preference, with std::locale::global( std::locale( "" ) ). This is what console applications should always do; they should also imbue std::cin, std::cout and std::cerr with this locale. Most linguistic communities use the comma for the decimal, rather than the point. (For daemon processes, it is often more appropriate to use the "C" locale, or the "POSIX" locale under Unix, regardless of where the server is actually running. Since the "C" locale is the default, doing nothing is generally fine for daemons and servers.)
The global locale affects all C-style input and output, which is yet another reason to use C++'s iostreams instead. With std::ifstream, just imbue the stream with the locale in which the file was written before doing your first input. For machine-generated files, the "C" locale is the usual default, so your code would become something like:
std::ifstream file3( "test.dat" );
if ( ! file3.is_open() ) {
    // error handling...
}
file3.imbue( std::locale( "C" ) );
float test1;
file3 >> test1;
// ...
For output to the terminal, expect the locale conventions to be followed, and set the environment variables to specify the locale you want to see.
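A sketch of that recommended console-application setup, assuming the user's environment names a valid locale (std::locale("") throws otherwise):

#include <iostream>
#include <locale>

int main() {
    // Adopt the user's preferred locale globally (this also affects C I/O)...
    std::locale::global(std::locale(""));
    // ...and imbue the standard streams explicitly, since they were
    // constructed before the global locale was changed.
    std::cin.imbue(std::locale());
    std::cout.imbue(std::locale());
    std::cerr.imbue(std::locale());

    std::cout << 1234.5 << '\n';  // "1234,5" under fr_FR, "1,234.5" under en_US
}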

Related

C++ decimal points

I have this function:
#include <cstdlib>
#include <clocale>
double SimplifiedExample () {
    setlocale (LC_ALL, "");
    char txt [] = "3.14";
    for (int x = 0; txt [x]; x++)
        if (txt [x] == '.')
            txt [x] = localeconv ()->decimal_point [0];
    char *endptr;
    return strtod (txt, &endptr);
}
This works on every Windows 10 and Linux system I've tried it on. It fails on at least one Windows 7 machine (which reports "," for localeconv ()->decimal_point [0], but uses "." in both sprintf and strtod). Is the use of localeconv to supply the system decimal point to strtod correct, or not?
Note that I can't remove the setlocale() call. The application MUST respect locale settings.
Have you tried "man strtod"? I have Slackware Linux, and the strtod man page mentions that the decimal point depends on the locale, and says it lives in stdlib.h.
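To see what the man page describes, a small sketch (assuming a comma-decimal locale such as de_DE.UTF-8 is installed; any comma-decimal locale behaves the same):

#include <clocale>
#include <cstdio>
#include <cstdlib>

int main() {
    // strtod uses the decimal point of the current LC_NUMERIC category.
    printf("C locale:  %f\n", strtod("3.14", NULL));       // 3.140000

    if (setlocale(LC_NUMERIC, "de_DE.UTF-8") != NULL) {
        // Note: printf now also prints the comma.
        printf("de locale: %f\n", strtod("3.14", NULL));   // stops at '.': 3,000000
        printf("de locale: %f\n", strtod("3,14", NULL));   // comma accepted: 3,140000
    }
}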
You're falling into a bog of implementation-defined behaviour.
The problem might be that setlocale on Windows doesn't behave as the standards propose, because it is a library wrapper around the OS's idiosyncrasies; e.g. UTF-8 is supported by the C++ runtime only on Windows 10. Moreover, some C++ library implementations of sprintf/printf/strtod respect only std::locale::global. Furthermore, some runtimes do not support anything but "C" and "", and format output only according to ANSI or IEEE-754 conventions (hence a dot). Old Windows 10 builds and older versions do not even respect IEEE-754.
E.g., an arbitrary MinGW64 on Windows 7 would respond to this code:
auto old_locale = std::locale::global(std::locale(""));
cout << old_locale.name() << "\n";
cout << std::locale("").name() << "\n";
by reporting the ANSI locale
C
C
regardless of the system locale, and it doesn't recognize the OEM locale ".OCP" or any installed locale name. This behaviour is hard-coded in the library linked with the application, not in some system-wide one.
Note that you can check the value returned by setlocale:
auto loc = setlocale(LC_ALL, "");
if (loc == NULL) {
    printf ("setlocale failed!\n");
}
else {
    printf ("setlocale %s!\n", loc);
}
And in my case it returned... "English_United States.1252" when I have a US locale set for my user, and "Russian_Russia.1251" for a Russian user locale; this doesn't match the more usual "en-US" format. Windows uses underscores.
There would be another problem to solve when interfacing with a database service that runs as a different user. Existing databases already implement support for defining which locale a particular database uses.
Moreover, the output of all library functions didn't change when the locale changed.
Developers of multi-platform applications are urged to use proper library packages for formatted input/output, which would eliminate the struggle with platform- and implementation-specific behaviour.
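One portable technique, sketched below under the assumption that the input format is fixed to the point-decimal convention: parse through a C++ stream imbued with the classic "C" locale, which every implementation is required to support. The helper name parse_double is hypothetical.

#include <iostream>
#include <locale>
#include <sstream>
#include <string>

// Parse a point-decimal number identically on every platform and
// under every user locale, by imbuing the classic "C" locale.
bool parse_double(const std::string &text, double &out) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    return static_cast<bool>(in >> out);
}

int main() {
    double d = 0.0;
    if (parse_double("3.14", d))
        std::cout << "parsed: " << d << '\n';  // parses even under a comma locale
}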

How to get output in spoken languages other than English?

I want to make a calculator in a spoken language other than English, e.g. Hindi. I tried to make one using cout, but it gave me unexpected results. Please tell me what I should do.
Code Sample
cout << name << ", " << "कृपया ऑपरेटर दर्ज करें--> ";
cin >> op;
switch (op)
{
case '+':
    cout << name << ", " << "कृपया पहला नंबर दर्ज करें--> ";
    cin >> x;
    cout << name << ", " << "कृपया दूसरा नंबर दर्ज करें--> ";
    cin >> y;
Result:
कृपया ऑपरेटर दर्ज करें--> +
कृपया पहला नंबर दर्ज करें-->
कृपया दूसरा नंबर दर्ज करें-->
I am not familiar with the Hindi fonts used in the console, but you need to do the following things:
Set the correct code page using std::locale, OR use Unicode (for example std::wstring and std::wcout).
Set your console to a font that can display those characters.
I would suggest you find the character set for Hindi, the way English has ASCII (in the ISO 646 series).
You can find more info about wstring and wcout in a post by a fellow user. Remember to save the source file as UTF-8 (or in the correct locale), otherwise the characters will simply 'disappear' from the code.
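On Windows specifically, one hedged option for the Unicode route is to switch stdout into UTF-16 mode before the first use of std::wcout; a sketch for MSVC (the console font must still support the characters):

#include <fcntl.h>    // _O_U16TEXT
#include <io.h>       // _setmode
#include <cstdio>     // _fileno
#include <iostream>

int main() {
    // Put stdout into UTF-16 mode so wide characters reach the console
    // intact. After this, use only wcout; mixing in cout breaks the mode.
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"कृपया ऑपरेटर दर्ज करें--> " << std::endl;
}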
To display output in Hindi or any non-English language, we first need to set the console output encoding to UTF-8:
#include <string>
#include <iostream>
#include <Windows.h>
int main() {
    std::string test = u8"Greek: αβγδ; German: Übergrößenträger";
    SetConsoleOutputCP(65001);  // 65001 is the UTF-8 code page
    std::cout << test;
}
You can use the setlocale function in C++. Include the locale.h header if you need more C-specific functionality; it also lets you play with currency, language, and so on.
Example code:
#include <locale.h>

int main() {
    setlocale(LC_ALL, "en_US.utf8");
    // Your code
    return 0;
}
Explanation: the setlocale function sets the locale information used by the current program, rather than changing the whole system locale. setlocale accepts two parameters: 1. a category (LC_ALL); 2. a locale name ("en_US.utf8"). Locale names can differ between systems.
Reference: setlocale in c++ (reference)
This function helped me a lot in my projects.

How can I use std::imbue to set the locale for std::wcout?

I am trying to use the std::locale mechanism in C++11 to count words in different languages. Specifically, I have a std::wstringstream which contains the title of a famous Russian novel ("Crime and Punishment" in English). What I want to do is to use the appropriate locale (ru_RU.utf8 on my Linux machine) to read the stringstream, count the words and print the results. I should also probably note that my system is set to use the en_US.utf8 locale.
The desired result is this:
0: "Преступление"
1: "и"
2: "наказание"
I counted 3 words.
and the last word was "наказание"
That all works when I set the global locale, but not when I attempt to imbue the wcout stream. When I try that, I get this result instead:
0: "????????????"
1: "?"
2: "?????????"
I counted 3 words.
and the last word was "?????????"
Also, when I attempt to use a solution suggested in the comments (which can be activated by changing #define USE_CODECVT 0 to #define USE_CODECVT 1), I get the error mentioned in this other question.
Those interested in experimenting with the code, or with compiler settings or both may wish to use this live code.
My questions
Why does that not work? Is it because wcout is already open?
Is there way to use imbue rather than setting the global locale to do what I want?
If it makes a difference, I'm using g++ 4.8.3. The full code is shown below.
getwords.cpp
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <locale>
#define USE_CODECVT 0
#define USE_IMBUE 1
#if USE_CODECVT
#include <codecvt>
#endif
using namespace std;
int main()
{
#if USE_CODECVT
    locale ru("ru_RU.utf8",
              new codecvt_utf8<wchar_t, 0x10ffff, consume_header>{});
#else
    locale ru("ru_RU.utf8");
#endif

#if USE_IMBUE
    wcout.imbue(ru);
#else
    locale::global(ru);
#endif

    wstringstream in{L"Преступление и наказание"};
    in.imbue(ru);
    wstring word;
    unsigned wordcount = 0;
    while (in >> word) {
        wcout << wordcount << ": \"" << word << "\"\n";
        ++wordcount;
    }
    wcout << "\nI counted " << wordcount << " words.\n"
          << "and the last word was \"" << word << "\"\n";
}
First I did some more tests using your code, and I can confirm that L"Преступление и наказание" is a correct UTF-16 string. I checked the codes of the individual characters, and they are correctly 0x41f, 0x440, 0x435, 0x441, 0x442, 0x443, 0x43f, 0x43b, 0x435, 0x43d, 0x438, 0x435, 0x20, 0x438, 0x20, 0x43d, 0x430, 0x43a, 0x430, 0x437, 0x430, 0x43d, 0x438, 0x435.
I could not find any reference about it, but it looks like simply calling imbue is not enough. imbue is a method of basic_ios, which is an ancestor of cout and wcout. It does act on numeric conversions, but in all my tests it had no effect on the charset used for output.
By default, the locale used in a C++ (or C) program is ... the C locale, which knows nothing about Unicode. All printable ASCII characters (below 128) are output as is, and the others are replaced with a ?. That is exactly what your program does.
To make it work correctly, you have to select a locale that knows about Unicode characters with setlocale. Once this is done, you can change the numeric conversion by calling imbue, and since you selected a Unicode charset all will be fine.
So provided your current locale uses a UTF-8 charset, you only have to add
setlocale(LC_ALL, "");
as the first line of your program, and the output will be as expected:
0: "Преступление"
1: "и"
2: "наказание"
I counted 3 words.
and the last word was "наказание"
If your current locale does not use UTF-8, choose one that is installed on your system and that supports it. I used setlocale(LC_ALL, "fr_FR.UTF-8");, or even setlocale(LC_ALL, "en_US.UTF-8");, and both worked.
Edit:
In fact, the best way to correctly output Unicode to the screen is to use setlocale(LC_ALL, "");. It automatically adapts to the current charset. I tested with a stripped-down variant using the Latin-1 charset (my system natively speaks French, not Russian ...):
#include <iostream>
#include <locale>
using namespace std;
int main() {
    setlocale(LC_ALL, "");
    wchar_t ws[] = { 0xe8, 0xe9, 0 };  // 'è', 'é' in Latin-1 / Unicode
    wcout << ws << endl;
}
I tried it under Linux using the UTF-8 charset and ISO-8859-1 (Latin-1) (resp. export LANG=fr_FR.UTF-8 and export LANG=fr_FR.ISO-8859-1), and I correctly got èé in the proper charset. I also tried it under Windows XP, with code page 850 (OEM) and 1252 (ANSI) (resp. chcp 850 and chcp 1252 with the Lucida Console font), and got èé on the console too.
Edit 2:
Of course, you can also set a global C++ locale with locale::global(locale("")); for the default locale, or locale::global(locale("ru_RU.UTF-8")); for the Russian locale, but this does more than simply calling setlocale. According to the documentation of the GNU implementation of the C++ Standard Library, there is only one relation of the C++ locale mechanism to the C locale mechanism: the global C locale is modified if a named C++ locale object is set as the global locale. That is, std::locale::global(std::locale("")); affects the C functions as if the following call had been made: std::setlocale(LC_ALL, "");. There is no vice versa: calling setlocale has no effect whatsoever on the C++ locale mechanism, in particular on the working of locale("").
So it really looks like there is an underlying C library mechanism that must first be enabled with setlocale for the imbue conversion to work correctly.
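A quick sketch of that one-way relationship (the names printed depend on your environment):

#include <clocale>
#include <cstdio>
#include <locale>

int main() {
    std::printf("before: %s\n", std::setlocale(LC_ALL, NULL));  // "C"
    // Setting a *named* C++ locale globally also changes the C locale...
    std::locale::global(std::locale(""));
    std::printf("after:  %s\n", std::setlocale(LC_ALL, NULL));  // e.g. "en_US.UTF-8"
    // ...whereas a plain setlocale() call would leave std::locale() untouched.
}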
In this answer, I'm taking the questions in reverse order, and adding another (with answer) that came up along the way.
Is there way to use imbue rather than setting the global locale to do what I want?
Yes. By default, std::wcout is synchronized to the underlying stdout C stream. So std::wcout can use imbue if that synchronization is turned off, allowing the C++ stream to operate independently. So to modify the original code to use imbue and work as intended only a single line need be added, calling std::ios_base::sync_with_stdio:
std::ios_base::sync_with_stdio(false);
std::wcout.imbue(ru);
Why didn't the original version work?
The standard (I'm referring to INCITS/ISO/IEC 14882-2011[2012]) says very little about the tie to the underlying stdio stream, but in 27.4.3 it says
The object wcout controls output to a stream buffer associated with the object stdout, declared in <cstdio>
Further, without explicitly setting a global locale, the locale is the "C" locale which is US English ASCII, so this appears to imply that stdout will, by default, have an ASCII mapping. Since no Cyrillic characters are represented in ASCII, the underlying stdout is what converts the proper Russian into a series of ? characters.
Why must the sync_with_stdio call precede imbue?
According to 27.5.3.4 of the standard:
If any input or output operation has occurred using the standard streams prior to the call,
the effect is implementation-defined. Otherwise, called with a false argument, it allows the standard streams to operate independently of the standard C streams.
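Putting it together, a minimal version of the fixed program, under the same assumptions as the question (ru_RU.utf8 installed, UTF-8 terminal):

#include <iostream>
#include <locale>
#include <sstream>
#include <string>

int main() {
    // Detach the C++ streams from stdio first, then imbue; order matters.
    std::ios_base::sync_with_stdio(false);
    std::locale ru("ru_RU.utf8");
    std::wcout.imbue(ru);

    std::wstringstream in{L"Преступление и наказание"};
    in.imbue(ru);

    std::wstring word;
    unsigned wordcount = 0;
    while (in >> word) {
        std::wcout << wordcount << L": \"" << word << L"\"\n";
        ++wordcount;
    }
    std::wcout << L"\nI counted " << wordcount << L" words.\n";
}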
I don't know what languages you're planning to support, but there are languages where your algorithm doesn't apply, e.g. Japanese. I suggest checking out the word iterators in International Components for Unicode: http://userguide.icu-project.org/boundaryanalysis
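For reference, a sketch of ICU's word boundary analysis (this assumes ICU is installed and linked, e.g. with -licuuc; API names are as in the ICU user guide; save the source as UTF-8):

#include <unicode/brkiter.h>
#include <unicode/unistr.h>
#include <iostream>
#include <memory>
#include <string>

int main() {
    UErrorCode status = U_ZERO_ERROR;
    std::unique_ptr<icu::BreakIterator> it(
        icu::BreakIterator::createWordInstance(icu::Locale::getJapanese(), status));
    if (U_FAILURE(status)) return 1;

    // Japanese writes no spaces between words; ICU segments it anyway.
    icu::UnicodeString text = icu::UnicodeString::fromUTF8("私は学生です");
    it->setText(text);

    for (int32_t start = it->first(), end = it->next();
         end != icu::BreakIterator::DONE;
         start = end, end = it->next()) {
        std::string word;
        text.tempSubStringBetween(start, end).toUTF8String(word);
        std::cout << word << '\n';   // 私 / は / 学生 / です
    }
}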

Using long strings with wcout, a simple "Hello World" doesn't work! (How do I change the way wcout is opened, BINARY or TEXT?)

Essentially, the "Hello World" code below, a little bit modified, should be made to work correctly:
#include <iostream>
#include <fstream>
using namespace std;
int main() {
    wcout << L"GoodMorning Καλημέρα" << endl;
    return 0;
}
Do I ask too much? That is the short story!
The long one is: my platform is Ubuntu and g++.
As I recently realized, streams may be opened in binary or text mode.
When we use wcout to print a wstring, the mode should be text mode, because the wide character string (essentially UTF-32) must be converted during the transfer from the stream to the "file" in order to match the code page of the terminal (here that "code page" is UTF-8, but it could be any one, e.g. ISO 8859-7 in this specific case).
So when we use:
std::wcout<<L"καλημέρα"<<std::endl;
if the mode of wcout is text mode, the string gets converted from a UTF-32 (Linux, under conditions) or UCS-2 (Windows) string into the code page of the terminal (UTF-8 on Linux), according to std::locale::global or to the imbue used on the stream.
So in order to work with wcout, the mode should be TEXT, not BINARY.
But my system opens it in BINARY, not TEXT, so the text isn't shown correctly.
How do I change the way wcout is opened?
Text vs. binary mode only controls whether a newline character '\n' gets converted to an end-of-line sequence on systems where text files conventionally have an end-of-line sequence. On POSIX systems text vs. binary mode has no effect. That said, there is no standard way to control text vs. binary mode. On POSIX systems you can probably just replace file descriptor 1. However, doing so will not change the encoding being used at all.
When you feel you need to play tricks with encodings, you'll pretty much only get support when using file streams. If you want to deal with encodings for other streams, you'll need to implement your own stream buffer doing the conversions.
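One ready-made such stream buffer in C++11 is std::wbuffer_convert (deprecated since C++17 but widely available); a sketch assuming a UTF-8 terminal:

#include <codecvt>    // std::codecvt_utf8 (C++11; deprecated in C++17)
#include <iostream>
#include <locale>

int main() {
    // Wrap cout's narrow buffer in a UTF-8 converting wide buffer, then
    // build an independent wide stream on top of it; wcout itself and
    // its text/binary behaviour are left alone.
    std::wbuffer_convert<std::codecvt_utf8<wchar_t>> conv(std::cout.rdbuf());
    std::wostream wideout(&conv);

    wideout << L"GoodMorning Καλημέρα" << L'\n';
}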
This works nicely for me, using g++ 4.8.1 on some old Ubuntu:
#include <iostream>
#include <fstream>
#include <locale.h>
using namespace std;
int main()
{
    setlocale( LC_ALL, "" );
    wcout << L"GoodMorning Καλημέρα" << endl;
}
On Windows you instead need to call _setmode (and, if using g++, to -U the no-gcc-extensions-please macro symbol).
As Dietmar has already noted, C stream text mode has nothing to do with this.
I pinned down the problem; the output was used inside main as follows:
int main() {
    std::locale::global(std::locale("en_US.utf-8"));
    std::wcout << L"Ελληνικά" << std::endl;
    // ... (rest of the program)
}
Although the wide print is among the first instructions, the output was not correct.
I supposed that maybe something inside the program (perhaps initialization routines) accessed cout or wcout and thereby set it to binary mode.
I changed the includes as follows:
#include <string>
#include <locale>
#include <iostream>
struct initLocale {
    initLocale() {
        std::locale::global(std::locale("en_US.utf-8"));
        std::wcout << L"Ελληνικά" << std::endl;
    }
} theInitLocale;  // dummy global (name is arbitrary) whose constructor touches wcout first
The point is to create a dummy variable right after the includes, whose only purpose is to output something to wcout in order to be the first to use it.
With that change the output is correct; as soon as I comment out the line std::wcout<<L"Ελληνικά"<<std::endl; the output is wrong again.
You may also change std::wcout<<L"Ελληνικά"<<std::endl; into std::wcout<<std::endl; and it still works.
While I have solved my problem, this behaviour should be documented somewhere, and in any case a way to change the binary or text behaviour of wcout should still be found.

stdout and stderr character encoding

I'm working on a C++ string library that has 4 main classes dealing with ASCII, UTF-8, UTF-16, and UTF-32 strings. Every class has a Print function that formats an input string and prints the result to stdout or stderr. My problem is that I don't know the default character encoding for those streams.
For now my classes work on Windows; later I'll add support for Mac and Linux, so if you know anything about those streams' encodings I'd appreciate it.
So my question is: what is the default encoding for stdout and stderr? Can I change that encoding later, and if so, what happens to data already written there?
Thank you.
stdout and stderr use the "C" locale. The "C" locale is neutral, and on most systems is translated into the current user's locale. You can force the program to use a specific locale with the setlocale function:
// Set all categories and return "English_USA.1252"
setlocale( LC_ALL, "English" );
// Set only the LC_MONETARY category and return "French_France.1252"
setlocale( LC_MONETARY, "French" );
setlocale( LC_ALL, NULL );
The locale strings supported are system and compiler specific. Only "C" and "" are required to be supported.
http://www.cplusplus.com/reference/clibrary/clocale/
You might take a look at this SO answer (the most upvoted one). This is not exactly your question, but it is surely related and gives a lot of useful information.
I'm no expert here, but I guess we can assume you should use std::cout whenever you use std::string and std::wcout whenever you use std::wstring.