Problem:
I have a Qt-based multiplatform (Windows, Mac, *nix) application that parses ASCII files containing decimal numbers.
Parsing is done by a variety of code pieces that use anything from Qt string functions to C++ stream input to old-style scanf.
The ASCII files always use '.' (dot) as the decimal separator (e.g. in the file to be parsed, 1/10 is written 0.1, as is standard in many countries).
People running the application on an OS localized to use a decimal comma encounter a lot of problems (e.g. for French users scanf expects 0,1 as the textual representation of 1/10, and when it finds 0.1 it parses it as 0).
How can I be sure that the OS locale's convention for writing the decimal point is always ignored?
Is it safe to assume that adding
QLocale::setDefault(QLocale(QLocale::English,QLocale::UnitedStates));
is enough to get rid of all these problems?
Any suggestion for portable ways of setting the locale globally?
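Yes, for Qt's own conversions this should be sufficient: anywhere a QLocale is created without arguments, it will use US English, and the scope is application-wide (QApplication). Note that it does not change what scanf or C++ streams do; those follow the C and C++ locales (setlocale and std::locale::global), not QLocale.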
Related
I am using two functions, sprintf and snprintf, to convert a "double" to a string.
In one case, the running application has a different locale than the Windows locale. In that scenario, sprintf always uses the application's locale, whereas snprintf sometimes starts using the Windows locale. As a consequence, the decimal characters produced by the two functions differ, which causes a problem.
To provide further details,
I have a library in my project that builds a string from a "double"; this library uses snprintf for the conversion. I then need to send this information to a server that understands only "." (dot) as the decimal symbol. Hence, I need to replace the local decimal character with a "." (dot). To find out the local decimal character (in order to replace it), I am using another library in my project, which uses sprintf. Then I replace this character with a dot to get the final output.
Also, please note, sprintf is always considering locale of native application while snprintf sometimes considers locale of Windows.
As the problem is inconsistent, sorry for not providing a clear example.
So, what are the circumstances under which snprintf might behave differently?
Why am I getting such different behavior from these two methods?
How can I avoid it?
P.S. - I have to use these 2 methods, so please suggest a solution which would not require me to use any different methods.
Thanks.
The locale used by both sprintf and snprintf is not the Windows locale, but your application locale. As this locale is global to your application, any line of code in your program can change it.
In your case, the (not thread safe) solution may be to temporarily replace the locale for the snprintf call:
auto old = std::locale::global(std::locale::classic());
snprintf(...);
std::locale::global(old);
BTW, the "Windows locale" can be accessed via just std::locale("") , you don't need to know its exact name.
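The temporary-switch idea can be wrapped in a small scope guard so that the previous setting is restored on every exit path. The following is only a sketch under assumptions: NumericLocaleGuard and format_with_dot are hypothetical names, and the guard saves and restores the C-level LC_NUMERIC setting with std::setlocale rather than std::locale::global, on the assumption that this is the setting snprintf consults. Like the snippet above, it is not thread safe.

```cpp
#include <cassert>
#include <clocale>
#include <cstdio>
#include <string>

// Hypothetical helper: pins LC_NUMERIC to "C" for the lifetime of the object.
// Not thread safe, because setlocale changes process-wide state.
class NumericLocaleGuard {
public:
    NumericLocaleGuard() {
        // setlocale(category, nullptr) only queries the current setting.
        // Copy the result: the buffer may be overwritten by later calls.
        const char* current = std::setlocale(LC_NUMERIC, nullptr);
        if (current != nullptr) saved_ = current;
        std::setlocale(LC_NUMERIC, "C");
    }
    ~NumericLocaleGuard() {
        if (!saved_.empty()) std::setlocale(LC_NUMERIC, saved_.c_str());
    }
    NumericLocaleGuard(const NumericLocaleGuard&) = delete;
    NumericLocaleGuard& operator=(const NumericLocaleGuard&) = delete;

private:
    std::string saved_;
};

// Formats a double with '.' as the decimal point, whatever the
// application's numeric locale is at the time of the call.
std::string format_with_dot(double value) {
    NumericLocaleGuard guard;
    char buffer[64];
    int written = std::snprintf(buffer, sizeof buffer, "%.17g", value);
    assert(written > 0 && written < static_cast<int>(sizeof buffer));
    return std::string(buffer, static_cast<std::size_t>(written));
}
```

With this in place, a call such as format_with_dot(0.5) produces a dot-separated string and leaves the application's numeric locale as it found it once the function returns.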
What information does the Standard library of C++ use when parsing a (float) number?
Here are the ways I know to parse a (single) float number with standard C++:
double atof( const char *str )
sscanf
double strtod( const char* str, char** str_end );
istringstream, via operator>> or
via num_get directly
It seems obvious, that at the very least, we have to know what character is used as decimal separator.
iostreams, in particular num_get::get, in addition also talk about:
ios_base I/O format flags - Is there any information here that is used when parsing floating point?
the thousands_separator (* see below)
On the other hand, in std::strtod, which seems to be what sscanf is defined in terms of (which in turn is referenced by num_get), there the only variable information seems to be what is considered a space and the decimal character, although it doesn't seem to be specified where that is defined. (At least neither on cppref nor on MSDN.)
So, what information is actually used, and what comprises a valid parseable float representation for the C++ Standard lib?
From what I see, only the decimal separator from the global (C or C++?) locale is needed and, in addition, if the number contains a thousands separator, I would expect it to be parsed correctly only by num_get, since strtod/sscanf do not support the thousands separator.
(*) The group (thousands) separator is an interesting case to me. As far as I can tell, the "C" functions do not make any reference to it, and last time I checked, the C and C++ standard printf functions will never write it. So is it really processed by the strtod/scanf functions? (I know that there is a POSIX printf extension for the group separator, but that's not really standard, and notably missing from Microsoft's implementation.)
The C11 spec for strtod() seems to have an opening big enough for any size of truck to drive through. It appears so open-ended that I see no limitation.
§7.22.1.3 6 In other than the "C" locale, additional locale-specific subject sequence forms may be accepted.
For non- "standard C" locales, the isspace(), decimal (radix) point, group separator, digits per group and sign seem to constitute the typical variants. But apparently there is no limit.
For fun, I experimented with 500+ locales using printf(), sscanf(), strftime() and isspace().
All tested locales had a radix (decimal) point of '.' or ',', the same +/- sign, no digit grouping, and the expected 0-9.
strftime(... "%Y" ...) did not use a digit separator over years 1000-99999.
sscanf("1,234.5", "%lf", .. and sscanf("1.234,5", "%lf", .. did not produce 1234.5 in any locale.
All int values in the range 0 to 255 produced the same isspace() results, with the occasional exception of 154 and 160.
Of course these tests do not prove a limit to what may occur, but they do represent a sample of possibilities.
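To make the difference between the two parsing paths concrete, here is a small sketch that does not depend on any installed system locale. GroupedNumpunct, parse_grouped and parse_strtod are hypothetical names introduced for illustration; the facet simply declares ',' as the group separator with groups of three digits.

```cpp
#include <cassert>
#include <cstdlib>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: '.' as decimal point, ',' between groups of 3 digits.
class GroupedNumpunct : public std::numpunct<char> {
protected:
    char do_decimal_point() const override { return '.'; }
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Parses through the stream's num_get facet with grouping enabled.
// Returns false if the extraction failed.
bool parse_grouped(const std::string& text, double& out) {
    std::istringstream in(text);
    // The std::locale object takes ownership of the facet.
    in.imbue(std::locale(std::locale::classic(), new GroupedNumpunct));
    in >> out;
    return !in.fail();
}

// Parses through strtod and reports how many characters were consumed.
double parse_strtod(const char* text, std::size_t& consumed) {
    assert(text != nullptr);
    char* end = nullptr;
    double value = std::strtod(text, &end);
    consumed = static_cast<std::size_t>(end - text);
    return value;
}
```

In the default "C" locale, parse_strtod("1,234.5", n) stops at the comma, so it returns 1 with n equal to 1, while parse_grouped("1,234.5", v) accepts the separator and yields 1234.5.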
Let's say I have to read a file, containing a bunch of floating-point numbers. The numbers can be like 1e+10, 5, -0.15 etc., i.e., any generic floating-point number, using decimal points (this is fixed!). However, my code is a plugin for another application, and I have no control over what's the current locale. It may be Russian, for example, and the LC_NUMERIC rules there call for a decimal comma to be used. Thus, Pi is expected to be spelled as "3,1415...", and
sscanf("3.14", "%f", &x);
returns "1", and x contains "3.0", since it refuses to parse past the '.' in the string.
I need to ignore the locale for such number-parsing tasks.
How does one do that?
I could write a parseFloat function, but this seems like a waste.
I could also save the current locale, reset it temporarily to "C", read the file, and restore the saved one. What are the performance implications of this? Could setlocale() be very slow on some OS/libc combination? What does it really do under the hood?
Yet another way would be to use iostreams, but again their performance isn't stellar.
My personal preference is to never use LC_NUMERIC, i.e. just call setlocale with other categories, or, after calling setlocale with LC_ALL, use setlocale(LC_NUMERIC, "C");. Otherwise, you're completely out of luck if you want to use the standard library for printing or parsing numbers in a standard form for interchange.
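A minimal sketch of that startup sequence, assuming the program owns locale initialization (which a plugin may not). The function names init_locale_keep_c_numeric and scan_double are hypothetical; the %n conversion is used only to detect input that was not consumed completely.

```cpp
#include <cassert>
#include <clocale>
#include <cstdio>

// Adopt the user's environment for every category, then pin LC_NUMERIC
// back to "C" so numeric parsing and printing keep using '.'.
void init_locale_keep_c_numeric() {
    std::setlocale(LC_ALL, "");
    std::setlocale(LC_NUMERIC, "C");
}

// Returns true only if the whole string was read as one double.
bool scan_double(const char* text, double* out) {
    assert(text != nullptr && out != nullptr);
    int consumed = 0;
    if (std::sscanf(text, "%lf%n", out, &consumed) != 1) return false;
    return text[consumed] == '\0';
}
```

After init_locale_keep_c_numeric(), scan_double("3.14", &x) reads the full value even when the environment's numeric convention uses a decimal comma, because only the other categories follow the environment.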
If you're lucky enough to be on a POSIX 2008 conforming system, you can use the uselocale and *_l family of functions to make the situation somewhat better. There are at least 2 basic approaches:
Leave the default locale unset (at least the troublesome parts like LC_NUMERIC; LC_CTYPE should probably always be set), and pass a locale_t object for the user's locale to the appropriate *_l functions only when you want to present things to the user in a way that meets their own cultural expectations; otherwise use the default C locale.
Have your code that needs to work with data for interchange keep around a locale_t object for the C locale, and either switch back and forth using uselocale when you need to work with data in a standard form for interchange, or use the appropriate *_l functions (but there is no scanf_l).
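The second approach could look roughly like the sketch below, assuming a POSIX.1-2008 system where newlocale, uselocale and freelocale are declared in <locale.h> (some platforms need a different header). parse_in_c_locale is a hypothetical name. For brevity the sketch creates and frees the locale_t on every call; code following the advice above would create it once and keep it around.

```cpp
#include <cassert>
#include <cstdlib>
#include <locale.h>  // newlocale, uselocale, freelocale (POSIX.1-2008)

// Parses text with strtod while the calling thread temporarily uses the
// "C" locale. uselocale affects only the calling thread, unlike setlocale.
bool parse_in_c_locale(const char* text, double* out) {
    assert(text != nullptr && out != nullptr);
    locale_t c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_locale == (locale_t)0) return false;

    locale_t previous = uselocale(c_locale);  // switch this thread to "C"
    char* end = nullptr;
    *out = std::strtod(text, &end);
    uselocale(previous);                      // restore the earlier setting
    freelocale(c_locale);

    // Succeed only if something was parsed and nothing was left over.
    return end != text && *end == '\0';
}
```

Because the switch is per thread, this avoids the process-wide side effects of calling setlocale from inside a plugin.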
Note that implementing your own floating point parser is not easy and is probably not the right solution to the problem unless you're an expert in numerical computing. Getting it right is very hard.
POSIX.1-2008 specifies isalnum_l(), isalpha_l(), isblank_l(), iscntrl_l(), isdigit_l(), isgraph_l(), islower_l(), isprint_l(), ispunct_l(), isspace_l(), isupper_l(), and isxdigit_l().
Here's what I've done with this stuff in the past.
The goal is to use locale-dependent numeric converters with a C-locale numeric representation. The ideal, of course, would be to use non-locale-dependent converters, or not change the locale, etc., etc., but sometimes you just have to live with what you've got. Locale support is seriously broken in several ways and this is one of them.</rant>
First, extract the number as a string using something like the C grammar's simple pattern for numeric preprocessing tokens. For use with scanf, I do an even simpler one:
" %1[-+0-9.]%[-+0-9A-Za-z.]"
This could be simplified even more, depending on what else you might expect in the input stream. The only thing you need to do is to not read beyond the end of the number; as long as you don't allow numbers to be followed immediately by letters, without intervening whitespace, the above will work fine.
Now, get the struct lconv (man 7 locale) representing the current locale using localeconv(3). The first entry in that struct is const char* decimal_point; replace all of the '.' characters in your string with that value. (You might also need to replace '+' and '-' characters, although most locales don't change them, and the sign fields in the lconv struct are documented as only applying to currency conversions.) Finally, feed the resulting string through strtod and see if it passes.
This is not a perfect algorithm, particularly since it's not always easy to know how locale-compliant a given library actually is, so you might want to do some autoconf stuff to configure it for the library you're actually compiling with.
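The replacement step described above might be sketched as follows. parse_dotted_token is a hypothetical name, the token is assumed to have been extracted already (for instance with the scanf pattern shown earlier), and sign characters are left untouched on the assumption that the locale does not change them.

```cpp
#include <cassert>
#include <clocale>
#include <cstdlib>
#include <string>

// Takes a number written with '.', rewrites it with the current locale's
// decimal point, and hands it to the locale-dependent strtod.
bool parse_dotted_token(const std::string& token, double& out) {
    const char* point = std::localeconv()->decimal_point;
    assert(point != nullptr && *point != '\0');

    std::string localized;
    for (char c : token) {
        if (c == '.') {
            localized += point;  // may be more than one byte in some locales
        } else {
            localized += c;
        }
    }

    char* end = nullptr;
    out = std::strtod(localized.c_str(), &end);
    // Accept only if strtod consumed the entire token.
    return end != localized.c_str() && *end == '\0';
}
```

The final check is the "see if it passes" step: a token such as "12abc" is rejected because strtod stops before the end of the string.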
I am not sure how to solve it in C.
But C++ streams (can) have a unique locale object.
std::stringstream dataStream;
dataStream.imbue(std::locale("C"));
// Note: You must imbue the stream before you do anything with it.
// If any operations have been performed then an imbue() can
// be silently ignored by the stream (which is a pain to debug).
dataStream << "3.14";
float x;
dataStream >> x;
Please explain the purpose of locales in C++. I have read the documentation but don't understand it. Please help.
The basic purpose is for localizing applications. For example, in the US a large number with a decimal separator would normally be written like: "1,234.56". Throughout much of Europe the same number would normally be written like: "1.234,56".
A locale allows you to isolate information about such formatting (and other things that vary between countries, languages, cultures, etc.) into one place. For example, I might use:
std::locale loc("");
std::cout.imbue(loc);
std::cout << 1234.56;
The unnamed locale ("") is special: it automatically picks out whatever locale the user has configured. When I run this code, the output I get is: "1,234.56". Somebody else could run exactly the same code, but if their environment was configured for some other convention, they might get "1.234,56" or "1 234,56", etc.
So, most of what the locale buys us (in this case) is keeping writing a number separate from formatting that number appropriately for a specific audience. Of course, a locale has a number of "facets", each of which covers a separate...well, facet of localization, such as formatting numbers, formatting currency, determining what's considered a lower-case or upper-case letter, etc.
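Since the result of std::locale("") depends on the machine, the separation between writing a number and formatting it can also be illustrated with a hand-made numeric facet. This is only a sketch: ContinentalNumpunct and format_with are hypothetical names, and the facet hard-codes the "1.234,56" convention mentioned above rather than reading it from the system.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: ',' as decimal point, '.' between groups of 3 digits.
class ContinentalNumpunct : public std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Writes the same value through a stream imbued with the given locale.
// The code that writes the number never mentions separators.
std::string format_with(const std::locale& loc, double value) {
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    assert(!out.fail());
    return out.str();
}
```

Calling format_with(std::locale::classic(), 1234.56) gives "1234.56", while format_with(std::locale(std::locale::classic(), new ContinentalNumpunct), 1234.56) gives "1.234,56"; only the locale passed in differs.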
I am working on a project using openFrameworks and I've been having some problems lately when writing XML files. I've traced the problem down to an sprintf call:
It seems that under certain conditions an sprintf call may write commas instead of dots in float numbers (e.g. "2,56" instead of "2.56"). In my locale, floating-point numbers are written with a ',' separating the decimals from the units.
I am unable to reproduce this behaviour in a simple example, but I've solved the problem by stringifying the value using a stringstream.
I am curious about the circumstances under which sprintf uses a different localization. When does sprintf use ',' instead of '.', and how can I control it?
The decimal separator is controlled by the LC_NUMERIC locale category. See setlocale for details. Setting it to the "C" locale will give you a period. You can find out the characters and settings for the current locale by looking in the (read-only) struct returned by localeconv.
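As a quick diagnostic, the separator that sprintf will currently use can be read back from localeconv. This is a minimal sketch; current_decimal_point is a hypothetical helper name.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Returns the decimal point string of the current C locale, i.e. the one
// sprintf uses for %f and %g at the moment of the call.
std::string current_decimal_point() {
    const std::lconv* conv = std::localeconv();
    assert(conv != nullptr && conv->decimal_point != nullptr);
    return conv->decimal_point;
}
```

Logging this value just before the suspicious sprintf call would show whether some other code has changed LC_NUMERIC in the meantime; after std::setlocale(LC_NUMERIC, "C") it reports a period.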