MFC program using wrong decimal separator/language - C++

I have the comma as the decimal separator in my Windows regional settings (Portuguese language), and all the programs I develop use the comma when formatting strings or using atof.
However, this particular program that came into my hands insists on using the dot as the decimal separator, regardless of my regional settings.
I'm not calling setlocale anywhere in the program, or any other language-changing function AFAIK. In fact, I put these lines of code at the very beginning of the InitInstance() function:
double var = atof("4,87");
TRACE("%f", var);
This yields 4.000000 in this program and 4,870000 in every other one.
I figure there must be some misplaced setting in the project's properties, but I don't know what it is. Can anyone help?

I'm not calling setlocale anywhere in the program or any other language changing function AFAIK.
That'd be why. C and C++ default to the "C" locale. Try setting the locale to "": setlocale(LC_ALL,"");
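A minimal sketch of that fix (the helper name below is mine, not from the question): atof honours the global C locale, so switching from the default "C" locale to the user's regional settings changes how "4,87" is parsed.

```cpp
#include <clocale>
#include <cstdlib>

// Parse with the user's regional settings, then restore the default "C"
// locale so the rest of the program is unaffected.
double parse_with_user_locale(const char* s)
{
    std::setlocale(LC_ALL, "");    // adopt the user's regional settings
    double v = std::atof(s);       // decimal separator now follows that locale
    std::setlocale(LC_ALL, "C");   // restore the default
    return v;
}
```

Under a comma-separator locale this parses "4,87" as 4.87; under the default "C" locale, atof stops at the comma and yields 4.0.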

atof relies on the C locale to determine the expected decimal separator. Thus, as another member mentioned, setlocale(LC_NUMERIC, ""); will set the C locale to the user locale (regional settings) for number-related functions. See the MSDN page for more information on the available flags and the locale names.
For those who don't want to change the C locale, you can use _atof_l instead of the standard atof and provide it a locale object created with _create_locale (what a name).
double _atof_l(const char *str, _locale_t locale);
There are a multitude of alternatives. For example, you could use strtod (and its Windows _strtod_l counterpart), which is IMHO a better option because it will let you know if something went wrong.
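A small sketch of why strtod is the better option (the wrapper name is mine): unlike atof, it reports where parsing stopped, so failures are detectable.

```cpp
#include <cerrno>
#include <cstdlib>

// Returns true only if at least a number prefix parsed; atof gives no
// such feedback and simply returns 0.0 on failure.
bool to_double(const char* s, double& out)
{
    char* end = nullptr;
    errno = 0;
    out = std::strtod(s, &end);
    return end != s && errno == 0;  // end == s means nothing was consumed
}
```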

Related

Why doesn't fstream support an em-dash in the file name?

I ported some code from C to C++ and have just found a problem with paths that contain em-dash, e.g. "C:\temp\test—1.dgn". A call to fstream::open() will fail, even though the path displays correctly in the Visual Studio 2005 debugger.
The weird thing is that the old code that used the C library fopen() function works fine. I thought I'd try my luck with the wfstream class instead, and then found that converting my C string using mbstowcs() loses the em-dash altogether, meaning it also fails.
I suppose this is a locale issue, but why isn't em-dash supported in the default locale? And why can't fstream handle an em-dash? I would have thought any byte character supported by the Windows filesystem would be supported by the file stream classes.
Given these limitations, what is the correct way to handle opening a file stream that may contain valid Windows file names that doesn't just fall over on certain characters?
The em-dash character is coded as U+2014 in UTF-16 (0x14 0x20 in little endian), as 0xE2 0x80 0x94 in UTF-8, and with other codes, or no code at all, depending on the charset and code page used. The Windows-1252 code page (very common in western European languages) has the dash character at 0x97, which we can consider equivalent.
Windows internally manages UTF-16 paths, so every time a function is called through its misnamed "ANSI" interface (functions ending in A), the path is converted to UTF-16 using the code page currently configured for the user.
On the other hand, the C and C++ runtime libraries may be implemented on top of either the "ANSI" or the "Unicode" (functions ending in W) interface. In the first case, the code page used to represent the string must match the code page used by the system. In the second case, either we use UTF-16 strings directly from the beginning, or the functions used to convert to UTF-16 must be configured to use the source string's code page for the mapping.
Yes, it is a complex problem. And there are several wrong (or problematic) proposals to solve it:
Use wfstream instead of fstream: wfstream does nothing different from fstream with paths. Nothing. It just means "manage the stream of bytes as wchar_t". (And it does that in a different way than one might expect, making the class nearly useless in most cases, but that is another story.) To use the Unicode interface in the Visual Studio implementation, there are overloaded constructor and open() functions that accept const wchar_t*. These are overloaded for both fstream and wfstream. Use fstream with the right open().
mbstowcs(): The problem here is which locale (which contains the code page used by the string) to use. If the default locale happens to match the system one, fine. If not, you can try _mbstowcs_l(). But these are unsafe C functions, so you have to be careful with the buffer size. In any case, this approach only makes sense if the path is obtained at runtime. If it is a static string known at compile time, it is better to use it directly in your code.
L"C:\\temp\\test—1.dgn": The L prefix in the string doesn't mean "convert this string to UTF-16" (source code is usually in 8-bit characters), at least not in the Visual Studio implementation. The L prefix means "add a 0x00 byte after each character between the quotes". So —, equivalent to byte 0x97 in a narrow (ordinary) string, becomes 0x97 0x00 in a wide (L-prefixed) string, not 0x14 0x20. It is better to use its universal character name instead: L"C:\\temp\\test\u20141.dgn"
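The universal-character-name form can even be checked at compile time; a small sketch (the variable name is mine):

```cpp
// The \u2014 escape names the em-dash code point explicitly, independent of
// the source file's code page.
const wchar_t* kPath = L"C:\\temp\\test\u20141.dgn";
static_assert(L'\u2014' == 0x2014, "em-dash is U+2014");
```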
One popular approach is to always use either UTF-8 or UTF-16 in your code and make conversions only when strictly necessary. When converting a string in a specific code page to UTF-8 or UTF-16, first identify the right code page, then convert using functions appropriate to where the string came from. If you get your string from an XML file, the code page used is usually stated there (and tends to be UTF-8). If it comes from a Windows control, use a Windows API function like MultiByteToWideChar (CP_ACP or GetACP() usually works as the default code page).
Always use fstream (not wfstream) and its wide interfaces (open and constructor), not its narrow ones. (You can again use MultiByteToWideChar to convert from UTF-8 to UTF-16.)
There are several articles and posts with advice on this approach. One that I recommend: http://www.nubaria.com/en/blog/?p=289.
This should work, provided that everything you do is in wide-char notation with wide-char functions. That is, use wfstream, but instead of using mbstowcs, use wide-string literals prefixed with the L char:
const wchar_t* filename = L"C:\\temp\\test—1.dgn";
Also, make sure your source file is saved as UTF-8 in Visual Studio; otherwise the em-dash could run into code-page issues.
Posting this solution for others who run into this. The problem is that Windows assigns the "C" locale on startup by default, and em-dash (0x97) is defined in the "Windows-1252" codepage but is unmapped in the normal ASCII table used by the "C" locale. So the simple solution is to call:
setlocale ( LC_ALL, "" );
Prior to fstream::open. This sets the current codepage to the OS-defined codepage. In my program, the file I wanted to open with fstream was defined by the user, so it was in the system-defined codepage (Windows-1252).
So while fiddling with unicode and wide chars may be a solution to avoid unmapped characters, it wasn't the root of the problem. The actual problem was that the input string's codepage ("Windows-1252") didn't match the active codepage ("C") used by default in Windows programs.

C++, using environment variables for paths

Is there any way to use environment variables in C++ for a path to a file?
The idea is to use them without expanding them, so I don't need to use wchar_t for languages with Unicode characters when I want to save/read a file.
//EDIT
Little edit with more explanations.
So what I'm trying to achieve is to read/write a file without worrying about the characters in the path. I don't want to use wchar_t for the path, but it should still work if the path contains some wide characters.
There are functions getenv and GetEnvironmentVariable, but they require the proper language to be set under "Language for non-Unicode programs" in the Windows settings (Control Panel -> Clock, Language, and Region -> Region and Language -> Administrative), which requires action from users and is something I'm trying to avoid.
There are functions getenv and GetEnvironmentVariable but they need to set proper language in Language for non-Unicode programs in windows settings
This is specifically a Windows problem.
On other platforms such as Linux, filepaths and environment variables are natively byte-based; you can access them using the standard C library functions that take byte-string paths like fopen() and getenv(). The pathnames may represent Unicode strings to the user (decoded using some encoding, almost always UTF-8 which can encode any character), but to the code they're just byte strings.
Windows, on the other hand, has filenames and environment variables that are natively strings of 16-bit (UTF-16) code units (which are nearly the same thing as Unicode character code points, but not quite, because that would be too easy... but that's a sadness for another time). You can call Win32 file-handling APIs like CreateFileW() and GetEnvironmentVariableW() using UTF-16 code unit strings (wchar_t, when compiled on Windows) and access any file names directly.
There are also old-school legacy byte-based Win32 functions like GetEnvironmentVariableA() (which is what GetEnvironmentVariable() points to if you are compiling a non-Unicode project). If you call those functions, Windows has to convert from the char byte strings you give it to UTF-16 strings, using some encoding.
That encoding is the ‘ANSI’ (‘A’) locale-specific default code page, which is what “Language for non-Unicode programs” sets.
Although that encoding can be changed by the user, it can't be set to UTF-8 or any other encoding that supports all characters, so even if you ask the user to change it, that still doesn't let you access all files. Thus the Win32 A APIs are always to be avoided.
The problem comes when you want to access files in a manner that works on both Windows and the other platforms. If you call the C standard library with byte strings, the Microsoft C runtime library adapts those calls to call the Win32 A byte-based APIs, which as above are irritatingly limited.
So your unattractive choices are:
1. use wchar_t and std::wstring strings in your code, using only Win32 APIs for interacting with filenames and environment variables, and accept that your code will never run on other platforms; or
2. use char and UTF-8-encoded std::string strings, and give up on your code accessing filenames and environment variables containing non-ASCII characters on Windows; or
3. write a load of branching #ifdef code to switch between using C standard functions for filename and environment interaction and using Win32 APIs with a bunch of UTF-8-char-to-wchar_t string conversions in between, so that the code works across multiple platforms; or
4. use a library that encapsulates (3) for you.
Notably there is boost::nowide (since Boost 1.73) which contains boost::nowide::getenv.
This isn't entirely Microsoft's fault: Windows NT was designed in the early days of Unicode, before UTF-8 or the astral planes were invented, when it was thought that 16-bit code unit strings were a totally sensible way to store text, and not the lamentable disaster we now know it to be. It is, however, very sad that Windows has not been updated since then to treat UTF-8 as a first-class citizen and provide an easy way to write cross-platform applications.
The standard library gives you the function getenv. Here is an example:
#include <cstdlib>
#include <iostream>

int main()
{
    char* pPath = getenv("PATH");
    if (pPath)
        std::cout << "Path = " << pPath << std::endl;
    return 0;
}

C++ : Which locale is considered by sprintf?

I am using the two functions sprintf and snprintf for converting a "double" to a string.
In one case, the running application has a different locale than Windows'. In such a scenario, the locale considered by sprintf is always the application's, whereas snprintf sometimes starts using the Windows locale. As a consequence, the decimal characters returned by the two methods differ, and this causes a problem.
To provide further details: I have a library in my project which builds a string from a "double"; this library uses snprintf for the conversion. Then I need to send this information to a server which understands only "." (dot) as the decimal symbol. Hence, I need to replace the local decimal character with a "." (dot). To find out the local decimal character (in order to replace it), I use another library provided in my project, which uses sprintf. Then I replace that character with a dot to get the final output.
Also, please note: sprintf always considers the application's own locale, while snprintf sometimes considers the Windows locale.
As the problem is inconsistent, sorry for not providing a clear example.
So, what are the circumstances under which snprintf might behave differently?
Why am I getting such different behavior from these two methods?
How can I avoid it?
P.S. - I have to use these 2 methods, so please suggest a solution which would not require me to use any different methods.
Thanks.
The locale used by both sprintf and snprintf is not the Windows locale but your application's locale. As this locale is global to your application, any line of code in your program can change it.
In your case, the (not thread safe) solution may be to temporarily replace the locale for the snprintf call:
auto old = std::locale::global(std::locale::classic());
snprintf(...);
std::locale::global(old);
BTW, the "Windows locale" can be accessed via just std::locale(""); you don't need to know its exact name.
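A plain-C-locale variant of the same idea (a non-thread-safe sketch; the helper name is mine): pin LC_NUMERIC to "C" just around the snprintf call, then restore whatever was active.

```cpp
#include <clocale>
#include <cstdio>
#include <string>

// Format a double with '.' as the separator regardless of the active locale.
std::string format_double_c_locale(double v)
{
    std::string saved = std::setlocale(LC_NUMERIC, nullptr);  // current locale name
    std::setlocale(LC_NUMERIC, "C");
    char buf[64];
    std::snprintf(buf, sizeof buf, "%f", v);
    std::setlocale(LC_NUMERIC, saved.c_str());                // restore
    return buf;
}
```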

How to parse numbers like "3.14" with scanf when locale expects "3,14"

Let's say I have to read a file, containing a bunch of floating-point numbers. The numbers can be like 1e+10, 5, -0.15 etc., i.e., any generic floating-point number, using decimal points (this is fixed!). However, my code is a plugin for another application, and I have no control over what's the current locale. It may be Russian, for example, and the LC_NUMERIC rules there call for a decimal comma to be used. Thus, Pi is expected to be spelled as "3,1415...", and
sscanf("3.14", "%f", &x);
returns 1, and x contains 3.0, since it refuses to parse past the '.' in the string.
I need to ignore the locale for such number-parsing tasks.
How does one do that?
I could write a parseFloat function, but this seems like a waste.
I could also save the current locale, reset it temporarily to "C", read the file, and restore to the saved one. What are the performance implications of this? Could setlocale() be very slow on some OS/libc combo, what does it really do under the hood?
Yet another way would be to use iostreams, but again their performance isn't stellar.
My personal preference is to never use LC_NUMERIC, i.e. just call setlocale with other categories, or, after calling setlocale with LC_ALL, use setlocale(LC_NUMERIC, "C");. Otherwise, you're completely out of luck if you want to use the standard library for printing or parsing numbers in a standard form for interchange.
If you're lucky enough to be on a POSIX 2008 conforming system, you can use the uselocale and *_l family of functions to make the situation somewhat better. There are at least 2 basic approaches:
Leave the default locale unset (at least the troublesome parts like LC_NUMERIC; LC_CTYPE should probably always be set), and pass a locale_t object for the user's locale to the appropriate *_l functions only when you want to present things to the user in a way that meets their own cultural expectations; otherwise use the default C locale.
Have your code that needs to work with data for interchange keep around a locale_t object for the C locale, and either switch back and forth using uselocale when you need to work with data in a standard form for interchange, or use the appropriate *_l functions (but there is no scanf_l).
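The second approach can be sketched as follows on a POSIX 2008 system (the function name is illustrative; the locale_t object is deliberately kept for the process lifetime):

```cpp
#include <cstdlib>
#include <locale.h>   // newlocale, uselocale (POSIX 2008)

// Parse interchange-format numbers with the "C" locale, without touching the
// process-global locale; uselocale switches only the calling thread.
double parse_interchange_double(const char* s)
{
    static locale_t c_loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    locale_t old = uselocale(c_loc);   // thread-local switch, unlike setlocale
    double v = strtod(s, nullptr);
    uselocale(old);
    return v;
}
```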
Note that implementing your own floating point parser is not easy and is probably not the right solution to the problem unless you're an expert in numerical computing. Getting it right is very hard.
POSIX.1-2008 specifies isalnum_l(), isalpha_l(), isblank_l(), iscntrl_l(), isdigit_l(), isgraph_l(), islower_l(), isprint_l(), ispunct_l(), isspace_l(), isupper_l(), and isxdigit_l().
Here's what I've done with this stuff in the past.
The goal is to use locale-dependent numeric converters with a C-locale numeric representation. The ideal, of course, would be to use non-locale-dependent converters, or not change the locale, etc., etc., but sometimes you just have to live with what you've got. <rant>Locale support is seriously broken in several ways and this is one of them.</rant>
First, extract the number as a string using something like the C grammar's simple pattern for numeric preprocessing tokens. For use with scanf, I do an even simpler one:
" %1[-+0-9.]%[-+0-9A-Za-z.]"
This could be simplified even more, depending on what else you might expect in the input stream. The only thing you need to do is not read beyond the end of the number; as long as you don't allow numbers to be followed immediately by letters, without intervening whitespace, the above will work fine.
Now, get the struct lconv (man 7 locale) representing the current locale using localeconv(3). The first entry in that struct is const char* decimal_point; replace all of the '.' characters in your string with that value. (You might also need to replace '+' and '-' characters, although most locales don't change them, and the sign fields in the lconv struct are documented as only applying to currency conversions.) Finally, feed the resulting string through strtod and see if it passes.
This is not a perfect algorithm, particularly since it's not always easy to know how locale-compliant a given library actually is, so you might want to do some autoconf stuff to configure it for the library you're actually compiling with.
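Putting the steps above together (a sketch; the buffer sizes and helper name are mine). In the reverse direction from the answer, it extracts the token, maps '.' to the locale's decimal_point from localeconv, and hands the result to strtod so the locale-dependent parser accepts a C-locale number:

```cpp
#include <clocale>
#include <cstdio>
#include <cstdlib>

// 1) extract the number token with the simple scanf pattern,
// 2) replace '.' with the locale's decimal_point from localeconv(),
// 3) hand the result to strtod.
bool parse_c_number(const char* input, double* out)
{
    char head[2], tail[62], buf[64];
    int n = std::sscanf(input, " %1[-+0-9.]%61[-+0-9A-Za-z.]", head, tail);
    if (n < 1)
        return false;
    if (n < 2)
        tail[0] = '\0';               // single-character number, e.g. "5"
    std::snprintf(buf, sizeof buf, "%s%s", head, tail);
    const char* dp = std::localeconv()->decimal_point;  // e.g. "," in ru_RU
    if (dp[0] != '.' && dp[1] == '\0')
        for (char* p = buf; *p; ++p)
            if (*p == '.')
                *p = dp[0];
    char* end = nullptr;
    *out = std::strtod(buf, &end);
    return end != buf;
}
```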
I am not sure how to solve it in C, but C++ streams can have their own locale object.
std::stringstream dataStream;
dataStream.imbue(std::locale("C"));
// Note: You must imbue the stream before you do anything with it.
// If any operations have been performed then an imbue() can
// be silently ignored by the stream (which is a pain to debug).
dataStream << "3.14";
float x;
dataStream >> x;

sprintf, commas and dots in C(++) (and localization?)

I am working in a project using openframeworks and I've been having some problems lately when writing XMLs. I've traced down the problem to a sprintf:
It seems that under certain conditions an sprintf call may write commas instead of dots on float numbers (e.g. "2,56" instead of "2.56"). In my locale the floating numbers are represented with a ',' to separate the decimals from the units.
I am unable to reproduce this behaviour in a simple example, but I've solved the problem by stringifying the value using a stringstream.
I am curious about the circumstances of sprintf using a different localization. When does sprintf use ',' instead of '.', and how can I control it?
The decimal separator is controlled by the LC_NUMERIC locale category. See setlocale for details. Setting it to the "C" locale will give you a period. You can find out the characters and settings for the current locale by looking in the (read-only) struct returned by localeconv.
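To make the localeconv part concrete (a one-liner sketch; the function name is mine):

```cpp
#include <clocale>

// The decimal separator of the active LC_NUMERIC locale, as reported by
// localeconv(); '.' in the default "C" locale, ',' under e.g. pt_PT.
char current_decimal_point()
{
    return *std::localeconv()->decimal_point;
}
```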