Should I use wsprintf() to print a double as a wide string?

I am unable to print a double value using wsprintf().
I tried sprintf() and it worked fine.
The syntax I used for wsprintf() and sprintf() is as follows:
wsprintf(str,TEXT("Square is %lf "),iSquare); // Does not show value
sprintf(str," square is %lf",iSquare); // works okay
Am I making any mistake in how I'm using wsprintf()?

wsprintf doesn't support floating point. The mistake is using it at all.
If you want something like sprintf, but for wide characters/strings, you want swprintf instead.
Actually, since you're using the TEXT macro, you probably want _stprintf instead: it shifts between the narrow and wide implementations based on the same preprocessor macros that TEXT uses to decide whether the string will be narrow or wide.
This whole approach, however, is largely a relic from the days when Microsoft still sold and supported versions of Windows based on both the 32-bit NT kernel and the 16-bit kernel. The 16-bit versions had only extremely minimal wide-character support, so Microsoft worked hard at allowing a single source code base to be compiled to use either narrow characters (targeting the 16-bit kernels) or wide characters (targeting the 32-bit kernels). The 16-bit kernels have been gone for long enough that almost nobody has much reason to support them anymore.
For what it's worth: wsprintf is almost entirely a historic relic. The w apparently stands for Windows. It was included as part of Windows way back when (back to the 16-bit days). It was written without support for floating point because at that time, Windows didn't use any floating point internally--this is part of why it has routines like MulDiv built-in, even though doing (roughly) the same with floating point is quite trivial.
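For reference, here's a minimal sketch of the swprintf route (the value 2.25 and the buffer size are just placeholders; note that very old MSVC versions shipped a non-standard swprintf without the size argument):

#include <cwchar>   // std::swprintf, std::wprintf

int main() {
    double iSquare = 2.25;   // stand-in for the question's variable
    wchar_t str[64];

    // Unlike wsprintf, the standard swprintf understands floating-point
    // conversions. It takes the buffer size as its second argument.
    std::swprintf(str, 64, L"Square is %lf", iSquare);

    std::wprintf(L"%ls\n", str);   // prints: Square is 2.250000
    return 0;
}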

The function wsprintf() does not support floating point parameters; try using swprintf() instead if you're working with floating point values.
More information about swprintf can be found in its documentation.

wsprintf does not support floating point. See its documentation - %lf is not listed as a valid format specifier.
The swprintf function, part of the C standard library that ships with Visual Studio, is what you want. It supports all of the format codes that sprintf does.

Presumably you're not compiling with UNICODE defined, so TEXT is #defined to expand to just a regular narrow string.

Why did C++11 introduce the char16_t and char32_t types?

Why did the C++11 Standard introduce the types char16_t and char32_t? Isn't one byte enough to store characters? Is there any purpose in extending the size of the character type?
So after you've read Joel's article about Unicode, you should know about Unicode in general, but not about Unicode in C++.
The problem with C++98 was that it didn't know about Unicode, really. (Except for the universal character reference escape syntax.) C++ just required the implementation to define a "basic source character set" (which is essentially meaningless, because it's about the encoding of the source file, and thus comes down to telling the compiler "this is it"), a "basic execution character set" (some set of characters representable by narrow strings, and an 8-bit (possibly multi-byte) encoding used to represent it at runtime, which has to include the most important characters in C++), and a "wide execution character set" (a superset of the basic set, and an encoding that uses wchar_t as its code unit to go along with it, with the requirement that a single wchar_t can represent any character from the set).
Nothing about actual values in these character sets.
So what happened?
Well, Microsoft switched to Unicode very early, back when it still had less than 2^16 characters. They implemented their entire NT operating system using UCS-2, which is the fixed-width 16-bit encoding of old Unicode versions. It made perfect sense for them to define their wide execution character set to be Unicode, make wchar_t 16 bits and use UCS-2 encoding. For the basic set, they chose "whatever the current ANSI codepage is", which made zero sense, but they pretty much inherited that. And because narrow string support was considered legacy, the Windows API is full of weird restrictions on that. We'll get to that.
Unix switched a little later, when it was already clear that 16 bits weren't enough. Faced with the choice of using 16-bit variable width encoding (UTF-16), a 32-bit fixed width encoding (UTF-32/UCS-4), or an 8-bit variable width encoding (UTF-8), they went with UTF-8, which also had the nice property that a lot of code written to handle ASCII and ISO-8859-* text didn't even need to be updated. For wchar_t, they chose 32 bits and UCS-4, so that they could represent every Unicode code point in a single unit.
Microsoft then upgraded everything they had to UTF-16 to handle the new Unicode characters (with some long-lingering bugs), and wchar_t remained 16 bits, because of backwards compatibility. Of course that meant that wchar_t could no longer represent every character from the wide set in a single unit, making the Microsoft compiler non-conformant, but nobody thought that was a big deal. It wasn't like some C++ standard APIs are totally reliant on that property. (Well, yes, codecvt is. Tough luck.)
But still, they thought UTF-16 was the way to go, and the narrow APIs remained the unloved stepchildren. UTF-8 didn't get supported. You cannot use UTF-8 with the narrow Windows API. You cannot make the Microsoft compiler use UTF-8 as the encoding for narrow string literals. They just didn't feel it was worth it.
The result: extreme pain when trying to write internationalized applications for both Unix and Windows. Unix plays well with UTF-8, Windows with UTF-16. It's ugly. And wchar_t has different meanings on different platforms.
char16_t and char32_t, as well as the new string literal prefixes u, U and u8, are an attempt to give the programmer reliable tools to work with encodings. Sure, you still have to either do weird compile-time switching for multi-platform code, or else decide on one encoding and do conversions in some wrapper layer, but at least you now have the right tools for the latter choice. Want to go the UTF-16 route? Use u and char16_t everywhere, converting to UTF-8 near system APIs as needed. Previously you couldn't do that at all in Unix environments. Want UTF-8? Use char and u8, converting near UTF-16 system APIs (and avoid the standard library I/O and string manipulation stuff, because Microsoft's version still doesn't support UTF-8). Previously you couldn't do that at all in Windows. And now you can even use UTF-32, converting everywhere, if you really want to. That, too, wasn't possible before in Windows.
So that's why these things are in C++11: to give you some tools to work around the horrible SNAFU around character encodings in cross-platform code in an at least somewhat predictable and reliable fashion.
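As a rough illustration of those tools, here is a minimal sketch (compiled as C++11/14/17; in C++20 the u8 prefix yields char8_t rather than char):

#include <string>

int main() {
    // Each prefix pins down the literal's encoding, independent of the platform:
    auto            utf8  = u8"héllo";   // UTF-8 code units (char before C++20)
    const char16_t* utf16 = u"héllo";    // UTF-16 code units
    const char32_t* utf32 = U"héllo";    // UTF-32 code units (one per code point)
    const wchar_t*  wide  = L"héllo";    // implementation-defined: UTF-16 on Windows, usually UTF-32 on Unix

    std::u16string s16 = utf16;          // std::basic_string<char16_t>
    std::u32string s32 = utf32;          // std::basic_string<char32_t>

    (void)utf8; (void)wide; (void)s16; (void)s32;
    return 0;
}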
1 byte has never been enough. There are hundreds of ANSI 8-bit encodings in existence because people kept trying to stuff different languages into the confines of 8-bit limitations, thus the same byte values have different meanings in different languages. Then Unicode came along to solve that problem, but it needed 16 bits to do it (UCS-2). Eventually, the needs of the world's languages exceeded 16 bits, so the UTF-8/16/32 encodings were created to extend the available values.
char16_t and char32_t (and their respective text prefixes) were created to handle UTF-16/32 in a uniform manner on all platforms. Originally there was wchar_t, but it was created when Unicode was new, and its byte size was never standardized, even to this day. On some platforms wchar_t is 16-bit (UTF-16), whereas on other platforms it is 32-bit (UTF-32). This has caused plenty of interoperability issues over the years when exchanging Unicode data across platforms. char16_t and char32_t were finally introduced to have standardized sizes - 16 and 32 bits, respectively - and standardized semantics on all platforms.
There are around 100,000 characters (they call them code points) defined in Unicode, so in order to specify any one of them, one byte is not enough. One byte is just enough to enumerate the first 256 of them, which happen to be identical to ISO-8859-1. Two bytes are enough for the most important subset of Unicode, the so-called Basic Multilingual Plane, and many applications, e.g. Java, settle on 16-bit characters for Unicode. If you want truly every single Unicode character, you have to go beyond that and allow 4 bytes / 32 bits. And because different people have different needs, C++ allows different sizes here. UTF-8 is a variable-size encoding that is rarely used within programs, because different characters have different lengths. To some extent this also applies to UTF-16, but in most cases you can safely ignore this issue with char16_t.
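To make the code-unit counts concrete, here is a small sketch (pre-C++20 for the u8 literal). 'é' fits in one UTF-16 unit, while a character outside the Basic Multilingual Plane, such as the musical G clef, needs a surrogate pair:

#include <iostream>
#include <string>

int main() {
    std::string    e8  = u8"é";    // 2 UTF-8 code units (bytes)
    std::u16string e16 = u"é";     // 1 UTF-16 code unit
    std::u32string e32 = U"é";     // 1 UTF-32 code unit

    std::u16string g16 = u"\U0001D11E";  // U+1D11E MUSICAL SYMBOL G CLEF: 2 UTF-16 code units (surrogate pair)
    std::u32string g32 = U"\U0001D11E";  // still 1 UTF-32 code unit

    std::cout << e8.size() << ' ' << e16.size() << ' ' << e32.size() << '\n';  // 2 1 1
    std::cout << g16.size() << ' ' << g32.size() << '\n';                      // 2 1
    return 0;
}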

What are the disadvantages to not using Unicode in Windows?

What are the disadvantages to not using Unicode on Windows?
By Unicode, I mean WCHAR and the wide API functions. (CreateWindowW, MessageBoxW, and so on)
What problems could I run into by not using this?
Your code won't be able to deal correctly with characters outside the currently selected codepage when calling system APIs.
Typical problems include unsupported characters being translated to question marks, inability to process text with special characters, in particular files with "strange characters" in their names/paths.
Also, several newer APIs are present only in the "wide" version.
Finally, each API call involving text will be marginally slower, since the "A" versions of APIs are normally just thin wrappers around the "W" APIs that convert the parameters to UTF-16 on the fly, so you have some overhead with respect to a "plain" W call.
Nothing stops you from working in a narrow-character Unicode encoding (i.e. UTF-8) inside your application, but the Windows "A" APIs don't speak UTF-8, so you'd have to convert to UTF-16 and call the W versions anyway.
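For instance, a common pattern (sketched below; the helper name is made up) is to keep UTF-8 std::string inside the application and convert with MultiByteToWideChar right at the API boundary:

#include <windows.h>
#include <string>

// Illustrative helper: UTF-8 std::string -> UTF-16 std::wstring.
std::wstring Utf8ToWide(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), nullptr, 0);
    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), (int)utf8.size(), &wide[0], len);
    return wide;
}

int main()
{
    std::string message = "Gr\xC3\xBC\xC3\x9F Gott";   // UTF-8 bytes for "Grüß Gott"
    // Convert only at the boundary and call the "W" API directly.
    MessageBoxW(nullptr, Utf8ToWide(message).c_str(), L"Example", MB_OK);
    return 0;
}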
I believe the gist of the original question was: should I compile all my Windows apps with #define _UNICODE, and what's the downside if I don't?
My original reply was: "Yeah, you should. We've moved past 8-bit ASCII, and _UNICODE is a reasonable default for any modern Windows code."
For Windows, I still believe that's reasonably good advice. But I've deleted my original reply. Because I didn't realize until I re-read my own links how much "UTF-16 is quite a sad state of affairs" (as Matteo Italia eloquently put it).
For example:
http://utf8everywhere.org/
Microsoft has ... mistakenly used ‘Unicode’ and ‘widechar’ as synonyms for ‘UCS-2’ and ‘UTF-16’. Furthermore, since UTF-8 cannot be set as the encoding for narrow string WinAPI, one must compile her code with _UNICODE rather than _MBCS. Windows C++ programmers are educated that Unicode must be done with ‘widechars’. As a result of this mess, they are now among the most confused ones about what is the right thing to do about text.
I heartily recommend these three links:
The Absolute Minimum Every Software Developer Should Know about Unicode
Should UTF-16 Be Considered Harmful?
UTF-8 Everywhere
IMHO...

C++ and UTF8 - Why not just replace ASCII?

In my application I constantly have to convert strings between std::string and std::wstring due to different APIs (boost, win32, ffmpeg, etc.). Especially with ffmpeg the strings end up utf8->utf16->utf8->utf16, just to open a file.
Since UTF-8 is backwards compatible with ASCII, I thought I would consistently store all my strings as UTF-8 in std::string and only convert to std::wstring when I have to call certain unusual functions.
This worked reasonably well; I implemented to_lower, to_upper and iequals for UTF-8. However, I then hit several dead ends: std::regex and regular string comparisons. To make this usable I would need to implement a custom ustring class based on std::string, with re-implementation of all corresponding algorithms (including regex).
Basically my conclusion is that UTF-8 is not very good for general usage, and the current std::string/std::wstring situation is a mess.
However, my question is: why are the default std::string and "" literals not simply changed to use UTF-8? Especially as UTF-8 is backwards compatible? Is there possibly some compiler flag which can do this? Of course the STL implementation would need to be adapted automatically.
I've looked at ICU, but it is not very compatible with APIs that assume basic_string, e.g. no begin/end/c_str, etc.
The main issue is the conflation of in-memory representation and encoding.
None of the Unicode encodings is really amenable to text processing. Users will in general care about graphemes (what's on the screen), while the encodings are defined in terms of code points... and some graphemes are composed of several code points.
As such, when one asks: what is the 5th character of "Hélène" (French first name) the question is quite confusing:
In terms of graphemes, the answer is n.
In terms of code points... it depends on the representation of é and è (they can be represented either as a single code point or as a pair using diacritics...)
Depending on the source of the question (an end-user in front of her screen or an encoding routine), the response is completely different.
Therefore, I think that the real question is Why are we speaking about encodings here?
Today it does not make sense, and we would need two "views": Graphemes and Code Points.
Unfortunately, the std::string and std::wstring interfaces were inherited from a time when people thought that ASCII was sufficient, and the progress made since hasn't really solved the issue.
I don't even understand why the in-memory representation should be specified; it is an implementation detail. All a user should want is:
to be able to read/write in UTF-* and ASCII
to be able to work on graphemes
to be able to edit a grapheme (to manage the diacritics)
... who cares how it is represented? I thought that good software was built on encapsulation?
Well, C cares, and we want interoperability... so I guess it will be fixed when C is.
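The "Hélène" ambiguity can be shown directly: the same grapheme 'é' can be one code point or two, so a sketch like the following (counting char32_t code points) gives different answers for visually identical strings:

#include <iostream>
#include <string>

int main() {
    // Two spellings of the same on-screen grapheme "é":
    std::u32string precomposed = U"\u00E9";    // 1 code point: U+00E9 (é)
    std::u32string decomposed  = U"e\u0301";   // 2 code points: 'e' + U+0301 combining acute accent

    std::cout << precomposed.size() << '\n';   // 1
    std::cout << decomposed.size()  << '\n';   // 2
    // Both render identically, so "the 5th character of Hélène" depends on
    // which form the string happens to use.
    return 0;
}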
You can't; the primary reason for this is named Microsoft. They decided not to support Unicode as UTF-8, so support for UTF-8 under Windows is minimal.
Under Windows you cannot use UTF-8 as a codepage, but you can convert from and to UTF-8.
There are two snags to using UTF-8 on Windows.
You cannot tell how many bytes a string will occupy - it depends on which characters are present, since some characters take 1 byte, some take 2, some take 3, and some take 4.
The Windows API uses UTF-16. Since most Windows programs make numerous calls to the Windows API, there is quite an overhead converting back and forth. (Note that you can do a "non-Unicode" build, which looks like it uses a narrow-character Windows API, but all that is happening is that the conversion back and forth on each call is hidden.)
The big snag with UTF-16 is that the binary representation of a string depends on the byte order of a word on the particular hardware the program is running on. This does not matter in most cases, except when strings are transmitted between computers where you cannot be sure that the other computer uses the same byte order.
So what to do? I use UTF-16 everywhere 'inside' all my programs. When string data has to be stored in a file, or transmitted over a socket, I first convert it to UTF-8.
This means that 95% of my code runs simply and most efficiently, and all the messy conversions between UTF-8 and UTF-16 can be isolated to routines responsible for I/O.
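A sketch of that split (the helper name is made up; WideCharToMultiByte is the Win32 routine that performs the UTF-16 to UTF-8 conversion):

#include <windows.h>
#include <fstream>
#include <string>

// Illustrative helper: UTF-16 std::wstring -> UTF-8 std::string for I/O.
std::string WideToUtf8(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
    int len = WideCharToMultiByte(CP_UTF8, 0, wide.data(), (int)wide.size(),
                                  nullptr, 0, nullptr, nullptr);
    std::string utf8(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), (int)wide.size(),
                        &utf8[0], len, nullptr, nullptr);
    return utf8;
}

int main()
{
    std::wstring internal = L"Gr\u00FC\u00DF Gott";   // UTF-16 inside the program (on Windows)
    std::ofstream out("greeting.txt", std::ios::binary);
    out << WideToUtf8(internal);                      // UTF-8 on disk
    return 0;
}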

C++: Making my project support unicode

My C++ project is currently about 16K lines of code, and I admit I completely failed to think about Unicode support in the first place.
All I did was typedef std::string as String and jump into coding.
I have never really worked with Unicode in programs I've written.
How hard is it to switch my project to Unicode now? Is it even a good idea?
Can I just switch to std::wchar without any major problems?
Probably the most important part of making an application Unicode-aware is to track the encoding of your strings and to make sure that your public interfaces are well specified and easy to use with the encodings that you wish to use.
Switching to a wider character type (in C++, wchar_t) is not necessarily the correct solution. In fact, I would say it is usually not the simplest solution. Some applications can get away with specifying that all strings and interfaces use UTF-8 and not need to change at all. std::string can perfectly well be used for UTF-8 encoded strings.
However, if you need to interpret the characters in a string or interface with non-UTF-8 interfaces, then you will have to put more work in, but without knowing more about your application it is impossible to recommend a single best approach.
There are some issues with using std::wstring. If your application will be storing text in Unicode, and it will be running on different platforms, you may run into trouble. std::wstring relies on wchar_t, which is compiler-dependent. In Microsoft Visual C++, this type is 16 bits wide and will thus only support UTF-16 encodings. The GNU C++ compiler specifies this type to be 32 bits wide and will thus only support UTF-32 encodings. If you then store the text in a file from one system (say Windows/VC++) and then read the file on another system (Linux/GCC), you will have to prepare for this (in this case, convert from UTF-16 to UTF-32).
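A quick way to see this difference (a small sketch; the sizes shown in the comments are what typical MSVC and GCC/Clang builds report):

#include <cstdio>

int main() {
    // wchar_t's width is implementation-defined: typically 2 bytes with MSVC
    // on Windows (UTF-16 code units) and 4 bytes with GCC/Clang on Linux
    // (UTF-32 code units).
    std::printf("sizeof(wchar_t)  = %zu\n", sizeof(wchar_t));

    // The C++11 char16_t/char32_t types have the same meaning everywhere,
    // which makes them better suited to data that crosses platforms.
    std::printf("sizeof(char16_t) = %zu\n", sizeof(char16_t));
    std::printf("sizeof(char32_t) = %zu\n", sizeof(char32_t));
    return 0;
}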
Can I just switch to [std::wchar_t] without any major problems?
No, it's not that simple.
The encoding of a wchar_t string is platform-dependent. Windows uses UTF-16. Linux usually uses UTF-32. (C++0x will mitigate this difference by introducing separate char16_t and char32_t types.)
If you need to support Unix-like systems, you don't have all the UTF-16 functions that Windows has, so you'd need to write your own _wfopen, etc.
Do you use any third-party libraries? Do they support wchar_t?
Although wide characters are commonly used for an in-memory representation, on-disk and on-the-Web formats are much more likely to be UTF-8 (or another char-based encoding) than UTF-16/32. You'd have to convert these.
You can't just search-and-replace char with wchar_t because C++ confounds "character" and "byte", and you have to determine which chars are characters and which chars are bytes.