How to print a uint32_t variable's value via the wprintf function? - c++

It is a well-known fact that to print the value of a variable whose type is one of the fixed-width integer types (like uint32_t), you need to include the cinttypes (in C++) or inttypes.h (in C) header and use the format-specifier macros like PRIu32. But how do you do the same thing when the wprintf function is used? In that case such a macro would have to expand to a string literal with the L prefix.

Whether this works actually depends on which C standard the compiler is using.
From this string literal reference:
Only two narrow or two wide string literals may be concatenated.
(until C99)
and
If one literal is unprefixed, the resulting string literal has the width/encoding specified by the prefixed literal. If the two string literals have different encoding prefixes, concatenation is implementation-defined. (since C99)
[Emphasis mine]
So if you're using an old compiler, or one that doesn't support the C99 standard (or later), it's not possible. Besides, the fixed-width integer types were standardized in C99, so the macros don't really exist for such old compilers, making the issue moot.
For more modern compilers that support C99 (or later) it's a non-issue: the string-literal concatenation works and the compiler turns the unprefixed string into a wide-character string, so doing e.g.
wprintf(L"Value = %" PRIu32 "\n", uint32_t_value);
will work fine.
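For instance, a minimal self-contained sketch (assuming a C99/C++11-conforming toolchain; the variable name is illustrative):
#include <cinttypes>  // PRIu32
#include <cstdint>    // std::uint32_t
#include <cwchar>     // std::wprintf

int main() {
    std::uint32_t uint32_t_value = 42;
    // PRIu32 expands to an unprefixed string literal, which is widened
    // when concatenated with the adjacent L"..." literals.
    std::wprintf(L"Value = %" PRIu32 L"\n", uint32_t_value);
}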
If you have a pre-C99 compiler that nevertheless provides the macros and the fixed-width integer types, you can use function-like macros to prepend the L prefix to the string literals. Something like
#define LL(s) L ## s
#define L(s) LL(s)
...
wprintf(L"Value = %" L(PRIu32) L"\n", uint32_t_value);
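The two macro levels matter: the outer L() expands its argument first, so PRIu32 is replaced by its string literal before the inner LL() pastes the L prefix onto it; pasting directly would try to form the single invalid token LPRIu32. An illustrative expansion, assuming PRIu32 expands to "u" on the platform at hand:
// L(PRIu32) -> LL("u") -> L"u"
One caveat, noted further down for MSVC's SCNd16: if the implementation defines the macro as several adjacent string literals rather than one, only the first literal gets the prefix and this trick breaks.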

Not sure where the problem is, but here (VS 2015) both
wprintf(L"AA %" PRIu32 L" BB", 123);
and
printf("AA %" PRIu32 " BB", 123);
compile correctly and give the following output:
AA 123 BB

Even if your compiler does not support concatenation of differently-prefixed literals, you can always widen a narrow one:
#define WIDE(X) WIDE2(X)
#define WIDE2(X) L##X
wprintf(L"%" WIDE(PRIu32), foo);
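A self-contained sketch of that approach (foo here is a hypothetical uint32_t variable):
#include <cinttypes>
#include <cstdint>
#include <cwchar>

#define WIDE(X) WIDE2(X)
#define WIDE2(X) L##X

int main() {
    std::uint32_t foo = 123;
    std::wprintf(L"%" WIDE(PRIu32) L"\n", foo);  // prints: 123
}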

A (weaker) alternative to using the macros from <inttypes.h> is to convert or cast the fixed-width type to an equivalent or larger standard type; unsigned long is guaranteed to be at least 32 bits wide, so the value is preserved.
wprintf(L"%lu\n", 0ul + some_uint32_t_value);
// or
wprintf(L"%lu\n", (unsigned long) some_uint32_t_value);

Related

__int64 to CString returns wrong values - C++ MFC

I want to convert a __int64 variable into a CString. The code is exactly this
__int64 i64TotalGB;
CString totalSpace;
i64TotalGB = 150;
printf("disk space: %I64d GB\n", i64TotalGB);
totalSpace.Format(_T("%I64d", i64TotalGB));
printf("totalSpace contains: %s", totalSpace);
the first printf prints
"disk space: 150GB"
which it's correct, but the second printf prints randomly high numbers like
"totalSpace contains: 298070026817519929"
I also tried to use an INT64 variable instead of a __int64 variable, but the result is the same. What can be the cause of this?
Here:
totalSpace.Format(_T("%I64d", i64TotalGB));
you're passing i64TotalGB as an argument to the _T() macro instead of passing it as the second argument to Format().
Try this:
totalSpace.Format(_T("%I64d"), i64TotalGB);
Having said that, thanks to MS's mess (ha) around character encodings, using _T here is not the right thing, as CString is composed of TCHAR, not _TCHAR. Taking that into account, you might as well use TEXT() instead of _T(), as it depends on UNICODE rather than _UNICODE:
totalSpace.Format(TEXT("%I64d"), i64TotalGB);
In addition, this line is wrong as it tries to pass an ATL CString as a char* (a.k.a. C-style string):
printf("totalSpace contains: %s", totalSpace);
For which the compiler gives this warning:
warning C4477: 'printf' : format string '%s' requires an argument of type 'char *', but variadic argument 1 has type 'ATL::CString'
While the memory layout of CString happens to make passing it the way you did work in practice, it is still formally undefined behavior. Use CString::GetString() to safeguard against it:
printf("totalSpace contains: %ls", totalSpace.GetString());
Note the %ls, as under my configuration totalSpace.GetString() returned a const wchar_t*. However, as "printf does not currently support output into a UNICODE stream.", the correct version of this line, which will support characters outside your current code page, is a call to wprintf() in the following manner:
wprintf(L"totalSpace contains: %s", totalSpace.GetString());
Having said ALL that, here's some general advice, regardless of the direct problem behind the question. The far better practice nowadays is slightly different altogether, and I quote from the respectable answer by @IInspectable below, saying that "generic-text mappings were relevant 2 decades ago".
What's the alternative? In the absence of a good enough reason, stick explicitly to CStringW (a Unicode string type with CRT support). Prefer the L character literal over the archaic data/text mappings that depend on whether the constant _UNICODE or _MBCS has been defined in your program. Likewise, the better practice is to use the wide-character versions of all API and language library calls, such as wprintf() instead of printf().
The bug is a result of numerous issues with the code, specifically these 2:
totalSpace.Format(_T("%I64d", i64TotalGB));
This uses the _T macro in a way it's not meant to be used: it should wrap a single string literal, but in the code it wraps the second argument as well.
printf("totalSpace contains: %s", totalSpace);
This assumes an ANSI-encoded character string, but passes a CString object, that can store both ANSI as well as Unicode encoded strings.
The recommended course of action is to drop generic-text mappings altogether, in favor of using Unicode (that's UTF-16LE on Windows) throughout [1]. The generic-text mappings were relevant 2 decades ago, to ease porting of Win9x code to the Windows NT based products.
To do this
Choose CStringW over CString.
Drop all occurrences of _T, TEXT, and _TEXT, and replace them with an L prefix.
Use the wide-character version of the Windows API, CRT, and C++ Standard Library.
The fixed code looks like this:
__int64 i64TotalGB;
CStringW totalSpace; // Use wide-character string
i64TotalGB = 150;
printf("disk space: %I64d GB\n", i64TotalGB);
totalSpace.Format(L"%I64d", i64TotalGB); // Use wide-character string literal
wprintf(L"totalSpace contains: %s", totalSpace.GetString()); // Use wide-character library
On an unrelated note, while it is technically safe to pass a CString object in place of a character pointer in a variable argument list, this is an implementation detail, and not formally documented to work. Call CString::GetString() if you care about correct code.
[1] Unless there is a justifiable reason to use a character encoding that uses char as its underlying type (like UTF-8 or ANSI). In that case you should still be explicit about it by using CStringA.
Try this:
totalSpace.Format(_T("%I64d"), i64TotalGB);

Using macros in printf function in VS2013 vs VS2017

I have defined this macro in my source code
#define UINT_08X_FORMAT "%08X"
I need to use the above in printf like this:
printf("Test - "UINT_08X_FORMAT"", 50);
It compiles and works fine in VS2013, whereas in VS2017 it throws the following compile error:
invalid literal suffix 'UINT_08X_FORMAT'; literal operator or literal
operator template 'operator ""UINT32_FORMAT' not found
How can I use the macro in printf?
Note: I don't want to change the macro definition, as it works fine with VS2013. I need a common solution that will work on both VS2013 and VS2017.
C++11 added support for user defined literals (UDL), which are triggered by adding a suffix to some other literal (in this case a string literal). You can overcome it by adding spaces around your macro name to force the newer C++ compiler to treat it as a separate token instead of a UDL suffix:
printf("Test - " UINT_08X_FORMAT "", 50);
See this note from http://en.cppreference.com/w/cpp/language/user_literal:
Since the introduction of user-defined literals, the code that uses
format macro constants for fixed-width integer types with no space
after the preceding string literal became invalid:
std::printf("%"PRId64"\n",INT64_MIN); has to be replaced by
std::printf("%" PRId64"\n",INT64_MIN);
Due to maximal munch, user-defined integer and floating point literals
ending in p, P, (since C++17) e and E, when followed by the operators
+ or -, must be separated from the operator with whitespace in the source.
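For reference, a self-contained sketch of the fixed call; the added spaces are the only change from the question's code (50u is used to match the unsigned %X conversion):
#include <cstdio>

#define UINT_08X_FORMAT "%08X"

int main() {
    // The spaces make UINT_08X_FORMAT a separate preprocessing token, so
    // no user-defined-literal suffix is parsed. Prints: Test - 00000032
    std::printf("Test - " UINT_08X_FORMAT "\n", 50u);
}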

How to convert uint16_t to a wide string (std::wstring)

My question is related to this older question
Format specifiers for uint8_t, uint16_t, ...?
Just to recap, the original question asked how to use the specifiers for uint8_t, uint16_t, uint32_t and uint64_t with scanf.
The answer to the question was as follows:
sscanf (line, "Value of integer: %" SCNd32 "\n", &my_integer);
But does anyone know how to do this but resulting in a wide string?
ie
std::wstring line;
swscanf (line.c_str(), L"Value of integer: %" SCNd16 L"\n", &my_integer);
The above line gives me a concatenation error. I believe this is because SCNd16 is just not intended for a wide string?
Currently my solution is to create the std::string as in the original answer and then convert it to a wide string:
sscanf_s(line.c_str(), "%" SCNd16 "\n", &sixteenBitInteger)
// code here to check for EOF and EINVAL
//then I convert it
typedef std::codecvt_utf8<wchar_t> ConverterType;
std::wstring_convert<ConverterType, wchar_t> converter;
std::wstring convertedString = converter.from_bytes(line);
but it's rather ugly and I am sure there must be a more polished way to do this conversion?
If it helps to understand my use, I am using the uint16_t type to store the port number for a web server but I want to be able to convert it to a wide string as that is the expected display type. I am also using C++11 if that changes the answer at all and I do have access to the boost libraries although I would rather not use them.
This is a VS2013 compiler bug. Since it has been closed as "fixed", maybe it'll work in VS2015 (I don't have the preview installed to give it a try).
The line of code you have
swscanf (line.c_str(), L"Value of integer: %" SCNd16 L"\n", &my_integer);
is well formed, because even if SCNd16 expands to a string literal that lacks the L prefix, the standard says that if out of two adjacent string literals, one lacks an encoding prefix, it is treated as if it has the same encoding prefix as the other.
§2.14.5/14 [lex.string]
In translation phase 6 (2.2), adjacent string literals are concatenated. If both string literals have the same encoding-prefix, the resulting concatenated string literal has that encoding-prefix. If one string literal has no encoding-prefix, it is treated as a string literal of the same encoding-prefix as the other operand. ...
Typically, you can use the preprocessor to widen strings by using token concatenation. For instance, defining a set of macros like this
#define WIDEN_(x) L##x
#define WIDEN(x) WIDEN_(x)
and converting the offending line of code to
swscanf (line.c_str(), L"Value of integer: %" WIDEN(SCNd16) L"\n", &my_integer);
would fix the problem, but it doesn't on VS2013 because of an implementation detail. The SCNd16 macro actually expands into two separate string literals - "h" "d". So the above macro widens the first literal, but not the second, and you run into the same (bogus) error.
Your options are to either hardcode the string "hd" or go with the runtime conversion solution you've shown.
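A sketch of the hardcoding workaround; note that for an unsigned uint16_t the matching macro would really be SCNu16, whose expansion "hu" is what gets hardcoded below (an assumption about this implementation's <cinttypes>, and the sketch assumes uint16_t is unsigned short, as it is on Windows):
#include <cstdint>
#include <cwchar>
#include <string>

int main() {
    std::wstring line = L"Value of integer: 8080";
    std::uint16_t sixteenBitInteger = 0;
    // Hardcoding the wide conversion specifier sidesteps the VS2013
    // concatenation bug described above.
    std::swscanf(line.c_str(), L"Value of integer: %hu", &sixteenBitInteger);
    std::wprintf(L"port = %hu\n", sixteenBitInteger);
}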
A pure guess, as I don't have time at the moment to try it:
Can you use the preprocessor to token-paste the wide-string L prefix onto the front of the expanded SCNd16?

Why is there no ASCII or UTF-8 character literal in C11 or C++11?

Why is there no UTF-8 character literal in C11 or C++11 even though there are UTF-8 string literals? I understand that, generally-speaking, a character literal represents a single ASCII character which is identical to a single-octet UTF-8 code point, but neither C nor C++ says the encoding has to be ASCII.
Basically, if I read the standard right, there's no guarantee that '0' will represent the integer 0x30, yet u8"0" must represent the char sequence 0x30 0x00.
EDIT:
I'm aware that not every UTF-8 code point would fit in a char. Such a literal would only be useful for single-octet code points (i.e., ASCII), so calling it an "ASCII character literal" would be more fitting; the question still stands. I just chose to frame it in terms of UTF-8 because there are UTF-8 string literals. The only way I can imagine portably guaranteeing ASCII values would be to write a constant for each character, which wouldn't be so bad considering there are only 128, but still...
It is perfectly acceptable to write non-portable C code, and this is one of many good reasons to do so. Feel free to assume that your system uses ASCII or some superset thereof, and warn your users that they shouldn't try to run your program on an EBCDIC system.
If you are feeling very generous, you can encode a check. The gperf program is known to generate code that includes such a check.
_Static_assert('0' == 48, "must be ASCII-compatible");
Or, for pre-C11 compilers (the array gets an illegal negative size when the check fails, forcing a compile-time error):
extern int must_be_ascii_compatible['0' == 48 ? 1 : -1];
If you are on C11, you can use the u or U prefix on character constants, but not the u8 prefix...
/* This is useless, doesn't do what you want... */
_Static_assert(0, "this code is broken everywhere");
if (c == '々') ...
/* This works as long as wchar_t is UTF-16 or UTF-32 or UCS-2... */
/* Note: you shouldn't be using wchar_t, though... */
_Static_assert(__STDC_ISO_10646__, "wchar_t must be some form of Unicode");
if (c == L'々') ...
/* This works as long as char16_t is UTF-16 or UCS-2... */
_Static_assert(__STDC_UTF_16__, "char16_t must be UTF-16");
if (c == u'々') ...
/* This works as long as char32_t is UTF-32... */
_Static_assert(__STDC_UTF_32__, "char32_t must be UTF-32");
if (c == U'々') ...
There are some projects that are written in very portable C and have been ported to non-ASCII systems (example). This required a non-trivial amount of porting effort, and there's no real reason to make the effort unless you know you want to run your code on EBCDIC systems.
On standards: The people writing the C standard have to contend with every possible C implementation, including some downright bizarre ones. There are known systems where sizeof(char) == sizeof(long), CHAR_BIT != 8, integral types have trap representations, sizeof(void *) != sizeof(int *), sizeof(void *) != sizeof(void (*)()), va_list are heap-allocated, etc. It's a nightmare.
Don't beat yourself up trying to write code that will run on systems you've never even heard of, and don't search too hard for guarantees in the C standard.
For example, as far as the C standard is concerned, the following is a valid implementation of malloc (always reporting allocation failure is permitted):
void *malloc(size_t size) { return NULL; }
Note that while u8"..." constants are guaranteed to be UTF-8, u"..." and U"..." have no guarantees except that the encoding is 16 bits and 32 bits per code unit, respectively, and the actual encoding must be documented by the implementation.
Summary: Safe to assume ASCII compatibility in 2012.
A UTF-8 character literal would have to have variable length: for most code points it is not possible to store a single character in a char or wchar_t. So what type should it have? As we don't have variable-length types in C or C++, except for arrays of fixed-size types, the only reasonable type for it would be const char*; and C strings are required to be null-terminated, so it wouldn't change anything.
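A small illustration of the size point; the literal below is a single code point that occupies three UTF-8 code units (the array type is const char[] in C++11/14/17 and const char8_t[] since C++20, but the element size is 1 byte either way):
static_assert(sizeof(u8"々") == 4, "U+3005 is 3 UTF-8 code units plus the terminator");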
As for the edit:
Quote from the C++11 standard:
The glyphs for the members of the basic source character set are intended to identify characters from the subset of ISO/IEC 10646 which corresponds to the ASCII character set. However, because the mapping from source file characters to the source character set (described in translation phase 1) is specified as implementation-defined, an implementation is required to document how the basic source characters are represented in source files.
(footnote at 2.3.1).
I think that's a good reason for not guaranteeing it. Although, as you noted in a comment here, for most (or every) mainstream compiler, the ASCII-ness of character literals is guaranteed by the implementation.
For C++ this has been addressed by Evolution Working Group issue 119: Adding u8 character literals whose Motivation section says:
We have five encoding-prefixes for string-literals (none, L, u8, u, U)
but only four for character literals -- the missing one is u8. If the
narrow execution character set is not ASCII, u8 character literals
would provide a way to write character literals with guaranteed ASCII
encoding (the single-code-unit u8 encodings are exactly ASCII). Adding
support for these literals would add a useful feature and make the
language slightly more consistent.
EWG discussed the idea of adding u8 character literals in Rapperswil and accepted the change. This paper provides wording for that
extension.
This was incorporated into the working draft using the wording from N4267: Adding u8 character literals. We can find the wording in the latest draft standard at the time of writing, N4527; as section 2.14.3 says, they are limited to code points that fit into a single UTF-8 code unit:
A character literal that begins with u8, such as u8'w', is a character
literal of type char, known as a UTF-8 character literal. The value of
a UTF-8 character literal is equal to its ISO10646 code point value,
provided that the code point value is representable with a single
UTF-8 code unit (that is, provided it is a US-ASCII character). A
UTF-8 character literal containing multiple c-chars is ill-formed.
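Once that wording landed (it was adopted for C++17), a one-line sketch of the resulting guarantee; note that in C++20 the type of a u8 character literal changed from char to char8_t:
static_assert(u8'0' == 0x30, "single-code-unit u8 literals have their ASCII values");
This holds even on an implementation whose execution character set is EBCDIC, which is exactly the guarantee a plain '0' lacks.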
If you don't trust that your compiler will treat '0' as ASCII character 0x30, then you could use static_cast<char>(0x30) instead.
As you are aware, UTF-8-encoded characters may need several octets, and thus several chars, so the natural type for them is char[], which is indeed the type of a u8-prefixed string literal! So C11 is right on track here; it just sticks to its syntax conventions, using " for a string that must be used as an array of char, rather than your implied semantics-based proposal of using ' instead.
About "0" versus u8"0", you are reading it right: only the latter is guaranteed to be identical to { 0x30, 0 }, even on EBCDIC systems. By the way, the very fact that the former is not can be handled conveniently in your code if you pay attention to the __STDC_MB_MIGHT_NEQ_WC__ predefined macro.
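A sketch of how one might act on that macro; it is defined (to 1) on implementations where a basic-set character may have a different code as a char than as a wchar_t:
#ifdef __STDC_MB_MIGHT_NEQ_WC__
#error "narrow and wide codes of basic characters may differ on this target"
#endif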

Implementation of string literal concatenation in C and C++

AFAIK, this question applies equally to C and C++
Step 6 of the "translation phases" specified in the C standard (5.1.1.2 in the draft C99 standard) states that adjacent string literals have to be concatenated into a single literal. I.e.
printf("helloworld.c" ": %d: Hello "
"world\n", 10);
Is equivalent (syntactically) to:
printf("helloworld.c: %d: Hello world\n", 10);
However, the standard doesn't seem to specify which part of the compiler has to handle this - should it be the preprocessor (cpp) or the compiler itself. Some online research tells me that this function is generally expected to be performed by the preprocessor (source #1, source #2, and there are more), which makes sense.
However, running cpp in Linux shows that cpp doesn't do it:
eliben@eliben-desktop:~/test$ cat cpptest.c
int a = 5;
"string 1" "string 2"
"string 3"
eliben@eliben-desktop:~/test$ cpp cpptest.c
# 1 "cpptest.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "cpptest.c"
int a = 5;
"string 1" "string 2"
"string 3"
So, my question is: where should this feature of the language be handled, in the preprocessor or the compiler itself?
Perhaps there's no single good answer. Heuristic answers based on experience, known compilers, and general good engineering practice will be appreciated.
P.S. If you're wondering why I care about this... I'm trying to figure out whether my Python-based C parser should handle string-literal concatenation (which it doesn't do at the moment), or leave it to cpp, which it assumes runs before it.
The standard doesn't specify a preprocessor vs. a compiler; it just specifies the phases of translation you already noted. Traditionally, phases 1 through 4 were done in the preprocessor, phases 5 through 7 in the compiler, and phase 8 in the linker, but none of that is required by the standard.
Unless the preprocessor is specified to handle this, it's safe to assume it's the compiler's job.
Edit:
Your "I.e." link at the beginning of the post answers the question:
Adjacent string literals are concatenated at compile time; this allows long strings to be split over multiple lines, and also allows string literals resulting from C preprocessor defines and macros to be appended to strings at compile time...
In the ANSI C standard, this detail is covered in section 5.1.1.2, item (6):
5.1.1.2 Translation phases
...
4. Preprocessing directives are executed and macro invocations are expanded. ...
5. Each source character set member and escape sequence in character constants and string literals is converted to a member of the execution character set.
6. Adjacent character string literal tokens are concatenated and adjacent wide string literal tokens are concatenated.
The standard does not define that the implementation must use a pre-processor and compiler, per se.
Step 4 is clearly a preprocessor responsibility.
Step 5 requires that the "execution character set" be known. This information is also required by the compiler. It is easier to port the compiler to a new platform if the preprocessor does not contain platform dependencies, so the tendency is to implement step 5, and thus step 6, in the compiler.
I would handle it in the token-scanning part of the parser, i.e. in the compiler; that seems more logical. The preprocessor does not need to know the "structure" of the language, and in fact it usually ignores it, so that macros can generate uncompilable code. It handles nothing more than what it is entitled to handle by directives that are specifically addressed to it (# ...), and their "consequences" (for example, a #define x h makes the preprocessor replace occurrences of x with h).
There are tricky rules for how string literal concatenation interacts with escape sequences.
Suppose you have
const char x1[] = "a\15" "4";
const char y1[] = "a\154";
const char x2[] = "a\r4";
const char y2[] = "al";
then x1 and x2 must wind up equal according to strcmp, and the same for y1 and y2. (This is what Heath is getting at in quoting the translation steps - escape conversion happens before string constant concatenation.) There's also a requirement that if any of the string constants in a concatenation group has an L or U prefix, you get a wide or Unicode string. Put it all together and it winds up being significantly more convenient to do this work as part of the "compiler" rather than the "preprocessor."
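A quick self-check of that claim as a sketch; both asserts should hold on any conforming implementation:
#include <cassert>
#include <cstring>

int main() {
    const char x1[] = "a\15" "4";  // "\15" is octal for 13, i.e. '\r'
    const char y1[] = "a\154";     // "\154" is octal for 108, i.e. 'l'
    const char x2[] = "a\r4";
    const char y2[] = "al";
    assert(std::strcmp(x1, x2) == 0);  // escapes convert before concatenation
    assert(std::strcmp(y1, y2) == 0);
}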