Using macros in printf function in VS2013 vs VS2017 - c++

I have defined this macro in my source code
#define UINT_08X_FORMAT "%08X"
I need to use the above in printf like this:
printf("Test - "UINT_08X_FORMAT"", 50);
It compiles and works fine in VS2013, whereas in VS2017 it throws the following compile error:
invalid literal suffix 'UINT_08X_FORMAT'; literal operator or literal
operator template 'operator ""UINT_08X_FORMAT' not found
How can I use the macro in printf?
Note: I don't want to change the macro definition, as it works fine with
VS2013. I need a common solution that will work on both VS2013 and
VS2017.

C++11 added support for user-defined literals (UDLs), which are triggered by adding a suffix to some other literal (in this case a string literal). You can overcome the error by adding spaces around your macro name, forcing the newer compiler to treat it as a separate token rather than a UDL suffix:
printf("Test - " UINT_08X_FORMAT "", 50);
See this note from http://en.cppreference.com/w/cpp/language/user_literal:
Since the introduction of user-defined literals, the code that uses
format macro constants for fixed-width integer types with no space
after the preceding string literal became invalid:
std::printf("%"PRId64"\n",INT64_MIN); has to be replaced by
std::printf("%" PRId64 "\n", INT64_MIN);
Due to maximal munch, user-defined integer and floating point literals
ending in p, P, (since C++17) e and E, when followed by the operators
+ or -, must be separated from the operator with whitespace in the source

Related

define string at compiler options

Using Tornado 2.2.1 GNU
at C/C++ compiler options I'm trying to define string as follow:
-DHELLO="Hello" and it doesn't work (it also failed for -DHELLO=\"Hello\" and for -DHELLO=\\"Hello\\" which works in other platforms)
define value -DVALUE=12 works without issue.
Does anybody know the proper way to define a string in Tornado?
The problem with such a macro is that it normally isn't a string (in the C/C++ sense), just a preprocessor symbol. Numbers work because a preprocessor number can be used in C/C++ as-is, but if you want to convert a string symbol to a C/C++ string (besides adding the escaped quotes), you need to "stringize" it.
So, this should work (without extra escaped quotes):
#define _STRINGIZE(x) #x
#define STRINGIZE(x) _STRINGIZE(x)
string s = STRINGIZE(HELLO);
(note the double expansion to get the value of the macro stringized, i.e. "Hello", instead of the macro name itself, i.e. "HELLO")

How to print uint32_t variables value via wprintf function?

It is a well-known fact that to print the values of variables whose type is one of the fixed-width integer types (like uint32_t), you need to include the cinttypes (in C++) or inttypes.h (in C) header and use format-specifier macros like PRIu32. But how do you do the same thing when the wprintf function is used? In that case, such a macro would have to expand to a string literal with an L prefix.
Whether this works actually depends on which C standard the compiler is using.
From this string literal reference
Only two narrow or two wide string literals may be concatenated.
(until C99)
and
If one literal is unprefixed, the resulting string literal has the width/encoding specified by the prefixed literal. If the two string literals have different encoding prefixes, concatenation is implementation-defined. (since C99)
[Emphasis mine]
So if you're using an old compiler, or one that doesn't support the C99 standard (or later), it's not possible. Besides, fixed-width integer types were standardized in C99, so the macros don't really exist for such old compilers, making the issue moot.
For more modern compilers which support C99 and later, it's a non-issue since the string-literal concatenation will work and the compiler will turn the non-prefixed string into a wide-character string, so doing e.g.
wprintf(L"Value = %" PRIu32 "\n", uint32_t_value);
will work fine.
If you have a pre-C99 compiler but still have the macros and fixed-width integer types, you can use function-like macros to prepend the L prefix to the string literals. Something like:
#define LL(s) L ## s
#define L(s) LL(s)
...
wprintf(L"Value = %" L(PRIu32) L"\n", uint32_t_value);
Not sure where the problem is, but here (VS 2015) both
wprintf(L"AA %" PRIu32 L" BB", 123);
and
printf("AA %" PRIu32 " BB", 123);
compile correctly and give following output:
AA 123 BB
Even if your compiler does not support concatenation of differently-prefixed literals, you can always widen a narrow one:
#define WIDE(X) WIDE2(X)
#define WIDE2(X) L##X
wprintf(L"%" WIDE(PRIu32), foo);
A (weaker) alternative to using the macros from <inttypes.h> is to convert/cast the fixed-width type to an equivalent or larger standard type.
wprintf(L"%lu\n", 0ul + some_uint32_t_value);
// or
wprintf(L"%lu\n", (unsigned long) some_uint32_t_value);

What are the definitions for valid and invalid pp-tokens?

I want to extensively use the ##-operator and enum magic to handle a huge bunch of similar access-operations, error handling and data flow.
If applying the ## and # preprocessor operators results in an invalid pp-token, the behavior is undefined in C.
The order of preprocessor operations in general is not defined (*) in C90 (see The token pasting operator). Now in some cases it happens (said so in different sources, including the MISRA Committee and the referenced page) that the order in which multiple ##/# operators are applied influences the occurrence of undefined behavior. But I have a really hard time understanding the examples from these sources and pinning down the common rule.
So my questions are:
What are the rules for valid pp-tokens?
Are there difference between the different C and C++ Standards?
My current problem: Is the following legal with all 2 operator orders?(**)
#define test(A) test_## A ## _THING
int test(0001) = 2;
Comments:
(*) I don't use "is undefined" because IMHO this has nothing to do with undefined behavior yet, but rather with unspecified behavior. Applying more than one ## or # operator does not in general render the program erroneous. There is obviously an order; we just can't predict which one, so the order is unspecified.
(**) This is no actual application for the numbering. But the pattern is equivalent.
What are the rules for valid pp-tokens?
These are spelled out in the respective standards; C11 §6.4 and C++11 §2.4. In both cases, they correspond to the production preprocessing-token. Aside from pp-number, they shouldn't be too surprising. The remaining possibilities are identifiers (including keywords), "punctuators" (in C++, preprocessing-op-or-punc), string and character literals, and any single non-whitespace character which doesn't match any other production.
With a few exceptions, any sequence of characters can be decomposed into a sequence of valid preprocessing-tokens. (One exception is unmatched quotes and apostrophes: a single quote or apostrophe is not a valid preprocessing-token, so a text including an unterminated string or character literal cannot be tokenised.)
In the context of the ## operator, though, the result of the concatenation must be a single preprocessing-token. So an invalid concatenation is a concatenation whose result is a sequence of characters which comprise multiple preprocessing-tokens.
Are there differences between C and C++?
Yes, there are slight differences:
C++ has user defined string and character literals, and allows "raw" string literals. These literals will be tokenized differently in C, so they might be multiple preprocessing-tokens or (in the case of raw string literals) even invalid preprocessing-tokens.
C++ includes the symbols ::, .* and ->*, all of which would be tokenised as two punctuator tokens in C. Also, in C++, some things which look like keywords (eg. new, delete) are part of preprocessing-op-or-punc (although these symbols are valid preprocessing-tokens in both languages.)
C allows hexadecimal floating point literals (eg. 1.1p-3), which are not valid preprocessing-tokens in C++.
C++ allows apostrophes to be used in integer literals as separators (1'000'000'000). In C, this would probably result in unmatched apostrophes.
There are minor differences in the handling of universal character names (eg. \u0234).
In C++, <:: will be tokenised as <, :: unless it is followed by : or >. (<::: and <::> are tokenised normally, using the longest-match rule.) In C, there are no exceptions to the longest-match rule; <:: is always tokenised using the longest-match rule, so the first token will always be <:.
Is it legal to concatenate test_, 0001, and _THING, even though concatenation order is unspecified?
Yes, that is legal in both languages.
test_ ## 0001 => test_0001 (identifier)
test_0001 ## _THING => test_0001_THING (identifier)
0001 ## _THING => 0001_THING (pp-number)
test_ ## 0001_THING => test_0001_THING (identifier)
What are examples of invalid token concatenation?
Suppose we have
#define concat3(a, b, c) a ## b ## c
Now, the following are invalid regardless of concatenation order:
concat3(., ., .)
.. is not a token even though ... is. But the concatenation must proceed in some order, and .. would be a necessary intermediate value; since that is not a single token, the concatenation would be invalid.
concat3(27,e,-7)
Here, -7 is two tokens, so it cannot be concatenated.
And here is a case in which concatenation order matters:
concat3(27e, -, 7)
If this is concatenated left-to-right, it results in 27e- ## 7; both 27e- and the final 27e-7 are valid pp-numbers. But concatenated right-to-left, - ## 7 comes first, and (as above) -7 is not a single token, so the concatenation is invalid.
What exactly is a pp-number?
In general terms, pp-numbers form a superset of the tokens that might be converted into (single) numeric literals; a pp-number might also turn out to be invalid as a literal. The definition is intentionally broad, partly in order to allow (some) token concatenations, and partly to insulate the preprocessor from periodic changes in numeric formats. The precise definition can be found in the respective standards, but informally a token is a pp-number if:
It starts with a decimal digit or a period (.) followed by a decimal digit.
The rest of the token is letters, numbers and periods, possibly including sign characters (+, -) if preceded by an exponent symbol. The exponent symbol can be E or e in both languages; and also P and p in C since C99.
In C++, a pp-number can also include (but not start with) an apostrophe followed by a letter or digit.
Note: Above, letter includes underscore. Also, universal character names can be used (except following an apostrophe in C++).
Once preprocessing is terminated, all pp-numbers will be converted to numeric literals if possible. If the conversion is not possible (because the token doesn't correspond to the syntax for any numeric literal), the program is invalid.
#define test(A) test_## A ## _THING
int test(0001) = 2;
This is legal with both LTR and RTL evaluation, since both test_0001 and 0001_THING are valid preprocessing-tokens. The former is an identifier, while the latter is a pp-number; pp-numbers are not checked for suffix correctness until a later stage of compilation; think of e.g. 0001u, an unsigned octal literal.
A few examples to show that the order of evaluation does matter:
#define paste2(a,b) a##b
#define paste(a,b) paste2(a,b)
#if defined(LTR)
#define paste3(a,b,c) paste(paste(a,b),c)
#elif defined(RTL)
#define paste3(a,b,c) paste(a,paste(b,c))
#else
#define paste3(a,b,c) a##b##c
#endif
double a = paste3(1,.,e3), b = paste3(1e,+,3); // OK LTR, invalid RTL
#define stringify2(x) #x
#define stringify(x) stringify2(x)
#define stringify_paste3(a,b,c) stringify(paste3(a,b,c))
char s[] = stringify_paste3(%:,%,:); // invalid LTR, OK RTL
If your compiler uses a consistent order of evaluation (either LTR or RTL) and presents an error on generation of an invalid pp-token, then precisely one of these lines will generate an error. Naturally, a lax compiler could well allow both, while a strict compiler might allow neither.
The second example is rather contrived; because of the way the grammar is constructed, it's very difficult to find a pp-token that is valid when built RTL but not when built LTR.
There are no significant differences between C and C++ in this regard; the two standards have identical language (up to section headers) describing the process of macro replacement. The only way the language could influence the process would be in the valid preprocessing-tokens: C++ (especially recently) has more forms of valid preprocessing-tokens, such as user-defined string literals.

Using MSVC preprocessor 'charizing' operator in Clang

I've got the following code that someone working on MSVC has given to me:
#define MAP1(x, y) map[#x] = ##y;
I'm on Xcode, using Clang, and from various google searches I've found that this is known as a 'charizing operator', and is specific to MSVC's preprocessor. Is there a way of emulating the functionality of this operator while using Clang? I've tried removing the # but got the following error message:
Assigning to 'int' from incompatible type 'const char[2]'
Would an explicit cast to 'int' work or is the charizing operator doing something different?
The stringizing operator (standard C++) converts a into "a", so the charizing operator sounds like it turns a into 'a'. You can, in the simple cases, get 'a' from "a" by taking the first character.
#define MAP1(x, y) map[#x] = static_cast<const char(&)[2]>(#y)[0];
The static_cast to const char(&)[2] ensures you get a compile-time error if you don't get a string of length 1, which is two characters if you count the trailing '\0'. A plain #y[0] would silently take the first character, regardless of the string's length.
Did you try something like #y[0]? Basically, "stringify y and take the first char" :-)
Other than that, since from the looks of it the generated statements are executed at runtime anyway, you can just stringify y, pass it to a function and have that function return the right thing.

Implementation of string literal concatenation in C and C++

AFAIK, this question applies equally to C and C++
Step 6 of the "translation phases" specified in the C standard (5.1.1.2 in the draft C99 standard) states that adjacent string literals have to be concatenated into a single literal. I.e.
printf("helloworld.c" ": %d: Hello "
"world\n", 10);
Is equivalent (syntactically) to:
printf("helloworld.c: %d: Hello world\n", 10);
However, the standard doesn't seem to specify which part of the compiler has to handle this - should it be the preprocessor (cpp) or the compiler itself. Some online research tells me that this function is generally expected to be performed by the preprocessor (source #1, source #2, and there are more), which makes sense.
However, running cpp in Linux shows that cpp doesn't do it:
eliben#eliben-desktop:~/test$ cat cpptest.c
int a = 5;
"string 1" "string 2"
"string 3"
eliben#eliben-desktop:~/test$ cpp cpptest.c
# 1 "cpptest.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "cpptest.c"
int a = 5;
"string 1" "string 2"
"string 3"
So, my question is: where should this feature of the language be handled, in the preprocessor or the compiler itself?
Perhaps there's no single good answer. Heuristic answers based on experience, known compilers, and general good engineering practice will be appreciated.
P.S. If you're wondering why I care about this... I'm trying to figure out whether my Python based C parser should handle string literal concatenation (which it doesn't do, at the moment), or leave it to cpp which it assumes runs before it.
The standard doesn't specify a preprocessor vs. a compiler; it just specifies the phases of translation you already noted. Traditionally, phases 1 through 4 were in the preprocessor, phases 5 through 7 in the compiler, and phase 8 the linker, but none of that is required by the standard.
Unless the preprocessor is specified to handle this, it's safe to assume it's the compiler's job.
Edit:
Your "I.e." link at the beginning of the post answers the question:
Adjacent string literals are concatenated at compile time; this allows long strings to be split over multiple lines, and also allows string literals resulting from C preprocessor defines and macros to be appended to strings at compile time...
In the ANSI C standard, this detail is covered in section 5.1.1.2, item (6):
5.1.1.2 Translation phases
...
4. Preprocessing directives are executed and macro invocations are expanded. ...
5. Each source character set member and escape sequence in character constants and string literals is converted to a member of the execution character set.
6. Adjacent character string literal tokens are concatenated and adjacent wide string literal tokens are concatenated.
The standard does not define that the implementation must use a pre-processor and compiler, per se.
Step 4 is clearly a preprocessor responsibility.
Step 5 requires that the "execution character set" be known. This information is also required by the compiler. It is easier to port the compiler to a new platform if the preprocessor does not contain platform dependencies, so the tendency is to implement step 5, and thus step 6, in the compiler.
I would handle it in the token-scanning part of the parser, i.e. in the compiler. That seems more logical. The preprocessor does not need to know the "structure" of the language, and in fact it usually ignores it, which is why macros can generate uncompilable code. It handles nothing more than the directives specifically addressed to it (# ...) and their "consequences" (for example, a #define x h makes the preprocessor replace occurrences of x with h).
There are tricky rules for how string literal concatenation interacts with escape sequences.
Suppose you have
const char x1[] = "a\15" "4";
const char y1[] = "a\154";
const char x2[] = "a\r4";
const char y2[] = "al";
then x1 and x2 must wind up equal according to strcmp, and the same for y1 and y2. (This is what Heath is getting at in quoting the translation steps - escape conversion happens before string constant concatenation.) There's also a requirement that if any of the string constants in a concatenation group has an L or U prefix, you get a wide or Unicode string. Put it all together and it winds up being significantly more convenient to do this work as part of the "compiler" rather than the "preprocessor."