C++ Preprocessor Standard Behaviour - c++

I'm studying the C++ standard on the exact behaviour of the preprocessor (I need to implement some sort of C++ preprocessor). From what I understand, the example below, which I made up to aid my understanding, should be valid:
#define dds(x) f(x,
#define f(a,b) a+b
dds(eoe)
su)
I expect the first function-like macro invocation, dds(eoe), to be replaced by f(eoe, (note the comma within the replacement list), which is then treated as f(eoe,su) when the input is rescanned.
But a test with VC++2010 gave me this (I told VC++ to output the preprocessed file):
eoe+et_leoe+et_l
su)
This is counter-intuitive and obviously incorrect. Is it a bug in VC++2010 or a misunderstanding of the C++ standard on my part? In particular, is it incorrect to put a comma at the end of the replacement list like I did? My understanding of the C++ standard grammar is that any preprocessing tokens are allowed there.
EDIT:
I don't have GCC or other versions of VC++. Could someone help me verify this with those compilers?

My answer is valid for the C preprocessor, but according to Is a C++ preprocessor identical to a C preprocessor?, the differences are not relevant for this case.
From C, A Reference Manual, 5th edition:
When a functionlike macro call is encountered, the entire macro call is
replaced, after parameter processing, by a copy of the body. Parameter
processing proceeds as follows. Actual argument token strings are
associated with the corresponding formal parameter names. A copy of
the body is then made in which every occurrence of a formal parameter
name is replaced by a copy of the actual parameter token sequence
associated with it. This copy of the body then replaces the macro
call.
[...] Once a macro call has been expanded, the scan for macro calls
resumes at the beginning of the expansion so that names of macros may
be recognized within the expansion for the purpose of further macro
replacement.
Note the words "within the expansion". Taken literally, they would make your example invalid (but see the reasoning further down, which reaches the opposite conclusion). Now, combine it with this:
[...] The macro is invoked by writing its name, a left parenthesis,
then one actual argument token sequence for each formal parameter,
then a right parenthesis. The actual argument token sequences are
separated by commas.
Basically, it all boils down to whether the preprocessor will rescan for further macro invocations only within the previous expansion, or if it will keep reading tokens that show up even after the expansion.
This may be hard to think about, but I believe that what should happen with your example is that the macro name f is recognized during rescanning, and since subsequent token processing reveals a macro invocation for f(), your example is correct and should output what you expect. GCC and clang give the correct output, and according to this reasoning, this would also be valid (and yield equivalent outputs):
#define dds f
#define f(a,b) a+b
dds(eoe,su)
And indeed, the preprocessing output is the same in both examples. As for the output you get with VC++, I'd say you found a bug.
This is consistent with C99 section 6.10.3.4, as well as C++ standard section 16.3.4, Rescanning and further replacement:
After all parameters in the replacement list have been substituted and # and ##
processing has taken place, all placemarker preprocessing tokens are removed. Then, the
resulting preprocessing token sequence is rescanned, along with all subsequent
preprocessing tokens of the source file, for more macro names to replace.
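As a quick check of that rescanning rule, here is a minimal sketch that turns the question's macros into something a compiler can verify. The constants eoe and su and the name sum are made up for illustration. It assumes a preprocessor that follows the quoted rule (GCC and Clang, per the above); a preprocessor behaving like the VC++ 2010 output in the question would presumably fail to compile it.

```cpp
// The two macros from the question.
#define dds(x) f(x,
#define f(a,b) a+b

// Hypothetical values so that the expansion can be checked by the compiler.
constexpr int eoe = 2;
constexpr int su = 3;

// dds(eoe) is replaced by `f(eoe,`. Rescanning continues into the tokens
// that follow the invocation, so `f(eoe, su)` is then replaced by `eoe+su`.
constexpr int sum = dds(eoe)
su);

static_assert(sum == 5, "dds(eoe) su) should expand to eoe+su");

// The variant from this answer: an object-like macro whose expansion is the
// name of a function-like macro, completed by the tokens that follow it.
#undef dds
#define dds f
static_assert(dds(eoe,su) == 5, "dds(eoe,su) should expand to eoe+su");
```

If the translation unit compiles, both forms expanded to eoe+su.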

To the best of my understanding there is nothing in the [cpp.subst/rescan] portions of the standard that makes what you do illegal, and clang and gcc are right in expanding it as eoe+su, and the MSC (Visual C++) behaviour has to be reported as a bug.
I could not make your original example work with MSC, but I did find an ugly workaround using variadic macros, which you may or may not find helpful:
#define f(a,b) (a+b
#define dds(...) f(__VA_ARGS__)
It is expanded as:
(eoe+
su)
Of course, this won't work with gcc and clang.

Well, the problem I see is that the preprocessor does the following:
dds(x) becomes f(x,
However, f is defined as well (even though it is defined as f(a,b)), so f(x, expands to x+ garbage.
So dds(x) finally transforms into x + garbage (because the expansion invoked f(something, ).
Your dds(eoe) actually expands into a+b where a is eoe and b is et_l.
And it does that twice for whatever reason :).
The scenario you made up is compiler specific; it depends on how the preprocessor chooses to handle the expansion of the defines.

Related

Can a preprocessor macro expand just some pasted parameters?

I know that in expanding a function-like preprocessor macro, the # and ## tokens in the top-level substitution list essentially act "before" any macro expansions on the argument. For example, given
#define CONCAT_NO_EXPAND(x,y,z) x ## y ## z
#define EXPAND_AND_CONCAT(x,y,z) CONCAT_NO_EXPAND(x,y,z)
#define A X
#define B Y
#define C Z
then CONCAT_NO_EXPAND(A,B,C) is the pp-token ABC, and EXPAND_AND_CONCAT(A,B,C) is the pp-token XYZ.
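A minimal sketch that checks both claims at compile time. STR, STR_RAW and same are hypothetical helpers added only for the check; they are not part of the question.

```cpp
// Macros from the question.
#define CONCAT_NO_EXPAND(x,y,z) x ## y ## z
#define EXPAND_AND_CONCAT(x,y,z) CONCAT_NO_EXPAND(x,y,z)
#define A X
#define B Y
#define C Z

// Hypothetical helpers: STR fully expands its argument, then STR_RAW
// turns the resulting tokens into a string literal.
#define STR_RAW(x) #x
#define STR(x) STR_RAW(x)

// Compile-time string comparison (recursive so that it is valid C++11).
constexpr bool same(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || same(a + 1, b + 1));
}

// x, y and z are operands of ##, so A, B and C are pasted unexpanded.
static_assert(same(STR(CONCAT_NO_EXPAND(A,B,C)), "ABC"), "pasted before expansion");

// Here x, y and z are plain parameters, so they expand to X, Y and Z first.
static_assert(same(STR(EXPAND_AND_CONCAT(A,B,C)), "XYZ"), "expanded, then pasted");
```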
But what if I want to define a macro that expands just some of its arguments before pasting? For example, I would like a macro that allows only the middle of three arguments to expand, then pastes it together with an exact unexpanded prefix and an exact unexpanded suffix, even if the prefix or suffix is the identifier of an object-like macro. That is, if again we have
#define MAGIC(x,y,z) /* What here? */
#define A X
#define B Y
#define C Z
then MAGIC(A,B,C) is AYC.
A simple attempt like
#define EXPAND(x) x
#define MAGIC(x,y,z) x ## EXPAND(y) ## z
results in the error "pasting ")" and "C" does not give a valid preprocessing token". This makes sense (and I assume it's also producing the unwanted token AEXPAND).
Is there any way to get that sort of result using just standard, portable preprocessor rules? (No extra code-generating or -modifying tools.)
If not, maybe a way that works on most common implementations? Here Boost.PP would be fair game, even if it involves some compiler-specific tricks or workarounds under the hood.
If it makes any difference, I'm most interested in the preprocessor steps as defined in C++11 and C++17.
Here's a solution:
#define A X
#define B Y
#define C Z
#define PASTE3(q,r,s) q##r##s
#define MAGIC(x,y,z,...) PASTE3(x##__VA_ARGS__,y,__VA_ARGS__##z)
MAGIC(A,B,C,)
Note that the invocation "requires" another argument (see below for why); but:
MAGIC(A,B,C) here is compliant in C++20
MAGIC(A,B,C) will "work" in many C++11/C++17 preprocessors (e.g., GNU/Clang), but that is an extension, not C++11/C++17 compliant behavior
I know that in expanding a function-like preprocessor macro, the # and ## tokens in the top-level substitution list essentially act "before" any macro expansions on the argument.
To be more precise, there are four steps to macro expansion:
argument identification
argument substitution
stringification and pasting (in an unspecified order)
rescan and further replacement
Argument identification associates parameters in the macro definition with arguments in an invocation. In this case, x associates with A, y with B, z with C, and ... with a "placemarker" (an abstract empty value associated with a parameter whose argument has no tokens). Before C++20, a macro defined with ... requires at least one argument for it in the invocation (possibly an empty one); since C++20, which also added the __VA_OPT__ feature, supplying an argument for the ... in an invocation is optional.
Argument substitution is the step where arguments are expanded. Specifically, what happens here is that for each parameter in the macro's replacement list (here, PASTE3(x##__VA_ARGS__,y,__VA_ARGS__##z)), where said parameter does not participate in a paste or stringification, the associated argument is fully expanded as if it appeared outside of an invocation; then, all mentions of that parameter in the replacement list that do not participate in stringification and paste are replaced with the expanded result. For example, at this step for the MAGIC(A,B,C,) invocation, y is the only mentioned qualifying parameter, so B is expanded producing Y; at that point we get PASTE3(x##__VA_ARGS__,Y,__VA_ARGS__##z).
The next step applies the paste and stringification operators in no particular order. Placemarkers are needed here specifically because you want to expand the middle and not the ends, and you don't want extra stuff; i.e., to get A to not expand to X, and to stay A (as opposed to changing to "A"), you need to avoid argument substitution specifically. Argument substitution is avoided in only two ways: pasting or stringifying. Since stringification doesn't give the token you want, we have to paste. And since you want that token to stay the same as what you had, you need to paste it to a placemarker (which means you need one to paste to, which is why there's another parameter).
Once this macro applies the pastes to the placemarkers, you wind up with PASTE3(A,Y,C); then there is the rescan and further replacement step, during which PASTE3 is identified as a macro invocation. Fast forwarding: since PASTE3 pastes all of its arguments, argument substitution doesn't expand any of them; the pastes are done in "some order" and we wind up with AYC.
As a final note, in this solution I'm using a variable argument to produce the placemarker token precisely because it allows invocations of the form MAGIC(A,B,C) in at least C++20. I'm left-pasting that to z because that makes the addition at least potentially useful for something else (MAGIC(A,B,C,_) would use _ as a "delimiter" to produce A_Y_C).
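The walk-through above can be checked with a short sketch. STR, STR_RAW and same are hypothetical helpers added here only to compare the resulting token against a string; the four-argument invocations are used because they are valid in C++11 through C++20.

```cpp
#define A X
#define B Y
#define C Z
#define PASTE3(q,r,s) q##r##s
#define MAGIC(x,y,z,...) PASTE3(x##__VA_ARGS__,y,__VA_ARGS__##z)

// Hypothetical helpers: expand the argument, then stringify the result.
#define STR_RAW(x) #x
#define STR(x) STR_RAW(x)

// Compile-time string comparison (recursive so that it is valid C++11).
constexpr bool same(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || same(a + 1, b + 1));
}

// Empty fourth argument: x and z are pasted to a placemarker and stay
// unexpanded, while y is expanded to Y before PASTE3 joins the pieces.
static_assert(same(STR(MAGIC(A,B,C,)), "AYC"), "only the middle argument expands");

// A non-empty fourth argument acts as a delimiter on both sides of y.
static_assert(same(STR(MAGIC(A,B,C,_)), "A_Y_C"), "delimiter variant");
```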

Why won't my variadic macro accept no arguments correctly?

Overloading Macro on Number of Arguments
https://codecraft.co/2014/11/25/variadic-macros-tricks/
I've been looking at the two links above, trying to get the following code to work:
#define _GET_NUMBER(_0, _1, _2, _3, _4, _5, NAME, ...) NAME
#define OUTPUT_ARGS_COUNT(...) _GET_NUMBER(_0, ##__VA_ARGS__, 5, 4, 3, 2, 1, 0)
...
cout << OUTPUT_ARGS_COUNT("HelloWorld", 1.2) << endl;
cout << OUTPUT_ARGS_COUNT("HelloWorld") << endl;
cout << OUTPUT_ARGS_COUNT() << endl;
This compiles, runs, and gives the following output:
2
1
1
I cannot for the life of me figure out why the call OUTPUT_ARGS_COUNT() is giving me 1 instead of 0. I have an OK understanding of the code I'm trying to use, but it's still a tad Greek to me, so I guess it's possible that I'm not applying something correctly, despite the fact that I literally copied and pasted the example code from the link on Stack Overflow.
I'm compiling using g++ 5.4.0 20160609.
Any ideas or additional resources you can point me to would be greatly appreciated.
You can see at http://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html:
Second, the ‘##’ token paste operator has a special meaning when placed between a comma and a variable argument. If you write
#define eprintf(format, ...) fprintf (stderr, format, ##__VA_ARGS__)
and the variable argument is left out when the eprintf macro is used, then the comma before the ‘##’ will be deleted. This does not happen if you pass an empty argument, nor does it happen if the token preceding ‘##’ is anything other than a comma.
eprintf ("success!\n")
→ fprintf(stderr, "success!\n");
The above explanation is ambiguous about the case where the only macro parameter is a variable arguments parameter, as it is meaningless to try to distinguish whether no argument at all is an empty argument or a missing argument. CPP retains the comma when conforming to a specific C standard. Otherwise the comma is dropped as an extension to the standard.
So, unless the appropriate extension is in effect, OUTPUT_ARGS_COUNT() is counted as having one empty argument (the comma before ##__VA_ARGS__ is kept).
The C standard specifies
If the identifier-list in the macro definition does not end with an ellipsis, [...]. Otherwise, there shall be more arguments in the invocation than there are parameters in the macro definition (excluding the ...)
(C2011 6.10.3/4; emphasis added)
C++11 contains language to the same effect in paragraph 16.3/4.
In both cases, then, if your macro invocation were interpreted to have zero arguments then your program would be non-conforming. On the other hand, the preprocessor does recognize and support empty macro arguments -- that is, arguments consisting of zero preprocessing tokens. In principle, then, there is an ambiguity here between no argument and a single empty argument, but in practice, only the latter interpretation results in a conforming program.
That g++ opts for the latter interpretation (the other answer quotes its documentation to that effect) is thus reasonable and appropriate, but it is not safe to rely upon it if you want your code to be portable. A compiler that takes the alternative interpretation would behave differently, possibly by providing the behavior you expected, but also possibly by rejecting the code.
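The "single empty argument" interpretation can be seen without the GNU `, ##__VA_ARGS__` extension. The sketch below uses hypothetical names (PICK_7TH, COUNT_ARGS) and handles at most six arguments. It assumes a preprocessor that forwards __VA_ARGS__ as separate arguments, as GCC and Clang do.

```cpp
// Returns its seventh argument; the six before it are skipped.
#define PICK_7TH(p1, p2, p3, p4, p5, p6, N, ...) N

// The caller's arguments shift the countdown to the right, so the value
// landing in the seventh slot is the number of arguments supplied.
#define COUNT_ARGS(...) PICK_7TH(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)

static_assert(COUNT_ARGS("HelloWorld", 1.2) == 2, "two arguments");
static_assert(COUNT_ARGS("HelloWorld") == 1, "one argument");

// COUNT_ARGS() passes one empty argument, so p1 is empty and the count
// is still 1, matching the result reported in the question.
static_assert(COUNT_ARGS() == 1, "an empty invocation is one empty argument");
```

Telling "no arguments" apart from "one empty argument" needs either the GNU extension quoted above or C++20's __VA_OPT__.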

Macro expansion order confusion between compilers

This piece of code compiles in Visual Studio 2015, but not in Clang:
#define COMMA ,
#define MC(a) a
#define MA(a,b,c) MC(a b c)
map <MA(int,COMMA,int)> FF;
It appears that Clang expands the COMMA macro before submitting it to the MC() macro. "Who is right" according to the C++ standard? Also, how can I make Clang behave like Visual Studio?
EDIT: Simplified the example, and changed some macro names.
Clang conforms to the standard; Visual Studio doesn't. I think you will have a lot of trouble getting Clang to not conform to the standard, so I won't attempt to answer "how do I get Clang to act like Visual Studio?". Maybe that wasn't really what you wanted to know.
When the compiler identifies the invocation of a function-like macro (that is, a macro with parameters) it expands the macro using the procedure explained in detail in §16.3 [cpp.replace] of the C++ standard. In the following, I've simplified the procedure by not considering the # and ## operators, because they do not appear in your example and the full procedure is more complicated.
We'll examine the invocation MA(int, COMMA, int). Here's what happens after the compiler sees the tokens MA and (, which indicate an invocation of the macro.
1) The compiler identifies what the arguments are, which involves finding the closing parenthesis. There are three arguments, which corresponds to the number of parameters, so that's OK. The arguments have not yet been expanded, so the compiler only sees the punctuation actually in the source file. It identifies the arguments as int, COMMA and int.
2) Each argument (except those whose corresponding parameter participates in token concatenation or stringification; as I said, I'm not going to go into that scenario here) is then fully expanded. This happens before the arguments are substituted into the macro body, so that the names of the macro's parameters don't leak out of the macro. So now the three arguments are int, , and int.
3) A copy of the macro body is made, in which each parameter is substituted with the corresponding (fully expanded) argument. The macro body ("replacement list", in standardese) was MC(a b c); after substituting the arguments, that becomes MC(int , int).
4) The sequence of tokens created in step 3 is inserted into the input in place of the macro invocation, and preprocessing continues.
At this point, the compiler will see the invocation of the function-like macro MC(int , int), and will proceed as above. However, this time the first step fails because two arguments are identified but the macro MC only has one parameter.
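One way to get code that a conforming preprocessor accepts, offered only as a sketch and not as part of the original answer: make MC variadic, so that it takes however many arguments result once COMMA has been expanded. The alias name FF is reused from the question; std::map is assumed to be what map referred to.

```cpp
#include <map>
#include <type_traits>

#define COMMA ,
// Assumption: MC is changed from MC(a) to a variadic macro, so the extra
// comma produced by expanding COMMA no longer causes an arity mismatch.
#define MC(...) __VA_ARGS__
#define MA(a,b,c) MC(a b c)

// MA(int, COMMA, int) -> MC(int , int) -> int , int
using FF = std::map<MA(int, COMMA, int)>;

static_assert(std::is_same<FF, std::map<int, int>>::value,
              "MA(int, COMMA, int) should yield the template arguments int, int");
```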

C++ Macro's Token-Paster as argument of a function

I searched the net for a while and unfortunately didn't find an answer or a solution to my problem. Let's say I have two functions named like this:
1) function1a(some_args)
2) function2b(some_args)
What I want to do is write a macro that can select one of those functions when fed the right parameter. The catch is that this parameter should itself be a parameter of a C/C++ function. Here is what I did so far:
#define FUNCTION_RECOGNIZER(TOKEN) function##TOKEN()
void function1a()
{
}
void function2a()
{
}
void anotherParentFunction(const char* type)
{
FUNCTION_RECOGNIZER(type);
}
Clearly, the macro is producing "functiontype" and ignoring the argument of anotherParentFunction. I'm asking whether there is a trick or anything else to perform this kind of pasting.
Thank you in advance :)
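If you insist on using a macro: skip the anotherParentFunction() function and use the macro directly instead. When called with a literal token rather than a variable (and not a quoted string, because pasting function with a string literal does not form a valid token), i.e.
FUNCTION_RECOGNIZER(1a);
it should work.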
A more C++-like solution would be, e.g., to use an enum, then implement anotherParentFunction() with the enum as its parameter and a switch that calls the corresponding function. Of course, you then need to change the enum and the switch statement every time you add a new function, but you would be more flexible in choosing the names of the functions.
There are many more solutions to achieve something similar. The question really is: what is your use case? What do you want to achieve?
In 16.1.5 the standard says:
The implementation can process and skip sections of source files conditionally, include other source files, and replace macros. These capabilities are called preprocessing, because conceptually they occur before translation of the resulting translation unit.
[emphasis mine]
Originally, pre-processing was done by a separate program; it is essentially an independent language.
Today, the pre-processor is often part of the compiler, but, for example, you can't see macros etc. in the Clang AST.
The significance of this is that the pre-processor knows nothing about types or functions or arguments.
Your function definition
void anotherParentFunction(const char* type)
means nothing to the pre-processor and is completely ignored by it.
FUNCTION_RECOGNIZER(type);
This is recognized as an invocation of a defined macro, but type is not a recognized pre-processor symbol, so it is treated as a plain token. The pre-processor does not consult the C++ parser or interact with its AST.
It consults the macro definition:
#define FUNCTION_RECOGNIZER(TOKEN) function##TOKEN()
The argument, the token type, is bound to the parameter TOKEN. The word function is copied to the result as is, and the ## tells the pre-processor to paste it together with the argument's token exactly as written, producing functiontype in the result. The () is not part of the pasted token; it simply follows as separate tokens in the result.
Thus, the pre-processor substitutes
FUNCTION_RECOGNIZER(type);
with
functiontype();
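The effect can be made visible with a small sketch. The functions here return a value, unlike the question's void functions, and functiontype is a hypothetical function added only so that the expansion compiles and can be checked.

```cpp
#define FUNCTION_RECOGNIZER(TOKEN) function##TOKEN()

// Hypothetical stand-ins that return a value so the result can be checked.
constexpr int function1a() { return 1; }
constexpr int functiontype() { return -1; }

constexpr int anotherParentFunction(const char* type) {
    // Expands to functiontype(): the paste uses the spelling `type`,
    // never the run-time value of the parameter.
    return FUNCTION_RECOGNIZER(type);
}

// Whatever string is passed, functiontype() is what gets called.
static_assert(anotherParentFunction("1a") == -1, "the argument's value is ignored");

// Pasting works only when the token itself is written at the call site.
static_assert(FUNCTION_RECOGNIZER(1a) == 1, "function ## 1a names function1a");
```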
So the bad news is, no there is no way to do what you were trying to do, but this may be an XY Problem and perhaps there's a solution to what you were trying to achieve instead.
For instance, it is possible to overload functions based on argument type, or to specialize template functions based on parameters, or you can create a lookup table based on parameter values.
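A minimal sketch of the lookup-table idea, under several assumptions: the handlers return an int id (the question's functions return void) so that the dispatch can be checked, and Entry, ENTRY, kTable, find_index and same are hypothetical names. Token pasting is still used, but only to build the table at compile time; the selection by string happens in ordinary C++.

```cpp
// Hypothetical handlers; they return an id only so the sketch can be checked.
int function1a() { return 1; }
int function2a() { return 2; }

struct Entry {
    const char* name;   // key looked up at run time
    int (*fn)();        // function to call for that key
};

// The preprocessor builds each row: #TOKEN gives the key string and
// function##TOKEN gives the function name.
#define ENTRY(TOKEN) { #TOKEN, function##TOKEN }
constexpr Entry kTable[] = { ENTRY(1a), ENTRY(2a) };
constexpr int kCount = sizeof(kTable) / sizeof(kTable[0]);

// String comparison (recursive so that it is valid C++11 constexpr).
constexpr bool same(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || same(a + 1, b + 1));
}

// Index of the row whose key equals `type`, or -1 if there is none.
constexpr int find_index(const char* type, int i = 0) {
    return i == kCount ? -1
         : same(kTable[i].name, type) ? i
         : find_index(type, i + 1);
}

// Dispatch on the value of `type`; returns 0 for an unknown key.
int anotherParentFunction(const char* type) {
    const int i = find_index(type);
    return i < 0 ? 0 : kTable[i].fn();
}
```

With this, anotherParentFunction("2a") calls function2a(), and adding a function means adding one ENTRY row.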

What is the purpose of the ## operator in C++, and what is it called?

I was looking through the DXUTCore project that comes with the DirectX March 2009 SDK, and noticed that instead of making normal accessor methods, they used macros to create the generic accessors, similar to the following:
#define GET_ACCESSOR( x, y ) inline x Get##y() { DXUTLock l; return m_state.m_##y;};
...
GET_ACCESSOR( WCHAR*, WindowTitle );
It seems that the ## operator just inserts the text from the second argument into the macro to create a function operating on a variable using that text. Is this something that is standard in C++ (i.e. not Microsoft specific)? Is its use considered good practice? And, what is that operator called?
Token-pasting operator, used by the pre-processor to join two tokens into a single token.
This is also standard C++, contrary to what Raldolpho stated.
Here is the relevant information:
16.3.3 The ## operator [cpp.concat]
1 A ## preprocessing token shall not occur at the beginning or at the end of a replacement list for either form of macro definition.
2 If, in the replacement list, a parameter is immediately preceded or followed by a ## preprocessing token, the parameter is replaced by the corresponding argument’s preprocessing token sequence.
3 For both object-like and function-like macro invocations, before the replacement list is reexamined for more macro names to replace, each instance of a ## preprocessing token in the replacement list (not from an argument) is deleted and the preceding preprocessing token is concatenated with the following preprocessing token. If the result is not a valid preprocessing token, the behavior is undefined. The resulting token is available for further macro replacement. The order of evaluation of ## operators is unspecified.
It's a preprocessing operator that concatenates left and right operands (without inserting whitespace). I don't think it's Microsoft specific.
This isn't Standard C++, it's Standard C. Check out this Wikipedia article.
And is it a good practice? In general, I hate pre-processor macros and think they're as bad as (if not worse than) Goto.
Edit: Apparently I'm being misunderstood by what I meant by "This isn't Standard C++, it's Standard C". Many people are reading the first phrase and failing to read the second. My intent is to point out that macros were inherited by C++ from C.
As Mehrdad said, it concatenates the operands, like:
#define MyMacro(A,B) A ## B
MyMacro(XYZ, 123) // Equivalent to XYZ123
Note that MISRA C suggests that this operator (and the # 'stringify' operator) should not be used, because the order in which they are evaluated depends on the compiler.
It is the token-pasting operator, allowed by Standard C++ (see 16.3.3 for details).
As for good practice: using macros is not good practice IMHO (in C++).
It's concatenation for macro arguments, i.e.
GET_ACCESSOR (int, Age);
will be expanded to
inline int GetAge() { DXUTLock l; return m_state.m_Age;};
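A self-contained sketch of the same pattern, simplified so that it compiles outside the SDK. The DXUTLock line is left out, constexpr and const are added so the accessors can be checked at compile time, and State, Settings and demo are hypothetical names.

```cpp
// Simplified form of the SDK macro: Get##y becomes the accessor name and
// m_##y becomes the name of the member it returns.
#define GET_ACCESSOR(x, y) constexpr x Get##y() const { return m_state.m_##y; }

// Hypothetical state block; member names must match the pasted m_##y.
struct State {
    int m_Age;
    const wchar_t* m_WindowTitle;
};

struct Settings {
    State m_state;
    GET_ACCESSOR(int, Age)                      // defines GetAge()
    GET_ACCESSOR(const wchar_t*, WindowTitle)   // defines GetWindowTitle()
};

constexpr Settings demo{{42, L"demo"}};

static_assert(demo.GetAge() == 42, "GetAge() returns m_state.m_Age");
static_assert(demo.GetWindowTitle()[0] == L'd', "GetWindowTitle() returns m_state.m_WindowTitle");
```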