I am studying how the C++ standard specifies macro substitution in the preprocessor (I need to implement a subset of the C++ preprocessor myself). Here is an example I created while studying:
#define a x
#define x(x,y) x(x+a, y+1)
a(x(90, 80), a(1,2))
By asking VC++ 2010 to generate the preprocessor output file, I found that the above a(x(90, 80), a(1,2)) becomes this:
90(90+x, 80+1)(90(90+x, 80+1)+x, 1(1+x, 2+1)+1);
But how does the preprocessor come up with this output? The rules are too complicated to comprehend. Can someone explain all the steps the preprocessor has done to come up with such a result?
Old answer, order is not exact (see edit):
Let's start from your expression:
a(x(90, 80), a(1, 2))
Now, since we have #define a x, it gets expanded to:
x(x(90, 80), x(1, 2))
// ^^^^^^^^^ ^^^^^^^
// arg 'x' arg 'y'
We can apply the definition of x(x,y), which is #define x(x,y) x(x+a, y+1):
x(90, 80)(x(90, 80)+a, x(1, 2)+1)
There is another pass that will expand x(...). You can also notice that the +a that was in the previous expression was expanded to +x:
90(90+a, 80+1)(90(90+a, 80+1)+x, 1(1+a, 2+1)+1)
// ^^
// expanded
Last, the remaining occurrences of +a are expanded to +x:
90(90+x, 80+1)(90(90+x, 80+1)+x, 1(1+x, 2+1)+1)
// ^^ ^^ ^^
// expanded expanded expanded
I hope there are no errors.
Please note that your definition of x(x,y) is quite ambiguous (for humans): the macro name and a parameter share the same name. Note that even without that, macros are not recursive, so if you had
#define x(u,v) x(u+a, v+1)
it would not expand to something like
x(u+a+a+a+a, v+1+1+1+1)
This is because while the replacement list of the macro x is being rescanned, the name x is not 'available' for further expansion, so the inner x is left as it is.
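A small sketch of that non-recursion rule, assuming a hypothetical function and macro that share the name twice (neither appears in the question). Because the macro name is not expanded again while its own replacement is rescanned, the inner twice can only refer to the function:

```cpp
#include <cassert>

// An ordinary function, declared before the macro of the same name exists.
int twice(int v) { return 2 * v; }

// Hypothetical macro whose replacement list mentions its own name.
// During the rescan of the replacement, `twice` is not eligible for
// macro expansion, so the inner `twice` is the function above.
#define twice(v) twice((v) + 1)

int expand_once() {
    // Expands exactly once, to the function call twice((3) + 1).
    int result = twice(3);
    assert(result == 8);
    return result;
}
```

If the macro were recursive, the call would never stop expanding; instead it yields a single function call with the argument 4.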
Another small note: for gcc, the output is not exactly the same, as gcc adds spaces between replaced tokens (but if you remove them it will be the same as msvc).
EDIT: from the comments of dyp, this order is not the exact one. In fact, arguments are expanded first and then substituted into the macro's replacement list. The last part of that sentence is important: it means that the macro argument list is not re-evaluated. Think of it as: the macro gets expanded with placeholders in lieu of the parameters, then the arguments are expanded, and then the placeholders are replaced by their respective arguments. So, in short, this is equivalent to what I explained before, but here is the right order (detailed operations):
> Expansion of a(x(90, 80), a(1, 2))
> Substitution of 'a' into 'x' (now: 'x(x(90, 80), a(1, 2))')
> Expansion of x(x(90, 80), a(1, 2)) [re-scan]
> Macro 'x(X, Y)' is expanded to 'X(X+a,Y+1)'
> Expansion of 'x(90,80)' (first argument)
> Macro 'x(X,Y)' is expanded to 'X(X+a,Y+1)'
> Argument '90' does not need expansion (ie, expanded to same)
> Argument '80' does not need expansion (ie, expanded to same)
> Substitution with 'X=90' and 'Y=80': '90(90+a, 80+1)'
> Re-scan of result (ignoring macro name 'x')
> Substitution of 'a' into 'x': '90(90+x, 80+1)'
> Expansion of 'a(1,2)' (second argument)
> Substitution of 'a' into 'x'
> Expansion of 'x(1,2)' [re-scan]
> Macro 'x(X,Y)' is expanded to 'X(X+a,Y+1)'
> Argument '1' does not need expansion (ie, expanded to same)
> Argument '2' does not need expansion (ie, expanded to same)
> Substitution with 'X=1' and 'Y=2': '1(1+a, 2+1)'
> Re-scan of result (ignoring macro name 'x')
> Substitution of 'a' into 'x': '1(1+x, 2+1)'
> Substitution with X='90(90+x, 80+1)' and Y='1(1+x, 2+1)'
Result: '90(90+x, 80+1)(90(90+x, 80+1)+a, 1(1+x, 2+1)+1)'
> Re-scan of result
> Substitution of 'a' into 'x'
Result: '90(90+x, 80+1)(90(90+x, 80+1)+x, 1(1+x, 2+1)+1)'
Last result is result of whole expansion:
90(90+x, 80+1)(90(90+x, 80+1)+x, 1(1+x, 2+1)+1)
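One way to check this trace without reading the preprocessor output file is to stringify the expansion. This is only a sketch: STR, XSTR and strip_spaces are helper names I made up, and spaces are removed before comparing because compilers differ in the whitespace they keep between tokens.

```cpp
#include <cassert>
#include <string>

// Helper defined before the one-letter macros exist, so none of its
// identifiers can be rewritten by them.
std::string strip_spaces(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c != ' ') out += c;
    }
    return out;
}

// Two-level stringification: XSTR fully expands its arguments first,
// then STR turns the resulting tokens into a string literal.
#define STR(...) #__VA_ARGS__
#define XSTR(...) STR(__VA_ARGS__)

// The macros from the question.
#define a x
#define x(x,y) x(x+a, y+1)

const char* const expanded = XSTR(a(x(90, 80), a(1,2)));

// Remove the one-letter macros again so they cannot affect later code.
#undef a
#undef x

// Compares against the result quoted in this thread, ignoring spaces.
bool matches_reported_output() {
    return strip_spaces(expanded) ==
           "90(90+x,80+1)(90(90+x,80+1)+x,1(1+x,2+1)+1)";
}
```

On the compilers discussed here, matches_reported_output() should return true; printing expanded shows the exact spacing your compiler keeps.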
I wonder why this macro is expanding so much.
#define CONCAT_IMPL(A, B) A##B
#define CONCAT(A, B) CONCAT_IMPL(A, B)
#define EAT(...)
#define TEST(ARG) EXPANDED, ARG) EAT(
#define GET_LAST(A, B) B
int result = 0;
result = GET_LAST(CONCAT(TEST, (1)), 2); // result is 2
result = GET_LAST(TEST(1), 2); // result is 2
result = GET_LAST(EXPANDED, 1) EAT(, 2); // result is 1
I want GET_LAST(CONCAT(TEST, (1)), 2); to evaluate to 1.
I'd appreciate it if you could tell me whether this is possible on MSVC, or whether I'm missing something.
C11 draft:
The sequence of preprocessing tokens bounded by the outside-most matching parentheses forms the list of arguments for the function-like macro. The individual arguments within the list are separated by comma preprocessing tokens, but comma preprocessing tokens between matching inner parentheses do not separate arguments.
GET_LAST(CONCAT(TEST, (1)), 2) is an invocation of the macro GET_LAST with a list of two arguments. One is CONCAT(TEST, (1)) and the other one is 2.
After the arguments for the invocation of a function-like macro have been identified, argument substitution takes place. A parameter in the replacement list, unless preceded by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is replaced by the corresponding argument after all macros contained therein have been expanded. Before being substituted, each argument's preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file; no other preprocessing tokens are available.
The first parameter A does not occur in the replacement list of the macro, so nothing is done with the corresponding argument. The second parameter B occurs, so the corresponding argument macro-expands to 2, and the occurrence of B in the replacement list is substituted with the expansion.
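A minimal sketch of those two quoted rules. GET_LAST is the macro from the question; PAIR and GET_FIRST are hypothetical helpers added only for illustration. The arguments are identified from the tokens as written, before anything inside them is expanded:

```cpp
#include <cassert>
#include <cstddef>

#define GET_LAST(A, B) B
#define GET_FIRST(A, B) A   // hypothetical counterpart of GET_LAST
#define PAIR(X, Y) X, Y     // hypothetical macro that expands to a comma list

// Two arguments are identified: `PAIR(10, 20)` and `2`. The comma inside
// the inner parentheses does not separate arguments, and since A is never
// used in the replacement list, the first argument is simply dropped.
constexpr int last = GET_LAST(PAIR(10, 20), 2);

// Here A is used, so its argument is fully expanded to `10, 20` and then
// substituted, which gives two initializers from one macro argument.
constexpr int first[] = { GET_FIRST(PAIR(10, 20), 2) };
constexpr std::size_t first_count = sizeof(first) / sizeof(first[0]);

static_assert(last == 2, "second argument is substituted for B");
static_assert(first_count == 2, "one argument expanded to two values");
```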
I just noticed an interesting thing about the expansion of the macro parameters in C++.
I defined 4 macros; 2 of them turn the given parameter into a string and the other 2 try to separate 2 arguments. I passed them an argument containing a macro that expands to , and got the following results:
#define Quote(x) #x
#define String(x) Quote(x)
#define SeparateImpl(first, second) first + second
#define Separate(pair) SeparateImpl(pair)
#define comma ,
int main(){
Quote(1 comma 2); // -> "1 comma 2"
String(1 comma 2); // -> "1 , 2"
SeparateImpl(1 comma 2); // -> 1 , 2 + *empty arg*
Separate(1 comma 2); // -> 1 , 2 + *empty arg*
return 0;
}
So, as we see, the macro String turned into "1 , 2", which means the macro comma had been expanded first. However, the macro Separate turned into 1 , 2 + **empty arg**, which means the macro comma had not been expanded first, and I wonder why. I tried this in VS2019.
#define Quote(x) #x
#define String(x) Quote(x)
#define SeparateImpl(first, second) first + second
#define Separate(pair) SeparateImpl(pair)
#define comma ,
Macro invocation proceeds as follows:
Argument substitution (a.s), where if a parameter is mentioned in the replacement list and said parameter does not participate in a paste or stringification, it is fully expanded and said mentions of the parameter in the replacement list are substituted with the result.
Stringification
Pastes
Rescan and further replacement (r.a.f.r.), where the resulting replacement list is rescanned, during which the macro's name is marked as invalid for expansion ("painted blue").
Here's how each case should expand:
Quote(1 comma 2)
a.s. no action (only mention of parameter is stringification). Stringification applies. Result: "1 comma 2".
String(1 comma 2)
a.s. applies; yielding Quote(1 , 2). During r.a.f.r., Quote identified as a macro, but the argument count doesn't match. This is invalid. But see below.
SeparateImpl(1 comma 2)
Invalid macro call. The macro is being invoked with one argument, but it should have 2. Note that comma being defined as a macro is irrelevant; at the level of macro invocation you're just looking at the tokens.
Separate(1 comma 2)
a.s. applies; yielding SeparateImpl(1 , 2). During r.a.f.r., SeparateImpl is invoked... that invocation's a.s. applies, yielding 1 + 2.
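Here is a sketch of the two valid cases, assuming a conforming preprocessor (gcc, clang, or MSVC with /Zc:preprocessor). The names quoted, separated and check_cases are mine. String and the direct SeparateImpl call are left out because, as described above, they are invalid invocations.

```cpp
#include <cassert>
#include <cstring>

#define Quote(x) #x
#define SeparateImpl(first, second) first + second
#define Separate(pair) SeparateImpl(pair)
#define comma ,

// The only use of the parameter is stringification, so the argument is
// not expanded and `comma` appears literally in the string.
const char* const quoted = Quote(1 comma 2);

// `pair` is fully expanded to `1 , 2` before substitution; the rescan
// then sees SeparateImpl(1 , 2) with two arguments, giving 1 + 2.
constexpr int separated = Separate(1 comma 2);

void check_cases() {
    assert(std::strcmp(quoted, "1 comma 2") == 0);
    assert(separated == 3);
}
```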
I tried this in VS2019.
I could tell at a glance it was some VS version before 2020, where the walls tell me they're finally going to work on preprocessor compliance. VS in particular seems to have this strange state in which tokens with commas in them nevertheless are treated as single arguments (it's as if argument identification occurs before expansion but continues to apply or something); so in this case, 1 , 2 would be that strange thing in your String(1 comma 2) call; i.e., Quote is being called with 1 , 2 but in that case it's actually one argument.
I'm trying to conditionally expand a macro to either "( a" or "b )", but the naive way of doing so doesn't work on either of the compilers I'm using (Microsoft C/C++ and the NDK compiler). Example:
// This works on both compilers, expands to ( a ) as expected
#define PARENS_AND_SUCH BOOST_PP_IF(1, BOOST_PP_LPAREN() a BOOST_PP_RPAREN(), b)
// MSVC: syntax error/unexpected end of file in macro expansion
// NDK: unterminated argument list
#define PARENS_AND_SUCH BOOST_PP_IF(1, BOOST_PP_LPAREN() a, b)
// Desired expansion: ( a
// MSVC expansion: ( a, b )
// NDK: error: macro "BOOST_PP_IIF" requires 3 arguments, but only 2 given
#define PARENS_AND_SUCH BOOST_PP_IF(1, BOOST_PP_LPAREN() a, b BOOST_PP_RPAREN())
What am I doing wrong?
You could force the order of evaluation to conform to the expected one by abstracting out the branches of the IF to subdefinitions, and delay their expansion until the conditional returns a branch:
#define PARENS_AND_SUCH BOOST_PP_CAT(PAS_, BOOST_PP_IF(1, THEN, ELSE))
#define PAS_THEN BOOST_PP_LPAREN() a
#define PAS_ELSE b BOOST_PP_RPAREN()
Since THEN and ELSE aren't complete names, the branches will not be expanded before the IF is expanded; when it returns, the value is combined with PAS_ to form a new valid definition and will expand at that time.
You could also parameterise the THEN and ELSE macros and make this technique more general (and IMO more elegant): passing parameters to an incomplete name essentially forms a thunk, and works pretty much the same way (the incomplete function-like macro name will be passed around plus parameter list until it's completed).
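A self-contained sketch of the same technique without Boost. The PP_* macros are hand-rolled stand-ins for BOOST_PP_CAT, BOOST_PP_IF, BOOST_PP_LPAREN and BOOST_PP_RPAREN (assumptions for illustration, not the Boost definitions). I also gave PARENS_AND_SUCH a condition parameter so both branches can be shown, and assumed a conforming preprocessor.

```cpp
#include <cassert>

// Hand-rolled stand-ins for the Boost.Preprocessor macros (assumptions).
#define PP_CAT(x, y) PP_CAT_I(x, y)
#define PP_CAT_I(x, y) x##y
#define PP_IF(cond, t, e) PP_CAT(PP_IF_, cond)(t, e)
#define PP_IF_1(t, e) t
#define PP_IF_0(t, e) e
#define PP_LPAREN() (
#define PP_RPAREN() )

// THEN and ELSE are not macro names, so nothing with an unbalanced
// parenthesis is expanded until the condition has picked a branch.
#define PARENS_AND_SUCH(cond) PP_CAT(PAS_, PP_IF(cond, THEN, ELSE))
#define PAS_THEN PP_LPAREN() a
#define PAS_ELSE b PP_RPAREN()

int then_branch(int a) {
    return 2 * PARENS_AND_SUCH(1) + 1);   // becomes: 2 * ( a + 1)
}

int else_branch(int b) {
    return 2 * (1 + PARENS_AND_SUCH(0);   // becomes: 2 * (1 + b )
}
```

The source looks unbalanced on purpose: each call supplies the one parenthesis that the surrounding expression is missing, which is only possible because the branch is expanded last.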
I saw the code below on a website.
I am not able to understand how the result comes out as 11, instead of 25 or 13.
I expected 25 because SQ(5) is 5*5,
or 13 because
SQ(2) = 4;
SQ(3) = 9;
so the final result might be 13 (9 + 4).
But I was surprised to see the result is 11.
How does the result come out as 11?
#include <iostream>
#include <cstdlib>
using namespace std;
#define SQ(a) (a*a)
int main()
{
int ans = SQ(2 + 3);
cout << ans << endl;
system("pause");
}
The preprocessor does a simple text substitution on the source code. It knows nothing about the underlying language or its rules.
In your example, SQ(2 + 3) expands to (2 + 3*2 + 3), which evaluates to 11.
A more robust way to define SQ is:
#define SQ(a) ((a)*(a))
Now, SQ(2 + 3) would expand to ((2 + 3)*(2 + 3)), giving 25.
Even though this definition is an improvement, it is still not bullet-proof. If SQ() were applied to an expression with side effects, this could have undesired consequences. For example:
If f() is a function that prints something to the console and returns an int, SQ(f()) would result in the output being printed twice.
If i is an int variable, SQ(i++) results in undefined behaviour.
For further examples of difficulties with macros, see Macro Pitfalls.
For these reasons it is generally preferable to use functions rather than macros.
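To make the comparison concrete, here is a small sketch. SQ_NAIVE is the definition from the question under a different name; square, next_value and the two counting functions are hypothetical names introduced for this example.

```cpp
#include <cassert>

#define SQ_NAIVE(a) (a*a)   // the definition from the question
#define SQ(a) ((a)*(a))     // the parenthesised version

// Hypothetical function alternative: the argument is evaluated once,
// before the call, like any other function argument.
template <typename T>
constexpr T square(T value) { return value * value; }

static_assert(SQ_NAIVE(2 + 3) == 11, "expands to (2 + 3*2 + 3)");
static_assert(SQ(2 + 3) == 25, "expands to ((2 + 3)*(2 + 3))");
static_assert(square(2 + 3) == 25, "argument is evaluated first");

int calls = 0;
int next_value() { ++calls; return 5; }

// Counts how often next_value() runs when passed through the macro.
int calls_with_macro() {
    calls = 0;
    int result = SQ(next_value());   // next_value() appears twice
    assert(result == 25);
    return calls;
}

// Counts how often next_value() runs when passed to the function.
int calls_with_function() {
    calls = 0;
    int result = square(next_value());
    assert(result == 25);
    return calls;
}
```

calls_with_macro() returns 2 and calls_with_function() returns 1, which is the side-effect difference described above.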
#define expansions kick in before the compiler sees the source code. That is why they are called pre-processor directives; the processor here is the compiler that translates C to machine-readable code.
So, this is what the macro pre-processor is passing on to the compiler:
SQ(2 + 3) is expanded as (2 + 3*2 + 3)
So, this is really 2 + 6 + 3 = 11.
How can you make it do what you expect?
Enforce the order of evaluation. Use (), either in the macro definition or in the macro call.
OR
Write a simple function that does the job
The C preprocessor does textual substitution before the compiler interprets expressions and C syntax in general. Consequently, running the C preprocessor on this code converts:
SQ(2 + 3)
into:
(2 + 3*2 + 3)
which simplifies to:
(2 + 6 + 3)
which is 11.
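If you want to see the substitution for yourself without running the preprocessor separately, a stringifying macro can capture it. This is a sketch: STR, XSTR, strip_spaces and sq_expansion are helper names I made up, and spaces are stripped because the whitespace kept between tokens can vary.

```cpp
#include <cassert>
#include <string>

#define SQ(a) (a*a)

// Two-level stringification: XSTR expands its argument first,
// then STR turns the resulting tokens into a string literal.
#define STR(x) #x
#define XSTR(x) STR(x)

// Removes spaces so the comparison does not depend on token spacing.
std::string strip_spaces(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c != ' ') out += c;
    }
    return out;
}

// Returns the text that SQ(2 + 3) expands to, without spaces.
std::string sq_expansion() {
    return strip_spaces(XSTR(SQ(2 + 3)));
}
```

sq_expansion() gives "(2+3*2+3)": the multiplication binds only the middle 3 and 2, which is where the 11 comes from.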
#define preprocessor directive
Syntax :
# define identifier replacement
When the preprocessor encounters this directive, it replaces any occurrence of identifier in the rest of the code by replacement.
This replacement can be an expression, a statement, a block or simply anything.
The preprocessor does not understand C, it simply replaces any occurrence of identifier by replacement.
# define can also work with parameters to define function-like macros:
# define SQ(a) (a*a)
will replace any occurrence of SQ(a) with (a*a) before the code is compiled.
Hence,
SQ(2+3) will be replaced by (2+3*2+3)
The computation is performed after the replacement is done.
Hence the answer: 2+3*2+3=11
With your definition, SQ(2+3) expands to (2+3*2+3), which results in 2+6+3=11.
You should define it as:
#define SQ(x) ({typeof(x) y=x; y*y;})
Tested on gcc, for inputs like
constants,
variable,
constant+const
const+variable
variable++ / ++variable
function call, containing printf.
Note: typeof and the ({ ... }) statement expression are GNU additions to standard C. They may not be available in some compilers.
It's just a textual replacement before compilation,
so you should try this:
#define SQ(a) ((a)*(a))
In your case, SQ(2 + 3) is equivalent to (2+3*2+3), which is 11.
With the corrected definition above, it becomes ((2+3)*(2+3)), which is 5*5 = 25, the answer you want.
At: http://www.learncpp.com/cpp-tutorial/110-a-first-look-at-the-preprocessor/
It mentions a directive called "Macro defines". What do we mean when we say "Macro"?
Thanks.
A macro is a name, defined with a preprocessor directive, that the preprocessor replaces (or removes) right before compilation.
Example:
#define MY_MACRO1 somevalue
#define MY_MACRO2
#define SUM(a, b) (a + b)
Then, if MY_MACRO1 or MY_MACRO2 is mentioned anywhere in the code (except inside string literals), the preprocessor replaces it with whatever comes after the name in the #define line.
There can also be macros with parameters (like the SUM). In that case the preprocessor recognizes the arguments, example:
int x = 1, y = 2;
int z = SUM(x, y);
The preprocessor replaces it like this:
int x = 1, y = 2;
int z = (x + y);
Only after this replacement does the compiler get to compile the resulting code.
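Putting the three kinds of macro together in one small sketch. The value 42 stands in for somevalue, and the demo_* function names are assumptions added for illustration.

```cpp
#include <cassert>
#include <cstring>

#define MY_MACRO1 42          // object-like macro (42 stands in for somevalue)
#define MY_MACRO2             // expands to nothing
#define SUM(a, b) (a + b)     // function-like macro

int demo_sum() {
    int x = 1, y = 2;
    int z = SUM(x, y);        // becomes: int z = (x + y);
    assert(z == 3);
    return z MY_MACRO2;       // MY_MACRO2 disappears: return z ;
}

int demo_value() {
    return MY_MACRO1;         // becomes: return 42;
}

// Inside a string literal the macro name is left alone.
const char* demo_text() {
    return "MY_MACRO1";
}
```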
A macro is a code fragment that gets substituted into your program by the preprocessor (before compilation proper begins). This may be a function-like block, or it may be a constant value.
A warning when using a function-like macro. Consider the following code:
#define foo(x) x*x
If you call foo(3), it will become (and be compiled as) 3*3 (=9). If, instead, you call foo(2+3), it will become 2+3*2+3, (=2+6+3=11), which is not what you want. Also, since the code is substituted in place, foo(bar++) becomes bar++ * bar++, which modifies bar twice in a single expression and results in undefined behaviour.
Macros are powerful tools, but it can be easy to shoot yourself in the foot while trying to do something fancy with them.
"Macro defines" merely indicate how they are specified (with #define directives), while "Macro" is the function or expression that is defined.
There is little difference between them aside from semantics, however.