Why can't C++ be parsed with an LR(1) parser? - c++

I was reading about parsers and parser generators and found this statement in Wikipedia's LR parsing page:
Many programming languages can be parsed using some variation of an LR parser. One notable exception is C++.
Why is it so? What particular property of C++ causes it to be impossible to parse with LR parsers?
Using google, I only found that C can be perfectly parsed with LR(1) but C++ requires LR(∞).

LR parsers can't handle ambiguous grammar rules, by design. (That made the theory easier back in the 1970s, when the ideas were being worked out.)
C and C++ both allow the following statement:
x * y ;
It has two different parses:
It can be the declaration of y, as pointer to type x
It can be a multiply of x and y, throwing away the answer.
Now, you might think the latter is stupid and should be ignored.
Most would agree with you; however, there are cases where it might
have a side effect (e.g., if multiply is overloaded), but that isn't the point.
The point is that there are two different parses, and therefore a program
can mean different things depending on how it is parsed.
The compiler must accept the appropriate one under the appropriate circumstances, and in the absence of any other information (e.g., knowledge of the type of x) must collect both in order to decide later what to do. Thus a grammar must allow this. And that makes the grammar ambiguous.
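To make the two readings concrete, here are two tiny snippets (meant as separate translation units; the names are only illustrative):
// Translation unit 1: x names a type, so the statement is a declaration.
typedef int x;
void f() { x * y; }   // declares y as a local variable of type "pointer to x"
// Translation unit 2: x is a variable, so the very same statement is an expression.
int x = 3, y = 4;
void g() { x * y; }   // multiplies x by y and discards the result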
Thus pure LR parsing can't handle this. Nor can many other widely available parser generators, such as Antlr, JavaCC, YACC, or traditional Bison, or even PEG-style parsers, used in a "pure" way.
There are lots of more complicated cases (parsing template syntax requires arbitrary lookahead, whereas LALR(k) can look ahead at most k tokens), but it only takes one counterexample to shoot down pure LR (or the others) parsing.
Most real C/C++ parsers handle this example by using some
kind of deterministic parser with an extra hack: they intertwine parsing with symbol table
collection... so that by the time "x" is encountered,
the parser knows whether x is a type or not, and can thus
choose between the two potential parses. But a parser
that does this isn't context-free, and LR parsers
(the pure ones, etc.) are (at best) context-free.
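As a rough illustration of that hack (the names below are invented; real compilers are far more elaborate), the lexer consults a symbol table that the parser fills in as it reduces declarations, and reclassifies identifier tokens accordingly:
#include <string>
#include <unordered_set>

// Hypothetical token kinds shared by lexer and parser.
enum class TokenKind { Identifier, TypeName, Star, Semicolon /* ... */ };

// Names the parser has seen declared as types so far.
std::unordered_set<std::string> known_types = {"int", "char"};

// The "lexer hack": an identifier lexes as TypeName if the parser has
// already recorded it as a type, steering "x * y;" toward a declaration
// when x is a type and toward an expression otherwise.
TokenKind classify(const std::string& name) {
    return known_types.count(name) ? TokenKind::TypeName
                                   : TokenKind::Identifier;
}

// Called by the parser when it reduces a typedef/class/struct declaration.
void record_type(const std::string& name) {
    known_types.insert(name);
}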
One can cheat, and add per-rule reduction-time semantic checks
to LR parsers to do this disambiguation. (This code often isn't simple.) Most of the other parser types
have some means of adding semantic checks at various points
in the parsing, which can be used to do this.
And if you cheat enough, you can make LR parsers work for
C and C++. The GCC guys did for a while, but gave it
up for hand-coded parsing, I think because they wanted
better error diagnostics.
There's another approach, though, which is nice and clean
and parses C and C++ just fine without any symbol table
hackery: GLR parsers.
These are full context-free parsers (with effectively infinite
lookahead). GLR parsers simply accept both parses,
producing a "tree" (actually a directed acyclic graph that is mostly tree-like)
that represents the ambiguous parse.
A post-parsing pass can resolve the ambiguities.
We use this technique in the C and C++ front ends for our
DMS Software Reengineering Toolkit (as of June 2017
these handle full C++17 in MS and GNU dialects).
They have been used to process millions of lines
of large C and C++ systems, with complete, precise parses producing ASTs with complete details of the source code. (See the AST for C++'s most vexing parse.)

There is an interesting thread on Lambda the Ultimate that discusses the LALR grammar for C++.
It includes a link to a PhD thesis that includes a discussion of C++ parsing, which states that:
"C++ grammar is ambiguous,
context-dependent and potentially
requires infinite lookahead to resolve
some ambiguities".
It goes on to give a number of examples (see page 147 of the pdf).
The example is:
int(x), y, *const z;
meaning
int x;
int y;
int *const z;
Compare to:
int(x), y, new int;
meaning
(int(x)), (y), (new int);
(a comma-separated expression).
The two token sequences have the same initial subsequence but different parse trees, which depend on the last element. There can be arbitrarily many tokens before the disambiguating one.

The problem is never stated this way, though it would be an interesting one:
what is the smallest set of modifications to the C++ grammar that would be necessary so that this new grammar could be perfectly parsed by a "non-context-free" yacc parser? (Making use of only one 'hack': the typename/identifier disambiguation, with the parser informing the lexer of every typedef/class/struct.)
I see a few:
Type Type; is forbidden. An identifier declared as a typename cannot become a non-typename identifier (note that struct Type Type is not ambiguous and could be still allowed).
There are 3 types of name tokens:
types : builtin-type or because of a typedef/class/struct
template-functions
identifiers : functions/methods and variables/objects
Considering template-functions as different tokens solves the func< ambiguity. If func is a template-function name, then < must be the beginning of a template parameter list, otherwise func is a function pointer and < is the comparison operator.
Type a(2); is an object instantiation.
Type a(); and Type a(int) are function prototypes.
int (k); is completely forbidden, should be written int k;
typedef int func_type(); and
typedef int (func_type)(); are forbidden.
A function typedef must be a function pointer typedef : typedef int (*func_ptr_type)();
template recursion is limited to 1024, otherwise an increased maximum could be passed as an option to the compiler.
int a,b,c[9],*d,(*f)(), (*g)()[9], h(char); could be forbidden too, replaced by int a,b,c[9],*d;
int (*f)();
int (*g)()[9];
int h(char);
one line per function prototype or function pointer declaration.
A highly preferred alternative would be to change the awful function pointer syntax,
int (MyClass::*MethodPtr)(char*);
being resyntaxed as:
int (MyClass::*)(char*) MethodPtr;
this being consistent with the cast operator (int (MyClass::*)(char*))
typedef int type, *type_ptr; could be forbidden too : one line per typedef. Thus it would become
typedef int type;
typedef int *type_ptr;
sizeof(int), sizeof(char), sizeof(long long) and so on could be declared in each source file.
Thus, each source file making use of the type int should begin with
#type int : signed_integer(4)
and unsigned_integer(4) would be forbidden outside of that #type directive.
This would be a big step toward resolving the stupid sizeof(int) ambiguity present in so many C++ headers.
The compiler implementing the resyntaxed C++ would, if encountering a C++ source making use of ambiguous syntax, move source.cpp to an ambiguous_syntax folder, and would automatically create an unambiguous translated source.cpp before compiling it.
Please add your ambiguous C++ syntaxes if you know some!

As you can see in my answer here, C++ contains syntax that cannot be deterministically parsed by an LL or LR parser due to the type resolution stage (typically post-parsing) changing the order of operations, and therefore the fundamental shape of the AST (typically expected to be provided by a first-stage parse).

I think you are pretty close to the answer.
LR(1) means that parsing from left to right needs only one token of look-ahead, whereas LR(∞) means infinite look-ahead. That is, the parser would have to know everything that was coming in order to figure out where it is now.

The "typedef" problem in C++ can be parsed with an LALR(1) parser that builds a symbol table while parsing (not a pure LALR parser). The "template" problem probably cannot be solved with this method. The advantage of this kind of LALR(1) parser is that the grammar (shown below) is an LALR(1) grammar (no ambiguity).
/* C Typedef Solution. */
/* Terminal Declarations. */
<identifier> => lookup(); /* Symbol table lookup. */
/* Rules. */
Goal -> [Declaration]... <eof> +> goal_
Declaration -> Type... VarList ';' +> decl_
-> typedef Type... TypeVarList ';' +> typedecl_
VarList -> Var /','...
TypeVarList -> TypeVar /','...
Var -> [Ptr]... Identifier
TypeVar -> [Ptr]... TypeIdentifier
Identifier -> <identifier> +> identifier_(1)
TypeIdentifier -> <identifier> =+> typedefidentifier_(1,{typedef})
// The above line will assign {typedef} to the <identifier>,
// because {typedef} is the second argument of the action typedefidentifier_().
// This handles the context-sensitive feature of the C++ language.
Ptr -> '*' +> ptr_
Type -> char +> type_(1)
-> int +> type_(1)
-> short +> type_(1)
-> unsigned +> type_(1)
-> {typedef} +> type_(1)
/* End Of Grammar. */
The following input can be parsed without a problem:
typedef int x;
x * y;
typedef unsigned int uint, *uintptr;
uint a, b, c;
uintptr p, q, r;
The LRSTAR parser generator reads the above grammar notation and generates a parser that handles the "typedef" problem without ambiguity in the parse tree or AST. (Disclosure: I am the guy who created LRSTAR.)

Related

What does a parser for C++ do until it can differentiate between comparisons and template instantiations?

After reading this question I am left wondering what happens (regarding the AST) when major C++ compilers parse code like this:
struct foo
{
void method() { a<b>c; }
// a b c may be declared here
};
Do they handle it like a GLR parser would or in a different way? What other ways are there to parse this and similar cases?
For example, I think it's possible to postpone parsing the body of the method until the whole struct has been parsed, but is this really possible and practical?
Although it is certainly possible to use GLR techniques to parse C++ (see a number of answers by Ira Baxter), I believe that the approach commonly used in commonly-used compilers such as gcc and clang is precisely that of deferring the parse of function bodies until the class definition is complete. (Since C++ source code passes through a preprocessor before being parsed, the parser works on streams of tokens and that is what must be saved in order to reparse the function body. I don't believe that it is feasible to reparse the source code.)
It's easy to know when a function definition is complete, since braces ({}) must balance even if it is not known how angle brackets nest.
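A minimal sketch of that deferral (the helper name is made up): skip over the body by tracking brace balance, save its tokens, and replay them through the parser once the class definition is complete.
#include <vector>

struct Token { int kind; /* spelling, location, ... */ };

// Starting at a '{' token, collect the tokens of a member function body.
// Only braces need to balance; we don't need to know how '<' and '>' nest.
std::vector<Token> save_body(const std::vector<Token>& toks, size_t& pos,
                             int lbrace, int rbrace) {
    std::vector<Token> body;
    int depth = 0;
    do {
        const Token& t = toks[pos++];
        if (t.kind == lbrace) ++depth;
        if (t.kind == rbrace) --depth;
        body.push_back(t);
    } while (depth > 0 && pos < toks.size());
    return body;   // re-parsed after the enclosing class is fully declared
}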
C++ is not the only language in which it is useful to defer parsing until declarations have been handled. For example, a language which allows users to define new operators with different precedences would require all expressions to be (re-)parsed once the names and precedences of operators are known. A more pathological example is COBOL, in which the precedence of OR in a = b OR c depends on whether c is an integer (a is equal to one of b or c) or a boolean (a is equal to b or c is true). Whether designing languages in this manner is a good idea is another question.
The answer will obviously depend on the compiler, but the article How Clang handles the type / variable name ambiguity of C/C++ by Eli Bendersky explains how Clang does it. I will simply note some key points from the article:
Clang has no need for a lexer hack: the information goes in a single direction from lexer to parser
Clang knows when an identifier is a type by using a symbol table
C++ requires declarations to be visible throughout the class, even in code that appears before the declaration
Clang gets around this by doing a full parsing/semantic analysis of the declaration, but leaving the definition for later; in other words, the definition is lexed immediately but parsed only after all the declared types are available

Template Metaprogramming To Check the Meaning of an Identifier

Problem:
I am using VC++2010, so except a few supported features like decltype pre-C++11 is required.
Given a C++ identifier, is it possible to use some meta-programming techniques to check if that identifier is a type name or a variable name? In other words, given the code below:
void f() {
s=(uint32)-1;
}
is it possible to somehow identify if uint32 is:
the name of a variable, which means the RHS of the assignment is a subtraction; or
a type name, where the RHS operand is the literal -1 typecast to (uint32)
by something like mytemplate<uint32> or similar?
Rationale: I am using my own in-house developed mini parser to analyze/instrument C++ source code. But my mini parser lacks many features, like building a table of identifier types, so it always interprets the above code as either a subtraction or a typecast. My parser is able to modify the source code so that I can insert/modify anything surrounding the uint32, e.g.
void f() {
s=(mytemplate<uint32>(...))-1;
s=myfunc(uint32)-1;
}
But my inserted code will cause a syntax error depending on the meaning of the identifier uint32 (type name vs. variable name). I am looking for some generic code that I can insert to cater for both cases.

Compiler: limitation of lexical analysis

In classic Compiler theory, the first 2 phases are Lexical Analysis and Parsing. They're in a pipeline. Lexical Analysis recognizes tokens as the input of Parsing.
But I came across some cases which are hard to recognize correctly in Lexical Analysis. For example, the following code involving a C++ template:
map<int, vector<int>>
Here the >> would be recognized as a bitwise right shift by a "regular" Lexical Analysis, but that's not correct. My feeling is that it's hard to divide the handling of this kind of grammar into 2 phases; the lexing work has to be done in the parsing phase, because correctly parsing the >> relies on the grammar, not just a simple lexical rule.
I'd like to know the theory and practice about this problem. Also, I'd like to know how does C++ compiler handle this case?
The C++ standard requires that an implementation perform lexical analysis to produce a stream of tokens, before the parsing stage. According to the lexical analysis rules, two consecutive > characters (not followed by =) will always be interpreted as one >> token. The grammar provided with the C++ standard is defined in terms of these tokens.
The requirement that in certain contexts (such as when expecting a > within a template-id) the implementation should interpret >> as two > is not specified within the grammar. Instead the rule is specified as a special case:
14.2 Names of template specializations [temp.names]
After name lookup (3.4) finds that a name is a template-name or that an operator-function-id or a literal-operator-id refers to a set of overloaded functions any member of which is a function template, if this is followed by a <, the < is always taken as the delimiter of a template-argument-list and never as the less-than operator. When parsing a template-argument-list, the first non-nested > is taken as the ending delimiter rather than a greater-than operator. Similarly, the first non-nested >> is treated as two consecutive but distinct > tokens, the first of which is taken as the end of the template-argument-list and completes the template-id. [ Note: The second > token produced by this replacement rule may terminate an enclosing template-id construct or it may be part of a different construct (e.g. a cast).—end note ]
Note the earlier rule, that in certain contexts < should be interpreted as the < in a template-argument-list. This is another example of a construct that requires context in order to disambiguate the parse.
The C++ grammar contains many such ambiguities which cannot be resolved during parsing without information about the context. The most well known of these is known as the Most Vexing Parse, in which an identifier may be interpreted as a type-name depending on context.
Keeping track of the aforementioned context in C++ requires an implementation to perform some semantic analysis in parallel with the parsing stage. This is commonly implemented in the form of semantic actions that are invoked when a particular grammatical construct is recognised in a given context. These semantic actions then build a data structure that represents the context and permits efficient queries. This is often referred to as a symbol table, but the structure required for C++ is pretty much the entire AST.
These kind of context-sensitive semantic actions can also be used to resolve ambiguities. For example, on recognising an identifier in the context of a namespace-body, a semantic action will check whether the name was previously defined as a template. The result of this will then be fed back to the parser. This can be done by marking the identifier token with the result, or replacing it with a special token that will match a different grammar rule.
The same technique can be used to mark a < as the beginning of a template-argument-list, or a > as the end. The rule for context-sensitive replacement of >> with two > poses essentially the same problem and can be resolved using the same method.
You are right, the theoretical clean distinction between lexer and parser is not always possible. I remember a project I worked on as a student. We were to implement a C compiler, and the grammar we used as a basis would treat typedef'd names as types in some cases, as identifiers in others. So the lexer had to switch between these two modes. The way I implemented this back then was using special empty rules, which reconfigured the lexer depending on context. To accomplish this, it was vital to know that the parser would always use exactly one token of look-ahead. So any change to lexer behaviour would have to occur at least one lexical token before the affected location. In the end, this worked quite well.
In the C++ case of >> you mention, I don't know what compilers actually do. willj quoted how the specification phrases this, but implementations are allowed to do things differently internally, as long as the visible result is the same. So here is how I'd try to tackle this: upon reading a >, the lexer would emit token GREATER, but also switch to a state where each subsequent > without a space in between would be lexed to GREATER_REPEATED. Any other symbol would switch the state back to normal. Instead of state switches, you could also do this by lexing the regular expression >+, and emitting multiple tokens from this rule. In the parser, you could then use rules like the following:
rightAngleBracket: GREATER | GREATER_REPEATED;
rightShift: GREATER GREATER_REPEATED;
With a bit of luck, you could make template argument rules use rightAngleBracket, while expressions would use rightShift. Depending on how much look-ahead your parser has, it might be necessary to introduce additional non-terminals to hold longer sequences of ambiguous content, until you encounter some context which allows you to eventually make the decision between these cases.
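A rough sketch of that lexer state switch (the token names are the invented ones from above, not taken from any real compiler): after a '>', the lexer stays in a mode where each immediately following '>' is emitted as GREATER_REPEATED instead of GREATER.
#include <cctype>
#include <string>
#include <vector>

enum TokenKind { GREATER, GREATER_REPEATED, OTHER };

// Only the '>' handling is sketched: the first '>' of a run lexes as
// GREATER, each '>' directly following another '>' lexes as
// GREATER_REPEATED, and any other character resets the state.
std::vector<TokenKind> lex_angles(const std::string& src) {
    std::vector<TokenKind> out;
    bool after_greater = false;
    for (char c : src) {
        if (c == '>') {
            out.push_back(after_greater ? GREATER_REPEATED : GREATER);
            after_greater = true;
        } else {
            after_greater = false;
            if (!std::isspace(static_cast<unsigned char>(c)))
                out.push_back(OTHER);   // placeholder for every other token
        }
    }
    return out;
}
With the rightAngleBracket/rightShift rules above, "vector<vector<int>>" can close two template-argument-lists while "a >> b" still parses as a right shift.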

Why using define or static const? [duplicate]

Is it better to use static const vars than #define preprocessor? Or maybe it depends on the context?
What are advantages/disadvantages for each method?
Pros and cons between #defines, consts and (what you have forgot) enums, depending on usage:
enums:
only possible for integer values
properly scoped / identifier clash issues handled nicely, particularly in C++11 enum classes where the enumerations for enum class X are disambiguated by the scope X::
strongly typed, but to a big-enough signed-or-unsigned int size over which you have no control in C++03 (though you can specify a bit field into which they should be packed if the enum is a member of struct/class/union), while C++11 defaults to int but can be explicitly set by the programmer
can't take the address - there isn't one as the enumeration values are effectively substituted inline at the points of usage
stronger usage restraints (e.g. incrementing - template <typename T> void f(T t) { cout << ++t; } won't compile, though you can wrap an enum into a class with implicit constructor, casting operator and user-defined operators)
each constant's type is taken from the enclosing enum, so template <typename T> void f(T) gets a distinct instantiation when passed the same numeric value from different enums, all of which are distinct from any actual f(int) instantiation. Each function's object code could be identical (ignoring address offsets), but I wouldn't expect a compiler/linker to eliminate the unnecessary copies, though you could check your compiler/linker if you care.
even with typeof/decltype, can't expect numeric_limits to provide useful insight into the set of meaningful values and combinations (indeed, "legal" combinations aren't even notated in the source code, consider enum { A = 1, B = 2 } - is A|B "legal" from a program logic perspective?)
the enum's typename may appear in various places in RTTI, compiler messages etc. - possibly useful, possibly obfuscation
you can't use an enumeration without the translation unit actually seeing the value, which means enums in library APIs need the values exposed in the header, and make and other timestamp-based recompilation tools will trigger client recompilation when they're changed (bad!)
consts:
properly scoped / identifier clash issues handled nicely
strong, single, user-specified type
you might try to "type" a #define ala #define S std::string("abc"), but the constant avoids repeated construction of distinct temporaries at each point of use
One Definition Rule complications
can take address, create const references to them etc.
most similar to a non-const value, which minimises work and impact if switching between the two
value can be placed inside the implementation file, allowing a localised recompile and just client links to pick up the change
#defines:
"global" scope / more prone to conflicting usages, which can produce hard-to-resolve compilation issues and unexpected run-time results rather than sane error messages; mitigating this requires:
long, obscure and/or centrally coordinated identifiers, and access to them can't benefit from implicitly matching used/current/Koenig-looked-up namespace, namespace aliases etc.
while the trumping best-practice allows template parameter identifiers to be single-character uppercase letters (possibly followed by a number), other use of identifiers without lowercase letters is conventionally reserved for and expected of preprocessor defines (outside the OS and C/C++ library headers). This is important for enterprise scale preprocessor usage to remain manageable. 3rd party libraries can be expected to comply. Observing this implies migration of existing consts or enums to/from defines involves a change in capitalisation, and hence requires edits to client source code rather than a "simple" recompile. (Personally, I capitalise the first letter of enumerations but not consts, so I'd be hit migrating between those two too - maybe time to rethink that.)
more compile-time operations possible: string literal concatenation, stringification (taking size thereof), concatenation into identifiers
downside is that given #define X "x" and some client usage ala "pre" X "post", if you want or need to make X a runtime-changeable variable rather than a constant you force edits to client code (rather than just recompilation), whereas that transition is easier from a const char* or const std::string given they already force the user to incorporate concatenation operations (e.g. "pre" + X + "post" for string)
can't use sizeof directly on a defined numeric literal
untyped (GCC doesn't warn if compared to unsigned)
some compiler/linker/debugger chains may not present the identifier, so you'll be reduced to looking at "magic numbers" (strings, whatever...)
can't take the address
the substituted value need not be legal (or discrete) in the context where the #define is created, as it's evaluated at each point of use, so you can reference not-yet-declared objects, depend on "implementation" that needn't be pre-included, create "constants" such as { 1, 2 } that can be used to initialise arrays, or #define MICROSECONDS *1E-6 etc. (definitely not recommending this!)
some special things like __FILE__ and __LINE__ can be incorporated into the macro substitution
you can test for existence and value in #if statements for conditionally including code (more powerful than a post-preprocessing "if" as the code need not be compilable if not selected by the preprocessor), use #undef-ine, redefine etc.
substituted text has to be exposed:
in the translation unit it's used by, which means macros in libraries for client use must be in the header, so make and other timestamp-based recompilation tools will trigger client recompilation when they're changed (bad!)
or on the command line, where even more care is needed to make sure client code is recompiled (e.g. the Makefile or script supplying the definition should be listed as a dependency)
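To make a few of the contrasts above concrete, here is a small sketch (the names are purely illustrative) of the scoping, typing, and address-of differences:
#include <iostream>

#define BUFFER_SIZE 256                 // textual substitution: no scope, no type

namespace config {
    const int buffer_size = 256;        // typed, scoped, has an address
    enum Color { Red = 1, Green = 2 };  // integer constants, no address
}

int main() {
    const int* p = &config::buffer_size;   // fine: a const object has an address
    // const int* q = &BUFFER_SIZE;        // error: expands to &256
    // const int* r = &config::Red;        // error: enumerators have no address
    std::cout << BUFFER_SIZE << ' ' << *p << ' ' << config::Green << '\n';
}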
My personal opinion:
As a general rule, I use consts and consider them the most professional option for general usage (though the others have a simplicity appealing to this old lazy programmer).
Personally, I loathe the preprocessor, so I'd always go with const.
The main advantage to a #define is that it requires no memory to store in your program, as it is really just replacing some text with a literal value. It also has the advantage that it has no type, so it can be used for any integer value without generating warnings.
Advantages of "const"s are that they can be scoped, and they can be used in situations where a pointer to an object needs to be passed.
I don't know exactly what you are getting at with the "static" part though. If you are declaring globally, I'd put it in an anonymous namespace instead of using static. For example
namespace {
unsigned const seconds_per_minute = 60;
};
int main (int argc, char *argv[]) {
...
}
If this is a C++ question and it mentions #define as an alternative, then it is about "global" (i.e. file-scope) constants, not about class members. When it comes to such constants in C++, static const is redundant. In C++ consts have internal linkage by default, and there's no point in declaring them static. So it is really about const vs. #define.
And, finally, in C++ const is preferable. At least because such constants are typed and scoped. There are simply no reasons to prefer #define over const, aside from a few exceptions.
String constants, BTW, are one example of such an exception. With #defined string constants one can use compile-time concatenation feature of C/C++ compilers, as in
#define OUT_NAME "output"
#define LOG_EXT ".log"
#define TEXT_EXT ".txt"
const char *const log_file_name = OUT_NAME LOG_EXT;
const char *const text_file_name = OUT_NAME TEXT_EXT;
P.S. Again, just in case, when someone mentions static const as an alternative to #define, it usually means that they are talking about C, not about C++. I wonder whether this question is tagged properly...
#define can lead to unexpected results:
#include <iostream>
#define x 500
#define y x + 5
int z = y * 2;
int main()
{
std::cout << "y is " << y;
std::cout << "\nz is " << z;
}
Outputs an incorrect result:
y is 505
z is 510
However, if you replace this with constants:
#include <iostream>
const int x = 500;
const int y = x + 5;
int z = y * 2;
int main()
{
std::cout << "y is " << y;
std::cout << "\nz is " << z;
}
It outputs the correct result:
y is 505
z is 1010
This is because #define simply replaces the text. Because doing this can seriously mess up order of operations, I would recommend using a constant variable instead.
Using a static const is like using any other const variables in your code. This means you can trace wherever the information comes from, as opposed to a #define that will simply be replaced in the code in the pre-compilation process.
You might want to take a look at the C++ FAQ Lite for this question:
http://www.parashift.com/c++-faq-lite/newbie.html#faq-29.7
A static const is typed (it has a type) and can be checked by the compiler for validity, redefinition, etc.
A #define can be redefined, undefined, whatever.
Usually you should prefer static consts. It has no disadvantage. The preprocessor should mainly be used for conditional compilation (and sometimes for really dirty tricks, maybe).
Defining constants using the preprocessor directive #define is not recommended, not only in C++ but also in C. These constants will not have a type. Even in C it was proposed to use const for constants.
Always prefer to use the language features over some additional tools like preprocessor.
ES.31: Don't use macros for constants or "functions"
Macros are a major source of bugs. Macros don't obey the usual scope
and type rules. Macros don't obey the usual rules for argument
passing. Macros ensure that the human reader sees something different
from what the compiler sees. Macros complicate tool building.
From C++ Core Guidelines
As a rather old and rusty C programmer who never quite made it fully to C++ because other things came along, and who is now hacking along getting to grips with Arduino, my view is simple.
#define is a compiler preprocessor directive and should be used as such, for conditional compilation etc. For example, where low-level code needs to define some possible alternative data structures for portability to specific hardware. It can produce inconsistent results depending on the order your modules are compiled and linked. If you need something to be global in scope then define it properly as such.
const and (static const) should always be used to name static values or strings. They are typed and safe and the debugger can work fully with them.
enums have always confused me, so I have managed to avoid them.
Please see here: static const vs define
usually a const declaration (notice it doesn't need to be static) is the way to go
If you are defining a constant to be shared among all the instances of the class, use static const. If the constant is specific to each instance, just use const (but note that all constructors of the class must initialize this const member variable in the initialization list).
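A small illustrative example of both cases (the class and member names are made up):
#include <iostream>

class Widget {
public:
    static const int max_count = 10;      // one constant shared by all instances

    explicit Widget(int id) : id_(id) {}  // per-instance const: must be set in
                                          // the constructor's initialization list
    int id() const { return id_; }

private:
    const int id_;
};

int main() {
    Widget a(1), b(2);
    std::cout << Widget::max_count << ' ' << a.id() << ' ' << b.id() << '\n';
}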

what are computational macros and syntax macros

I was reading the paper "An Object oriented preprocessor fit for C++".
"http://www.informatik.uni-bremen.de/st/lehre/Arte-fakt/Seminar/papers/17/An%20Object-Oriented%20preprocessor%20fit%20for%20C++.pdf"
It discusses three different types of macros.
text macros. // pretty much the same as C preprocessor
computational macros // text replaced as a result of computation
syntax macros. // text replaced by the syntax tree representing a linguistically consistent construct.
Can somebody please explain the last two type of macros in an elaborate way.
It says that inline functions and templates are examples of computational macros, how?
Looking at Cheatham's original paper from 1966, which Willink and Muchnick's paper refers to, I'd summarize the different macro types like this:
Text macros do text replacements before scanning and parsing.
Syntactic macros are processed during scanning and parsing. Calling a syntax macro replaces the macro call with another piece of AST.
Computational macros can happen at any point after the AST has been built by the scanner and the parser. The point is that we are no longer processing any text but instead manipulating the nodes of the AST, i.e., we are dealing with objects that might already have semantic information attached to them.
I'm no C++ internals expert but I'd assume that the inlining of function calls and instantiating templates is about manipulating the syntax tree before, while and after it's been annotated with semantic information necessary to compile it properly as both of those seem to assume knowing a lot of stuff (like type info and if something is good to be inlined) that is not yet known during scanning and parsing.
By 2. it sounds like they mean that some computation is done at compile time and the resulting instructions executed at runtime only involve the result. I wouldn't think inline functions particularly represent this, but template meta-programming does exactly this. Also constexpr in C++11.
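For instance, a minimal example of that kind of compile-time computation (purely illustrative):
// Template meta-programming: the factorial is computed while the compiler
// instantiates the templates; only the value 120 ends up in the program.
template <unsigned N>
struct Factorial { static const unsigned value = N * Factorial<N - 1>::value; };
template <>
struct Factorial<0> { static const unsigned value = 1; };

// The C++11 constexpr equivalent.
constexpr unsigned factorial(unsigned n) {
    return n == 0 ? 1 : n * factorial(n - 1);
}

static_assert(Factorial<5>::value == 120, "computed at compile time");
static_assert(factorial(5) == 120, "computed at compile time");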
I think 3. could also be represented by the use of templates. A template does represent a syntax tree, and instantiating it involves taking the generic syntax tree, filling in the parameterized, unknown bits, and using the resulting syntax tree.