Couldn't someone explain the grammar of typedef once and for all? - c++

I know how to declare aliases for simple types, like class types, primitive types and, say, pointers to functions returning values of those types. For example:
typedef int T; // T := int
typedef int* T; // T := int*
typedef int (*T)(); // T := int (*)(). OK, but it's already a bit unclear to me.
typedef int (*T[])(); // T := array of int (*)(). Totally confused. What is going on here?
I can't understand how the compiler should parse such typedef declarations. Maybe someone can explain using the simple examples I cited? I know that C++ introduced alias declarations as follows:
using T = int*;
That may be more readable, but for now I'm interested only in the typedef declaration.

The grammar of a typedef is exactly the same as that of a variable
declaration; the only difference is that the name being declared becomes
an alias for the type, rather than an object, reference or function.
Note that typedef is part of the decl-specifier-seq of the
declaration; a full declaration consists of three parts: an
attribute-specifier-seq (new to C++11), a decl-specifier-seq, and an
init-declarator-list, in that order. All parts may in principle be
empty, but only for certain types of declarations; in the case of a
typedef, for example, only the attribute-specifier-seq may be empty.
To understand a declaration, you have to first break it down into the
three parts: the attribute-specifier-seq is easy: it will always be
within [[...]] and you won't see it too often, since it is very new,
and only for special uses. We'll ignore it for now. The
decl-specifier-seq is a sequence of keywords or symbols which name a
type (although there are special cases after some keywords, like
struct or enum); just collect all of the keywords and type names until you
encounter something which is neither; typedef itself counts as one of these
specifiers. Order here isn't important, so:
int typedef const CI;
would be perfectly legal, although certainly not typical. If the
keyword typedef is present, the declaration is a typedef (which
means that some other keywords, like extern or static, aren't
allowed). The decl-specifier gives the final type in an English
expression of the type.
Everything that follows is part of the init-declarator-list, which is a
comma-separated list of init-declarators. A typedef requires at
least one init-declarator, and in fact, doesn't allow the init part,
so it is just a declarator (but there can in fact be several, although
Microsoft is the only one I know of that goes in for this bit of
obfuscation). Each declarator is basically an expression, with
the operators on the right (() and []) having precedence over the
operators on the left (* and &), and parentheses being used to
modify the precedence. So if you have something like (&ra)[10], ra
is a reference to an array[10] of... whatever type is specified by the
decl-specifier-seq. Or, where precedence is not given by parentheses:
*ra[10] is an array[10] of pointers to...

typedefs follow the same rule as variable declarations, so I will first cover these. The intended principle is: if you type the declaration as an expression, you will get the type. So let's analyse this variable:
int (*a[])();
Now we proceed step by step:
Typing (*a[42])() gives an int. Substitute x1 for (*a[42]). x1() is of type int, so clearly x1 (which is (*a[42])) is a function taking no parameters and returning an int.
Therefore, a[42] must be a pointer to a "function taking no parameters and returning an int."
Therefore, a must be an array of "pointers to a function taking no parameters and returning an int."
With typedefs, the only difference is that instead of the variable a, we're talking about its type. So typedef int (*T[])(); means:
T is the type a variable a would have if it were declared as int (*a[])();
So in your case, T is "an array of pointers to function-taking-no-parameter-and-returning-int."

I can't understand how the compiler should parse such typedef declarations.
Since this seemed to be a main topic of concern I decided to add an important note regarding the higher-level overview of how this is possible.
Since the language is context-sensitive, you need information about both the syntax and the semantics of the language available while parsing.
In your example the original solution used the lexer hack, which is the actual name for the method that C++ compiler designers implemented to solve the problem of the language no longer being context-free due to the typedef token. The basic idea of this 'hack' is to have an additional back-channel from the semantic analyzer to the lexer to provide the needed context.
There are also other ways to solve the problem of parsing context-sensitive grammars, such as lexerless parsing.

Anonymous callable object is not executed by a std::thread [duplicate]

On Wikipedia I found this:
A a( A() );
[This] could be disambiguated either as
a variable definition of class [A], taking an anonymous instance of class [A] or
a function declaration for a function which returns an object of type [A] and takes a single (unnamed) argument which is a function returning type [A] (and taking no input).
Most programmers expect the first, but the C++ standard requires it to be interpreted as the second.
But why? If the majority of the C++ community expects the former behavior, why not make it the standard? Besides, the above syntax is consistent if you don't take into account the parsing ambiguity.
Can someone please enlighten me? Why does the standard make this a requirement?
Let's say MVP didn't exist.
How would you declare a function?
A foo();
would be a variable definition, not a method declaration. Would you introduce a new keyword? Would you have a more awkward syntax for a function declaration? Or would you rather have
A foo;
define a variable and
A foo();
declare a function?
Your slightly more complicated example is just for consistency with this basic one. It's easier to say "everything that can be interpreted as a declaration, will be interpreted as a declaration" rather than "everything that can be interpreted as a declaration, will be interpreted as a declaration, unless it's a single variable definition, in which case it's a variable definition".
This probably isn't the motivation behind it though, but a reason it's a good thing.
For C++, it's pretty simple: because the rule was made that way in C.
In C, the ambiguity only arises with a typedef and some fairly obscure code. Almost nobody ever triggers it by accident -- in fact, it probably qualifies as rare except in code designed specifically to demonstrate the possibility. For better or worse, however, the mere possibility of the ambiguity meant somebody had to resolve it -- and if memory serves, it was resolved by none other than Dennis Ritchie, who decreed that anything that could be interpreted as a declaration would be a declaration, even if there was also an ambiguous interpretation as a definition.
C++ added the ability to use parentheses for initialization as well as function calls as grouping, and this moved the ambiguity from obscure to common. Changing it, however, would have required breaking the rule as it came from C. Resolving this particular ambiguity as most would expect, without creating half a dozen more that were even more surprising would probably have been fairly non-trivial as well, unless you were willing to throw away compatibility with C entirely.
This is just a guess, but it may be due to the fact that with the given approach you can get both behaviors:
A a( A() ); // this is a function declaration
A a( (A()) ); // this is a variable definition
If you were to change its behavior to be a variable definition, then function declarations would be considerably more complex.
typedef A subfunction_type();
A a( A() ); // this would be a variable declaration
A a( subfunction_type ); // this would be a function declaration??
It's a side-effect of the grammar being defined recursively.
It was not designed intentionally like that. It was discovered and documented as the most vexing parse.
Consider if the program were like so:
typedef struct A { int m; } A;
int main() { A a( A() ); }
This would be valid C, and there is only one possible interpretation allowed by the grammar of C: a is declared as a function. C only allows initialization using = (not parentheses), and does not allow A() to be interpreted as an expression. (Function-style casts are a C++-only feature.) This is not a "vexing parse" in C.
The grammar of C++ makes this example ambiguous, as Wikipedia points out. However, if you want C++ to give this program the same meaning as C, then, obviously, C++ compilers are going to have to interpret a as a function just like C compilers. Sure, C++ could have changed the meaning of this program, making a the definition of a variable of type A. However, incompatibilities with C were introduced into C++ only when there was a good reason to do it, and I would imagine that Stroustrup particularly wanted to avoid potentially silent breakages such as this, as they would cause great frustration for C users migrating to C++.
Thus, C++ interprets it as a function declaration too, and not a variable definition; and more generally, adopted the rule that if something that looks like a function-style cast can be interpreted as a declaration instead in its syntactic context, then it shall be. This eliminates potential for incompatibility with C for all vexing-parse situations, by ensuring that the interpretation that is not available in C (i.e. the one involving a function-style cast) is not taken.
Cfront 2.0 Selected Readings (page 1-42) mentions the C compatibility issue in the case of expression-declaration ambiguity, which is a related type of most vexing parse.
No particular reason, other than [possibly] the case that K-ballo identifies.
It's just legacy. There was already the int x; construction form so it never seemed like a reach to require T x; when no ctor args are in play.
In hindsight I'd imagine that if the language were designed from scratch today, then the MVP wouldn't exist... along with a ton of other C++ oddities.
Recall that C++ evolved over decades and, even now, is designed only by committee (see also: camel).


In OCaml Menhir, how to write a parser for C++/Rust/Java-style generics

In C++, a famous parsing ambiguity happens with code like
x<T> a;
If T is a type, this is what it looks like (a declaration of a variable a of type x<T>); otherwise it is (x < T) > a (where < and > are comparison operators, not angle brackets).
In fact, we could make a change to make this become unambiguous: we can make < and > nonassociative. So x < T > a, without brackets, would not be a valid sentence anyway even if x, T and a were all variable names.
How could one resolve this conflict in Menhir? At first glance it seems we just can't. Even with the aforementioned modification, we need to lookahead an indeterminate number of tokens before we see another closing >, and conclude that it was a template instantiation, or otherwise, to conclude that it was an expression. Is there any way in Menhir to implement such an arbitrary lookahead?
Different languages (including the ones listed in your title) actually have very different rules for templates/generics (like what type of arguments there can be, where templates/generics can appear, when they are allowed to have an explicit argument list and what the syntax for template/type arguments on generic methods is), which strongly affect the options you have for parsing. In no language that I know is it true that the meaning of x<T> a; depends on whether T is a type.
So let's go through the languages C++, Java, Rust and C#:
In all four of those languages both types and functions/methods can be templates/generic. So we'll not only have to worry about an ambiguity with variable declarations, but also function/method calls: is f<T>(x) a function/method call with an explicit template/type argument or is it two relational operators with the last operand parenthesized? In all four languages template/generic functions/methods can be called without template/type when those can be inferred, but that inference isn't always possible, so just disallowing explicit template/type arguments for function/method calls is not an option.
Even if a language does not allow relational operators to be chained, we could get an ambiguity in expressions like this: f(a<b, c, d>(e)). Is this calling f with the three arguments a<b, c and d>(e), or with the single argument a<b, c, d>(e), calling a function/method named a with the template/type arguments b, c, d?
Now beyond this common foundation, most everything else is different between these languages:
Rust
In Rust the syntax for a variable declaration is let variableName: type = expr;, so x<T> a; couldn't possibly be a variable declaration because that doesn't match the syntax at all. In addition it's also not a valid expression statement (anymore) because comparison operators can't be chained (anymore).
So there's no ambiguity here or even a parsing difficulty. But what about function calls? For function calls, Rust avoided the ambiguity by simply choosing a different syntax to provide type arguments: instead of f<T>(x) the syntax is f::<T>(x). Since type arguments for function calls are optional when they can be inferred, this ugliness is thankfully not necessary very often.
So in summary: let a: x<T> = ...; is a variable declaration, f(a<b, c, d>(e)); calls f with three arguments and f(a::<b, c, d>(e)); calls a with three type arguments. Parsing is easy because all of these are sufficiently different to be distinguished with just one token of lookahead.
Java
In Java x<T> a; is in fact a valid variable declaration, but it is not a valid expression statement. The reason for that is that Java's grammar has a dedicated non-terminal for expressions that can appear as an expression statement and applications of relational operators (or any other non-assignment operators) are not matched by that non-terminal. Assignments are, but the left side of assignment expressions is similarly restricted. In fact, an identifier can only be the start of an expression statement if the next token is either a =, ., [ or (. So an identifier followed by a < can only be the start of a variable declaration, meaning we only need one token of lookahead to parse this.
Note that when accessing static members of a generic class, you can and must refer to the class without type arguments (i.e. FooClass.bar(); instead of FooClass<T>.bar()), so even in that case the class name would be followed by a ., not a <.
But what about generic method calls? Something like y = f<T>(x); could still run into the ambiguity because relational operators are of course allowed on the right side of =. Here Java chooses a similar solution as Rust by simply changing the syntax for generic method calls. Instead of object.f<T>(x) the syntax is object.<T>f(x) where the object. part is non-optional even if the object is this. So to call a generic method with an explicit type argument on the current object, you'd have to write this.<T>f(x);, but like in Rust the type argument can often be inferred, allowing you to just write f(x);.
So in summary x<T> a; is a variable declaration and there can't be expression statements that start with relational operations; in general expressions this.<T>f(x) is a generic method call and f<T>(x); is a comparison (well, a type error, actually). Again, parsing is easy.
C#
C# has the same restrictions on expression statements as Java does, so variable declarations aren't a problem, but unlike the previous two languages, it does allow f<T>(x) as the syntax for function calls. In order to avoid ambiguities, relational operators need to be parenthesized when used in a way that could also be valid call of a generic function. So the expression f<T>(x) is a method call and you'd need to add parentheses f<(T>(x)) or (f<T)>(x) to make it a comparison (though actually those would be type errors because you can't compare booleans with < or >, but the parser doesn't care about that) and similarly f(a<b, c, d>(e)) calls a generic method named a with the type arguments b,c,d whereas f((a<b), c, (d<e)) would involve two comparisons (and you can in fact leave out one of the two pairs of parentheses).
This leads to a nicer syntax for method calls with explicit type arguments than in the previous two languages, but parsing becomes kind of tricky. Considering that in the above example f(a<b, c, d>(e)) we can actually place an arbitrary number of arguments before d>(e) and a<b is a perfectly valid comparison if not followed by d>(e), we actually need an arbitrary amount of lookahead, backtracking or non-determinism to parse this.
So in summary x<T> a; is a variable declaration, there is no expression statement that starts with a comparison, f<T>(x) is a method call expression and (f<T)>(x) or f<(T>(x)) would be (ill-typed) comparisons. It is impossible to parse C# with menhir.
C++
In C++ a < b; is a valid (albeit useless) expression statement, the syntax for template function calls with explicit template arguments is f<T>(x) and a<b>c can be a perfectly valid (even well-typed) comparison. So statements like a<b>c; and expressions like a<b>(c) are actually ambiguous without additional information. Further, template arguments in C++ don't have to be types. That is, Foo<42> x; or even Foo<c> x; where c is defined as const int x = 42;, for example, could be perfectly valid instantiations of the Foo template if Foo is defined to take an integer as a template argument. So that's a bummer.
To resolve this ambiguity, the C++ grammar refers to the rule template-name instead of identifier in places where the name of a template is expected. So if we treated these as distinct entities, there'd be no ambiguity here. But of course template-name is defined simply as template-name: identifier in the grammar, so that seems pretty useless, ... except that the standard also says that template-name should only be matched when the given identifier names a template in the current scope. Similarly it says that identifiers should only be interpreted as variable names when they don't refer to a template (or type name).
Note that, unlike the previous three languages, C++ requires all types and templates to be declared before they can be used. So when we see the statement a<b>c;, we know that it can only be a template instantiation if we've previously parsed a declaration for a template named a and it is currently in scope.
So, if we keep track of scopes while parsing, we can simply use if-statements to check whether the name a refers to a previously parsed template or not in a hand-written parser. In parser generators that allow semantic predicates, we can do the same thing. Doing this does not even require any lookahead or backtracking.
But what about parser generators like yacc or menhir that don't support semantic predicates? For these we can use something known as the lexer hack, meaning we make the lexer generate different tokens for type names, template names and ordinary identifiers. Then we have a nicely unambiguous grammar that we can feed our parser generator. Of course the trick is getting the lexer to actually do that. In order to accomplish that, we need to keep track of which templates and types are currently in scope using a symbol table and then access that symbol table from the lexer. We'll also need to tell the lexer when we're reading the name of a definition, like the x in int x;, because then we want to generate a regular identifier even if a template named x is currently in scope (the definition int x; would shadow the template until the variable goes out of scope).
This same approach is used to resolve the casting ambiguity (is (T)(x) a cast of x to type T or a function call of a function named T?) in C and C++.
So in summary, foo<T> a; and foo<T>(x) are template instantiations if and only if foo is a template. Parsing's a bitch, but possible without arbitrary lookahead or backtracking and even using menhir when applying the lexer hack.
AFAIK C++'s template syntax is a well-known example of real-world non-LR grammar. Strictly speaking, it is not LR(k) for any finite k... So C++ parsers are usually hand-written with hacks (like clang) or generated by a GLR grammar (LR with branching). So in theory it is impossible to implement a complete C++ parser in Menhir, which is LR.
However, even a similar-looking generics syntax can work out differently. If generic types and expressions involving comparison operators never appear in the same context, the grammar may still be LR-compatible. For example, consider the Rust syntax for variable declarations (for this part only):
let x : Vec<T> = ...
The : token indicates that a type, rather than an expression follows, so in this case the grammar can be LR, or even LL (not verified).
So the final answer is, it depends. But for the C++ case it should be impossible to implement the syntax in Menhir.

C++ type suffix _t, _type or none

C++ sometimes uses the suffix _type on type definitions (e.g. std::vector<T>::value_type),
also sometimes _t (e.g. std::size_t), or no suffix (normal classes, and also typedefs like std::string which is really std::basic_string<...>)
Are there any good conventions on when to use which name?
As @MarcoA.'s answer correctly points out, the suffix _t is largely inherited from C (and, in the global namespace, reserved for POSIX).
This leaves us with "no suffix" and _type.
Notice that there is no namespace-scope name in std ending in _type*; all such names are members of classes and class templates (or, in the case of regex-related types, of a nested namespace which largely plays a role of a class). I think that's the distinction: types themselves don't use the _type suffix.
The suffix _type is only used on members which denote types, and moreover, usually when they denote a type somewhat "external" to the containing class. Compare std::vector<T>::value_type and std::vector<T>::size_type, which come from the vector's template parameters T and Allocator, respectively, against std::vector<T>::iterator, which is "intrinsic" to the vector class template.
* Not entirely true, there are a few such names (also pointed out in a comment by @jrok): common_type, underlying_type, is_literal_type, true_type, false_type. In the first three, _type is not really a suffix, it's an actual part of the name (e.g. a metafunction to give the common type or the underlying type). With true_type and false_type, it is indeed a suffix (since true and false are reserved words). I would say it's a type which represents a true/false value in the type-based metaprogramming sense.
As a C heritage the _t (that used to mean "defined via typedef") syntax has been inherited (they're also SUS/POSIX-reserved in the global namespace).
Types added in C++ and not present in the original C language (e.g. size_type) don't need to be shortened.
Keep in mind that to the best of my knowledge this is more of an observation on an established convention rather than a general rule.
Member types are called type or something_type in the C++ standard library. This is readable and descriptive, and the added verbosity is not usually a problem because users don't normally spell out those type names: most of them are used in function signatures, then auto takes care of member function return types, and in C++14 the _t type aliases take care of type trait static type members.
That leads to the second point: Free-standing, non-member types are usually called something_t: size_t, int64_t, decay_t, etc. There is certainly an element of heritage from C in there, but the convention is maintained in the continuing evolution of C++. Presumably, succinctness is still a useful quality here, since those types are expected to be spelled out in general.
Finally, all the above only applies to what I might call "generic type derivation": Given X, give me some related type X::value_type, or given an integer, give me the 64-bit variant. The convention is thus restricted to common, vocabulary-type names. The class names of your actual business logic (including std::string) presumably do not warrant such a naming pattern, and I don't think many people would like to have to mangle every type name.
If you will, the _t and _type naming conventions apply primarily to the standard library and to certain aspects of the standard library style, but you do not need to take them as some kind of general mandate.
My answer is only relevant for type names within namespaces (that aren't std).
Use no suffix usually, and _type for enums
So, here's the thing: the identifier foo_type can be interpreted as
"the identifier of the type for things which are foo's" (e.g. size_type overall_size = v1.size() + v2.size();)
"the identifier of the type for things which are kinds, or types, of foo" (e.g. employment_type my_employment_type = FIXED_TERM;)
When you have typedef'ed enums in play, I think you would tend towards the second interpretation - otherwise, what would you call your enum types?
The common aversion to using no suffix is that seeing the identifier foo is confusing: Is it a variable, a specific foo? Or is it the type for foos? ... luckily, that's not an issue when you're in a namespace: my_ns::foo is obviously a type - you can't get it wrong (assuming you don't use global variables...); so no need for a prefix there.
PS - I employ the practice of suffixing my typedefs within classes with _type (pointer_type, value_type, reference_type, etc.). I know that contradicts my advice above, but I somehow feel bad breaking with tradition on this point.
Now, you could ask - what happens if you have enums within classes? Well, I try to avoid those, and place my enum inside the surrounding namespace.

What is the purpose of the Most Vexing Parse?

On Wikipedia I found this:
A a( A() );
[This] could be disambiguated either as
a variable definition of class [A], taking an anonymous instance of class [A] or
a function declaration for a function which returns an object of type [A] and takes a single (unnamed) argument which is a function returning type [A] (and taking no input).
Most programmers expect the first, but the C++ standard requires it to be interpreted as the second.
But why? If the majority of the C++ community expects the former behavior, why not make it the standard? Besides, the above syntax is consistent if you don't take into account the parsing ambiguity.
Can someone please enlighten me? Why does the standard make this a requirement?
Let's say MVP didn't exist.
How would you declare a function?
A foo();
would be a variable definition, not a function declaration. Would you introduce a new keyword? Would you have a more awkward syntax for a function declaration? Or would you rather have
A foo;
define a variable and
A foo();
declare a function?
Your slightly more complicated example is just for consistency with this basic one. It's easier to say "everything that can be interpreted as a declaration, will be interpreted as a declaration" rather than "everything that can be interpreted as a declaration, will be interpreted as a declaration, unless it's a single variable definition, in which case it's a variable definition".
This probably isn't the motivation behind it though, but a reason it's a good thing.
For C++, it's pretty simple: because the rule was made that way in C.
In C, the ambiguity only arises with a typedef and some fairly obscure code. Almost nobody ever triggers it by accident -- in fact, it probably qualifies as rare except in code designed specifically to demonstrate the possibility. For better or worse, however, the mere possibility of the ambiguity meant somebody had to resolve it -- and if memory serves, it was resolved by none other than Dennis Ritchie, who decreed that anything that could be interpreted as a declaration would be a declaration, even if there was also an ambiguous interpretation as a definition.
C++ added the ability to use parentheses for initialization, on top of their C roles in grouping and function calls, and this moved the ambiguity from obscure to common. Changing it, however, would have required breaking the rule as it came from C. Resolving this particular ambiguity as most would expect, without creating half a dozen more that were even more surprising, would probably have been fairly non-trivial as well, unless you were willing to throw away compatibility with C entirely.
This is just a guess, but it may be due to the fact that with the given approach you can get both behaviors:
A a( A() ); // this is a function declaration
A a( (A()) ); // this is a variable definition
If you were to change its behavior to be a variable definition, then function declarations would be considerably more complex.
typedef A subfunction_type();
A a( A() ); // this would be a variable declaration
A a( subfunction_type ); // this would be a function declaration??
It's a side-effect of the grammar being defined recursively.
It was not designed intentionally like that. It was discovered and documented as the most vexing parse.
Consider if the program were like so:
typedef struct A { int m; } A;
int main() { A a( A() ); }
This would be valid C, and there is only one possible interpretation allowed by the grammar of C: a is declared as a function. C only allows initialization using = (not parentheses), and does not allow A() to be interpreted as an expression. (Function-style casts are a C++-only feature.) This is not a "vexing parse" in C.
The grammar of C++ makes this example ambiguous, as Wikipedia points out. However, if you want C++ to give this program the same meaning as C, then, obviously, C++ compilers are going to have to interpret a as a function just like C compilers. Sure, C++ could have changed the meaning of this program, making a the definition of a variable of type A. However, incompatibilities with C were introduced into C++ only when there was a good reason to do it, and I would imagine that Stroustrup particularly wanted to avoid potentially silent breakages such as this, as they would cause great frustration for C users migrating to C++.
Thus, C++ interprets it as a function declaration too, and not a variable definition; and more generally, adopted the rule that if something that looks like a function-style cast can be interpreted as a declaration instead in its syntactic context, then it shall be. This eliminates potential for incompatibility with C for all vexing-parse situations, by ensuring that the interpretation that is not available in C (i.e. the one involving a function-style cast) is not taken.
Cfront 2.0 Selected Readings (page 1-42) mentions the C compatibility issue in the case of expression-declaration ambiguity, which is a related type of most vexing parse.
No particular reason, other than [possibly] the case that K-ballo identifies.
It's just legacy. There was already the int x; construction form so it never seemed like a reach to require T x; when no ctor args are in play.
In hindsight I'd imagine that if the language were designed from scratch today, then the MVP wouldn't exist... along with a ton of other C++ oddities.
Recall that C++ evolved over decades and, even now, is designed only by committee (see also: camel).