Double closing angle brackets (>>) generate syntax error in SPECIFIC case - c++

Eclipse (Luna, 4.4.2) tells me that I have a syntax error on the following line:
static_cast<Vec<int, DIM>>(a.mul(b));
I remembered that double closing angle brackets >> can lead to problems with some compilers, so I put a blank in between: > >. The syntax error disappears.
BUT I have many >> in my program where no syntax error is detected, such as:
Node<Element<DIM>> * e= a.get();
Why do I get an error in the above-mentioned specific case? This is NOT a duplicate of error: 'varName' was not declared in this scope, since I'm specifically asking why my compiler accepts >> sometimes, but not always.

You have used a pre-C++11 compiler. The older standard did not let the parser disambiguate a pair of closing angle brackets >> used in a nested template type specifier from operator>>(). Thus you had to write a space between them.
Samples like >>> or >>* fall under a different case for the old parsers, so they work without an error message.
I have to admit I don't know exactly what was changed in the C++11 (the current) standard's definitions so that a C++11-compliant parser can clearly disambiguate this situation.

The "right angle bracket fix" is found in §14.2 [temp.names]/p3 (emphasis mine):
When parsing a template-argument-list, the first non-nested > is taken as the ending delimiter rather than a greater-than operator. Similarly, the first non-nested >> is treated as two consecutive but distinct > tokens, the first of which is taken as the end of the template-argument-list and completes the template-id. [ Note: The second > token produced by this replacement rule may terminate an enclosing template-id construct or it may be part of a different construct (e.g. a cast).—end note ]
If the static_cast is otherwise valid, then both pieces of code in the OP are perfectly valid in C++11 and perfectly invalid in C++03. If your IDE reports an error on one but not the other, then it's a bug with that IDE.
It's difficult (and also somewhat pointless) for us to speculate on the source of the bug. One potential cause is that the second > closes a different construct in each case (in the first it closes a cast, in the second a template argument list), and the parser's implementation somehow missed the "second > being part of a different construct" case. But that's just wild speculation.
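To illustrate, here is a minimal sketch for a C++11 (or later) compiler. `Vec`, `Element`, `Node`, `DIM`, `to_vec`, `as_node` and `as_node_03` are hypothetical stand-ins for the OP's types, not taken from the original program. Both `>>` forms should compile, and the `> >` spelling that C++03 requires remains valid.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the OP's types (not from the original code).
template <typename T, std::size_t N>
struct Vec {
    T data[N];
};

template <std::size_t N>
struct Element {};

template <typename E>
struct Node {};

constexpr std::size_t DIM = 3;

// Case 1: ">>" where the second '>' closes static_cast's angle brackets.
Vec<int, DIM> to_vec(const Vec<int, DIM>& v) {
    return static_cast<Vec<int, DIM>>(v);
}

// Case 2: ">>" where the second '>' closes an enclosing template-id.
Node<Element<DIM>>* as_node(Node<Element<DIM>>* e) {
    return e;
}

// The C++03-compatible spelling with a space is still valid in C++11.
Node<Element<DIM> >* as_node_03(Node<Element<DIM> >* e) {
    return e;
}
```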

Related

Why is the semicolon at the end of the init-statement within the for statement mandatory?

This is how the C++17 standard defines the for statement:
for ( init-statement conditionₒₚₜ ; expressionₒₚₜ ) statement
I've also looked in https://en.cppreference.com/w/cpp/language/for:
attr(optional) for ( init-statement condition(optional) ; iteration_expression(optional) ) statement
From this, I can only conclude that init-statement isn't optional in a for loop. In other words, one has to initialize some variable inside the header in order to use this control-flow construct. Well, this doesn't seem to be the case, because when I type code such as
for (; a != b; ++a)
Presuming that I have already defined these variables, it runs just fine. Not only did I not declare a variable in the header, but also referenced other variables previously defined outside the loop. If I were to provide an init-statement, its object would only be usable inside the for loop, but it seems I can use variables declared elsewhere just fine.
Having come to this conclusion, I thought I didn't need the first part, so I tried removing the semicolon to make it more readable (and, well, just for the heck of it). It won't compile now. The compiler says it expected a ;, evaluates a != b as if it weren't inside a for loop ("C4552: '!=': result of expression not used") and finally concludes: "C2143: syntax error: missing ';' before ')'".
You don't need an init-statement, but you do need the semicolon. Why didn't these resources I cited initially make this clear, or is it implied in some way I am blind to?
The semicolon is mandatory because init-statement includes the semicolon.
Quote from N3337 6.5 Iteration statements:
for (for-init-statement condition_{opt}; expression_{opt}) statement
for-init-statement:
expression-statement
simple-declaration
6.2 Expression statement:
expression-statement:
expression_{opt} ;
7 Declarations:
simple-declaration:
decl-specifier-seq_{opt} init-declarator-list_{opt} ;
attribute-specifier-seq decl-specifier-seq_{opt} init-declarator-list ;
If you look up init-statement and trace through the possibilities you'll find that they all require a trailing semi-colon.
iteration-statement:
    for ( for-init-statement conditionₒₚₜ ; expressionₒₚₜ ) statement
for-init-statement:
    expression-statement
    simple-declaration
expression-statement:
    expressionₒₚₜ ;
simple-declaration:
    attribute-specifier-seqₒₚₜ decl-specifier-seqₒₚₜ init-declarator-listₒₚₜ ;
A for-init-statement is either an expression-statement or a simple-declaration. An expression-statement has an optional expression but a mandatory semi-colon. Similarly, a simple-declaration's components are all optional except for the final semi-colon.
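A small sketch of the three shapes a for-init-statement can take; the function names are illustrative only. In each header the part before the first `;` may be empty, an expression, or a declaration, but the `;` itself is always present.

```cpp
// init-statement is a null expression-statement: just ";".
int count_from(int a, int b) {
    int steps = 0;
    for (; a != b; ++a) {  // nothing before the first ';'
        ++steps;
    }
    return steps;
}

// init-statement is an expression-statement: "a = 0;".
int count_with_expression(int a, int b) {
    int steps = 0;
    for (a = 0; a != b; ++a) {
        ++steps;
    }
    return steps;
}

// init-statement is a simple-declaration: "int i = 0;".
int count_with_declaration(int b) {
    int steps = 0;
    for (int i = 0; i != b; ++i) {
        ++steps;
    }
    return steps;
}

// Writing "for (a != b; ++a)" instead makes "a != b;" the init-statement
// and "++a" the condition, so the parser then reports a missing ';'
// before ')', which matches the C4552 and C2143 diagnostics in the question.
```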
Why didn't these resources I cited initially make this clear, or is it implied in some way I am blind to?
Language Standard
The first resource you cited is the C++17 standard. The language standard is written to be precise and consistent. Often (as in your case) details are deferred to other sections so that each definition appears only once, no matter how often it is used. Ease of reading by the casual coder is not a priority. The first sentence of the standard sets the scope of the document.
This document specifies requirements for implementations of the C++ programming language.
Note who the requirements are for. They are for implementations, which are more commonly, yet imprecisely, called "compilers". The standard is not written for those writing C++ programs; it is written for those writing C++ compilers (and the other parts of the implementation). That is why this resource is not concerned with making points clear from a programmer's perspective.
cppreference.com
Coders often have to do a lot of interpretation to extract what they need from the standard. That is why there are books teaching the language and resources like your second citation, cppreference.com. Contrast the standard's stated purpose with that of cppreference.com:
Our goal is to provide programmers with a complete online reference for the C and C++ languages and standard libraries, i.e. a more convenient version of the C and C++ standards.
That website's target audience is programmers, rather than implementors. Hence the harsh precision of the standard has been diluted with a bit more explanation. It tries to strike a balance between rigorous correctness and readability, with a touch of empathy for the coder who needs to check a detail of some aspect of the language.
In this particular case, I think they did a reasonable job. I guess you just overlooked the explanations?
You copied the "formal syntax" line. Right after that is an "informal syntax" line that replaces the confusing init-statement with declaration-or-expression(optional) ;, making it clear that the semicolon is required but nothing need appear before it.
Right after the syntax lines is a list of relevant definitions. The definition for init-statement calls out that it "may be a null statement ;". Also, the definition ends with
Note that any init-statement must end with a semicolon ;, which is why it is often described informally as an expression or a declaration followed by a semicolon.
Note that the definition is directly in the relevant page. This is one way the website differs in presentation from the standard. Since the website is not the ultimate authority, there is less fallout if it is inconsistent. That gives it the freedom to repeat itself and risk definitions not being identical. Case in point: the definition of init-statement is repeated in the if statement article (where the init-statement is optional).
Perhaps the important takeaway is to read those definitions instead of assuming you already know what the meanings are?

How C++ compilers differentiate the token >> for binary operator, and for template

My question is about the parsers of C++ compilers such as Clang: how do they handle the operator >> to know when it is a binary operator and when it is closing a template, as in std::vector<std::tuple<int, double>>? I imagine this is done at parse time. So is the better way to solve it in the lexer, or to use only > as a token and solve the problem in the grammar?
It's actually quite simple: if there is an open template bracket visible, a > closes it, even if the > would otherwise form part of a >> operator. (This doesn't apply to > characters which are part of other tokens, such as >=.) This change to C++ syntax was part of C++11, and is described in paragraph 3 of §13.3 [temp.names].
An open template bracket is not visible if the > is inside a parenthetically nested syntax. So the >> in both T<sizeof a[x >> 1]> and T<(x >> 1)> are right shift operators, while T<x >> 1> probably does not parse as expected.
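A minimal compile-time check of the parenthesized case (the `sizeof a[x >> 1]` case is not exercised here), using hypothetical templates `T` and `Wrap` and a hypothetical constant `x`. The nested line is similar to the example the standard gives in [temp.names].

```cpp
template <int N>
struct T {
    static constexpr int value = N;
};

template <typename U>
struct Wrap {
    using type = U;
};

constexpr int x = 8;

// Parenthesized: no template bracket is visible, so ">>" is a right shift.
using Shifted = T<(x >> 1)>;
static_assert(Shifted::value == 4, "(x >> 1) is 8 >> 1");

// Both kinds in one line: the inner ">>" is a shift, while the outer ">>"
// closes T<...> and then Wrap<...>.
using Nested = Wrap<T<(x >> 1)>>::type;
static_assert(Nested::value == 4, "the outer >> closes two template-ids");

// By contrast, T<x >> 1> would be read as T<x> followed by "> 1>",
// which does not compile.
```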
The two implementation strategies are both workable, depending on where you want to put the complexity. If the lexer never generates a >> token, the parser can check that the > tokens in expr '>' '>' expr are adjacent by looking at their source locations. There will be a shift-reduce conflict, which will have to be resolved in favour of reducing the template parameter list. This works because it happens that there is no ambiguity created by separating >> into two tokens, but that's not a general rule: a + ++ b is different from a ++ + b; if the lexer were only generating + tokens, that would be ambiguous.
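A minimal sketch of the first strategy, under our own assumptions: a toy lexer that never produces a `>>` token and records each token's source offset, plus the parser-side adjacency check. `Token`, `lex` and `is_right_shift` are hypothetical names, not taken from Clang or any other compiler, and the shift-reduce resolution itself is not shown.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Token text plus the offset of its first character in the source.
struct Token {
    std::string text;
    std::size_t offset;
};

// Toy lexer: identifiers/numbers become one token; every other non-space
// character is its own token, so ">>" always becomes two '>' tokens.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalnum(c) || c == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            out.push_back({src.substr(start, i - start), start});
        } else {
            out.push_back({std::string(1, src[i]), i});
            ++i;
        }
    }
    return out;
}

// Check used by the expression rule expr '>' '>' expr: tokens i and i+1
// form a right shift only if both are '>' and they are adjacent in the source.
bool is_right_shift(const std::vector<Token>& toks, std::size_t i) {
    return i + 1 < toks.size() &&
           toks[i].text == ">" && toks[i + 1].text == ">" &&
           toks[i + 1].offset == toks[i].offset + 1;
}
```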
It's not too complicated to resolve the issue with a lexer hack, if you are prepared to have your lexer track parenthesis depth. That means the lexer has to know whether a < is a template bracket or a comparison operator, but it is quite possible that it does.
This is the more interesting question (at least imho): how is a < recognised as a template bracket rather than a less-than operator? Here there really is semantic feedback: it is a template bracket if it follows a name which designates a template.
This is not a simple determination. The name could be a class or union member, and even a member of a specialisation of a templated class or union. In the latter case, it might be necessary to calculate the values of compile-time constant expressions and then do template deduction in order to decide what the name designates.
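A sketch of the lexer-hack strategy, under loud assumptions: the caller passes a hypothetical set `templates` of names known to designate templates, standing in for the name lookup described above, and the toy lexer handles only identifiers, numbers and bracket characters. It keeps a stack of open brackets; a `>` closes a visible template bracket if there is one, and only otherwise can two adjacent `>` become a single shift token.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// ASSUMPTION: `templates` replaces real name lookup (hypothetical input).
std::vector<std::string> lex_with_depth(const std::string& src,
                                        const std::set<std::string>& templates) {
    std::vector<std::string> out;
    std::vector<char> open;  // stack of open template '<', '(' and '['
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalnum(c) || c == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            out.push_back(src.substr(start, i - start));
        } else if (c == '<') {
            // A '<' right after a template name opens a template argument list;
            // otherwise it is a less-than operator and is not tracked.
            if (!out.empty() && templates.count(out.back()) != 0) {
                open.push_back('<');
            }
            out.push_back("<");
            ++i;
        } else if (c == '>') {
            if (!open.empty() && open.back() == '<') {
                // A template bracket is visible: this '>' closes it, even if
                // the next character is another '>'.
                open.pop_back();
                out.push_back(">");
                ++i;
            } else if (i + 1 < src.size() && src[i + 1] == '>') {
                out.push_back(">>");  // no visible template bracket: right shift
                i += 2;
            } else {
                out.push_back(">");  // plain greater-than
                ++i;
            }
        } else if (c == '(' || c == '[') {
            open.push_back(static_cast<char>(c));  // hides any outer '<'
            out.push_back(std::string(1, static_cast<char>(c)));
            ++i;
        } else {
            char match = (c == ')') ? '(' : (c == ']') ? '[' : '\0';
            if (match != '\0' && !open.empty() && open.back() == match) {
                open.pop_back();
            }
            out.push_back(std::string(1, static_cast<char>(c)));
            ++i;
        }
    }
    return out;
}
```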

Compiler: limitation of lexical analysis

In classic compiler theory, the first two phases are lexical analysis and parsing, arranged as a pipeline: lexical analysis recognizes the tokens that parsing takes as input.
But I came across some cases that are hard to recognize correctly during lexical analysis. For example, the following C++ template code:
map<int, vector<int>>
A "regular" lexical analyzer would recognize the >> as the bitwise right-shift operator, which is not correct here. My feeling is that it's hard to divide the handling of this kind of grammar into two phases; the lexing work has to be done in the parsing phase, because correctly interpreting >> relies on the grammar, not only on simple lexical rules.
I'd like to know the theory and practice behind this problem. I'd also like to know how C++ compilers handle this case.
The C++ standard requires that an implementation perform lexical analysis to produce a stream of tokens, before the parsing stage. According to the lexical analysis rules, two consecutive > characters (not followed by =) will always be interpreted as one >> token. The grammar provided with the C++ standard is defined in terms of these tokens.
The requirement that in certain contexts (such as when expecting a > within a template-id) the implementation should interpret >> as two > is not specified within the grammar. Instead the rule is specified as a special case:
14.2 Names of template specializations [temp.names]
After name lookup (3.4) finds that a name is a template-name or that an operator-function-id or a literal-operator-id refers to a set of overloaded functions any member of which is a function template if this is followed by a <, the < is always taken as the delimiter of a template-argument-list and never as the less-than operator. When parsing a template-argument-list, the first non-nested > is taken as the ending delimiter rather than a greater-than operator. Similarly, the first non-nested >> is treated as two consecutive but distinct > tokens, the first of which is taken as the end of the template-argument-list and completes the template-id. [ Note: The second > token produced by this replacement rule may terminate an enclosing template-id construct or it may be part of a different construct (e.g. a cast).—end note ]
Note the earlier rule, that in certain contexts < should be interpreted as the < in a template-argument-list. This is another example of a construct that requires context in order to disambiguate the parse.
The C++ grammar contains many such ambiguities, which cannot be resolved during parsing without information about the context. The best known is the most vexing parse, in which a statement that looks like an object definition with an initializer is instead parsed as a function declaration; more generally, whether an identifier names a type can change how a construct is parsed.
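For reference, a minimal sketch of the most vexing parse; `Timer`, `Widget`, `w`, `w2` and `w3` are illustrative names. The `static_assert` confirms that the first definition-looking line actually declares a function.

```cpp
#include <type_traits>

struct Timer {};

struct Widget {
    explicit Widget(Timer) {}
};

// Looks like "an object w initialized with a temporary Timer", but it
// declares a function w returning Widget whose parameter is a pointer
// to a function returning Timer.
Widget w(Timer());

static_assert(std::is_function<decltype(w)>::value,
              "w is a function declaration, not an object");

// Braces (C++11) or extra parentheses force the object interpretation.
Widget w2{Timer{}};
Widget w3((Timer()));
```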
Keeping track of the aforementioned context in C++ requires an implementation to perform some semantic analysis in parallel with the parsing stage. This is commonly implemented in the form of semantic actions that are invoked when a particular grammatical construct is recognised in a given context. These semantic actions then build a data structure that represents the context and permits efficient queries. This is often referred to as a symbol table, but the structure required for C++ is pretty much the entire AST.
This kind of context-sensitive semantic action can also be used to resolve ambiguities. For example, on recognising an identifier in the context of a namespace-body, a semantic action will check whether the name was previously defined as a template. The result is then fed back to the parser, either by marking the identifier token with the result or by replacing it with a special token that matches a different grammar rule.
The same technique can be used to mark a < as the beginning of a template-argument-list, or a > as the end. The rule for context-sensitive replacement of >> with two > poses essentially the same problem and can be resolved using the same method.
You are right: the theoretical clean distinction between lexer and parser is not always possible. I remember a project I worked on as a student. We were to implement a C compiler, and the grammar we used as a basis would treat typedef names as types in some cases and as identifiers in others, so the lexer had to switch between these two modes. The way I implemented this back then was with special empty rules that reconfigured the lexer depending on context. To accomplish this, it was vital to know that the parser would always use exactly one token of look-ahead, so any change to lexer behaviour had to occur at least one lexical token before the affected location. In the end, this worked quite well.
In the C++ case of >> you mention, I don't know what compilers actually do. willj quoted how the specification phrases this, but implementations are allowed to do things differently internally, as long as the visible result is the same. So here is how I'd try to tackle this: upon reading a >, the lexer would emit token GREATER, but also switch to a state where each subsequent > without a space in between would be lexed to GREATER_REPEATED. Any other symbol would switch the state back to normal. Instead of state switches, you could also do this by lexing the regular expression >+, and emitting multiple tokens from this rule. In the parser, you could then use rules like the following:
rightAngleBracket: GREATER | GREATER_REPEATED;
rightShift: GREATER GREATER_REPEATED;
With a bit of luck, you could make template argument rules use rightAngleBracket, while expressions would use rightShift. Depending on how much look-ahead your parser has, it might be necessary to introduce additional non-terminals to hold longer sequences of ambiguous content, until you encounter some context that lets you decide between these cases.
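A minimal sketch of the lexer side of this proposal, assuming a toy token set: `GREATER` and `GREATER_REPEATED` come from the rules above, while `Tok`, `OTHER` and `lex_angles` are hypothetical. A run matching `>+` yields one `GREATER` followed by one `GREATER_REPEATED` per further adjacent `>`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// OTHER is a placeholder for every token a real lexer would produce.
enum class Tok { GREATER, GREATER_REPEATED, OTHER };

// Lexes each run of '>' (the regular expression >+) into one GREATER
// followed by one GREATER_REPEATED per additional adjacent '>'.
// Whitespace ends a run, so "> >" yields two GREATER tokens.
std::vector<Tok> lex_angles(const std::string& src) {
    std::vector<Tok> out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == '>') {
            out.push_back(Tok::GREATER);
            ++i;
            while (i < src.size() && src[i] == '>') {
                out.push_back(Tok::GREATER_REPEATED);
                ++i;
            }
        } else if (std::isspace(static_cast<unsigned char>(src[i]))) {
            ++i;
        } else {
            out.push_back(Tok::OTHER);  // one placeholder per character
            ++i;
        }
    }
    return out;
}
```

With these tokens, rightShift matches only the adjacent pair produced from `a>>b`, while rightAngleBracket accepts either kind, so both `>` in `map<int, vector<int>>` can close a template argument list.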

Why do I get "operation may be undefined" in a statement expression in C++?

To describe the problem simply, please have a look at the code below:
int main()
{
int a=123;
({if (a) a=0;});
return 0;
}
I get this warning from [-Wsequence-point]:
Line 4: warning: operation on 'a' may be undefined
My g++ version is 4.4.5.
I'd appreciate it if someone could explain this simple problem.
By the way, you can find my original program and original problem in #7 on this Chinese site (not necessary).
UPD1:
Although changing the code to ({if(a) a=0; a;}) avoids the warning, I realized that the real cause of the problem may not be "The last thing in the compound statement should be an expression followed by a semicolon.", because the documentation also says: "If you use some other kind of statement last within the braces, the construct has type void, and thus effectively no value."
An example shows it:
int main()
{
int a=123, b;
({;});
({if (a) b=0;});
return 0;
}
And this code produces no warnings!
So I think the real reason has something to do with sequence points.
Please help!
UPD2:
Sorry to @AndyProwl for unaccepting his answer, which I had accepted before UPD1. Following his advice, I may ask a new question (UPD1 is a different question from the original one). I'll accept his answer again, because it surely avoids the warning anyhow. :)
If I decide to ask a new question, I'll update this question to add a link.
According to the C++ grammar, expressions (apart from lambda expressions perhaps, but that's a different story) cannot contain statements - including block statements. Therefore, I would say your code is ill-formed, and if GCC compiles it, it means this is a (strange) compiler extension.
You should consult the compiler's reference to figure out what semantics it gives this construct (or doesn't give, as the warning seems to suggest).
EDIT:
As pointed out by Shafik Yaghmour in the comments, this appears to be a GNU extension. According to the documentation, the value of this "statement expression" is supposed to be the value of the last statement in the block, which should be an expression statement:
The last thing in the compound statement should be an expression followed by a semicolon; the value of this subexpression serves as the value of the entire construct. (If you use some other kind of statement last within the braces, the construct has type void, and thus effectively no value.)
Since the block in your example does not contain an expression statement as the last statement, GCC does not know how to evaluate that "statement expression" (not to be confused with "expression statement" - that's what should appear last in a statement expression).
To prevent GCC from complaining, therefore, you should do something like:
({if (a) a=0; a;});
// ^^
But honestly, I do not understand why one would ever need this thing in C++.
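A short sketch of the documented semantics. It only applies to GCC and Clang, since statement expressions are a GNU extension (expect a warning under `-pedantic`); the function names are hypothetical. When the last statement is an expression statement, the construct takes that expression's value.

```cpp
// GNU extension: "({ ... })" is a statement expression, not standard C++.

// The last statement is the expression statement "a;", so the whole
// construct has the type and value of a after the if statement runs.
int reset_and_get(int a) {
    return ({ if (a) a = 0; a; });
}

// A local declared inside the braces is visible only there; the value
// of the construct is that of the final expression statement "t + 1;".
int square_plus_one(int n) {
    return ({ int t = n * n; t + 1; });
}
```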

c++, how is trailing comma not an error and what happens? Foo x, y,;

Someone pointed out to me that I had what looks like a typo in some c++ code:
protected:
Foo x, y,;
I would have thought the trailing comma would be an error, but apparently it isn't? Is this undefined, or what happens? Presumably something bad, since a code-checker program complained about it.
The relevant grammar production is in §9.2:
member-declarator-list:
member-declarator
member-declarator-list , member-declarator
The comma is only allowed to separate the declarators (names). member-declarator may not itself contain a comma.
EDIT: here is member-declarator. It's not quite as self-contained; the syntax for declarators is, in general, a cobweb.
member-declarator:
declarator virt-specifier-seq(opt) pure-specifier(opt)
declarator brace-or-equal-initializer(opt)
identifier(opt) attribute-specifier-seq(opt) : constant-expression
Incorrect grammar is not undefined behavior; it makes the program ill-formed, and the standard requires the implementation to issue a diagnostic for that sort of thing. A compiler that silently accepts a misplaced comma has a bug.
Note that trailing commas are allowed in enumeration definitions and brace-initializers. Trailing commas in brace-initializers were already allowed in C++98 (inherited from C), while C++11 added them to enumerations; both presumably simplify writing source code generators. (The preprocessor, which most often gets that job, has a tough time even with such simple requirements.) Typically a simple generator might avoid creating declarations with multiple names, because the complicated grammar makes that a can of worms. On the other hand, an empty declaration consisting of ; is allowed, as is a semicolon after a member function definition.
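A small sketch of the contrast drawn above (names illustrative; compile as C++11 or later): the trailing commas and extra semicolons below are accepted, while the commented-out member declaration from the question is not.

```cpp
// Trailing comma in an enumerator list: allowed since C++11.
enum Color { Red, Green, Blue, };

// Trailing comma in a brace-initializer: allowed (also in C and C++98).
int primes[] = { 2, 3, 5, 7, };

struct Foo {
    int x, y;     // OK: declarators separated by commas
    // int z, w,; // ill-formed: no member-declarator after the last comma
    void f() {};  // a ';' after a member function definition is allowed
};

;  // empty declaration at namespace scope: allowed since C++11
```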
My observations
GCC 4.6.2:
void myFunc()
{
int x, y, ; // <-- Syntax error
}
But
class MyClass
{
int x, y,; // <-- No error (one extra comma) but last comma is ignored
};
MSVC 2008:
Both produce errors
OpenWatcom 1.8:
Both produce errors