Recursive descent parser, initialization of a variable with itself, dilemma - C++

I want to know whether a conforming C++ compiler is required to support the following code:
int a(a); // no other a is visible, we mean initialization of a with itself
Visual Studio 2013 rejects it (undeclared identifier); however, some other compilers accept it.
And here is our dilemma: to check for a possible expression we need to have information about a (including its type) at our disposal, since it can be part of an expression; but there is another possibility, that the declaration is of a function, in which case we are only constructing a type expression (and the symbol a is probably not in the symbol table yet).
I think a recursive descent parser is more likely to run into this dilemma, since it is very structural in nature, and supporting this specific case amounts to a special 'crutch' (the type expression is being constructed when we encounter a inside the (), and we are at some level of recursion). So I assume that Visual Studio uses a recursive descent strategy.
So with all this in mind, is it worth a compiler writer's effort, and is it justified, to support such code (especially when using recursive descent)?
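To make the dilemma concrete, here is an illustrative sketch (names are made up) of the two readings the parser has to distinguish when it sees a declaration of this shape:

    // If 'a' does not name a type, this declares a variable a
    // initialized with itself (see the answer below):
    int a(a);

    // If the name inside the parentheses names a type, the same surface
    // syntax declares a function taking that type and returning int:
    struct T {};
    int f(T);

    // A parser cannot classify "int name(...)" until it has looked up the
    // name inside the parentheses, and for "int a(a);" that lookup must
    // already see the very 'a' being declared.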

[basic.scope.pdecl]
The point of declaration for a name is immediately after its complete
declarator (Clause 8) and before its initializer (if any), except as
noted below. [ Example:
unsigned char x = 12;
{ unsigned char x = x; }
Here the second x is initialized with its own (indeterminate) value. —end example]
In int a(a);, the declarator ends at the opening parenthesis of the initializer, so yes, compilers are required to allow this (GCC helpfully gives a -Wuninitialized warning if it's an automatic variable).
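As a quick check, here is a minimal sketch (hypothetical file name) showing the behavior, and the difference between namespace scope and automatic variables:

    // self_init.cpp -- compile with: g++ -Wall -Wuninitialized self_init.cpp
    int global(global);  // well-defined: global is zero-initialized first, so this reads 0

    int main() {
        int a(a);        // compiles: a is in scope right after its declarator,
                         // but it holds an indeterminate value
        (void)a;         // silences -Wunused-variable; does not actually read the value
        return 0;        // computing with a's value would be undefined behavior
    }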

Related

Warn about UB in argument evaluation order

I recently faced a bug in code like this:
void bar(int, int);    // bar is declared elsewhere

class C
{
public:
    // foo's return value depends on C's state
    // AND each call to foo changes the state.
    int foo(int arg);  // foo is not const-qualified.
private:
    // Some mutable state
};

void test()
{
    C c;
    bar(c.foo(42), c.foo(43));
}
The last call behaved differently on different platforms (which is perfectly legal due to undefined order of argument evaluation), and I fixed the bug.
But the rest of the codebase is large, and I would like to spot all other UB of this type.
Is there a special compiler warning in GCC, Clang or MSVS for such cases?
And what is the idiomatic and lightweight way to prevent such bugs?
Argument evaluation order is unspecified rather than undefined.
Order of evaluation of the operands of almost all C++ operators (including the order of evaluation of function arguments in a function-call expression and the order of evaluation of the subexpressions within any expression) is unspecified. The compiler can evaluate operands in any order, and may choose another order when the same expression is evaluated again.
Since it is unspecified rather than undefined behavior, compilers are not required to issue diagnostics for it.
GCC and Clang do not have any general compiler option to issue diagnostics for unspecified behavior.
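As a minimal sketch of what can go wrong (hypothetical names; both outputs are conforming):

    #include <cstdio>

    int counter = 0;
    int next() { return ++counter; }               // stateful, like c.foo() above

    void bar(int a, int b) { std::printf("%d %d\n", a, b); }

    int main() {
        bar(next(), next());   // may print "1 2" or "2 1" depending on the compiler
    }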
In GCC there is the option -fstrong-eval-order, which does this:
Evaluate member access, array subscripting, and shift expressions in left-to-right order, and evaluate assignment in right-to-left order, as adopted for C++17. Enabled by default with -std=c++17. -fstrong-eval-order=some enables just the ordering of member access and shift expressions, and is the default without -std=c++17.
There is also the option -Wreorder (C++ and Objective-C++ only) which does this:
Warn when the order of member initializers given in the code does not match the order in which they must be executed
But I do not think these options will be helpful in your particular case.
In the below statement, if you want the first argument to be evaluated before the second:
bar(c.foo(42), c.foo(43))
The simple way is to store the results of c.foo(42) and c.foo(43) in intermediate variables first and then call bar(). The two statements are fully sequenced, so the compiler is not allowed to reorder them in any observable way, at any optimization level:
auto var1 = c.foo(42);
auto var2 = c.foo(43);
bar(var1, var2);
I guess that is how you must have fixed the bug.

Why do gcc and clang both not emit any warning?

Suppose we have code like this:
int check(){
int x = 5;
++x; /* line 1.*/
return 0;
}
int main(){
return check();
}
If line 1 is commented out and the compiler is started with all warnings enabled, it emits:
warning: unused variable ‘x’ [-Wunused-variable]
However, if we un-comment line 1, i.e. increment x, then no warning is emitted.
Why is that? Increasing the variable is not really using it.
This happens in both GCC and Clang, for both C and C++.
x++ is the same as x = x + 1;, an assignment. When you assign to something, you are using it; the result is not discarded.
Also, from the online GCC manual, regarding the -Wunused-variable option:
Warn whenever a local or static variable is unused aside from its declaration.
So, when you comment out the x++;, it satisfies the condition to generate and emit the warning message. When you uncomment it, the usage is visible to the compiler (the "usefulness" of this particular "usage" is questionable, but it's a usage nonetheless) and no warning is emitted.
With the pre-increment you are incrementing the value and assigning it to the variable again. It is like:
x = x + 1;
As the gcc documentation says:
-Wunused-variable:
Warn whenever a local or static variable is unused aside from its declaration.
If you comment out that line, you are not using the variable aside from the line in which you declare it.
Increasing the variable is not really using it.
Sure this is using it. It's doing a read and a write access on the stored object. This operation doesn't have any effect in your simple toy code, and the optimizer might notice that and remove the variable altogether. But the logic behind the warning is much simpler: warn iff the variable is never used.
This actually has the benefit that you can silence the warning in cases where it makes sense:
void someCallback(void *data)
{
    (void)data; // <- this "uses" data
    // [...] handler code that doesn't need data
}
Why is that? Increasing the variable is not really using it.
Yes, it is really using it. At least from the language point of view. I would hope that an optimizer removes all trace of the variable.
Sure, that particular use has no effect on the rest of the program, so the variable is indeed redundant. I would agree that a warning in this case would be helpful. But that is not the purpose of the unused-variable warning that you mention.
However, consider that analyzing whether a particular variable has any effect on the execution of the program in general is quite difficult. There has to be a point where the compiler stops checking whether a variable is actually useful. It appears that the warning stages of the compilers you tested only check whether the variable is used at least once. That once was the increment operation.
I think there is a misconception about the word 'using' and what the compiler means by it. When you have ++i you are not only accessing the variable, you are also modifying it, and AFAIK this counts as a 'use'.
There are limitations to what the compiler can identify about 'how' variables are being used, and whether the statements make any sense. In fact both clang and gcc will try to remove unnecessary statements, depending on the -O flag (sometimes too aggressively). But these optimizations happen without warnings.
Detecting a variable that is never ever accessed or used though (there is no further statement mentioning that variable) is rather easy.
I agree with you, it could generate a warning about this. I think it doesn't generate a warning because the developers of the compilers just didn't bother handling this case (yet). Maybe it is too complicated to do. But maybe they will do this in the future (hint: you can suggest this warning to them).
Compilers are getting more and more warnings. For example, there is -Wunused-but-set-variable in GCC (a "new" warning, introduced in GCC 4.6 in 2011), which warns about this:
void fn() {
    int a;
    a = 2;
}
So it is completely fine to expect that this emits a warning too (there is nothing different here; neither piece of code does anything useful):
void fn() {
    int a = 1;
    a++;
}
Maybe they could add a new warning, like -Wmeaningless-variable
As per the C standard ISO/IEC 9899:201x, expressions are always evaluated so that their side effects are produced, unless the compiler can be sufficiently sure that removing them does not alter the program's execution.
5.1.2.3 Program execution
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
When the line
++x;
is removed, the compiler can deduce that the local variable x is defined and initialized, but not used.
When you add it back, the expression itself is a void expression that must be evaluated for its side effects, as stated in:
6.8.3 Expression and null statements
The expression in an expression statement is evaluated as a void expression for its side effects.
On the other hand, to silence compiler warnings about an unused variable, it is very common to cast the expression to void. For example, for an unused parameter in a function you can write:
int MyFunc(int unused)
{
    (void)unused;
    /* ... */
    return 0;
}
In this case we have a void expression that references the symbol unused.

Why is there no compilation or run-time error in the below code?

I made the below discovery by chance. The compiler compiles the code without any error or warning. Please help me understand why the compiler does not throw any error. The program contains just a string in double quotation marks.
I have not declared any char array, nor assigned the below string to any variable.
void main()
{
    "Why there is no error in compilation?";
}
Because any expression is a valid statement.
"Why is there no error in compilation?";
is a statement that consists of an expression that evaluates to the given literal string. This is a perfectly valid statement that happens to have no effect whatsoever.
Of course, "useful" statements look more like
a = b;
But then,
b;
is also a valid statement. In your case, b is simply a string literal, and you are free to place that within the body of a function. Obviously this statement doesn't have any side effects; but what if the statement were something like
"some string " + someFunctionReturningString();
You probably would want that expression to be executed, and, as a side effect, that function to be called, wouldn't you?
Compile the program with the -Wunused-value flag. It just raises:
warning: statement with no effect
"Why there is no error in compilation?";
^
That is it.
And if you compile the above code with the -Wall flag, it also says:
warning: return type of ‘main’ is not ‘int’ [-Wmain]
void main() {
void main()
{
"Why there is no error in compilation?";
}
First, let's address the string literal. An expression statement, which is valid in any context where any statement is valid, consists of an (optional) expression followed by a semicolon. The expression is evaluated and any result is discarded. (The empty statement, consisting of just a semicolon, is classified as an expression statement; I'm not sure why.)
Expression statements are very common, but usually used when the expression has side effects. For example, both assignments (x = 42;) and function calls (printf("Hello, world\n")) are expressions, and they both yield values. If you don't care about the result, just add a semicolon and you have a valid statement.
Not all expressions have side effects. Adding a semicolon to an expression that doesn't have any side effects, as you've done here (a string literal is an expression), is not generally useful, but the language doesn't forbid it. Generally C lets you do what you want and lets you worry about whether it makes sense, rather than imposing special-case rules that might prevent mistakes but could also prevent you from doing something useful.
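Here is a small sketch pulling these cases together (assuming GCC or Clang with -Wall; the warnings are not required by the language, just typical):

    // expr_stmt.cpp -- compile with: g++ -Wall expr_stmt.cpp
    #include <cstdio>

    int main() {
        42;                        // valid, result discarded (-Wunused-value warns)
        "just a string literal";   // same: a valid statement with no effect
        std::printf("hello\n");    // evaluated for its side effect
        ;                          // the empty statement
        return 0;
    }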
Now let's cover void main(). A lot of people will tell you, with some justification, that this is wrong, and that the correct definition is int main(void). That's almost correct, and it's excellent advice, but the details are more complicated than that.
For a hosted implementation (basically one that provides the standard library), main may be defined in one of three ways:
int main(void) { /* ... */ }
or
int main(int argc, char *argv[]) { /* ... */ }
or equivalent, "or in some other implementation-defined manner." (See N1570 section 5.1.2.2.2 for the gory details.) That means that a particular implementation is permitted to document and implement forms of main other than the two mandated forms. In particular, a compiler can (and some do) state in its documentation that
void main() { /* ... */ }
and/or
void main(void) { /* ... */ }
is valid for that compiler. And a compiler that doesn't explicitly support void main() isn't required to complain if you write void main() anyway. It's not a syntax error or a constraint violation; it just has undefined behavior.
For a freestanding implementation (basically one that targets embedded systems with no OS, and no requirement to support most of the standard library), the entry point is entirely implementation-defined; it needn't even be called main. Requiring void main() is not uncommon for such implementations. (You're probably using a hosted implementation.)
Having said all that, if you're using a hosted implementation, you should always define main with an int return type (and in C you should write int main(void) rather than int main()). There is no good reason to use void main(). It makes your program non-portable, and it causes annoying pedants like me to bore you with lengthy discussions of how main should be defined.
A number of C books advise you to use void main(). If you see this, remember who wrote the book and avoid anything written by that author; he or she doesn't know C very well, and will likely make other mistakes. (I'm thinking of Herbert Schildt in particular.) The great irony here is that the void keyword was introduced by the 1989 ANSI C standard, the very same standard that introduced the requirement for main to return int (unless the implementation explicitly permits something else).
I've discussed the C rules so far. Your question is tagged both C and C++, and the rules are a bit different in C++. In C++, empty parentheses on a function declaration or definition have a different meaning, and you should write int main() rather than int main(void) (the latter is supported in C++, but only for compatibility with C). And C++ requires main to return int for hosted implementations, with no permission for an implementation to support void main().
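To summarize the portable forms, a minimal sketch (the C++ form is shown; the C equivalents are noted in comments):

    // Portable entry points for hosted implementations:
    //   C:    int main(void)   or   int main(int argc, char *argv[])
    //   C++:  int main()            // empty parens already mean "no parameters" in C++
    int main() {
        return 0;  // falling off the end of main also returns 0 in C++ and in C99 and later
    }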

Is `auto` specifier slower in compilation time?

Since C++11 we can use auto a = 1+2 instead of int a = 1+2 and the compiler deduces the type of a by itself. How does it work? Is it slower during compile time (more operations) than declaring the type myself?
auto asks the C++11 compiler to perform a limited kind of type inference (look into OCaml if you want a language with fancier type inference). But the overhead is compile-time only.
If you replace auto a=1+2; with int a=1+2; (both have the same meaning, see the answer by simplicis) and if you ask your compiler to optimize (and probably even without asking for optimizations), you'll probably get the same machine code. See also this.
If using GCC try to compile a small C++11 foo.cc file with g++ -Wall -fverbose-asm -O -S foo.cc and look (with an editor) into the generated foo.s assembler file. You'll see no difference in the generated code (but the assembler file might perhaps change slightly, e.g. because of metadata like debug information etc.)
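For instance, with a hypothetical foo.cc like this, both functions compile to identical code:

    // foo.cc -- compile with: g++ -std=c++11 -Wall -fverbose-asm -O -S foo.cc
    int with_auto() { auto a = 1 + 2; return a; }  // a is deduced as int
    int with_int()  { int  a = 1 + 2; return a; }  // same generated code as with_auto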
If you are concerned about slow compile times, I guess that using auto is not a decisive factor (overload resolution is probably more costly in compilation time). C++11 is practically designed to require a lot of optimizations (in particular sophisticated inlining, constant folding and dead code elimination), and its parsing (notably header inclusion and template expansion) is costly.
Precompiling headers and parallel builds with make -j (and perhaps ccache or distcc) might help in improving the overall compilation time, much more than avoiding auto.
And if you wanted to systematically avoid auto (in particular in range-for loops like std::map<std::string,int> dict; for (auto it : dict) {...}) you'd end up typing much more source code (whose parsing and checking takes significant time), with more risk of error. As explained here, you might guess the type slightly wrongly, and spelling it out (slightly wrongly) might slow down the execution of your code because of additional conversions.
If using GCC you might pass the -ftime-report option to g++ and get time measurements for the various GCC passes and phases.
The compiler knows the type an expression (like 1 + 2) evaluates to. That's just the way the language works -- both operands are of type int so the result is int as well. With auto a, you are just telling the compiler to "use the type of the initializing expression".
The compiler does not have to do any additional work or deduction here. The auto keyword merely relieves you from figuring out the expression's type and writing it correctly. (Which you might get wrong, with probably unintended side effects; see this question (and the top answer) for an example of how auto can avoid unintended run-time conversions and copying.)
The auto keyword really comes into its own with iterators:
std::vector< std::string >::const_iterator it = foo.cbegin();
versus
auto it = foo.cbegin();
How does it work:
From the ISO/IEC:
...The auto specifier is a placeholder for a type to be deduced (7.1.6.4). The other simple-type-specifiers specify
either a previously-declared user-defined type or one of the fundamental types...
7.1.6.4 auto specifier
The auto type-specifier signifies that the type of a variable being declared shall be deduced from its initializer
or that a function declarator shall include a trailing-return-type.
The auto type-specifier may appear with a function declarator with a trailing-return-type in any
context where such a declarator is valid.
Otherwise, the type of the variable is deduced from its initializer. The name of the variable being declared
shall not appear in the initializer expression. This use of auto is allowed when declaring variables in a
block, in namespace scope, and in a for-init-statement; auto shall appear as one of the decl-specifiers in the decl-specifier-seq and the decl-specifier-seq shall be followed by one or more init-declarators, each of which shall have a non-empty initializer...
Example:
auto x = 5; // OK: x has type int
const auto *v = &x, u = 6; // OK: v has type const int*, u has type const int
static auto y = 0.0; // OK: y has type double
auto int r; // error: auto is not a storage-class-specifier
Is it faster:
The simple answer is yes: by using it, a lot of type conversions can be omitted. However, if not used properly, it could become a great source of errors.
In one of his interviews, Bjarne Stroustrup said that the auto keyword has resulted in a win-win situation for coders and compiler implementers.

Can I rely on my compiler to diagnose type mismatches within a TU?

On a search through the spec, it appears that my compiler isn't required to diagnose such mistakes as
extern int a;
extern float a;
I previously thought that my compiler needed to diagnose that, but the spec says (emphasis added by me):
After all adjustments of types (during which typedefs (7.1.3) are replaced by their definitions), the types specified by all declarations referring to a given variable or function shall be identical, except that declarations for an array object can specify array types that differ by the presence or absence of a major array bound (8.3.4). A violation of this rule on type identity does not require a diagnostic.
And in fact, I found cases where compilers don't care. For example, GCC and Clang accept the following:
void g() { int f(); }
void h() { float f(); }
Since a violation of a rule for which no diagnostic is required means that the entire program no longer requires a diagnostic at all (see 1.4p2), the following ill-formed program doesn't require a diagnostic either. Fortunately, both GCC and Clang diagnose it.
int f();
float f();
The behavior of this code at translation time is effectively undefined. What is the reason for this? Why can the spec not require such cases to be rejected and require these to be diagnosed?
I think the rule you're quoting is talking about the whole program. A diagnostic isn't required if one TU has extern int a; and another has extern float a; because separate translation makes it impossible - the problem can only be detected at link time at best.
But if both declarations occur within a single TU I'm sure a diagnostic is required. Perhaps by 3.3/4? That (roughly) requires that all declarations of a name in one scope refer to the same entity.
For your first example, Visual Studio (rightfully) kicks up:
d:\experiments\test1\test1\test1.cpp(7) : error C2371: 'a' : redefinition; different basic types
d:\experiments\test1\test1\test1.cpp(6) : see declaration of 'a'
There's nothing wrong with your second example, as the function declarations are local, so they can do whatever they like.
Your third example (rightfully) kicks up an error in Visual Studio:
d:\experiments\test1\test1\test1.cpp(7) : error C2556: 'float f(void)' : overloaded function differs only by return type from 'int f(void)'
d:\experiments\test1\test1\test1.cpp(6) : see declaration of 'f'
d:\experiments\test1\test1\test1.cpp(7) : error C2371: 'f' : redefinition; different basic types
I'm certain the spec says you cannot have multiple identically named variables in the same scope, and that function declarations in the same scope must differ by more than their return type.
According to Pete Becker, when you have list of rules like the bullet points in §1.4/2, they're (at least normally) to be read in order, with earlier rules taking precedence over later rules.
In other words, if your code violates both the second and third bullet points, violation of the second bullet point requires issuing a diagnostic, even though violating the third bullet point appears to remove that requirement.
Unfortunately, I've never seen any explicit statement to that effect in the standard proper, only in old Usenet posts from Pete (and, if memory serves, perhaps also from Andrew Koenig who was the editor before Pete).
This looks like a heavy math problem. My guess is that the reason is that, with two different typedefs, you could end up with the same type written as two different type expressions. When the compiler stores its data structures, requiring the check would require the compiler to "evaluate" each type expression to its normal form. The Church-Rosser theorem would have to be used inside the compiler to prove that two expressions are equivalent. The operation used in typedefs is just plain old substitution, so the full Church-Rosser property would be required. So I guess they made it optional. I think they didn't want to add lambda calculus, which would be required next.
template<class T> struct A {};   // some class template, so the example is self-contained

typedef A<int> C;
typedef int D;
typedef A<D> E;
extern C v;
extern E v;
Now, without evaluating both to A<int>, there is no way to check if these are the same type.