Function Call = expression-statement... even functions of type void? - c++

In C++, there is no such thing as an assignment statement or function-call statement.
An assignment is an expression; a function-call is an expression; this is coming straight from Bjarne Stroustrup in his book "The C++ Programming Language".
I know an expression computes a value, which has me wondering if this applies to void functions, since they don't return a value.
I'd like to know if functions with a return type of void still count as expressions, and if so, why?

C++14 standard:
§5 Expressions:
1 An expression is a sequence of operators and operands that specifies
a computation. An expression can result in a value and can cause side
effects
So the "main" purpose/scope of an expression is to specify a computation, not to compute a value. Some computations may result in a value and some may have side effects.
In addition to this (or actually first of all), "expressions" and "statements" are used in defining the grammar of C and C++. It would be syntactically impossible to make calls to functions that don't return a value not be "expressions". And adding that distinction at a semantic level would be an unnecessary complication.

Yes, functions returning no value (declared as returning void) still count as expressions when you call them. That limits their use in other expressions, though; for example, such calls cannot appear on either side of an assignment.
As for "why"? Well, a function call is a function call is a function call. Adding special rules for functions that don't return a value would make the language design much more complicated. C++ already has enough special rules and exceptions.
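A minimal sketch of where a void call can still appear as an expression. The names log_call, forward_call, call_then_value, call_if and call_count are made up for illustration; they are not from the question.

```cpp
#include <cassert>
#include <type_traits>

int call_count = 0;  // hypothetical counter so the side effect is observable

void log_call() { ++call_count; }

// The call expression exists and has a type: void.
static_assert(std::is_void<decltype(log_call())>::value,
              "a call to a void function is an expression of type void");

// A void expression may be the operand of `return` in a void function.
void forward_call() { return log_call(); }

// It may be the left operand of the comma operator.
int call_then_value() { return (log_call(), 42); }

// It may be the second and third operand of ?: when both are void.
void call_if(bool twice) { twice ? (log_call(), log_call()) : log_call(); }

// What it cannot do is supply a value, e.g. `int x = log_call();` is an error.
```

Calling forward_call(), call_then_value() and call_if(true) from a main function bumps call_count by 1, 1 and 2 respectively.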

Yes, a call to a void function is also an expression. The definition in the C++ Standard says:
An expression is a sequence of operators and operands that specifies a
computation. An expression can result in a value and can cause side
effects.
A function call is a postfix expression followed by parentheses
containing a possibly empty, comma-separated list of expressions which
constitute the arguments to the function.
Also in MSDN C++ Language Reference:
A postfix-expression followed by the function-call operator, ( ),
specifies a function call.

Function Call = expression-statement… even functions of type void?
No. But that's because a function call is this:
do_stuff()
and that's an expression (the optional expression part of the grammar rule expression-statement: expression_opt ;). It's an expression, not a statement. You can use this expression in compound expressions, but it's not a statement by language logic.
You can quickly convert that expression to an expression-statement by giving it a semicolon:
do_stuff();
is now a full statement.
The difference becomes clear if you think about something like
if(good_thing() || do_stuff())
{
....
}
do_stuff() and good_thing() are expressions, which can/will be evaluated. Semicolons after () would break that if clause.
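A compilable version of the idea above, as a sketch. It assumes both functions return bool (the answer does not say), and the parameter on good_thing and the stuff_done counter are added here only to make the behaviour observable.

```cpp
#include <cassert>

int stuff_done = 0;  // hypothetical: counts how often do_stuff() ran

bool good_thing(bool answer) { return answer; }
bool do_stuff() { ++stuff_done; return true; }

// Both calls are subexpressions of the `if` condition; no semicolon appears
// inside the parentheses. `||` short-circuits, so do_stuff() is evaluated
// only when good_thing() yields false.
bool check(bool good) {
    if (good_thing(good) || do_stuff()) {
        return true;
    }
    return false;
}

void run_as_statement() {
    do_stuff();  // the same expression plus ';' is an expression-statement
}
```

check(true) leaves stuff_done untouched, while check(false) and run_as_statement() each increment it once.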

Related

Do return statements in C++ have a type?

As we know, in C++ every expression and statement has a type. So, what is the type of a return statement in a void function? Is its type void?
In C++ (and in C), only expressions have types; not statements. However, many statements are actually just evaluations of expressions (e.g. a function call, or an assignment) which may be the cause of confusion.
So, no, a return statement - in a function returning void or anything else - has no type as such.
Read more about "expression statements" and statements in general on the statements page on cppreference.com.
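One way to see the difference, as a rough sketch: decltype accepts an expression and reports its type, but the grammar offers no way to ask for the type of a statement. The functions notify, answer and early_exit are invented for this example.

```cpp
#include <cassert>
#include <type_traits>

void notify() {}
int answer() { return 42; }

// decltype takes an expression, so each of these has a well-defined type.
static_assert(std::is_void<decltype(notify())>::value, "call expression of type void");
static_assert(std::is_same<decltype(answer()), int>::value, "call expression of type int");
static_assert(std::is_same<decltype(1 + 2.0), double>::value, "arithmetic expression");

// A return statement is not an expression: something like
// `decltype(return;)` does not even parse.
void early_exit(bool stop, int& counter) {
    if (stop) return;  // a statement, so it has no type
    ++counter;         // an expression-statement; the expression ++counter has type int&
}
```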

In OCaml Menhir, how to write a parser for C++/Rust/Java-style generics

In C++, a famous parsing ambiguity happens with code like
x<T> a;
If T is a type, it is what it looks like (a declaration of a variable a of type x<T>); otherwise it is (x < T) > a (where < and > are comparison operators, not angle brackets).
In fact, we could make a change to make this become unambiguous: we can make < and > nonassociative. So x < T > a, without brackets, would not be a valid sentence anyway even if x, T and a were all variable names.
How could one resolve this conflict in Menhir? At first glance it seems we just can't. Even with the aforementioned modification, we need to lookahead an indeterminate number of tokens before we see another closing >, and conclude that it was a template instantiation, or otherwise, to conclude that it was an expression. Is there any way in Menhir to implement such an arbitrary lookahead?
Different languages (including the ones listed in your title) actually have very different rules for templates/generics (like what type of arguments there can be, where templates/generics can appear, when they are allowed to have an explicit argument list and what the syntax for template/type arguments on generic methods is), which strongly affect the options you have for parsing. In no language that I know is it true that the meaning of x<T> a; depends on whether T is a type.
So let's go through the languages C++, Java, Rust and C#:
In all four of those languages both types and functions/methods can be templates/generic. So we'll not only have to worry about an ambiguity with variable declarations, but also function/method calls: is f<T>(x) a function/method call with an explicit template/type argument or is it two relational operators with the last operand parenthesized? In all four languages template/generic functions/methods can be called without template/type when those can be inferred, but that inference isn't always possible, so just disallowing explicit template/type arguments for function/method calls is not an option.
Even if a language does not allow relational operators to be chained, we could get an ambiguity in expressions like this: f(a<b, c, d>(e)). Is this calling f with the three arguments a<b, c and d>e or with the single argument a<b, c, d>(e) calling a function/method named a with the type/template arguments b,c,d?
Now beyond this common foundation, most everything else is different between these languages:
Rust
In Rust the syntax for a variable declaration is let variableName: type = expr;, so x<T> a; couldn't possibly be a variable declaration because that doesn't match the syntax at all. In addition it's also not a valid expression statement (anymore) because comparison operators can't be chained (anymore).
So there's no ambiguity here or even a parsing difficulty. But what about function calls? For function calls, Rust avoided the ambiguity by simply choosing a different syntax to provide type arguments: instead of f<T>(x) the syntax is f::<T>(x). Since type arguments for function calls are optional when they can be inferred, this ugliness is thankfully not necessary very often.
So in summary: let a: x<T> = ...; is a variable declaration, f(a<b, c, d>(e)); calls f with three arguments and f(a::<b, c, d>(e)); calls a with three type arguments. Parsing is easy because all of these are sufficiently different to be distinguished with just one token of lookahead.
Java
In Java x<T> a; is in fact a valid variable declaration, but it is not a valid expression statement. The reason for that is that Java's grammar has a dedicated non-terminal for expressions that can appear as an expression statement and applications of relational operators (or any other non-assignment operators) are not matched by that non-terminal. Assignments are, but the left side of assignment expressions is similarly restricted. In fact, an identifier can only be the start of an expression statement if the next token is either a =, ., [ or (. So an identifier followed by a < can only be the start of a variable declaration, meaning we only need one token of lookahead to parse this.
Note that when accessing static members of a generic class, you can and must refer to the class without type arguments (i.e. FooClass.bar(); instead of FooClass<T>.bar()), so even in that case the class name would be followed by a ., not a <.
But what about generic method calls? Something like y = f<T>(x); could still run into the ambiguity because relational operators are of course allowed on the right side of =. Here Java chooses a similar solution as Rust by simply changing the syntax for generic method calls. Instead of object.f<T>(x) the syntax is object.<T>f(x) where the object. part is non-optional even if the object is this. So to call a generic method with an explicit type argument on the current object, you'd have to write this.<T>f(x);, but like in Rust the type argument can often be inferred, allowing you to just write f(x);.
So in summary x<T> a; is a variable declaration and there can't be expression statements that start with relational operations; in general expressions this.<T>f(x) is a generic method call and f<T>(x); is a comparison (well, a type error, actually). Again, parsing is easy.
C#
C# has the same restrictions on expression statements as Java does, so variable declarations aren't a problem, but unlike the previous two languages, it does allow f<T>(x) as the syntax for function calls. In order to avoid ambiguities, relational operators need to be parenthesized when used in a way that could also be valid call of a generic function. So the expression f<T>(x) is a method call and you'd need to add parentheses f<(T>(x)) or (f<T)>(x) to make it a comparison (though actually those would be type errors because you can't compare booleans with < or >, but the parser doesn't care about that) and similarly f(a<b, c, d>(e)) calls a generic method named a with the type arguments b,c,d whereas f((a<b), c, (d<e)) would involve two comparisons (and you can in fact leave out one of the two pairs of parentheses).
This leads to a nicer syntax for method calls with explicit type arguments than in the previous two languages, but parsing becomes kind of tricky. Considering that in the above example f(a<b, c, d>(e)) we can actually place an arbitrary number of arguments before d>(e) and a<b is a perfectly valid comparison if not followed by d>(e), we actually need an arbitrary amount of lookahead, backtracking or non-determinism to parse this.
So in summary x<T> a; is a variable declaration, there is no expression statement that starts with a comparison, f<T>(x) is a method call expression and (f<T)>(x) or f<(T>(x)) would be (ill-typed) comparisons. It is impossible to parse C# with menhir.
C++
In C++ a < b; is a valid (albeit useless) expression statement, the syntax for template function calls with explicit template arguments is f<T>(x) and a<b>c can be a perfectly valid (even well-typed) comparison. So statements like a<b>c; and expressions like a<b>(c) are actually ambiguous without additional information. Further, template arguments in C++ don't have to be types. That is, Foo<42> x; or even Foo<c> x; where c is defined as const int c = 42;, for example, could be perfectly valid instantiations of the Foo template if Foo is defined to take an integer as a template argument. So that's a bummer.
To resolve this ambiguity, the C++ grammar refers to the rule template-name instead of identifier in places where the name of a template is expected. So if we treated these as distinct entities, there'd be no ambiguity here. But of course template-name is defined simply as template-name: identifier in the grammar, so that seems pretty useless, ... except that the standard also says that template-name should only be matched when the given identifier names a template in the current scope. Similarly it says that identifiers should only be interpreted as variable names when they don't refer to a template (or type name).
Note that, unlike the previous three languages, C++ requires all types and templates to be declared before they can be used. So when we see the statement a<b>c;, we know that it can only be a template instantiation if we've previously parsed a declaration for a template named a and it is currently in scope.
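To make the dependence on earlier declarations concrete, here is a small sketch in which the same token sequence a<b>(c) means two different things. The namespaces as_template and as_variable and the eval functions are assumptions made up for the demonstration.

```cpp
#include <cassert>

namespace as_template {
    // Here `a` names a function template that is in scope, so a<b>(c) is a
    // call with the explicit template argument b.
    constexpr int b = 3;
    template <int N> int a(int x) { return N + x; }
    int eval(int c) { return a<b>(c); }
}

namespace as_variable {
    // Here `a` names a variable, so the same tokens parse as (a < b) > (c).
    // Compilers may warn about the chained comparison; it is still valid.
    bool eval(int a, int b, int c) { return a<b>(c); }
}
```

as_template::eval(4) yields 7, whereas as_variable::eval(1, 2, 0) compares (1 < 2), i.e. true promoted to 1, against 0 and yields true.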
So, if we keep track of scopes while parsing, we can simply use if-statements to check whether the name a refers to a previously parsed template or not in a hand-written parser. In parser generators that allow semantic predicates, we can do the same thing. Doing this does not even require any lookahead or backtracking.
But what about parser generators like yacc or menhir that don't support semantic predicates? For these we can use something known as the lexer hack, meaning we make the lexer generate different tokens for type names, template names and ordinary identifiers. Then we have a nicely unambiguous grammar that we can feed our parser generator. Of course the trick is getting the lexer to actually do that. In order to accomplish that, we need to keep track of which templates and types are currently in scope using a symbol table and then access that symbol table from the lexer. We'll also need to tell the lexer when we're reading the name of a definition, like the x in int x;, because then we want to generate a regular identifier even if a template named x is currently in scope (the definition int x; would shadow the template until the variable goes out of scope).
This same approach is used to resolve the casting ambiguity (is (T)(x) a cast of x to type T or a function call of a function named T?) in C and C++.
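The symbol table that the lexer hack relies on could look roughly like this. It is only a sketch: SymbolTable, TokenKind, classify and the in_declarator flag are hypothetical names, and a real front end would track far more than a name-to-kind map.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

enum class TokenKind { Identifier, TypeName, TemplateName };

// Shared by the parser (which declares names) and the lexer (which asks
// how an identifier-like lexeme should be tokenized).
class SymbolTable {
public:
    SymbolTable() { scopes_.emplace_back(); }  // start with the global scope

    void enter_scope() { scopes_.emplace_back(); }
    void leave_scope() { if (scopes_.size() > 1) scopes_.pop_back(); }

    // Called by the parser once it has parsed a declaration.
    void declare(const std::string& name, TokenKind kind) {
        scopes_.back()[name] = kind;
    }

    // Called by the lexer. When the parser signals that the name being read
    // is the one being declared (the x in `int x;`), always yield Identifier.
    // Otherwise the innermost declaration wins, so a local variable shadows
    // an outer template of the same name.
    TokenKind classify(const std::string& name, bool in_declarator = false) const {
        if (in_declarator) return TokenKind::Identifier;
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return TokenKind::Identifier;  // undeclared names are plain identifiers
    }

private:
    std::vector<std::unordered_map<std::string, TokenKind>> scopes_;
};
```

With tokens classified this way, the grammar handed to a generator such as yacc or menhir can use distinct terminals for template names and ordinary identifiers.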
So in summary, foo<T> a; and foo<T>(x) are template instantiations if and only if foo is a template. Parsing's a bitch, but possible without arbitrary lookahead or backtracking and even using menhir when applying the lexer hack.
AFAIK C++'s template syntax is a well-known example of a real-world non-LR grammar. Strictly speaking, it is not LR(k) for any finite k... So C++ parsers are usually hand-written with hacks (like clang) or generated by a GLR parser generator (LR with branching). So in theory it is impossible to implement a complete C++ parser in Menhir, which is LR.
However even the same syntax for generics can be different. If generic types and expressions involving comparison operators never appear under the same context, the grammar may still be LR compatible. For example, consider the rust syntax for variable declaration (for this part only):
let x : Vec<T> = ...
The : token indicates that a type, rather than an expression follows, so in this case the grammar can be LR, or even LL (not verified).
So the final answer is, it depends. But for the C++ case it should be impossible to implement the syntax in Menhir.

When calling C/C++ function within another function, why do they stack? Is there a way to fix it? [duplicate]

If we have three functions (foo, bar, and baz) that are composed like so...
foo(bar(), baz())
Is there any guarantee by the C++ standard that bar will be evaluated before baz?
No, there's no such guarantee. It's unspecified according to the C++ standard.
Bjarne Stroustrup also says it explicitly in "The C++ Programming Language" 3rd edition section 6.2.2, with some reasoning:
Better code can be generated in the
absence of restrictions on expression
evaluation order
Although technically this refers to an earlier part of the same section which says that the order of evaluation of parts of an expression is also unspecified, i.e.
int x = f(2) + g(3); // unspecified whether f() or g() is called first
From [5.2.2] Function call,
The order of evaluation of arguments is unspecified. All side effects of argument expression evaluations take effect before the function is entered.
Therefore, there is no guarantee that bar() will run before baz(), only that bar() and baz() will be called before foo.
Also note from [5] Expressions that:
except where noted [e.g. special rules for && and ||], the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.
so even if you were asking whether bar() will run before baz() in foo(bar() + baz()), the order is still unspecified.
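A small sketch that records the order actually chosen by a given compiler. The call_log vector and nested_call are additions for illustration; only foo, bar and baz come from the question. The only portable guarantee it can demonstrate is that foo is entered last.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> call_log;  // hypothetical trace of the calls

int bar() { call_log.push_back("bar"); return 1; }
int baz() { call_log.push_back("baz"); return 2; }
int foo(int a, int b) { call_log.push_back("foo"); return a + b; }

int nested_call() {
    call_log.clear();
    // Whether bar() or baz() runs first is unspecified; both are complete
    // before the body of foo() starts executing.
    return foo(bar(), baz());
}
```

After nested_call(), call_log ends with "foo", but its first two entries may be "bar", "baz" or "baz", "bar" depending on the compiler.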
There's no specified order for bar() and baz() - the only thing the Standard says is that they will both be evaluated before foo() is called. From the C++ Standard, section 5.2.2/8:
The order of evaluation of arguments
is unspecified.
C++17 specifies the evaluation order for some operators for which it was unspecified before C++17. See the question What are the evaluation order guarantees introduced by C++17? But note that your expression
foo(bar(), baz())
still has unspecified evaluation order.
In C++11, the relevant text can be found in 8.3.6 Default arguments/9 (Emphasis mine)
Default arguments are evaluated each time the function is called. The order of evaluation of function arguments is unspecified. Consequently, parameters of a function shall not be used in a default argument, even if they are not evaluated.
The same verbiage is used by C++14 standard as well, and is found under the same section.
As others have already pointed out, the standard does not give any guidance on order of evaluation for this particular scenario. The order of evaluation is then left to the compiler, and the compiler might document a guarantee.
It's important to remember that C++ is ultimately a language for instructing a compiler to construct assembly/machine code. The standard is only one part of the equation. Where the standard leaves something unspecified or explicitly implementation-defined, you should turn to the compiler and understand how it translates C++ into actual machine code.
So, if order of evaluation is a requirement, or at least important, and being cross-compiler compatible is not a requirement, investigate how your compiler will ultimately piece this together; your answer could ultimately lie there. Note that the compiler could change its methodology in the future.
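If the order matters and portability is wanted after all, one common alternative (not part of the answer above) is to sequence the calls in separate statements. A sketch, with first, second, combine and the order string invented for the example:

```cpp
#include <cassert>
#include <string>

std::string order;  // hypothetical trace of the evaluation order

int first()  { order += "1"; return 10; }
int second() { order += "2"; return 20; }
int combine(int a, int b) { order += "c"; return a + b; }

int ordered_call() {
    order.clear();
    // Each initializer is a full-expression, so first() has finished
    // before second() starts, on every conforming compiler.
    const int a = first();
    const int b = second();
    return combine(a, b);
}
```

The cost is two named temporaries; the benefit is that the order no longer depends on the compiler.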

Why won't this c++ lambda function compile?

Why does this fail to compile:
int myVar = 0;
myVar ? []()->void{} : []()->void{};
with following error msg:
Error 2 error C2446: ':' : no conversion from 'red_black_core::`anonymous-namespace'::< lambda1>' to red_black_core::anonymous-namespace::< lambda0>
While this compiles correctly:
void left()
{}
void right()
{}
int myVar = 0;
myVar ? left() : right();
The result type of the ?: operator has to be deduced from its two operands, and the rules for determining this type are quite complex. Lambdas don't satisfy them because they can't be converted to each other. So when the compiler tries to work out what the result of that ?: is, then there can't be a result, because those two lambdas aren't convertible to each other.
In the version with functions, however, you actually called them, whereas you never called the lambdas. Both calls have type void, so the type of the ?: expression is void.
This
void left()
{}
void right()
{}
int myVar = 0;
myVar ? left() : right();
is equivalent to
int myVar = 0;
myVar ? [](){}() : [](){}();
Note the extra () on the end- I actually called the lambda.
What you had originally is equivalent to
compiler_deduced_type var;
if (myVar)
var = [](){};
else
var = [](){};
But no type exists that can be both lambdas. The compiler is well within its rights to make both lambdas different types.
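A short check of both points, as a sketch: two textually similar lambdas have different closure types, while calling them gives two void expressions that ?: accepts. The names add_one, add_ten, call_one and hits are assumptions for the demonstration.

```cpp
#include <cassert>
#include <type_traits>

int hits = 0;  // hypothetical counter so the calls are observable

auto add_one = []() { hits += 1; };
auto add_ten = []() { hits += 10; };

// Every lambda expression produces its own unique closure type.
static_assert(!std::is_same<decltype(add_one), decltype(add_ten)>::value,
              "two lambdas never share a closure type");

// Calling either lambda yields an expression of type void.
static_assert(std::is_void<decltype(add_one())>::value, "call has type void");

// Both operands are void call expressions, so the conditional is well-formed.
void call_one(bool flag) { flag ? add_one() : add_ten(); }
```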
EDIT:
I remembered something. In the latest Standard draft, lambdas with no captures can be implicitly converted into function pointers of the same signature. That is, in the above code, compiler_deduced_type could be void(*)(). However, I know for a fact that MSVC does not include this behaviour because that was not defined at the time that they implemented lambdas. This is likely why GCC allows it and MSVC does not- GCC's lambda support is substantially newer than MSVC's.
Rules for conditional operator in the draft n3225 says at one point
Otherwise, the result is a prvalue. If the second and third operands do not have the same type, and either
has (possibly cv-qualified) class type, overload resolution is used to determine the conversions (if any) to be
applied to the operands (13.3.1.2, 13.6). If the overload resolution fails, the program is ill-formed. Otherwise,
the conversions thus determined are applied, and the converted operands are used in place of the original
operands for the remainder of this section.
Up to that point, every other alternative (like, convert one to the other operand) failed, so we will now do what that paragraph says. The conversions we will apply are determined by overload resolution by transforming a ? b : c into operator?(a, b, c) (an imaginary function call to a so-named function). If you look what the candidates for the imaginary operator? are, you find (among others)
For every type T , where T is a pointer, pointer-to-member, or scoped enumeration type, there exist candidate operator functions of the form
T operator?(bool, T , T );
And this includes a candidate for which T is the type void(*)(). This is important, because lambda expressions yield an object of a class that can be converted to such a type. The spec says
The closure type for a lambda-expression with no lambda-capture has a public non-virtual non-explicit const conversion function to pointer to function having the same parameter and return types as the closure type’s function call operator. The value returned by this conversion function shall be the address of a function that, when invoked, has the same effect as invoking the closure type’s function call operator.
The lambda expressions can't be converted to any other of the parameter types listed, which means overload resolution succeeds, finds a single operator? and will convert both lambda expressions to function pointers. The remainder of the conditional operator section will then proceed as usual, now having two branches for the conditional operator having the same type.
That's also why your first version is OK, and why GCC is right to accept it. However I don't really understand why you show the second version at all - as others explained, it's doing something different and it's not surprising that it works while the other doesn't (on your compiler). Next time, best try not to include useless code into the question.
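A sketch of the function-pointer route described above. IntFn, pick_explicit and pick_implicit are hypothetical names; the lambdas return int here only so the chosen branch can be checked. pick_implicit relies on the overload-resolution rule quoted above, so a compiler with incomplete support may reject it, in which case the explicit casts are the fallback.

```cpp
#include <cassert>

using IntFn = int (*)();

// Explicit conversion: both operands already have the same pointer type.
IntFn pick_explicit(bool flag) {
    return flag ? static_cast<IntFn>([]() { return 1; })
                : static_cast<IntFn>([]() { return 2; });
}

// Implicit conversion: the operands are two different closure types, and
// the built-in candidate T operator?(bool, T, T) with T = int(*)() is what
// converts both captureless lambdas to function pointers.
IntFn pick_implicit(bool flag) {
    return flag ? []() { return 1; } : []() { return 2; };
}
```

Neither version works for lambdas with captures, since those have no conversion to a function pointer.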
Because every lambda is a unique type. It is basically syntactic sugar for a functor, and two separately implemented functors aren't the same type, even if they contain identical code.
The standard does specify that lambdas can be converted to function pointers if they don't capture anything, but that rule was added after MSVC's lambda support was implemented.
With that rule, however, two lambdas can be converted to the same type, and so I believe your code would be valid with a compliant compiler.
Both snippets compile just fine with GCC 4.5.2.
Maybe your compiler has no (or partial/broken) support for C++0x features such as lambdas?
It doesn't fail to compile. It works just fine. You probably don't have C++0x enabled in your compiler.
Edit:
An error message has now been added to the original question! It seems that you do have C++0x support, but that it is not complete in your compiler. This is not surprising.
The code is still valid C++0x, but I recommend only using C++0x features when you really have to, until it's standardised and there is full support across a range of toolchains. You have a viable C++03 alternative that you gave in your answer, and I suggest using it for the time being.
Possible alternative explanation:
Also note that you probably didn't write what you actually meant to write. []()->void{} is a lambda. []()->void{}() executes the lambda and evaluates to its result. Depending what you're doing with this result, your problem could be that the result of calling your lambda is void, and you can't do much with void.

subscript operator postfix

The C++ standard defines the expression using subscripts as a postfix expression. AFAIK, this operator always takes two arguments (the first is the pointer to T and the other is the enum or integral type). Hence it should qualify as a binary operator.
However, MSDN and IBM do not list it as a binary operator.
So the question is, what is the subscript operator? Is it unary or binary? For sure, it is not unary, as it is not mentioned in $5.3 (at least not straight away).
What does it mean when the Standard mentions its usage in the context of a postfix expression?
I'd tend to agree with you in that operator[] is a binary operator in the strictest sense, since it does take two arguments: a (possibly implicit) reference to an object, and a value of some other type (not necessarily enumerated or integral). However, since it is a bracketing operator, you might say that the sequence of tokens [x], where x might be any valid subscript-expression, qualifies as a postfix unary operator in an abstract sense; think currying.
Also, you cannot overload a global operator[](const C&, size_t), for example. The compiler complains that operator[] must be a nonstatic member function.
You are correct that operator[] is a binary operator but it is special in that it must also be a member function.
Similar to operator()
You can read up on postfix expressions here
I just found an interesting article about operator[] and postfix expression, here
I think it's the context that [] is used in that counts. In section 5.2.1 the symbol [] is used in the context of a postfix expression that 'is identical (by definition) to *((E1)+(E2))'. In this context, [] isn't an operator. In section 13.5.5 it's used to mean the subscripting operator. In this case it's an operator that takes one argument. For example, if I wrote:
x = a[2];
It's not necessarily the case that the above statement evaluates to:
x = *(a + 2);
because 'a' might be an object. If a has a class type, then in this context [] is used as a subscript operator.
Anyway that's the best explanation I can derive from the standard that resolves apparent contradictions.
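The two contexts can be put side by side in a short sketch. Squares and the four helper functions are invented for illustration; the built-in forms follow the *((E1)+(E2)) identity quoted above.

```cpp
#include <cassert>
#include <cstddef>

struct Squares {
    // An overloaded operator[] must be a non-static member function; the
    // object is the implicit first operand, the index the explicit one.
    int operator[](std::size_t i) const { return static_cast<int>(i * i); }
};

// Built-in subscript on a pointer: a[2] is by definition *((a)+(2)).
int builtin_subscript(const int* a) { return a[2]; }
int builtin_deref(const int* a) { return *(a + 2); }

// Because addition commutes, the operands may even be swapped.
int builtin_swapped(const int* a) { return 2[a]; }

// For a class type, s[2] is s.operator[](2) and has nothing to do with
// pointer arithmetic.
int overloaded_subscript(const Squares& s) { return s[2]; }
```

For an array holding 5, 6, 7, 8 the three built-in forms all yield 7, while overloaded_subscript(Squares{}) yields 4.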
If you take a close look at http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B, it explains that standard C++ recognizes operator[] as a binary operator, as you said.
Operator[] is, generally speaking, binary, and although it can be viewed as unary, it should always be used as a binary operator inside a class, not least because it makes no sense outside a class.
It is well explained in the link I provided.
Notice that programmers sometimes overload operators without thinking much about what they are doing, and sometimes overload them incorrectly; the compiler is lenient about this and accepts it, but it was probably not the correct way to overload that operator.
Following guides like the one I linked is a good way to do things correctly.
So always beware of examples where operators are overloaded without following good practice (outside the standard); refer first to the standard methods, and use the examples that comply with them.