Do return statements in C++ have a type?

As we know, in C++ every expression and every statement has a type. So, what is the type of a return statement in a void function? Is its type void?

In C++ (and in C), only expressions have types, not statements. However, many statements are just evaluations of expressions (e.g. a function call or an assignment), which may be the cause of the confusion.
So, no, a return statement - in a function returning void or anything else - has no type as such.
Read more about "expression statements" and statements in general on the statements page on cppreference.com.
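As a small illustration (a sketch assuming C++17; log_message, square and wrapper are hypothetical helpers), decltype can be applied to the expression inside a return statement, but never to the statement itself. Note that even return log_message(); is legal in a void function, because its operand is an expression of type void:

```cpp
#include <type_traits>

void log_message() {}                          // hypothetical helper returning void
constexpr int square(int x) { return x * x; }  // hypothetical helper returning int

// "return expr;" is allowed in a void function when expr has type void.
// The expression log_message() has type void; the return statement that
// contains it has no type at all.
void wrapper() {
    return log_message();
}

// decltype accepts expressions only, never statements:
static_assert(std::is_same_v<decltype(log_message()), void>);
static_assert(std::is_same_v<decltype(wrapper()), void>);
static_assert(std::is_same_v<decltype(square(3)), int>);
// decltype(return square(3);)   // ill-formed: a statement is not an expression
```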

Related

C++ Construct to check whether a local expression is not a constant expression

I'm looking for a construct, usable in a static_assert declaration, that can detect whether a local expression is a constant expression or not.
In code:
int main()
{
    constexpr int i = 1;
    if constexpr(i) {}
    static_assert(is_constexpr(i));
    int j = 1;
    //if constexpr(j) {} // error: 'j' is not usable in a constant expression
    static_assert(!is_constexpr(j));
}
The is_constexpr here can be a macro, a class (then probably with a different syntax), or a function. How can I make is_constexpr(i) return true/true_type and, conversely, is_constexpr(j) return false/false_type?
I tried a lot of the solutions available here on SO (also using old-school SFINAE) without success. I can edit in the links and my failed attempts if required.
The problem is that function parameters (like t) are not constant expressions. For example, we cannot use t as a non-type template argument or as the size of a built-in array.
This means that an expression such as some_expr(t), which involves the subexpression t, is not a constant expression either.
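A minimal sketch of this (C++17 assumed; twice and make_zeros are hypothetical names): inside a constexpr function the parameter t cannot be used as a template argument, although a call with a constant argument can still be a constant expression as a whole. A value that must act as a template argument has to arrive as a template parameter instead.

```cpp
#include <array>
#include <cstddef>

// Even inside a constexpr function, the parameter t is not a constant
// expression, so it cannot be a template argument or an array bound.
constexpr int twice(int t) {
    // std::array<int, t> arr{};   // error: 't' is not a constant expression
    return 2 * t;                  // fine: t is only used as a value
}

// The call as a whole can still be a constant expression
// when its argument is one:
constexpr int k = twice(21);
static_assert(k == 42);

// A value that must be usable as a template argument has to arrive
// as a template parameter instead of a function parameter.
template <std::size_t N>
constexpr std::array<int, N> make_zeros() { return {}; }

static_assert(make_zeros<3>().size() == 3);
```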
Can is_constexpr be a function?
The simple fact that you write is_constexpr(i) means that you're looking for a function of type bool(auto); but you also want to be able to pass it both constexpr and non-constexpr arguments, so it certainly can't be consteval, otherwise it would error when called in the latter scenario; so at most you can make it constexpr. However, there's no such thing (yet?) as a constexpr function parameter, so as soon as you pass something to that constexpr function, you have no way to detect whether or not the argument was a constant expression at the call site.
Can is_constexpr be a meta-function?
The other opportunity would be to use a metafunction, but how would you expect to call it? If you expect is_constexpr<i>, i.e. passing the entity you are querying as a template argument, that would clearly only be possible if i is constexpr; if you go for is_constexpr<decltype(x)>, you are expecting that decltype(x) contains some info about the constexpr-ness of x, but that's not the case, as constexpr/consteval are not part of the type of an expression.
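For instance (a sketch assuming C++17; runtime_value is a hypothetical non-constexpr function), a constexpr variable and a const variable initialized at run time have exactly the same decltype, and only the former can be passed as a template argument:

```cpp
#include <type_traits>

int runtime_value() { return 1; }  // hypothetical function, not constexpr

constexpr int i = 1;               // a constant expression
const int j = runtime_value();     // const, but NOT usable in constant expressions

// constexpr implies const, yet the declared type records nothing more:
static_assert(std::is_same_v<decltype(i), const int>);
static_assert(std::is_same_v<decltype(j), const int>);

// Passing the entity itself as a template argument only works for i:
using ok = std::integral_constant<int, i>;
// using bad = std::integral_constant<int, j>;  // error: j is not a constant expression
static_assert(ok::value == 1);
```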
Do you really need it?
This, I believe, means that whenever you ask whether an expression is a constant expression, you can always find the answer by looking at the code before that expression, and that answer will not depend on any boolean condition, not even a constexpr condition.
In other words, I believe (and I'm happy to be proved wrong) that there's no way for an expression appearing at a given line of a C++ program to be constexpr along one path that leads there and not constexpr along another.
Do we really have a use case?
I agree that this is likely an XY problem.
So I challenge you to write an example where a given C++ expression of your choice, appearing at a certain line of code, is constexpr along one branch of execution and non-constexpr along another branch. And the branches can also be compile-time (well, given your question, you were not asking about run-time branching, so in the previous sentence "can also be" should really read "are").

In OCaml Menhir, how to write a parser for C++/Rust/Java-style generics

In C++, a famous parsing ambiguity happens with code like
x<T> a;
If T is a type, it is what it looks like (a declaration of a variable a of type x<T>); otherwise it is (x < T) > a (< and > are comparison operators, not angle brackets).
In fact, we could make a change to make this unambiguous: we can make < and > non-associative. Then x < T > a, without parentheses, would not be a valid sentence anyway, even if x, T and a were all variable names.
How could one resolve this conflict in Menhir? At first glance it seems we just can't. Even with the aforementioned modification, we need to look ahead an indeterminate number of tokens until we see another closing > and conclude that it was a template instantiation, or otherwise conclude that it was an expression. Is there any way in Menhir to implement such arbitrary lookahead?
Different languages (including the ones listed in your title) actually have very different rules for templates/generics (like what type of arguments there can be, where templates/generics can appear, when they are allowed to have an explicit argument list and what the syntax for template/type arguments on generic methods is), which strongly affect the options you have for parsing. In no language that I know is it true that the meaning of x<T> a; depends on whether T is a type.
So let's go through the languages C++, Java, Rust and C#:
In all four of those languages both types and functions/methods can be templates/generic. So we'll not only have to worry about an ambiguity with variable declarations, but also about function/method calls: is f<T>(x) a function/method call with an explicit template/type argument, or is it two relational operators with the last operand parenthesized? In all four languages template/generic functions/methods can be called without template/type arguments when those can be inferred, but that inference isn't always possible, so just disallowing explicit template/type arguments for function/method calls is not an option.
Even if a language does not allow relational operators to be chained, we could get an ambiguity in expressions like this: f(a<b, c, d>(e)). Is this calling f with the three arguments a<b, c and d>e or with the single argument a<b, c, d>(e) calling a function/method named a with the type/template arguments b,c,d?
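In C++ this exact token sequence really can go either way. The sketch below (every name is hypothetical; C++17 assumed) compiles the same f(a<b, c, d>(e)) twice: once where a names a function template and the call has one argument, and once where a, b, c, d are plain ints and the call has three.

```cpp
namespace one_argument {
    struct b {}; struct c {}; struct d {};
    template <typename, typename, typename>
    constexpr int a(int e) { return e; }
    constexpr int f(int x) { return x; }
    // a names a template: ONE argument, the call a<b, c, d>(e)
    constexpr int call(int e) { return f(a<b, c, d>(e)); }
}

namespace three_arguments {
    constexpr int f(bool x, int y, bool z) { return x + y + z; }
    // a, b, c, d are plain ints: THREE arguments, (a<b), c and (d>(e))
    constexpr int call(int a, int b, int c, int d, int e) {
        return f(a<b, c, d>(e));
    }
}

static_assert(one_argument::call(7) == 7);
static_assert(three_arguments::call(1, 2, 3, 4, 5) == 4);  // f(true, 3, false)
```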
Beyond this common foundation, almost everything else is different between these languages:
Rust
In Rust the syntax for a variable declaration is let variableName: type = expr;, so x<T> a; couldn't possibly be a variable declaration because that doesn't match the syntax at all. In addition it's also not a valid expression statement (anymore) because comparison operators can't be chained (anymore).
So there's no ambiguity here or even a parsing difficulty. But what about function calls? For function calls, Rust avoided the ambiguity by simply choosing a different syntax to provide type arguments: instead of f<T>(x) the syntax is f::<T>(x). Since type arguments for function calls are optional when they can be inferred, this ugliness is thankfully not necessary very often.
So in summary: let a: x<T> = ...; is a variable declaration, f(a<b, c, d>(e)); calls f with three arguments and f(a::<b, c, d>(e)); calls a with three type arguments. Parsing is easy because all of these are sufficiently different to be distinguished with just one token of lookahead.
Java
In Java x<T> a; is in fact a valid variable declaration, but it is not a valid expression statement. The reason is that Java's grammar has a dedicated non-terminal for expressions that can appear as an expression statement, and applications of relational operators (or any other non-assignment operators) are not matched by that non-terminal. Assignments are, but the left side of assignment expressions is similarly restricted. In fact, an identifier can only be the start of an expression statement if the next token is =, a compound assignment operator, ++, --, ., [ or (. So an identifier followed by a < can only be the start of a variable declaration, meaning we only need one token of lookahead to parse this.
Note that when accessing static members of a generic class, you can and must refer to the class without type arguments (i.e. FooClass.bar(); instead of FooClass<T>.bar()), so even in that case the class name would be followed by a ., not a <.
But what about generic method calls? Something like y = f<T>(x); could still run into the ambiguity because relational operators are of course allowed on the right side of =. Here Java chooses a similar solution as Rust by simply changing the syntax for generic method calls. Instead of object.f<T>(x) the syntax is object.<T>f(x), where the object. part is mandatory even if the object is this. So to call a generic method with an explicit type argument on the current object, you'd have to write this.<T>f(x);, but like in Rust the type argument can often be inferred, allowing you to just write f(x);.
So in summary x<T> a; is a variable declaration and there can't be expression statements that start with relational operations; in general expressions this.<T>f(x) is a generic method call and f<T>(x); is a comparison (well, a type error, actually). Again, parsing is easy.
C#
C# has the same restrictions on expression statements as Java does, so variable declarations aren't a problem, but unlike the previous two languages, it does allow f<T>(x) as the syntax for generic method calls. In order to avoid ambiguities, relational operators need to be parenthesized when used in a way that could also be a valid call of a generic method. So the expression f<T>(x) is a method call, and you'd need to add parentheses, f<(T>(x)) or (f<T)>(x), to make it a comparison (though actually those would be type errors because you can't compare booleans with < or >, but the parser doesn't care about that). Similarly, f(a<b, c, d>(e)) calls a generic method named a with the type arguments b,c,d, whereas f((a<b), c, (d>e)) would involve two comparisons (and you can in fact leave out one of the two pairs of parentheses).
This leads to a nicer syntax for method calls with explicit type arguments than in the previous two languages, but parsing becomes kind of tricky. Considering that in the above example f(a<b, c, d>(e)) we can actually place an arbitrary number of arguments before d>(e) and a<b is a perfectly valid comparison if not followed by d>(e), we actually need an arbitrary amount of lookahead, backtracking or non-determinism to parse this.
So in summary x<T> a; is a variable declaration, there is no expression statement that starts with a comparison, f<T>(x) is a method call expression, and (f<T)>(x) or f<(T>(x)) would be (ill-typed) comparisons. Because of the unbounded lookahead this requires, it is impossible to parse C# with Menhir alone.
C++
In C++ a < b; is a valid (albeit useless) expression statement, the syntax for template function calls with explicit template arguments is f<T>(x), and a<b>c can be a perfectly valid (even well-typed) comparison. So statements like a<b>c; and expressions like a<b>(c) are actually ambiguous without additional information. Further, template arguments in C++ don't have to be types. That is, Foo<42> x; or even Foo<c> x;, where c is defined as const int c = 42;, for example, could be perfectly valid instantiations of the Foo template if Foo is defined to take an integer as a template argument. So that's a bummer.
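Here is a sketch of that ambiguity in actual C++ (C++17; all names are hypothetical). The same tokens a<b>c declare a variable when a names a class template and b a type, but compare three ints when they are ordinary variables; the last lines show a non-type template argument that is itself a variable name.

```cpp
namespace as_declaration {
    template <typename T> struct a { T value{}; };
    using b = int;
    constexpr int demo() {
        a<b>c;            // declares a variable c of type a<int>
        return c.value;   // value-initialized to 0
    }
}

namespace as_expression {
    constexpr bool demo(int a, int b, int c) {
        return a<b>c;     // (a < b) > c: two comparisons, no template at all
    }
}

// Template arguments need not be types: here c is a value.
template <int N> struct Foo { static constexpr int value = N; };
const int c = 42;         // const integral with a constant initializer
Foo<c> x;                 // valid instantiation Foo<42>

static_assert(as_declaration::demo() == 0);
static_assert(as_expression::demo(1, 2, 0));   // (1 < 2) > 0  ->  true > 0
static_assert(!as_expression::demo(2, 1, 0));  // (2 < 1) > 0  ->  false > 0
static_assert(Foo<c>::value == 42);
```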
To resolve this ambiguity, the C++ grammar refers to the rule template-name instead of identifier in places where the name of a template is expected. So if we treated these as distinct entities, there'd be no ambiguity here. But of course template-name is defined simply as template-name: identifier in the grammar, so that seems pretty useless, ... except that the standard also says that template-name should only be matched when the given identifier names a template in the current scope. Similarly it says that identifiers should only be interpreted as variable names when they don't refer to a template (or type name).
Note that, unlike the previous three languages, C++ requires all types and templates to be declared before they can be used. So when we see the statement a<b>c;, we know that it can only be a template instantiation if we've previously parsed a declaration for a template named a and it is currently in scope.
So, if we keep track of scopes while parsing, we can simply use if-statements to check whether the name a refers to a previously parsed template or not in a hand-written parser. In parser generators that allow semantic predicates, we can do the same thing. Doing this does not even require any lookahead or backtracking.
But what about parser generators like yacc or menhir that don't support semantic predicates? For these we can use something known as the lexer hack, meaning we make the lexer generate different tokens for type names, template names and ordinary identifiers. Then we have a nicely unambiguous grammar that we can feed our parser generator. Of course the trick is getting the lexer to actually do that. In order to accomplish that, we need to keep track of which templates and types are currently in scope using a symbol table and then access that symbol table from the lexer. We'll also need to tell the lexer when we're reading the name of a definition, like the x in int x;, because then we want to generate a regular identifier even if a template named x is currently in scope (the definition int x; would shadow the template until the variable goes out of scope).
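The core of the lexer hack can be sketched in a few lines of C++. This is not how any particular compiler does it: SymbolTable, Kind, Token and classify are hypothetical names, and a real front end also has to handle namespaces, qualified names and dependent names. The parser pushes and pops scopes and records declarations, and the lexer consults the table to decide which token kind to emit for an identifier:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Token kinds a hypothetical grammar distinguishes. With separate TemplateName
// and TypeName tokens, "a < b > c ;" is no longer ambiguous for an LR parser.
enum class Token { Identifier, TypeName, TemplateName };

// What a declared name currently refers to.
enum class Kind { Variable, Type, Template };

// Scoped symbol table, filled in by the parser and consulted by the lexer.
class SymbolTable {
public:
    SymbolTable() { push_scope(); }  // start with the global scope
    void push_scope() { scopes_.emplace_back(); }
    void pop_scope() { scopes_.pop_back(); }
    void declare(const std::string& name, Kind kind) { scopes_.back()[name] = kind; }

    // The innermost declaration wins, so a variable can shadow a template.
    const Kind* lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<std::map<std::string, Kind>> scopes_;
};

// The lexer hack: choose the token kind for an identifier from the table.
// `declaring` is set by the parser when the name is being declared, like
// the x in "int x;", which must stay a plain identifier.
Token classify(const SymbolTable& table, const std::string& name, bool declaring) {
    if (declaring) return Token::Identifier;
    const Kind* kind = table.lookup(name);
    if (kind == nullptr || *kind == Kind::Variable) return Token::Identifier;
    return *kind == Kind::Template ? Token::TemplateName : Token::TypeName;
}

void demo() {
    SymbolTable table;
    table.declare("a", Kind::Template);  // as if we had parsed: template <class T> struct a;
    table.declare("b", Kind::Type);      // as if we had parsed: struct b;

    // "a<b>c;" lexes as TemplateName < TypeName > Identifier: a declaration.
    assert(classify(table, "a", false) == Token::TemplateName);
    assert(classify(table, "b", false) == Token::TypeName);
    assert(classify(table, "c", false) == Token::Identifier);

    table.push_scope();
    assert(classify(table, "a", true) == Token::Identifier);   // "int a;" declares a ...
    table.declare("a", Kind::Variable);                        // ... which shadows the template
    assert(classify(table, "a", false) == Token::Identifier);  // inside this scope, a is ordinary
    table.pop_scope();
    assert(classify(table, "a", false) == Token::TemplateName);
}
```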
This same approach is used to resolve the casting ambiguity (is (T)(x) a cast of x to type T or a function call of a function named T?) in C and C++.
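The casting ambiguity can be seen in a few lines (a sketch assuming C++17; the namespaces and names are hypothetical): the same tokens (T)(x) are a cast when T names a type and a call when T names a function.

```cpp
#include <type_traits>

namespace t_is_a_type {
    using T = long;                           // T names a type ...
    constexpr int x = 7;
    constexpr auto r = (T)(x);                // ... so (T)(x) casts x to long
}

namespace t_is_a_function {
    constexpr int T(int v) { return v * 2; }  // T names a function ...
    constexpr int x = 7;
    constexpr auto r = (T)(x);                // ... so (T)(x) calls T(7)
}

static_assert(std::is_same_v<decltype(t_is_a_type::r), const long>);
static_assert(t_is_a_function::r == 14);
```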
So in summary, foo<T> a; and foo<T>(x) are template instantiations if and only if foo is a template. Parsing's a bitch, but possible without arbitrary lookahead or backtracking and even using menhir when applying the lexer hack.
AFAIK C++'s template syntax is a well-known example of a real-world non-LR grammar. Strictly speaking, it is not LR(k) for any finite k. So C++ parsers are usually hand-written with hacks (like Clang's) or generated from a GLR grammar (LR with branching). So in theory it is impossible to implement a complete C++ parser in Menhir, which generates LR(1) parsers.
However, even languages with the same syntax for generics can differ. If generic types and expressions involving comparison operators never appear in the same context, the grammar may still be LR-compatible. For example, consider the Rust syntax for variable declarations (for this part only):
let x : Vec<T> = ...
The : token indicates that a type, rather than an expression, follows, so in this case the grammar can be LR, or even LL (not verified).
So the final answer is, it depends. But for the C++ case it should be impossible to implement the syntax in Menhir.

Why is a [[noreturn]] function checked for return type?

Suppose I have a function with signature [[noreturn]] void die(int exit_code);. If I write the statement:
check_some_condition() or die(EXIT_FAILURE);
I get an error message (with GCC 5.4.0):
error: expression must have bool type (or be convertible to bool)
But why is the type checked, if the compiler knows that once that function is called, the return value won't matter, and if the condition checks out, the return type doesn't matter either?
Edit: Does the wording of the standard regarding [[noreturn]] not address this point, i.e. relax the requirements regarding types so as to "legitimize" such expressions?
The concept you are looking for is called a bottom type. In a type system, a bottom type is a type that is convertible to any other type. (Also compare it with a top type, to which all types are convertible.)
The bottom type is a perfect candidate for a function that doesn't return. It's type-safe to assign it to anything, precisely because the assignment will never happen anyway. If C++ had a bottom type, and if you declared your function as returning the bottom type, your snippet could have been perfectly legal and your expectation would have been correct.
Unfortunately, C++ doesn't have such a type. As pointed out already, [[noreturn]] is not a type -- it's an attribute that's used to express an intention (to other programmers and to the optimizer) in a way that's orthogonal to the type system. As far as the type checker is concerned, the function's return type is still void and that can't be converted to a boolean.
noreturn doesn't tell the compiler that the function returns no value. It tells the compiler that the function will not return. It also has no effect on the function's return type.
In an expression, the compiler is required to check that operands of expressions have valid types. In an expression like check_some_condition() or die(EXIT_FAILURE), that requires the return type of both check_some_condition() and die() to be checked. noreturn does not affect the function's return type, so does not affect the need for that check. If the function returns void, the expression is invalid.
Every expression must have a well-determined type, period. So even though the function doesn't return, it must still be possible to determine the type of the expression check_some_condition() or die(EXIT_FAILURE).
And from that it follows that the function return type must be viable for use in a logical operation.
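A sketch of the situation and two common workarounds (C++17 assumed; check_some_condition is a hypothetical stand-in that always succeeds, and or is the standard alternative spelling of ||): [[noreturn]] leaves die(...) a void expression, so it cannot be an operand of or, but an if statement, or the comma operator giving the right operand type bool, is accepted.

```cpp
#include <cstdlib>
#include <type_traits>

[[noreturn]] void die(int exit_code) { std::exit(exit_code); }
bool check_some_condition() { return true; }  // hypothetical stand-in

// [[noreturn]] is not part of the type: die(...) is still an expression of type void.
static_assert(std::is_same_v<decltype(die(0)), void>);

// check_some_condition() or die(EXIT_FAILURE);  // error: void operand of 'or'

// Workaround 1: express the branch as a statement.
void with_if() {
    if (!check_some_condition()) die(EXIT_FAILURE);
}

// Workaround 2: the comma operator gives the right operand type bool;
// the ", false" is never reached when die() actually runs.
void with_comma() {
    check_some_condition() or (die(EXIT_FAILURE), false);
}
```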

Function Call = expression-statement... even functions of type void?

In C++, there is no such thing as an assignment statement or a function-call statement.
An assignment is an expression; a function-call is an expression; this is coming straight from Bjarne Stroustrup in his book "The C++ Programming Language".
I know an expression computes a value, which has me wondering if this applies to void functions, since they don't return a value.
I'd like to know whether calls to functions with a return type of void still count as expressions, and if so, why?
C++14 standard:
§5 Expressions:
1 An expression is a sequence of operators and operands that specifies a computation. An expression can result in a value and can cause side effects.
So the "main" purpose/scope of an expression is to specify a computation, not to compute a value. Some computations may result in a value and some may have side effects.
In addition to this (or actually first of all), "expressions" and "statements" are used in defining the grammar of C and C++. It would be syntactically impossible to make calls to functions that don't return a value anything other than an "expression". And adding that distinction at a semantic level would be an unnecessary complication.
Yes, calls to functions returning no value (declared as returning void) still count as expressions. That limits their use in other expressions, though; for example, such calls cannot appear on either side of an assignment.
As for "why"? Well, a function call is a function call is a function call. Adding special rules for functions that don't return a value would make the language design much more complicated. C++ already has enough special rules and exceptions.
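As a sketch of what void calls can and cannot do as expressions (C++17 assumed; bump and use_void_calls are hypothetical names): they work as expression statements, as operands of the comma operator, in ?: when both branches are void, and under a cast to void, but not anywhere a value is needed.

```cpp
#include <type_traits>

constexpr void bump(int& n) { ++n; }  // hypothetical function returning void

constexpr int use_void_calls(bool flag) {
    int n = 0;
    static_assert(std::is_same_v<decltype(bump(n)), void>);  // a call is a void expression
    bump(n);                      // usable as an expression statement,
    (bump(n), bump(n));           // as operands of the comma operator,
    flag ? bump(n) : bump(n);     // in ?: when both branches are void,
    static_cast<void>(bump(n));   // and as the operand of a cast to void
    // int m = bump(n);           // error: a void expression has no value
    // bump(n) = 1;               // error: cannot assign to a void expression
    return n;                     // bump ran 5 times
}
```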
Yes, a call to a void function is also an expression. The definition in the C++ Standard says:
An expression is a sequence of operators and operands that specifies a computation. An expression can result in a value and can cause side effects.
A function call is a postfix expression followed by parentheses containing a possibly empty, comma-separated list of expressions which constitute the arguments to the function.
Also, from the MSDN C++ Language Reference:
A postfix-expression followed by the function-call operator, ( ), specifies a function call.
Function Call = expression-statement… even functions of type void?
No. But that's because a function call is this:
do_stuff()
and that's an expression_opt. It's an expression, not a statement. You can use this expression in compound expressions, but it's not a statement by language logic.
You can quickly convert that expression_opt to an expression-statement by giving it a semicolon:
do_stuff();
is now a full statement.
The difference becomes clear if you think about something like
if(good_thing() || do_stuff())
{
....
}
do_stuff() and good_thing() are expressions, which can/will be evaluated. Semicolons after () would break that if clause.

Conditional execution using if or logic

When using objects, I sometimes test for their existence, e.g.:
if(object)
object->Use();
Could I just use
(object && object->Use());
and what differences are there, if any?
They're the same, assuming object->Use() returns something that's valid in a boolean context; if it returns void, the compiler will complain that a void return isn't being ignored like it should be, and other return types that don't fit will give you something like no match for 'operator&&'.
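A sketch of the two forms side by side (C++17 assumed; Widget, with_if, with_and and same_effect are hypothetical names): with a bool-returning Use() both versions have the same effect, while a void-returning member cannot be an operand of &&.

```cpp
// Hypothetical type standing in for the question's object.
struct Widget {
    int uses = 0;
    constexpr bool Use() { ++uses; return true; }  // returns bool: fine with &&
    constexpr void Reset() { uses = 0; }           // returns void
};

constexpr int with_if(Widget* object) {
    if (object)
        object->Use();
    return object ? object->uses : -1;
}

constexpr int with_and(Widget* object) {
    (void)(object && object->Use());      // compiles only because Use() returns bool
    // (void)(object && object->Reset()); // error: void operand of &&
    return object ? object->uses : -1;
}

// Both forms call Use() exactly once for a live object and skip it for nullptr.
constexpr bool same_effect() {
    Widget a, b;
    return with_if(&a) == with_and(&b) && with_if(nullptr) == with_and(nullptr);
}
static_assert(same_effect());
```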
One enormous difference is that the two behave very differently if operator&& has been overloaded. Short-circuit evaluation is only provided by the built-in operators. In the case of an overloaded operator, both sides will be evaluated (before C++17 in an unspecified order, with no sequence point between them; since C++17 the left operand is sequenced before the right), and the results passed to the actual function call.
If object and the return type of object->Use() are both primitive types, then you're okay. But if either is of class type, then it is possible that object->Use() will be called even if object evaluates to false.
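The loss of short-circuiting can be demonstrated directly. In this sketch (C++17 assumed; Flag, lhs_false and rhs are hypothetical names), lhs_false() plays the role of a null object: with the built-in && the right-hand call is skipped, but once operator&& is overloaded for a class type, both operands are evaluated.

```cpp
struct Flag { bool value; };                        // hypothetical class type
constexpr bool operator&&(Flag a, Flag b) { return a.value && b.value; }

constexpr Flag lhs_false() { return {false}; }      // stands in for a null object
constexpr Flag rhs(int& calls) { ++calls; return {true}; }

constexpr int rhs_calls_builtin() {
    int calls = 0;
    bool r = lhs_false().value && rhs(calls).value; // built-in &&: short-circuits
    (void)r;
    return calls;
}

constexpr int rhs_calls_overloaded() {
    int calls = 0;
    bool r = lhs_false() && rhs(calls);             // operator&&(Flag, Flag): both evaluated
    (void)r;
    return calls;
}

static_assert(rhs_calls_builtin() == 0);
static_assert(rhs_calls_overloaded() == 1);
```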
They are effectively the same thing but the second is not as clear as your first version, whose intent is obvious. Execution speed is probably no different, either.
Functionally they are the same, and a decent compiler should be able to optimize both equally well. However, writing an expression with operators like this and not checking the result is very odd. Perhaps if this style were common, it would be considered concise and easy to read, but it's not - right now it's just weird. You may get used to it and it could make perfect sense to you, but to others who read your code, their first impression will be, "What the heck is this?" Thus, I recommend going with the first, commonly used version if only to avoid making your fellow programmers insane.
When I was younger I think I would have found that appealing. I always wanted to trim down lines of code, but I realized later on that when you deviate too far from the norm, it'll bite you in the long run when you start working with a team. If you want to achieve zen-programming with minimum lines of code, focus on the logic more than the syntax.
I wouldn't do that. If operator&& is overloaded for the type of object and the class type returned by object->Use(), all bets are off and there is no short-circuit evaluation.
Yes, you can. You see, the C language, as well as C++, is a mix of two fairly independent worlds, or realms, if you will. There's the realm of statements and the realm of expressions. Each one can be seen as a separate sub-language in itself, with its own implementations of basic programming constructs.
In the realm of statements, the sequencing is achieved by the ; at the end of the single statement or by the } at the end of compound statement. In the realm of expressions the sequencing is provided by the , operator.
Branching in the realm of statements is implemented by if statement, while in the realm of expressions it can be implemented by either ?: operator or by use of the short-circuit evaluation properties of && and || operators (which is what you just did, assuming your expression is valid).
The realm of expressions has no loops, but it has recursion that can replace them (this requires function calls, though, which inevitably forces us to switch to statements).
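The two realms can be compared on a small example (C++17 assumed; the function names are hypothetical): one factorial written with statements and a loop, one written as a single expression using ?: for branching and recursion in place of the loop, plus the comma operator for sequencing inside an expression.

```cpp
// Statement realm: sequencing with ;, repetition with a loop.
constexpr unsigned long factorial_statements(unsigned n) {
    unsigned long result = 1;
    for (unsigned i = 2; i <= n; ++i)
        result *= i;
    return result;
}

// Expression realm: ?: for branching, recursion instead of the loop.
constexpr unsigned long factorial_expressions(unsigned n) {
    return n <= 1 ? 1 : n * factorial_expressions(n - 1);
}

// Expression realm: the comma operator sequences a += b before a * 2.
constexpr int sum_then_double(int a, int b) {
    return (a += b, a * 2);
}

static_assert(factorial_statements(5) == factorial_expressions(5));
```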
Obviously these realms are far from being equivalent in their power. C and C++ are languages dominated by statements. However, often one can implement fairly complex constructs using the language of expressions alone.
What you did above does implement equivalent branching in the language of expressions. Keep in mind that many people will find it hard to read in normal code (mostly because, once again, they are used to statement-dominated C and C++ code). But it often comes in very handy in some specific contexts, like template metaprogramming, for one example.