no return in function using switch statement - c++

I'm developing an application on Linux with an older gcc version (7.something, if I remember correctly).
Recently I tried to run the same application on Windows.
On Windows, I'm using MinGW as a compiler (with gcc 8.1.0).
I came across this error message while compiling my application on Windows:
warning: control reaches end of non-void function [-Wreturn-type]
the code is similar to the following:
class myClass {
protected:
enum class myEnum{
a,
b,
};
int fun(myClass::myEnum e);
};
and
int myClass::fun(myClass::myEnum e) {
switch (e){
case myEnum::a:{
return 0;
}
case myEnum::b:{
return 1;
}
}
}
I understand what the error message means, I'm just wondering why it was never an issue in LINUX.
Is this piece of code really an issue and do I have to add some dummy return statements?
Is there a branch of this function that will lead to no return statement?

This is a shortcoming of g++'s static analyzer. It doesn't recognize that all enum values are handled in the switch statement.
You can see at https://godbolt.org/z/LQnBNi that clang doesn't issue any warning for the code in its current shape, and issues two warnings ("not all enum values are handled in switch" and "control reaches the end of non-void function") when another value is added to the enum.
Keep in mind that compiler diagnostics are not standardized in any way: compilers are free to report warnings for conforming code, and to report warnings (and still compile!) for a malformed program.

You have to keep in mind that in C++ enums are not what they appear to be. They are just ints with some constraints, and can easily assume values other than the ones listed. Consider this example:
#include <iostream>
enum class MyEnum {
A = 1,
B = 2
};
int main() {
MyEnum m {}; // initialized to 0
switch(m) {
case MyEnum::A: return 0;
case MyEnum::B: return 0;
}
std::cout << "skipped all cases!" << std::endl;
}
The way around this is to either put a default case with assert(false) as VTT indicated above, or (if you can give everybody the guarantee that no values from outside the indicated set will ever get there) use a compiler-specific hint, like __builtin_unreachable() on GCC and clang:
switch(m) {
case MyEnum::A: return 0;
case MyEnum::B: return 0;
default: __builtin_unreachable();
}

Firstly, what you describe is a warning, not an error message. Compilers are not required to issue such warnings, and are still permitted to successfully compile your code - as it is technically valid.
Practically, most modern compilers CAN issue such warnings, but in their default configuration they do not. With gcc, the compiler can be optionally configured to issue such warnings (e.g. using suitable command line options).
The only reason this was "never an issue" under Linux is that your chosen compiler was not configured (or used with a suitable command line option) to issue the warning.
Most compilers do extensive analysis of the code, either directly (while parsing the source code) or by analysis of some internal representation of that code. The analysis is needed to determine if the code has diagnosable errors, and to work out how to optimise for performance.
Because of such analysis, most compilers can and do detect situations that may be problematical, even if the code does not have diagnosable errors (i.e. it is "correct enough" that the C++ standard does not require a diagnostic).
In this case, there are a number of distinct conclusions that the compiler may reach, depending on how it conducts analysis.
There is a switch. In principle, code after a switch statement may be executed.
The code after the switch reaches the end of the function without a return, and the function returns a value. The result of this is potential undefined behaviour.
If the compiler's analysis gets this far (and the compiler is configured to warn on such things), the criteria for issuing a warning are met. Further analysis is then needed to decide whether the warning can be suppressed, e.g. to determine that all possible values of e are represented by a case, and that all cases have a return statement. The thing is, a compiler vendor may elect not to do such analysis, and therefore not suppress warnings, for all sorts of reasons.
Doing more analysis increases compilation times. Vendors compete on claims of their compiler being faster, among other things, so NOT doing some analysis is therefore beneficial in getting lower compilation times;
The compiler vendor may consider it is better to flag potential problems, even if the code is actually correct. Given a choice between giving extraneous warnings, or not warning about some things, the vendor may prefer to give extraneous warnings.
In either of these cases, analysis to determine that the warning can be suppressed will not be done, so the warning will not be suppressed. The compiler will simply not have done enough analysis to determine that all paths of execution through the function encounter a return statement.
In the end, you need to treat compiler warnings as a sign of potential problems, and then make a sensible decision about whether the potential problem is worth bothering about. Your options from here include suppressing the warning (e.g. using a command line option that causes the warning to be suppressed) or modifying the code to prevent the warning (e.g. adding a return after the switch and/or a default case in the switch that returns).
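As a minimal sketch of the "modify the code" option, the version below keeps the switch free of a default case, so a compiler that checks enumerator coverage can still flag a missing case, and throws after the switch instead of returning a dummy value. The class is adapted from the question; making the members public and throwing std::logic_error are assumptions made for illustration, not something the question or this answer prescribes.

```cpp
#include <cassert>
#include <stdexcept>

class myClass {
public:  // public here only so the sketch can be called from outside (assumption)
    enum class myEnum { a, b };
    int fun(myEnum e);
};

int myClass::fun(myEnum e) {
    switch (e) {  // deliberately no default: enumerator coverage checks still apply
        case myEnum::a: return 0;
        case myEnum::b: return 1;
    }
    // Reached only if e holds a value that is not one of the named enumerators.
    throw std::logic_error("myClass::fun: unexpected myEnum value");
}
```

Because every path now ends in a return or a throw, the function no longer flows off its end, so there is nothing left for -Wreturn-type to report.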

One should be very careful when omitting return statements. Flowing off the end of a non-void function is undefined behavior:
9.6.3 The return statement [stmt.return]
Flowing off the end of a constructor, a destructor, or a function with a cv void return type is equivalent to a return with no operand. Otherwise, flowing off the end of a function other than main (6.6.1) results in undefined behavior.
It may be tempting to consider this code fine because all the valid enumerator values (in this case in range 0..1 [0..(2 ^ M - 1)] with M = 1) are handled in the switch. However, the compiler is not required to perform any particular reachability analysis to figure this out prior to jumping into UB zone.
Moreover, example from SergeyA's answer shows that this kind of code is a straight time bomb:
class myClass {
protected:
enum class myEnum{
a,
b,
c
};
int fun(myClass::myEnum e);
};
int myClass::fun(myClass::myEnum e) {
switch (e){
case myEnum::a:{
return 0;
}
case myEnum::b:{
return 1;
}
case myEnum::c:{
return 2;
}
}
}
Just by adding a third enum member (and handling it in the switch), the range of valid enumerator values gets extended to 0..3 ([0..(2 ^ M - 1)] with M = 2), and clang happily accepts it without any complaints even though passing 3 into this function will miss the switch, because the compiler is not required to report UB either.
So the rule of thumb would be to write code in a manner that all paths end with either a return, a throw, or a call to a [[noreturn]] function. In this particular case I would probably write a single return statement with an assertion for unhandled enumerator values:
int myClass::fun(myClass::myEnum e) {
int result{};
switch (e){
case myEnum::a:{
result = 0;
break;
}
case myEnum::b:{
result = 1;
break;
}
default:
{
assert(false);
break;
}
}
return result;
}

Related

How to speed up dynamic dispatch by 20% using computed gotos in standard C++

Before you down-vote or start saying that gotoing is evil and obsolete, please read the justification of why it is viable in this case. Before you mark it as duplicate, please read the full question.
I was reading about virtual machine interpreters when I stumbled across computed gotos. Apparently they allow significant performance improvement of certain pieces of code. The best-known example is the main VM interpreter loop.
Consider a (very) simple VM like this:
#include <iostream>
enum class Opcode
{
HALT,
INC,
DEC,
BIT_LEFT,
BIT_RIGHT,
RET
};
int main()
{
Opcode program[] = { // an example program that returns 10
Opcode::INC,
Opcode::BIT_LEFT,
Opcode::BIT_LEFT,
Opcode::BIT_LEFT,
Opcode::INC,
Opcode::INC,
Opcode::RET
};
int result = 0;
for (Opcode instruction : program)
{
switch (instruction)
{
case Opcode::HALT:
break;
case Opcode::INC:
++result;
break;
case Opcode::DEC:
--result;
break;
case Opcode::BIT_LEFT:
result <<= 1;
break;
case Opcode::BIT_RIGHT:
result >>= 1;
break;
case Opcode::RET:
std::cout << result;
return 0;
}
}
}
All this VM can do is a few simple operations on one number of type int and print it. In spite of its doubtful usefulness, it illustrates the subject nonetheless.
The critical part of the VM is obviously the switch statement in the for loop. Its performance is determined by many factors, of which the most important ones are most certainly branch prediction and the action of jumping to the appropriate point of execution (the case labels).
There is room for optimization here. In order to speed up the execution of this loop, one might use so-called computed gotos.
Computed Gotos
Computed gotos are a construct well known to Fortran programmers and those using a certain (non-standard) GCC extension. I do not endorse the use of any non-standard, implementation-defined, and (obviously) undefined behavior. However to illustrate the concept in question, I will use the syntax of the mentioned GCC extension.
In standard C++ we are allowed to define labels that can later be jumped to by a goto statement:
goto some_label;
some_label:
do_something();
Doing this isn't considered good code (and for a good reason!). Although there are good arguments against using goto (of which most are related to code maintainability) there is an application for this abominated feature. It is the improvement of performance.
Using a goto statement can be faster than a function invocation. This is because of the amount of "paperwork", like setting up the stack and returning a value, that has to be done when invoking a function. Meanwhile a goto can sometimes be converted into a single jmp assembly instruction.
To exploit the full potential of goto an extension to the GCC compiler was made that allows goto to be more dynamic. That is, the label to jump to can be determined at run-time.
This extension allows one to obtain a label pointer, similar to a function pointer and gotoing to it:
void* label_ptr = &&some_label;
goto *label_ptr;
some_label:
do_something();
This is an interesting concept that allows us to further enhance our simple VM. Instead of using a switch statement we will use an array of label pointers (a so-called jump table) and then goto the appropriate one (the opcode will be used to index the array):
// Courtesy of Eli Bendersky
// This code is licensed with the Unlicense
int interp_cgoto(unsigned char* code, int initval) {
/* The indices of labels in the dispatch_table are the relevant opcodes
*/
static void* dispatch_table[] = {
&&do_halt, &&do_inc, &&do_dec, &&do_mul2,
&&do_div2, &&do_add7, &&do_neg};
#define DISPATCH() goto *dispatch_table[code[pc++]]
int pc = 0;
int val = initval;
DISPATCH();
while (1) {
do_halt:
return val;
do_inc:
val++;
DISPATCH();
do_dec:
val--;
DISPATCH();
do_mul2:
val *= 2;
DISPATCH();
do_div2:
val /= 2;
DISPATCH();
do_add7:
val += 7;
DISPATCH();
do_neg:
val = -val;
DISPATCH();
}
}
This version is about 25% faster than the one that uses a switch (the one on the linked blog post, not the one above). This is because there is only one jump performed after each operation, instead of two.
Control flow with switch:
For example, if we wanted to execute Opcode::FOO and then Opcode::SOMETHING, two jumps would be performed after each instruction is executed. The first one is back to the switch code and the second is to the actual instruction.
In contrast, if we went with an array of label pointers (as a reminder, they are non-standard), we would have only one jump.
It is worthwhile to note that in addition to saving cycles by doing fewer operations, we also enhance the quality of branch prediction by eliminating the additional jump.
Now, we know that by using an array of label pointers instead of a switch we can improve the performance of our VM significantly (by about 20%). I figured that maybe this could have some other applications too.
I came to the conclusion that this technique could be used in any program that has a loop in which it sequentially indirectly dispatches some logic. A simple example of this (apart from the VM) could be invoking a virtual method on every element of a container of polymorphic objects:
std::vector<Base*> objects;
objects = get_objects();
for (auto object : objects)
{
object->foo();
}
Now, this has many more applications.
There is one problem though: there is no such thing as label pointers in standard C++. As such, the question is: Is there a way to simulate the behaviour of computed gotos in standard C++ that can match them in performance?
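For comparison, one construct that standard C++ does offer is a table of function pointers indexed by the opcode. The sketch below is only an illustration of that idea, not a claim that it matches computed gotos in performance: each dispatch is an indirect call rather than a jump. The names VmState, Handler, handlers and run_table are hypothetical, and treating HALT as "stop" is an assumption.

```cpp
#include <cassert>
#include <cstddef>

enum class Opcode : unsigned char { HALT, INC, DEC, BIT_LEFT, BIT_RIGHT, RET };

// Hypothetical VM state shared by all handlers.
struct VmState {
    int result = 0;
    bool running = true;
};

using Handler = void (*)(VmState&);

void op_halt(VmState& s)      { s.running = false; }  // assumption: HALT stops the VM
void op_inc(VmState& s)       { ++s.result; }
void op_dec(VmState& s)       { --s.result; }
void op_bit_left(VmState& s)  { s.result <<= 1; }
void op_bit_right(VmState& s) { s.result >>= 1; }
void op_ret(VmState& s)       { s.running = false; }

// The order of entries must match the order of the Opcode enumerators.
constexpr Handler handlers[] = {
    op_halt, op_inc, op_dec, op_bit_left, op_bit_right, op_ret
};

// Executes until HALT or RET; the caller must make sure the program ends with one.
int run_table(const Opcode* program) {
    VmState state;
    for (std::size_t pc = 0; state.running; ++pc) {
        handlers[static_cast<std::size_t>(program[pc])](state);
    }
    return state.result;
}
```

Whether the compiler can turn these indirect calls into something as cheap as a jump depends on inlining and the optimizer, so this would need to be measured rather than assumed.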
Edit 1:
There is yet another downside to using the switch. I was reminded of it by user1937198. It is bounds checking. In short, it checks if the value of the variable inside of the switch matches any of the cases. It adds redundant branching (this check is mandated by the standard).
Edit 2:
In response to cmaster, I will clarify my idea for reducing the overhead of virtual function calls. A dirty approach to this would be to have an id in each derived instance representing its type, which would be used to index the jump table (label pointer array). The problem is that:
There are no jump tables in standard C++
It would require us to modify all jump tables when a new derived class is added.
I would be thankful if someone came up with some type of template magic (or a macro as a last resort) that would allow it to be written in a cleaner, more extensible and automated way, like this:
On recent versions of MSVC, the key is to give the optimizer the hints it needs so that it can tell that just indexing into the jump table is a safe transform. There are two constraints on the original code that prevent this, and thus make optimising to the code generated by the computed label code an invalid transform.
Firstly, in the original code, if the program counter overflows the program, then the loop exits. In the computed label code, undefined behavior (dereferencing an out of range index) is invoked. Thus the compiler has to insert a check for this, causing it to generate a basic block for the loop header rather than inlining that in each switch block.
Secondly, in the original code, the default case is not handled. Whilst the switch covers all enum values, and thus it is undefined behavior for no branches to match, the MSVC optimiser is not intelligent enough to exploit this, so it generates a default case that does nothing. Checking this default case requires a conditional, as it handles a large range of values. The computed goto code invokes undefined behavior in this case as well.
The solution to the first issue is simple: don't use a C++ range-based for loop; use a while loop or a for loop with no condition. The solution for the second unfortunately requires platform-specific code to tell the optimizer that the default is undefined behavior, in the form of __assume(0), but something analogous is present in most compilers (__builtin_unreachable() in clang and gcc), and it can be conditionally compiled to nothing when no equivalent is present without any correctness issues.
So the result of this is:
#include <iostream>
enum class Opcode
{
HALT,
INC,
DEC,
BIT_LEFT,
BIT_RIGHT,
RET
};
int run(Opcode* program) {
int result = 0;
for (int i = 0; true;i++)
{
auto instruction = program[i];
switch (instruction)
{
case Opcode::HALT:
break;
case Opcode::INC:
++result;
break;
case Opcode::DEC:
--result;
break;
case Opcode::BIT_LEFT:
result <<= 1;
break;
case Opcode::BIT_RIGHT:
result >>= 1;
break;
case Opcode::RET:
std::cout << result;
return 0;
default:
__assume(0);
}
}
}
The generated assembly can be verified on godbolt
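The conditional compilation mentioned above could be sketched roughly as follows. VM_UNREACHABLE and run_program are hypothetical names introduced for this illustration; unlike the code above, this variant returns the result instead of printing it and treats HALT as "stop", both of which are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper macro: maps to the compiler's "unreachable" hint when one exists.
#if defined(_MSC_VER)
#  define VM_UNREACHABLE() __assume(0)
#elif defined(__GNUC__) || defined(__clang__)
#  define VM_UNREACHABLE() __builtin_unreachable()
#else
#  define VM_UNREACHABLE() ((void)0)  // no hint available: compiles to nothing
#endif

enum class Opcode { HALT, INC, DEC, BIT_LEFT, BIT_RIGHT, RET };

// Executes until HALT or RET; the caller must guarantee the program contains one
// and that it only holds named Opcode values.
int run_program(const Opcode* program) {
    int result = 0;
    for (std::size_t i = 0; ; ++i) {
        switch (program[i]) {
            case Opcode::HALT:      return result;  // assumption: HALT stops the VM
            case Opcode::INC:       ++result;      break;
            case Opcode::DEC:       --result;      break;
            case Opcode::BIT_LEFT:  result <<= 1;  break;
            case Opcode::BIT_RIGHT: result >>= 1;  break;
            case Opcode::RET:       return result;
            default:                VM_UNREACHABLE();
        }
    }
}
```

The hint is a promise to the optimizer, not a check: if a value outside the enumerators ever reaches the switch, the behavior is undefined on compilers where the macro expands to a real hint.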

Handling of switch enum class returns in clang, gcc and icc consistently

I am generally using clang to develop code, using all reasonable warnings I can (-Wall -Wextra [-Wpedantic]). One of the nice things about this setup is that the compiler checks for the consistency of the switch statements in relation to the enumeration used. For example in this code:
enum class E{e1, e2};
int fun(E e){
switch(e){
case E::e1: return 11;
case E::e2: return 22; // if I forget this line, clang warns
}
}
clang would complain (warn) if I omit either the e1 or the e2 case and there is no default case.
<source>:4:12: warning: enumeration value 'e2' not handled in switch [-Wswitch]
switch(e){
This behavior is great because
it checks for consistency at compile time between enums and switches, making them a very useful and inseparable pair of features.
I don't need to define an artificial default case for which I wouldn't have a good thing to do.
It allows me to omit a global return for which I wouldn't have a good thing to return (sometimes the return is not a simple type like int; it could be a type without a default constructor, for example).
(Note that I am using an enum class so I assume only valid cases, as an invalid case can only be generated by a nasty cast on the callers end.)
Now the bad news:
Unfortunately this breaks down quickly when switching to other compilers.
In GCC and Intel (icc) the above code warns (using the same flags) that I am not returning from a non-void function.
<source>: In function 'int fun(E)':
<source>:11:1: warning: control reaches end of non-void function [-Wreturn-type]
11 | }
| ^
Compiler returned: 0
The only solution I found for this is to both have a default case and return a nonsensical value.
int fun(E e){
switch(e){
case E::e1: return 11;
case E::e2: return 22;
default: return {}; // or int{} // needed by GCC and icc
}
}
This is bad because of the reasons I stated above (and not even getting to the case where the return type has no default constructor).
But it is also bad because I can forget again one of the enum cases and now clang will not complain because there is a default case.
So what I ended up doing is to have this ugly code that works on these compilers and warns when it can for the right reasons.
enum E{e1, e2};
int fun(E e){
switch(e){
case E::e1: return 11;
case E::e2: return 22;
#ifndef __clang__
default: return {};
#endif
}
}
or
int fun(E e){
switch(e){
case E::e1: return 11;
case E::e2: return 22;
}
#ifndef __clang__
return {};
#endif
}
Is there a better way to do this?
This is the example: https://godbolt.org/z/h5_HAs
In the case of non-default-constructible classes I am really out of good options completely:
A fun(E e){
switch(e){
case E::e1: return A{11};
case E::e2: return A{22};
}
#ifndef __clang__
return reinterpret_cast<A const&>(e); // :P, because return A{} could be invalid
#endif
}
https://godbolt.org/z/3WC5v8
It is important to note that, given your initial definition of fun, it is entirely legal C++ to do the following:
fun(static_cast<E>(2));
Any enumeration type can assume any value within the number of bits of its representation. The representation for a type with an explicit underlying type (enum class always has an underlying type; int by default) is the entirety of that underlying type. Therefore, an enum class by default can assume the value of any int.
This is not undefined behavior in C++.
As such, GCC is well within its rights to assume that fun may get any value within the range of its underlying type, rather than only one of its enumerators.
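To make this concrete, here is a small sketch showing that a value outside the named enumerators survives the cast intact and matches no case label. The helper names raw_value and is_named_enumerator are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <type_traits>

enum class E { e1, e2 };

// An enum class without an explicit underlying type uses int.
static_assert(std::is_same<std::underlying_type<E>::type, int>::value,
              "enum class defaults to an underlying type of int");

// Hypothetical helper: exposes the stored integer value.
int raw_value(E e) { return static_cast<int>(e); }

// Hypothetical helper: hand-written check against the named enumerators.
bool is_named_enumerator(E e) { return e == E::e1 || e == E::e2; }
```

With these, raw_value(static_cast<E>(2)) yields 2 and is_named_enumerator(static_cast<E>(2)) yields false, which is exactly the situation a switch over only e1 and e2 would fall through.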
Standard C++ doesn't really have an answer for this. In an ideal world, C++ would have a contract system where you can declare up-front that fun requires that the parameter e be one of the enumerators. With that knowledge, GCC would know that the switch will take all control paths. Of course, even if C++20 had contracts (which is being retooled for C++23), there still isn't a way to test if an enum value only has values equal to one of its enumerators.
In a slightly less ideal world, C++ would have a way to explicitly tell the compiler that a piece of code is expected to be unreachable, and therefore the compiler can ignore the possibility of execution getting there. Unfortunately, that feature didn't make C++20 either.
So for the time being, you're stuck with compiler-specific alternatives.
All three of these compilers have the __builtin_unreachable() extension. You can use it to both suppress the warning (even if the return value has constructor problems) and to elicit better code generation:
enum class E{e1, e2};
int fun(E e){
switch(e){
case E::e1: return 11;
case E::e2: return 22;
}
__builtin_unreachable();
}
https://godbolt.org/z/0VP9af
This has nothing to do with enum or switch and everything to do with the compiler’s ability to prove a valid return statement through every path. Some compilers are better at this than others.
The correct way is to just add a valid return at the end of the function.
A fun(E e){
switch(e){
case E::e1: return A{11};
...
}
return A{11}; // can't get here, so return anything
}
Edit: some compilers (like MSVC) will complain if you have a return from an unreachable path. Just bracket the return by an #if for the compiler. Or as I often do, just have a RETURN(x) defined that is defined based on the compiler.

What is this C++ warning - control reaches end of non-void function

I am writing this function definition.
#include<iostream>
int abs_fun(int x);
int main()
{
int i =abs_fun(0);
std::cout<<i;
return 0;
}
int abs_fun(int x)
{
if (x<0)
return -x;
else if(x>=0) //warning
return x;
}
I am compiling the code using g++ -Wall -Wconversion -Wextra p4.cpp -o p4 and getting a warning at the end of abs_fun(): warning: control reaches end of non-void function [-Wreturn-type]. If I just write else instead of else if, then the warning goes away. Any suggestions as to why the warning appears?
Because the compiler is not smart enough (or just doesn't bother trying) to understand that one of the returns is always reached.
In the case of an if and else, if both the if and else branch return you can be sure that something returns in all cases.
In your case, it requires an analysis to determine that the conditions fully cover all possible cases, which the compiler is not required to do. In this case it's pretty straightforward, but consider more complicated conditions. If such simple conditions were required to be analyzed, it would be a case of "where do we draw the line?" in terms of what complexity the compiler should be expected to analyze.
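A minimal sketch of the structural fix: make the last branch unconditional, so the compiler can see that every path returns without reasoning about the conditions themselves.

```cpp
#include <cassert>

// Every path ends in a return without relying on condition analysis.
// Note: as in the original, negating the most negative int would overflow.
int abs_fun(int x)
{
    if (x < 0)
        return -x;
    return x;  // x >= 0 is implied here, so no second condition is needed
}
```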
The problem is that all control paths should either throw an exception or return some value. Doing otherwise is undefined behavior, so the compiler wants to warn you about that. In your case, the compiler cannot prove that all execution paths lead to a return or an exception.
Why?
Because doing the check is expensive. Sometimes it might not even be possible. As a result, the compiler just doesn't have such a feature.
In which cases?
Well, this is more about discrete math.

Conditional branches

Why does this piece of code compile?
#include <iostream>
int foo(int x)
{
if(x == 10)
return x*10;
}
int main()
{
int a;
std::cin>>a;
std::cout<<foo(a)<<'\n';
}
Shouldn't the compiler give me an error like "not all code paths return a value"? What does my function return when x isn't equal to ten?
The result is undefined, so the compiler is free to choose -- you probably get what happens to sit at the appropriate stack address where the caller expects the result. Activate compiler warnings, and your compiler will inform you about your omission.
The compiler is not required to give you an error in this circumstance. Many will, some will only issue warnings. Some apparently won't notice.
This is because it's possible that your code ensures outside of this function that the condition will always be true. Therefore, it isn't necessarily bad (though it almost always is, which is why most compilers will issue at least a warning).
The specification will state that the result of exiting a function that should return a value but doesn't is undefined behavior. A value may be returned. Or the program might crash. Or anything might happen. It's undefined.

Is there any c/c++ compiler that can warn (or give error) on enum conversion to int?

Cleaning up old c/c++ code that used hardcoded integer literals instead of enums, it is tedious to find places where the function-declaration has been properly refactored but not the body. e.g.
enum important {
little = 1,
abit = 2,
much = 3
};
void blah(int e)
{
// magic stuff here
}
void boing(int e) { ... }
void guck(important e)
{
switch (e) {
case 3: // this would be a good place for a warning
blah(e); // and this
break;
default:
boing((int)e); // but this is OK (although imperfect and a warning would be acceptable)
break;
}
}
Annotating/modifying each enum type or searching through the code for them would also be a fair amount of work as there are very very many different enums, so this is not preferred, but could be an acceptable solution.
I don't need it to be in any of our main compilers or other tools (gcc mostly) or platforms (most); running it manually a couple of times would be enough, but I would prefer something that is not too esoteric or pricey.
lint will provide this warning for you (condition 641)
641 Converting enum to int -- An enumeration type was used in a context that
required a computation such as an argument to an arithmetic operator or was
compared with an integral argument. This warning will be suppressed if you
use the integer model of enumeration (+fie) but you will lose some valuable
type-checking in doing so. An intermediate policy is to simply turn off this
warning. Assignment of int to enum will still be caught.
Splint (http://www.splint.org/download.html) is a modern lint you can use
Sparse (a semantic checker tool used by the linux kernel people) can help you with some of this.
A subset of enum errors can be caught by these options: -Wenum-mismatch, -Wcast-truncate. However, I ran your code through this and it doesn't look like any of those were caught.
This is Free software, should you want to extend it.
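If the code base can be compiled as C++11 or later (an assumption; the question mentions mixed c/c++ code), another way to get the compiler itself to flag these spots is to temporarily turn the enum into a scoped enum, since an enum class does not convert implicitly to int. The sketch below adapts the question's example; the bodies and return values of blah and boing are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

enum class important {
    little = 1,
    abit = 2,
    much = 3
};

// Scoped enumerations have no implicit conversion to int.
static_assert(!std::is_convertible<important, int>::value,
              "enum class does not convert implicitly to int");

int blah(int e)  { return e * 10; }  // hypothetical body
int boing(int e) { return -e; }      // hypothetical body

int guck(important e)
{
    switch (e) {
    case important::much:                   // `case 3:` would no longer compile
        return blah(static_cast<int>(e));   // `blah(e)` would no longer compile
    default:
        return boing(static_cast<int>(e));  // explicit conversions remain allowed
    }
}
```

Every place that still relies on an implicit conversion then becomes a hard compile error, which makes the remaining unrefactored bodies easy to find.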