I am writing this function definition.
#include<iostream>
int abs_fun(int x);
int main()
{
int i =abs_fun(0);
std::cout<<i;
return 0;
}
int abs_fun(int x)
{
if (x<0)
return -x;
else if(x>=0) //warning
return x;
}
I am compiling the code using g++ -Wall -Wconversion -Wextra p4.cpp -o p4 and getting a warning at the end of abs_fun(): warning: control reaches end of non-void function [-Wreturn-type]. If I write else instead of else if, the warning goes away. Why does the warning appear?
Because the compiler is not smart enough (or just doesn't bother trying) to understand that one of the returns is always reached.
In the case of an if and else, if both the if and else branch return you can be sure that something returns in all cases.
In your case, it requires an analysis to determine that the conditions fully cover all possible cases, which the compiler is not required to do. Here it's pretty straightforward, but consider more complicated conditions. If such simple conditions had to be analyzed, it would become a question of "where do we draw the line?" in terms of how much complexity the compiler should be expected to analyze.
The problem is that every control path should either throw an exception or return a value. Doing otherwise is undefined behavior, so the compiler warns you about it. In your case, the compiler cannot prove that all execution paths lead to a return or an exception.
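A minimal sketch of the restructuring this suggests, reusing abs_fun from the question: once the second branch carries no condition of its own, every path visibly ends in a return and there is nothing left for the compiler to prove.

```cpp
// Every control path ends in a return, so -Wreturn-type has nothing to report.
int abs_fun(int x)
{
    if (x < 0)
        return -x;  // like the original, this still overflows for INT_MIN
    return x;       // unconditional: covers every x >= 0
}
```

A plain else in place of the trailing return would work just as well; what matters is that the last branch is unconditional.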
Why?
Because doing the check is expensive, and sometimes it is not even possible. As a result, the compiler simply doesn't have such a feature.
In which cases?
Well, this is more about discrete math.
I'm developing an application on Linux with an older gcc version (7.something, if I remember correctly).
Recently I tried to run the same application on Windows.
On Windows, I'm using MinGW as the compiler (with gcc 8.1.0).
I came across this error message while compiling my application on Windows:
warning: control reaches end of non-void function [-Wreturn-type]
the code is similar to the following:
class myClass {
protected:
enum class myEnum{
a,
b,
};
int fun(myClass::myEnum e);
};
and
int myClass::fun(myClass::myEnum e) {
switch (e){
case myEnum::a:{
return 0;
}
case myEnum::b:{
return 1;
}
}
}
I understand what the error message means; I'm just wondering why it was never an issue on Linux.
Is this piece of code really an issue and do I have to add some dummy return statements?
Is there a branch of this function that will lead to no return statement?
This is a shortcoming of g++'s static analysis. It fails to see that all enum values are handled in the switch statement.
You can see at https://godbolt.org/z/LQnBNi that clang doesn't issue any warning for the code in its current shape, and issues two warnings ("not all enum values are handled in switch" and "control reaches the end of non-void function") when another value is added to the enum.
Keep in mind that compiler diagnostics are not standardized in any way: compilers are free to report warnings for conforming code, and to report warnings for (and still compile!) a malformed program.
You also have to keep in mind that in C++ enums are not what they appear to be. They are just ints with some constraints, and can easily assume values other than the ones listed. Consider this example:
#include <iostream>
enum class MyEnum {
A = 1,
B = 2
};
int main() {
MyEnum m {}; // initialized to 0
switch(m) {
case MyEnum::A: return 0;
case MyEnum::B: return 0;
}
std::cout << "skipped all cases!" << std::endl;
}
The way around this is to either add a default case with assert(false), as VTT indicated above, or (if you can guarantee to everybody that no value outside the indicated set will ever get there) use a compiler-specific hint, such as __builtin_unreachable() on GCC and clang:
switch(m) {
case MyEnum::A: return 0;
case MyEnum::B: return 0;
default: __builtin_unreachable();
}
Firstly, what you describe is a warning, not an error message. Compilers are not required to issue such warnings, and are still permitted to successfully compile your code - as it is technically valid.
Practically, most modern compilers CAN issue such warnings, but in their default configuration they do not. With gcc, the compiler can be optionally configured to issue such warnings (e.g. using suitable command line options).
The only reason this was "never an issue" under linux is because your chosen compiler was not configured (or used with a suitable command line option) to issue the warning.
Most compilers do extensive analysis of the code, either directly (while parsing the source code) or by analysing some internal representation of that code. The analysis is needed to determine whether the code has diagnosable errors and to work out how to optimise for performance.
Because of such analysis, most compilers can and do detect situations that may be problematical, even if the code does not have diagnosable errors (i.e. it is "correct enough" that the C++ standard does not require a diagnostic).
In this case, there are a number of distinct conclusions that the compiler may reach, depending on how it conducts analysis.
There is a switch. In principle, code after a switch statement may be executed.
The code after the switch reaches the end of the function without a return, and the function returns a value. The result of this is potential undefined behaviour.
If the compiler's analysis gets this far (and the compiler is configured to warn on such things) criteria for issuing a warning are met. Further analysis is then needed if the warning can be suppressed e.g. determine that all possible values of e are represented by a case, and that all cases have a return statement. The thing is, a compiler vendor may elect not to do such analysis, and therefore not suppress warnings, for all sorts of reasons.
Doing more analysis increases compilation times. Vendors compete on claims of their compiler being faster, among other things, so NOT doing some analysis is therefore beneficial in getting lower compilation times;
The compiler vendor may consider it is better to flag potential problems, even if the code is actually correct. Given a choice between giving extraneous warnings, or not warning about some things, the vendor may prefer to give extraneous warnings.
In either of these cases, analysis to determine that the warning can be suppressed will not be done, so the warning will not be suppressed. The compiler will simply not have done enough analysis to determine that all paths of execution through the function encounter a return statement.
In the end, you need to treat compiler warnings as a sign of potential problems, and then make a sensible decision about whether the potential problem is worth bothering about. Your options from here include suppressing the warning (e.g. using a command line option that disables it) or modifying the code to prevent the warning (e.g. adding a return after the switch and/or a default case in the switch that returns).
One should be very careful when omitting return statements. Doing so is undefined behavior:
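As a sketch of the "modify the code" option, here is one possible shape, assuming a free function and the two-enumerator myEnum from the question. The switch has no default label, so GCC and clang can still flag a newly added enumerator through -Wswitch, while the throw after the switch gives every path a defined exit. Using std::logic_error is an illustrative choice, not something the original code does.

```cpp
#include <stdexcept>

enum class myEnum { a, b };

// No default label: a forgotten enumerator can still be reported by -Wswitch.
// The throw after the switch means control never flows off the end.
int fun(myEnum e)
{
    switch (e) {
    case myEnum::a: return 0;
    case myEnum::b: return 1;
    }
    throw std::logic_error("fun: unexpected myEnum value");
}
```

With this in place, fun(myEnum::b) returns 1, and a value produced by something like static_cast<myEnum>(7) raises an exception instead of invoking undefined behaviour.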
9.6.3 The return statement [stmt.return]
Flowing off the end of a constructor, a destructor, or a function with a cv void return type is equivalent to a return with no operand. Otherwise, flowing off the end of a function other than main (6.6.1) results in undefined behavior.
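It may be tempting to consider this code fine because all the valid enumerator values (in this case the range 0..1, i.e. [0..(2 ^ M - 1)] with M = 1) are handled in the switch. However, the compiler is not required to perform any particular reachability analysis to figure this out before jumping into the UB zone. (Strictly speaking, that range rule applies to unscoped enums without a fixed underlying type; an enum class has a fixed underlying type, int by default, so every int value is a valid value of myEnum, which only widens the gap.)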
Moreover, example from SergeyA's answer shows that this kind of code is a straight time bomb:
class myClass {
protected:
enum class myEnum{
a,
b,
c
};
int fun(myClass::myEnum e);
};
int myClass::fun(myClass::myEnum e) {
switch (e){
case myEnum::a:{
return 0;
}
case myEnum::b:{
return 1;
}
case myEnum::c:{
return 2;
}
}
}
Just by adding a third enum member (and handling it in the switch), the range of valid enumerator values is extended to 0..3 ([0..(2 ^ M - 1)] with M = 2). clang happily accepts the code without any complaints, even though passing 3 into this function will miss every case, because the compiler is not required to report UB either.
So the rule of thumb is to write code so that every path ends with a return, a throw, or a call to a [[noreturn]] function. In this particular case I would probably write a single return statement with an assertion for unhandled enumerator values:
int myClass::fun(myClass::myEnum e) {
int result{};
switch (e){
case myEnum::a:{
result = 0;
break;
}
case myEnum::b:{
result = 1;
break;
}
default:
{
assert(false);
break;
}
}
return result;
}
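To make the "time bomb" concrete, the following sketch passes a value that no enumerator names. The function fun_checked and its -1 sentinel are hypothetical; the point is only that such a value skips every case label and needs somewhere defined to land.

```cpp
enum class myEnum { a, b, c };

// Hypothetical helper: returns -1 for any value that no enumerator names.
int fun_checked(myEnum e)
{
    switch (e) {
    case myEnum::a: return 0;
    case myEnum::b: return 1;
    case myEnum::c: return 2;
    }
    return -1;  // reached for values such as static_cast<myEnum>(3)
}
```

Calling fun_checked(static_cast<myEnum>(3)) skips all three cases and returns the sentinel. Without the final return, the same call would flow off the end of the function.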
During a discussion I had with a couple of colleagues the other day I threw together a piece of code in C++ to illustrate a memory access violation.
I am currently in the process of slowly returning to C++ after a long spell of almost exclusively using languages with garbage collection and, I guess, my loss of touch shows, since I've been quite puzzled by the behaviour my short program exhibited.
The code in question is as such:
#include <iostream>
using std::cout;
using std::endl;
struct A
{
int value;
};
void f()
{
A* pa; // Uninitialized pointer
cout<< pa << endl;
pa->value = 42; // Writing via an uninitialized pointer
}
int main(int argc, char** argv)
{
f();
cout<< "Returned to main()" << endl;
return 0;
}
I compiled it with GCC 4.9.2 on Ubuntu 15.04 with -O2 compiler flag set. My expectations when running it were that it would crash when the line, denoted by my comment as "writing via an uninitialized pointer", got executed.
Contrary to my expectations, however, the program ran successfully to the end, producing the following output:
0
Returned to main()
I recompiled the code with a -O0 flag (to disable all optimizations) and ran the program again. This time, the behaviour was as I expected:
0
Segmentation fault
(Well, almost: I didn't expect a pointer to be initialized to 0.) Based on this observation, I presume that when compiling with -O2 set, the fatal instruction got optimized away. This makes sense, since no further code accesses the pa->value after it's set by the offending line, so, presumably, the compiler determined that its removal would not modify the observable behaviour of the program.
I reproduced this several times and every time the program would crash when compiled without optimization and miraculously work, when compiled with -O2.
My hypothesis was further confirmed when I added a line, which outputs the pa->value, to the end of f()'s body:
cout<< pa->value << endl;
Just as expected, with this line in place, the program consistently crashes, regardless of the optimization level, with which it was compiled.
This all makes sense, if my assumptions so far are correct.
However, where my understanding breaks somewhat is in case where I move the code from the body of f() directly to main(), like so:
int main(int argc, char** argv)
{
A* pa;
cout<< pa << endl;
pa->value = 42;
cout<< pa->value << endl;
return 0;
}
With optimizations disabled, this program crashes, just as expected. With -O2, however, the program successfully runs to the end and produces the following output:
0
42
And this makes no sense to me.
This answer mentions "dereferencing a pointer that has not yet been definitely initialized", which is exactly what I'm doing, as one of the sources of undefined behaviour in C++.
So, is this difference in the way optimization affects the code in main(), compared to the code in f(), entirely explained by the fact that my program contains UB, and thus compiler is technically free to "go nuts", or is there some fundamental difference, which I don't know of, between the way code in main() is optimized, compared to code in other routines?
Your program has undefined behaviour. This means that anything may happen. The program is not covered at all by the C++ Standard. You should not go in with any expectations.
It's often said that undefined behaviour may "launch missiles" or "cause demons to fly out of your nose", to reinforce that point. The latter is more far-fetched but the former is feasible: imagine your code runs at a nuclear launch site and the wild pointer happens to write to a piece of memory that starts global thermonuclear war.
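For contrast, a well-defined variant of the question's f() could look like the sketch below. It assumes the intent was simply to write through a pointer; the local object a and the int return value are additions for illustration.

```cpp
struct A
{
    int value;
};

// The pointer refers to a live object before it is dereferenced,
// so the write and the read below both have defined behaviour.
int f()
{
    A a{};           // value-initialised: a.value == 0
    A* pa = &a;      // pa points at a valid object
    pa->value = 42;
    return pa->value;
}
```

Here the result is the same at every optimization level, because the program no longer depends on what an uninitialized pointer happens to contain.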
Writing unknown pointers has always been something which could have unknown consequences. What's nastier is a currently-fashionable philosophy which suggests that compilers should assume that programs will never receive inputs that cause UB, and should thus optimize out any code which would test for such inputs if such tests would not prevent UB from occurring.
Thus, for example, given:
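#include <stdint.h>
void launch_missiles(void);
uint32_t hey(uint16_t x, uint16_t y)
{
if (x < 60000) {
launch_missiles();
return 0;
}
else
return x*y; /* x and y are promoted to int before the multiplication */
}
uint32_t wow(uint16_t x)
{
return hey(x,40000);
}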
a 32-bit compiler could legitimately replace wow with an unconditional call to launch_missiles without regard for the value of x, since x "can't possibly" be greater than 53687 (any value beyond that would cause the calculation of x*y to overflow). Even though the authors of C89 noted that the majority of compilers of that era would calculate the correct result in a situation like the above, the Standard doesn't impose any requirements on compilers here, so hyper-modern philosophy regards it as "more efficient" for compilers to assume programs will never receive inputs that would necessitate reliance upon such things.
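One way to keep such a multiplication out of undefined territory is to widen an operand before multiplying, so the arithmetic is done in an unsigned 32-bit type rather than in promoted int. The helper name mul_u16 is hypothetical, and the sketch assumes a platform where int is 32 bits wide.

```cpp
#include <cstdint>

// Casting one operand first makes the multiplication happen in uint32_t.
// The product of two 16-bit values always fits, so no overflow can occur.
std::uint32_t mul_u16(std::uint16_t x, std::uint16_t y)
{
    return static_cast<std::uint32_t>(x) * y;
}
```

With this helper, mul_u16(60000, 40000) is 2400000000 for every input pair, leaving the optimizer no "impossible" values to reason from.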
Suppose you wrote a function in C++ but absentmindedly forgot to type the word return. What would happen in that case? I was hoping that the compiler would complain, or at least that a segmentation fault would be raised once the program got to that point. However, what actually happens is far worse: the program spews out rubbish. Not only that, but the actual output depends on the level of optimization! Here's some code that demonstrates this problem:
#include <iostream>
#include <vector>
using namespace std;
double max_1(double n1,
double n2)
{
if(n1>n2)
n1;
else
n2;
}
int max_2(const int n1,
const int n2)
{
if(n1>n2)
n1;
else
n2;
}
size_t max_length(const vector<int>& v1,
const vector<int>& v2)
{
if(v1.size()>v2.size())
v1.size();
else
v2.size();
}
int main(void)
{
cout << max_1(3,4) << endl;
cout << max_1(4,3) << endl;
cout << max_2(3,4) << endl;
cout << max_2(4,3) << endl;
cout << max_length(vector<int>(3,1),vector<int>(4,1)) << endl;
cout << max_length(vector<int>(4,1),vector<int>(3,1)) << endl;
return 0;
}
And here's what I get when I compile it at different optimization levels:
$ rm ./a.out; g++ -O0 ./test.cpp && ./a.out
nan
nan
134525024
134525024
4
4
$ rm ./a.out; g++ -O1 ./test.cpp && ./a.out
0
0
0
0
0
0
$ rm ./a.out; g++ -O2 ./test.cpp && ./a.out
0
0
0
0
0
0
$ rm ./a.out; g++ -O3 ./test.cpp && ./a.out
0
0
0
0
0
0
Now imagine that you're trying to debug the function max_length. In production mode you get the wrong answer, so you recompile in debug mode, and now when you run it everything works fine.
I know there are ways to catch such cases by enabling the appropriate warning flag (-Wreturn-type), but I still have two questions:
Why does the compiler even agree to compile a function without a return statement? Is this feature required for legacy code?
Why does the output depend on the optimization level?
Flowing off the end of a value-returning function is undefined behavior. This is covered in the draft C++ standard, section 6.6.3 The return statement, which says:
Flowing off the end of a function is equivalent to a return with no
value; this results in undefined behavior in a value-returning
function.
The compiler is not required to issue a diagnostic, we can see this from section 1.4 Implementation compliance which says:
The set of diagnosable rules consists of all syntactic and semantic
rules in this International Standard except for those rules containing
an explicit notation that “no diagnostic is required” or which are
described as resulting in “undefined behavior.”
Compilers in general do try to catch a wide range of undefined behaviors and produce warnings, although you usually need the right set of flags. For gcc and clang I find the following set of flags useful:
-Wall -Wextra -Wconversion -pedantic
and in general I would encourage you to turn warnings into errors using -Werror.
Compilers are notorious for taking advantage of undefined behavior during the optimization stages. See Finding Undefined Behavior Bugs by Finding Dead Code for some good examples, including the infamous Linux kernel null pointer check removal, where in processing this code:
struct foo *s = ...;
int x = s->f;
if (!s) return ERROR;
gcc inferred that since s was dereferenced in s->f, and since dereferencing a null pointer is undefined behavior, s must not be null, and it therefore optimized away the if (!s) check on the next line (copied from my answer here).
Since undefined behavior is unpredictable, at more aggressive settings the compiler will in many cases perform more aggressive optimizations. Many of them may not make much intuitive sense but, hey, it is undefined behavior, so you should have no expectations anyway.
Note that although there are many cases where the compiler can determine that a function does not return properly, in the general case this is the halting problem. Checking automatically at run time would carry a cost, which violates the "don't pay for what you don't use" philosophy. Both gcc and clang do implement sanitizers for this kind of problem, though: for example, the -fsanitize=undefined flag checks for undefined behavior at run time.
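For reference, here is a corrected sketch of two of the functions from the question with the missing return statements restored. Compiling with -Werror=return-type, which both gcc and clang accept, would have turned the original omissions into hard errors.

```cpp
#include <cstddef>
#include <vector>

// Both branches now actually return the selected value.
double max_1(double n1, double n2)
{
    return (n1 > n2) ? n1 : n2;
}

std::size_t max_length(const std::vector<int>& v1, const std::vector<int>& v2)
{
    if (v1.size() > v2.size())
        return v1.size();
    return v2.size();
}
```

With the returns in place, the results no longer depend on the optimization level: max_1(3, 4) and max_1(4, 3) both yield 4, and max_length yields the size of the longer vector.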
You may want to check out this answer here
The gist of it is that the compiler allows you to omit a return statement because a function may have many different execution paths, and proving at compile time that each one ends in a return can be tricky. Nothing is filled in for you, though: apart from main, flowing off the end of a value-returning function is undefined behavior.
Things to remember:
if main ends without a return, it returns 0.
if another value-returning function ends without a return, the behavior is undefined; in practice, for integer results on x86 the caller typically reads whatever was left in the eax register, which is often the result of the last evaluated expression
optimization changes the code at the assembly level. This is why you are getting the weird behavior: the compiler rearranges when things are executed, which leaves a different last value in the register, and thus a different "return value".
Hope this helped!
Why does this piece of code compile?
#include <iostream>
int foo(int x)
{
if(x == 10)
return x*10;
}
int main()
{
int a;
std::cin>>a;
std::cout<<foo(a)<<'\n';
}
Shouldn't the compiler give me an error like "not all code paths return a value"? What does my function return when x isn't equal to ten?
The result is undefined, so the compiler is free to choose -- you probably get what happens to sit at the appropriate stack address where the caller expects the result. Activate compiler warnings, and your compiler will inform you about your omission.
The compiler is not required to give you an error in this circumstance. Many will, some will only issue warnings. Some apparently won't notice.
This is because it's possible that your code ensures outside of this function that the condition will always be true. Therefore, it isn't necessarily bad (though it almost always is, which is why most compilers will issue at least a warning).
The specification states that exiting a function that should return a value without returning one is undefined behavior. A value may be returned. Or the program might crash. Or anything else might happen. It's undefined.
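If "no result" is a legitimate outcome for other inputs, one way to say so explicitly is to return std::optional, as in the sketch below. This assumes C++17 and changes the return type of foo, which the original code does not do.

```cpp
#include <optional>

// Returns x * 10 only when x == 10; every other input yields an empty optional.
std::optional<int> foo(int x)
{
    if (x == 10)
        return x * 10;
    return std::nullopt;
}
```

The caller then has to check has_value() (or use value_or) before reading the result, so the "x isn't equal to ten" case becomes visible in the type instead of being undefined.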
It seems to me that it would work perfectly well to do tail-recursion optimization in both C and C++, yet while debugging I never seem to see a frame stack that indicates this optimization. That is kind of good, because the stack tells me how deep the recursion is. However, the optimization would be kind of nice as well.
Do any C++ compilers do this optimization? Why? Why not?
How do I go about telling the compiler to do it?
For MSVC: /O2 or /Ox
For GCC: -O2 or -O3
How about checking if the compiler has done this in a certain case?
For MSVC, enable PDB output to be able to trace the code, then inspect the code
For GCC..?
I'd still take suggestions for how to determine if a certain function is optimized like this by the compiler (even though I find it reassuring that Konrad tells me to assume it)
It is always possible to check if the compiler does this at all by making an infinite recursion and checking if it results in an infinite loop or a stack overflow (I did this with GCC and found out that -O2 is sufficient), but I want to be able to check a certain function that I know will terminate anyway. I'd love to have an easy way of checking this :)
After some testing, I discovered that destructors ruin the possibility of making this optimization. It can sometimes be worth it to change the scoping of certain variables and temporaries to make sure they go out of scope before the return-statement starts.
If any destructor needs to run after the tail call, the tail-call optimization cannot be done.
All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade), even for mutually recursive calls such as:
int bar(int, int);
int foo(int n, int acc) {
return (n == 0) ? acc : bar(n - 1, acc + 2);
}
int bar(int n, int acc) {
return (n == 0) ? acc : foo(n - 1, acc + 1);
}
Letting the compiler do the optimisation is straightforward: Just switch on optimisation for speed:
For MSVC, use /O2 or /Ox.
For GCC, Clang and ICC, use -O2 or -O3.
An easy way to check whether the compiler did the optimisation is to perform a call that would otherwise result in a stack overflow, or to look at the assembly output.
As an interesting historical note, tail call optimisation for C was added to GCC in the course of a diploma thesis by Mark Probst. The thesis describes some interesting caveats in the implementation. It's worth reading.
As well as the obvious (compilers don't do this sort of optimization unless you ask for it), there is a complexity about tail-call optimization in C++: destructors.
Given something like:
int fn(int j, int i)
{
if (i <= 0) return j;
Funky cls(j,i);
return fn(j, i-1);
}
The compiler can't (in general) tail-call optimize this because it needs to call the destructor of cls after the recursive call returns.
Sometimes the compiler can see that the destructor has no externally visible side effects (so it can be done early), but often it can't.
A particularly common form of this is where Funky is actually a std::vector or similar.
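A sketch of the scoping workaround, assuming the object is not needed once the recursive call starts: an inner block destroys cls before the call, so nothing remains to run afterwards. Funky is a hypothetical stand-in here, and its destroyed counter exists only to make the destructor observable. Whether the compiler then actually emits a tail call still depends on the compiler and the optimization level.

```cpp
// Hypothetical stand-in for a type with a non-trivial destructor.
struct Funky
{
    Funky(int, int) {}
    ~Funky() { ++destroyed; }
    static int destroyed;  // counts destructor runs, for illustration only
};
int Funky::destroyed = 0;

int fn(int j, int i)
{
    if (i <= 0) return j;
    {
        Funky cls(j, i);   // destroyed at the closing brace below
    }
    return fn(j, i - 1);   // no destructor is pending after this call
}
```

The trade-off is that cls can no longer be used in the return expression itself; anything needed from it has to be copied out inside the block.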
gcc 4.3.2 completely inlines this function (a crappy/trivial atoi() implementation) into main(). The optimization level is -O1. I notice that if I play around with it (even just changing it from static to extern), the tail recursion goes away pretty fast, so I wouldn't depend on it for program correctness.
#include <stdio.h>
static int atoi(const char *str, int n)
{
if (str == 0 || *str == 0)
return n;
return atoi(str+1, n*10 + *str-'0');
}
int main(int argc, char **argv)
{
for (int i = 1; i != argc; ++i)
printf("%s -> %d\n", argv[i], atoi(argv[i], 0));
return 0;
}
Most compilers don't do any kind of optimisation in a debug build.
If using VC, try a release build with PDB info turned on - this will let you trace through the optimised app and you should hopefully see what you want then. Note, however, that debugging and tracing an optimised build will jump you around all over the place, and often you cannot inspect variables directly as they only ever end up in registers or get optimised away entirely. It's an "interesting" experience...
As Greg mentions, compilers won't do it in debug mode. It's OK for debug builds to be slower than a prod build, but they shouldn't crash more often, and if you depend on a tail call optimization, they may do exactly that. Because of this, it is often best to rewrite the tail call as a normal loop. :-(
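As an illustration of that rewrite, the tail-recursive atoi shown earlier can be expressed as a loop along the following lines. The name parse_int is hypothetical, and like the original the function does no validation: every character is treated as a decimal digit.

```cpp
// Loop form of the tail-recursive atoi: the accumulator n plays the role
// of the second parameter, and advancing str replaces the recursive call.
int parse_int(const char* str)
{
    int n = 0;
    if (str == nullptr)
        return n;
    for (; *str != '\0'; ++str)
        n = n * 10 + (*str - '0');
    return n;
}
```

The stack depth is constant in debug and release builds alike, so correctness no longer depends on whether the optimizer happened to eliminate the call.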