Compiler: how to check a user function returns properly?

I am writing a very simple compiler in which users can define functions that return void, int, or char. However, a user's function may be malformed: it may fail to return a value from a non-void function, or return a value from a function declared void. Currently my compiler cannot detect this kind of error, and it also fails to generate proper code for void functions, because such functions can end without an explicit return; (they return implicitly). It took me quite some time to phrase these two problems clearly. See the example code below:
// Problem A: detect implicit return.
void Foo(int Arg) {
if (Arg)
return;
else {
Arg = 1;
// Foo returns here! How can I know!
}
}
// Problem B: detect "forgotten return".
int Bar(int Arg) {
if (Arg > 1) {
return 1;
}
// this is an error: control flow reaches end at non-void function!
// How can I know!
}
I think the more general question is: how can I tell that control flow reaches the end of a function at some point? By "reaches the end" I mean that it reaches a point after which the function has no more code to execute. If I can detect such a point, I can look for a return there and either report an error if the function ought to return something, or generate an explicit return for a void function. If I enumerate all such points in a function, I can ensure that the function is fully checked or completed.
I assume this is a well-solved problem in compiler engineering, since modern C/C++ compilers handle it well. Does LLVM offer an API for this? Or is there a simple algorithm to achieve it? Thanks very much.
Edit: I am currently using LLVM and already emit BasicBlocks. I am hoping for guidance on doing this in LLVM specifically.
Edit: In this question we assume that the return type declared in the function prototype always matches that of its return statements. I am primarily concerned with the absence of a required return.

The answer is simple. After all the BBs of a function have been emitted, loop over them and pick out those that end without a Terminator (see the LLVM documentation for what a Terminator Instruction is). Assuming the code emitted for every kind of control-flow statement (While, For, etc.) follows the rule that each BB ends with one and only one Terminator, the only possible explanation for a rule-breaker is that it is missing a Return at the end. If the current function returns void, append a ReturnVoid to it. Otherwise, this is an error; report it.
The reasoning is largely correct because it relies on the well-formedness property of LLVM's BBs, and it is easy to implement and cheap to run. Here is the code:
/// Generate body for a Function.
void visitFuncDef(FuncDef *FD) {
// Unrelated code omitted...
/// Generate the body
for (Stmt *S : FD->stmts) {
visitStmt(S);
}
/// Check that all BBs are well-formed. In particular, look for
/// any unterminated BB and try to add a Return to it.
for (BasicBlock &BB : *Fn) {
Instruction *Terminator = BB.getTerminator();
if (Terminator != nullptr) continue; /// Well-formed
if (Fn->getReturnType()->isVoidTy()) {
/// Make implicit return of void Function explicit.
Builder.SetInsertPoint(&BB);
Builder.CreateRetVoid();
} else {
// How to attach source location?
EM.Error("control flow reaches end of non-void function");
// No source location, make errors short
return;
}
}
/// Verify the function body
String ErrorMsg;
llvm::raw_string_ostream OS(ErrorMsg);
if (llvm::verifyFunction(*Fn, &OS)) {
EM.Error(ErrorMsg);
}
}
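The same pass can be tried out without LLVM. The following is only a minimal sketch under stated assumptions: Block, Function and checkReturns are hypothetical stand-ins (not LLVM types), and a block simply records whether it already ends in a terminator. Unlike the code above, which stops at the first error, this version collects every offending block.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an LLVM BasicBlock.
struct Block {
    std::string name;
    bool hasTerminator; // true if the block already ends in br, ret, ...
    bool implicitRet;   // set when this pass appends a "ret void"
};

// Hypothetical stand-in for an LLVM Function.
struct Function {
    bool returnsVoid;
    std::vector<Block> blocks;
};

// Patches unterminated blocks of a void function in place and returns
// the names of blocks that fall off the end of a non-void function.
std::vector<std::string> checkReturns(Function &fn) {
    std::vector<std::string> errors;
    for (Block &bb : fn.blocks) {
        if (bb.hasTerminator) continue; // well-formed
        if (fn.returnsVoid) {
            // Make the implicit return explicit.
            bb.hasTerminator = true;
            bb.implicitRet = true;
        } else {
            // Control flow reaches the end of a non-void function.
            errors.push_back(bb.name);
        }
    }
    return errors;
}
```

For Foo from the question, the else block would be patched with an implicit return; for Bar, the block following the if would be reported.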

Related

what is the purpose of return in the void function

Is there any difference between putting and not putting return on the last line of a function?
void InputClass::KeyDown(unsigned int input)
{
// If a key is pressed then save that state in the key array
m_keys[input] = true;
return;
}

No, there is no difference!
return in void functions is used to exit early on certain conditions.
Example:
void f(bool cond)
{
// do stuff here
if(cond)
return;
// do other stuff
}

There is functionally no difference in your example. Section 6.6.3, "The return statement", paragraph 2 of the C++ draft standard says:
A return statement with neither an expression nor a braced-init-list can be used only in functions that do not return a value, that is, a function with the return type void, a constructor (12.1), or a destructor (12.4). [...] Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function.

In your particular code, No. But usually if you want to have an early return from the function based on a condition then use return.
void InputClass::KeyDown(unsigned int input)
{
// If a key is pressed then save that state in the key array
m_keys[input] = true;
if(someCondition) //early return
return;
//continue with the rest of function
//.....
}

In this particular case, it serves absolutely no purpose. It also won't cause any problems (e.g. no extra code is generated, assuming the compiler has at least some optimisation abilities).
There is of course a purpose to putting a return in the middle of a void function, so that some later part of the function is not executed.

No difference in your example, but it is useful if you want to return early from the function in some case.

return in void functions has multiple roles:
It can end function execution prematurely (e.g. the algorithm has finished, or preconditions are not met).
In certain cases you design the algorithm so that 85% of the cases end sooner (thus executing faster), leaving the other 15% of the cases to continue past that return (thus running slower, but only for some rare conditions).
It acts similarly to a goto end.

In this case NOTHING. In other cases it might be there because code follows the return statement and the author wanted that code to be 'dead' for a little while. It should be deleted afterwards.

No, they are the same thing.
Use return; when you want to exit function.

Function reference and assert(0) in C++

I wish to understand what fetch().text and assert(0) do below. I am not familiar with a function like fetch() whose call can refer directly to a member of the return type, i.e. fetch().text. Is this somehow enabled by the use of assert(0)?
class SimpleS{
struct internal_element {
const char *text;
};
class SimpleE {
public:
SimpleE() {
}
const char* text() const {
return fetch().text;
}
void set_text(const char *text) {
fetch().text = text;
}
private:
internal_element& fetch() const {
... // some code
assert(0);
}
}

The assertion has nothing to do with it. What's happening here is that fetch() returns a reference to an internal_element. That enables you to refer to members of that struct in the returned value:
fetch().text
refers to the internal_element::text member of the internal_element object returned by fetch().
As to why there's an assert(0) in there, no idea. You didn't give us the code. Usually, when a function ends with such an assert, it is because the programmer wants to catch cases where they didn't cover some possibility. For example:
if (condition)
//...
else if (condition)
//...
// We should have covered all possible conditions above and already
// returned. If we actually get here, then we did something wrong.
assert(0);

The assert(0); will always fail. I suppose its purpose is to make sure that this method is never called in the first place.
(Assuming, of course, there is no condition for the assert(0); being run)

If the ... // some code in your code always returns, assert(0) will never be reached. If it is reached (and NDEBUG is not defined), assert(0) prints a diagnostic and calls abort(), so the program dies. It is not an exception, so it cannot be caught in the calling function.

return fetch().text
calls the function fetch. This returns (a reference to) an object of class internal_element. It then accesses the text member of this returned object and returns it. It's effectively equivalent to:
internal_element temp = fetch();
return temp.text;
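For reading text, the copy into temp behaves the same as using the reference directly. For set_text, the reference matters. Here is a small sketch of the difference, assuming a hypothetical global g_element as the storage behind fetch() (the original elides where the element lives); fetchByRef, fetchByValue and setTextByRef are illustrative names.

```cpp
#include <cassert>
#include <cstring>

struct internal_element {
    const char *text;
};

// Hypothetical storage handed out by the fetch functions.
static internal_element g_element = {"old"};

// Returns a reference: callers see and modify g_element itself.
internal_element &fetchByRef() { return g_element; }

// Returns a copy: changes to the result never reach g_element.
internal_element fetchByValue() { return g_element; }

// Same shape as set_text in the question: assign through the reference.
void setTextByRef(const char *text) { fetchByRef().text = text; }
```

Writing to a copy obtained from fetchByValue() leaves g_element untouched, while setTextByRef updates it.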

In the code you shared (below), the programmer has conditions that make the function return; that is what the // some code is supposed to do. If none of those conditions holds and control reaches the end of the function, it means there is a fatal error, and that is the purpose of assert(0) at the end of the function.
internal_element& fetch() const {
... // some code
assert(0);
}

C/C++ optimizing away checks to see if a function has already been run before

Let's say you have a function in C/C++, that behaves a certain way the first time it runs. And then, all other times it behaves another way (see below for example). After it runs the first time, the if statement becomes redundant and could be optimized away if speed is important. Is there any way to make this optimization?
bool val = true;
void function1() {
if (val == true) {
// do something
val = false;
}
else {
// do other stuff, val is never set to true again
}
}

gcc has a builtin function that lets you give the implementation branch-prediction information:
__builtin_expect
http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
For example in your case:
bool val = true;
void function1()
{
if (__builtin_expect(val, 0)) {
// do something
val = false;
}
else {
// do other stuff, val is never set to true again
}
}

You should only make the change if you're certain that it truly is a bottleneck. With branch-prediction, the if statement is probably instant, since it's a very predictable pattern.
That said, you can use callbacks:
#include <iostream>
using namespace std;
typedef void (*FunPtr) (void);
FunPtr method;
void subsequentRun()
{
std::cout << "subsequent call" << std::endl;
}
void firstRun()
{
std::cout << "first run" << std::endl;
method = subsequentRun;
}
int main()
{
method = firstRun;
method();
method();
method();
}
produces the output:
first run
subsequent call
subsequent call

You could use a function pointer, but then every call becomes an indirect call:
// Forward declarations so the pointer can be initialized below.
void firstCall();
void otherCalls();
void (*yourFunction)(void) = &firstCall;
void firstCall() {
// ..
yourFunction = &otherCalls;
}
void otherCalls() {
// ..
}
int main()
{
yourFunction();
}

One possible method is to compile two different versions of the function (this can be done from a single function in the source with templates), and use a function pointer or object to decide at runtime. However, the pointer overhead will likely outweigh any potential gains unless your function is really expensive.
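A rough sketch of that idea, with assumptions spelled out: function1Impl, Handler, current and the two counters are hypothetical names, and the counters exist only so the effect can be observed. A bool template parameter produces the two compiled versions from one source function, and a function pointer selects between them at run time.

```cpp
#include <cassert>

// Hypothetical counters that make the two paths observable.
static int firstCalls = 0;
static int laterCalls = 0;

using Handler = void (*)();
extern Handler current; // declared before the template that assigns it

// One source function, two compiled versions selected by First.
// The condition is a compile-time constant in each instantiation,
// so the compiler can drop the dead branch.
template <bool First>
void function1Impl() {
    if (First) {
        ++firstCalls;                    // "do something"
        current = &function1Impl<false>; // switch to the other version
    } else {
        ++laterCalls;                    // "do other stuff"
    }
}

// Starts out pointing at the first-run version.
Handler current = &function1Impl<true>;

// Every call goes through the pointer (an indirect call).
void function1() { current(); }
```

The run-time check on val is gone, but it has been traded for an indirect call, which is the overhead the answer warns about.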

You could use a static member variable instead of a global variable.
Or, if the code you're running the first time changes something for all future uses (e.g., opening a file?), you could use that change as a check to determine whether or not to run the code (i.e., check if the file is open). This would save you the extra variable. Also, it might help with error checking: if for some reason the initial change is undone by another operation (e.g., the file is on removable media that is removed improperly), your check could try to re-do the change.

A compiler can only optimize what is known at compile time.
In your case, the value of val is only known at runtime, so it can't be optimized.
The if test is very quick, you shouldn't worry about optimizing it.

If you'd like to make the code a little bit cleaner you could make the variable local to the function using static:
void function() {
static bool firstRun = true;
if (firstRun) {
firstRun = false;
...
}
else {
...
}
}
On entering the function for the first time, firstRun would be true, and it would persist so each time the function is called, the firstRun variable will be the same instance as the ones before it (and will be false each subsequent time).
This could be used well with @ouah's solution.

Compilers like g++ (and, I'm sure, MSVC) support generating profile data during a first run, then using that data to better guess which branches are most likely to be taken, and optimizing accordingly. If you're using gcc, look at the -fprofile-generate and -fprofile-use options.
The expected behavior is that the compiler will lay out that if statement so that the else branch comes first, thus avoiding the jmp on all your subsequent calls and making the check nearly as cheap as if it weren't there, especially if you return somewhere in that else (thus avoiding having to jump past the 'if' body).

One way to make this optimization is to split the function in two. Instead of:
void function1()
{
if (val == true) {
// do something
val = false;
} else {
// do other stuff
}
}
Do this:
void function1()
{
// do something
}
void function2()
{
// do other stuff
}

One thing you can do is put the logic into the constructor of an object, which is then defined static. If such a static object occurs in a block scope, the constructor is run the first time execution passes through that declaration. The once-only check is emitted by the compiler.
You can also put static objects at file scope, and then they are initialized before main is called.
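A minimal sketch of the block-scope variant, assuming a hypothetical FirstRunWork type and two counters added only to make the behavior visible:

```cpp
#include <cassert>

// Hypothetical counters that make the behavior observable.
static int initRuns = 0;
static int bodyRuns = 0;

// The once-only logic lives in the constructor.
struct FirstRunWork {
    FirstRunWork() { ++initRuns; }
};

void function1() {
    // The constructor runs the first time control passes through this
    // declaration; the compiler emits the "already initialized" check.
    static FirstRunWork once;
    (void)once; // silence unused-variable warnings
    ++bodyRuns; // runs on every call, including the first
}
```

Note that the check does not disappear; it just moves from hand-written code into compiler-generated code.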
I'm giving this answer because perhaps you're not making effective use of C++ classes.
(Regarding C/C++, there is no such language. There is C and there is C++. Are you working in C that has to also compile as C++ (sometimes called, unofficially, "Clean C"), or are you really working in C++?)
What is "Clean C" and how does it differ from standard C?

To remain compiler INDEPENDENT, you can put the body of the if() in one function and the body of the else{} in another. Almost all compilers optimize an if()/else{} by treating one branch as the likely one, so keep the occasionally executed code in the if() and move the rest into a separate function that is called from the else.

Can I tell the compiler to consider a control path closed with regards to return value?

Say I have the following function:
Thingy& getThingy(int id)
{
for ( int i = 0; i < something(); ++i )
{
// normal execution guarantees that the Thingy we're looking for exists
if ( thingyArray[i].id == id )
return thingyArray[i];
}
// If we got this far, then something went horribly wrong and we can't recover.
// This function terminates the program.
fatalError("The sky is falling!");
// Execution will never reach this point.
}
Compilers will typically complain at this, saying that "not all control paths return a value". Which is technically true, but the control paths that don't return a value abort the program before the function ends, and are therefore semantically correct. Is there a way to tell the compiler (VS2010 in my case, but I'm curious about others as well) that a certain control path is to be ignored for the purposes of this check, without suppressing the warning completely or returning a nonsensical dummy value at the end of the function?

You can annotate the function fatalError (its declaration) to let the compiler know it will never return.
In C++11, this would be something like:
[[noreturn]] void fatalError(std::string const&);
Pre C++11, you have compiler specific attributes, such as GCC's:
void fatalError(std::string const&) __attribute__((noreturn));
or Visual Studio's:
__declspec(noreturn) void fatalError(std::string const&);
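Put together with the function from the question, the C++11 form could look like the sketch below. The Thingy layout, the array contents and thingyCount are assumptions made so the example is self-contained, and fatalError is implemented here with std::abort as one possible way to terminate.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Assumed for illustration; the question does not show these.
struct Thingy { int id; };
static Thingy thingyArray[] = {{1}, {2}, {3}};
static const int thingyCount = 3;

// The attribute tells the compiler that control never comes back.
[[noreturn]] void fatalError(std::string const &msg) {
    std::fprintf(stderr, "%s\n", msg.c_str());
    std::abort();
}

Thingy &getThingy(int id) {
    for (int i = 0; i < thingyCount; ++i) {
        if (thingyArray[i].id == id)
            return thingyArray[i];
    }
    // No dummy return needed: the compiler knows this path ends here.
    fatalError("The sky is falling!");
}
```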

Why don't you throw an exception? That would solve the problem and it would force the calling method to deal with the exception.
If you did manage to haggle the warning out some way or other, you are still left with having to do something with the function that calls getThingy(). What happens when getThingy() fails? How will the caller know? What you have here is an exception (conceptually) and your design should reflect that.
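As a sketch of the exception-based design, assuming the same hypothetical Thingy array as a stand-in for the real data and std::out_of_range as one reasonable choice of exception type (getThingyOrThrow is an illustrative name):

```cpp
#include <cassert>
#include <stdexcept>

// Assumed for illustration; the question does not show these.
struct Thingy { int id; };
static Thingy thingyArray[] = {{1}, {2}, {3}};

Thingy &getThingyOrThrow(int id) {
    for (Thingy &t : thingyArray) {
        if (t.id == id)
            return t;
    }
    // A throw ends this control path, so no "missing return" warning
    // applies, and the caller decides how to handle the failure.
    throw std::out_of_range("no Thingy with the requested id");
}
```

A caller can wrap the call in try/catch, or let the exception propagate and terminate the program if nothing handles it.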

You can use a run time assertion in lieu of your fatalError routine. This would just look like:
Thingy& getThingy(int id)
{
for ( int i = 0; i < something(); ++i )
{
if ( thingyArray[i].id == id )
return thingyArray[i];
}
// Clean up and error condition reporting go here.
assert(false);
}

Can I return in void function?

I have to return to the previous level of the recursion. Is the syntax below right?
void f()
{
// some code here
//
return;
}

Yes, you can return from a void function.
Interestingly, you can also return void from a void function. For example:
void foo()
{
return void();
}
As expected, this is the same as a plain return;. It may seem esoteric, but the reason is for template consistency:
template<class T>
T default_value()
{
return T();
}
Here, default_value returns a default-constructed object of type T, and because of the ability to return void, it works even when T = void.
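The same rule is what lets a generic wrapper forward any return type. In the sketch below, default_value is taken from the answer, while counted and calls are hypothetical additions for illustration: counted returns whatever the wrapped callable returns, including void.

```cpp
#include <cassert>
#include <string>

template <class T>
T default_value() {
    return T(); // for T = void this is "return void();"
}

// Hypothetical call counter used by the wrapper below.
static int calls = 0;

// Forwards the result of f, whatever its type. When f returns void,
// "return f();" is a return of a void expression, which is allowed.
template <class F>
auto counted(F f) -> decltype(f()) {
    ++calls;
    return f();
}
```

Without the ability to return a void expression, counted would need a separate overload or specialization for callables that return void.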

Sure. You just shouldn't be returning an actual value.

Yes, you can use that code to return from the function. (I have to be very verbose here to make Stack Overflow not say that my answer is too short)

Yes, that will return from the function to the previous level of recursion. This is going to be a very basic explanation, but when you call a function you are creating a new frame on the call stack. In a recursive function you are simply adding more frames to that call stack. By returning from a function, whether you return a value or not, you are moving the stack pointer back to the previous function on the stack. It's sort of like a stack of plates: you keep putting plates on it, and returning removes the top plate.
You could also verify this by using a debugger. Just put a few break points in your code and step through it. You can verify yourself that it works.
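If a debugger is not at hand, a small trace shows the same thing. This is an illustrative sketch only: countdown and trace are hypothetical names, and the vector records when each level is entered (positive value) and when control comes back to it (negative value).

```cpp
#include <cassert>
#include <vector>

// Hypothetical log of what happens at each recursion level.
static std::vector<int> trace;

void countdown(int depth) {
    if (depth == 0)
        return;              // back to the caller: the previous level
    trace.push_back(depth);  // on the way down
    countdown(depth - 1);
    trace.push_back(-depth); // runs only after the inner call returned
}
```

Calling countdown(3) records 3, 2, 1 on the way down and then -1, -2, -3 as each level resumes after its inner call returns.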

The simple answer to this is YES! C++ treats a void method as a function with no return value. A return; basically tells the compiler that, whatever happens, once execution reaches it, it should stop and leave the method.

Yes, sometimes you may wish to return void() instead of just nothing.
Consider a void function that wants to call some pass-through void functions without a bunch of if-else.
return
InputEvent == E_Pressed ? Controller->Grip() :
InputEvent == E_Released ? Controller->Release() :
InputEvent == E_Touched ? Controller->Touch() : void();
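A self-contained version of that snippet might look like the following. The InputEventType enum, the Controller struct and its last member are assumptions added for illustration; only the shape of the return statement comes from the answer.

```cpp
#include <cassert>
#include <string>

// Assumed for illustration; the answer does not show these types.
enum InputEventType { E_Pressed, E_Released, E_Touched, E_Other };

struct Controller {
    std::string last; // records which handler ran
    void Grip() { last = "grip"; }
    void Release() { last = "release"; }
    void Touch() { last = "touch"; }
};

void handle(Controller *controller, InputEventType inputEvent) {
    // Every branch has type void, so the whole conditional expression
    // is void and can be returned from a void function.
    return
        inputEvent == E_Pressed  ? controller->Grip() :
        inputEvent == E_Released ? controller->Release() :
        inputEvent == E_Touched  ? controller->Touch() : void();
}
```

The trailing void() is needed because both result operands of a conditional expression must have compatible types, and here the other operands are void.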

You shouldn't need the return there; the program will return to the previous function by itself. Go into debug mode and step through, and you can see it for yourself.
On the other hand, I don't think having a return there will harm the program at all.

As everyone else said, yes you can. In this example, return is not necessary and arguably serves no purpose. I think what you are referring to is an early return in the middle of a function. You can do that too; however, it is bad programming practice because it leads to complicated control flow (not single-entry single-exit), as do statements like break. Instead, just skip over the remainder of the function using conditionals like if/else.
