How c++ is compiled? (Regarding variable declaration) - c++

So I went through this video - https://youtu.be/e4ax90XmUBc
Now, my doubt is that if C++ is compiled language, that is, it goes through the entire code and translates it, then if I do something like
void main() {
int a;
cout<<"This is a number = "<<a; //This will give an error (Why?)
a = 10;
}
Now, answer for this would be that I have not defined the value for a, which I learned in school. But if a compiler goes through the entire code and then translates it then I think it shouldn't give any error.
But by giving an error like this, it looks to me as if C++ is a interpreted language.
Can anyone put some light on this and help me solve my dilemma here?

Technically, the C++ standard doesn't mandate that the compiler has to compile C++ into machine code. As an example LLVM Clang first compiles it to IR (Intermediate Representation) and only then to machine code.
Similarly, a compiler could embed a copy of itself in a program that it compiles and then, when the program is executed compile the program, immediately invoke it and delete the executable afterwards which in practice would be very similar to the program being interpreted. In practice, all widely used C++ compilers parse and assemble programs beforehand.
Regarding your example, the statement "This will give an error" is a bit ambiguous. I'm not sure if you're saying that you're getting a compile-time error or a runtime error. As such, I will discuss both possibilities.
If you're getting a compile time error, then your compiler has noticed that your program has undefined behaviour. This is something that you always want to avoid (in some cases, such as when your application operates outside the scope of the C++ Standard, such as when interfacing with certain hardware, UB occurs by definition, as certain behaviour is not defined by the Standard). This is a simple form of static analysis. The Standard doesn't mandate the your compiler informs you of this error and it would usually be a runtime error, but your compiler informed you anyway because it noticed that you probably made a mistake. For example on g++ such behaviour could be achieved by using the -Wall -Werror flags.
In the case of the error being a runtime error then you're most likely seeing a message like "Memory Access Violation" (on Windows) or "Signal 11" (on Linux). This is due to the fact that your program accessed uninitialized memory which is Undefined Behaviour.
In practice, you wouldn't most likely get any error at all at runtime. Unless the compiler has embedded dynamic checks in your program, it would just silently print a (seemingly) random value and continue. The value comes from uninitialized memory.
Side note: main returns int rather than void. Also using namespace std; considered harmful.

Related

Strange behavior of VS2013

Recently I was bothering by a crash of my program in release mode while runs fine under debug mode.
By inspecting deeply into my code I found that I forget to return true at the end of a function, which causes the crash. The function should return false when fail, otherwise, it returns true.
I am wandering whether this is a defect of the compiler(vs 2013) as it (maybe) added for me the return true statement at the end of the function, however it did not when releasing. Consequently, the programmer will spent lots of time in debugging the fault, although, the programmer should blame.
:)
Flowing off the end of a function that is supposed to return a value is undefined behavior. Undefined behavior means the compiler can do anything and still be compliant. Giving a warning message is compliant. Not giving a warning message is compliant. Erasing your hard drive: That's also compliant. Fortunately for me, that hasn't happened yet. I've had the misfortune of invoking undefined behavior many, many times.
One reason this is undefined behavior is because there are some weird cases where flow analysis can't decide whether a function returns a value. Another reason is that you might have used assembly to set the return value in a way that works just fine on your computer. A third reason is that the compiler has to do flow analysis to make this determination; this is something many compilers don't do unless optimization is enabled.
That said, a lack of a return before the close brace will often trigger a compiler to check whether the function returns a value. The compiler was being nice to you when it issued a warning.
That you received a warning message and ignored it -- Never do that. Compile with flags set to a reasonably high level and address each and every warning. Code should always compile clean. Always.
C and C++ are tolerant languages. When the programmer writes code that the compiler can compile even if it looks weird, the compiler emits a warning. The warning means you are writing something that may contain an error, but you take the decision.
It allows to voluntarily do certain optimisations. For example, you can always use an 2D array as an 1D array what could not be done in some other languages. But the counterpart is never ignore a warning if you are not sure you know why you are forcing the compiler to to something it does not like
Conclusion : as soon as he has ignored a warning that at the end leads to the error, the programmer is to blame ;-)

Missing return statement does not produce an error [duplicate]

This question already has answers here:
Why does flowing off the end of a non-void function without returning a value not produce a compiler error?
(11 answers)
Closed 6 years ago.
I found this in one of my libraries this morning:
static tvec4 Min(const tvec4& a, const tvec4& b, tvec4& out)
{
tvec3::Min(a,b,out);
out.w = min(a.w,b.w);
}
I'd expect a compiler error because this method doesn't return anything, and the return type is not void.
The only two things that come to mind are
In the only place where this method is called, the return value isn't being used or stored. (This method was supposed to be void - the tvec4 return type is a copy-and-paste error)
a default constructed tvec4 is being created, which seems a bit unlike, oh, everything else in C++.
I haven't found the part of the C++ spec that addresses this. References (ha) are appreciated.
Update
In some circumstances, this generates an error in VS2012. I haven't narrowed down specifics, but it's interesting, nonetheless.
This is undefined behavior from the C++11 draft standard section 6.6.3 The return statement paragraph 2 which says:
[...] Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function. [...]
This means that the compiler is not obligated provide an error nor a warning usually because it can be difficult to diagnose in all cases. We can see this from the definition of undefined behavior in the draft standard in section 1.3.24 which says:
[...]Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).[...]
Although in this case we can get both gcc and clang to generate a wanring using the -Wall flag, which gives me a warning similar to this:
warning: control reaches end of non-void function [-Wreturn-type]
We can turn this particular warning into an error using the -Werror=return-type flag. I also like to use -Wextra -Wconversion -pedantic for my own personal projects.
As ComicSansMS mentions in Visual Studio this code would generate C4716 which is an error by default, the message I see is:
error C4716: 'Min' : must return a value
and in the case where not all code paths would return a value then it would generate C4715, which is a warning.
Maybe some elaboration on the why part of the question:
As it turns out, it is actually quite hard† for a C++ compiler to determine whether a function exits without a return value. In addition to the code paths that end in explicit return statements and the ones that fall off the end of the function, you also have to consider potential exception throws or longjmps in the function itself, as well as all of its callees.
While it is quite easy for a compiler to identify a function that looks like it might be missing a return, it is considerably harder to prove that it is missing a return. In order to lift compiler vendors of this burden, the standard does not require this to generate an error.
So compiler vendors are free to generate a warning if they are quite sure that a function is missing a return and the user is then free to ignore/mask that warning in those rare cases where the compiler was actually wrong.
†: In the general case, this is equivalent to the halting problem, so it is actually impossible for a machine to decide this reliably.
Compile your code with -Wreturn-type option:
$ g++ -Wreturn-type source.cpp
This will give you warning. You can turn the warning into error if you use -Werror too:
$ g++ -Wreturn-type -Werror source.cpp
Note that this will turn all warnings into errors. So if you want error for specific warning, say -Wreturn-type, just type return-type without -W part as:
$ g++ -Werror=return-type source.cpp
In general you should always use -Wall option which includes most common warnings — this includes missing return statement also. Along with -Wall, you can use -Wextra also, which includes other warnings not included by -Wall.
Maybe some additional elaboration on the why part of the question.
C++ was designed so that a very large body of pre-existing body of C code compiles with minimum amount of changes. Unfortunately, C itself was paying a similar duty to earliest pre-standard C which did not even have the void keyword and instead relied on a default return type of int. C functions usually did return values, and whenever code superficially similar to Algol/Pascal/Basic procedures was written without any return statements, the function was, under the hood, returning whichever garbage was left on the stack. Neither the caller nor the callee assigns the value of the garbage in a reliable way. If the garbage is then ignored by every caller, everything is fine and C++ inherits the moral obligation to compile such code.
(If the returned value is used by the caller, the code may behave non-deterministically, similar to processing of an uninitialized variable. Could the difference be reliably identified by a compiler, in a hypothetical successor language to C? This is hardly possible. The caller and the callee may be in different compilation units.)
The implicit int is just a part of the C legacy involved here. A "dispatcher" function might, depending on a parameter, return a variety of types from some code branches, and return no useful value from other code branches. Such a function would generally be declared to return a type long enough to hold any of the possible types and the caller might need to cast it or extract it from a union.
So the deepest cause is probably the C language creators' belief that procedures that do not return any value are just an unimportant special case of functions that do; this problem got aggravated by the lack of focus on type safety of function calls in the oldest C dialects.
While C++ did break compatibility with some of the worst aspects of C (example), the willingness to compile a return statement without a value (or the implicit value-less return at the end of a function) was not one of them.
As already mentioned, this is undefined behavior and will give you a compiler warning. Most places I've worked require you to turn on compiler settings to treat warnings as errors - which enforces that all your code must compile with 0 errors and 0 warnings. This is a good example of why that is a good idea.
This is more of the standard C++ rule/feature which tends to be flexible with things and which tends to be more close to C.
But when we talk of the compilers, GCC or VS, they are more for professional usage and for variety of development purposes and hence put more strict development rules as per your needs.
That makes sense also, my personal opinion, because the language is all about features and its usage whereas compiler defines the rules for optimal and best way of using it as per your needs.
As mentioned in above post, compiler sometimes gives the error, sometimes gives warning and also it has the option of skipping these warning etc, indicating the freedom to use the language and its features in a way that suits us best.
Along with this there are several other questions mentioning this behaviour of returning a result without having a return statement. One simple example would be:
int foo(int a, int b){ int c = a+b;}
int main(){
int c = 5;
int d = 5;
printf("f(%d,%d) is %d\n", c, d, foo(c,d));
return 0;
}
Could this anomaly be due stack properties and more specifically:
Zero-Address Machines
In zero-address machines, locations of both operands are assumed to be at a default location.
These machines use the stack as the source of the input operands and the result goes back into
the stack. Stack is a LIFO (last-in-first-out) data structure that all processors support, whether
or not they are zero-address machines. As the name implies, the last item placed on the stack
is the first item to be taken out of the stack. All operations on this type of machine assume that the required input operands are the top
two values on the stack. The result of the operation is placed on top of the stack.
In addition to that, for accessing memory to read and write data same registers are used as data source and destination(DS (data segment) register), that store first the variables needed for the calculation and then the returned result.
Note:
with this answer I would like to discuss one possible explanation of the strange behaviour at machine (instruction) level as it has already a context and its covered in adequately wide range.

Why does this C++ snippet compile (non-void function does not return a value) [duplicate]

This question already has answers here:
Why does flowing off the end of a non-void function without returning a value not produce a compiler error?
(11 answers)
Closed 6 years ago.
I found this in one of my libraries this morning:
static tvec4 Min(const tvec4& a, const tvec4& b, tvec4& out)
{
tvec3::Min(a,b,out);
out.w = min(a.w,b.w);
}
I'd expect a compiler error because this method doesn't return anything, and the return type is not void.
The only two things that come to mind are
In the only place where this method is called, the return value isn't being used or stored. (This method was supposed to be void - the tvec4 return type is a copy-and-paste error)
a default constructed tvec4 is being created, which seems a bit unlike, oh, everything else in C++.
I haven't found the part of the C++ spec that addresses this. References (ha) are appreciated.
Update
In some circumstances, this generates an error in VS2012. I haven't narrowed down specifics, but it's interesting, nonetheless.
This is undefined behavior from the C++11 draft standard section 6.6.3 The return statement paragraph 2 which says:
[...] Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function. [...]
This means that the compiler is not obligated provide an error nor a warning usually because it can be difficult to diagnose in all cases. We can see this from the definition of undefined behavior in the draft standard in section 1.3.24 which says:
[...]Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).[...]
Although in this case we can get both gcc and clang to generate a wanring using the -Wall flag, which gives me a warning similar to this:
warning: control reaches end of non-void function [-Wreturn-type]
We can turn this particular warning into an error using the -Werror=return-type flag. I also like to use -Wextra -Wconversion -pedantic for my own personal projects.
As ComicSansMS mentions in Visual Studio this code would generate C4716 which is an error by default, the message I see is:
error C4716: 'Min' : must return a value
and in the case where not all code paths would return a value then it would generate C4715, which is a warning.
Maybe some elaboration on the why part of the question:
As it turns out, it is actually quite hard† for a C++ compiler to determine whether a function exits without a return value. In addition to the code paths that end in explicit return statements and the ones that fall off the end of the function, you also have to consider potential exception throws or longjmps in the function itself, as well as all of its callees.
While it is quite easy for a compiler to identify a function that looks like it might be missing a return, it is considerably harder to prove that it is missing a return. In order to lift compiler vendors of this burden, the standard does not require this to generate an error.
So compiler vendors are free to generate a warning if they are quite sure that a function is missing a return and the user is then free to ignore/mask that warning in those rare cases where the compiler was actually wrong.
†: In the general case, this is equivalent to the halting problem, so it is actually impossible for a machine to decide this reliably.
Compile your code with -Wreturn-type option:
$ g++ -Wreturn-type source.cpp
This will give you warning. You can turn the warning into error if you use -Werror too:
$ g++ -Wreturn-type -Werror source.cpp
Note that this will turn all warnings into errors. So if you want error for specific warning, say -Wreturn-type, just type return-type without -W part as:
$ g++ -Werror=return-type source.cpp
In general you should always use -Wall option which includes most common warnings — this includes missing return statement also. Along with -Wall, you can use -Wextra also, which includes other warnings not included by -Wall.
Maybe some additional elaboration on the why part of the question.
C++ was designed so that a very large body of pre-existing body of C code compiles with minimum amount of changes. Unfortunately, C itself was paying a similar duty to earliest pre-standard C which did not even have the void keyword and instead relied on a default return type of int. C functions usually did return values, and whenever code superficially similar to Algol/Pascal/Basic procedures was written without any return statements, the function was, under the hood, returning whichever garbage was left on the stack. Neither the caller nor the callee assigns the value of the garbage in a reliable way. If the garbage is then ignored by every caller, everything is fine and C++ inherits the moral obligation to compile such code.
(If the returned value is used by the caller, the code may behave non-deterministically, similar to processing of an uninitialized variable. Could the difference be reliably identified by a compiler, in a hypothetical successor language to C? This is hardly possible. The caller and the callee may be in different compilation units.)
The implicit int is just a part of the C legacy involved here. A "dispatcher" function might, depending on a parameter, return a variety of types from some code branches, and return no useful value from other code branches. Such a function would generally be declared to return a type long enough to hold any of the possible types and the caller might need to cast it or extract it from a union.
So the deepest cause is probably the C language creators' belief that procedures that do not return any value are just an unimportant special case of functions that do; this problem got aggravated by the lack of focus on type safety of function calls in the oldest C dialects.
While C++ did break compatibility with some of the worst aspects of C (example), the willingness to compile a return statement without a value (or the implicit value-less return at the end of a function) was not one of them.
As already mentioned, this is undefined behavior and will give you a compiler warning. Most places I've worked require you to turn on compiler settings to treat warnings as errors - which enforces that all your code must compile with 0 errors and 0 warnings. This is a good example of why that is a good idea.
This is more of the standard C++ rule/feature which tends to be flexible with things and which tends to be more close to C.
But when we talk of the compilers, GCC or VS, they are more for professional usage and for variety of development purposes and hence put more strict development rules as per your needs.
That makes sense also, my personal opinion, because the language is all about features and its usage whereas compiler defines the rules for optimal and best way of using it as per your needs.
As mentioned in above post, compiler sometimes gives the error, sometimes gives warning and also it has the option of skipping these warning etc, indicating the freedom to use the language and its features in a way that suits us best.
Along with this there are several other questions mentioning this behaviour of returning a result without having a return statement. One simple example would be:
int foo(int a, int b){ int c = a+b;}
int main(){
int c = 5;
int d = 5;
printf("f(%d,%d) is %d\n", c, d, foo(c,d));
return 0;
}
Could this anomaly be due stack properties and more specifically:
Zero-Address Machines
In zero-address machines, locations of both operands are assumed to be at a default location.
These machines use the stack as the source of the input operands and the result goes back into
the stack. Stack is a LIFO (last-in-first-out) data structure that all processors support, whether
or not they are zero-address machines. As the name implies, the last item placed on the stack
is the first item to be taken out of the stack. All operations on this type of machine assume that the required input operands are the top
two values on the stack. The result of the operation is placed on top of the stack.
In addition to that, for accessing memory to read and write data same registers are used as data source and destination(DS (data segment) register), that store first the variables needed for the calculation and then the returned result.
Note:
with this answer I would like to discuss one possible explanation of the strange behaviour at machine (instruction) level as it has already a context and its covered in adequately wide range.

C/C++ unused inline function undefined reference

Consider the following code (this is not pthread specific; other examples, such as those involving the realtime library, exhibit similar behavior):
#define _GNU_SOURCE
#include <pthread.h>
inline void foo() {
static cpu_set_t cpuset;
pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
}
int main(int argc, char *argv[]) { }
This is a valid program in C and in C++. So I save the contents of this to testc.c and testcpp.cpp and try to build.
When I build in C++ I get no error. When I build in C I get an undefined reference error. Now, this error occurs in -O1 and in -O3. Is there anyway to instruct gcc to do the right thing (see that foo is unused and skip the requirement for a definition of pthread_setaffinity_np)?
EDIT: I thought it was obvious from context, but the error message is:
/tmp/ccgARGVJ.o: In function `foo':
testc.c:(.text+0x17): undefined reference to `pthread_setaffinity_np'
Note that since foo isn't being referenced in the main path, g++ correctly ignores the function entirely but gcc does not.
EDIT 2: Let me try this one more time. The function foo, and the subsequent call to pthread_setaffinity_np, is unused. The main function is empty. Just look at it! Somehow, g++ figured out that foo did not need to be included, and subsequently the build process did not trip up when we intentionally omitted -lpthread (and checking the exported symbols with nm confirms that neither foo nor reference to pthread_setaffinity_np were needed). The resultant output from gcc didn't pick up on that fact.
I am asking this question because the C++ and the C frontends seem to give different results on the same input. This doesn't seem to be an ld issue prima facie because I would expect both paths to give the same linking error, which is why I emphasized that it seems to be a compiler issue. If both C++ and C gave problems, then yes I would agree that its a linking issue.
Well, apparently your program contains an error: you declare and call function pthread_setaffinity_np, but you never define it. Apparently you forgot to supply the library that contains the definition. This is an error in both C and C++.
In other words, this is not a valid program in C and in C++. It violates the One Definition Rule of C++ (and whatever the similar rule is called in C).
The rest depends on whether the compiler will catch this error and issue a diagnostic message for it. While formally the compiler is supposed to catch it, in reality linking errors are not always caught by the compilation process (in extended sense of the term, i.e. including linking as well).
Whether they are caught or not might depend on many factors. In this particular case the factor that matters is apparently the difference between the properties of inline functions of C and C++ languages. (And yes, they are really different between C and C++). I would guess that in C++ mode the compiler decided that this inline function does not need the actual body, while in C mode it decided to generate the body anyway.
So, again, if this program, somehow successfully compiles in some circumstances, it is only because you got lucky. You seem to believe that a function that is not called is supposed to be "ignored entirely". Neither C nor C++ make such guarantees. Assuming that the definition of pthread_setaffinity_np is indeed missing, your program is invalid in both C and C++. For this reason, the compiler that refused to compile it is actually the one with the correct behavior.
Taking the above into account, you might want to ask yourself whether you really care about why you got different error reports in C and C++ modes. If you do, it will require some research into the internal mechanics of that specific implementation and won't have much to do with the languages themselves.
In C, the inline keyword does not affect the linkage of the function. Thus foo has external linkage, and cannot be optimized out because it might be called from another translation unit. If the compiler/assembler put functions in their own individual sections and the linker is able to discard unneeded function sections at link time, it might be able to avoid a linking error, but to be correct, since this program references pthread_setaffinity_np, it must contain a definition for that function somewhere anyway, i.e. you must use -lpthread or equivalent.
In C++, inline functions have internal some weird pseudo-external linkage by default, so gcc optimized it out. See the comments for details.
In short, the lack of an error in certain configurations is a failure of gcc to diagnose an invalid program. It's not the behavior you should expect.
The other lesson you should take away from this is that C and C++ are nowhere near the same thing. Choose which one you're writing and stick to it! Don't try to write code that's "interchangeable" between the two or you're likely to make it subtly incorrect in both...
inline is only a suggestion, not something a compiler is obligated to listen to, so it can't assume that foo is not used in another compilation unit.
But, yeah, it would be nice to know exactly which is the undefined reference, given that you didn't post the error, and odd that it's shows up in C and not C++ compilation.
foo might not be used in your source code, but it's almost certainly referenced elsewhere in the build process and consequently it needs to be compiled.
Especially since a lot of optimization occur in the linking process, because the linker can determine that a function is "dead" and can be discarded.
If, internally, the linker decides to assemble the entire program as one pass, and then optimization in another, I would expect you to see this error (how can it assemble the whole program?)
Further, if the function is to be exported then it most certainly has to be compiled, linked, and end up in the output.
It sounds like you're relying on compiler/linker specific behavior.

A C++ implementation that detects undefined behavior?

A huge number of operations in C++ result in undefined behavior, where the spec is completely mute about what the program's behavior ought to be and allows for anything to happen. Because of this, there are all sorts of cases where people have code that compiles in debug but not release mode, or that works until a seemingly unrelated change is made, or that works on one machine but not another, etc.
My question is whether there is a utility that looks at the execution of C++ code and flags all instances where the program invokes undefined behavior. While it's nice that we have tools like valgrind and checked STL implementations, these aren't as strong as what I'm thinking about - valgrind can have false negatives if you trash memory that you still have allocated, for example, and checked STL implementations won't catch deleting through a base class pointer.
Does this tool exist? Or would it even be useful to have it lying around at all?
EDIT: I am aware that in general it is undecidable to statically check whether a C++ program may ever execute something that has undefined behavior. However, it is possible to determine whether a specific execution of a C++ produced undefined behavior. One way to do this would be to make a C++ interpreter that steps through the code according to the definitions set out in the spec, at each point determining whether or not the code has undefined behavior. This won't detect undefined behavior that doesn't occur on a particular program execution, but it will find any undefined behavior that actually manifests itself in the program. This is related to how it is Turing-recognizable to determine if a TM accepts some input, even if it's still undecidable in general.
Thanks!
This is a great question, but let me give an idea for why I think it might be impossible (or at least very hard) in general.
Presumably, such an implementation would almost be a C++ interpreter, or at least a compiler for something more like Lisp or Java. It would need to keep extra data for each pointer to ensure you did not perform arithmetic outside of an array or dereference something that was already freed or whatever.
Now, consider the following code:
int *p = new int;
delete p;
int *q = new int;
if (p == q)
*p = 17;
Is the *p = 17 undefined behavior? On the one hand, it dereferences p after it has been freed. On the other hand, dereferencing q is fine and p == q...
But that is not really the point. The point is that whether the if evaluates to true at all depends on the details of the heap implementation, which can vary from implementation to implementation. So replace *p = 17 by some actual undefined behavior, and you have a program that might very well blow up on a normal compiler but run fine on your hypothetical "UB detector". (A typical C++ implementation will use a LIFO free list, so the pointers have a good chance of being equal. A hypothetical "UB detector" might work more like a garbage collected language in order to detect use-after-free problems.)
Put another way, the existence of merely implementation-defined behavior makes it impossible to write a "UB detector" that works for all programs, I suspect.
That said, a project to create an "uber-strict C++ compiler" would be very interesting. Let me know if you want to start one. :-)
John Regehr in Finding Undefined Behavior Bugs by Finding Dead Code points out a tool called STACK and I quote from the site (emphasis mine):
Optimization-unstable code (unstable code for short) is an emerging class of software bugs: code that is unexpectedly eliminated by compiler optimizations due to undefined behavior in the program. Unstable code is present in many systems, including the Linux kernel and the Postgres database server. The consequences of unstable code range from incorrect functionality to missing security checks.
STACK is a static checker that detects unstable code in C/C++ programs. Applying STACK to widely used systems has uncovered 160 new bugs that have been confirmed and fixed by developers.
Also in C++11 for the case of constexpr variables and functions undefined behavior should be caught at compile time.
We also have gcc ubsan:
GCC recently (version 4.9) gained Undefined Behavior Sanitizer
(ubsan), a run-time checker for the C and C++ languages. In order to
check your program with ubsan, compile and link the program with
-fsanitize=undefined option. Such instrumented binaries have to be executed; if ubsan detects any problem, it outputs a “runtime error:”
message, and in most cases continues executing the program.
and Clang Static Analyzer which includes many checks for undefined behavior. For example clangs -fsanitize checks which includes -fsanitize=undefined:
-fsanitize=undefined: Fast and compatible undefined behavior checker. Enables the undefined behavior checks that have small runtime cost and
no impact on address space layout or ABI. This includes all of the
checks listed below other than unsigned-integer-overflow.
and for C we can look at his article It’s Time to Get Serious About Exploiting Undefined Behavior which says:
[..]I confess to not personally having the gumption necessary for cramming GCC or LLVM through the best available dynamic undefined behavior checkers: KCC and Frama-C.[...]
Here is a link to kcc and I quote:
[...]If you try to run a program that is undefined (or one for which we are missing semantics), the program will get stuck. The message should tell you where it got stuck and may give a hint as to why. If you want help deciphering the output, or help understanding why the program is undefined, please send your .kdump file to us.[...]
and here are a link to Frama-C, an article where the first use of Frama-C as a C interpreter is described and an addendum to the article.
Using g++
-Wall -Werror -pedantic-error
(preferably with an appropriate -std argument as well) will pick up quite a few case of U.B.
Things that -Wall gets you include:
-pedantic
Issue all the warnings demanded by strict ISO C and ISO C++; reject
all programs that use forbidden extensions, and some other programs
that do not follow ISO C and ISO C++. For ISO C, follows the
version of the ISO C standard specified by any -std option used.
-Winit-self (C, C++, Objective-C and Objective-C++ only)
Warn about uninitialized variables which are initialized with
themselves. Note this option can only be used with the
-Wuninitialized option, which in turn only works with -O1 and
above.
-Wuninitialized
Warn if an automatic variable is used without first being
initialized or if a variable may be clobbered by a "setjmp" call.
and various disallowed things you can do with specifiers to printf and scanf family functions.
Clang has a suite of sanitizers that catch various forms of undefined behavior. Their eventual goal is to be able to catch all C++ core language undefined behavior, but checks for a few tricky forms of undefined behavior are missing right now.
For a decent set of sanitizers, try:
clang++ -fsanitize=undefined,address
-fsanitize=address checks for use of bad pointers (not pointing to valid memory), and -fsanitize=undefined enables a set of lightweight UB checks (integer overflow, bad shifts, misaligned pointers, ...).
-fsanitize=memory (for detecting uninitialized memory reads) and -fsanitize=thread (for detecting data races) are also useful, but neither of these can be combined with -fsanitize=address nor with each other because all three have an invasive impact on the program's address space.
You might want to read about SAFECode.
This is a research project from the University of Illinois, the goal is stated on the front page (linked above):
The purpose of the SAFECode project is to enable program safety without garbage collection and with minimal run-time checks using static analysis when possible and run-time checks when necessary. SAFECode defines a code representation with minimal semantic restrictions designed to enable static enforcement of safety, using aggressive compiler techniques developed in this project.
What is really interesting to me is the elimination of the runtime checks whenever the program can be proved to be correct statically, for example:
int array[N];
for (i = 0; i != N; ++i) { array[i] = 0; }
Should not incur any more overhead than the regular version.
In a lighter fashion, Clang has some guarantees about undefined behavior too as far as I recall, but I cannot get my hands on it...
The clang compiler can detect some undefined behaviors and warn against them. Probably not as complete as you want, but it's definitely a good start.
Unfortunately I'm not aware of any such tool. Typically UB is defined as such precisely because it would be hard or impossible for a compiler to diagnose it in all cases.
In fact your best tool is probably compiler warnings: They often warn about UB type items (for example, non-virtual destructor in base classes, abusing the strict-aliasing rules, etc).
Code review can also help catch cases where UB is relied upon.
Then you have to rely on valgrind to capture the remaining cases.
Just as a side observation, according to the theory of computability, you cannot have a program that detects all possible undefined behaviours.
You can only have tools that use heuristics and detect some particular cases that follow certain patterns. Or you can in certain cases prove that a program behaves as you want. But you cannot detect undefined behaviour in general.
Edit
If a program does not terminate (hangs, loops forever) on a given input, then its output is undefined.
If you agree on this definition, then determining whether a program terminates is the well-known "Halting Problem", which has been proven to be undecidable, i.e. there exists no program (Turing Machine, C program, C++ program, Pascal program, in whatever language) that can solve this problem in general.
Simply put: there exists no program P that can take as input any program Q and input data I and print as output TRUE if Q(I) terminates, or else print FALSE if Q(I) does not terminate.
For more information you can look at http://en.wikipedia.org/wiki/Halting_problem.
Undefined behaviour is undefined. The best you can do is conform to the standard pedantically, as others have suggested, however, you can not test for what is undefined, because you don't know what it is. If you knew what it was and standards specified it, it would not be undefined.
However, if you for some reason, do actually rely on what the standard says is undefined, and it results in a particular result, then you may choose to define it, and write some unit tests to confirm that for your particular build, it is defined. It is much better, however, to simply avoid undefined behaviour whenever possible.
Take a look at PCLint its pretty decent at detecting a lot of bad things in C++.
Here's a subset of what it catches