C++ null-terminated-string on specific compiler/platform problems

C++ null-terminated-string on specific compiler/platform problems - c++

I've found some code like (many problems in the following code):
//setup consistent in each of the bad code examples
string someString;
char* nullValue = getenv("NONEXISTENT"); // some non-existent environment variable
// bad code example 1:
char x[1024];
sprintf(x," some text%s ", nullValue); //crashes on solaris, what about linux?
// bad code example 2:
someString += nullValue; // What happens here?
//bad code example 3:
someString.append(nullValue); // What happens here?
//bad code example 4:
string nextString=string(nullValue); //What happens here?
cout<<nextString;
We're using solaris, linux, gcc, sunstudio, and will quite possibly use clang++ in the future. Is the behaviour of this code consistent across platform and compiler? I couldn't find specs that describe expected behaviour in all the cases of the above code.
At present, we have problems running our code using gcc (and on linux), is the above code a likely cause?
If the code above acts the same in all of these environments, that's valuable information (even if the behavior is a crash) for me because I will know that this isn't the reason for our linux problems.

In general these uses of NULL, where a valid C string is expected, cause undefined behavior, which means that anything can happen.
Some platforms try to have defined behavior for this. IIRC there are platforms that deal gracefully with passing a NULL pointer to printf family functions for a %s format substitution (printing something like "(null)"). Other than that, some platforms try to ensure a reproducible crash (e.g. a fatal signal) for such cases. But you can't rely on this in general.
If you have problems in that area of the code: yes, this is a likely cause or may obscure other causes, so: fix it, it's broken!

There is a problem constructing strings from the pointer, without checking the return value first. The getenv definition says:
Retrieves a C string containing the value of the environment variable whose name is specified as argument. If the requested variable is not part of the environment list, the function returns a null pointer.
Creating a std::string from a null pointer is explicitly not allowed by C++ standard. The same goes for appending to the string (+=).
I'm no C expert, but have a hunch that using a null pointer with sprintf is not allowed either.

Exactly what happens when you use a NULL pointer in any of the cases you have described is "undefined behaviour". Some C libraries do recognise NULL for strings in printf, and will print "(null)" or something along those lines, but I would definitely not rely on that. Similarly, your other usages of NULL are "undefined", which means they are guaranteed to not work in any particular way across a range of platforms. What happens on one platform may well be completely different to what happens on another platform (or with another brand/version of the compiler, or with different compiler optimisation settings, or which way the wind blows that day if you are unlucky). In this case, it's likely that it leads to either a crash or "well behaved code" - it depends on who wrote the C/C++ library.
One solution, if you have a few of these things is to create a "getenv_safe" that instead of returning NULL returns an empty string [or "not set" or similar] if the environment variable isn't set, and then either fix the code directly, or #define getenv(x) getenv_safe(x).

Related

How c++ is compiled? (Regarding variable declaration)

So I went through this video - https://youtu.be/e4ax90XmUBc
Now, my doubt is that if C++ is compiled language, that is, it goes through the entire code and translates it, then if I do something like
void main() {
int a;
cout<<"This is a number = "<<a; //This will give an error (Why?)
a = 10;
}
Now, answer for this would be that I have not defined the value for a, which I learned in school. But if a compiler goes through the entire code and then translates it then I think it shouldn't give any error.
But by giving an error like this, it looks to me as if C++ is a interpreted language.
Can anyone put some light on this and help me solve my dilemma here?

Technically, the C++ standard doesn't mandate that the compiler has to compile C++ into machine code. As an example LLVM Clang first compiles it to IR (Intermediate Representation) and only then to machine code.
Similarly, a compiler could embed a copy of itself in a program that it compiles and then, when the program is executed compile the program, immediately invoke it and delete the executable afterwards which in practice would be very similar to the program being interpreted. In practice, all widely used C++ compilers parse and assemble programs beforehand.
Regarding your example, the statement "This will give an error" is a bit ambiguous. I'm not sure if you're saying that you're getting a compile-time error or a runtime error. As such, I will discuss both possibilities.
If you're getting a compile time error, then your compiler has noticed that your program has undefined behaviour. This is something that you always want to avoid (in some cases, such as when your application operates outside the scope of the C++ Standard, such as when interfacing with certain hardware, UB occurs by definition, as certain behaviour is not defined by the Standard). This is a simple form of static analysis. The Standard doesn't mandate the your compiler informs you of this error and it would usually be a runtime error, but your compiler informed you anyway because it noticed that you probably made a mistake. For example on g++ such behaviour could be achieved by using the -Wall -Werror flags.
In the case of the error being a runtime error then you're most likely seeing a message like "Memory Access Violation" (on Windows) or "Signal 11" (on Linux). This is due to the fact that your program accessed uninitialized memory which is Undefined Behaviour.
In practice, you wouldn't most likely get any error at all at runtime. Unless the compiler has embedded dynamic checks in your program, it would just silently print a (seemingly) random value and continue. The value comes from uninitialized memory.
Side note: main returns int rather than void. Also using namespace std; considered harmful.

"If you've written a compiler test, you've written a call to main"

Calling maininside your program violates the C++ Standard
void f()
{
main(); // an endless loop calling main ? No that's not allowed
}
int main()
{
static int = 0;
std::cout << i++ << std::endl;
f();
}
In a lecture Chandler Carruth, at about '22.40' says
if you've written a compiler test you've written a call to main
How is this relevant or how the fact that the Standard doesn't allow is overcome ?

The point here is that if you write compiler test-code, you probably will want to test calling main with a few different parameter sets and that it is possible to do this, with the understanding of the compiler you are going to test on.
The standard forbids calls to main so that main can have magical code (e.g. the code to construct global objects or to initialize some data structure, zero out global uninitialized POD data, etc.). But if you are writing test code for a compiler, you probably will have an understanding of whether the compiler does this - and if so, what it actually does in such a step, and take that into account in your testing - you could for example "dirty" some global variable, and then call main again and see that this variable is indeed set to zero again. Or it could be that main is indeed not callable in this particular compiler.
Since Chandler is talking about LLVM (and in C terms, Clang), he knows how that compiler produces code for main.
This clearly doesn't apply to "black box testing" of compilers. In such a test-suite, you could not rely on the compiler doing anything in particular, or NOT doing something that would harm your test.
Like ALL undefined behaviour, it is not guaranteed to work in any particular way, but SOMETIMES, if you know the actual implementation of the compiler, it will be possible to exploit that behaviour - just don't consider it good programming, and don't expect it to work in a portable way.
As an example, on a PC, you can write to the text-screen (before the MMU has been configured at least) by doing this:
char *ptr = (char *)0xA0000;
ptr[0] = 'a';
ptr[1] = 7; // Determines the colour.
This, by the standard, is undefined behaviour, because the standard does say that you can only use pointers to allocations made inside the C or C++ runtime. But clearly, you can't allocate memory in the graphics card... So technically, it's UB, but guess what Linux and Windows do during early boot? Write directly to the VGA memory... [Or at least they used to some time ago, when I last looked at it]. And if you know your hardware, this should work with every compiler I'm aware of - if it doesn't, you probably can't use it to write low-level driver code. But it is undefined by the standard, and "UB sanitizer" will probably moan at the code.

What makes this usage of pointers unpredictable?

I'm currently learning pointers and my professor provided this piece of code as an example:
//We cannot predict the behavior of this program!
#include <iostream>
using namespace std;
int main()
{
char * s = "My String";
char s2[] = {'a', 'b', 'c', '\0'};
cout << s2 << endl;
return 0;
}
He wrote in the comments that we can't predict the behavior of the program. What exactly makes it unpredictable though? I see nothing wrong with it.

The behaviour of the program is non-existent, because it is ill-formed.
char* s = "My String";
This is illegal. Prior to 2011, it had been deprecated for 12 years.
The correct line is:
const char* s = "My String";
Other than that, the program is fine. Your professor should drink less whiskey!

The answer is: it depends on what C++ standard you're compiling against. All the code is perfectly well-formed across all standards‡ with the exception of this line:
char * s = "My String";
Now, the string literal has type const char[10] and we're trying to initialize a non-const pointer to it. For all other types other than the char family of string literals, such an initialization was always illegal. For example:
const int arr[] = {1};
int *p = arr; // nope!
However, in pre-C++11, for string literals, there was an exception in §4.2/2:
A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; [...]. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ]
So in C++03, the code is perfectly fine (though deprecated), and has clear, predictable behavior.
In C++11, that block does not exist - there is no such exception for string literals converted to char*, and so the code is just as ill-formed as the int* example I just provided. The compiler is obligated to issue a diagnostic, and ideally in cases such as this that are clear violations of the C++ type system, we would expect a good compiler to not just be conforming in this regard (e.g. by issuing a warning) but to fail outright.
The code should ideally not compile - but does on both gcc and clang (I assume because there's probably lots of code out there that would be broken with little gain, despite this type system hole being deprecated for over a decade). The code is ill-formed, and thus it does not make sense to reason about what the behavior of the code might be. But considering this specific case and the history of it being previously allowed, I do not believe it to be an unreasonable stretch to interpret the resulting code as if it were an implicit const_cast, something like:
const int arr[] = {1};
int *p = const_cast<int*>(arr); // OK, technically
With that, the rest of the program is perfectly fine, as you never actually touch s again. Reading a created-const object via a non-const pointer is perfectly OK. Writing a created-const object via such a pointer is undefined behavior:
std::cout << *p; // fine, prints 1
*p = 5; // will compile, but undefined behavior, which
// certainly qualifies as "unpredictable"
As there is no modification via s anywhere in your code, the program is fine in C++03, should fail to compile in C++11 but does anyway - and given that the compilers allow it, there's still no undefined behavior in it†. With allowances that the compilers are still [incorrectly] interpreting the C++03 rules, I see nothing that would lead to "unpredictable" behavior. Write to s though, and all bets are off. In both C++03 and C++11.
†Though, again, by definition ill-formed code yields no expectation of reasonable behavior
‡Except not, see Matt McNabb's answer

Other answers have covered that this program is ill-formed in C++11 due to the assignment of a const char array to a char *.
However the program was ill-formed prior to C++11 also.
The operator<< overloads are in <ostream>. The requirement for iostream to include ostream was added in C++11.
Historically, most implementations had iostream include ostream anyway, perhaps for ease of implementation or perhaps in order to provide a better QoI.
But it would be conforming for iostream to only define the ostream class without defining the operator<< overloads.

The only slightly wrong thing that I see with this program is that you're not supposed to assign a string literal to a mutable char pointer, though this is often accepted as a compiler extension.
Otherwise, this program appears well-defined to me:
The rules that dictate how character arrays become character pointers when passed as parameters (such as with cout << s2) are well-defined.
The array is null-terminated, which is a condition for operator<< with a char* (or a const char*).
#include <iostream> includes <ostream>, which in turn defines operator<<(ostream&, const char*), so everything appears to be in place.

You can't predict the behaviour of the compiler, for reasons noted above. (It should fail to compile, but may not.)
If compilation succeeds, then the behaviour is well-defined. You certainly can predict the behaviour of the program.
If it fails to compile, there is no program. In a compiled language, the program is the executable, not the source code. If you don't have an executable, you don't have a program, and you can't talk about behaviour of something that doesn't exist.
So I'd say your prof's statement is wrong. You can't predict the behaviour of the compiler when faced with this code, but that's distinct from the behaviour of the program. So if he's going to pick nits, he'd better make sure he's right. Or, of course, you might have misquoted him and the mistake is in your translation of what he said.

As others have noted, the code is illegitimate under C++11, although it was valid under earlier versions. Consequently, a compiler for C++11 is required to issue at least one diagnostic, but behavior of the compiler or the remainder of the build system is unspecified beyond that. Nothing in the Standard would forbid a compiler from exiting abruptly in response to an error, leaving a partially-written object file which a linker might think was valid, yielding a broken executable.
Although a good compiler should always ensure before it exits that any object file it is expected to have produced will be either valid, non-existent, or recognizable as invalid, such issues fall outside the jurisdiction of the Standard. While there have historically been (and may still be) some platforms where a failed compilation can result in legitimate-appearing executable files that crash in arbitrary fashion when loaded (and I've had to work with systems where link errors often had such behavior), I would not say that the consequences of syntax errors are generally unpredictable. On a good system, an attempted build will generally either produce an executable with a compiler's best effort at code generation, or won't produce an executable at all. Some systems will leave behind the old executable after a failed build, since in some cases being able to run the last successful build may be useful, but that can also lead to confusion.
My personal preference would be for disk-based systems to to rename the output file, to allow for the rare occasions when that executable would be useful while avoiding the confusion that can result from mistakenly believing one is running new code, and for embedded-programming systems to allow a programmer to specify for each project a program that should be loaded if a valid executable is not available under the normal name [ideally something which which safely indicates the lack of a useable program]. An embedded-systems tool-set would generally have no way of knowing what such a program should do, but in many cases someone writing "real" code for a system will have access to some hardware-test code that could easily be adapted to the purpose. I don't know that I've seen the renaming behavior, however, and I know that I haven't seen the indicated programming behavior.

GCC pragma to add/remove compiler options in a source file

I have developed a cross-platform library which makes fair use of type-punning in socket communications. This library is already being used in a number of projects, some of which I may not be aware of.
Using this library incorrectly can result in dangerously Undefined Behavior. I would like to ensure to the best of my ability that this library is being used properly.
Aside from documentation of course, under G++ the best way I'm aware of to do that is to use the -fstrict_aliasing and -Wstrict-aliasing options.
Is there a way under GCC to apply these options at a source file level?
In other words, I'd like to write something like the following:
MyFancyLib.h
#ifndef MY_FANCY_LIB_H
#define MY_FANCY_LIB_H
#pragma (something that pushes the current compiler options)
#pragma (something to set -fstrict_aliasing and -Wstrict-aliasing)
// ... my stuff ...
#pragma (something to pop the compiler options)
#endif
Is there a way?

I rather dislike nay-sayers. You can see an excellent post at this page: https://www.codingame.com/playgrounds/58302/using-pragma-for-compile-optimization
All the other answers clearly have nothing to do with the question so here is the actual documentation for GCC:
https://gcc.gnu.org/onlinedocs/gcc/Pragmas.html
Other compilers will have their own methods so you will need to look those up and create some macros to handle this.
Best of luck. Sorry that it took you 10 years to get any relevant answer.

Let's start with what I think is a false premise:
Using this library incorrectly can result in dangerously Undefined Behavior. I would like to ensure to the best of my ability that this library is being used properly.
If your library does type punning in a way that -fstrict-aliasing breaks, then it has undefined behavior according to the C++ standard regardless of what compiler flags are passed. The fact that the program seems to work on certain compilers when compiled with certain flags (in particular, -fno-strict-aliasing) does not change that.
Therefore, the best solution is to do what Florian said: change the code so it conforms to the C++ language specification. Until you do that, you're perpetually on thin ice.
"Yes, yes", you say, "but until then, what can I do to mitigate the problem?"
I recommend including a run-time check, used during library initialization, to detect the condition of having been compiled in a way that will cause it to misbehave. For example:
// Given two pointers to the *same* address, return 1 if the compiler
// is behaving as if -fstrict-aliasing is specified, and 0 if not.
//
// Based on https://blog.regehr.org/archives/959 .
static int sae_helper(int *h, long *k)
{
// Write a 1.
*h = 1;
// Overwrite it with all zeroes using a pointer with a different type.
// With naive semantics, '*h' is now 0. But when -fstrict-aliasing is
// enabled, the compiler will think 'h' and 'k' point to different
// memory locations ...
*k = 0;
// ... and therefore will optimize this read as 1.
return *h;
}
int strict_aliasing_enabled()
{
long k = 0;
// Undefined behavior! But we're only doing this because other
// code in the library also has undefined behavior, and we want
// to predict how that code will behave.
return sae_helper((int*)&k, &k);
}
(The above is C rather than C++ just to ease use in both languages.)
Now in your initialization routine, call strict_aliasing_enabled(), and if it returns 1, bail out immediately with an error message saying the library has been compiled incorrectly. This will help protect end users from misbehavior and alert the developers of the client programs that they need to fix their build.
I have tested this code with gcc-5.4.0 and clang-8.0.1. When -O2 is passed, strict_aliasing_enabled() returns 1. When -O2 -fno-strict-aliasing is passed, that function returns 0.
But let me emphasize again: my code has undefined behavior! There is (can be) no guarantee it will work. A standard-conforming C++ compiler could compile it into code that returns 0, crashes, or that initiates Global Thermonuclear War! Which is also true of the code you're presumably already using elsewhere in the library if you need -fno-strict-aliasing for it to behave as intended.

You can try the Diagnostic pragmas and change the level in error for your warnings. More details here:
http://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Pragmas.html

If your library is a header-only library, I think the only way to deal with this is to fix the strict aliasing violations. If the violations occur between types you define, you can use the usual tricks involving unions, or the may_alias type attribute. If your library uses the predefined sockaddr types, this could be difficult.

A C++ implementation that detects undefined behavior?

A huge number of operations in C++ result in undefined behavior, where the spec is completely mute about what the program's behavior ought to be and allows for anything to happen. Because of this, there are all sorts of cases where people have code that compiles in debug but not release mode, or that works until a seemingly unrelated change is made, or that works on one machine but not another, etc.
My question is whether there is a utility that looks at the execution of C++ code and flags all instances where the program invokes undefined behavior. While it's nice that we have tools like valgrind and checked STL implementations, these aren't as strong as what I'm thinking about - valgrind can have false negatives if you trash memory that you still have allocated, for example, and checked STL implementations won't catch deleting through a base class pointer.
Does this tool exist? Or would it even be useful to have it lying around at all?
EDIT: I am aware that in general it is undecidable to statically check whether a C++ program may ever execute something that has undefined behavior. However, it is possible to determine whether a specific execution of a C++ produced undefined behavior. One way to do this would be to make a C++ interpreter that steps through the code according to the definitions set out in the spec, at each point determining whether or not the code has undefined behavior. This won't detect undefined behavior that doesn't occur on a particular program execution, but it will find any undefined behavior that actually manifests itself in the program. This is related to how it is Turing-recognizable to determine if a TM accepts some input, even if it's still undecidable in general.
Thanks!

This is a great question, but let me give an idea for why I think it might be impossible (or at least very hard) in general.
Presumably, such an implementation would almost be a C++ interpreter, or at least a compiler for something more like Lisp or Java. It would need to keep extra data for each pointer to ensure you did not perform arithmetic outside of an array or dereference something that was already freed or whatever.
Now, consider the following code:
int *p = new int;
delete p;
int *q = new int;
if (p == q)
*p = 17;
Is the *p = 17 undefined behavior? On the one hand, it dereferences p after it has been freed. On the other hand, dereferencing q is fine and p == q...
But that is not really the point. The point is that whether the if evaluates to true at all depends on the details of the heap implementation, which can vary from implementation to implementation. So replace *p = 17 by some actual undefined behavior, and you have a program that might very well blow up on a normal compiler but run fine on your hypothetical "UB detector". (A typical C++ implementation will use a LIFO free list, so the pointers have a good chance of being equal. A hypothetical "UB detector" might work more like a garbage collected language in order to detect use-after-free problems.)
Put another way, the existence of merely implementation-defined behavior makes it impossible to write a "UB detector" that works for all programs, I suspect.
That said, a project to create an "uber-strict C++ compiler" would be very interesting. Let me know if you want to start one. :-)

John Regehr in Finding Undefined Behavior Bugs by Finding Dead Code points out a tool called STACK and I quote from the site (emphasis mine):
Optimization-unstable code (unstable code for short) is an emerging class of software bugs: code that is unexpectedly eliminated by compiler optimizations due to undefined behavior in the program. Unstable code is present in many systems, including the Linux kernel and the Postgres database server. The consequences of unstable code range from incorrect functionality to missing security checks.
STACK is a static checker that detects unstable code in C/C++ programs. Applying STACK to widely used systems has uncovered 160 new bugs that have been confirmed and fixed by developers.
Also in C++11 for the case of constexpr variables and functions undefined behavior should be caught at compile time.
We also have gcc ubsan:
GCC recently (version 4.9) gained Undefined Behavior Sanitizer
(ubsan), a run-time checker for the C and C++ languages. In order to
check your program with ubsan, compile and link the program with
-fsanitize=undefined option. Such instrumented binaries have to be executed; if ubsan detects any problem, it outputs a “runtime error:”
message, and in most cases continues executing the program.
and Clang Static Analyzer which includes many checks for undefined behavior. For example clangs -fsanitize checks which includes -fsanitize=undefined:
-fsanitize=undefined: Fast and compatible undefined behavior checker. Enables the undefined behavior checks that have small runtime cost and
no impact on address space layout or ABI. This includes all of the
checks listed below other than unsigned-integer-overflow.
and for C we can look at his article It’s Time to Get Serious About Exploiting Undefined Behavior which says:
[..]I confess to not personally having the gumption necessary for cramming GCC or LLVM through the best available dynamic undefined behavior checkers: KCC and Frama-C.[...]
Here is a link to kcc and I quote:
[...]If you try to run a program that is undefined (or one for which we are missing semantics), the program will get stuck. The message should tell you where it got stuck and may give a hint as to why. If you want help deciphering the output, or help understanding why the program is undefined, please send your .kdump file to us.[...]
and here are a link to Frama-C, an article where the first use of Frama-C as a C interpreter is described and an addendum to the article.

Using g++
-Wall -Werror -pedantic-error
(preferably with an appropriate -std argument as well) will pick up quite a few case of U.B.
Things that -Wall gets you include:
-pedantic
Issue all the warnings demanded by strict ISO C and ISO C++; reject
all programs that use forbidden extensions, and some other programs
that do not follow ISO C and ISO C++. For ISO C, follows the
version of the ISO C standard specified by any -std option used.
-Winit-self (C, C++, Objective-C and Objective-C++ only)
Warn about uninitialized variables which are initialized with
themselves. Note this option can only be used with the
-Wuninitialized option, which in turn only works with -O1 and
above.
-Wuninitialized
Warn if an automatic variable is used without first being
initialized or if a variable may be clobbered by a "setjmp" call.
and various disallowed things you can do with specifiers to printf and scanf family functions.

Clang has a suite of sanitizers that catch various forms of undefined behavior. Their eventual goal is to be able to catch all C++ core language undefined behavior, but checks for a few tricky forms of undefined behavior are missing right now.
For a decent set of sanitizers, try:
clang++ -fsanitize=undefined,address
-fsanitize=address checks for use of bad pointers (not pointing to valid memory), and -fsanitize=undefined enables a set of lightweight UB checks (integer overflow, bad shifts, misaligned pointers, ...).
-fsanitize=memory (for detecting uninitialized memory reads) and -fsanitize=thread (for detecting data races) are also useful, but neither of these can be combined with -fsanitize=address nor with each other because all three have an invasive impact on the program's address space.

You might want to read about SAFECode.
This is a research project from the University of Illinois, the goal is stated on the front page (linked above):
The purpose of the SAFECode project is to enable program safety without garbage collection and with minimal run-time checks using static analysis when possible and run-time checks when necessary. SAFECode defines a code representation with minimal semantic restrictions designed to enable static enforcement of safety, using aggressive compiler techniques developed in this project.
What is really interesting to me is the elimination of the runtime checks whenever the program can be proved to be correct statically, for example:
int array[N];
for (i = 0; i != N; ++i) { array[i] = 0; }
Should not incur any more overhead than the regular version.
In a lighter fashion, Clang has some guarantees about undefined behavior too as far as I recall, but I cannot get my hands on it...

The clang compiler can detect some undefined behaviors and warn against them. Probably not as complete as you want, but it's definitely a good start.

Unfortunately I'm not aware of any such tool. Typically UB is defined as such precisely because it would be hard or impossible for a compiler to diagnose it in all cases.
In fact your best tool is probably compiler warnings: They often warn about UB type items (for example, non-virtual destructor in base classes, abusing the strict-aliasing rules, etc).
Code review can also help catch cases where UB is relied upon.
Then you have to rely on valgrind to capture the remaining cases.

Just as a side observation, according to the theory of computability, you cannot have a program that detects all possible undefined behaviours.
You can only have tools that use heuristics and detect some particular cases that follow certain patterns. Or you can in certain cases prove that a program behaves as you want. But you cannot detect undefined behaviour in general.
Edit
If a program does not terminate (hangs, loops forever) on a given input, then its output is undefined.
If you agree on this definition, then determining whether a program terminates is the well-known "Halting Problem", which has been proven to be undecidable, i.e. there exists no program (Turing Machine, C program, C++ program, Pascal program, in whatever language) that can solve this problem in general.
Simply put: there exists no program P that can take as input any program Q and input data I and print as output TRUE if Q(I) terminates, or else print FALSE if Q(I) does not terminate.
For more information you can look at http://en.wikipedia.org/wiki/Halting_problem.

Undefined behaviour is undefined. The best you can do is conform to the standard pedantically, as others have suggested, however, you can not test for what is undefined, because you don't know what it is. If you knew what it was and standards specified it, it would not be undefined.
However, if you for some reason, do actually rely on what the standard says is undefined, and it results in a particular result, then you may choose to define it, and write some unit tests to confirm that for your particular build, it is defined. It is much better, however, to simply avoid undefined behaviour whenever possible.

Take a look at PCLint its pretty decent at detecting a lot of bad things in C++.
Here's a subset of what it catches

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js