Why does 'undefined behaviour' exist? [duplicate] - c++

This question already has answers here:
Why is undefined behaviour allowed in C
(3 answers)
Closed 4 years ago.
Certain common programming languages, most notably C and C++, have a strong notion of undefined behaviour: when you perform certain operations outside of the way they are intended to be used, the result is undefined behaviour.
If undefined behaviour occurs, a compiler is allowed to do anything (including nothing at all, 'time traveling', etc.) it wants.
My question is: Why does this notion of undefined behaviour exist? As far as I can see, a huge number of bugs (programs that work on one version of a compiler and stop working on the next, etc.) would be prevented if, instead of causing undefined behaviour, using these operations outside of their intended use caused a compilation error.
Why is this not the way things are?

Why does this notion of undefined behaviour exist?
To allow the language / library to be implemented on a variety of different computer architectures as efficiently as possible (and, particularly in the case of C, while allowing the implementation to remain simple).
if instead of causing undefined behaviour, using the operations outside of their intended use would cause a compilation error
In most cases of undefined behaviour, it is impossible - or prohibitively expensive in resources - to prove that undefined behaviour exists at compile time for all programs in general.
Some cases can be proven for some programs, but it's not possible to specify those cases exhaustively, so the standard doesn't attempt to. Nevertheless, some compilers are smart enough to recognize simple cases of UB, and those compilers will warn the programmer about it. Example:
int main() {
    int arr[10];
    return arr[10]; // out-of-bounds read
}
This program has undefined behaviour. A particular version of GCC that I tested shows:
warning: array subscript 10 is above array bounds of 'int [10]' [-Warray-bounds]
It's hardly a good idea to ignore a warning like this.
A more typical alternative to undefined behaviour would be defined error handling in such cases, such as throwing an exception (compare for example Java, where accessing a null reference causes an exception of type java.lang.NullPointerException to be thrown). But checking the pre-conditions of well-defined behaviour is slower than not checking them.
By not checking for pre-conditions, the language gives the programmer the option of proving the correctness themselves, and thereby avoiding the runtime overhead of the check in a program that was proven to not need it. Indeed, this power comes with a great responsibility.
These days the burden of proving the program's well-definedness can be somewhat alleviated by using tools (example) which add some of those runtime checks, and neatly terminate the program upon failed check.
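For instance, a hedged sketch of how such a tool might be applied to the earlier out-of-bounds example. The -fsanitize flags below are the usual GCC/Clang sanitizer options; the exact invocation is an assumption, so check your compiler's documentation:

// Built with something like: g++ -g -fsanitize=address,undefined demo.cpp
int main()
{
    int arr[10] = {};
    return arr[10]; // out-of-bounds read: the sanitizer reports it at run time
                    // and terminates the program with a diagnostic, instead of
                    // letting it silently misbehave
}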

Undefined behavior exists mainly to give the compiler freedom to optimize. One thing it allows the compiler to do, for example, is to operate under the assumption that certain things can't happen (without having to first prove that they can't happen, which would often be very difficult or impossible). By allowing it to assume that certain things can't happen, the compiler can then eliminate/does not have to generate code that would otherwise be needed to account for certain possibilities.
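As a small, hedged illustration of that kind of assumption (the function name below is made up): because signed integer overflow is undefined, a compiler may treat this comparison as always true and emit no comparison at all.

bool next_is_larger(int x)
{
    // In any program free of UB, x + 1 cannot overflow,
    // so the compiler may fold the whole expression to the constant 'true'.
    return x + 1 > x;
}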
Good talk on the topic

Undefined behavior mostly depends on the target the program is intended to run on. The compiler is not responsible for the dynamic behavior of the program, or the static behavior for that matter; its checks are restricted to the rules of the language, although some modern compilers do some level of static analysis too.
A typical example is uninitialized variables. This exists because of the syntax rules of C, where a variable can be declared without an initial value. Some compilers assign 0 to such variables, while others simply reserve memory for the variable and leave whatever happens to be there. If the program does not initialize these variables, reading them leads to undefined behavior.
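A minimal sketch of that case:

#include <cstdio>

int main()
{
    int x;                   // declared without an initial value
    std::printf("%d\n", x);  // undefined behaviour: the value of x is indeterminate
}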


C++: Empty string instead of input string in case of having a longer input string than the char array allocated for the input [duplicate]

This question already has answers here:
Undefined, unspecified and implementation-defined behavior
(9 answers)
Closed 7 years ago.
The classic apocryphal example of "undefined behavior" is, of course, "nasal demons" — a physical impossibility, regardless of what the C and C++ standards permit.
Because the C and C++ communities tend to put such an emphasis on the unpredictability of undefined behavior and the idea that the compiler is allowed to cause the program to do literally anything when undefined behavior is encountered, I had assumed that the standard puts no restrictions whatsoever on the behavior of, well, undefined behavior.
But the relevant quote in the C++ standard seems to be:
[C++14: defns.undefined]: [..] Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). [..]
This actually specifies a small set of possible options:
Ignoring the situation -- Yes, the standard goes on to say that this will have "unpredictable results", but that's not the same as the compiler inserting code (which I assume would be a prerequisite for, you know, nasal demons).
Behaving in a documented manner characteristic of the environment -- this actually sounds relatively benign. (I certainly haven't heard of any documented cases of nasal demons.)
Terminating translation or execution -- with a diagnostic, no less. Would that all UB would behave so nicely.
I assume that in most cases, compilers choose to ignore the undefined behavior; for example, when reading uninitialized memory, it would presumably be an anti-optimization to insert any code to ensure consistent behavior. I suppose that the stranger types of undefined behavior (such as "time travel") would fall under the second category--but this requires that such behaviors be documented and "characteristic of the environment" (so I guess nasal demons are only produced by infernal computers?).
Am I misunderstanding the definition? Are these intended as mere examples of what could constitute undefined behavior, rather than a comprehensive list of options? Is the claim that "anything can happen" meant merely as an unexpected side-effect of ignoring the situation?
Two minor points of clarification:
I thought it was clear from the original question, and I think to most people it was, but I'll spell it out anyway: I do realize that "nasal demons" is tongue-in-cheek.
Please do not write an(other) answer explaining that UB allows for platform-specific compiler optimizations, unless you also explain how it allows for optimizations that implementation-defined behavior wouldn't allow.
This question was not intended as a forum for discussion about the (de)merits of undefined behavior, but that's sort of what it became. In any case, this thread about a hypothetical C-compiler with no undefined behavior may be of additional interest to those who think this is an important topic.
Yes, it permits anything to happen. The note is just giving examples. The definition is pretty clear:
Undefined behavior: behavior for which this International Standard imposes no requirements.
Frequent point of confusion:
You should understand that "no requirement" also means the implementation is NOT required to leave the behavior undefined or do something bizarre/nondeterministic!
The implementation is perfectly allowed by the C++ standard to document some sane behavior and behave accordingly.1 So, if your compiler claims to wrap around on signed overflow, logic (sanity?) would dictate that you're welcome to rely on that behavior on that compiler. Just don't expect another compiler to behave the same way if it doesn't claim to.
1Heck, it's even allowed to document one thing and do another. That'd be stupid, and it'd probably make you toss it into the trash—why would you trust a compiler whose documentation lies to you?—but it's not against the C++ standard.
One of the historical purposes of Undefined Behavior was to allow for the possibility that certain actions may have different potentially-useful effects on different platforms. For example, in the early days of C, given
int i=INT_MAX;
i++;
printf("%d",i);
some compilers could guarantee that the code would print some particular value (for a two's-complement machine it would typically be INT_MIN), while others would guarantee that the program would terminate without reaching the printf. Depending upon the application requirements, either behavior could be useful. Leaving the behavior undefined meant that an application for which abnormal termination was an acceptable consequence of overflow (but seemingly-valid-but-wrong output was not) could forgo overflow checking when run on a platform that reliably trapped overflow, while an application for which abnormal termination was unacceptable (but arithmetically incorrect output was tolerable) could forgo overflow checking when run on a platform where overflows weren't trapped.
Recently, however, some compiler authors seem to have gotten into a contest to see who can most efficiently eliminate any code whose existence would not be mandated by the standard. Given, for example...
#include <stdio.h>

int main(void)
{
    int ch = getchar();
    if (ch < 74)
        printf("Hey there!");
    else
        printf("%d", ch*ch*ch*ch*ch);
}
a hyper-modern compiler may conclude that if ch is 74 or greater, the computation of ch*ch*ch*ch*ch would yield Undefined Behavior, and as a consequence the program should print "Hey there!" unconditionally regardless of what character was typed.
Nitpicking: You have not quoted a standard.
These are the sources used to generate drafts of the C++ standard. These sources should not be considered an ISO publication, nor should documents generated from them unless officially adopted by the C++ working group (ISO/IEC JTC1/SC22/WG21).
Interpretation: Notes are not normative according to the ISO/IEC Directives Part 2.
Notes and examples integrated in the text of a document shall only be used for giving additional information intended to assist the understanding or use of the document. They shall not contain requirements ("shall"; see 3.3.1 and Table H.1) or any information considered indispensable for the use of the document e.g. instructions (imperative; see Table H.1), recommendations ("should"; see 3.3.2 and Table H.2) or permission ("may"; see Table H.3). Notes may be written as a statement of fact.
Emphasis mine. This alone rules out "comprehensive list of options". Giving examples however does count as "additional information intended to assist the understanding .. of the document".
Do keep in mind that the "nasal demon" meme is not meant to be taken literally, just as using a balloon to explain how universe expansion works holds no truth in physical reality. It's to illustrate that it's foolhardy to discuss what "undefined behavior" should do when it's permissible to do anything. Yes, this means that there isn't an actual rubber band in outer space.
The definition of undefined behaviour, in every C and C++ standard, is essentially that the standard imposes no requirements on what happens.
Yes, that means any outcome is permitted. But there are no particular outcomes that are required to happen, nor any outcomes that are required to NOT happen. It does not matter if you have a compiler and library that consistently yields a particular behaviour in response to a particular instance of undefined behaviour - such a behaviour is not required, and may change even in a future bugfix release of your compiler - and the compiler will still be perfectly correct according to each version of the C and C++ standards.
If your host system has hardware support in the form of connection to probes that are inserted in your nostrils, it is within the realms of possibility that an occurrence of undefined behaviour will cause undesired nasal effects.
I thought I'd answer just one of your points, since the other answers answer the general question quite well, but have left this unaddressed.
"Ignoring the situation -- Yes, the standard goes on to say that this will have "unpredictable results", but that's not the same as the compiler inserting code (which I assume would be a prerequisite for, you know, nasal demons)."
A situation in which nasal demons could very reasonably be expected to occur with a sensible compiler, without the compiler inserting ANY code, would be the following:
if (!spawn_of_satan) {
    printf("Random debug value: %i\n", *x); // oops, null pointer dereference
    nasal_angels();
} else {
    nasal_demons();
}
A compiler, if it can prove that *x is a null pointer dereference, is perfectly entitled, as part of some optimisation, to say "OK, so I see that they've dereferenced a null pointer in this branch of the if. Therefore, as part of that branch I'm allowed to do anything. So I can therefore optimise to this:"
if (!spawn_of_satan) {
    nasal_demons();
} else {
    nasal_demons();
}
"And from there, I can optimise to this:"
nasal_demons();
You can see how this sort of thing can in the right circumstances prove very useful for an optimising compiler, and yet cause disaster. I did see some examples a while back of cases where actually it IS important for optimisation to be able to optimise this sort of case. I might try to dig them out later when I have more time.
EDIT: One example that just came from the depths of my memory of such a case where it's useful for optimisation is where you very frequently check a pointer for being NULL (perhaps in inlined helper functions), even after having already dereferenced it and without having changed it. The optimising compiler can see that you've dereferenced it and so optimise out all the "is NULL" checks, since if you've dereferenced it and it IS null, anything is allowed to happen, including just not running the "is NULL" checks. I believe that similar arguments apply to other undefined behaviour.
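A minimal sketch of that pattern (the function below is made up purely for illustration):

int load(int *p)
{
    int v = *p;          // dereference first: in a UB-free program p is non-null here
    if (p == nullptr)    // so the compiler may optimise this check away entirely
        return -1;
    return v;
}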
First, it is important to note that it is not only the behaviour of the user program that is undefined, it is the behaviour of the compiler that is undefined. Similarly, UB is not encountered at runtime, it is a property of the source code.
To a compiler writer, "the behaviour is undefined" means, "you do not have to take this situation into account", or even "you can assume no source code will ever produce this situation".
A compiler can do anything, intentionally or unintentionally, when presented with UB, and still be standard compliant, so yes, if you granted access to your nose...
Then, it is not always possible to know if a program has UB or not.
Example:
int * ptr = calculateAddress();
int i = *ptr;
Knowing if this can ever be UB or not would require knowing all possible values returned by calculateAddress(), which is impossible in the general case (See "Halting Problem"). A compiler has two choices:
assume ptr will always have a valid address
insert runtime checks to guarantee a certain behaviour
The first option produces fast programs, and puts the burden of avoiding undesired effects on the programmer, while the second option produces safer but slower code.
The C and C++ standards leave this choice open, and most compilers choose the first, while Java for example mandates the second.
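A hedged sketch of what the second option could look like if written out by hand; calculateAddress() stands in for the hypothetical function from the example above, and of course a null check alone cannot catch every invalid address, it only illustrates the runtime-check approach:

#include <stdexcept>

int *calculateAddress() { return nullptr; }  // stand-in for the example above

int readChecked()
{
    int *ptr = calculateAddress();
    if (ptr == nullptr)                      // runtime check inserted before the access
        throw std::runtime_error("invalid address");
    return *ptr;                             // only reached when the check passed
}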
Why is the behaviour not implementation-defined, but undefined?
Implementation-defined means (N4296, 1.9§2):
Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int)). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects. Such documentation shall define the instance of the abstract machine that corresponds to that implementation (referred to as the "corresponding instance" below).
Emphasis mine. In other words: a compiler writer has to document exactly how the machine code behaves when the source code uses implementation-defined features.
Writing to a random non-null invalid pointer is one of the most unpredictable things you can do in a program, so this would require performance-reducing runtime checks too.
Before we had MMUs, you could destroy hardware by writing to the wrong address, which comes very close to nasal demons ;-)
Undefined behavior is simply the result of a situation coming up that the writers of the specification did not foresee.
Take the idea of a traffic light. Red means stop, yellow means prepare for red, and green means go. In this example people driving cars are the implementation of the spec.
What happens if both green and red are on? Do you stop, then go? Do you wait until red turns off and it's just green? This is a case that the spec did not describe, and as a result, anything the drivers do is undefined behavior. Some people will do one thing, some another. Since there is no guarantee about what will happen you want to avoid this situation. The same applies to code.
One of the reasons for leaving behavior undefined is to allow the compiler to make whatever assumptions it wants when optimizing.
If there exists some condition that must hold if an optimization is to be applied, and that condition is dependent on undefined behavior in the code, then the compiler may assume that it's met, since a conforming program can't depend on undefined behavior in any way. Importantly, the compiler does not need to be consistent in these assumptions. (which is not the case for implementation-defined behavior)
So suppose your code contains an admittedly contrived example like the one below:
int bar = 0;
int foo = (undefined behavior of some kind);

if (foo) {
    f();
    bar = 1;
}
if (!foo) {
    g();
    bar = 1;
}

assert(1 == bar);
The compiler is free to assume that !foo is true in the first block and foo is true in the second, and thus optimize the entire chunk of code away. Now, logically either foo or !foo must be true, and so looking at the code, you would reasonably be able to assume that bar must equal 1 once you've run the code. But because the compiler optimized in that manner, bar never gets set to 1. And now that assertion becomes false and the program terminates, which is behavior that would not have happened if foo hadn't relied on undefined behavior.
Now, is it possible for the compiler to actually insert completely new code if it sees undefined behavior? If doing so will allow it to optimize more, absolutely. Is it likely to happen often? Probably not, but you can never guarantee it, so operating on the assumption that nasal demons are possible is the only safe approach.
Undefined behaviors allow compilers to generate faster code in some cases. Consider two different processor architectures that ADD differently:
Processor A inherently discards the carry bit upon overflow, while processor B generates an error. (Of course, Processor C inherently generates Nasal Demons - it's just the easiest way to discharge that extra bit of energy in a snot-powered nanobot...)
If the standard required that an error be generated, then all code compiled for processor A would basically be forced to include additional instructions to check for overflow and, if it occurred, generate an error. This would result in slower code, even if the developer knows that they were only ever going to add small numbers.
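A hedged sketch of the kind of extra check such a requirement would force on processor A; the function name and the choice of throwing are illustrative only:

#include <limits>
#include <stdexcept>

int checkedAdd(int a, int b)
{
    // Extra branches that would have to run on every addition:
    if (b > 0 && a > std::numeric_limits<int>::max() - b)
        throw std::overflow_error("signed overflow");
    if (b < 0 && a < std::numeric_limits<int>::min() - b)
        throw std::overflow_error("signed overflow");
    return a + b;  // now provably within range
}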
Undefined behavior sacrifices portability for speed. By allowing 'anything' to happen, the compiler can avoid writing safety-checks for situations that will never occur. (Or, you know... they might.)
Additionally, when a programmer knows exactly what an undefined behavior will actually cause in their given environment, they are free to exploit that knowledge to gain additional performance.
If you want to ensure that your code behaves exactly the same on all platforms, you need to ensure that no 'undefined behavior' ever occurs - however, this may not be your goal.
Edit: (in response to the OP's edit)
Implementation Defined behavior would require the consistent generation of nasal demons. Undefined behavior allows the sporadic generation of nasal demons.
That's where the advantage that undefined behavior has over implementation specific behavior appears. Consider that extra code may be needed to avoid inconsistent behavior on a particular system. In these cases, undefined behavior allows greater speed.

Is there a safe version of C++ without undefined behaviour?

Undefined behaviour in C++ can be really hard to debug. Is there a version of C++ and standard library which does not contain any undefined behaviour but rather throws exceptions? I understand that this will be a performance killer, but I only intend to use this version when I am programming, debugging and compiling in debug mode and don't really care about performance. Ideally this version would be portable and you would be able to easily switch on/off the undefined behaviour checks.
For example, you could implement a safe pointer class like so (this only checks for a null pointer, not whether it actually points to a valid block of memory):
#include <cassert>

template <typename T>
class MySafePointer {
    T* value;
public:
    auto operator-> () {
#ifdef DEBUG_MODE
        assert(value && "Trying to dereference a null pointer");
#endif
        return value;
    }
    /* Other stuff */
};
Here the user only needs to #undef DEBUG_MODE to get the performance back.
Is there a library / safe version of C++ which does this?
EDIT: Changed the code above so that it actually makes more sense and doesn't throw an exception but asserts value is non-null. The question is simply a matter of having a descriptive error message vs a crash...
Is there a version of c++ and standard library which does not contain any undefined behaviour but rather throws exceptions?
No, there is not. As mentioned in a comment, there are Address Sanitizer and Undefined Behavior Sanitizer and many other tools you can use to hunt for bugs, but there is no "C++ without undefined behavior" implementation.
If you want an inherently safe language, choose one. C++ isn't it.
Undefined behavior
Undefined behavior means that your program has ended up in a state the behavior of which is not defined by the standard.
So what you're really asking is if there's a language the standard of which defines every possible scenario.
And I can't think of one language like this, for the simple reason that programs are run by machines, but programming languages and standards are written by humans.
Is it always unintentional?
Per the reason explained above, the standard can have unintentional "holes", i.e. undefined behavior that was not intentionally allowed, and maybe not even noticed during standardization.
However, as all the "is undefined behavior" sentences in the standard prove, many times UB is intentionally allowed.
But why? Because that means giving less guarantees to the programmer, with the benefit of being able to make more optimizations or, equivalently, to not waste time verifying that the user is sticking to a defined contract.
So, even if the standard had no holes, there would still be a lot of cases where the standard states that UB happens, because compilers can take advantage of it to make all sorts of optimizations.²
The impact of preventing it in some trivial case
One trivial case of undefined behavior is when you access an out-of-bound element of a std::vector via operator[]. Exactly like for C-style arrays, v[i] basically gives you back *(v_ + i), where v_ is the pointer wrapped into v. This is fast and not safe.¹
What if you want to access the ith element safely? You would have to change the implementation of std::vector<>::operator[].
So what would the impact be of supporting the DEBUG_MODE flag? Essentially you would have to write two implementations separated by #ifdef/(#else/)#endif. Obviously the two implementations can have a lot in common, so you could #-branch several times in the code. But... yeah, my bottom line is that your request could only be fulfilled by changing the standard in such a way that it forces implementers to support two different implementations (safe and slow / unsafe and fast) for everything.
By the way, for this specific case, the standard does define another function, at, which is required to handle the out-of-bound case. But that's the point: it's another function.
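A quick illustration of that distinction:

#include <cstdio>
#include <stdexcept>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};
    // int a = v[10];              // unchecked access: undefined behaviour
    try {
        int b = v.at(10);          // checked access: throws instead
        std::printf("%d\n", b);
    } catch (const std::out_of_range&) {
        std::puts("out-of-range access caught");
    }
}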
Hypothetically, we could rip all undefined behavior out of C++ or even C. We could have everything be a priori well-defined and remove anything from the language whose evaluation could not be definitely determinable from first principles.
which makes me feel nervous about the answer I've given here.
(¹) This and other examples of UB are listed in this excellent article; search for Out of Bounds for the example I made.
(²) I really recommend reading this answer by Nicol Bolas about UB being absent in constexprs.
Is there a safe version of C++ without undefined behaviour?
No.
For example, you could implement a safe pointer class like so
How is throwing an exception safer than just crashing? You're still trying to find the bug so you can fix it statically, right?
What you wrote allows your buggy program to keep running (unless it just calls terminate, in which case you did some work for no result at all), but that doesn't make it correct, and it hides the error rather than helping you fix it.
Is there a library / safe version of C++ which does this?
Undefined behaviour is only one type of error, and it isn't always wrong. Deliberate use of non-portable platform features may also be undefined by the standard.
Anyway, let's say you catch every uninitialized value and null pointer and signed integer overflow - your program can still produce the wrong result.
If you write code that can't produce the wrong result, it won't have UB either.

What is "safe" and "unsafe" code in C/C++?

What is the difference between "safe" and "unsafe" code in C/C++?
I've read that "C++ is unsafe in ways which cause serious security vulnerabilities" in the article How Rust Compares to Other Languages and More. What is insecure about unsafe code?
I believe John Regehr's article A Guide to Undefined Behavior in C and C++, Part 1 gives a good overview of what the article is getting at:
Programming languages typically make a distinction between normal program actions and erroneous actions. For Turing-complete languages we cannot reliably decide offline whether a program has the potential to execute an error; we have to just run it and see.
In a safe programming language, errors are trapped as they happen. Java, for example, is largely safe via its exception system. In an unsafe programming language, errors are not trapped. Rather, after executing an erroneous operation the program keeps going, but in a silently faulty way that may have observable consequences later on. Luca Cardelli's article on type systems has a nice clear introduction to these issues. C and C++ are unsafe in a strong sense: executing an erroneous operation causes the entire program to be meaningless, as opposed to just the erroneous operation having an unpredictable result. In these languages erroneous operations are said to have undefined behavior.
So once we go into the realm of undefined behavior we now have "unsafe" code. Another good article that covers undefined behavior as unsafe code is What Every C Programmer Should Know About Undefined Behavior #2/3:
In Part 1 of our series, we discussed what undefined behavior is, and how it allows C and C++ compilers to produce higher performance applications than "safe" languages. This post talks about how "unsafe" C really is, explaining some of the highly surprising effects that undefined behavior can cause. In Part #3, we talk about what friendly compilers can do to mitigate some of the surprise, even if they aren't required to.
I like to call this "Why undefined behavior is often a scary and terrible thing for C programmers". :-)
The C and C++ languages are specified by their respective standards (we can find links to the latest ones here), and those standards leave a lot of behavior specified as undefined behavior, which basically means the behavior is unpredictable. The C++ standard defines undefined behavior as follows:
behavior for which this International Standard imposes no requirements
[ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. —end note ]
The compiler is not required to provide a diagnostic for undefined behavior, and we can find many cases where undefined behavior has led to security vulnerabilities; one of the better-known cases is probably the Linux kernel null pointer check removal:
The idea is to look for code that becomes dead when a C/C++ compiler is smart about exploiting undefined behavior. The classic example of this class of error was found in the Linux kernel several years ago. The code was basically:

struct foo *s = ...;
int x = s->f;
if (!s) return ERROR;
... use s ...

The problem is that the dereference of s in line 2 permits a compiler to infer that s is not null (if the pointer is null then the function is undefined; the compiler can simply ignore this case). Thus, the null check in line 3 gets silently optimized away and now the kernel contains an exploitable bug if an attacker can find a way to invoke this code with a null pointer.
Most of the time avoiding undefined behavior is a matter of good coding practice:
Don't dereference null pointers
Don't access arrays outside their bounds
Don't use uninitialized variables
etc...
and using the right tools, such as ubsan. But there can be some obscure cases, such as certain infinite loops, that many developers may find surprising.
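For example, here is a hedged sketch of the infinite-loop case: under the forward progress rules of the C++ standards quoted on this page, a loop that never terminates and performs no observable side effects (no I/O, volatile accesses, or atomic operations) is undefined behaviour, so an optimizer is allowed to assume it finishes.

int spin()
{
    int n = 0;
    while (true)   // never exits, and has no observable side effects,
        ++n;       // so this loop is undefined behaviour in C++
    return n;      // an optimizer may assume this line is reached
}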