Is it possible to have UB by default? - c++

So, I've been reading the C++ standard and came to [defns.undefined] (3.27 in the C++17 draft that I'm reading; note that while I'm citing C++17 here, I've found similar wording in other standards), which is the definition of Undefined Behavior. I noticed this wording (emphasis mine):
Note: Undefined behavior may be expected when this International Standard omits any explicit definition of
behavior or when a program uses an erroneous construct or erroneous data
Now, thinking about this, it sort of makes sense: if the Standard doesn't give a behavior for something, that something has undefined behavior. In other words, if you do something that is outside the scope of the Standard, the Standard has nothing to say about it. That makes sense.
However, this is also kind of weird, because I always thought Undefined Behavior had to be explicitly declared by the Standard. Yet, this seems to imply that we should assume Undefined Behavior unless we are told otherwise.
If this is the case, then couldn't there be instances of Undefined Behavior that are Undefined Behavior because the Standard didn't explicitly give a behavior for some construct? And if such a thing is possible, is it possible to generate an example (that would still compile) of Undefined Behavior that is Undefined Behavior because of this wording, or would anything that falls under this be near impossible to construct for some reason?

If this is the case, then couldn't there be instances of Undefined Behavior that are Undefined Behavior because the Standard didn't explicitly give a behavior for some construct?
I think this is the correct point of view. If the standard "accidentally" omits a specification of how a particular construct behaves, but it's something that we all know "should" be well-defined, then it's a defect in the standard and needs to be fixed. If, on the other hand, it's a construct that "should" be UB, then the standard is already "correct" (although there are benefits to being explicit).
For example, the standard fails to mention what happens if typeid is applied to an lvalue of polymorphic class type if the object's constructor has not yet begun executing or the destructor has completed. Therefore, the behaviour is undefined by omission. It's also something that's "obviously" UB. So there is no problem.
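To make that concrete, here is a minimal compilable sketch under my own assumptions (the class name Poly and the raw-buffer setup are purely illustrative): the storage exists, but no constructor has begun executing for the object *p would denote, so applying typeid there would be exactly the by-omission case described above.

#include <new>
#include <typeinfo>
#include <iostream>

struct Poly { virtual ~Poly() = default; };   // polymorphic class type

int main() {
    alignas(Poly) unsigned char buf[sizeof(Poly)];
    Poly* p = reinterpret_cast<Poly*>(buf);

    // No constructor has begun executing for *p yet, so this would be
    // the "undefined by omission" case described above:
    // std::cout << typeid(*p).name() << '\n';   // don't do this

    Poly* q = new (buf) Poly;                    // construction has completed
    std::cout << typeid(*q).name() << '\n';      // fine: the object is alive
    q->~Poly();
}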

is it possible to generate an example (that would still compile) of Undefined Behavior that is Undefined Behavior because of this wording
The classic example is indirection through a null pointer (CWG232):
*(int*)nullptr;
[expr.unary.op]/1 says that the result of applying the indirection operator is an lvalue denoting the object to which the operand points, while a null pointer doesn't point to any object. So indirection through a null pointer is UB by omission: no explicit definition of behavior is given for the case where the operand doesn't point to an object.
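A slightly fuller sketch (the function name read_through is made up) showing that such a program does compile; the undefined behavior only arises if the indirection is actually evaluated with a null pointer:

int read_through(int* p) { return *p; }   // UB if p is null when evaluated

int main() {
    int* p = nullptr;
    // int v = read_through(p);   // compiles, but evaluating it is UB
    int x = 42;
    return read_through(&x);      // fine: the operand points to an object
}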

Related

"Is undefined" versus "undefined behavior"

The cppreference wording on front and back is surprisingly (at least to me) asymmetric.
front:
Calling front on an empty container is undefined.
back:
Calling back on an empty container causes undefined behavior.
I know I am supposed to ask only one question, but still,
Why is it different?, and
What is the difference between is undefined and causes undefined behavior?
In C++, there is no difference between "undefined" and "undefined behavior." Both terms refer to the same concept: a situation in which the standard does not specify the expected outcome of a certain operation.
What is the difference between is undefined and causes undefined behavior?
They have the same meaning here.
Why is it different?
Most likely because the page has been written by different authors and/or has not been updated for quite some time. Still, both are intended to mean the same thing.
Update
The page has now been updated to make the documentation language more consistent. In particular, now front says:
Calling front on an empty container causes undefined behavior.
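To make the wording concrete, here is a minimal sketch: both calls compile, but on an empty container they are precisely the undefined behavior both pages are describing.

#include <vector>

int main() {
    std::vector<int> v;          // empty container
    // int a = v.front();        // undefined behavior
    // int b = v.back();         // undefined behavior
    v.push_back(42);
    return v.front() + v.back(); // fine once the container is non-empty
}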

Undefined vs. Unspecified vs. Implementation-defined behavior [duplicate]

Wikipedia has pages about undefined and unspecified behavior, and links to them are widely used in comments and answers here on SO.
Each one begins with a note not to confuse it with the other, but apart from one not-very-clear sentence, neither points out the difference between them.
One of them gives an example (comparing the addresses of two variables: &a < &b) with the comment that this results in unspecified behavior in C++, undefined in C.
Is it possible to pinpoint the substantial difference between undefined and unspecified behavior in a clear, understandable manner?
In short:
Undefined behaviour: this is not okay to do
Unspecified behaviour: this is okay to do, but the result could be anything*
Implementation-defined behaviour: this is okay to do, the result could be anything* but the compiler manual should tell you
Or, in quotes from the C++ standard (N4659 section 3, Terms and Definitions):
3.28 Undefined behavior: behavior for which this International Standard imposes no requirements
3.29 Unspecified behavior: behavior, for a well-formed program construct and correct data, that depends on the implementation
3.12 Implementation-defined behavior: behavior, for a well-formed program construct and correct data, that depends on the implementation and
that each implementation documents
EDIT: *As pointed out by M.M in the comments, saying that the result of unspecified behaviour could be anything is not quite right. In fact, as the standard itself points out in a note to paragraph 3.29:
The range of possible behaviors is usually delineated by this International Standard.
So in practice you have some idea of what the possible results are, but what exactly will happen depends on your compiler/compiler flags/platform/etc.
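A small sketch with one line per category (illustrative only; the helper functions f and g are made up):

#include <climits>
#include <iostream>

int f() { std::cout << "f "; return 1; }
int g() { std::cout << "g "; return 2; }

int main() {
    // Implementation-defined: the number of bits in a byte; every
    // implementation must document its value of CHAR_BIT.
    std::cout << CHAR_BIT << '\n';

    // Unspecified: f() and g() may be called in either order, so the
    // output can be "f g" or "g f"; both are conforming.
    int sum = f() + g();
    std::cout << '\n' << sum << '\n';

    // Undefined: signed integer overflow; the standard imposes no
    // requirements at all on what happens here.
    // int boom = INT_MAX + 1;   // deliberately left commented out
    return 0;
}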
Unspecified behavior and its example (&a < &b) seem to say that the compiler writer does not have to commit to where variables are stored on the stack, and the result could change if nearby items were added or deleted (without changing the order of declaration of a and b).
Implementation-defined behavior covers items such as a % b, where what happens when a is negative is at the implementation's discretion (usually dictated by the hardware).
Here it is important to describe what will happen, but it would hurt performance if the standard committed to a specific behavior.
Undefined behavior describes the point where your program goes outside what the standard covers - it may work on a particular platform, but not for any good reason.

Is dereferencing a NULL pointer considered unspecified or undefined behaviour?

The consensus of stackoverflow questions say that it is undefined behaviour.
However, I recently saw a 2016 talk by Charles Bay titled:
Instruction Reordering Everywhere: The C++ "As-If" Rule and the Role of Sequence.
At 37:53 he shows the following:
C++ Terms
Undefined Behaviour: Lack of Constraints (order of globals initialization)
Unspecified Behaviour: Constraint Violation (dereferencing NULL pointer)
Now I have conflicting information.
Was this a typo? Has anything changed?
It is undefined behavior.
From 8.3.2 References of the C++11 Standard (emphasis mine):
5 ... [ Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. —end note ]
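A minimal sketch of the note above: the commented-out line is the only way to "create" a null reference, and it is itself the undefined dereference.

int main() {
    int* p = nullptr;
    // int& r = *p;   // undefined behavior: binds a reference by
    //                // dereferencing a null pointer
    int i = 0;
    int& ok = i;       // fine: references must be bound to objects
    return ok;
}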
The examples in the slide are associated with the wrong terms, regardless of which version of the C++ standard you assume (i.e. nothing has changed within the standards in this regard).
Dereferencing a NULL pointer gives undefined behaviour. The standard does not define any constraint on what happens as a result.
The order of globals initialisation is an example of unspecified behaviour (the standard guarantees that all globals will be initialised [that's a constraint on how globals are initialised] but the order is not specified).

What is indeterminate behavior in C++? How is it different from undefined behavior?

What is the difference between indeterminate behaviour and undefined behaviour in C++? Is this classification valid for C code also?
The following remarks are based on the C standard, ISO-9899, rather than the C++ one, but the meanings are fundamentally the same (see sections 3.4 and 4 of the C standard; see also the C++ standard, ISO-14882, section 1.3; the latter document doesn't define 'unspecified value' as such, but does use that phrase later with the obvious meaning). The official standards documents are not free (indeed, they are expensive), but the links above are to the committee pages, and include free 'drafts' of the standard, which you can take to be essentially equivalent to the finalised standard text.
The terms describe a ladder of vagueness.
So, heading downwards....
Most of the time, the standard defines what should happen in a particular case: if you write c=a+b and a and b are int, then c is their sum (modulo some details). This, of course, is the point of a standard.
Implementation-defined behaviour is where the standard lists two or more things which are allowed to happen in a particular case; it doesn't prescribe which one is preferred, but does demand that the implementation (the actual compiler which parses the C) makes a choice between the alternatives, does that same thing consistently, and that the implementation must document the choice it makes. For example, whether a single file can be opened by multiple processes is implementation-defined.
Unspecified behaviour is where the standard lists a couple of alternatives, each of which is therefore conformant with the standard, but goes no further. An implementation must pick one of the alternatives in a particular case, but doesn't have to do the same thing each time, and doesn't have to commit itself in documentation to which choice it will make. For example, the padding bits in a struct are unspecified.
Undefined behaviour is the most extreme case. Here, all bets are off. If the compiler, or the program it generates, runs into undefined behaviour, it can do anything: it can scramble memory, corrupt the stack, HCF or, in the standard extreme case, cause demons to fly out of your nose. But mostly it'll just crash. And all of these behaviours are conformant with the standard. For example, if a variable is declared both static int i; and int i; in the same scope, or if you write #include <'my file'.h>, the effect is undefined.
There are analogous definitions for 'value'.
An unspecified value is a valid value, but the standard doesn't say what it is. Thus the standard might say that a given function returns an unspecified value. You can store that value and look at it if you want to, without causing an error, but it doesn't mean anything, and the function might return a different value next time, depending on the phase of the moon.
An implementation-defined value is like implementation-defined behaviour. Like unspecified, it's a valid value, but the implementation's documentation has to commit itself on what will be returned, and do the same thing each time.
An indeterminate value is even more unspecified than an unspecified value. It's either an unspecified value or a trap representation. A trap representation is standards-speak for some magic value which, if you try to assign it to anything, results in undefined behaviour. This wouldn't have to be an actual value; probably the best way to think about it is "if C had exceptions, a trap representation would be an exception". For example, if you declare int i; in a block, without an initialisation, the initial value of the variable i is indeterminate, meaning that if you try to assign it to something else before initialising it, the behaviour is undefined, and the compiler is entitled to try the said demons-out-of-nose trick. Of course, in most cases, the compiler will do something less dramatic/fun, like initialise it to 0 or some other random valid value, but no matter what it does, you're not entitled to object.
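A short sketch of that case, matching the int i; example above:

int main() {
    int i;           // no initializer: the value of i is indeterminate
    // int j = i;    // reading it before assignment -> undefined behavior
    i = 7;           // now i has a well-defined value
    int j = i;       // fine
    return j;
}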
The point of all this imprecision is to give maximal freedom to compiler writers. That's nice for compiler writers (and is one of the reasons it's reasonably easy to get a C compiler running on such a huge range of platforms), but it does make things rather more interesting than fun for the poor users.
Edit 1: to clarify indeterminate values.
Edit 2: to include a link to the C++ standard, and note that the committee drafts are essentially equivalent to the final standard, but free.
I think the standard mentions undefined behaviour and indeterminate values, so one is about behaviour and the other about values.
These two are somewhat orthogonal, for example, the behaviour can still be well defined in the presence of indeterminate values.
EDIT 1: The latest drafts of C11 and C++11 are available online here: C11 draft N1570 and C++11 draft n3242, if you don't have a copy of the final standards and wonder what they look like. (Other adjustments to text appearance and some wording/grammar edits have been made.)
EDIT 2: Fixed all occurrences of "behaviour" to be "behavior" to match the standard.
Searching the C++11 and C11 standards there are no matches for indeterminate rule or undefined rule. There are terms like indeterminate value, indeterminately sequenced, indeterminate uninitialized, etc.
If talk of traps and exceptions seems weird in Norman Gray's answer, know that those terms do reflect the relevant definitions in Section 3 in the C11 standard.
C++ relies on C's definitions. Many useful definitions concerning types of behaviour can be found in C11's Section 3 (in C11). For example, indeterminate value is defined in 3.19.2. Do take note that C11's Section 2 (Normative References) provides other sources for additional terminology interpretation and Section 4 defines when cases such as undefined behavior occur as a result of not complying with the standard.
C11's section 3.4 defines behavior, 3.4.1 defines implementation-defined behavior, 3.4.2 defines locale-specific behavior, 3.4.3 defines undefined behavior, 3.4.4 defines unspecified behavior. For value (Section 3.19), there are implementation-defined value, indeterminate value, and unspecified value.
Loosely speaking, the term indeterminate refers to an unspecified/unknown state that by itself doesn't result in undefined behavior. For example, this C++ code involves an indeterminate value: { int x = x; }. (This is actually an example in the C++11 standard.) Here x is defined to be an integer first but at that point it does not have a well-defined value --then it is initialized to whatever (indeterminate/unknown) value it has!
The well-known term undefined behavior is defined in 3.4.3 in C11 and refers to any situation of a
nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
In other words undefined behavior is some error (in logic or state) and whatever happens next is unknown! So one could make an undefined [behavior] rule that states: avoid undefined behavior when writing C/C++ code! :-)
An indeterminate [behavior] rule would be to state: avoid writing indeterminate code unless it is needed and it does not affect program correctness or portability. So unlike undefined behavior, indeterminate behavior does not necessarily imply that code/data is erroneous, however, its subsequent use may or may not be erroneous --so care is required to ensure program correctness is maintained.
Other terms like indeterminately sequenced are in the body text (e.g., C11 5.1.2.3 para 3; C++11, section 1.9 para 13; i.e., in [intro.execution]). (As you might guess, it refers to an unspecified order of operational steps.)
IMO if one is interested in all of these nuances, acquiring both the C++11 and C11 standards is a must. This will let you explore the definitions to whatever level of detail you need. If you don't have them, the links provided here will help you explore the last published C11 and C++11 drafts.

What's the difference in undefined behavior between C++03 and C++11?

The new standard has different undefined behavior from the old one. The new sequencing rules, for example, mean that some arithmetic operations that used to be undefined (for such reasons as multiple writes between sequence points) are now defined.
So, what do we need to learn anew about undefined behavior?
In my opinion the new rules are more complex to describe and to understand. For example consider that:
int x = 12;
x = x++ + 1; // undefined behaviour
x = ++x + 1; // valid
I would suggest simply avoiding multiple side effects on the same variable in the same expression; that is a simpler rule to understand. AFAIK C++0x changed some cases that were undefined behaviour in the past and that are now legal uses (for example the second of the two expressions above), but remember that there is and there will always be a difference between what is legal and what is moral ;-) ... no one is forcing you to use such things.
Actually, in the above case it seems that the validity of the second expression happened unintentionally, as a side effect of fixing another issue (#222) in the language. The decision was to make the expression valid because it was considered that changing something from UB to well-defined wasn't going to do any harm. I think, however, that while this didn't do any damage to programs (where of course UB is the worst possible problem), it actually did some damage to the language itself... changing a rule that was already complex to explain and understand into an even more obscure one.
IMO C++ is continuing its natural evolution from C into a language where a bunch of good-looking, nice and logical statements can do wonderful things... and in which another bunch of equally good-looking, equally nice and equally logical statements can make your computer explode instead.
C++0x changes a number of previously undefined cases to now conditionally-supported cases. The semantics is:
If the implementation does not support the conditionally-supported feature, it shall document that and shall emit a diagnostic for a program that violates it.
If the implementation does support it, it should behave according to any additional requirements the Standard makes on it. For example, the Standard might say something is conditionally-supported with implementation-defined semantics. If so, the implementation shall document how it supports the feature.
A popular case that previously was undefined is passing an argument of class type having a non-trivial copy constructor, a non-trivial move constructor, or a non-trivial destructor through an ellipsis function parameter. This is now conditionally-supported, with implementation-defined semantics.
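A hedged sketch of that case (the function name log_values is made up): passing a std::string through the ellipsis was undefined in C++03 and is conditionally-supported, with implementation-defined semantics, in C++11.

#include <cstdarg>
#include <cstdio>
#include <string>

void log_values(int count, ...) {
    va_list args;
    va_start(args, count);
    std::printf("%s\n", va_arg(args, const char*));  // expects a C string
    va_end(args);
}

int main() {
    std::string s = "hello";
    // log_values(1, s);        // conditionally-supported in C++11 with
    //                          // implementation-defined semantics; UB in C++03
    log_values(1, s.c_str());   // fine: const char* is a trivial type
}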