The consensus of Stack Overflow questions is that dereferencing a null pointer is undefined behaviour.
However, I recently saw a 2016 talk by Charles Bay titled "Instruction Reordering Everywhere: The C++ 'As-If' Rule and the Role of Sequence".
At 37:53 he shows the following:
C++ Terms
Undefined Behaviour: Lack of Constraints
(order of globals initialization)
Unspecified Behaviour: Constraint Violation
(dereferencing NULL pointer)
Now I have conflicting information.
Was this a typo? Has anything changed?
It is undefined behavior.
From 8.3.2 References of the C++11 Standard (emphasis mine):
5 ... [ Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. —end note ]
The examples on the slide are attached to the wrong terms. This holds regardless of which version of the C++ standard you assume (i.e. nothing has changed in the standards in this regard).
Dereferencing a NULL pointer gives undefined behaviour. The standard does not define any constraint on what happens as a result.
The order of globals initialisation is an example of unspecified behaviour (the standard guarantees that all globals will be initialised [that's a constraint on how globals are initialised] but the order is not specified).
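Showing the globals case needs more than one translation unit, so the sketch below swaps in a different, single-file instance of unspecified behaviour (the order in which function arguments are evaluated) to illustrate what "constrained but unspecified" means: the implementation picks the order, but every argument is still evaluated exactly once, so only a fixed set of outcomes is permitted. The names `unspecified_demo`, `record` and `observe_order` are hypothetical.

```cpp
#include <cassert>
#include <string>

namespace unspecified_demo {

std::string trace;  // records the order in which arguments are evaluated

int record(char tag, int value) {
    trace += tag;
    return value;
}

int add(int a, int b) { return a + b; }

// Returns "ab" or "ba"; which one is up to the implementation (unspecified),
// but both arguments are always evaluated, so nothing else is possible.
std::string observe_order() {
    trace.clear();
    int sum = add(record('a', 1), record('b', 2));
    assert(sum == 3);  // the result itself is fully specified
    return trace;
}

}  // namespace unspecified_demo
```

Contrast this with undefined behaviour, where the standard places no such limit on what may happen.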
I was reading a post on some nullptr peculiarities in C++, and a particular example caused some confusion in my understanding.
Consider (simplified example from the aforementioned post):
struct A {
    void non_static_mem_fn() {}
    static void static_mem_fn() {}
};
// Statements must appear inside a function body.
void examples() {
    A* p{nullptr};
    /*1*/ *p;
    /*6*/ p->non_static_mem_fn();
    /*7*/ p->static_mem_fn();
}
According to the authors, expression /*1*/, which dereferences the nullptr, does not cause undefined behaviour by itself. The same holds for expression /*7*/, which uses the nullptr "object" to call a static function.
The justification is based on issue 315 in the C++ Standard Core Language Closed Issues, Revision 100, which says:
...*p is not an error when p is null unless the lvalue is converted to an rvalue (7.1 [conv.lval]), which it isn't here.
thus making a distinction between /*6*/ and /*7*/.
So, the actual dereferencing of the nullptr is not undefined behaviour (answer on SO, discussion under issue 232 of C++ Standard, ...). Thus, the validity of /*1*/ is understandable under this assumption.
However, how is /*7*/ guaranteed to not cause UB? As per the cited quote, there is no conversion of lvalue to rvalue in p->static_mem_fn();. But the same is true for /*6*/ p->non_static_mem_fn();, and I think my guess is confirmed by the quote from the same issue 315 regarding:
/*6*/ is explicitly noted as undefined in 12.2.2 [class.mfct.non-static], even though one could argue that since non_static_mem_fn(); is empty, there is no lvalue->rvalue conversion.
(In the quote, I replaced "which" and f() with the notation used in this question.)
So why is such a distinction made between p->static_mem_fn(); and p->non_static_mem_fn(); with respect to causing UB? Is there an intended use for calling static functions through pointers that may be nullptr?
Appendix:
This question asks why dereferencing a nullptr is undefined behaviour. While I agree that in most cases it is a bad idea, based on the links and quotes here I do not believe the statement is strictly correct.
A similar discussion is in this Q/A, with some links to issue 232.
I was not able to find a question devoted to static methods and the nullptr dereferencing issue. Maybe I missed some obvious answer.
Standard citations in this answer are from the C++17 spec (N4713).
One of the sections cited in your question answers the question for non-static member functions. [class.mfct.non-static]/2:
If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.
This applies to, for example, accessing an object through a different pointer type:
std::string foo;
A *ptr = reinterpret_cast<A *>(&foo); // not UB by itself
ptr->non_static_mem_fn(); // UB by [class.mfct.non-static]/2
A null pointer doesn't point at any valid object, so it certainly doesn't point to an object of type A either. Using your own example:
p->non_static_mem_fn(); // UB by [class.mfct.non-static]/2
With that out of the way, why does this work in the static case? Let's pull together two parts of the standard:
[expr.ref]/2:
... The expression E1->E2 is converted to the equivalent form (*(E1)).E2 ...
[class.static]/1 (emphasis mine):
... A static member may be referred to using the class member access syntax, in which case the object expression is evaluated.
The second block, in particular, says that the object expression is evaluated even for static member access. This is important if, for example, it is a function call with side effects.
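A minimal sketch of the [class.static]/1 point, assuming a hypothetical helper `get_object()` whose call has an observable side effect. It deliberately returns a non-null pointer, so it says nothing about the null case; it only shows that the object expression is evaluated even though the member being called is static.

```cpp
#include <cassert>

struct A {
    static int static_mem_fn() { return 7; }
};

int object_expression_evaluations = 0;  // counts calls to get_object()
A instance;

// Hypothetical helper with a visible side effect.
A* get_object() {
    ++object_expression_evaluations;
    return &instance;  // non-null, so no question of UB arises here
}

int call_static_through_member_access() {
    // The object expression get_object() is evaluated even though
    // static_mem_fn() does not need an object.
    return get_object()->static_mem_fn();
}
```

Calling `call_static_through_member_access()` once increments the counter once, which is the behaviour the quoted wording requires.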
Put together, this implies that these two blocks are equivalent:
// 1
p->static_mem_fn();
// 2
*p;
A::static_mem_fn();
So the final question to answer is whether *p alone is undefined behavior when p is a null pointer value.
Conventional wisdom would say "yes" but this is not actually true. There is nothing in the standard that states dereferencing a null pointer alone is UB and there are several discussions that directly support this:
Issue 315, as you have mentioned in your question, explicitly states that *p is not UB when the result is unused.
DR 1102 removes "dereferencing the null pointer" as an example of UB. The given rationale is:
There are core issues surrounding the undefined behavior of dereferencing a null pointer. It appears the intent is that dereferencing is well defined, but using the result of the dereference will yield undefined behavior. This topic is too confused to be the reference example of undefined behavior, or should be stated more precisely if it is to be retained.
This DR links to issue 232, which discusses adding wording that explicitly makes *p defined behavior when p is a null pointer, as long as the result is not used.
In conclusion:
p->non_static_mem_fn(); // UB by [class.mfct.non-static]/2
p->static_mem_fn(); // Defined behavior per issue 232 and 315.
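If the goal is simply to sidestep the question, one option (a sketch, not something proposed in the answer above) is to reach the static member through the pointee type rather than through the pointer, and to guard the non-static call with a null check. `call_static_via_type` and `call_non_static_if_present` are hypothetical names, and `static_mem_fn` returns a value here only so the result can be checked. Requires C++14 for `std::remove_pointer_t`.

```cpp
#include <cassert>
#include <type_traits>

struct A {
    void non_static_mem_fn() {}
    static int static_mem_fn() { return 42; }
};

// Reaches the static member through the pointee *type*, so *p is never
// formed, even when p is null.
template <typename Ptr>
int call_static_via_type(Ptr) {
    using T = std::remove_pointer_t<Ptr>;
    return T::static_mem_fn();
}

// Calls the non-static member only when p actually points to an object.
bool call_non_static_if_present(A* p) {
    if (p == nullptr) {
        return false;
    }
    p->non_static_mem_fn();
    return true;
}
```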
So, I've been reading the C++ standard and came to [defns.undefined], the definition of Undefined Behavior (3.27 in the C++17 draft I'm reading; while I'm citing C++17 here, I've found similar wording in other standards). I noticed this wording (emphasis mine):
Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data.
Now, thinking about this, this sort of makes sense. It's sort of saying that if the Standard doesn't give a behavior for it, it has undefined behavior. It seems to be saying that if you do something that is out of scope of the Standard, the Standard has nothing to say about it. That makes sense.
However, this is also kind of weird, because I always thought Undefined Behavior had to be explicitly declared by the Standard. Yet, this seems to imply that we should assume Undefined Behavior unless we are told otherwise.
If this is the case, then couldn't there be instances of Undefined Behavior that are Undefined Behavior because the Standard didn't explicitly give a behavior for some construct? And if such a thing is possible, is it possible to construct an example (that would still compile) of Undefined Behavior that is Undefined Behavior because of this wording, or would anything that falls under this be nearly impossible to construct for some reason?
If this is the case, then couldn't there be instances of Undefined Behavior that are Undefined Behavior because the Standard didn't explicitly give a behavior for some construct?
I think this is the correct point of view. If the standard "accidentally" omits a specification of how a particular construct behaves, but it's something that we all know "should" be well-defined, then it's a defect in the standard and needs to be fixed. If, on the other hand, it's a construct that "should" be UB, then the standard is already "correct" (although there are benefits to being explicit).
For example, the standard fails to mention what happens if typeid is applied to an lvalue of polymorphic class type if the object's constructor has not yet begun executing or the destructor has completed. Therefore, the behaviour is undefined by omission. It's also something that's "obviously" UB. So there is no problem.
is it possible to generate an example (that would still compile) of Undefined Behavior that is Undefined Behavior because of this wording
The classic example is indirection through a null pointer (CWG232):
*(int*)nullptr;
[expr.unary.op]/1 says that the result of applying the indirection operator is an lvalue which denotes the object to which the argument of the operator points, whilst a null pointer doesn't point to any object. So indirection through a null pointer is UB by omission: there is no explicit definition of behavior for the case when the argument doesn't point to an object.
This question already has answers here:
Is null reference possible?
(4 answers)
Closed 3 years ago.
I'm wondering about what the C++ standard says about code like this:
int* ptr = NULL;
int& ref = *ptr;
int* ptr2 = &ref;
In practice the result is that ptr2 is NULL but I'm wondering, is this just an implementation detail or is this well defined in the standard?
In other circumstances, dereferencing a NULL pointer would result in a crash, but here I'm dereferencing it to get a reference, which the compiler implements as a pointer, so there's really no actual dereferencing of NULL.
Dereferencing a NULL pointer is undefined behavior.
In fact the standard calls this exact situation out in a note (8.3.2/4 "References"):
Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior.
As an aside: The one time I'm aware of that a NULL pointer can be "dereferenced" in a well-defined way is as the operand to the sizeof operator, because the operand to sizeof isn't actually evaluated (so the dereference never actually occurs).
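To illustrate the `sizeof` aside, here is a small sketch; `int_size_via_null` and `deref_type` are hypothetical names. Both `sizeof` and `decltype` take unevaluated operands, so applying `*` to a null pointer inside them never performs the indirection. It requires C++11 or later.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// The operand of sizeof is unevaluated: no indirection happens at run time.
constexpr std::size_t int_size_via_null() {
    return sizeof(*static_cast<int*>(nullptr));
}

// decltype names the type of the expression (int&) without evaluating it.
using deref_type = decltype(*static_cast<int*>(nullptr));

static_assert(std::is_same<deref_type, int&>::value,
              "indirection yields an lvalue of type int");
static_assert(int_size_via_null() == sizeof(int),
              "sizeof(*p) is sizeof(int)");
```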
Dereferencing a NULL pointer is explicitly undefined behaviour in the C++ standard, so what you see is implementation specific.
Copying from 1.9.4 in the C++0x draft standard (similar to previous standards in this respect):
Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. - end note]
Dereferencing a NULL pointer is undefined behaviour. You should check if a value is NULL before dereferencing it.
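A minimal sketch of that advice, with hypothetical function names: the reference is bound only once the pointer is known to be non-null, and for a valid pointer, taking the address of the bound reference gives back the original pointer (the `ptr2` from the question).

```cpp
#include <cassert>

// Binds a reference only after checking the pointer, so a "null reference"
// is never created.
int read_or_default(const int* ptr, int fallback) {
    if (ptr == nullptr) {
        return fallback;
    }
    const int& ref = *ptr;  // ptr is known to point to an int object here
    return ref;
}

// For a non-null ptr, &*ptr compares equal to ptr.
bool address_round_trips(int* ptr) {
    if (ptr == nullptr) {
        return false;
    }
    int& ref = *ptr;
    return &ref == ptr;
}
```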
For completeness, this: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232 talks specifically about this issue.
int& ref = *ptr;
The above statement doesn't actually dereference anything. So there's no problem until you use the ref (which is invalid).
So far I can't find how to deduce that the following is undefined behavior:
int* ptr;
*ptr = 0;
First of all, there's 5.3.1/1 that states that * means indirection which converts T* to T. But this doesn't say anything about UB.
Then there's the often-quoted 3.7.3.2/4, which says that applying a deallocation function to a non-null pointer renders the pointer invalid and that later use of the invalid pointer is UB. But the code above involves no deallocation.
How can UB be deduced in the code above?
Section 4.1 looks like a candidate (emphasis mine):
An lvalue (3.10) of a non-function, non-array type T can be converted to an rvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior. If T is a non-class type, the type of the rvalue is the cv-unqualified version of T. Otherwise, the type of the rvalue is T.
I'm sure just searching on "uninitial" in the spec can find you more candidates.
I found the answer to this question in an unexpected corner of the C++ draft standard: section 24.2 Iterator requirements, specifically section 24.2.1 In general, paragraphs 5 and 10, which respectively say (emphasis mine):
[...][ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] [...] Dereferenceable values are always non-singular.
and:
An invalid iterator is an iterator that may be singular.[268]
and footnote 268 says:
This definition applies to pointers, since pointers are iterators. The effect of dereferencing an iterator that has been invalidated is undefined.
However, there does seem to be some controversy over whether a null pointer is singular, and it looks like the term singular value needs to be defined properly, in a more general manner.
The intent of singular seems to be summed up well in the rationale section of defect report 278, "What does iterator validity mean?", which says:
Why do we say "may be singular", instead of "is singular"? That's because a valid iterator is one that is known to be nonsingular. Invalidating an iterator means changing it in such a way that it's no longer known to be nonsingular. An example: inserting an element into the middle of a vector is correctly said to invalidate all iterators pointing into the vector. That doesn't necessarily mean they all become singular.
So invalidation and lack of initialization may create a singular value, and since we cannot prove such a value is nonsingular, we must assume it is singular.
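The vector case from DR 278 can be sketched as follows (`insert_and_read` is a hypothetical name). After `insert`, the old iterator has to be treated as invalid, since it is no longer known to be nonsingular, so the sketch re-seats it from the iterator that `insert` returns, which points at the newly inserted element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inserting into the middle of a vector invalidates iterators at or after the
// insertion point (and all of them if reallocation happens), so the old value
// of `mid` must not be dereferenced afterwards.
int insert_and_read(std::vector<int>& v) {
    auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    mid = v.insert(mid, 99);  // re-seat from insert's return value
    return *mid;              // valid: points at the newly inserted 99
}
```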
Update
An alternative, common-sense approach is to note that the draft standard, section 5.3.1 Unary operators paragraph 1, says (emphasis mine):
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.[...]
and section 3.10 Lvalues and rvalues paragraph 1 says (emphasis mine):
An lvalue (so called, historically, because lvalues could appear on the left-hand side of an assignment expression) designates a function or an object. [...]
but ptr will not, except by chance, point to a valid object.
The OP's question is nonsense. There is no requirement that the Standard say certain behaviours are undefined, and indeed I would argue that all such wording be removed from the Standard because it confuses people and makes the Standard more verbose than necessary.
The Standard defines certain behaviour. The question is, does it specify any behaviour in this case? If it does not, the behaviour is undefined whether or not it says so explicitly.
In fact the specification that some things are undefined is left in the Standard primarily as a debugging aid for the Standards writers, the idea being to generate a contradiction if there is a requirement in one place which conflicts with an explicit statement of undefined behaviour in another: that's a way to prove a defect in the Standard. Without the explicit statement of undefined behaviour, the other clause prescribing behaviour would be normative and unchallenged.
Evaluating an uninitialized pointer causes undefined behaviour. Since dereferencing the pointer first requires evaluating it, this implies that dereferencing also causes undefined behaviour.
This was true in both C++11 and C++14, although the wording changed.
In C++14 it is fully covered by [dcl.init]/12:
When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced.
If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:
where the "following cases" are particular operations on unsigned char.
In C++11, [conv.lval/2] covered this under the lvalue-to-rvalue conversion procedure (i.e. retrieving the pointer value from the storage area denoted by ptr):
A glvalue of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.
The bolded part was removed for C++14 and replaced with the extra text in [dcl.init/12].
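As a practical corollary, here is a sketch of initialization forms that never leave a pointer with an indeterminate value, so reading the pointer (for example, to compare it against null) is well defined. `Holder` and `all_pointers_start_null` are hypothetical names.

```cpp
#include <cassert>

struct Holder {
    int* member{};  // default member initializer: value-initialized to null
};

// Each pointer below is initialized, so none holds an indeterminate value
// and evaluating any of them is well defined.
bool all_pointers_start_null() {
    int* a{};          // value-initialization yields a null pointer
    int* b = nullptr;  // explicit null pointer
    Holder h;          // member set by its default member initializer
    static int* s;     // static storage duration: zero-initialized
    return a == nullptr && b == nullptr && h.member == nullptr && s == nullptr;
}
```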
I'm not going to pretend I know a lot about this, but some compilers would initialize the pointer to NULL and dereferencing a pointer to NULL is UB.
Also, considering that an uninitialized pointer could point to anything (including NULL), you could conclude that dereferencing it is UB.
A note in section 8.3.2 [dcl.ref]
[Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bitfield.]
—ISO/IEC 14882:1998(E), the ISO C++ standard, in section 8.3.2 [dcl.ref]
I think I should have written this as a comment instead; I'm not really that sure.
To dereference the pointer, you need to read from the pointer variable (not talking about the object it points to). Reading from an uninitialized variable is undefined behaviour.
What you do with the value of the pointer after you have read it (be it writing to the object it points to, as in your example, or reading from it) no longer matters at that point.
Even if the normal storage of something in memory would have no "room" for any trap bits or trap representations, implementations are not required to store automatic variables the same way as static-duration variables except when there is a possibility that user code might hold a pointer to them somewhere. This behavior is most visible with integer types. On a typical 32-bit system, given the code:
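#include <stdint.h>
uint16_t foo(void);
uint16_t bar(void);
uint16_t blah(uint32_t q)
{
    uint16_t a;              // not initialized
    if (q & 1) a = foo();
    if (q & 2) a = bar();
    return a;                // indeterminate if neither bit 0 nor bit 1 is set
}
unsigned short test(void)
{
    return blah(65540);      // 65540 has neither bit 0 nor bit 1 set
}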
it would not be particularly surprising for test to yield 65540 even though that value is outside the representable range of uint16_t, a type which has no trap representations. If a local variable of type uint16_t holds Indeterminate Value, there is no requirement that reading it yield a value within the range of uint16_t. Since unexpected behaviors could result when using even unsigned integers in such fashion, there's no reason to expect that pointers couldn't behave in even worse fashion.
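For contrast, a sketch of the same function with `a` given a defined starting value, so no indeterminate value is ever read and the result always fits in `uint16_t`. The bodies of `foo` and `bar` are hypothetical stand-ins (the original only declares them), and `blah_initialized` is an illustrative name.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical stand-ins for the externally defined foo() and bar().
uint16_t foo(void) { return 1; }
uint16_t bar(void) { return 2; }

// Same logic as blah(), but a starts at 0, so every path returns a
// well-defined uint16_t value.
uint16_t blah_initialized(uint32_t q)
{
    uint16_t a = 0;
    if (q & 1) a = foo();
    if (q & 2) a = bar();
    return a;
}
```

With this version, `blah_initialized(65540)` returns 0, because 65540 has neither of the two low bits set.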