Is calling a function on a NULL pointer undefined? [duplicate] - c++

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
When does invoking a member function on a null instance result in undefined behavior?
C++ standard: dereferencing NULL pointer to get a reference?
Say I have the class:
class A
{
public:
void foo() { cout << "foo"; }
};
and call foo like so:
A* a = NULL;
a->foo();
I suspect this invokes undefined behavior, since it's equivalent to (*a).foo() (or is it?), and dereferencing a NULL is UB, but I can't find the reference. Can anyone help me out? Or is it defined?
No, the function is not virtual. No, I'm not accessing any members.
EDIT: I voted to close this question but will not delete it as I couldn't find the duplicate myself, and I suspect this title might be easier to find by others.

I'm looking for the reference that says a->x is equivalent to (*a).x.
Here it is:
[C++11: 5.2.5/2]: For the first option (dot) the first expression shall have complete class type. For the second option (arrow) the first expression shall have pointer to complete class type. The expression E1->E2 is converted to the equivalent form (*(E1)).E2; the remainder of 5.2.5 will address only the first option (dot). In either case, the id-expression shall name a member of the class or of one of its base classes. [ Note: because the name of a class is inserted in its class scope (Clause 9), the name of a class is also considered a nested member of that class. —end note ] [ Note: 3.4.5 describes how names are looked up after the . and -> operators. —end note ]
There is no direct quotation for dereferencing a NULL pointer being UB, unfortunately. You may find more under this question: When does invoking a member function on a null instance result in undefined behavior?

I'm aware of at least one case where this idiom is not only allowed but relied upon: Microsoft's MFC class CWnd provides a member function GetSafeHwnd which tests if this==NULL and returns without accessing any member variables.
Of course there are plenty of people who would claim that MFC is a very bad example.
Regardless of whether the behavior is undefined or not, in practice it's not likely to behave badly. The compiler will treat a->foo() as A::foo(a) which does not do a dereference at the call site, as long as foo is not virtual.

Yes, that is UB as a has not been initialized to point to a valid memory location before it is dereferenced.

It is covered here: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232
At least a couple of places in the IS state that indirection through a
null pointer produces undefined behavior: 1.9 [intro.execution]
paragraph 4 gives "dereferencing the null pointer" as an example of
undefined behavior, and 8.3.2 [dcl.ref] paragraph 4 (in a note) uses
this supposedly undefined behavior as justification for the
nonexistence of "null references."
However, 5.3.1 [expr.unary.op] paragraph 1, which describes the unary
"*" operator, does not say that the behavior is undefined if the
operand is a null pointer, as one might expect. Furthermore, at least
one passage gives dereferencing a null pointer well-defined behavior:
5.2.8 [expr.typeid] paragraph 2 says
If the lvalue expression is obtained by applying the unary * operator
to a pointer and the pointer is a null pointer value (4.10
[conv.ptr]), the typeid expression throws the bad_typeid exception
(18.7.3 [bad.typeid]).
This is inconsistent and should be cleaned up.
Read more at the link if you want to learn more.

Related

Why is dereferencing of nullptr while using a static method not undefined behaviour in C++?

I was reading a post on some nullptr peculiarities in C++, and a particular example caused some confusion in my understanding.
Consider (simplified example from the aforementioned post):
struct A {
void non_static_mem_fn() {}
static void static_mem_fn() {}
};
A* p{nullptr};
/*1*/ *p;
/*6*/ p->non_static_mem_fn();
/*7*/ p->static_mem_fn();
According to the authors, expression /*1*/ that dereferences the nullptr does not cause undefined behaviour by itself. Same with expression /*7*/ that uses the nullptr-object to call a static function.
The justification is based on issue 315 in C++ Standard Core Language Closed Issues, Revision 100 that has
...*p is not an error when p is null unless the lvalue is converted to an rvalue (7.1 [conv.lval]), which it isn't here.
thus making a distinction between /*6*/ and /*7*/.
So, the actual dereferencing of the nullptr is not undefined behaviour (answer on SO, discussion under issue 232 of C++ Standard, ...). Thus, the validity of /*1*/ is understandable under this assumption.
However, how is /*7*/ guaranteed to not cause UB? As per the cited quote, there is no conversion of lvalue to rvalue in p->static_mem_fn();. But the same is true for /*6*/ p->non_static_mem_fn();, and I think my guess is confirmed by the quote from the same issue 315 regarding:
/*6*/ is explicitly noted as undefined in 12.2.2
[class.mfct.non-static], even though one could argue that since non_static_mem_fn(); is
empty, there is no lvalue->rvalue conversion.
(in the quote, I changed "which" and f() to get the connection to the notation used in this question).
So, why is such a distinction made for p->static_mem_fn(); and p->non_static_mem_fn(); regarding the causality of UB? Is there an intended use of calling static functions from pointers that could potentially be nullptr?
Appendix:
this question asks about why dereferencing a nullptr is undefined behaviour. While I agree that in most cases it is a bad idea, I do not believe the statement is absolutely correct as per the links and quotes here.
similar discussion in this Q/A with some links to issue 232.
I was not able to find a question devoted to static methods and the nullptr dereferencing issue. Maybe I missed some obvious answer.
Standard citations in this answer are from the C++17 spec (N4713).
One of the sections cited in your question answers the question for non-static member functions. [class.mfct.non-static]/2:
If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.
This applies to, for example, accessing an object through a different pointer type:
std::string foo;
A *ptr = reinterpret_cast<A *>(&foo); // not UB by itself
ptr->non_static_mem_fn(); // UB by [class.mfct.non-static]/2
A null pointer doesn't point at any valid object, so it certainly doesn't point to an object of type A either. Using your own example:
p->non_static_mem_fn(); // UB by [class.mfct.non-static]/2
With that out of the way, why does this work in the static case? Let's pull together two parts of the standard:
[expr.ref]/2:
... The expression E1->E2 is converted to the equivalent form (*(E1)).E2 ...
[class.static]/1 (emphasis mine):
... A static member may be referred to using the class member access syntax, in which case the object expression is evaluated.
The second block, in particular, says that the object expression is evaluated even for static member access. This is important if, for example, it is a function call with side effects.
Put together, this implies that these two blocks are equivalent:
// 1
p->static_mem_fn();
// 2
*p;
A::static_mem_fn();
So the final question to answer is whether *p alone is undefined behavior when p is a null pointer value.
Conventional wisdom would say "yes" but this is not actually true. There is nothing in the standard that states dereferencing a null pointer alone is UB and there are several discussions that directly support this:
Issue 315, as you have mentioned in your question, explicitly states that *p is not UB when the result is unused.
DR 1102 removes "dereferencing the null pointer" as an example of UB. The given rationale is:
There are core issues surrounding the undefined behavior of dereferencing a null pointer. It appears the intent is that dereferencing is well defined, but using the result of the dereference will yield undefined behavior. This topic is too confused to be the reference example of undefined behavior, or should be stated more precisely if it is to be retained.
This DR links to issue 232 where it is discussed to add wording that explicitly indicates *p as defined behavior when p is a null pointer, as long as the result is not used.
In conclusion:
p->non_static_mem_fn(); // UB by [class.mfct.non-static]/2
p->static_mem_fn(); // Defined behavior per issue 232 and 315.

Is the use of "new" necessary when creating a pointer in C++? [duplicate]

So far I can't find how to deduce that the following:
int* ptr;
*ptr = 0;
is undefined behavior.
First of all, there's 5.3.1/1 that states that * means indirection which converts T* to T. But this doesn't say anything about UB.
Then there's often quoted 3.7.3.2/4 saying that using deallocation function on a non-null pointer renders the pointer invalid and later usage of the invalid pointer is UB. But in the code above there's nothing about deallocation.
How can UB be deduced in the code above?
Section 4.1 looks like a candidate (emphasis mine):
An lvalue (3.10) of a
non-function, non-array type T can be
converted to an rvalue. If T is an
incomplete type, a program that
necessitates this conversion is
ill-formed. If the object to which the
lvalue refers is not an object of type
T and is not an object of a type
derived from T, or if the object is
uninitialized, a program that
necessitates this conversion has
undefined behavior. If T is a
non-class type, the type of the rvalue
is the cv-unqualified version of T.
Otherwise, the type of the rvalue is
T.
I'm sure just searching on "uninitial" in the spec can find you more candidates.
I found the answer to this question is a unexpected corner of the C++ draft standard, section 24.2 Iterator requirements, specifically section 24.2.1 In general paragraph 5 and 10 which respectively say (emphasis mine):
[...][ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] [...] Dereferenceable values are always non-singular.
and:
An invalid iterator is an iterator that may be singular.268
and footnote 268 says:
This definition applies to pointers, since pointers are iterators. The effect of dereferencing an iterator that has been invalidated is undefined.
Although it does look like there is some controversy over whether a null pointer is singular or not and it looks like the term singular value needs to be properly defined in a more general manner.
The intent of singular is seems to be summed up well in defect report 278. What does iterator validity mean? under the rationale section which says:
Why do we say "may be singular", instead of "is singular"? That's becuase a valid iterator is one that is known to be nonsingular. Invalidating an iterator means changing it in such a way that it's no longer known to be nonsingular. An example: inserting an element into the middle of a vector is correctly said to invalidate all iterators pointing into the vector. That doesn't necessarily mean they all become singular.
So invalidation and being uninitialized may create a value that is singular but since we can not prove they are nonsingular we must assume they are singular.
Update
An alternative common sense approach would be to note that the draft standard section 5.3.1 Unary operators paragraph 1 which says(emphasis mine):
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.[...]
and if we then go to section 3.10 Lvalues and rvalues paragraph 1 says(emphasis mine):
An lvalue (so called, historically, because lvalues could appear on the left-hand side of an assignment expression) designates a function or an object. [...]
but ptr will not, except by chance, point to a valid object.
The OP's question is nonsense. There is no requirement that the Standard say certain behaviours are undefined, and indeed I would argue that all such wording be removed from the Standard because it confuses people and makes the Standard more verbose than necessary.
The Standard defines certain behaviour. The question is, does it specify any behaviour in this case? If it does not, the behaviour is undefined whether or not it says so explicitly.
In fact the specification that some things are undefined is left in the Standard primarily as a debugging aid for the Standards writers, the idea being to generate a contradiction if there is a requirement in one place which conflicts with an explicit statement of undefined behaviour in another: that's a way to prove a defect in the Standard. Without the explicit statement of undefined behaviour, the other clause prescribing behaviour would be normative and unchallenged.
Evaluating an uninitialized pointer causes undefined behaviour. Since dereferencing the pointer first requires evaluating it, this implies that dereferencing also causes undefined behaviour.
This was true in both C++11 and C++14, although the wording changed.
In C++14 it is fully covered by [dcl.init]/12:
When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced.
If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:
where the "following cases" are particular operations on unsigned char.
In C++11, [conv.lval/2] covered this under the lvalue-to-rvalue conversion procedure (i.e. retrieving the pointer value from the storage area denoted by ptr):
A glvalue of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not
an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.
The bolded part was removed for C++14 and replaced with the extra text in [dcl.init/12].
I'm not going to pretend I know a lot about this, but some compilers would initialize the pointer to NULL and dereferencing a pointer to NULL is UB.
Also considering that uninitialized pointer could point to anything (this includes NULL) you could concluded that it's UB when you dereference it.
A note in section 8.3.2 [dcl.ref]
[Note: in particular, a null reference
cannot exist in a well-defined
program, because the only way to
create such a reference would be to
bind it to the “object” obtained by
dereferencing a null pointer, which
causes undefined behavior. As
described in 9.6, a reference cannot
be bound directly to a bitfield. ]
—ISO/IEC 14882:1998(E), the ISO C++ standard, in section 8.3.2 [dcl.ref]
I think I should have written this as comment instead, I'm not really that sure.
To dereference the pointer, you need to read from the pointer variable (not talking about the object it points to). Reading from an uninitialized variable is undefined behaviour.
What you do with the value of pointer after you have read it, doesn't matter anymore at this point, be it writing to (like in your example) or reading from the object it points to.
Even if the normal storage of something in memory would have no "room" for any trap bits or trap representations, implementations are not required to store automatic variables the same way as static-duration variables except when there is a possibility that user code might hold a pointer to them somewhere. This behavior is most visible with integer types. On a typical 32-bit system, given the code:
uint16_t foo(void);
uint16_t bar(void);
uint16_t blah(uint32_t q)
{
uint16_t a;
if (q & 1) a=foo();
if (q & 2) a=bar();
return a;
}
unsigned short test(void)
{
return blah(65540);
}
it would not be particularly surprising for test to yield 65540 even though that value is outside the representable range of uint16_t, a type which has no trap representations. If a local variable of type uint16_t holds Indeterminate Value, there is no requirement that reading it yield a value within the range of uint16_t. Since unexpected behaviors could result when using even unsigned integers in such fashion, there's no reason to expect that pointers couldn't behave in even worse fashion.

Dereferencing an invalid pointer, then taking the address of the result

Consider:
int* ptr = (int*)0xDEADBEEF;
cout << (void*)&*ptr;
How illegal is the *, given that it's used in conjunction with an immediate & and given that there are no overloaded op&/op* in play?
(This has particular ramifications for addressing a past-the-end array element &myArray[n], an expression which is explicitly equivalent to &*(myArray+n). This Q&A addresses the wider case but I don't feel that it ever really satisfied the above question.)
According to the specification, the effect of dereferencing an invalid pointer itself produces undefined behaviour. It doesn't matter what you do after dereferencing it.
Assuming the variable `ptr' does not contain a pointer to a valid object, the undefined behavior occurs if the program necessitates the lvalue-to-rvalue conversion of the expression `*ptr', as specified in [conv.lval] (ISO/IEC 14882:2011, page 82, 4.1 [#1]).
During the evaluation of `&*ptr' the program does not necessitate the lvalue-to-rvalue conversion of the subexpression `*ptr', according to [expr.unary.op] (ISO/IEC 14882:2011, page 109, 5.3.1 [#3])
Hence, it is legal.
It is legal. Why wouldn't it be? You're just setting a value to a pointer, and then accessing to it. However, assigning the value by hand must be obviously specified as undefined behavior, but that's the most a general specification can say. Then, you use it in some embedded software controller, and it will give you the correct memory-mapped value for some device...

Accessing static member through invalid pointer: guaranteed to "work"? [duplicate]

This question already has answers here:
C++ static const access through a NULL pointer [duplicate]
(5 answers)
Closed 8 years ago.
Setup
Given this user-defined type:
struct T
{
static int x;
int y;
T() : y(38);
};
and the requisite definition placed somewhere useful:
int T::x = 42;
the following is the canonical way to stream the int's value to stdout:
std::cout << T::x;
Control
Meanwhile, the following is (of course) invalid due to an instance of T not existing:
T* ptr = NULL; // same if left uninitialised
std::cout << ptr->y;
Question
Now consider the horrid and evil and bad following code:
T* ptr = NULL;
std::cout << ptr->x; // remember, x is static
Dereferencing ptr is invalid, as stated above. Even though no physical memory dereference takes place here, I believe that it still counts as one, making the above code UB. Or... does it?
14882:2003 5.2.5/3 states explicitly that a->b is converted to (*(a)).b, and that:
The postfix expression before the dot or arrow is evaluated;
This evaluation happens even if the result is unnecessary to determine the value of the entire postfix expression, for example if the id-expression denotes a static member.
But it's not clear whether "evaluation" here involves an actual dereference. In fact neither 14882:2003 nor n3035 seem to explicitly say either way whether the pointer-expression has to evaluate to a pointer to a valid instance when dealing with static members.
My question is, just how invalid is this? Is it really specifically prohibited by the standard (even though there's no physical dereference), or is it just a quirk of the language that we can probably get away with? And even if it is prohibited, to what extent might we expect GCC/MSVC/Clang to treat it safely anyway?
My g++ 4.4 appeared to produce code that never attempts to push the [invalid] this pointer onto the stack, with optimisations turned off.
BTW If T were polymorphic then that would not affect this, as static members cannot be virtual.
it's not clear whether "evaluation" here involves an actual dereference.
I read "evaluation" here as "the subexpression is evaluated." That would mean that the unary * is evaluated and you perform indirection via a null pointer, yielding undefined behavior.
This issue (accessing a static member via a null pointer) is discussed in another question, When does invoking a member function on a null instance result in undefined behavior? While it discusses member functions specifically, I don't see any reason that data members are any different in this respect. There is some good discussion of the issue there.
There was a defect reported against the C++ Standard that asks "Is call of static member function through null pointer undefined?" (see CWG Defect 315) This defect is closed and its resolution states that it is valid to call a static member function via a null pointer:
p->f() is rewritten as (*p).f() according to 5.2.5 [expr.ref]. *p is not an error when p is null unless the lvalue is converted to an rvalue
However, this resolution is in fact wrong.
It presupposes the concept of an "empty lvalue," which is part of the proposed resolution for another defect, CWG defect 232, which asks the more general question, "Is indirection through a null pointer undefined behavior?"
The resolution to that defect would make certain forms of indirection through a null pointer (like calling a static member function) valid. However, that defect is still open and its resolution has not been adopted into the C++ Standard. Until that defect is closed and its resolution is incorporated into the C++ Standard, indirection via a null pointer (or dereferencing a null pointer, if one prefers that term) always yields undefined behavior.
Concerning p->a, where p is a null pointer, and a a static data member:
§9.4/2 says "A static member may be referred to using the class member
access syntax, in which case the object-expression is evaluated." (The
"object-expression" is the expression to the left of the . or the ->.)
Looking from computer side onto OOP it is yet another way to calculate where data resides in memory. When data is not static - it is calculated from instance pointer, when data is static it is always calculated as fixed pointer in data segment. Template adds nothing, since resolved at compile time.
So it is rather popular technique to use NULL as start pointer (for example evaluate offset of filed in class for persisting purposes)
So code above is correct for static data.

C++ standard: dereferencing NULL pointer to get a reference? [duplicate]

This question already has answers here:
Is null reference possible?
(4 answers)
Closed 3 years ago.
I'm wondering about what the C++ standard says about code like this:
int* ptr = NULL;
int& ref = *ptr;
int* ptr2 = &ref;
In practice the result is that ptr2 is NULL but I'm wondering, is this just an implementation detail or is this well defined in the standard?
Under different circumstances a dereferencing of a NULL pointer should result in a crash but here I'm dereferencing it to get a reference which is implemented by the compiler as a pointer so there's really no actual dereferencing of NULL.
Dereferencing a NULL pointer is undefined behavior.
In fact the standard calls this exact situation out in a note (8.3.2/4 "References"):
Note: in particular, a null reference cannot exist in a well-defined program, because the only
way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior.
As an aside: The one time I'm aware of that a NULL pointer can be "dereferenced" in a well-defined way is as the operand to the sizeof operator, because the operand to sizeof isn't actually evaluated (so the dereference never actually occurs).
Dereferencing a NULL pointer is explicitly undefined behaviour in the C++ standard, so what you see is implementation specific.
Copying from 1.9.4 in the C++0x draft standard (similar to previous standards in this respect):
Certain other operations are described
in this International Standard as
undefined (for example, the effect of
dereferencing the null pointer).
[Note: this International Standard
imposes no requirements on the
behavior of programs that contain
undefined behavior. - end note]
Dereferencing a NULL pointer is undefined behaviour. You should check if a value is NULL before dereferencing it.
For completeness, this: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232 talks specifically about this issue.
int& ref = *ptr;
The above statement doesn't actually dereference anything. So there's no problem until you use the ref (which is invalid).