At what point does dereferencing the null pointer become undefined behavior? - c++

If I don't actually access the dereferenced "object", is dereferencing the null pointer still undefined?
int* p = 0;
int& r = *p; // undefined?
int* q = &*p; // undefined?
A slightly more practical example: can I dereference the null pointer to distinguish between overloads?
void foo(Bar&);
void foo(Baz&);
foo(*(Bar*)0); // undefined?
Okay, the reference examples are definitely undefined behavior according to the standard:
a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by dereferencing a null pointer, which causes undefined behavior.
Unfortunately, the emphasized part is ambiguous. Is it the binding part that causes undefined behavior, or is the dereferencing part sufficient?

I think the second opus of What every C programmer should know about Undefined Behavior might help illustrate this issue.
Taking the example of the blog:
void contains_null_check(int *P) {
int dead = *P;
if (P == 0)
return;
*P = 4;
}
Might be optimized to (RNCE: Redundant Null Check Elimintation):
void contains_null_check_after_RNCE(int *P) {
int dead = *P;
if (false) // P was dereferenced by this point, so it can't be null
return;
*P = 4;
}
Which is turn optimized into (DCE: Dead Code Elimination):
void contains_null_check_after_RNCE_and_DCE(int *P) {
//int dead = *P; -- dead store
//if (false) -- unreachable branch
// return;
*P = 4;
}
As you can see, even though dead is never used, the simple int dead = *P assignment has caused Undefined Behavior to creep in the program.
To distinguish between overloads, I'd suggest using a pointer (which might be null) rather than artificially creating a null reference and exposing yourself to Undefined Behavior.

int& r = *p; // undefined?
I think right here you've undefined behavior even if you don't actually use r (or *p)- the dereferenced object. Because after this step (i.e dereferencing the null pointer), the program behaviour is not guaranteed by the language, as the program may crash immediately which is one of the possibilities of UB. You seem to think that only reading the value of r so as to be used in real purpose invokes UB. I don't think so.
Also, the language specification clearly says "the effect of dereferencing the null pointer" invokes undefined behavior. It does not say "the effect of actually using dereferenced object from a null pointer" invokes UB. The effect of dereferencing the null pointer (or in other words undefined behavior) doesn't mean that you will necessarily and immediately get problems, or it must crash immediately after dereferencing the null pointer. No. It simply means, the program behavior is not defined after dereferencing the null pointer. That is, the program may run normally, as expected, from start to end. Or it may crash immediately, or after after some time - after few minutes, hours or days. Anything can happen anytime after dereferencing the null pointer.

Yes it is undefined behavior, because the spec says that an "lvalue designates an object or function" (at clause 3.10) and it says for the *-operator "the result [of dereferencing] is an lvalue referring to the object or function to which the expression points" (at clause 5.3.1).
That means there is no description for what happens when you dereference a null pointer. It's simply undefined behavior.

Related

What are the rules for a valid dereferencing of a null pointer?

#include <iostream>
struct X
{
bool isNull() { return this == nullptr; }
bool isNullConst() const { return this == nullptr; }
};
bool isNull(X& x) { return &x == nullptr; }
bool isNullConst(const X& x) { return &x == nullptr; }
// always false or exception.
bool isNullCopy(X x) { return &x == nullptr; }
int main()
{
X* x = nullptr;
std::cout << x->isNull() << '\n';
std::cout << (*x).isNull() << '\n';
std::cout << isNull(*x) << '\n';
// std::cout << isNull2(*x) << '\n'; // exception.
}
Here, I know that X::isNull() is equivalent to isNull(X&) and that X::isNullConst() is equivalent to isNullConst(const X&).
What I did not know is that it's normal to dereference a null pointer. I thought that any dereferencing for a null pointer would result in an exception.
After playing with pointers for a bit, I concluded that dereferencing a null pointer itself is not the problem, the problem is trying to read or write to the address pointed to by the null pointer.
And since the functions are in a well known location in memory, dereferencing a null pointer to a class and calling one of its functions will just result in calling the function with the null object as the first parameter.
That was new to me, but that's probably not the complete picture.
I thought at first that this was an OOP concept at first, thus it should work in java for example, but it didn't work here and caused an exception (which makes me think why it doesn't work in java?...):
class X
{
boolean isNull() { return this == null; }
}
public class Main {
public static void main(String[] args) {
X x = null;
System.out.println(x.isNull());
}
}
So, clearly this is something related to C++ and not OOP in general.
What are all of the situations under which dereferencing a null pointer will be valid and won't cause exceptions?
Is there something else other than pointers of structs and classes that can be dereferenced successfully even if they're null pointers?
Also, why is calling a function of a null pointer without accessing its fields raises an exception in other languages like java?
One case where dereferencing a null pointer makes sense is in Red-Black trees for example. Null pointers are considered to be black.
#define RED true
#define BLACK false;
struct Node
{
bool color;
bool isRed()
{
return this != nullptr && this->color == RED;
}
};
bool isRed(Node* node)
{
return node != nullptr && node->color == RED;
}
Here, I believe it makes more sense to include the function in the Node class itself since it's related to it. It's not very convenient to include all of the logic related to the node inside it except for the one that checks for it being null.
I thought that any dereferencing for a null pointer would result in an exception.
No. Dereferencing a null pointer is undefinded behavior in C++.
C++ is not Java. C++ does have exceptions, but they are only for exceptional casses, not used all over the place (as in Java). You are supposed to know that dereferencing a null pointer is not allowed, and a compiler assumes that it never happens in correct code. If it still happens your code is invalid.
Read about undefined behavior. It is essential to know about it when you want to do anything serious in C++.
What are the rules for a valid dereferencing of a null pointer?
The rule is: You shall not do it. When you do it your code is ill-formed no diagnostics required. This is a different way to say: Your code has undefined behavior. The compiler is not reuqired to issue an error or warning and when you ask a compiler to compile your wrong code the result can be anything.
In Java your object declarations are references. So you can deliver a null reference to a method and it won't harm since the method can check if the reference points to a null object.
But calling a method onto a null reference won't work because the method is called upon the object behind the reference. Since it is null, the method can't be called onto any object so a NullpointerException is thrown.
What are the rules for a valid dereferencing of a null pointer [in C++]?
C++ standard is actually somewhat non-specific about whether indirecting through a null pointer is valid by itself or not. It is not disallowed explicitly. The standard used to use "dereferencing the null pointer" as an example of undefined behaviour, but this example has since been removed.
There is an active core language issue CWG-232 titled "Is indirection through a null pointer undefined behavior?" where this is discussed. It has a proposed change of wording to explicitly allow indirection through a null pointer, and even to allow "empty" references in the language. The issue was created 20 years ago, has last been updated 15 years ago, when the proposed wording was found insufficient.
Here are a few examples:
X* ptr = nullptr;
*ptr;
Above, the result of the indirection is discarded. This is a case where standard is not explicit about its validity one way or another. The proposed wording would have allowed this explicitly. This is also a fairly pointless operation.
X& x = *ptr;
X* ptr2 = &x; // ptr2 == nullptr?
Above, the result of indirection through null is bound to an lvalue. This is explicitly undefined behaviour now, but the proposed wording would have allowed this.
ptr->member_function();
Above, the result of indirection goes through lvalue-to-rvalue conversion. This has undefined behaviour regardless of what the function does, and would remain undefined in the proposed resolution of CWG-232. Same applies to all of your examples.
One consequence of this is that return this == nullptr; can be optimised to return false; because this can never be null in a well defined program.
Dereferencing a nullptr in C++ is an undefined behaviour, so technically anything can happen when you try to dereference a nullptr (and I mean: anything :)).

Is it UB on derefencing pointer or not? [duplicate]

If I don't actually access the dereferenced "object", is dereferencing the null pointer still undefined?
int* p = 0;
int& r = *p; // undefined?
int* q = &*p; // undefined?
A slightly more practical example: can I dereference the null pointer to distinguish between overloads?
void foo(Bar&);
void foo(Baz&);
foo(*(Bar*)0); // undefined?
Okay, the reference examples are definitely undefined behavior according to the standard:
a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by dereferencing a null pointer, which causes undefined behavior.
Unfortunately, the emphasized part is ambiguous. Is it the binding part that causes undefined behavior, or is the dereferencing part sufficient?
I think the second opus of What every C programmer should know about Undefined Behavior might help illustrate this issue.
Taking the example of the blog:
void contains_null_check(int *P) {
int dead = *P;
if (P == 0)
return;
*P = 4;
}
Might be optimized to (RNCE: Redundant Null Check Elimintation):
void contains_null_check_after_RNCE(int *P) {
int dead = *P;
if (false) // P was dereferenced by this point, so it can't be null
return;
*P = 4;
}
Which is turn optimized into (DCE: Dead Code Elimination):
void contains_null_check_after_RNCE_and_DCE(int *P) {
//int dead = *P; -- dead store
//if (false) -- unreachable branch
// return;
*P = 4;
}
As you can see, even though dead is never used, the simple int dead = *P assignment has caused Undefined Behavior to creep in the program.
To distinguish between overloads, I'd suggest using a pointer (which might be null) rather than artificially creating a null reference and exposing yourself to Undefined Behavior.
int& r = *p; // undefined?
I think right here you've undefined behavior even if you don't actually use r (or *p)- the dereferenced object. Because after this step (i.e dereferencing the null pointer), the program behaviour is not guaranteed by the language, as the program may crash immediately which is one of the possibilities of UB. You seem to think that only reading the value of r so as to be used in real purpose invokes UB. I don't think so.
Also, the language specification clearly says "the effect of dereferencing the null pointer" invokes undefined behavior. It does not say "the effect of actually using dereferenced object from a null pointer" invokes UB. The effect of dereferencing the null pointer (or in other words undefined behavior) doesn't mean that you will necessarily and immediately get problems, or it must crash immediately after dereferencing the null pointer. No. It simply means, the program behavior is not defined after dereferencing the null pointer. That is, the program may run normally, as expected, from start to end. Or it may crash immediately, or after after some time - after few minutes, hours or days. Anything can happen anytime after dereferencing the null pointer.
Yes it is undefined behavior, because the spec says that an "lvalue designates an object or function" (at clause 3.10) and it says for the *-operator "the result [of dereferencing] is an lvalue referring to the object or function to which the expression points" (at clause 5.3.1).
That means there is no description for what happens when you dereference a null pointer. It's simply undefined behavior.

Are nullptr references undefined behaviour in C++? [duplicate]

This question already has answers here:
Assigning a reference by dereferencing a NULL pointer
(5 answers)
Closed 9 years ago.
The following code fools around with nullptr pointer and reference:
#include <cstdio>
void printRefAddr(int &ref) {
printf("printAddr %p\n", &ref);
}
int main() {
int *ip = nullptr;
int &ir = *ip;
// 1. get address of nullptr reference
printf("ip=%p &ir=%p\n", ip, &ir);
// 2. dereference a nullptr pointer and pass it as reference
printRefAddr(*ip);
// 3. pass nullptr reference
printRefAddr(ir);
return 0;
}
Question: In C++ standards, are commented statements 1..3 valid code or undefined behavior?
Is this same or different with different versions of C++ (older ones would of course use 0 literal instead of nullptr keyword)?
Bonus question: are there known compilers / optimization options, which would actually cause above code to do something unexpected / crash? For example, is there a flag for any compiler, which would generate implicit assertion for nullptr everywhere where reference is initialized, including passing reference argument from *ptr?
An example output for the curious, nothing unexpected:
ip=(nil) &ir=(nil)
printAddr (nil)
printAddr (nil)
// 2. dereference a nullptr pointer and pass it as reference
Dereferencing a null pointer is Undefined Behaviour, and so whether you pass it as a reference or by value, the fact is that you've dereferenced it and therefore invoked UB, meaning from that point on all bets are off.
You've already invoked UB here:
int &ir = *ip; //ip is null, you cannot deref it without invoking UB.
Since ir is just a shadow of *ip it will not cause an undefined behavior on its own.
The undefined behavior is using a pointer which points to nullptr_t. I mean using *ip. Therefore
int &ir = *ip;
^^^
Causes an UB.

Is null reference possible?

Is this piece of code valid (and defined behavior)?
int &nullReference = *(int*)0;
Both g++ and clang++ compile it without any warning, even when using -Wall, -Wextra, -std=c++98, -pedantic, -Weffc++...
Of course the reference is not actually null, since it cannot be accessed (it would mean dereferencing a null pointer), but we could check whether it's null or not by checking its address:
if( & nullReference == 0 ) // null reference
References are not pointers.
8.3.2/1:
A reference shall be initialized to
refer to a valid object or function.
[Note: in particular, a null reference
cannot exist in a well-defined
program, because the only way to
create such a reference would be to
bind it to the “object” obtained by
dereferencing a null pointer, which
causes undefined behavior. As
described in 9.6, a reference cannot
be bound directly to a bit-field. ]
1.9/4:
Certain other operations are described
in this International Standard as
undefined (for example, the effect of
dereferencing the null pointer)
As Johannes says in a deleted answer, there's some doubt whether "dereferencing a null pointer" should be categorically stated to be undefined behavior. But this isn't one of the cases that raise doubts, since a null pointer certainly does not point to a "valid object or function", and there is no desire within the standards committee to introduce null references.
The answer depends on your view point:
If you judge by the C++ standard, you cannot get a null reference because you get undefined behavior first. After that first incidence of undefined behavior, the standard allows anything to happen. So, if you write *(int*)0, you already have undefined behavior as you are, from a language standard point of view, dereferencing a null pointer. The rest of the program is irrelevant, once this expression is executed, you are out of the game.
However, in practice, null references can easily be created from null pointers, and you won't notice until you actually try to access the value behind the null reference. Your example may be a bit too simple, as any good optimizing compiler will see the undefined behavior, and simply optimize away anything that depends on it (the null reference won't even be created, it will be optimized away).
Yet, that optimizing away depends on the compiler to prove the undefined behavior, which may not be possible to do. Consider this simple function inside a file converter.cpp:
int& toReference(int* pointer) {
return *pointer;
}
When the compiler sees this function, it does not know whether the pointer is a null pointer or not. So it just generates code that turns any pointer into the corresponding reference. (Btw: This is a noop since pointers and references are the exact same beast in assembler.) Now, if you have another file user.cpp with the code
#include "converter.h"
void foo() {
int& nullRef = toReference(nullptr);
cout << nullRef; //crash happens here
}
the compiler does not know that toReference() will dereference the passed pointer, and assume that it returns a valid reference, which will happen to be a null reference in practice. The call succeeds, but when you try to use the reference, the program crashes. Hopefully. The standard allows for anything to happen, including the appearance of pink elephants.
You may ask why this is relevant, after all, the undefined behavior was already triggered inside toReference(). The answer is debugging: Null references may propagate and proliferate just as null pointers do. If you are not aware that null references can exist, and learn to avoid creating them, you may spend quite some time trying to figure out why your member function seems to crash when it's just trying to read a plain old int member (answer: the instance in the call of the member was a null reference, so this is a null pointer, and your member is computed to be located as address 8).
So how about checking for null references? You gave the line
if( & nullReference == 0 ) // null reference
in your question. Well, that won't work: According to the standard, you have undefined behavior if you dereference a null pointer, and you cannot create a null reference without dereferencing a null pointer, so null references exist only inside the realm of undefined behavior. Since your compiler may assume that you are not triggering undefined behavior, it can assume that there is no such thing as a null reference (even though it will readily emit code that generates null references!). As such, it sees the if() condition, concludes that it cannot be true, and just throw away the entire if() statement. With the introduction of link time optimizations, it has become plain impossible to check for null references in a robust way.
TL;DR:
Null references are somewhat of a ghastly existence:
Their existence seems impossible (= by the standard),
but they exist (= by the generated machine code),
but you cannot see them if they exist (= your attempts will be optimized away),
but they may kill you unaware anyway (= your program crashes at weird points, or worse).
Your only hope is that they don't exist (= write your program to not create them).
I do hope that will not come to haunt you!
clang++ 3.5 even warns on it:
/tmp/a.C:3:7: warning: reference cannot be bound to dereferenced null pointer in well-defined C++ code; comparison may be assumed to
always evaluate to false [-Wtautological-undefined-compare]
if( & nullReference == 0 ) // null reference
^~~~~~~~~~~~~ ~
1 warning generated.
If your intention was to find a way to represent null in an enumeration of singleton objects, then it's a bad idea to (de)reference null (it C++11, nullptr).
Why not declare static singleton object that represents NULL within the class as follows and add a cast-to-pointer operator that returns nullptr ?
Edit: Corrected several mistypes and added if-statement in main() to test for the cast-to-pointer operator actually working (which I forgot to.. my bad) - March 10 2015 -
// Error.h
class Error {
public:
static Error& NOT_FOUND;
static Error& UNKNOWN;
static Error& NONE; // singleton object that represents null
public:
static vector<shared_ptr<Error>> _instances;
static Error& NewInstance(const string& name, bool isNull = false);
private:
bool _isNull;
Error(const string& name, bool isNull = false) : _name(name), _isNull(isNull) {};
Error() {};
Error(const Error& src) {};
Error& operator=(const Error& src) {};
public:
operator Error*() { return _isNull ? nullptr : this; }
};
// Error.cpp
vector<shared_ptr<Error>> Error::_instances;
Error& Error::NewInstance(const string& name, bool isNull = false)
{
shared_ptr<Error> pNewInst(new Error(name, isNull)).
Error::_instances.push_back(pNewInst);
return *pNewInst.get();
}
Error& Error::NOT_FOUND = Error::NewInstance("NOT_FOUND");
//Error& Error::NOT_FOUND = Error::NewInstance("UNKNOWN"); Edit: fixed
//Error& Error::NOT_FOUND = Error::NewInstance("NONE", true); Edit: fixed
Error& Error::UNKNOWN = Error::NewInstance("UNKNOWN");
Error& Error::NONE = Error::NewInstance("NONE");
// Main.cpp
#include "Error.h"
Error& getError() {
return Error::UNKNOWN;
}
// Edit: To see the overload of "Error*()" in Error.h actually working
Error& getErrorNone() {
return Error::NONE;
}
int main(void) {
if(getError() != Error::NONE) {
return EXIT_FAILURE;
}
// Edit: To see the overload of "Error*()" in Error.h actually working
if(getErrorNone() != nullptr) {
return EXIT_FAILURE;
}
}

Pointers assignment

What is the meaning of
*(int *)0 = 0;
It does compile successfully
It has no meaning. That's an error. It's parsed as this
(((int)0) = 0)
Thus, trying to assign to an rvalue. In this case, the right side is a cast of 0 to int (it's an int already, anyway). The result of a cast to something not a reference is always an rvalue. And you try to assign 0 to that. What Rvalues miss is an object identity. The following would work:
int a;
(int&)a = 0;
Of course, you could equally well write it as the following
int a = 0;
Update: Question was badly formatted. The actual code was this
*(int*)0 = 0
Well, now it is an lvalue. But a fundamental invariant is broken. The Standard says
An lvalue refers to an object or function
The lvalue you assign to is neither an object nor a function. The Standard even explicitly says that dereferencing a null-pointer ((int*)0 creates such a null pointer) is undefined behavior. A program usually will crash on an attempt to write to such a dereferenced "object". "Usually", because the act of dereferencing is already declared undefined by C++.
Also, note that the above is not the same as the below:
int n = 0;
*(int*)n = 0;
While the above writes to something where certainly no object is located, this one will write to something that results from reinterpreting n to a pointer. The mapping to the pointer value is implementation defined, but most compilers will just create a pointer referring to address zero here. Some systems may keep data on that location, so this one may have more chances to stay alive - depending on your system. This one is not undefined behavior necessarily, but depends on the compiler and runtime-environment it is invoked in.
If you understand the difference between the above dereference of a null pointer (only constant expressions valued 0 converted to pointers yield null pointers!) and the below dereference of a reinterpreted zero value integer, i think you have learned something important.
It will usually cause an access violation at runtime. The following is done: first 0 is cast to an int * and that yields a null pointer. Then a value 0 is written to that address (null address) - that causes undefined behaviour, usually an access violation.
Effectively it is this code:
int* address = reinterpret_cast<int*>( 0 );
*address = 0;
Its a compilation error. You cant modify a non-lvalue.
It puts a zero on address zero. On some systems you can do this. Most MMU-based systems will not allow this in run-time. I once saw an embedded OS writing to address 0 when performing time(NULL).
there is no valid lvalue in that operation so it shouldn't compile.
the left hand side of an assignment must be... err... assignable