Is null reference possible? - c++

Is this piece of code valid (and defined behavior)?
int &nullReference = *(int*)0;
Both g++ and clang++ compile it without any warning, even when using -Wall, -Wextra, -std=c++98, -pedantic, -Weffc++...
Of course the reference is not actually null, since it cannot be accessed (it would mean dereferencing a null pointer), but we could check whether it's null or not by checking its address:
if( & nullReference == 0 ) // null reference

References are not pointers.
8.3.2/1:
A reference shall be initialized to
refer to a valid object or function.
[Note: in particular, a null reference
cannot exist in a well-defined
program, because the only way to
create such a reference would be to
bind it to the “object” obtained by
dereferencing a null pointer, which
causes undefined behavior. As
described in 9.6, a reference cannot
be bound directly to a bit-field. ]
1.9/4:
Certain other operations are described
in this International Standard as
undefined (for example, the effect of
dereferencing the null pointer)
As Johannes says in a deleted answer, there's some doubt whether "dereferencing a null pointer" should be categorically stated to be undefined behavior. But this isn't one of the cases that raise doubts, since a null pointer certainly does not point to a "valid object or function", and there is no desire within the standards committee to introduce null references.

The answer depends on your view point:
If you judge by the C++ standard, you cannot get a null reference because you get undefined behavior first. After that first incidence of undefined behavior, the standard allows anything to happen. So, if you write *(int*)0, you already have undefined behavior as you are, from a language standard point of view, dereferencing a null pointer. The rest of the program is irrelevant: once this expression is executed, you are out of the game.
However, in practice, null references can easily be created from null pointers, and you won't notice until you actually try to access the value behind the null reference. Your example may be a bit too simple, as any good optimizing compiler will see the undefined behavior, and simply optimize away anything that depends on it (the null reference won't even be created, it will be optimized away).
Yet, that optimizing away depends on the compiler to prove the undefined behavior, which may not be possible to do. Consider this simple function inside a file converter.cpp:
int& toReference(int* pointer) {
return *pointer;
}
When the compiler sees this function, it does not know whether the pointer is a null pointer or not. So it just generates code that turns any pointer into the corresponding reference. (Btw: This is a noop since pointers and references are the exact same beast in assembler.) Now, if you have another file user.cpp with the code
#include <iostream>
#include "converter.h"
void foo() {
int& nullRef = toReference(nullptr);
std::cout << nullRef; //crash happens here
}
the compiler does not know that toReference() will dereference the passed pointer, and assumes that it returns a valid reference, which will happen to be a null reference in practice. The call succeeds, but when you try to use the reference, the program crashes. Hopefully. The standard allows for anything to happen, including the appearance of pink elephants.
You may ask why this is relevant; after all, the undefined behavior was already triggered inside toReference(). The answer is debugging: Null references may propagate and proliferate just as null pointers do. Unless you are aware that null references can exist, and learn to avoid creating them, you may spend quite some time trying to figure out why your member function seems to crash when it's just trying to read a plain old int member (answer: the instance in the call of the member was a null reference, so this is a null pointer, and your member is computed to be located at address 8).
So how about checking for null references? You gave the line
if( & nullReference == 0 ) // null reference
in your question. Well, that won't work: According to the standard, you have undefined behavior if you dereference a null pointer, and you cannot create a null reference without dereferencing a null pointer, so null references exist only inside the realm of undefined behavior. Since your compiler may assume that you are not triggering undefined behavior, it can assume that there is no such thing as a null reference (even though it will readily emit code that generates null references!). As such, it sees the if() condition, concludes that it cannot be true, and just throws away the entire if() statement. With the introduction of link time optimizations, it has become plain impossible to check for null references in a robust way.
TL;DR:
Null references are somewhat of a ghastly existence:
Their existence seems impossible (= by the standard),
but they exist (= by the generated machine code),
but you cannot see them if they exist (= your attempts will be optimized away),
but they may kill you unaware anyway (= your program crashes at weird points, or worse).
Your only hope is that they don't exist (= write your program to not create them).
I do hope that will not come to haunt you!
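The "write your program to not create them" advice can be sketched by putting the null check at the one place where a pointer becomes a reference. This is only a minimal sketch: the name toReferenceChecked and the choice of std::invalid_argument are assumptions made for illustration, not something from the answer above.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical checked variant of toReference(): the pointer is tested
// before it is dereferenced, so no null reference is ever formed.
int& toReferenceChecked(int* pointer) {
    if (pointer == nullptr) {
        throw std::invalid_argument("toReferenceChecked: null pointer");
    }
    return *pointer;  // well-defined: pointer is known to be non-null here
}

// Small usage example.
void demoToReferenceChecked() {
    int value = 42;
    int& ref = toReferenceChecked(&value);
    assert(&ref == &value);  // the reference aliases the original object

    bool threw = false;
    try {
        toReferenceChecked(nullptr);
    } catch (const std::invalid_argument&) {
        threw = true;
    }
    assert(threw);  // the null case is reported instead of propagating
}
```

Because the test runs on the pointer, the compiler has no licence to remove it, unlike a test on the address of a reference.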

clang++ 3.5 even warns on it:
/tmp/a.C:3:7: warning: reference cannot be bound to dereferenced null pointer in well-defined C++ code; comparison may be assumed to
always evaluate to false [-Wtautological-undefined-compare]
if( & nullReference == 0 ) // null reference
^~~~~~~~~~~~~ ~
1 warning generated.

If your intention was to find a way to represent null in an enumeration of singleton objects, then it's a bad idea to (de)reference null (in C++11, nullptr).
Why not declare a static singleton object that represents NULL within the class as follows, and add a cast-to-pointer operator that returns nullptr?
Edit: Corrected several mistypes and added if-statement in main() to test for the cast-to-pointer operator actually working (which I forgot to.. my bad) - March 10 2015 -
// Error.h
class Error {
public:
static Error& NOT_FOUND;
static Error& UNKNOWN;
static Error& NONE; // singleton object that represents null
public:
static vector<shared_ptr<Error>> _instances;
static Error& NewInstance(const string& name, bool isNull = false);
private:
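bool _isNull;
string _name; // was used in the constructor but never declared
Error(const string& name, bool isNull = false) : _isNull(isNull), _name(name) {}
Error() : _isNull(false) {}
Error(const Error& src) = delete; // instances are not copyable
Error& operator=(const Error& src) = delete; // instances are not assignable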
public:
operator Error*() { return _isNull ? nullptr : this; }
};
// Error.cpp
vector<shared_ptr<Error>> Error::_instances;
Error& Error::NewInstance(const string& name, bool isNull) // default argument belongs on the declaration only
{
shared_ptr<Error> pNewInst(new Error(name, isNull));
Error::_instances.push_back(pNewInst);
return *pNewInst.get();
}
Error& Error::NOT_FOUND = Error::NewInstance("NOT_FOUND");
//Error& Error::NOT_FOUND = Error::NewInstance("UNKNOWN"); Edit: fixed
//Error& Error::NOT_FOUND = Error::NewInstance("NONE", true); Edit: fixed
Error& Error::UNKNOWN = Error::NewInstance("UNKNOWN");
Error& Error::NONE = Error::NewInstance("NONE", true); // isNull = true, so operator Error*() returns nullptr
// Main.cpp
#include "Error.h"
Error& getError() {
return Error::UNKNOWN;
}
// Edit: To see the overload of "Error*()" in Error.h actually working
Error& getErrorNone() {
return Error::NONE;
}
int main(void) {
if(getError() != Error::NONE) {
return EXIT_FAILURE;
}
// Edit: To see the overload of "Error*()" in Error.h actually working
if(getErrorNone() != nullptr) {
return EXIT_FAILURE;
}
}
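A more compact variant of the same null-object idea, as a sketch only. The names Status, none() and isNone() are hypothetical and do not come from the code above; the point is merely that callers compare against a distinguished instance instead of testing a reference for null.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical "null object": one distinguished instance stands for "nothing",
// so no null pointer or null reference is needed to express absence.
class Status {
public:
    explicit Status(std::string name) : name_(std::move(name)) {}

    // The single sentinel instance (function-local static, created on first use).
    static const Status& none() {
        static const Status instance("NONE");
        return instance;
    }

    // Identity comparison against the sentinel.
    bool isNone() const { return this == &none(); }

    const std::string& name() const { return name_; }

private:
    std::string name_;
};

// Small usage example.
void demoStatus() {
    const Status notFound("NOT_FOUND");
    assert(!notFound.isNone());
    assert(Status::none().isNone());
    assert(Status::none().name() == "NONE");
}
```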

Related

What are the rules for a valid dereferencing of a null pointer?

#include <iostream>
struct X
{
bool isNull() { return this == nullptr; }
bool isNullConst() const { return this == nullptr; }
};
bool isNull(X& x) { return &x == nullptr; }
bool isNullConst(const X& x) { return &x == nullptr; }
// always false or exception.
bool isNullCopy(X x) { return &x == nullptr; }
int main()
{
X* x = nullptr;
std::cout << x->isNull() << '\n';
std::cout << (*x).isNull() << '\n';
std::cout << isNull(*x) << '\n';
// std::cout << isNullCopy(*x) << '\n'; // exception.
}
Here, I know that X::isNull() is equivalent to isNull(X&) and that X::isNullConst() is equivalent to isNullConst(const X&).
What I did not know is that it's normal to dereference a null pointer. I thought that any dereferencing of a null pointer would result in an exception.
After playing with pointers for a bit, I concluded that dereferencing a null pointer itself is not the problem, the problem is trying to read or write to the address pointed to by the null pointer.
And since the functions are in a well known location in memory, dereferencing a null pointer to a class and calling one of its functions will just result in calling the function with the null object as the first parameter.
That was new to me, but that's probably not the complete picture.
I thought at first that this was an OOP concept and thus should work in Java, for example, but it didn't work there and caused an exception (which makes me wonder why it doesn't work in Java):
class X
{
boolean isNull() { return this == null; }
}
public class Main {
public static void main(String[] args) {
X x = null;
System.out.println(x.isNull());
}
}
So, clearly this is something related to C++ and not OOP in general.
What are all of the situations under which dereferencing a null pointer will be valid and won't cause exceptions?
Is there something else other than pointers of structs and classes that can be dereferenced successfully even if they're null pointers?
Also, why does calling a function through a null pointer without accessing its fields raise an exception in other languages like Java?
One case where dereferencing a null pointer makes sense is in Red-Black trees for example. Null pointers are considered to be black.
#define RED true
#define BLACK false
struct Node
{
bool color;
bool isRed()
{
return this != nullptr && this->color == RED;
}
};
bool isRed(Node* node)
{
return node != nullptr && node->color == RED;
}
Here, I believe it makes more sense to include the function in the Node class itself since it's related to it. It's not very convenient to include all of the logic related to the node inside it except for the one that checks for it being null.
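One way to keep the check inside the class without ever calling a member on a null object is a static member function that takes the pointer as an ordinary argument. A minimal sketch, assuming a simplified Node with a bool flag named red (that field name is an assumption for illustration):

```cpp
#include <cassert>

struct Node {
    bool red = false;

    // Static member: there is no 'this', so it can be called for a null
    // child and the null test is an ordinary, well-defined comparison.
    static bool isRed(const Node* node) {
        return node != nullptr && node->red;
    }
};

// Small usage example.
void demoIsRed() {
    Node redNode;
    redNode.red = true;
    Node blackNode;

    assert(Node::isRed(&redNode));
    assert(!Node::isRed(&blackNode));
    assert(!Node::isRed(nullptr));  // null children count as black
}
```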
I thought that any dereferencing of a null pointer would result in an exception.
No. Dereferencing a null pointer is undefined behavior in C++.
C++ is not Java. C++ does have exceptions, but they are only for exceptional cases, not used all over the place (as in Java). You are supposed to know that dereferencing a null pointer is not allowed, and a compiler assumes that it never happens in correct code. If it still happens, your code is invalid.
Read about undefined behavior. It is essential to know about it when you want to do anything serious in C++.
What are the rules for a valid dereferencing of a null pointer?
The rule is: You shall not do it. When you do it, your code has undefined behavior. The compiler is not required to issue an error or warning, and when you ask a compiler to compile your wrong code the result can be anything.
In Java your object declarations are references. So you can deliver a null reference to a method and it won't harm since the method can check if the reference points to a null object.
But calling a method on a null reference won't work because the method is called on the object behind the reference. Since it is null, there is no object to call the method on, so a NullPointerException is thrown.
What are the rules for a valid dereferencing of a null pointer [in C++]?
The C++ standard is actually somewhat non-specific about whether indirecting through a null pointer is valid by itself or not. It is not disallowed explicitly. The standard used to use "dereferencing the null pointer" as an example of undefined behaviour, but this example has since been removed.
There is an active core language issue, CWG-232, titled "Is indirection through a null pointer undefined behavior?", where this is discussed. It has a proposed change of wording to explicitly allow indirection through a null pointer, and even to allow "empty" references in the language. The issue was created 20 years ago and was last updated 15 years ago, when the proposed wording was found insufficient.
Here are a few examples:
X* ptr = nullptr;
*ptr;
Above, the result of the indirection is discarded. This is a case where the standard is not explicit about its validity one way or another. The proposed wording would have allowed this explicitly. This is also a fairly pointless operation.
X& x = *ptr;
X* ptr2 = &x; // ptr2 == nullptr?
Above, a reference is bound to the result of indirection through null. This is explicitly undefined behaviour now, but the proposed wording would have allowed this.
ptr->member_function();
Above, the result of indirection goes through lvalue-to-rvalue conversion. This has undefined behaviour regardless of what the function does, and would remain undefined in the proposed resolution of CWG-232. Same applies to all of your examples.
One consequence of this is that return this == nullptr; can be optimised to return false; because this can never be null in a well defined program.
Dereferencing a nullptr in C++ is an undefined behaviour, so technically anything can happen when you try to dereference a nullptr (and I mean: anything :)).

Make argument a reference and not a pointer, if null is not a valid value

As far as I know, it has been a good rule that a pointer-like argument to a function should be a pointer if the argument can sensibly be null, and a reference if the argument should never be null.
Based on that "rule", I naively expected that doing something like
someMethodTakingAnIntReference(*aNullPointer) would fail when trying to make the call, but to my surprise the following code is running just fine which kinda makes "the rule" less usable. A developer can still read meaning from the argument type being reference, but the compiler doesn't help and the location of the runtime error does not either.
Am I misunderstanding the point of this rule, or is this undefined behavior, or...?
int test(int& something1, int& something2)
{
return something2;
}
int main()
{
int* i1 = nullptr;
int* i2 = new int{ 7 };
//this compiles and runs fine returning 7.
//I expected the *i1 to cause an error here where test is called
return test(*i1, *i2);
}
While the above works, obviously the following does not, but the same would be true if the references were just pointers, meaning that the rule and the compiler are not really helping.
int test(int& something1, int& something2)
{
return something1+something2;
}
int main()
{
int* i1 = nullptr;
int* i2 = new int{ 7 };
//this compiles, but fails at run time when test reads something1.
//I expected the *i1 to cause an error here where test is called
return test(*i1, *i2);
}
Writing test(*i1, *i2) causes undefined behaviour; specifically the part *i1. This is covered in the C++ Standard by [expr.unary.op]/1:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
This defines the behaviour of *X only for the case where X points to an object or function. Since i1 does not point to an object or function, the standard does not define the behaviour of *i1, therefore it is undefined behaviour. (This is sometimes known as "undefined by omission", and this same practice handles many other uses of lvalues that don't designate objects).
As described in the linked page, undefined behaviour does not necessitate any sort of diagnostic message. The runtime behaviour could literally be anything. The compiler could, but is not required to, generate a compilation warning or error. In general, it's up to the programmer to comply with the rules of the language. The compiler helps out to some extent but it cannot cover all cases.
You're better off thinking of references as little more than a handy notation for pointers.
They are still pointers, and the runtime error occurs when you use (dereference) a null pointer, not when you pass it to a function.
(An added advantage of references is that they can not be changed to reference something else, once initialized.)
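The pointer-versus-reference rule from the question still works as long as the caller never writes *p for a pointer that might be null. A small sketch under that assumption; the function addOptional and its parameter names are hypothetical:

```cpp
#include <cassert>

// Hypothetical function: 'required' must always refer to an object,
// 'optional' may legitimately be null and is checked before use.
int addOptional(const int& required, const int* optional) {
    int result = required;
    if (optional != nullptr) {
        result += *optional;  // dereferenced only after the null check
    }
    return result;
}

// Small usage example.
void demoAddOptional() {
    int seven = 7;
    int three = 3;
    const int* missing = nullptr;

    assert(addOptional(seven, &three) == 10);
    assert(addOptional(seven, missing) == 7);  // the pointer is passed, never *missing
}
```

The possibly-null value travels as a pointer all the way to the code that tests it, so the indirection only ever happens on a pointer known to be non-null.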

Method in null class pointer (c++)

Let's say we have
class Foo{
public:
bool error;
......
bool isValid(){return error==false;}
};
and somewhere
Foo *aFoo=NULL;
I usually would do if (aFoo!=NULL && aFoo->isValid()).....
But what if in the isValid method I test the nullity:
bool isValid(){return this!=NULL && error==false;}
That would simplify the external testing with simply calling if (aFoo->isValid())
I've tested it in some compilers and it works but I wonder if it is standard and could cause problems when porting to other environments.
The compiler is free to optimize away the check -- calling any non-static member of any class through an invalid (or NULL) pointer is undefined behavior. Please don't do this.
Why not simply a namespace-scope function like this?
bool isValid(Foo* f) {return f && f->isValid();}
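Spelled out as a self-contained sketch (the Foo below is a cut-down stand-in for the class in the question, and the const qualifiers are an addition for illustration):

```cpp
#include <cassert>

class Foo {
public:
    bool error = false;
    bool isValid() const { return !error; }
};

// Free function: the null test happens before any member is called.
bool isValid(const Foo* f) { return f != nullptr && f->isValid(); }

// Small usage example.
void demoIsValid() {
    Foo good;
    Foo bad;
    bad.error = true;
    const Foo* none = nullptr;

    assert(isValid(&good));
    assert(!isValid(&bad));
    assert(!isValid(none));  // no member function is invoked for the null pointer
}
```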
An if-statement like
if (aFoo->isValid())
implies that the pointer is pointing to a valid object. It would be a huge source of confusion and very error prone.
Finally, your code would indeed invoke undefined behavior - aFoo->isValid is per definition equivalent to (*aFoo).isValid:
N3337, §5.2.5/2
The expression E1->E2 is converted to the equivalent form
(*(E1)).E2;
which would dereference a null pointer to obtain a null reference, which is clearly undefined:
N3337, §8.3.2/5
[ Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the
“object” obtained by dereferencing a null pointer, which causes
undefined behavior. […] — end note ]
Generally it would be bad design, and in standard C++ it doesn't make much sense, as your internal NULL check implies that you would call a member function through a null pointer, which is undefined behavior.
This topic was discussed here:
Checking if this is null

How do I test whether a reference is NULL?

I want to check to see whether something is null, e.g.:
string xxx(const NotMyClass& obj) {
if (obj == NULL) {
//...
}
}
But the compiler complains about this: there are 5 possible overloads of ==.
So I tried this:
if (obj == static_cast<NotMyClass>(NULL)) {
This crashes because NotMyClass's == overload doesn't handle nulls.
edit: for everyone telling me it can't be NULL, I'm certainly getting something NULL-like in my debugger.
In a well-formed C++ program, references are never NULL (more accurately, the address of an object to which you have a reference may never be NULL).
So not only is the answer "no, there's no way", a corollary is "this makes no sense".
Your statement regarding C makes no sense either, since C does not have references.
And as for Java, its "references" are more like C++ pointers in many ways, including this one.
Comparing such specific behaviours between different languages is something of a fool's errand.
If you need this "optional object" behaviour, then you're looking for pointers:
std::string xxx(const NotMyClass* ptr) {
if (ptr == NULL)
throw SomeException();
const NotMyClass& ref = *ptr;
/* ... */
}
But consider whether you really need this; a decent alternative might be boost::optional if you really do.
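As a rough illustration of the "optional object" alternative, here is a sketch that uses std::optional from C++17 in place of boost::optional (that substitution, and the function name describe, are assumptions for the example):

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical function: an absent argument is modelled with std::nullopt
// rather than with a null reference.
std::string describe(const std::optional<std::string>& name) {
    if (!name.has_value()) {
        return "<none>";
    }
    return "name: " + *name;  // only read after has_value() was checked
}

// Small usage example.
void demoDescribe() {
    assert(describe(std::nullopt) == "<none>");
    assert(describe(std::string("widget")) == "name: widget");
}
```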
What you're asking makes no sense. References in C++ can never be "null", since they can only ever be created by aliasing an existing object, and they cannot be rebound. Once a reference to x, always a reference to x.
(A reference may become "dangling" if the original object's lifetime ends before that of the reference, but that's a programming error and not a checkable runtime condition.)
You don't need to test this, as references in C++ can't be NULL. Pointers can be NULL, but you're not using them here.
As others said, well-defined code never has NULL references, so it's not your responsibility to test for them.
That doesn't strictly mean they aren't ever created in practice though (but hopefully in intermediate, rather than production code). It's possible in some compilers, though definitely not standard C++, to get a reference whose address is NULL:
int * p = NULL;
int & x = *p;
This often won't crash (yet), although by the C++ standard the behavior is undefined from the second line on. This is a side-effect of references typically being implemented with pointers "behind the scenes." It will probably crash later down the line when someone uses x.
If you're trying to debug such a situation, you can test if the address of x is not NULL:
#include <cassert>
// ...
assert(&x != NULL);
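Since a compiler may treat &x != NULL as always true, a debugging check is more dependable when it runs on the pointer, before the reference is bound. A minimal sketch; bindChecked is a hypothetical helper name:

```cpp
#include <cassert>

// Hypothetical helper: the assertion tests the pointer itself, before any
// reference exists, so it cannot be discarded as a tautology.
int& bindChecked(int* p) {
    assert(p != nullptr);  // active in debug builds (NDEBUG not defined)
    return *p;
}

// Small usage example.
void demoBindChecked() {
    int value = 5;
    int& x = bindChecked(&value);
    x = 6;
    assert(value == 6);  // x aliases value
}
```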
As people have said references in C++ should never be null (NULL or nullptr), however it is still possible to get null references, especially if you do some evil casting. (A long time ago I did such a thing when I didn't know any better.)
To test if a reference is null (NULL or nullptr) convert it to a pointer and then test. So:
if (&obj == nullptr)
is what you are effectively looking for.
But now since you know how to do it, don't. Just assume that references can never be null and let the application crash if they are, because by then something else must have gone horribly wrong and the program should be terminated.

Is using NULL references OK?

I came across this code:
void f(const std::string &s);
And then a call:
f( *((std::string*)NULL) );
And I was wondering what others think of this construction. It is used to signal that function f() should use some default value (which it computes) instead of some user-provided value.
I am not sure what to think of it, it looks weird but what do you think of this construction?
No. It is undefined behaviour and can cause the code to do anything (including reformatting your hard disk, core dumping or insulting your mother).
If you need to be able to pass NULL, then use pointers. Code that takes a reference can assume it refers to a valid object.
Addendum: The C++03 Standard (ISO/IEC 14882, 2nd edition 2003) says, in §8.3.2 "References", paragraph 4:
A reference shall be initialized to refer to a valid object
or function. [Note: in particular, a null reference cannot exist in a well-defined program, because the only
way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer,
which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. ]
You will sometimes see constructions like this in fairly esoteric template library code, but only inside a sizeof() where it is harmless.
Supposing you wanted to know the size of the return type of a function-like type F if it was passed a reference to a type T as an argument (both of those being template parameters). You could write:
sizeof(F(T()))
But what if T happens to have no public default constructor? So you do this instead:
sizeof(F(*((T *)0)))
The expression passed to sizeof never executes - it just gets analyzed to the point where the compiler knows the size of the result.
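Since C++11 the same trick can be written without the null-pointer cast by using std::declval, which exists for exactly this kind of unevaluated context. The sketch below treats F as a callable type; NoDefault, Measure and ResultSize are made-up names for illustration.

```cpp
#include <cstddef>
#include <utility>

// A type with no default constructor, so T() would not compile.
struct NoDefault {
    explicit NoDefault(int) {}
};

// Hypothetical function-like type taking a reference to T.
struct Measure {
    double operator()(NoDefault&) const { return 0.0; }
};

// std::declval<T&>() yields a T& for use in unevaluated operands only;
// nothing is constructed and no pointer is dereferenced.
template <typename F, typename T>
struct ResultSize {
    static constexpr std::size_t value =
        sizeof(std::declval<F>()(std::declval<T&>()));
};

static_assert(ResultSize<Measure, NoDefault>::value == sizeof(double),
              "size of the call result is computed at compile time");
```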
I'm curious - does function 'f' actually check for this condition? Because if it doesn't, and it tries to use the string, then this is clearly going to crash when you try to use it.
And if 'f' does check the reference for NULL, then why isn't it just using a pointer? Is there some hard and fast rule that you won't use pointers and some knucklehead obeyed the letter of the law without thinking about what it meant?
I'd just like to know...
Is using NULL references OK?
No, unless you do not like your boss and your job ;)
This is something VERY bad. One of the most important points of a reference is that it can't be NULL (unless you force it).
For this case you can make an "empty object", which will play the role of the null pointer:
class Foo
{
static Foo empty;
public:
static bool isEmpty( const Foo& ref )
{
return &ref==&empty;
}
};
Foo Foo::empty; // definition of the static member
As others already said: A reference has to be valid. That's why it's a reference instead of a pointer.
If you want to make f() have a default behavior, you might want to use this:
static const std::string default_for_f;
void f(const std::string &s = default_for_f)
{
if (&s == &default_for_f)
{
// make default processing
}
else
...
}
...
void bar()
{
f(); // call with default behavior
f(default_for_f); // call with default behavior
f(std::string()); // call with other behavior
}
You can spare the default parameter for f(). (Some people hate default parameters.)
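One way to do without both the default parameter and the sentinel comparison is a pair of overloads. This is a sketch under assumptions: in the question f() returns void, whereas here it returns a string so the two paths are observable, and computeDefault is a hypothetical stand-in for whatever f() computes.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the value f() computes when none is supplied.
std::string computeDefault() { return "computed-default"; }

// Overload taking a user-provided value.
std::string f(const std::string& s) { return "user:" + s; }

// Overload for the default case: no argument, hence no reference to test.
std::string f() { return "default:" + computeDefault(); }

// Small usage example.
void demoOverloads() {
    assert(f() == "default:computed-default");
    assert(f("abc") == "user:abc");
    assert(f(std::string()) == "user:");  // an empty string is still a real value
}
```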
f( *((std::string*)NULL) );
This is essentially dereferencing NULL, which on most systems is #defined to be 0. Last I checked 0x00000000 is an invalid memory address for doing anything.
Whatever happened to just checking
if (s.length() > 0) ....