According to ISO C++, dereferencing a null pointer is undefined behaviour. My curiosity is, why? Why standard has decided to declare it undefined behaviour? What is the rationale behind this decision? Compiler dependency? Doesn't seem, because according to C99 standard, as far as I know, it is well defined. Machine dependency? Any ideas?
Defining consistent behavior for dereferencing a NULL pointer would require the compiler to check for NULL pointers before each dereference on most CPU architectures. This is an unacceptable burden for a language that is designed for speed.
It also only fixes a small part of a larger problem - there are many ways to have an invalid pointer beyond a NULL pointer.
The primary reason is that by the time they wrote the original C standard there were a number of implementations that allowed it, but gave conflicting results.
On the PDP-11, it happened that address 0 always contained the value 0, so dereferencing a null pointer also gave the value 0. Quite a few people who used these machines felt that since they were the original machine C had been written on/used to program, that this should be considered canonical behavior for C on all machines (even though it originally happened quite accidentally).
On some other machines (Interdata comes to mind, though my memory could easily be wrong) address 0 was put to normal use, so it could contain other values. There was also some hardware on which address 0 was actually some memory-mapped hardware, so reading/writing it did special things -- not at all equivalent to reading/writing normal memory at all.
The camps wouldn't agree on what should happen, so they made it undefined behavior.
Edit: I suppose I should add that by the time the wrote the C++ standard, its being undefined behavior was already well established in C, and (apparently) nobody thought there was a good reason to create a conflict on this point so they kept the same.
The only way to give defined behaviour would be to add a runtime check to every pointer dereference, and every pointer arithmetic operation. In some situations, this overhead would be unacceptable, and would make C++ unsuitable for the high-performance applications it's often used for.
C++ allows you to create your own smart pointer types (or use ones supplied by libraries), which can include such a check in cases where safety is more important than performance.
Dereferencing a null pointer is also undefined in C, according to clause 6.5.3.2/4 of the C99 standard.
This answer from #Johannes Schaub - litb, puts forward an interesting rationale, which seems pretty convincing.
The formal problem with merely dereferencing a null pointer is that determining the identity of the resulting lvalue expression is not possible: Each such expression that results from dereferencing a pointer must unambiguously refer to an object or a function when that expression is evaluated. If you dereference a null pointer, you don't have an object or function that this lvalue identifies. This is the argument the Standard uses to forbid null-references.
Another problem that adds to the confusion is that the semantics of the typeid operator make part of this misery well defined. It says that if it was given an lvalue that resulted from dereferencing a null pointer, the result is throwing a bad_typeid exception. Although, this is a limited area where there exist an exception (no pun) to the above problem of finding an identity. Other cases exist where similar exception to undefined behavior is made (although much less subtle and with a reference on the affected sections).
The committee discussed to solve this problem globally, by defining a kind of lvalue that does not have an object or function identity: The so called empty lvalue. That concept, however, still had problems, and they decided not to adopt it.
Note:
Marking this as community wiki, since the answer & the credit should go to the original poster. I am just pasting the relevant parts of the original answer here.
The real question is, what behavior would you expect ?
A null pointer is, by definition, a singular value that represents the absence of an object. The result of dereferencing a pointer is to obtain a reference to the object pointed to.
So how do you get a good reference... from a pointer that points into the void ?
You do not. Thus the undefined behavior.
I suspect it's because if the behavior is well-defined the compiler has to insert code anywhere pointers are dereferenced. If it's implementation defined then one possible behavior could still be a hard crash. If it's unspecified then either the compilers for some systems have extra undue burden or they may generate code that causes hard crashes.
Thus to avoid any possible extra burden on compilers they left the behavior undefined.
Sometimes you need an invalid pointer (also see MmBadPointer on Windows), to represent "nothing".
If everything was valid, then that wouldn't be possible. So they made NULL invalid, and disallowed you from dereferencing it.
Here is a simple test & example:
Allocate a pointer:
int * pointer;
? What value is in the pointer when it is created?
? What is the pointer pointing to?
? What happens when I dereference this point in its current state?
Marking the end of a linked list.
In a linked list, a node points to another node, except for the last.
What is the value of the pointer in the last node?
What happens when you derefernce the "next" field of the last node?
The needs to be a value that indicates a pointer is not pointing to anything or that it's in an invalid state. This is where the NULL pointer concept comes into play. The linked list can use a NULL pointer to indicate the end of the list.
Arguments have been made elsewhere that having well-defined behaviour for null-pointer-references is impossible without a lot of overhead, which I think is true. This is because AFAIU "well-defined" here also means "portable". If you would not treat nullptr references specially, you would end up generating instructions that simply try to read address 0, but that produces different behaviour on different processors, so that would not be well-defined.
So, I guess this is why derereferencing nullptr (and probably also other invalid pointers) is marked as undefined.
I do wonder why this is undefined rather then unspecified or implementation-defined, which are distict from undefined behaviour, but require more consistency.
In particular, when a program triggers undefined behaviour, the compiler can do pretty much anything (e.g. throw away your entire program maybe?) and still be considered correct, which is somewhat problematic. In practice, you would expect that compilers would just compile a null-pointer-dereference to a read of address zero, but with modern optimizers becoming better, but also more sensitive to undefined behaviour, I think, they sometimes do things that end up more thoroughly breaking the program. E.g. consider the following:
matthijs#grubby:~$ cat test.c
unsigned foo () {
unsigned *foo = 0;
return *foo;
}
matthijs#grubby:~$ arm-none-eabi-gcc -c test.c -Os && objdump -d test.o
test.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <foo>:
0: e3a03000 mov r3, #0
4: e5933000 ldr r3, [r3]
8: e7f000f0 udf #0
This program just dereferences and accesses a null pointer, which results in an "Undefined instruction" being generated (halting the program at runtime).
This might be ok when this is an accidental nullpointer dereference, but in this case I was actually writing a bootloader that needs to read address 0 (which contains the reset vector), so I was quite surprised this happened.
So, not so much an answer, but some extra perspective on the matter.
According to original C standard NULL can be any value - not necessarily zero.
The language definition states that for each pointer type, there is a special value - the `null pointer' - which is distinguishable from all other pointer values and which is 'guaranteed to compare unequal to a pointer to any object or function.' That is, a null pointer points definitively nowhere; it is not the address of any object or function
There is a null pointer for each pointer type, and the internal values of null pointers for different types may be different.
(From http://c-faq.com/null/null1.html)
Although dereferencing a NULL pointer in C/C++ indeed leads undefined behavior from the language standpoint, such operation is well defined in compilers for targets which have memory at corresponding address. In this case, the result of such operation consists in simply reading the memory at address 0.
Also, many compilers will allow you to dereference a NULL pointer as long as you don't bind the referenced value. This is done to provide compatibility to non-conforming yet widespread code, like
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
There was even a discussion to make this behaviour part of the standard.
Because you cannot create a null reference. C++ doesn't allow it. Therefore you cannot dereference a null pointer.
Mainly it is undefined because there is no logical way to handle it.
You can actually dereference a null pointer. Someone did it here: http://www.codeproject.com/KB/system/soviet_kernel_hack.aspx
Related
According to ISO C++, dereferencing a null pointer is undefined behaviour. My curiosity is, why? Why standard has decided to declare it undefined behaviour? What is the rationale behind this decision? Compiler dependency? Doesn't seem, because according to C99 standard, as far as I know, it is well defined. Machine dependency? Any ideas?
Defining consistent behavior for dereferencing a NULL pointer would require the compiler to check for NULL pointers before each dereference on most CPU architectures. This is an unacceptable burden for a language that is designed for speed.
It also only fixes a small part of a larger problem - there are many ways to have an invalid pointer beyond a NULL pointer.
The primary reason is that by the time they wrote the original C standard there were a number of implementations that allowed it, but gave conflicting results.
On the PDP-11, it happened that address 0 always contained the value 0, so dereferencing a null pointer also gave the value 0. Quite a few people who used these machines felt that since they were the original machine C had been written on/used to program, that this should be considered canonical behavior for C on all machines (even though it originally happened quite accidentally).
On some other machines (Interdata comes to mind, though my memory could easily be wrong) address 0 was put to normal use, so it could contain other values. There was also some hardware on which address 0 was actually some memory-mapped hardware, so reading/writing it did special things -- not at all equivalent to reading/writing normal memory at all.
The camps wouldn't agree on what should happen, so they made it undefined behavior.
Edit: I suppose I should add that by the time the wrote the C++ standard, its being undefined behavior was already well established in C, and (apparently) nobody thought there was a good reason to create a conflict on this point so they kept the same.
The only way to give defined behaviour would be to add a runtime check to every pointer dereference, and every pointer arithmetic operation. In some situations, this overhead would be unacceptable, and would make C++ unsuitable for the high-performance applications it's often used for.
C++ allows you to create your own smart pointer types (or use ones supplied by libraries), which can include such a check in cases where safety is more important than performance.
Dereferencing a null pointer is also undefined in C, according to clause 6.5.3.2/4 of the C99 standard.
This answer from #Johannes Schaub - litb, puts forward an interesting rationale, which seems pretty convincing.
The formal problem with merely dereferencing a null pointer is that determining the identity of the resulting lvalue expression is not possible: Each such expression that results from dereferencing a pointer must unambiguously refer to an object or a function when that expression is evaluated. If you dereference a null pointer, you don't have an object or function that this lvalue identifies. This is the argument the Standard uses to forbid null-references.
Another problem that adds to the confusion is that the semantics of the typeid operator make part of this misery well defined. It says that if it was given an lvalue that resulted from dereferencing a null pointer, the result is throwing a bad_typeid exception. Although, this is a limited area where there exist an exception (no pun) to the above problem of finding an identity. Other cases exist where similar exception to undefined behavior is made (although much less subtle and with a reference on the affected sections).
The committee discussed to solve this problem globally, by defining a kind of lvalue that does not have an object or function identity: The so called empty lvalue. That concept, however, still had problems, and they decided not to adopt it.
Note:
Marking this as community wiki, since the answer & the credit should go to the original poster. I am just pasting the relevant parts of the original answer here.
The real question is, what behavior would you expect ?
A null pointer is, by definition, a singular value that represents the absence of an object. The result of dereferencing a pointer is to obtain a reference to the object pointed to.
So how do you get a good reference... from a pointer that points into the void ?
You do not. Thus the undefined behavior.
I suspect it's because if the behavior is well-defined the compiler has to insert code anywhere pointers are dereferenced. If it's implementation defined then one possible behavior could still be a hard crash. If it's unspecified then either the compilers for some systems have extra undue burden or they may generate code that causes hard crashes.
Thus to avoid any possible extra burden on compilers they left the behavior undefined.
Sometimes you need an invalid pointer (also see MmBadPointer on Windows), to represent "nothing".
If everything was valid, then that wouldn't be possible. So they made NULL invalid, and disallowed you from dereferencing it.
Here is a simple test & example:
Allocate a pointer:
int * pointer;
? What value is in the pointer when it is created?
? What is the pointer pointing to?
? What happens when I dereference this point in its current state?
Marking the end of a linked list.
In a linked list, a node points to another node, except for the last.
What is the value of the pointer in the last node?
What happens when you derefernce the "next" field of the last node?
The needs to be a value that indicates a pointer is not pointing to anything or that it's in an invalid state. This is where the NULL pointer concept comes into play. The linked list can use a NULL pointer to indicate the end of the list.
Arguments have been made elsewhere that having well-defined behaviour for null-pointer-references is impossible without a lot of overhead, which I think is true. This is because AFAIU "well-defined" here also means "portable". If you would not treat nullptr references specially, you would end up generating instructions that simply try to read address 0, but that produces different behaviour on different processors, so that would not be well-defined.
So, I guess this is why derereferencing nullptr (and probably also other invalid pointers) is marked as undefined.
I do wonder why this is undefined rather then unspecified or implementation-defined, which are distict from undefined behaviour, but require more consistency.
In particular, when a program triggers undefined behaviour, the compiler can do pretty much anything (e.g. throw away your entire program maybe?) and still be considered correct, which is somewhat problematic. In practice, you would expect that compilers would just compile a null-pointer-dereference to a read of address zero, but with modern optimizers becoming better, but also more sensitive to undefined behaviour, I think, they sometimes do things that end up more thoroughly breaking the program. E.g. consider the following:
matthijs#grubby:~$ cat test.c
unsigned foo () {
unsigned *foo = 0;
return *foo;
}
matthijs#grubby:~$ arm-none-eabi-gcc -c test.c -Os && objdump -d test.o
test.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <foo>:
0: e3a03000 mov r3, #0
4: e5933000 ldr r3, [r3]
8: e7f000f0 udf #0
This program just dereferences and accesses a null pointer, which results in an "Undefined instruction" being generated (halting the program at runtime).
This might be ok when this is an accidental nullpointer dereference, but in this case I was actually writing a bootloader that needs to read address 0 (which contains the reset vector), so I was quite surprised this happened.
So, not so much an answer, but some extra perspective on the matter.
According to original C standard NULL can be any value - not necessarily zero.
The language definition states that for each pointer type, there is a special value - the `null pointer' - which is distinguishable from all other pointer values and which is 'guaranteed to compare unequal to a pointer to any object or function.' That is, a null pointer points definitively nowhere; it is not the address of any object or function
There is a null pointer for each pointer type, and the internal values of null pointers for different types may be different.
(From http://c-faq.com/null/null1.html)
Although dereferencing a NULL pointer in C/C++ indeed leads undefined behavior from the language standpoint, such operation is well defined in compilers for targets which have memory at corresponding address. In this case, the result of such operation consists in simply reading the memory at address 0.
Also, many compilers will allow you to dereference a NULL pointer as long as you don't bind the referenced value. This is done to provide compatibility to non-conforming yet widespread code, like
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
There was even a discussion to make this behaviour part of the standard.
Because you cannot create a null reference. C++ doesn't allow it. Therefore you cannot dereference a null pointer.
Mainly it is undefined because there is no logical way to handle it.
You can actually dereference a null pointer. Someone did it here: http://www.codeproject.com/KB/system/soviet_kernel_hack.aspx
In my early days with C++, I seem to recall you could call a member function with a NULL pointer, and check for that in the member function:
class Thing {public: void x();}
void Thing::x()
{ if (this == NULL) return; //nothing to do
...do stuff...
}
Thing* p = NULL; //nullptr these days, of course
p->x(); //no crash
Doing this may seem silly, but it was absolutely wonderful when writing recursive functions to traverse data structures, where navigating could easily run into the blind alley of a NULL; navigation functions could do a single check for NULL at the top and then blithely call themselves to try to navigate deeper without littering the code with additional checks.
According to g++ at least, the freedom (if it ever existed) has been revoked. The compiler warns about it, and if compiling optimized, it causes crashes.
Question 1: does the C++ standard (any flavor) disallow a NULL this? Or is g++ just getting in my face?
Question 2. More philosophically, why? 'this' is just another pointer. The glory of pointers is that they can be nullptr, and that's a useful condition.
I know I can get around this by making static functions, passing as first parameter a pointer to the data structure (hellllo Days of C) and then check the pointer. I'm just surprised I'd need to.
Edit: To upvote an answer I'd like to see chapter and verse from the standard on why this is disallowed. Note that my example at NO POINT dereferences NULL. Nothing is virtual here, and p is copied to "argument this" but then checked before use. No defererence occurs! so dereference of NULL can't be used as a claim of UB.
People are making a knee-jerk reaction to *p and assuming it isn't valid if p is NULL. But it is, and the evidence is here:
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232
In fact it calls out two cases when a pointer, p, is surprisingly valid as *p: when p is null or when p points one element past the end of an array. What you must never do is USE the value of *p... other than to take the address of it. &*p where p == nullptr for any pointer type p IS valid. It's fine to point out that p->x() is really (*p).x(), but at the end of the day that translates to x(&*p) and that is perfectly well formed and valid. For p=nullptr... it simply becomes x(nullptr).
I think my debate should be with the standards community; in their haste to undercut the concept of a null reference, they left wording unclear. Since no one here has demanded p->x() is UB without trying to demand that it's UB because *p is UB; and because *p is definitely not UB because no aspect of x() uses the referenced value, I'm going to put this down to g++ overreaching on a standard ambiguity. The absolutely identical mechanism using a static function and extra parameter is well defined, so it's not like it stops my refactor effort. Consider the question withdrawn; portable code can't assume this==nullptr will work but there's a portable solution available, so in the end it doesn't matter.
To be in a situation where this is nullptr implies you called a non-static member function without using a valid instance such as with a pointer set to nullptr. Since this is forbidden, to obtain a null this you must already be in undefined behavior. In other words, this is never nullptr unless you have undefined behavior. Due to the nature of undefined behavior, you can simplify the statement to simply be "this is never nullptr" since no rule needs to be upheld in the presence of undefined behavior.
Question 1: does the C++ standard (any flavor) disallow a NULL this?
Or is g++ just getting in my face?
The C++ standard disallows it -- calling a method on a NULL pointer is officially 'undefined behavior' and you must avoid doing it or you will get bit. In particular, optimizers will assume that the this-pointer is non-NULL when making optimizations, leading to strange/unexpected behaviors at runtime (I know this from experience :))
Question 2. More philosophically, why? 'this' is just another pointer.
The glory of pointers is that they can be nullptr, and that's a useful
condition.
I'm not sure it matters, really; it's what is specified in the C++ standard, and they probably had their reasons (philosophical or otherwise), but since the standard specifies it, the compilers expect it, therefore as programmers we have to abide by it, or face undefined behavior. (One can imagine an alternate universe where NULL this-pointers are allowed, but we don't live there)
The question has already been answered - it is undefined behavior to dereference a null pointer, and using *obj or obj-> are both dereferencing.
Now (since I assume you have a question on how to work around this) the solution is to use static function:
class Foo {
static auto bar_st(Foo* foo) { if (foo) return foo->bar(); }
}
Having said that, I do think that gcc's decision of eliminating all branches for nullptr this was not a wise one. Nobody gained by that, and a lot of people suffered. What's the benefit?
C++ does not allow calling member functions of null object. Objects need identity and that can not be stored to null pointer. What would happen if member function would read or write a field of a object referenced by null pointer?
It sounds like you could use null object pattern in your code to create wanted result.
Null pointer is recognised a problematic entity in object oriented languages because in most languages it is not a object. This creates a need for code that specifically handles the case something being null. While checking for special null pointer is the norm. There are other approaches. Smalltalk actually has a NullObject which has methods its own methods. As all objects it can also be extended. Go programming language does allow calling struct member functions for something that is nil (which sounds like something required in the question).
this might be null too if you delete this (which is possible but not recommended)
I am reading this interesting article. http://www.codeproject.com/Articles/746630/O-Object-Pool-in-Cplusplus
I can't understand this line _firstDeleted = *((T **)_firstDeleted);
_firstDeleted already has type T*. Can anyone explain the purpose of that statement?
When an object is destroyed, its first sizeof(T*) bytes are overwritten with the address of the next free object.
(That is, *T is actually no longer storing a T but a T*, if you see what I mean. The cast performs this reinterpretation. It is formally rather undefined.)
This has the effect of the deleted objects forming a linked list of available memory blocks.
Reusing the object memory for this list means that you don't need a separate list of free blocks.
Code like *((T **)content) = _firstDeleted; is known as "deferencing a type-punned pointer", aka breaking the strict aliasing rules, aka undefined behavior. What the author is trying to do is have _firstDeleted point to the first "free" object in the map of uninitialized memory, with all the undefined behavior goodness that comes with it.
Modern C++ (even though the article was published in 2014) would probably use safer facilities for uninitialized storage, like the C++ standard library.
The following code (or its equivalent which uses explicit casts of null literal to get rid of temporary variable) is often used to calculate the offset of a specific member variable within a class or struct:
class Class {
public:
int first;
int second;
};
Class* ptr = 0;
size_t offset = reinterpret_cast<char*>(&ptr->second) -
reinterpret_cast<char*>(ptr);
&ptr->second looks like it is equivalent to the following:
&(ptr->second)
which in turn is equivalent to
&((*ptr).second)
which dereferences an object instance pointer and yields undefined behavior for null pointers.
So is the original fine or does it yield UB?
Despite the fact that it does nothing, char* foo = 0; *foo; is could be undefined behavior.
Dereferencing a null pointer is could be undefined behavior. And yes , ptr->foo is equivalent to (*ptr).foo, and *ptr dereferences a null pointer.
There is currently an open issue in the working groups about if *(char*)0 is undefined behavior if you don't read or write to it. Parts of the standard imply it is, other parts imply it is not. The current notes there seem to lean towards making it defined.
Now, this is in theory. How about in practice?
Under most compilers, this works because no checks are done at dereferencing time: memory around where null pointer point to is guarded against access, and the above expression simply takes an address of something around null, it does not read or write the value there.
This is why cpp reference offsetof lists pretty much that trick as a possible implementation. The fact that some (many? most? every one I've checked?) compilers implement offsetof in a similar or equivalent manner does not mean that the behavior is well defined under the C++ standard.
However, given the ambiguity, compilers are free to add checks at every instruction that dereferences a pointer, and execute arbitrary code (fail fast error reporting, for example) if null is indeed dereferenced. Such instrumentation might even be useful to find bugs where they occur, instead of where the symptom occurs. And on systems where there is writable memory near 0 such instrumentation could be key (pre-OSX MacOS had some writable memory that controlled system functions near 0).
Such compilers could still write offsetof that way, and introduce pragmas or the like to block the instrumentation in the generated code. Or they could switch to an intrinsic.
Going a step further, C++ leaves lots of latitude on how non-standard-layout data is arranged. In theory, classes could be implemented as rather complex data structures and not the nearly standard-layout structures we have grown to expect, and the code would still be valid C++. Accessing member variables to non-standard-layout types and taking their address could be problematic: I do not know if there is any guarantee that the offset of a member variable in a non-standard layout type does not change between instances!
Finally, some compilers have aggressive optimization settings that find code that executes undefined behavior (at least under certain branches or conditions), and uses that to mark that branch as unreachable. If it is decided that null dereference is undefined behavior, this could be a problem. A classic example is gcc's aggressive signed integer overflow branch eliminator. If the standard dictates something is undefined behavior, the compiler is free to consider that branch unreachable. If the null dereference is not behind a branch in a function, the compiler is free to declare all code that calls that function to be unreachable, and recurse.
And it would be free to do this in not the current, but the next version of your compiler.
Writing code that is standards-valid is not just about writing code that compiles today cleanly. While the degree to which dereferencing and not using a null pointer is defined is currently ambiguous, relying on something that is only ambiguously defined is risky.
My C++ knowledge is somewhat piecemeal. I was reworking some code at work. I changed a function to return a reference to a type. Inside, I look up an object based on an identifier passed in, then return a reference to the object if found. Of course I ran into the issue of what to return if I don't find the object, and in looking around the web, many people claim that returning a "null reference" in C++ is impossible. Based on this advice, I tried the trick of returning a success/fail boolean, and making the object reference an out parameter. However, I ran into the roadblock of needing to initialize the references I would pass as actual parameters, and of course there is no way to do this. I retreated to the usual approach of just returning a pointer.
I asked a colleague about it. He uses the following trick quite often, which is accepted by both a recent version of the Sun compiler and by gcc:
MyType& someFunc(int id)
{
// successful case here:
// ...
// fail case:
return *static_cast<MyType*>(0);
}
// Use:
...
MyType& mt = somefunc(myIdNum);
if (&mt) // test for "null reference"
{
// whatever
}
...
I have been maintaining this code base for a while, but I find that I don't have as much time to look up the small details about the language as I would like. I've been digging through my reference book but the answer to this one eludes me.
Now, I had a C++ course a few years ago, and therein we emphasized that in C++ everything is types, so I try to keep that in mind when thinking things through. Deconstructing the expression: "static_cast<MyType>(0);", it indeed seems to me that we take a literal zero, cast it to a pointer to MyType (which makes it a null pointer), and then apply the dereferencing operator in the context of assigning to a reference type (the return type), which should give me a reference to the same object pointed to by the pointer. This sure looks like returning a null reference to me.
Any advice in explaining why this works (or why it shouldn't) would be greatly appreciated.
Thanks,
Chuck
This code doesn't work, though it may appear to work. This line dereferences a null pointer:
return *static_cast<MyType*>(0);
The zero, cast to a pointer type, results in a null pointer; this null pointer is then dereferenced using the unary-*.
Dereferencing a null pointer results in undefined behavior, so your program may do anything. In the example you describe, you get a "null reference" (or, it appears you get a null reference), but it would also be reasonable for your program to crash or for anything else to happen.
I agree with other posters that the behaviour of your example is undefined and really shouldn't be used. I offer some alternatives here. Each of them has pros and cons
If the object can't be found, throw an exception which is caught in the calling layer.
Create a globally accessible instance of MyType which is a simple shell object (i.e. static const MyType BAD_MYTYPE) and can be used to represent a bad object.
If it's likely that the object will not be found often then maybe pass the object in by reference as a parameter and return a bool or other error code indicating success / failure. If it can't find the object, you just don't assign it in the function.
Use pointers instead and check for 0 on return.
Use Boost smart pointers which allow for the validity of the returned object to be checked.
My personal preference would be for one of the first three.
That is undefined behavior. Because it is undefined behavior, it may "work" on your current compiler, but it could break if you ever upgrade/change your compiler.
From the C++03 spec:
8.3.2/4 ... A reference shall be initialized to refer to a valid object or function. [Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior.
If you are married to that return-a-reference interface, then the Right Thing®
would be to throw an exception if you can't find an object for the given ID. At least that way your poor user can trap the condition with a catch.
If you go dereferencing a null pointer on them, they have no defined way to handle the error.
The line return *static_cast<MyType*>(0); is dereferencing the null pointer which causes undefined behaviour.
Well, you said it yourself in the title of your question. The code appears to work with a null reference.
When people say null references do not exist in C++, it does not mean that the compiler will generate an error message if you try to create one. As you've found out, there's nothing to stop you from creating a reference from a null pointer.
It simply means that the C++ standard does not specify what should happen if you do it. Null references are not possible because the C++ standard says that references must not be created from null pointers. (but doesn't say anything about generating a compile error if you try to do it.)
It is undefined behavior. In practice, because references are typically implemented as pointers, it usually seems to work if you try to create a reference from a null pointer. But there's no guarantee that it'll work. Or that it'll keep working tomorrow. It's not possible in C++ because if you do it, the behavior specified by C++ no longer applies. Your program might do anything
This might all sound a bit hypothetical, because hey, it seems to work just fine. But keep in mind that just because it works, and it makes sense that it works, when naively compiled, it may break when the compiler tries to apply some optimization or other. When the compiler sees a reference, it is guaranteed by the language rules that it is not null. So what happens if the compiler uses this assumption to speed up the code, and you go behind its back creating a "null reference"?
As others have already said, the code unfortunately only appears to work... however, more fundamentally, you are breaking a contract here.
In C++, a reference is meant to be an alias for an object. Based on the signature of your function I would never expect to be passed a 'NULL' reference, and I would certainly NOT test it before using it. As James McNellis said, creating a NULL reference is undefined behavior, however in most case this behavior only creeps in when you actually try to use it, and you are now exposing the users of your methods to nasty / tricky to nail down bugs.
I won't go any further on that issue, just points you toward Herb Sutter's pick on the issue.
Now for the solution to your problem.
The evident solution is of course a plain pointer. People expect a pointer to be possibly null, so they will test it when it's returned to them (and if they don't it's their damn fault for being lazy).
There are other alternatives, but they mainly boil down to having a special value indicating your failure and there is not much point using complicated designs just for the sake of it...
The last alternative is the use of exceptions here, however I am myself partial to the advice. Exceptions are meant for exceptional situations you see, and for a find/search feature it really depends on the expected result:
if you are implementing some internal factory where you register modules and then call them back later on, then not being able to retrieve one module would indicate an error in the program and as such being reported by an exception is fine
if you are implementing a search engine for a database of yours, thus dealing with user input, then not finding a result matching the input criteria is quite likely to occur, and thus I would not use exceptions in this circumstance, for it's not a programming error but a perfectly normal course
Note: other ways include
Boost.Optional, though I find it clumsy to wrap a reference with it
A shared pointer or weak pointer, to control/monitor the object lifetime in case it may be deleted while you still use it... monitoring does not work by itself in Multi-Threaded environment though
A sentinel value (usually declared static const), but it only works if your object has a meaningful "bad" or "null" value. It's certainly not the approach I would recommend since once again you give an object but it blows up in the users hand if they do anything with it
As others have mentioned, your code is erroneous since it dereferences a null pointer.
Secondly, you are using the reference return type incorrectly, returning a reference in your example is usually not good in C/C++ since it violates the memory management model of the language where all objects are referenced to by pointers to a memory address. If you rewrite C/C++ code that was written using pointers into code that uses references you will end up with these problems. The code using pointers could return a NULL pointer without causing problems, but when returning a reference you must return something that can be cast into a 0 or false statement.
Most important of all is the pattern where your erroneous code only gets executed in the case of an exception, in this example your "fail case". Incorrect error handling, error logging with bugs etc are the most disastreous bugs in computer systems since they never show up in the happy case but always causes a system breakdown when something doesnt follow the normal flow.
The only way to ensure that your code is correct is to have testcases with 100% coverage, that means also testing the error handling, which in your example probably would cause a segmentation fault on your program.