Why does C++ treat an uninitialized raw pointer as true? - c++

Why does the following code produce a segfault?
//somewhere in main
...
int *pointer;
if(pointer)
cout << *pointer;
...
But the following slightly changed code doesn't:
//somewhere in main
...
int *pointer = nullptr;
if(pointer)
cout << *pointer;
...
The question is: what in C++ makes an uninitialized pointer evaluate to true and lead to a crash?

Why does C++ recognize an uninitialized raw pointer (or say a daemon) as true?
The behaviour may appear to be so, because the behaviour of the program is undefined.
Why does the following code produce a SEGMENTATION FAULT!!!
Because the behaviour of the program is undefined, and that is one of the possible behaviours.
But the following slightly changed code doesn't
Because you don't read an indeterminate value in the changed program, and the behaviour of that program is well defined, and the defined behaviour is that the if-statement won't be entered.
In conclusion: Don't read uninitialised variables. Otherwise you'll end up with a broken, useless program.
Although a compiler isn't required to diagnose undefined behaviour for you, luckily high-quality compilers are able to detect such a simple mistake. Here is some example output:
warning: 'pointer' is used uninitialized [-Wuninitialized]
if(pointer)
^~
Compilers generally cannot detect all complex violations. However, runtime sanitisers can detect even complex cases. Example output:
==1==WARNING: MemorySanitizer: use-of-uninitialized-value
Aside from reading uninitialised values, even if it was initialised, if (pointer) doesn't necessarily mean that you're allowed to indirect through the pointer. It only means that the pointer isn't null. Besides null, other pointer values can be unsafe to indirect through.
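To make that last point concrete, here is a minimal sketch. The helper names read_or and past_the_end_is_truthy are made up for illustration. A one-past-the-end pointer is a valid, non-null pointer value, so it passes an if (pointer) test, yet it must never be dereferenced. The sketch only tests the pointer value and never indirects through it.

```cpp
#include <cassert>

// Hypothetical helper: reads through p only when it is non-null.
// A non-null p must still point to a live int; this check cannot verify that.
int read_or(const int* p, int fallback) {
    return p ? *p : fallback;
}

// Returns true when a one-past-the-end pointer converts to true in a condition.
bool past_the_end_is_truthy() {
    int values[3] = {1, 2, 3};
    const int* end = values + 3;  // valid to form and compare, not to dereference
    return end ? true : false;
}
```

Initialising every pointer (to nullptr if nothing better is available) makes the null check meaningful, but it is still up to the program to guarantee that a non-null pointer refers to a live object.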

Because your uninitialized pointer gets implicitly converted to a boolean, where a null pointer (0) converts to false and every other value converts to true.

Related

Initialisation of pointer with "nullptr", "NULL", and "0" results in segmentation error [duplicate]

According to ISO C++, dereferencing a null pointer is undefined behaviour. My question is: why? Why has the standard decided to declare it undefined behaviour? What is the rationale behind this decision? Compiler dependency? It doesn't seem so, because according to the C99 standard, as far as I know, it is well defined. Machine dependency? Any ideas?
Defining consistent behavior for dereferencing a NULL pointer would require the compiler to check for NULL pointers before each dereference on most CPU architectures. This is an unacceptable burden for a language that is designed for speed.
It also only fixes a small part of a larger problem - there are many ways to have an invalid pointer beyond a NULL pointer.
The primary reason is that by the time they wrote the original C standard there were a number of implementations that allowed it, but gave conflicting results.
On the PDP-11, it happened that address 0 always contained the value 0, so dereferencing a null pointer also gave the value 0. Quite a few people who used these machines felt that since they were the original machine C had been written on/used to program, that this should be considered canonical behavior for C on all machines (even though it originally happened quite accidentally).
On some other machines (Interdata comes to mind, though my memory could easily be wrong) address 0 was put to normal use, so it could contain other values. There was also some hardware on which address 0 was actually some memory-mapped hardware, so reading/writing it did special things -- not at all equivalent to reading/writing normal memory.
The camps wouldn't agree on what should happen, so they made it undefined behavior.
Edit: I suppose I should add that by the time they wrote the C++ standard, its being undefined behavior was already well established in C, and (apparently) nobody thought there was a good reason to create a conflict on this point, so they kept it the same.
The only way to give defined behaviour would be to add a runtime check to every pointer dereference, and every pointer arithmetic operation. In some situations, this overhead would be unacceptable, and would make C++ unsuitable for the high-performance applications it's often used for.
C++ allows you to create your own smart pointer types (or use ones supplied by libraries), which can include such a check in cases where safety is more important than performance.
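As a rough sketch of what such a checking pointer type could look like (checked_ptr and dereference_throws are hypothetical names, not a standard or library type), the wrapper below does not own the pointee and throws on a null dereference instead of invoking undefined behaviour. It only catches null; a dangling pointer would still slip through.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical non-owning wrapper: checks for null on every dereference.
template <typename T>
class checked_ptr {
public:
    explicit checked_ptr(T* p = nullptr) : ptr_(p) {}

    T& operator*() const {
        if (!ptr_) throw std::logic_error("null pointer dereference");
        return *ptr_;
    }

    T* operator->() const {
        if (!ptr_) throw std::logic_error("null pointer dereference");
        return ptr_;
    }

    explicit operator bool() const { return ptr_ != nullptr; }

private:
    T* ptr_;
};

// Returns true if dereferencing p throws rather than yielding a reference.
template <typename T>
bool dereference_throws(const checked_ptr<T>& p) {
    try {
        (void)*p;
        return false;
    } catch (const std::logic_error&) {
        return true;
    }
}
```

The price is a branch on every access, which is exactly the overhead the language declines to impose on raw pointers.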
Dereferencing a null pointer is also undefined in C, according to clause 6.5.3.2/4 of the C99 standard.
This answer from @Johannes Schaub - litb puts forward an interesting rationale, which seems pretty convincing.
The formal problem with merely dereferencing a null pointer is that determining the identity of the resulting lvalue expression is not possible: Each such expression that results from dereferencing a pointer must unambiguously refer to an object or a function when that expression is evaluated. If you dereference a null pointer, you don't have an object or function that this lvalue identifies. This is the argument the Standard uses to forbid null-references.
Another problem that adds to the confusion is that the semantics of the typeid operator make part of this misery well defined. It says that if it was given an lvalue that resulted from dereferencing a null pointer, the result is throwing a bad_typeid exception. This, though, is a limited area where there exists an exception (no pun) to the above problem of finding an identity. Other cases exist where a similar exception to undefined behavior is made (although much less subtle and with a reference on the affected sections).
The committee discussed solving this problem globally, by defining a kind of lvalue that does not have an object or function identity: the so-called empty lvalue. That concept, however, still had problems, and they decided not to adopt it.
Note:
Marking this as community wiki, since the answer & the credit should go to the original poster. I am just pasting the relevant parts of the original answer here.
The real question is, what behavior would you expect ?
A null pointer is, by definition, a singular value that represents the absence of an object. The result of dereferencing a pointer is to obtain a reference to the object pointed to.
So how do you get a good reference... from a pointer that points into the void ?
You do not. Thus the undefined behavior.
I suspect it's because if the behavior is well-defined the compiler has to insert code anywhere pointers are dereferenced. If it's implementation defined then one possible behavior could still be a hard crash. If it's unspecified then either the compilers for some systems have extra undue burden or they may generate code that causes hard crashes.
Thus to avoid any possible extra burden on compilers they left the behavior undefined.
Sometimes you need an invalid pointer (also see MmBadPointer on Windows), to represent "nothing".
If everything was valid, then that wouldn't be possible. So they made NULL invalid, and disallowed you from dereferencing it.
Here is a simple test & example:
Allocate a pointer:
int * pointer;
? What value is in the pointer when it is created?
? What is the pointer pointing to?
? What happens when I dereference this pointer in its current state?
Marking the end of a linked list.
In a linked list, a node points to another node, except for the last.
What is the value of the pointer in the last node?
What happens when you dereference the "next" field of the last node?
There needs to be a value that indicates a pointer is not pointing to anything or that it's in an invalid state. This is where the NULL pointer concept comes into play. The linked list can use a NULL pointer to indicate the end of the list.
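The linked-list case can be sketched as follows. Node and length are illustrative names, not taken from any particular library. The next field of the last node holds nullptr, and the traversal stops as soon as it sees that value, so the null pointer is compared but never dereferenced.

```cpp
#include <cassert>
#include <cstddef>

struct Node {
    int value;
    Node* next;  // nullptr marks the end of the list
};

// Counts nodes by following next pointers until the null terminator.
std::size_t length(const Node* head) {
    std::size_t count = 0;
    for (const Node* n = head; n != nullptr; n = n->next) {
        ++count;
    }
    return count;
}
```

An empty list is simply a null head pointer, for which the function returns 0.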
Arguments have been made elsewhere that having well-defined behaviour for null-pointer dereferences is impossible without a lot of overhead, which I think is true. This is because AFAIU "well-defined" here also means "portable". If you did not treat nullptr dereferences specially, you would end up generating instructions that simply try to read address 0, but that produces different behaviour on different processors, so that would not be well-defined.
So, I guess this is why dereferencing nullptr (and probably also other invalid pointers) is marked as undefined.
I do wonder why this is undefined rather than unspecified or implementation-defined, which are distinct from undefined behaviour, but require more consistency.
In particular, when a program triggers undefined behaviour, the compiler can do pretty much anything (e.g. throw away your entire program maybe?) and still be considered correct, which is somewhat problematic. In practice, you would expect that compilers would just compile a null-pointer-dereference to a read of address zero, but with modern optimizers becoming better, but also more sensitive to undefined behaviour, I think, they sometimes do things that end up more thoroughly breaking the program. E.g. consider the following:
matthijs@grubby:~$ cat test.c
unsigned foo () {
unsigned *foo = 0;
return *foo;
}
matthijs@grubby:~$ arm-none-eabi-gcc -c test.c -Os && objdump -d test.o
test.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <foo>:
0: e3a03000 mov r3, #0
4: e5933000 ldr r3, [r3]
8: e7f000f0 udf #0
This program just dereferences and accesses a null pointer, which results in an "Undefined instruction" being generated (halting the program at runtime).
This might be ok when this is an accidental nullpointer dereference, but in this case I was actually writing a bootloader that needs to read address 0 (which contains the reset vector), so I was quite surprised this happened.
So, not so much an answer, but some extra perspective on the matter.
According to the original C standard, the internal representation of a null pointer can be any value - not necessarily zero.
The language definition states that for each pointer type, there is a special value - the "null pointer" - which is distinguishable from all other pointer values and which is "guaranteed to compare unequal to a pointer to any object or function." That is, a null pointer points definitively nowhere; it is not the address of any object or function.
There is a null pointer for each pointer type, and the internal values of null pointers for different types may be different.
(From http://c-faq.com/null/null1.html)
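A small sketch of what this means for code (make_null and is_null are hypothetical helper names): the program should produce and test null pointers through the language's own null pointer constants rather than by assuming a particular bit pattern. Value-initialization and comparison against 0 or nullptr work regardless of how the implementation represents a null pointer internally.

```cpp
#include <cassert>

// Value-initialization yields a null pointer, whatever bit pattern the
// implementation uses to represent it.
int* make_null() {
    int* p{};
    return p;
}

// The literal 0 here is a null pointer constant, so this compares against
// the null pointer value, not against "address zero".
bool is_null(const int* p) {
    return p == 0;
}
```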
Although dereferencing a NULL pointer in C/C++ indeed leads to undefined behavior from the language standpoint, such an operation is well defined in compilers for targets that have memory at the corresponding address. In this case, the result of the operation is simply a read of the memory at address 0.
Also, many compilers will allow you to dereference a NULL pointer as long as you don't bind the referenced value. This is done to provide compatibility with non-conforming yet widespread code, like
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
There was even a discussion to make this behaviour part of the standard.
Because you cannot create a null reference. C++ doesn't allow it. Therefore you cannot dereference a null pointer.
Mainly it is undefined because there is no logical way to handle it.
You can actually dereference a null pointer. Someone did it here: http://www.codeproject.com/KB/system/soviet_kernel_hack.aspx

What happens when I do int*p=p in c/cpp?

The code below compiles with MinGW. How does it get compiled? How is it possible to assign a variable which is not yet created?
int main()
{
int*p=p;
return 0;
}
How does it get compiled?
The point of declaration of a variable starts at the end of its declarator, but before its initialiser. This allows more legitimate self-referential declarations like
void * p = &p;
as well as undefined initialisations like yours.
How is it possible to assign a variable which is not yet created?
There is no assignment here, just initialisation.
The variable has been created (in the sense of having storage allocated for it), but not initialised. You initialise it from whatever indeterminate value happened to be in that storage, with undefined behaviour.
Most compilers will give a warning or error about using uninitialised values, if you ask them to.
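For contrast, here is a small sketch of the legitimate self-referential form mentioned above (points_to_itself is an illustrative name). Taking the address of p inside its own initialiser is fine, because the address of the storage is known even though no value has been stored in it yet. Nothing indeterminate is read.

```cpp
#include <cassert>

// The name p is already in scope inside its own initialiser, so &p is valid:
// only the address is used, never the not-yet-initialised value.
bool points_to_itself() {
    void* p = &p;
    return p == static_cast<void*>(&p);
}
```

By comparison, int*p=p; copies the indeterminate value held in p's storage, which is what makes it undefined.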
Let's take a look at what happens with the int*p=p; statement:
The compiler allocates space on the stack to hold the yet uninitialized value of variable p
Then the compiler initializes p with its uninitialized value
So, essentially there should be no problem with the code except that it assigns a variable an uninitialized value.
Actually, it is not much different from the following code:
int *q; // define a pointer and do not initialize it
int *p = q; // copy the value of the uninitialized pointer to another pointer
The likely result ("what it compiles to") will be the declaration of a pointer variable that is not initialized at all (which is subsequently optimized out since it is not used, so the net result would be "empty main").
The pointer is declared and initialized. So far, this is an ordinary and legal thing. However, it is initialized to itself, and its value is only in a valid, initialized state after the end of the statement (that is, at the location of the semicolon).
This, unsurprisingly, makes the statement undefined behavior.
By definition, invoking undefined behavior could in principle cause just about everything (although often quoted dramatic effects like formatting your harddrive or setting the computer on fire are exaggerated).
The compiler might actually generate an instruction that moves a register (or memory location) to itself, which would be a no-op instruction on most architectures, but could cause a hardware exception killing your process on some exotic architectures which have special validating registers for pointers (in case the "random" value is incidentally an invalid address).
The compiler will however not insert any "format harddisk" statements.
In practice, optimizing compilers will nowadays often assume "didn't happen" when they encounter undefined behavior, so it is most likely that the compiler will simply honor the declaration, and do nothing else.
This is, in every sense, perfectly allowable in the light of undefined behavior. Further, it is the easiest and least troublesome option for the compiler.

Does taking address of member variable through a null pointer yield undefined behavior?

The following code (or its equivalent which uses explicit casts of null literal to get rid of temporary variable) is often used to calculate the offset of a specific member variable within a class or struct:
class Class {
public:
int first;
int second;
};
Class* ptr = 0;
size_t offset = reinterpret_cast<char*>(&ptr->second) -
reinterpret_cast<char*>(ptr);
&ptr->second looks like it is equivalent to the following:
&(ptr->second)
which in turn is equivalent to
&((*ptr).second)
which dereferences an object instance pointer and yields undefined behavior for null pointers.
So is the original fine or does it yield UB?
Despite the fact that it does nothing, char* foo = 0; *foo; could be undefined behavior.
Dereferencing a null pointer could be undefined behavior. And yes, ptr->foo is equivalent to (*ptr).foo, and *ptr dereferences a null pointer.
There is currently an open issue in the working groups about if *(char*)0 is undefined behavior if you don't read or write to it. Parts of the standard imply it is, other parts imply it is not. The current notes there seem to lean towards making it defined.
Now, this is in theory. How about in practice?
Under most compilers, this works because no checks are done at dereferencing time: the memory around where the null pointer points is guarded against access, and the above expression simply takes the address of something near null; it does not read or write the value there.
This is why cpp reference offsetof lists pretty much that trick as a possible implementation. The fact that some (many? most? every one I've checked?) compilers implement offsetof in a similar or equivalent manner does not mean that the behavior is well defined under the C++ standard.
However, given the ambiguity, compilers are free to add checks at every instruction that dereferences a pointer, and execute arbitrary code (fail fast error reporting, for example) if null is indeed dereferenced. Such instrumentation might even be useful to find bugs where they occur, instead of where the symptom occurs. And on systems where there is writable memory near 0 such instrumentation could be key (pre-OSX MacOS had some writable memory that controlled system functions near 0).
Such compilers could still write offsetof that way, and introduce pragmas or the like to block the instrumentation in the generated code. Or they could switch to an intrinsic.
Going a step further, C++ leaves lots of latitude on how non-standard-layout data is arranged. In theory, classes could be implemented as rather complex data structures and not the nearly standard-layout structures we have grown to expect, and the code would still be valid C++. Accessing member variables of non-standard-layout types and taking their address could be problematic: I do not know if there is any guarantee that the offset of a member variable in a non-standard-layout type does not change between instances!
Finally, some compilers have aggressive optimization settings that find code that executes undefined behavior (at least under certain branches or conditions), and uses that to mark that branch as unreachable. If it is decided that null dereference is undefined behavior, this could be a problem. A classic example is gcc's aggressive signed integer overflow branch eliminator. If the standard dictates something is undefined behavior, the compiler is free to consider that branch unreachable. If the null dereference is not behind a branch in a function, the compiler is free to declare all code that calls that function to be unreachable, and recurse.
And it would be free to do this in not the current, but the next version of your compiler.
Writing code that is standards-valid is not just about writing code that compiles today cleanly. While the degree to which dereferencing and not using a null pointer is defined is currently ambiguous, relying on something that is only ambiguously defined is risky.
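One way to sidestep the question for standard-layout types is to let the implementation compute the offset through the offsetof macro from <cstddef>. A minimal sketch, where Pair and offset_of_second are illustrative names:

```cpp
#include <cassert>
#include <cstddef>

// Standard-layout type, so offsetof is fully supported for its members.
struct Pair {
    int first;
    int second;
};

// Uses the standard macro instead of forming &ptr->second from a null pointer.
constexpr std::size_t offset_of_second() {
    return offsetof(Pair, second);
}
```

However the implementation defines offsetof internally (null-pointer trick, intrinsic, or something else), the user's code stays within what the standard specifies.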

Assigning a reference by dereferencing a NULL pointer

int& fun()
{
int * temp = NULL;
return *temp;
}
In the above function, I am trying to dereference a NULL pointer. When I call this function, it does not give an exception. I found that when the return type is a reference it does not give an exception, but when it returns by value it does. Even when the dereferenced NULL pointer is assigned to a reference (as in the lines below), it still does not.
int* temp = NULL;
int& temp1 = *temp;
So my question is: does the compiler not perform the dereference when the result is bound to a reference?
Dereferencing a null pointer is Undefined Behavior.
Undefined behavior means anything can happen, so it is not possible to define a behavior for this.
Admittedly, I am going to add this C++ standard quote for the nth time, but seems it needs to be.
Regarding Undefined Behavior,
C++ Standard section 1.3.24 states:
Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
NOTE:
Also, just to bring it to your notice:
Using a returned reference or pointer to a variable that is local to a function is also undefined behavior. You should allocate the object on the free store (heap) using new and then return a reference/pointer to it.
EDIT:
As @James McNellis appropriately points out in the comments,
If the returned pointer or reference is not used, the behavior is well defined.
When you dereference a null pointer, you don't necessarily get an exception; all that is guaranteed is that the behavior is undefined (which really means that there is no guarantee at all as to what the behavior is).
Once the *temp expression is evaluated, it is impossible to reason about the behavior of the program.
You are not allowed to dereference a null pointer, so the compiler can generate code assuming that you don't do that. If you do it anyway, the compiler might be nice and tell you, but it doesn't have to. It's your part of the contract that says you must not do it.
In this case, I bet the compiler will be nice and tell you the problem already at compile time, if you just set the warning level properly.
Don't * a null pointer, it's UB. (undefined behavior, you can never assume it'll do anything short of lighting your dog on fire and forcing you to take shrooms which will lead to FABULOUS anecdotes)
Some history and information of null pointers in the Algol/C family: http://en.wikipedia.org/wiki/Pointer_(computing)#Null_pointer
Examples and implications of undefined behavior: http://en.wikipedia.org/wiki/Undefined_behavior#Examples_in_C
I'm not sure I understand what you're trying to do. Dereferencing a NULL pointer is undefined.
If you want to indicate that your function does not always return a value, you can declare it as:
bool fun(int &val);
or the STL way (similar to std::map insert):
std::pair<int, bool> fun();
or the Boost way:
boost::optional<int> fun();
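Since C++17 the standard library offers std::optional, which plays the same role as boost::optional. A minimal sketch under that assumption, where value_of is a hypothetical function standing in for fun:

```cpp
#include <cassert>
#include <optional>

// Hypothetical lookup: returns the pointed-to value, or an empty optional
// when there is nothing to return, instead of a reference bound to *nullptr.
std::optional<int> value_of(const int* p) {
    if (p == nullptr) {
        return std::nullopt;
    }
    return *p;
}
```

The caller then has to check has_value() (or use value_or) before using the result, so the "no value" case is visible in the type rather than hidden behind a null reference.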

Is using an invalid pointer value legal in C?

The following code is undefined behavior in C++ (although it will work okay on almost any widely used implementation):
int* pointer; //uninitialized - likely illegal pointer value
pointer++; //incrementing an illegal pointer is UB
Is the above code legal in C?
It's undefined behavior in C as well because on certain architectures, loading an invalid pointer into a register triggers a hardware fault.
See Is storing an invalid pointer automatically undefined behavior?
It is undefined behavior in C99. The value of pointer is "indeterminate" (6.7.8.10) and an indeterminate value can be a trap value that causes undefinedness when used.
Not legal. Code like this will compile, but with warnings. Don't ignore them. Don't write code like this. It can affect your system in many not so nice ways. My university teacher once told us he managed to erase one machine's BIOS using code with undefined behaviour.