I'm looking for clarification on this bit of code. The call to A::hello() works (I expected a segv). The segfault does come through on the access to member x, so it seems like the method resolution alone doesn't actually dereference bla?
I compiled with optimization off, gcc 4.6.3. Why doesn't bla->hello() blow up? Just wonderin' what's going on. Thanks.
class A
{
public:
int x;
A() { cout << "constructing a" << endl; }
void hello()
{
cout << "hello a" << endl;
}
};
int main()
{
A * bla;
bla = NULL;
bla->hello(); // prints "hello a"
bla->x = 5; // segfault
}
Your program exhibits undefined behavior. "Seems to work" is one possible manifestation of undefined behavior.
In particular, bla->hello() call appears to work because hello() doesn't actually use this in any way, so it just happens not to notice that this is not a valid pointer.
You are dereferencing NULL pointer, i.e. trying to access object stored at address NULL:
bla = NULL;
bla->hello();
bla->x = 5;
which results in undefined behavior, which means that anything can happen, including the seg fault while assigning 5 to the member x and also including deceptive "works as expected" effect while invoking the hello method.
At least in the typical implementation, when you call a non-virtual member function via a pointer, that pointer is not dereferenced to find the function.
The type of the pointer is used to determine the scope (context) in which to search for the name of the function, but that happens entirely at compile time. What the pointer points at (including "nothing", in the case of a null pointer) is irrelevant to finding the function itself.
After the compiler finds the correct function, it typically translates a call like a->b(c); into something roughly equivalent to: b(a, c); -- a then becomes the value of this inside the function. When the function refers to data in the object, this is dereferenced to find the correct object, then an offset is normally applied to find the correct item in that object.
If the member function never attempts to use any of the object's members, the fact that this is a null pointer doesn't affect anything.
If, on the other hand, the member function does attempt to use a member of the object, it'll attempt to dereference this to do that, and if this is a NULL pointer, that won't work. Likewise, calling a virtual function always uses the vtable pointer, which is in the object, so attempting to call a virtual function via a null pointer can be expected to fail (regardless of whether the code in the virtual function refers to data in the object or not).
As I said to start with, I'm talking about the typical implementation here. From the viewpoint of the standard, you simply have undefined behavior, and that's the end of it. In theory, an implementation doesn't have to work the way I've described. In reality, however, essentially all the reasonably popular implementations of C++ (e.g., MS VC++, g++, Clang, and Intel) all work very similarly in these respects.
As long as a member function doesn't dereference this, it's usually "safe" to call it, but you're then in undefined behavior land.
In theory, this is undefined behavior. In practice, you do not use this pointer when calling hello(), it does not reference your class at all, and therefore works and does not generate memory access violation. When you do bla->x, however, you are trying to reference a memory though bla pointer which is uninitialized, and it crashes. Again, even in this case there is no guarantee that it will crash, this is undefined behavior.
Related
A question came up here on SO asking "Why is this working" when a pointer became dangling. The answers were that it's UB, which means it may work or not.
I learned in a tutorial that:
#include <iostream>
struct Foo
{
int member;
void function() { std::cout << "hello";}
};
int main()
{
Foo* fooObj = nullptr;
fooObj->member = 5; // This will cause a read access violation but...
fooObj->function(); // Because this doesn't refer to any memory specific to
// the Foo object, and doesn't touch any of its members
// It will work.
}
Would this be the equivalent of:
static void function(Foo* fooObj) // Foo* essentially being the "this" pointer
{
std::cout << "Hello";
// Foo pointer, even though dangling or null, isn't touched. And so should
// run fine.
}
Am I wrong about this? Is it UB even though as I explained just calling a function and not accessing the invalid Foo pointer?
You're reasoning about what happens in practice. Undefined behavior is allowed to do the thing you expect... but it is not guaranteed.
For the non-static case, this is straightforward to prove using the rule found in [class.mfct.non-static]:
If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.
Note that there's no consideration about whether the non-static member function accesses *this. The object is simply required to have the correct dynamic type, and *(Foo*)nullptr certainly does not.
In particular, even on platforms which use the implementation you describe, the call
fooObj->func();
gets converted to
__assume(fooObj); Foo_func(fooObj);
and is optimization-unstable.
Here's an example which will work contrary to your expectations:
int main()
{
Foo* fooObj = nullptr;
fooObj->func();
if (fooObj) {
fooObj->member = 5; // This will cause a read access violation!
}
}
On real systems, this is likely to end up with an access violation on the commented line, because the compiler used the fact that fooObj can't be null in fooObj->func() to eliminate the if test following it.
Don't do things that are UB even if you think you know what your platform does. Optimization instability is real.
Also, the Standard is even more restrictive that you might think. This will also cause UB:
struct Foo
{
int member;
void func() { std::cout << "hello";}
static void s_func() { std::cout << "greetings";}
};
int main()
{
Foo* fooObj = nullptr;
fooObj->s_func(); // well-formed call to static member,
// but unlike Foo::s_func(), it requires *fooObj to be a valid object of type Foo
}
The relevant portions of the Standard are found in [expr.ref]:
The expression E1->E2 is converted to the equivalent form (*(E1)).E2
and the accompanying footnote
If the class member access expression is evaluated, the subexpression evaluation happens even if the result is unnecessary to determine the value of the entire postfix expression, for example if the id-expression denotes a static member.
This means that the code in question definitely evaluates (*fooObj), attempting to create a reference to a non-existent object. There have been several proposals to make this allowed and only forbid allowing lvalue->rvalue conversion on such a reference, but those have been rejected this far; even forming the reference is illegal in all versions of the Standard to date.
In practice this is usually how major compilers implement member functions, yes. This means that your test program would probably appear to run "just fine".
Having said that, dereferencing a pointer pointing to nullptr is undefined behavior which means that all bets are off and the whole program and it's output is meaningless, anything could happen.
You can never rely on this behavior, optimizers in particular could mess all of this code up because they're allowed to assume that fooObj is never nullptr.
Compiler isn't obliged by standard to implement member function by passing it a pointer to the class instance. Yes, there is pseudo-pointer "this", but it is unrelated element, guaranteed to be "understood".
nullptr pointer doesn't point on any existing object, and -> () calls a member of that object. From standard's view, this is nonsense and result of such operation is undefined (and potentially, catastrophic).
If function() would be virtual, then call is allowed to fail, because address of function would be unavailable (vtable might be implemented as part of object and doesn't exist if object doesn't).
if the member function (method) behaves like that and meant to be called like that it should be a static member function (method). Static method doesn't access non-static fields and doesn't call non-static methods of class. If it is static, the call could look like this as well:
Foo::function();
A question came up here on SO asking "Why is this working" when a pointer became dangling. The answers were that it's UB, which means it may work or not.
I learned in a tutorial that:
#include <iostream>
struct Foo
{
int member;
void function() { std::cout << "hello";}
};
int main()
{
Foo* fooObj = nullptr;
fooObj->member = 5; // This will cause a read access violation but...
fooObj->function(); // Because this doesn't refer to any memory specific to
// the Foo object, and doesn't touch any of its members
// It will work.
}
Would this be the equivalent of:
static void function(Foo* fooObj) // Foo* essentially being the "this" pointer
{
std::cout << "Hello";
// Foo pointer, even though dangling or null, isn't touched. And so should
// run fine.
}
Am I wrong about this? Is it UB even though as I explained just calling a function and not accessing the invalid Foo pointer?
You're reasoning about what happens in practice. Undefined behavior is allowed to do the thing you expect... but it is not guaranteed.
For the non-static case, this is straightforward to prove using the rule found in [class.mfct.non-static]:
If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.
Note that there's no consideration about whether the non-static member function accesses *this. The object is simply required to have the correct dynamic type, and *(Foo*)nullptr certainly does not.
In particular, even on platforms which use the implementation you describe, the call
fooObj->func();
gets converted to
__assume(fooObj); Foo_func(fooObj);
and is optimization-unstable.
Here's an example which will work contrary to your expectations:
int main()
{
Foo* fooObj = nullptr;
fooObj->func();
if (fooObj) {
fooObj->member = 5; // This will cause a read access violation!
}
}
On real systems, this is likely to end up with an access violation on the commented line, because the compiler used the fact that fooObj can't be null in fooObj->func() to eliminate the if test following it.
Don't do things that are UB even if you think you know what your platform does. Optimization instability is real.
Also, the Standard is even more restrictive that you might think. This will also cause UB:
struct Foo
{
int member;
void func() { std::cout << "hello";}
static void s_func() { std::cout << "greetings";}
};
int main()
{
Foo* fooObj = nullptr;
fooObj->s_func(); // well-formed call to static member,
// but unlike Foo::s_func(), it requires *fooObj to be a valid object of type Foo
}
The relevant portions of the Standard are found in [expr.ref]:
The expression E1->E2 is converted to the equivalent form (*(E1)).E2
and the accompanying footnote
If the class member access expression is evaluated, the subexpression evaluation happens even if the result is unnecessary to determine the value of the entire postfix expression, for example if the id-expression denotes a static member.
This means that the code in question definitely evaluates (*fooObj), attempting to create a reference to a non-existent object. There have been several proposals to make this allowed and only forbid allowing lvalue->rvalue conversion on such a reference, but those have been rejected this far; even forming the reference is illegal in all versions of the Standard to date.
In practice this is usually how major compilers implement member functions, yes. This means that your test program would probably appear to run "just fine".
Having said that, dereferencing a pointer pointing to nullptr is undefined behavior which means that all bets are off and the whole program and it's output is meaningless, anything could happen.
You can never rely on this behavior, optimizers in particular could mess all of this code up because they're allowed to assume that fooObj is never nullptr.
Compiler isn't obliged by standard to implement member function by passing it a pointer to the class instance. Yes, there is pseudo-pointer "this", but it is unrelated element, guaranteed to be "understood".
nullptr pointer doesn't point on any existing object, and -> () calls a member of that object. From standard's view, this is nonsense and result of such operation is undefined (and potentially, catastrophic).
If function() would be virtual, then call is allowed to fail, because address of function would be unavailable (vtable might be implemented as part of object and doesn't exist if object doesn't).
if the member function (method) behaves like that and meant to be called like that it should be a static member function (method). Static method doesn't access non-static fields and doesn't call non-static methods of class. If it is static, the call could look like this as well:
Foo::function();
So I'm trying to learn a bit more about the differences between C-style-casts, static_cast, dynamic_cast and I decided to try this example which should reflect the differences between C-style-casts and static_cast pretty good.
class B
{
public:
void hi() { cout << "hello" << endl; }
};
class D: public B {};
class A {};
int main()
{
A* a = new A();
B* b = (B*)a;
b->hi();
}
Well this code snippet should reflect that the C-style-cast goes very wrong and the bad cast is not detected at all. Partially it happens that way. The bad cast is not detected, but I was surprised when the program, instead of crashing at b->hi();, it printed on the screen the word "hello".
Now, why is this happening ? What object was used to call such a method, when there's no B object instantiated ? I'm using g++ to compile.
As others said it is undefined behaviour.
Why it is working though? It is probably because the function call is linked statically, during compile-time (it's not virtual function). The function B::hi() exists so it is called. Try to add variable to class B and use it in function hi(). Then you will see problem (trash value) on the screen:
class B
{
public:
void hi() { cout << "hello, my value is " << x << endl; }
private:
int x = 5;
};
Otherwise you could make function hi() virtual. Then function is linked dynamically, at runtime and program crashes immediately:
class B
{
public:
virtual void hi() { cout << "hello" << endl; }
};
This only works because of the implementation of the hi() method itself, and the peculiar part of the C++ spec called undefined behaviour.
Casting using a C-style cast to an incompatible pointer type is undefined behaviour - literally anything at all could happen.
In this case, the compiler has obviously decided to just trust you, and has decided to believe that b is indeed a valid pointer to an instance of B - this is in fact all a C-style cast will ever do, as they involve no runtime behaviour. When you call hi() on it, the method works because:
It doesn't access any instance variables belonging to B but not A (indeed, it doesn't access any instance variables at all)
It's not virtual, so it doesn't need to be looked up in b's vtable to be called
Therefore it works, but in almost all non-trivial cases such a malformed cast followed by a method call would result in a crash or memory corruption. And you can't rely on this kind of behaviour - undefined doesn't mean it has to be the same every time you run it, either. The compiler is perfectly within its rights with this code to insert a random number generator and, upon generating a 1, start up a complete copy of the original Doom. Keep that firmly in mind whenever anything involving undefined behaviour appears to work, because it might not work tomorrow and you need to treat it like that.
Now, why is this happening ?
Because it can happen. Anything can happen. The behaviour is undefined.
The fact that something unexpected happened demonstrates well why UB is so dangerous. If it always caused a crash, then it would be far easier to deal with.
What object was used to call such a method
Most likely, the compiler blindly trusts you, and assumes that b does point to an object of type B (which it doesn't). It probably would use the pointed memory as if the assumption was true. The member function didn't access any of the memory that belongs to the object, and the behaviour happened to be the same as if there had been an object of correct type.
Being undefined, the behaviour could be completely different. If you try to run the program again, the standard doesn't guarantee that demons won't fly out of your nose.
Amazingly people may call it feature but I use to say it another bug of C++ that we can call member function through pointer without assigning any object. See following example:
class A{
public:
virtual void f1(){cout<<"f1\n";}
void f2(){cout<<"f2\n";};
};
int main(){
A *p=0;
p->f2();
return 0;
}
Output:
f2
We have checked this in different compilers & platforms but result is same, however if we call virtual function through pointer without object then there occur run-time error. Here reason is obvious for virtual function when object is checked it is not found so there comes error.
This is not a bug. You triggered an Undefined Behavior. You may get anything including the result you expected.
Dereferencing a NULL pointer is undefined behavior.
BTW, there is no thing such as "bug of C++". The bugs may occur in C++ Compilers not in the language it self.
As pointed out, this is undefined behaviour, so anything goes.
To answer the question in terms of the implementation, why do you see this behaviour?
The non-virtual call is implemented as just an ordinary function call, with the this pointer (value null) passed in as paremeter. The parameter is not dereferenced (as no member variables are used), so the call succeeds.
The virtual call requires a lookup in the vtable to get the adress of the actual function to call. The vtable address is stored in a pointer in the data of the object itself. Thus to read it, a de-reference of the this pointer is required - segmentation fault.
When you create a class by
class A{
public:
virtual void f1(){cout<<"f1\n";}
void f2(){cout<<"f2\n";};
};
The Compiler puts the code of member functions in the text area.
When you do p->MemberFunction() then the compiler just deferences p and tries to find the function MemberFunction using the type information of p which is Class A.
Now since the function's code exists in the text area so it is called. If the function had references to some class variables then while accessing them, you might have gotten a Segmentation Fault as there is no object, but since that is not the case, hence the function executes properly.
NOTE: It all depends on how a compiler implements member function access. Some compiler may choose to see if the pointer of object is null before accessing the member function, but then the pointer may have some garbage value instead of 0 which a compiler cannot check, so generally compilers ignore this check.
You can achieve a lot with undefined behavior. You can even call a function which only takes 1 argument and receive the second one like this:
#include <iostream>
void Func(int x)
{
uintptr_t ptr = reinterpret_cast<uintptr_t>(&x) + sizeof(x);
uintptr_t* sPtr = (uintptr_t*)ptr;
const char* secondArgument = (const char*)*sPtr;
std::cout << secondArgument << std::endl;
}
int main()
{
typedef void(*PROCADDR)(int, const char*);
PROCADDR ext_addr = reinterpret_cast<PROCADDR>(&Func);
//call the function
ext_addr(10, "arg");
return 0;
}
Compile and run under windows and you will get "arg" as result for the second argument. This is not a fault within C++, it is just plain stupid on my part :)
This will work on most compilers. When you make a call to a method (non virtual), the compiler translates:
obj.foo();
to something:
foo(&obj);
Where &obj becomes the this pointer for foo method. When you use a pointer:
Obj *pObj = NULL;
pObj->foo();
For the compiler it is nothing but:
foo(pObj);
i.e.:
foo(NULL);
Calling any function with null pointer is not a crime, the null pointer (i.e. pointer having null value) will be pushed to call stack. It is up to the target function to check if null was passed to it. It is like calling:
strlen(NULL);
Which will compile, and also run, if it is handled:
size_t strlen(const char* ptr) {
if (ptr==NULL) return 0;
... // rest of code if `ptr` is not null
}
Thus, this is very much valid:
((A*)NULL)->f2();
As long as f2 is non-virtual, and if f2 doesn't read/write anything out of this, including any virtual function calls. Static data and function access will still be okay.
However, if method is virtual, the function call is not as simple as it appears. Compiler puts some additional code to perform late binding of given function. The late binding is totally based on what is being pointed by this pointer. It is compiler dependent, but a call like:
obj->virtual_fun();
Will involve looking up the current type of obj by virtual function table lookup. therefore, obj must not be null.
Mistakenly I wrote something daft, which to my surprise worked.
class A
{ public:
void print()
{
std::cout << "You can't/won't see me !!!" << std::endl;
}
A* get_me_the_current_object()
{
return this;
}
};
int main()
{
A* abc = dynamic_cast<A*>(abc);
abc->print();
}
Here, A* abc = dynamic_cast<A*>(abc), I am doing dynamic_cast on a pointer, which isn't declared. But, it works, so I assumed that the above statement is broken as:
A* abc;
abc = dynamic_cast<A*>(abc);
and therefore, it works. However, on trying some more weird scenarios such as:
A* abc;
abc->print();
and, further
A* abc = abc->get_me_the_current_object();
abc->print();
I was flabbergasted, looking at how these examples worked and the mapping was done.
Can someone please elaborate on how these are working? Thanks in advance.
You've made the common mistake of thinking that undefined behaviour and C++ bugs mean you should expect to see a dramatic crash or your computer catching fire. Sometimes nothing happens. That doesn't mean the code "works", because it's still got a bug, it's just that the symptoms haven't shown up ... yet.
But, it works, so I assumed that the above statement is broken as:
Yes, all you're doing is converting an uninitialized pointer to the same type, i.e. no conversion needed, so the compiler does nothing. Your pointer is still the same type and is still uninitialized.
This is similar to this:
int i = i;
This is valid according to the grammar of C++, because i is in scope at that point, but is undefined because it copies an uninitialized object. It's unlikely to actually set your computer on fire though, it appears to "work".
Can someone please elaborate on how these are working?
Technically you're dereferencing an invalid pointer, which is undefined behaviour, but since your member functions don't actually use any members of the object, the invalid this pointer is not dereferenced, so the code "works" (or at least appears to.)
This is similar to:
void function(A* that)
{
std::cout << "Hello, world!\n";
}
A* p;
function(p);
Because the that pointer is not used (like the this pointer is not used in your member functions) this doesn't necessarily crash, although it might do on implementations where even copying an uninitialized pointer could cause a hardware fault. In your example it seems that your compiler doesn't need to dereference abc to call a non-static member function, and passing it as the hidden this parameter does not cause a hardware fault, but the behaviour is still undefined even though it doesn't fail in an obvious way such as a segfault..
abc is uninitialized and points to an undefined location in memory, but your methods don't read anything from *this so they won't crash.
The fact that they won't crash is almost certainly implementation defined behavior though.