According to the C++ Standard, it's perfectly acceptable to do this:
class P
{
void Method() {}
};
...
P* p = NULL;
p->Method();
However, a slight change to this:
class P
{
virtual void Method() {}
};
...
P* p = NULL;
p->Method();
produces an access violation when compiled with Visual Studio 2005.
As far as I understand, this is caused by some quirk in Microsoft's compiler implementation and not by my sheer incompetence for a change, so the questions are:
1) Does this behavior persist in more recent versions of VS?
2) Are there any, I don't know, compiler settings that prevent this access violation?
According to the C++ Standard, it's perfectly acceptable to do this
No, it is not!
Dereferencing a NULL pointer is Undefined Behavior as per the C++ Standard.[#1]
However, if you do not access any members inside a non-virtual member function, it will most likely work on every implementation. For a non-virtual member function, this only needs to be dereferenced to access members of the object, and since no members are accessed inside the function, nothing is ever dereferenced.
However, just because the observable behavior is okay does not mean the program is correct.
It still has undefined behavior.
It is an invalid program nevertheless.
The second version crashes because, to call a virtual member function, the this pointer must be dereferenced just to find the appropriate function to call, even if no members are accessed within that member function.
A good read:
What's the difference between how virtual and non-virtual member functions are called?
[#1]Reference:
C++03 Standard: §1.9/4
Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. ]
As said by AIs... I'll even explain why: in many C++ implementations the this pointer is simply passed as the first "hidden" parameter of the method. So what you see as
void Method() {}
is really
void Method(P* this) {}
But for virtual methods it's more complex. The runtime needs to read the object that p points to in order to find its "real" (dynamic) type and call the "right" virtual implementation of the method. So it's something like
p->virtualTable->Method(p);
so p is always dereferenced.
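To make the hidden-parameter model concrete, here is a minimal sketch. The class P with a value member and the free functions P_Method and P_Read are hypothetical, written by hand to mimic what a typical implementation might generate; no compiler is required to lower member functions this way. P_Method never reads through self, so calling it with a null pointer is well-defined as a free function, whereas the member call p->Method() through a null p remains undefined behavior in C++ regardless of what the generated code does.

```cpp
#include <cassert>

// Hypothetical class used only for illustration.
struct P {
    int value = 7;
    // A member function that never touches *this.
    int Method() const { return 42; }
    // A member function that reads a data member through this.
    int Read() const { return value; }
};

// Roughly what an implementation might lower P::Method to:
// the object pointer becomes an explicit first parameter.
int P_Method(const P* self) {
    (void)self;  // never dereferenced
    return 42;
}

// Roughly what P::Read might lower to: here self must be dereferenced.
int P_Read(const P* self) {
    return self->value;
}
```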
First of all, neither one will even compile, because you've defined Method as private.
Assuming you make Method public, you end up with undefined behavior in both cases. Based on the typical implementation, most compilers will allow the first to "work" (for a rather loose definition of work) while the second will essentially always fail.
This is because a non-virtual member function is basically a normal function that receives an extra parameter. Inside that function, the keyword this refers to that extra parameter, which is a pointer to the class instance for which the function was invoked. If you invoke the member function via a null pointer, it mostly means that inside that function this will be a null pointer. As long as nothing in the function attempts to dereference this, chances are pretty good that you won't see any noticeable side effects.
A virtual function, however, is basically a function called via a pointer. In a typical implementation, any class that has one or more virtual functions (whether defined directly in that class, or inherited from a base class) will have a vtable. Each instance of that class (i.e., each object) will contain a pointer to the vtable for its class. When you try to call a virtual function via a pointer, the compiler will generate code that:
1) Dereferences that pointer.
2) Gets the vtable pointer from the proper offset in that object.
3) Dereferences the vtable pointer to get the class's vtable.
4) Looks at the proper offset in the vtable to get a pointer to the function to invoke.
5) Invokes that function.
Given a null pointer, step one of that process is going to break.
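Those five steps can be emulated by hand. This is only a sketch of the typical layout, assuming a single vptr stored at the start of the object; the names Object, VTable, base_speak, derived_speak and call_speak are hypothetical, and real compilers build this structure themselves rather than exposing it:

```cpp
#include <cassert>

struct Object;

// One slot per "virtual" function.
struct VTable {
    int (*speak)(const Object* self);
};

// Every instance carries a hidden pointer to its class's table.
struct Object {
    const VTable* vptr;
    int id;
};

int base_speak(const Object* self)    { return self->id; }
int derived_speak(const Object* self) { return self->id * 10; }

const VTable base_vtable    = {base_speak};
const VTable derived_vtable = {derived_speak};

// Emulates obj->speak() for a virtual speak().
int call_speak(const Object* obj) {
    const Object& o = *obj;                  // 1) dereference the object pointer
    const VTable* vptr = o.vptr;             // 2) load the vtable pointer from the object
    const VTable& table = *vptr;             // 3) dereference it to reach the table
    int (*fn)(const Object*) = table.speak;  // 4) load the function pointer from its slot
    return fn(obj);                          // 5) call it, passing the object as "this"
}
```

With a null obj, step 1 already reads through an invalid pointer, which is where the access violation comes from.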
I'd note for the record that this applies to virtually all C++ compilers; VC++ is far from unique in this regard. While it's theoretically possible for a compiler to implement virtual functions (for one example) differently than this, every compiler of which I'm aware works essentially identically for the kind of code you posted. Major differences in implementation are mostly a theoretical possibility, not one you're at all likely to encounter in practice.
What is the performance difference between calling a virtual function from a derived class pointer directly vs from a base class pointer to the same derived class?
In the derived pointer case, will the call be statically bound, or dynamically bound? I think it'll be dynamically bound because there's no guarantee the derived pointer isn't actually pointing to a further derived class. Would the situation change if I have the derived class directly by value (not through pointer or reference)? So the 3 cases:
base pointer to derived
derived pointer to derived
derived by value
I'm concerned about performance because the code will be run on a microcontroller.
Demonstrating code
struct Base {
// virtual destructor left out for brevity
virtual void method() = 0;
};
struct Derived : public Base {
// implementation here
void method() {
}
};
// ... in source file
// call virtual method from base class pointer, guaranteed vtable lookup
Base* base = new Derived;
base->method();
// call virtual method from derived class pointer, any difference?
Derived* derived = new Derived;
derived->method();
// call virtual method from derived class value
Derived derivedValue;
derivedValue.method();
In theory, the only C++ syntax that makes a difference is a member function call that uses a qualified member name. In terms of your class definitions, that would be
derived->Derived::method();
This call ignores the dynamic type of the object and goes directly to Derived::method(), i.e. it's bound statically. This is only possible for calling methods declared in the class itself or in one of its ancestor classes.
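A small sketch of the difference, adapted from the question's classes: Base::method is given a body instead of being pure, and MoreDerived is a hypothetical further-derived class added only so the two calls can disagree. dynamic_call follows the object's dynamic type, while qualified_call always lands in Derived::method:

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string method() const { return "Base"; }
};

struct Derived : Base {
    std::string method() const override { return "Derived"; }
};

// Hypothetical further-derived class, only here to make the difference visible.
struct MoreDerived : Derived {
    std::string method() const override { return "MoreDerived"; }
};

std::string dynamic_call(const Derived* d)   { return d->method(); }           // dynamically bound
std::string qualified_call(const Derived* d) { return d->Derived::method(); }  // statically bound
```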
Everything else is a regular virtual function call, which is resolved in accordance with the dynamic type of the object used in the call, i.e. it is bound dynamically.
In practice, compilers will strive to optimize the code and replace dynamically-bound calls with statically-bound calls in contexts where the dynamic type of the object is known at compile time. For example
Derived derivedValue;
derivedValue.method();
will typically produce a statically-bound call in virtually every modern compiler, even though the language specification does not provide any special treatment for this situation.
Also, virtual method calls made directly from constructors and destructors are typically compiled into statically-bound calls.
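A sketch of why that is safe: while Base's constructor runs, the object is still only a Base, so the virtual call resolves to Base's version even when a Derived is being built. The global log_entries and the name() function are hypothetical helpers for observing which override runs:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records which override runs during construction.
std::vector<std::string> log_entries;

struct Base {
    Base() { log_entries.push_back(name()); }  // virtual call from the constructor
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};
```

Once construction has finished, the same call on the complete object reaches Derived::name.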
Of course, a smart compiler might be able to bind the call statically in a much greater variety of contexts. For example, both
Base* base = new Derived;
base->method();
and
Derived* derived = new Derived;
derived->method();
can be seen by the compiler as trivial situations that easily allow for statically-bound calls.
Virtual functions must be compiled to work as if they were always called virtually. If your compiler compiles a virtual call as a static call, that's an optimization that must satisfy this as-if rule.
From this, it follows that the compiler must be able to prove the exact type of the object in question. And there are some valid ways in which it can do this:
If the compiler sees the creation of the object (the new expression or the automatic variable from which the address is taken) and can prove that that creation is actually the source of the current pointer value, that gives it the precise dynamic type it needs. All your examples fall into this category.
While a constructor runs, the type of the object is exactly the class containing the running constructor. So any virtual function call made in a constructor can be resolved statically.
Likewise, while a destructor runs, the type of the object is exactly the class containing the running destructor. Again, any virtual function call can be resolved statically.
AFAIK, these are all the cases that allow the compiler to convert a dynamic dispatch into a static call.
All of these are optimizations, though; the compiler may decide to perform the run-time vtable lookup anyway. But good optimizing compilers should be able to detect all three cases.
There should be no difference between the first two cases, since the very idea of virtual functions is to always call the actual implementation. Leaving compiler optimisations aside (which in theory could optimise all virtual function calls away if you construct the object in the same compilation unit and there is no way the pointer can be altered in between), the second call must be implemented as an indirect (virtual) call as well, since there could be a third class inheriting from Derived and overriding that function too. I would assume that the third call will not be virtual, since the compiler already knows the actual type at compile time. If you know you will always call the function on the derived class directly, you could make sure of this by not making the function virtual at all.
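A related technique not mentioned above is C++11's final specifier. Marking Derived as final tells the compiler that no class can derive from it, which removes the "third class" possibility, so a call through a Derived pointer has exactly one possible target. This is a sketch with the question's class names; it permits static binding but does not force it, so check the generated code for your microcontroller toolchain:

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
    virtual int method() const = 0;
};

// 'final' on the class: nothing can derive from Derived,
// so Derived::method is the only possible target through a Derived*.
struct Derived final : Base {
    int method() const override { return 1; }
};

int call_through_base(const Base* b)       { return b->method(); }  // still dynamic in general
int call_through_derived(const Derived* d) { return d->method(); }  // compiler may bind statically
```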
For really lightweight code running on a small microcontroller I would recommend avoiding defining functions as virtual at all. Usually there is no runtime abstraction required. If you write a library and need some kind of abstraction, you can maybe work with templates instead (which give you some compile-time abstraction).
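A minimal sketch of that compile-time abstraction: the hypothetical UartLogger and NullLogger types share an interface by convention only, with no common base class and no virtual functions, and the template log_values calls whichever write() belongs to the type it was instantiated with:

```cpp
#include <cassert>

// Two hypothetical "drivers" with the same interface but no common base class.
struct UartLogger {
    int writes = 0;
    void write(int) { ++writes; }
};

struct NullLogger {
    void write(int) {}
};

// The concrete Logger type is known at compile time,
// so logger.write() is an ordinary direct call (no vtable involved).
template <typename Logger>
void log_values(Logger& logger, const int* values, int count) {
    for (int i = 0; i < count; ++i) {
        logger.write(values[i]);
    }
}
```

The trade-off is that each Logger type gets its own instantiation of log_values, which can cost code size on a small target.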
At least on PC CPUs, I often find virtual calls to be one of the most expensive indirections you can have (probably because branch prediction is more difficult). Sometimes one can also move the indirection to the data level, e.g. you keep one generic function that operates on different data, which is reached through pointers to the actual implementation. Of course, this will work only in some very specific cases.
At run-time.
BUT: Performance as compared to what? It isn't valid to compare a virtual function call to a non-virtual function call. You need to compare it to a non-virtual function call plus an if, a switch, an indirection, or some other means of providing the same function. If the function doesn't embody a choice among implementations, i.e. doesn't need to be virtual, don't make it virtual.
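To illustrate the fair comparison, here is a sketch of the same run-time choice written twice with hypothetical shape types: once as a tag plus a switch, once as a virtual function. Both pay for a decision at run time; benchmark the two on your own target rather than comparing a virtual call against a plain call that makes no choice:

```cpp
#include <cassert>

// Variant 1: explicit tag plus a switch.
enum class ShapeKind { Square, Circle };

struct TaggedShape {
    ShapeKind kind;
    int size;
};

int area_switch(const TaggedShape& s) {
    switch (s.kind) {
        case ShapeKind::Square: return s.size * s.size;
        case ShapeKind::Circle: return 3 * s.size * s.size;  // crude integer approximation of pi
    }
    return 0;  // unreachable for valid kinds
}

// Variant 2: the same choice expressed with a virtual function.
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

struct Square : Shape {
    int size;
    explicit Square(int s) : size(s) {}
    int area() const override { return size * size; }
};

struct Circle : Shape {
    int size;
    explicit Circle(int s) : size(s) {}
    int area() const override { return 3 * size * size; }
};
```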
I am reading Inside the C++ Object Model. In section 1.3:
So, then, why is it that, given
Bear b;
ZooAnimal za = b;
// ZooAnimal::rotate() invoked
za.rotate();
the instance of rotate() invoked is the ZooAnimal instance and not that of Bear? Moreover, if memberwise initialization copies the values of one object to another, why is za's vptr not addressing Bear's virtual table?
The answer to the second question is that the compiler intercedes in the initialization and assignment of one class object with another. The compiler must ensure that if an object contains one or more vptrs, those vptr values are not initialized or changed by the source object.
So I wrote the test code below:
#include <stdio.h>
class Base{
public:
virtual void vfunc() { puts("Base::vfunc()"); }
};
class Derived: public Base
{
public:
virtual void vfunc() { puts("Derived::vfunc()"); }
};
#include <string.h>
int main()
{
Derived d;
Base b_assign = d;
Base b_memcpy;
memcpy(&b_memcpy, &d, sizeof(Base));
b_assign.vfunc();
b_memcpy.vfunc();
printf("sizeof Base : %zu\n", sizeof(Base));
Base &b_ref = d;
b_ref.vfunc();
printf("b_assign: %x; b_memcpy: %x; b_ref: %x\n",
*(int *)&b_assign,
*(int *)&b_memcpy,
*(int *)&b_ref);
return 0;
}
The result
Base::vfunc()
Base::vfunc()
sizeof Base : 4
Derived::vfunc()
b_assign: 80487b4; b_memcpy: 8048780; b_ref: 8048780
My question is: why does b_memcpy still call Base::vfunc()?
What you are doing is illegal in the C++ language, meaning that the behavior of your b_memcpy object is undefined. The latter means that any behavior is "correct" and your expectations are completely unfounded. There's not much point in trying to analyze undefined behavior; it is not supposed to follow any logic.
In practice, it is quite possible that your manipulations with memcpy did actually copy Derived's virtual table pointer into the b_memcpy object, and your experiments with b_ref confirm that. However, when a virtual method is called through an immediate object (as is the case with the b_memcpy.vfunc() call), most implementations optimize away the access to the virtual table and perform a direct (non-virtual) call to the target function. The formal rules of the language guarantee that no legal action can ever make the b_memcpy.vfunc() call dispatch to anything other than Base::vfunc(), which is why the compiler can safely replace this call with a direct call to Base::vfunc(). This is why any virtual table manipulations will normally have no effect on the b_memcpy.vfunc() call.
The behavior you've invoked is undefined because the standard says it's undefined, and your compiler takes advantage of that fact. Let's look at g++ for a concrete example. The assembly it generates for the line b_memcpy.vfunc(); with optimizations disabled looks like this:
lea rax, [rbp-48]
mov rdi, rax
call Base::vfunc()
As you can see, the vtable wasn't even referenced. Since the compiler knows the static type of b_memcpy it has no reason to dispatch that method call polymorphically. b_memcpy can't be anything other than a Base object, so it just generates a call to Base::vfunc() as it would with any other method call.
Going a bit further, let's add a function like this:
void callVfunc(Base& b)
{
b.vfunc();
}
Now if we call callVfunc(b_memcpy), the result depends on the optimization level at which I compile the code: at -O0 and -O1 Derived::vfunc() is printed, while at -O2 and -O3 Base::vfunc() is printed. Again, since the standard says the behavior of your program is undefined, the compiler makes no effort to produce a predictable result and simply relies on the assumptions made by the language. Since the compiler knows b_memcpy is a Base object, it can simply inline the call to puts("Base::vfunc()"); when the optimization level allows for it.
You aren't allowed to do
memcpy(&b_memcpy, &d, sizeof(Base));
- it's undefined behaviour, because b_memcpy and d aren't "plain old data" objects (because they have virtual member functions).
If you wrote:
b_memcpy = d;
then it would print Base::vfunc() as expected.
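A well-defined version of the experiment, adapted to return strings instead of calling puts (the helper function names are hypothetical): copy-initialization and copy-assignment slice the Derived object into a Base, and the Base keeps behaving as a Base even when called through a reference, while binding a reference directly to d copies nothing and dispatches to Derived:

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string vfunc() const { return "Base::vfunc()"; }
};

struct Derived : Base {
    std::string vfunc() const override { return "Derived::vfunc()"; }
};

// Only the Base subobject is copied; b keeps its own dynamic type (Base).
std::string sliced_init(const Derived& d) {
    Base b = d;
    const Base& r = b;  // even a call through a reference sees a Base
    return r.vfunc();
}

std::string sliced_assign(const Derived& d) {
    Base b;
    b = d;
    const Base& r = b;
    return r.vfunc();
}

// No copy: r refers to the Derived object itself.
std::string via_reference(const Derived& d) {
    const Base& r = d;
    return r.vfunc();
}
```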
Any use of a vptr is outside the scope of the standard
Granted, the use of memcpy here has UB
The answers pointing out that any use of memcpy, or other byte manipulation of non-PODs, that is, of any object with a vptr, has undefined behavior, are strictly technically correct but do not answer the question. The question is predicated on the existence of a vptr (vtable pointer), which isn't even mandated by the standard: of course the answer will involve facts outside the standard, and the result will not be guaranteed by the standard!
Standard text is not relevant regarding the vptr
The issue is not that you are not allowed to manipulate the vptr; the notion of being allowed by the standard to manipulate anything that is not even described in the standard text is absurd. Of course no standard way to change the vptr exists, and this is beside the point.
The vptr encodes the type of a polymorphic object
The issue here is not what the standard says about the vptr, the issue is what the vptr represents, and what the standard says about that: the vptr represents the dynamic type of an object. Whenever the result of an operation depends on the dynamic type, the compiler will generate code to use the vptr.
[Note regarding MI: I say "the" vptr (as if there were only one vptr), but when MI (multiple inheritance) is involved, objects can have more than one vptr, each representing the complete object viewed as a particular polymorphic base class type. (A polymorphic class is a class with at least one virtual function.)]
[Note regarding virtual bases: I mention only the vptr, but some compilers insert other pointers to represent aspects of the dynamic type, like the location of virtual base subobjects, and some other compilers use the vptr for that purpose. What is true about the vptr is also true about these other internal pointers.]
So a particular value of the vptr corresponds to a dynamic type: that is, the type of the most derived object.
Changes of the dynamic type of an object during its lifetime
During construction, the dynamic type changes, and that is why virtual function calls from inside the constructor can be "surprising". Some people say that the rules for calling virtual functions during construction are special, but they are absolutely not: the final overrider is called; that overrider is the one in the class corresponding to the most derived object that has been constructed so far, and in a constructor C::C(arg-list), that is always the class C.
During destruction, the dynamic type changes in the reverse order. Calls to virtual functions from inside destructors follow the same rules.
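A sketch of the reverse order during destruction, with a hypothetical log_entries vector recording what each destructor sees: inside ~Derived the object is still a Derived, and by the time ~Base runs it has become a Base, so the same virtual call resolves differently:

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> log_entries;

struct Base {
    virtual ~Base() { log_entries.push_back("~Base sees " + name()); }
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    ~Derived() override { log_entries.push_back("~Derived sees " + name()); }
    std::string name() const override { return "Derived"; }
};
```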
What it means when something is left undefined
You can do low-level manipulations that are not sanctioned by the standard. That a behavior is not explicitly defined in the C++ standard does not imply that it is not described elsewhere. Just because the result of a manipulation is explicitly described as having UB (undefined behavior) in the C++ standard does not mean your implementation cannot define it.
You can also use your knowledge of the way compilers work: if strict separate compilation is used, that is, when the compiler can get no information from separately compiled code, every separately compiled function is a "black box". You can use this fact: the compiler will have to assume that anything a separately compiled function could do will be done. Even inside a given function, you can use an asm directive to get the same effects: an asm directive with no constraint can do anything that is legal in C++. The effect is a "forget what you know from code analysis at that point" directive.
The standard describes what can change the dynamic type, and nothing is allowed to change it except construction/destruction, so only an "external" (black-box) function that is otherwise allowed to perform construction/destruction can change a dynamic type.
Calling constructors on an existing object is not allowed, except to reconstruct it with the exact same type (and with restrictions); see [basic.life]/8:
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:
(8.1) the storage for the new object exactly overlays the storage location which the original object occupied, and
(8.2) the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and
(8.3) the type of the original object is not const-qualified, and, if a class type, does not contain any non-static data member whose type is const-qualified or a reference type, and
(8.4) the original object was a most derived object ([intro.object]) of type T and the new object is a most derived object of type T (that is, they are not base class subobjects).
This means that the only case where you could call a constructor (with placement new) and still use the same expressions that used to designate the objects (its name, pointers to it, etc.) are those where the dynamic type would not change, so the vptr would still be the same.
In other words, if you want to overwrite the vptr using low-level tricks, you could, but only if you write the same value.
In other words, don't try to hack the vptr.
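The one legal case from [basic.life]/8 can be sketched as follows. Widget and reconstruct are hypothetical; the sketch assumes w refers to a complete Widget object (not a base subobject), and Widget has no const or reference members, so after reconstruction the old reference, and the caller's variable name, refer to the new object, whose vptr necessarily has the same value as before:

```cpp
#include <cassert>
#include <new>
#include <string>

struct Widget {
    virtual ~Widget() = default;
    virtual std::string kind() const { return "Widget"; }
    int value = 0;
};

// Ends the lifetime of w and creates a new Widget in the same storage.
// Assumes w is a most derived Widget object; same storage, same type,
// no const or reference members, as [basic.life]/8 requires.
void reconstruct(Widget& w, int new_value) {
    w.~Widget();
    Widget* p = ::new (static_cast<void*>(&w)) Widget();
    p->value = new_value;
}
```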
I am reading about virtual tables. When it comes to the pointer __vptr,
the author states:
Unlike the *this pointer, which is actually a function parameter used by the compiler to resolve self-references, *__vptr is a real pointer. Consequently, it makes each class object allocated bigger by the size of one pointer.
What does the author mean by saying that this is actually a function parameter? Is this not a real pointer?
Both pointers are real in the sense that they store the address of something else in memory. By "real" the author means "stored within the object", as opposed to the this pointer, which is passed to member functions without being stored in the object itself. Essentially, the __vptr pointer is part of the object, while the this pointer is not.
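A quick way to observe this, as a sketch: the standard does not mandate a vptr, so the size relationship checked here is typical of mainstream implementations such as GCC, Clang and MSVC rather than guaranteed. The struct names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

struct Plain {
    int x;
    void f() {}  // non-virtual member functions add nothing to each object
    void g() {}
};

struct WithVirtual {
    int x;
    virtual void f() {}  // typically adds one hidden vptr per object
    virtual ~WithVirtual() = default;
};

// this is never stored in either object; only the vptr (if any) is.
constexpr std::size_t plain_size = sizeof(Plain);
constexpr std::size_t virtual_size = sizeof(WithVirtual);
```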
this is always a hidden implicit formal argument. Practically speaking, every non-static member function of a class gets an implicit first argument, which is this,
so in
class Foo {
int x; // a field, i.e. an instance variable
void bar(double x);
};
the Foo::bar function has two arguments, exactly as if it was the C (not C++) function
void Foo__bar(Foo* mythis, double x);
And in fact, the compiler (with the help of name mangling) transforms the first into a very close equivalent of the second. (I am using mythis instead of this because this is a keyword in C++.)
In principle, the ABI of your implementation could mandate a different passing convention for this (e.g. use another machine register) and for other explicit arguments. In practice, it often does not. On my Linux system, the x86-64 ABI (its figure 3.4, page 21) defines a calling convention that passes this (and the first pointer argument of a C function) in the %rdi processor register.
Practically speaking, in C++, most (but not all) member functions are small (defined inside the class) and inlined by the optimizing compiler (and the latest C++11 and C++14 standards have been written with optimizing compilers in mind; see also this). In that case, the question of where this is stored becomes practically meaningless, because of the inlining.
A pointer to the virtual method table (vtable) is generally an implicit first pointer field (or instance variable) of objects, but things can become more complex, e.g. with virtual multiple inheritance. The vtable data itself (the addresses of virtual functions) is generated by the compiler. See also this answer.
In theory, a C++ implementation could provide the dynamic method dispatching by another mechanism than vtable. In practice, I know no C++ implementation doing that.
As the title says:
Why is calling non virtual member function on deleted pointer an undefined behavior?
Note that the question does not ask whether it is undefined behavior; it asks why it is undefined behavior.
Consider the following program:
#include<iostream>
class Myclass
{
//int i;
public:
void doSomething()
{
std::cout<<"Inside doSomething";
//i = 10;
}
};
int main()
{
Myclass *ptr = new Myclass;
delete ptr;
ptr->doSomething();
return 0;
}
In the above code, the compiler does not actually dereference this while calling the member function doSomething(). Note that the function is not a virtual function, and compilers convert the member function call into an ordinary function call by passing this as the first parameter to the function (as I understand it, this is implementation-defined). They can do so because the compiler can determine exactly which function to call at compile time. So, in practice, calling the member function through the deleted pointer does not dereference this. this is dereferenced only if a member is accessed inside the function body (i.e. if the code in the above example that accesses i is uncommented).
If no member is accessed within the function, there seems to be no reason why the above code should actually invoke undefined behavior.
So why does the standard mandate that calling the non virtual member function through deleted pointer is an undefined behavior, when in fact it can reliably say that dereferencing the this should be the statement which should cause undefined behavior? Is it merely for sake of simplicity for users of the language that standard simply generalizes it or is there some deeper semantic involved in this mandate?
My feeling is that, since how compilers invoke member functions is implementation-defined, perhaps that is the reason the standard cannot pin down the actual point where UB occurs.
Can someone confirm?
Because the number of cases in which it might be reliable are so slim, and doing it is still an ineffably stupid idea. There's no benefit to defining the behaviour.
So why does the standard mandate that calling the non virtual member function through deleted pointer is an undefined behavior, when in fact it can reliably say that dereferencing the this should be the statement which should cause undefined behavior?
[expr.ref] paragraph 2 says that a member function call such as ptr->doSomething() is equivalent to (*ptr).doSomething() so calling a non-static member function is a dereference. If the pointer is invalid that's undefined behaviour.
Whether the generated code actually needs to dereference the pointer for specific cases is not relevant, the abstract machine that the compiler models does do a dereference in principle.
Complicating the language to define exactly which cases would be allowed as long as they don't access any members would have almost zero benefit. In the case where you can't see the function definition you have no idea if calling it would be safe, because you can't know if the function uses this or not.
Just don't do it, there's no good reason to, and it's a Good Thing that the language forbids it.
In the C++ language (according to C++03), the very attempt to use the value of an invalid pointer already causes undefined behavior. There's no need to dereference it for the UB to happen. Just reading the pointer value is enough. The concept of "invalid value" that causes UB when you merely attempt to read that value actually extends to almost all scalar types, not just to pointers.
After delete the pointer is generally invalid in that specific sense, i.e. reading a pointer that supposedly points to something that has just been "deleted" leads to undefined behavior.
int *p = new int();
delete p;
int *p1 = p; // <- undefined behavior
Calling a member function through an invalid pointer is just a specific case of the above. The pointer is used as the argument for the implicit parameter this. Passing a pointer as a non-reference argument is an act of reading it, which is why the behavior is undefined in your example.
So, your question really boils down to why reading invalid pointer values causes undefined behavior.
Well, there could be many platform-specific reasons for that. For example, on some platforms the act of reading a pointer might lead to the pointer value being loaded into some dedicated address-specific register. If the pointer is invalid, the hardware/OS might detect it immediately and trigger a program fault. In fact, this is how our popular x86 platform works with regard to segment registers. The only reason we don't hear much about it is that the popular OSes stick to flat memory model that simply does not actively use segment registers.
C++11 actually states that dereferencing invalid pointer values causes undefined behavior, while all other uses of an invalid pointer value cause implementation-defined behavior. It also notes that implementation-defined behavior in case of "copying an invalid pointer" might lead to "a system-generated runtime fault". So it might actually be possible to carefully maneuver one's way through the labyrinth of the C++11 specification and successfully arrive at the conclusion that calling a non-virtual method through an invalid pointer should result in the implementation-defined behavior mentioned above. But in any case, the possibility of "a system-generated runtime fault" will always be there.
Dereferencing of this in this case is effectively an implementation detail. I'm not saying that the this pointer is not defined by the standard, because it is, but from a semantically abstracted standpoint what is the purpose of allowing the use of objects that have been destroyed, just because there is a corner case in which in practice it will be "safe"? None. So it's not. No object exists, so you may not call a function on it.
A vtable contains pointers to the virtual functions of its class. Does it also contain pointers to non-virtual functions?
It's an implementation detail, but no. If an implementation put pointers to non-virtual functions into a vtable it couldn't use these pointers for making function calls because it would often cause incorrect non-virtual functions to be called.
When a non-virtual function is called the implementation must use the static type of the object on which the function is being called to determine the correct function to call. A function stored in a vtable accessed by a vptr will be dependent on the dynamic type of the object, not any static type of a reference or pointer through which it is being accessed.
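A small sketch of that difference, using hypothetical classes: nonvirt is resolved from the static type Base&, so Derived's same-named function merely hides it, while virt follows the object's dynamic type, which is what a vtable lookup implements in practice:

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() = default;
    std::string nonvirt() const { return "Base::nonvirt"; }    // chosen by static type
    virtual std::string virt() const { return "Base::virt"; }  // chosen by dynamic type
};

struct Derived : Base {
    std::string nonvirt() const { return "Derived::nonvirt"; }  // hides, does not override
    std::string virt() const override { return "Derived::virt"; }
};

std::string call_nonvirt(const Base& b) { return b.nonvirt(); }
std::string call_virt(const Base& b)    { return b.virt(); }
```

If Base::nonvirt were reached through a table indexed by the dynamic type, call_nonvirt would wrongly pick Derived::nonvirt, which is exactly the problem described above.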
No, it doesn't.
As calls to non-virtual methods can be resolved during compilation (since the compiler knows the addresses of non-virtual functions), the compiler generates instructions to call them 'directly' (i.e. statically).
There is no reason to go through vtable indirection mechanism for methods which are known during compiling.
Whether or not a "vtable" is used by any implementation isn't defined by the standard. Most implementations use a table of function pointers although the functions pointed to are typically not directly those being called (instead, the pointed to function may adjust the pointer before calling the actual function).
Whether or not non-virtual functions show up in this table is also not defined by the standard. After all, the standard doesn't even require the existence of a vtable. Normally, non-virtual functions are not in a virtual function table, since any necessary pointer adjustments and calls can be resolved at compile or link time. I could imagine an implementation treating all functions similarly and, thus, using a pointer in the virtual function table in all cases. It wouldn't necessarily be very popular. However, it might be a good way to implement C++ in an environment where it seamlessly interacts with a more flexible object system, e.g., languages where individual functions can be replaced at run time (my understanding is that something like this is possible, e.g., in Python).
No. A vtable only contains pointers to the virtual functions of its class (including the ones it inherits).