Inheritance in C++ internals - c++

Can some one explain me how inheritance is implemented in C++ ?
Does the base class gets actually copied to that location or just refers to that location ?
What happens if a function in base class is overridden in derived class ? Does it replace it with the new function or copies it in other location in derived class memory ?

first of all you need to understand that C++ is quite different to e.g. Java, because there is no notion of a "Class" retained at runtime. All OO-features are compiled down to things which could also be achieved by plain C or assembler.
Having said this, what acutally happens is that the compiler generates kind-of a struct, whenever you use your class definition. And when you invoke a "method" on your object, actually the compiler just encodes a call to a function which resides somewhere in the generated executable.
Now, if your class inherits from another class, the compiler somehow includes the fields of the baseclass in the struct he uses for the derived class. E.g. it could place these fields at the front and place the fields corresponding to the derived class after that. Please note: you must not make any assumptions regarding the concrete memory layout the C++ compiler uses. If you do so, you're basically on your own and loose any portability.
How is the inheritance implemented? well, it depends!
if you use a normal function, then the compiler will use the concrete type he's figured out and just encode a jump to the right function.
if you use a virtual function, the compiler will generate a vtable and generate code to look up a function pointer from that vtable, depending on the run time type of the object
This distinction is very important in practice. Note, it is not true that inheritance is allways implemented through a vtable in C++ (this is a common gotcha). Only if you mark a certain member function as virtual (or have done so for the same member function in a baseclass), then you'll get a call which is directed at runtime to the right function. Because of this, a virtual function call is much slower than a non-virtual call (might be several hundered times)

Inheritance in C++ is often accomplished via the vtable. The linked Wikipedia article is a good starting point for your questions. If I went into more detail in this answer, it would essentially be a regurgitation of it.

Related

C++ Method Override and Overloading (Compiler level)

I know what the difference between the two. Overriding basically lets you "redefine" your a method in a child class and overloading basically lets you "redefine" your method with different arguments or parameters. I'm a little confused on what's going on under the hood though. I read that when you overload a method, the compiler will have all the overloaded methods and find the best match or report an error if none exists. This is obviously done during compile time but I'm confused on how Override works. I've read that handling overrides is extremely hard because you'll have to check if the return type matches with the class hierarchy and there can be a lot of class levels to check
(ie. class Living is the super class of Human and Animal. Human and Animal can have many derived classes which means we will have a deep level of classes).
Without getting too detailed, how does overriding work at the compiler level and why is it that overriding is done during run time and not compile time?
It depends on if the overridden method is virtual or not. If the overridden method is not virtual, then under the hood it usually works in the same way as overloading, the compiler looks at the static type of the object and calls the correct function based on that.
For objects with virtual methods a vtable is usually used. This is a collection of function pointers to the virtual methods. The reason this is done at run time is to allow for runtime polymorphism. The usual way that a vtable is generate is the compilier will generate a single vtable for each class and populate it with the required pointers at compile time and include this in the executable. The constructor will then set a hidden pointer in the class to point to the correct vtable. When looking up methods it first dereferences the hidden pointer to find the vtable then dereferences the correct slot from the vtable.

Modification of base class needs subclases recompilation?

suppose I have a base class and a derived class. I have compiled my program and running it. Now suppose I want to do some changes in my base class.
My questions are:
Question 1: If i separately do the changes in base class file only and recompile only base class then whether the changes will reflect in derived class objects also which are already instantiated or do i need to recompile derived class also. Other way of asking this question could be whether the copy is created to base class member functions or pointers are stored so that changes automatically gets reflected ??
Question 2: If not updated automatically, then is there any way to do this ??
C++ doesn't have reflection, so you need to recompile the whole thing.
This isn't well-defined by the language (since it doesn't address dynamic linking), but we can lay out some cases that may work, and some that almost certainly won't.
Should work:
changing a non-inlined function body in base.cpp
so long as cross-module/link-time inlining isn't enabled
assuming the derived class doesn't depends only on the interface and not on changed behaviour
adding static or non-virtual methods
beware of changing overload resolution though
Likely to fail horribly:
changing the prototype of any method or constructor used in the derived class
this includes changes that wouldn't normally be visible in calling code (such as adding defaulted arguments) or changing the type of an argument even where there is an implicit conversion, etc.
adding, removing or re-ordering virtual methods
adding, removing or re-ordering data members or base classes
There are several assumptions underlying these:
your base class and derived class are in separate dynamic libs (eg. base.so and derived.so). This isn't clear from your question
the only reasons for runtime compatibility to break are
because the layout in memory changed (ie, you modified the base-class instance size, or the vtable size, or the size or order of members)
because the code that would be generated at a call site changed (ie, because you added arguments with default values, or changed an argument to an implicitly-convertible type)
If the first assumption isn't true, the question seems pointless, since you'll have to rebuild the whole lib anyway.
The second assumption could be broken if you change or upgrade your compiler, or change the compiler flags, or the version of any other library you link against.
With respect to inlining, you can get horribly subtle bugs if different dynamic libs have inlined different versions of the code. You really won't enjoy trying to debug those.
C++ is a statically compiled language. That means that every type checking is done at compile time, so if you modify a type, every line of code that depends on the modification must be recompiled. It includes base class modification and subclases, as in your case.
Note that it could be a problem, because if you are writting an API, and you modify the API implementation, the API and every code that uses the code you have modified (The user code) must be recompiled.
The classic thechnique to reduce recompilation is the PIMPL idiom.
PIMPL hides the implementation of the class through a pointer to a implementation class stored as member of the original class. Note that the original class acts only as a interface. So if the implementation is modified, the interface not, so users of the class not need to recompile.

Why bother with virtual functions in c++?

This is not a question about how they work and declared, this I think is pretty much clear to me. The question is about why to implement this?
I suppose the practical reason is to simplify bunch of other code to relate and declare their variables of base type, to handle objects and their specific methods from many other subclasses?
Could this be done by templating and typechecking, like I do it in Objective C? If so, what is more efficient? I find it confusing to declare object as one class and instantiate it as another, even if it is its child.
SOrry for stupid questions, but I havent done any real projects in C++ yet and since I am active Objective C developer (it is much smaller language thus relying heavily on SDK's functionalities, like OSX, iOS) I need to have clear view on any parallel ways of both cousins.
Yes, this can be done with templates, but then the caller must know what the actual type of the object is (the concrete class) and this increases coupling.
With virtual functions the caller doesn't need to know the actual class - it operates through a pointer to a base class, so you can compile the client once and the implementor can change the actual implementation as much as it wants and the client doesn't have to know about that as long as the interface is unchanged.
Virtual functions implement polymorphism. I don't know Obj-C, so I cannot compare both, but the motivating use case is that you can use derived objects in place of base objects and the code will work. If you have a compiled and working function foo that operates on a reference to base you need not modify it to have it work with an instance of derived.
You could do that (assuming that you had runtime type information) by obtaining the real type of the argument and then dispatching directly to the appropriate function with a switch of shorts, but that would require either manually modifying the switch for each new type (high maintenance cost) or having reflection (unavailable in C++) to obtain the method pointer. Even then, after obtaining a method pointer you would have to call it, which is as expensive as the virtual call.
As to the cost associated to a virtual call, basically (in all implementations with a virtual method table) a call to a virtual function foo applied on object o: o.foo() is translated to o.vptr[ 3 ](), where 3 is the position of foo in the virtual table, and that is a compile time constant. This basically is a double indirection:
From the object o obtain the pointer to the vtable, index that table to obtain the pointer to the function and then call. The extra cost compared with a direct non-polymorphic call is just the table lookup. (In fact there can be other hidden costs when using multiple inheritance, as the implicit this pointer might have to be shifted), but the cost of the virtual dispatch is very small.
I don't know the first thing about Objective-C, but here's why you want to "declare an object as one class and instantiate it as another": the Liskov Substitution Principle.
Since a PDF is a document, and an OpenOffice.org document is a document, and a Word Document is a document, it's quite natural to write
Document *d;
if (ends_with(filename, ".pdf"))
d = new PdfDocument(filename);
else if (ends_with(filename, ".doc"))
d = new WordDocument(filename);
else
// you get the point
d->print();
Now, for this to work, print would have to be virtual, or be implemented using virtual functions, or be implemented using a crude hack that reinvents the virtual wheel. The program need to know at runtime which of various print methods to apply.
Templating solves a different problem, where you determine at compile time which of the various containers you're going to use (for example) when you want to store a bunch of elements. If you operate on those containers with template functions, then you don't need to rewrite them when you switch containers, or add another container to your program.
A virtual function is important in inheritance. Think of an example where you have a CMonster class and then a CRaidBoss and CBoss class that inherit from CMonster.
Both need to be drawn. A CMonster has a Draw() function, but the way a CRaidBoss and a CBoss are drawn is different. Thus, the implementation is left to them by utilizing the virtual function Draw.
Well, the idea is simply to allow the compiler to perform checks for you.
It's like a lot of features : ways to hide what you don't want to have to do yourself. That's abstraction.
Inheritance, interfaces, etc. allow you to provide an interface to the compiler for the implementation code to match.
If you didn't have the virtual function mecanism, you would have to write :
class A
{
void do_something();
};
class B : public A
{
void do_something(); // this one "hide" the A::do_something(), it replace it.
};
void DoSomething( A* object )
{
// calling object->do_something will ALWAYS call A::do_something()
// that's not what you want if object is B...
// so we have to check manually:
B* b_object = dynamic_cast<B*>( object );
if( b_object != NULL ) // ok it's a b object, call B::do_something();
{
b_object->do_something()
}
else
{
object->do_something(); // that's a A, call A::do_something();
}
}
Here there are several problems :
you have to write this for each function redefined in a class hierarchy.
you have one additional if for each child class.
you have to touch this function again each time you add a definition to the whole hierarcy.
it's visible code, you can get it wrong easily, each time
So, marking functions virtual does this correctly in an implicit way, rerouting automatically, in a dynamic way, the function call to the correct implementation, depending on the final type of the object.
You dont' have to write any logic so you can't get errors in this code and have an additional thing to worry about.
It's the kind of thing you don't want to bother with as it can be done by the compiler/runtime.
The use of templates is also technically known as polymorphism from theorists. Yep, both are valid approach to the problem. The implementation technics employed will explain better or worse performance for them.
For example, Java implements templates, but through template erasure. This means that it is only apparently using templates, under the surface is plain old polymorphism.
C++ has very powerful templates. The use of templates makes code quicker, though each use of a template instantiates it for the given type. This means that, if you use an std::vector for ints, doubles and strings, you'll have three different vector classes: this means that the size of the executable will suffer.

Is there a way to determine at runtime if an object can do a method in C++?

In Perl, there is a UNIVERSAL::can method you can call on any class or object to determine if it's able to do something:
sub FooBar::foo {}
print "Yup!\n" if FooBar->can('foo'); #prints "Yup!"
Say I have a base class pointer in C++ that can be any of a number of different derived classes, is there an easy way to accomplish something similar to this? I don't want to have to touch anything in the other derived classes, I can only change the area in the base class that calls the function, and the one derived class that supports it.
EDIT: Wait, this is obvious now (nevermind the question), I could just implement it in the base that returns a number representing UNIMPLEMENTED, then check that the return is not this when you call it. I'm not sure why I was thinking of things in such a complicated manner.
I was also thinking I would derive my class from another one that implemented foo then see if a dynamic cast to this class worked or not.
If you have a pointer or reference to a base class, you can use dynamic_cast to see which derived class it is (and therefore which derived class's methods it supports).
If you can add methods to the base class, you can add a virtual bool can_foo() {return false;} and override it in the subclass that has foo to return true.
C++ does not have built in run-time reflection. You are perfectly free to build your own reflection implementation into your class hierarchy. This usually involves a static map that gets populated with a list of names and functions. You have to manually register each function you want available, and have consistency as to the calling convention and function signature.
I believe the most-correct way would be to use the typeid<> operator and get a reference to the type_info object, and then you could compare that (== operator) to the desired type_info for the data types you wish to care about.
This doesn't give you method-level inspection, and does require that you've built with RTTI enabled (I believe that using typeid<> on an object that was built without RTTI results with "undefined" behavior), but there you are.
MSDN has an online reference to get you started : http://msdn.microsoft.com/en-us/library/b2ay8610%28VS.80%29.aspx

C++ : implications of making a method virtual

Should be a newbie question...
I have existing code in an existing class, A, that I want to extend in order to override an existing method, A::f().
So now I want to create class B to override f(), since I don't want to just change A::f() because other code depends on it.
To do this, I need to change A::f() to a virtual method, I believe.
My question is besides allowing a method to be dynamically invoked (to use B's implementation and not A's) are there any other implications to making a method virtual? Am I breaking some kind of good programming practice? Will this affect any other code trying to use A::f()?
Please let me know.
Thanks,
jbu
edit: my question was more along the lines of is there anything wrong with making someone else's method virtual? even though you're not changing someone else's implementation, you're still having to go into someone's existing code and make changes to the declaration.
If you make the function virtual inside of the base class, anything that derives from it will also have it virtual.
Once virtual, if you create an instance of A, then it will still call A::f.
If you create an instance of B and store it in a pointer of type A*. And then you call A*::->f, then it will call B's B::f.
As for side effects, there probably won't be any side effects, other than a slight (unnoticeable) performance loss.
There is a very small side effect as well, there could be a class C that also derives from A, and it may implement C::f, and expect that if A*::->f was called, then it expects A::f to be called. But this is not very common.
But more than likely, if C exists, then it does not implement C::f at all, and in which case everything is fine.
Be careful though, if you are using an already compiled library and you are modifying it's header files, what you are expecting to work probably will not. You will need to recompile the header and source files.
You could consider doing the following to avoid side effects:
Create a type A2 that derives from A and make it's f virtual
Use pointers of type A2 instead of A
Derive B from type A2.
In this way anything that used A will work in the same way guaranteed
Depending on what you need you may also be able to use a has-a relationship instead of a is-a.
There is a small implied performance penalty of a vtable lookup every time a virtual function is called. If it were not virtual, function calls are direct, since the code location is known at compile time. Wheras at runtime, a virtual function address must be referenced from the vtable of the object you're calling upon.
To do this, I need to change A::f() to
a virtual method, I believe.
Nope, you do not need to change it to a virtual method in order to override it. However, if you are using polymorphism you need to, i.e. if you have a lot of different classes deriving from A but stored as pointers to A.
There's also a memory overhead for virtual functions because of the vtable (apart from what spoulson mentioned)
There are other ways of accomplishing your goal. Does it make sense for B to be an A? For example, it makes sense for a Cat to be an Animal, but not for a Cat to be a Dog. Perhaps both A and B should derive from a base class, if they are related.
Is there just common functionality you can factor out? It sounds to me like you'll never be using these classes polymorphically, and just want the functionality. I would suggest you take that common functionality out and then make your two separate classes.
As for cost, if you're using A ad B directly, the compile will by-pass any virtual dispatching and just go straight to the functions calls, as if they were never virtual. If you pass a B into a place expecting `A1 (as a reference or pointer), then it will have to dispatch.
There are 2 performance hits when speaking about virtual methods.
vtable dispatching, its nothing to really worry about
virtual functions are never inlined, this can be much worse than the previous one, function inlining is something that can really speed things in some situations, it can never happen with a virtual function.
How kosher it is to change somebody else's code depends entirely on the local mores and customs. It isn't something we can answer for you.
The next question is whether the class was designed to be inherited from. In many cases, classes are not, and changing them to be useful base classes, without changing other aspects, can be tricky. A non-base class is likely to have everything private except the public functions, so if you need to access more of the internals in B you'll have to make more modifications to A.
If you're going to use class B instead of class A, then you can just override the function without making it virtual. If you're going to create objects of class B and refer to them as pointers to A, then you do need to make f() virtual. You also should make the destructor virtual.
It is good programming practise to use virtual methods where they are deserved. Virtual methods have many implications as to how sensible your C++ Class is.
Without virtual functions you cannot create interfaces in C++. A interface is a class with all undefined virtual functions.
However sometimes using virtual methods is not good. It doesn't always make sense to use a virtual methods to change the functionality of an object, since it implies sub-classing. Often you can just change the functionality using function objects or function pointers.
As mentioned a virtual function creates a table which a running program will reference to check what function to use.
C++ has many gotchas which is why one needs to be very aware of what they want to do and what the best way of doing it is. There aren't as many ways of doing something as it seems when compared to runtime dynamic OO programming languages such as Java or C#. Some ways will be either outright wrong, or will eventually lead to undefined behavior as your code evolves.
Since you have asked a very good question :D, I suggest you buy Scott Myer's Book: Effective C++, and Bjarne Stroustrup's book: The C++ Programming Language. These will teach you the subtleties of OO in C++ particularly when to use what feature.
If thats the first virtual method the class is going to have, you're making it no longer a POD. This can break things, although the chances for that are slim.
POD: http://en.wikipedia.org/wiki/Plain_old_data_structures