Do derived objects cast from base need to use a vtable - c++

If I call an inherited method on a derived class instance, does the code require the use of a vtable? Or can the method call be 'static' (not sure if that is the correct usage of the word)?
For example:
Derived derived_instance;
derived_instance.virtual_method_from_base_class();
I am using MSVC, but I guess that most major compilers implement this roughly the same way.
I am (now) aware that the behavior is implementation-specific; I'm curious about the implementation.
EDIT:
I should probably add that the reason we are interested is that the function is called a lot of times, it is very simple, and I am not allowed to edit the function itself in any way. I was just wondering whether it would be possible, and whether there would be any benefit, to eliminate the dynamic dispatch anyway.
I have profiled and counted functions etc. before you all get on my backs about optimization.

Both of your examples will require that Derived has a constructor accepting a Base, and will create a new instance of Derived. Assuming that you have such a constructor and that this is what you want, then the compiler would "probably" be able to determine the dynamic object type statically and avoid the virtual call (if it decides to make such optimizations).
Note that the behavior is not undefined, it's just implementation-specific. There's a huge difference between the two.
If you want to avoid creating a new instance (or, more likely, that's not what you want) then you could use a reference cast static_cast<Derived&>(base_instance).virtual_method_from_base_class(); but while that avoids creating a new object it won't allow you to avoid the virtual call.
If you really want to cast at compile time what you're looking for is most likely the CRTP http://en.wikipedia.org/wiki/Curiously_recurring_template_pattern which allows you to type everything at compile time, avoiding virtual calls.
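A minimal CRTP sketch, with hypothetical names, showing how the dispatch is resolved at compile time instead of through a vtable:

template <typename DerivedT>
struct Base
{
    void method()
    {
        static_cast<DerivedT*>(this)->method_impl();   // resolved at compile time, no vtable involved
    }
};

struct Concrete : Base<Concrete>
{
    void method_impl() { /* work that would otherwise live in a virtual override */ }
};

// Usage:
//   Concrete c;
//   c.method();   // the call can be fully inlined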
EDIT for updated question: In the case you've shown now, I would suspect many compilers capable of statically determining the dynamic type and avoiding the virtual call.

Vtables only come into play when you use pointers or references. When you call a method directly on an object, it's always that specific class's method which is invoked.

You can simply qualify the call, then there is no virtual function dispatch:
Derived derived_instance;
derived_instance.Derived::virtual_method_from_base_class();
However, I suspect that that would be premature optimization.
Do measure.

Related

How can I form a `final` reference to a non-final class?

final is an excellent keyword. It lets me prevent inheritance from a class. It also lets the compiler skip the runtime dispatch mechanisms when calling virtual functions or accessing virtual bases.
Suppose now that I have some non-final class T with virtual functions and/or base classes. Suppose also that I have a reference to an instance of this class. How could I tell the compiler that this particular reference is to the fully-derived complete object, and not to a base sub-object of a more derived type?
My motivations are classes like optional and vector. If I invoke optional<T>::operator*(), I get a T&. However, I know with certainty that this reference really is a T, not some more derived object. The same applies to vector<T> and all the ways I have of accessing its elements.
I think it would be a great optimization to skip the dynamic dispatch in such cases, especially in debug mode and on compilers not smart enough to look through the optional and vector implementations and devirtualize the calls.
Formally, you can do this:
void final(A &a) {
    static_cast<A*>(dynamic_cast<void*>(&a))->foo();
}
dynamic_cast<void*> returns a pointer to the most-derived object (and static_cast from void* cannot select a polymorphic base class), so the compiler can know that A::foo is being called. However, compilers don't seem to take advantage of this information; they even generate the obvious extra instructions to actually perform the dynamic cast (even though it'll of course be undefined behavior if it fails).
You can, certainly, devirtualize yourself by actually writing a.A::foo() whenever verbosity and genericity permit.

Example of polymorphism preventing compiler optimization?

Cannot remember where I saw it now, but somewhere I read that dynamic polymorphism prevents the compiler from making various optimizations.
Besides inlining, could somebody please enlighten me with any examples of such "missed" optimization opportunities which polymorphism prevents the compiler from making?
With:
Derived d;
d.vMethod(); // that will call Derived::vMethod statically (allowing inlining).
With (unless one of Derived or Derived::vMethod is declared final in C++11):
void foo(Derived& d)
{
    d.vMethod(); // this will call vMethod virtually (disallowing inlining).
}
A virtual call has an additional cost (an indirection through the vtable).
C++11 introduces the final keyword, which may turn the last example into a static call.
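For illustration, a sketch of how final might restore the static call in that example (C++11):

struct Base { virtual void vMethod() {} };
struct Derived final : Base { void vMethod() override {} };   // final: no further derivation possible

void foo(Derived& d)
{
    d.vMethod();   // Derived is final, so the compiler may devirtualize (and inline) this call
}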
At least in C++, polymorphic objects must be used through pointers or references. Sometimes this prevents putting them in stack variables or storing them by value in containers; you end up needing a container of pointers instead of a container of objects. Stack variables spare dynamic allocations, etc.
A call like Poly.vmethod() on an actual object is resolved at compile time, even if vmethod() is virtual, while Poly->vmethod() consults the virtual method table. (Well, if the method is virtual it is meant to be polymorphic. Static and non-virtual methods are statically resolved in either case.)
Return value optimization (RVO) is another trick that does not apply when returning pointers or references. RVO is typically implemented by passing a hidden parameter: a pointer to a memory region that is filled with the "returned" object. The size and the type of this region must be perfectly known at compile time.
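A conceptual sketch of that hidden-parameter view (the actual lowering is up to the compiler):

struct Big { int data[1024]; };

Big make_big()          // conceptually lowered to something like: void make_big(Big* result)
{
    Big b{};
    return b;           // with RVO/NRVO, b is constructed directly in the caller's storage
}

// Usage:
//   Big b = make_big();   // no copy of the 4 KB payload is made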

performance implications of deep inheritance tree in c++

Is there any efficiency disadvantage associated with deep inheritance trees (in C++), i.e. a large set of classes A, B, C, and so on, such that B extends A, C extends B, and so on? One efficiency implication that I can think of is that when we instantiate the bottom-most class, say C, the constructors of B and A are also called, which will have performance implications.
Let's enumerate the operations we should consider:
Construction/destruction
Each constructor/destructor will call its base class equivalents. However, as James McNellis pointed out, you were obviously going to do that work anyway. You didn't derive from A just because it was there. So the work is going to get done one way or another.
Yes, it will involve a few more function calls. But function call overhead will be nothing compared to the actual work any significantly deep class hierarchy will have to actually do. If you're at the point where function call overhead is actually important for performance, I would strongly suggest that calling constructors at all is probably not what you want to be doing in that code.
Object Size
In general, the size overhead for a derived class is nothing. The overhead for virtual members is a vtable pointer per object, plus a bit more for virtual inheritance.
Member Function Calls, Static
By this, I mean calling non-virtual member functions, or calling virtual member functions with class names (ClassName::FunctionName syntax). Both of these allow the compiler to know at compile time which function to call.
The performance of this is invariant with the size of the hierarchy, since it's compile-time determined.
Member Function Calls, Dynamic
This is calling virtual functions with the full and complete expectation of runtime calls.
Under most sane C++ implementations, this is invariant with the size of the object hierarchy. Most implementations use a v-table for each class. Each object has a v-table pointer as a member. For any particular dynamic call, the compiler accesses the v-table pointer, picks out the method, and calls it. Since the v-table is the same for each class, it won't be any slower for a class that has a deep hierarchy than one with a shallow one.
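A rough sketch of what a typical vtable implementation does for such a call (conceptual only; the standard does not mandate this layout):

struct VTable { void (*method)(void*); };
struct Object { const VTable* vptr; /* data members... */ };

void dynamic_call(Object* obj)
{
    obj->vptr->method(obj);   // load the vptr, load the slot, make an indirect call;
                              // the cost does not depend on how deep the hierarchy is
}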
Virtual inheritance plays a bit with this.
Pointer Casts, Static
This refers to static_cast or any equivalent operation. This means the implicit cast from a derived class to a base class, the explicit use of static_cast or C-style casts, etc.
Note that this technically includes reference casting.
The performance of static casts between classes (up or down) is invariant with the size of the hierarchy. Any pointer offsets will be compile-time generated. This should be true for virtual inheritance as well as non-virtual inheritance, but I'm not 100% certain of that.
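A small illustration of the compile-time offset point, using multiple inheritance so the offset is non-zero:

struct A { int a; };
struct B { int b; };
struct C : A, B { int c; };

B* as_b(C* c)
{
    return c;   // derived-to-base conversion: the B subobject sits at a fixed,
                // compile-time-known offset inside C, so this is just constant pointer arithmetic
}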
Pointer Casts, Dynamic
This obviously refers to the explicit use of dynamic_cast. This is typically used when casting from a base class to a derived one.
The performance of dynamic_cast will likely change for a large hierarchy. But sane implementations should only check the classes between the current class and the requested one. So it's simply linear in the number of classes between the two, not linear in the number of classes in the hierarchy.
Typeid
This means the use of the typeid operator to fetch the std::type_info object associated with an object.
The performance of this will be invariant with the size of the hierarchy. If the class is polymorphic (has virtual functions or virtual base classes), then the type_info is simply pulled out of the vtable. If it's not polymorphic, then the result is determined at compile time.
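For example (the printed name is implementation-defined):

#include <iostream>
#include <typeinfo>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};

void report(const Base& b)
{
    // For polymorphic types the type_info is fetched through the vtable,
    // so the cost does not grow with the depth of the hierarchy.
    std::cout << typeid(b).name() << '\n';
}

int main()
{
    Derived d;
    report(d);   // prints an implementation-defined name for Derived
}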
Conclusion
In short, most operations are invariant with the size of the hierarchy. But even in the cases where it has an impact, it's not a problem.
I'd be more concerned with some design ethic where you felt the need to build such a hierarchy. In my experience, hierarchies like this come from two lines of design.
The Java/C# ideal of having everything derived from a common base class. This is a horrible idea in C++ and should never be used. Each object should derive from what it needs to, and only that. C++ was built on the "pay for what you use" principle, and deriving from a common base works against that. In general, anything you could do with such a common base class is either something you shouldn't be doing period, or something that could be done with function overloading (using operator<< to convert to strings, for example).
Misuse of inheritance. Using inheritance when you should be using containment. Inheritance creates an "is a" relationship between objects. More often than not, "has a" relationships (one object having another as a member) are far more useful and flexible. They make it easier to hide data, and you don't allow the user to pretend one class is another.
Make sure that your design does not fall afoul of one of these principles.
There will be performance implications, but not as bad as the programmer-performance implications.
As #Nicol points out, it may be doing a number of things.
If those are things that you require to be done, regardless of design, because they are all precisely necessary steps in getting the program from the call to main to exit within the fewest possible cycles, then your design is simply a matter of coding clarity (or maybe lack of it :).
In my experience performance tuning, as in this example, what I often see as a huge source of wasted time is over-design of data (i.e. class) structures.
Weirdly enough, the justification for the data structures is often (guess what?) performance!
In my experience, the thing to do with data structure is keep it as simple as possible and as normalized as possible. If it is completely normalized, then any single change to it can't make it inconsistent. You can't always achieve complete normalization, in which case you have to deal with the possibility that the data can be temporarily inconsistent.
This is why people write notification handlers, and this is encouraged in OOP.
The idea is, if you change something in one place, that can trigger notifications that "automatically" propagate the change to other places, trying to maintain consistency.
The problem with notifications is they can run away. Simply changing some boolean property from true to false can cause a fire-storm of notifications ripping through the data structure in ways no one programmer understands, updating databases, painting windows, zipping files, etc. etc. I often find this is where most clock cycles go.
I think it is simpler and far more efficient to temporarily tolerate inconsistency, and periodically repair it with some kind of sweeping process.
Another way data structures lead to huge inefficiency is when the data is effectively being interpreted by some process to produce some output.
This is very common in graphics.
If the data changes at a very slow rate, it may make sense to "compile" it rather than "interpret" it.
In other words, translate it into a simpler instruction set, or source code which is compiled "on the fly", which can then execute far more quickly to produce the desired output.

Should I use virtual 'Initialize()' functions to initialize an object of my class?

I'm currently having a discussion with my teacher about class design and we came to the point of Initialize() functions, which he heavily promotes. Example:
class Foo {
public:
    Foo()
    { // acquire light-weight resources only / default initialize
    }

    virtual void Initialize()
    { // do allocation, acquire heavy-weight resources, load data from disk
    }

    // optionally provide a Destroy() function
    // virtual void Destroy() { /*...*/ }
};
Everything with optional parameters of course.
Now, he also puts emphasis on extendability and usage in class hierarchies (he's a game developer and his company sells a game engine), with the following arguments (taken verbatim, only translated):
Arguments against constructors:
can't be overridden by derived classes
can't call virtual functions
Arguments for Initialize() functions:
derived class can completely replace initialization code
derived class can do the base class initialization at any time during its own initialization
I have always been taught to do the real initialization directly in the constructor and to not provide such Initialize() functions. That said, I for sure don't have as much experience as he does when it comes to deploying a library / engine, so I thought I'd ask at good ol' SO.
So, what exactly are the arguments for and against such Initialize() functions? Does it depend on the environment where it should be used? If yes, please provide reasonings for library / engine developers or, if you can, even game developer in general.
Edit: I should have mentioned, that such classes will be used as member variables in other classes only, as anything else wouldn't make sense for them. Sorry.
For Initialize: exactly what your teacher says, but in well-designed code you'll probably never need it.
Against: non-standard, may defeat the purpose of a constructor if used spuriously. More importantly: client needs to remember to call Initialize. So, either instances will be in an inconsistent state upon construction, or they need lots of extra bookkeeping to prevent client code from calling anything else:
void Foo::im_a_method()
{
    if (!fully_initialized)
        throw Uninitialized("Foo::im_a_method called before Initialize");
    // do actual work
}
The only way to prevent this kind of code is to start using factory functions. So, if you use Initialize in every class, you'll need a factory for every hierarchy.
In other words: don't do this if it's not necessary; always check if the code can be redesigned in terms of standard constructs. And certainly don't add a public Destroy member, that's the destructor's task. Destructors can (and in inheritance situations, must) be virtual anyway.
I'm against 'double initialization' in C++ altogether.
Arguments against constructors:
can't be overridden by derived classes
can't call virtual functions
If you have to write such code, it means your design is wrong (e.g. MFC). Design your base class so all the necessary information that can be overridden is passed through the parameters of its constructor, so the derived class can override it like this:
Derived::Derived() : Base(GetSomeParameter())
{
}
This is a terrible, terrible idea. Ask yourself- what's the point of the constructor if you just have to call Initialize() later? If the derived class wants to override the base class, then don't derive.
When the constructor finishes, it should make sense to use the object. If it doesn't, you've done it wrong.
One argument for preferring initialization in the constructor: it makes it easier to ensure that every object has a valid state. Using two-phase initialization, there's a window where the object is ill-formed.
One argument against using the constructor is that the only way of signalling a problem is through throwing an exception; there's no ability to return anything from a constructor.
Another plus for a separate initialization function is that it makes it easier to support multiple constructors with different parameter lists.
As with everything this is really a design decision that should be made with the specific requirements of the problem at hand, rather than making a blanket generalization.
A voice of dissension is in order here.
You might be working in an environment where you have no choice but to separate construction and initialization. Welcome to my world. Don't tell me to find a different environment; I have no choice. The preferred embodiment of the products I create is not in my hands.
Tell me how to initialize some aspects of object B with respect to object C, other aspects with respect to object A; some aspects of object C with respect to object B, other aspects with respect to object A. The next time around the situation may well be reversed. I won't even get into how to initialize object A. The apparently circular initialization dependencies can be resolved, but not by the constructors.
Similar concerns go for destruction versus shutdown. The object may need to live past shutdown, it may need to be reused for Monte Carlo purposes, and it might need to be restarted from a checkpoint dumped three months ago. Putting all of the deallocation code directly in the destructor is a very bad idea because it leaks.
Forget about the Initialize() function - that is the job of the constructor.
When an object is created, if the construction passed successfully (no exception thrown), the object should be fully initialized.
While I agree with the downsides of doing initialization exclusively in the constructor, I do think that those are actually signs of bad design.
A deriving class should not need to override base class initialization behaviour entirely. This is a design flaw which should be cured, rather than introducing Initialize()-functions as a workaround.
Not calling Initialize may be easy to do accidentally and won't give you a properly constructed object. It also doesn't follow the RAII principle since there are separate steps in constructing/destructing the object: What happens if Initialize fails (how do you deal with the invalid object)?
By forcing default initialization you may end up doing more work than doing initialization in the constructor proper.
Ignoring the RAII implications, which others have adequately covered, a virtual initialization method greatly complicates your design. You can't have any private data, because for the ability to override the initialization routine to be at all useful, the derived object needs access to it. So now the class's invariants are required to be maintained not only by the class, but by every class that inherits from it. Avoiding that sort of burden is part of the point behind inheritance in the first place, and the reason constructors work the way they do with regard to subobject creation.
Others have argued at length against the use of Initialize; I myself see one use: laziness.
For example:
File file("/tmp/xxx");
foo(file);
Now, if foo never uses file (after all), then it's completely unnecessary to try and read it (and would indeed be a waste of resources).
In this situation, I support Lazy Initialization, however it should not rely on the client calling the function, but rather each member function should check if it is necessary to initialize or not. In this example name() does not require it, but encoding() does.
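A minimal sketch of that lazy pattern, assuming a hypothetical File class whose name() and encoding() behave as described (C++17 for std::optional):

#include <fstream>
#include <optional>
#include <string>

class File
{
public:
    explicit File(std::string path) : path_(std::move(path)) {}

    const std::string& name() const { return path_; }   // no disk access required

    std::string encoding()                               // needs the file contents
    {
        ensure_open();
        // ... inspect *stream_ to detect the encoding (omitted) ...
        return "utf-8";
    }

private:
    void ensure_open()
    {
        if (!stream_)
            stream_.emplace(path_);   // lazily open the file on first use
    }

    std::string path_;
    std::optional<std::ifstream> stream_;
};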
Only use an initialize function if you don't have the data available at the point of creation.
For example, you're dynamically building a model of data, and the data that determines the object hierarchy must be consumed before the data that describes object parameters.
If you use it, then you should make the constructor private and use factory methods instead that call the initialize() method for you. For example:
class MyClass
{
public:
static std::unique_ptr<MyClass> Create()
{
std::unique_ptr<MyClass> result(new MyClass);
result->initialize();
return result;
}
private:
MyClass();
void initialize();
};
That said, initializer methods are not very elegant, but they can be useful for the exact reasons your teacher said. I would not consider them 'wrong' per se. If your design is good then you probably will never need them. However, real-life code sometimes forces you to make compromises.
Some members simply must have values at construction (e.g. references, const values, objects designed for RAII without default constructors)... they can't be constructed in the initialise() function, and some can't be reassigned then.
So, in general it's not a choice of constructor vs. initialise(), it's a question of whether you'll end up having code split between the two.
Of bases and members that could be initialised later, for the derived class to do it implies they're not private; if you go so far as to make bases/members non-private for the sake of delaying initialisation, you break encapsulation - one of the core principles of OOP. Breaking encapsulation prevents base class developer(s) from reasoning about the invariants the class should protect; they can't develop their code without risking breaking derived classes - which they might not have visibility into.
Other times it's possible but sometimes inefficient: you must default-construct a base or member with a value you'll never use, then assign it a different value soon after. The optimiser may help - particularly if both functions are inlined and called in quick succession - but may not.
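A hypothetical illustration of that double work (the optimiser may or may not remove it):

#include <string>

struct Config
{
    std::string path;

    // initialise() style: path is default-constructed empty, then written a second time
    Config() {}
    void initialise(const std::string& p) { path = p; }

    // constructor style: path is constructed once with its final value
    explicit Config(std::string p) : path(std::move(p)) {}
};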
[constructors] can't be overridden by derived classes
...so you can actually rely on them doing what the base class needs...
[constructors] can't call virtual functions
The CRTP allows derived classes to inject functionality - that's typically a better option than a separate initialise() routine, being faster.
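One hedged sketch of that CRTP alternative (hypothetical names): the base pulls what it needs from the derived class during construction, with no virtual call involved:

template <typename DerivedT>
class BaseWidget
{
public:
    BaseWidget() : config_(DerivedT::configuration())   // statically dispatched
    {
    }

protected:
    int config_;
};

class CustomWidget : public BaseWidget<CustomWidget>
{
public:
    static int configuration() { return 42; }   // "injected" into the base's construction
};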
Arguments for Initialize() functions:
derived class can completely replace initialization code
I'd say that's an argument against, as above.
derived class can do the base class initialization at any time during its own initialization
That's flexible but risky - if the base class isn't initialised the derived class could easily end up (due to oversight during the evolution of the code) calling something that relies on that base being initialised and consequently fails at run time.
More generally, there's the question of reliable invocation, usage and error handling. With initialise(), client code has to remember to call it, with failures evident at run time rather than compile time. Issues may be reported using return types instead of exceptions or state, which can sometimes be better.
If initialise() needs to be called to set say a pointer to nullptr or a value safe for the destructor to delete, but some other data member or code throws first, all hell breaks loose.
initialise() also forces the entire class to be non-const in the client code, even if the client just wants to create an initial state and ensure it won't be further modified - basically you've thrown const-correctness out the window.
Code doing things like p_x = new X(values, for, initialisation);, f(X(values, for, initialisation)), or v.push_back(X(values, for, initialisation)) won't be possible, forcing verbose and clumsy alternatives.
If a destroy() function is also used, many of the above problems are exacerbated.

When is it appropriate to use virtual methods?

I understand that virtual methods allow a derived class to override methods inherited from a base class. When is it appropriate/inappropriate to use virtual methods? It's not always known whether or not a class will be sub classed. Should everything be made virtual, just "in case?" Or will that cause significant overhead?
First a slightly pedantic remark - in C++ standardese we call them member functions, not methods, though the two terms are equivalent.
I see two reasons NOT to make a member function virtual.
"YAGNI" - "You Ain't Gonna Need It". If you are not sure a class will be derived from, assume it won't be and don't make member functions virtual. Nothing says "don't derive from me" like a non-virtual destructor, by the way (edit: in C++11 and up, you have the final keyword, which is even better). It's also about intent. If it's not your intent to use the class polymorphically, don't make anything virtual. If you arbitrarily make members virtual you are inviting abuses of the Liskov Substitution Principle and those classes of bugs are painful to track down and solve.
Performance / memory footprint. A class that has no virtual member functions does not require a VTable (virtual table, used to redirect polymorphic calls through a base class pointer) and thus (potentially) takes up less space in memory. Also, a straight member function call is (potentially) faster than a virtual member function call.
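A rough way to see the footprint cost on your own implementation (the exact sizes are implementation-defined):

#include <iostream>

struct Plain   { int x; void f() {} };            // no vtable pointer needed
struct Virtual { int x; virtual void f() {} };    // typical implementations add a vptr

int main()
{
    // On a typical 64-bit implementation this prints something like "4 16".
    std::cout << sizeof(Plain) << ' ' << sizeof(Virtual) << '\n';
}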
Don't prematurely pessimize your class by pre-emptively making member functions virtual.
When you design a class you should have a pretty good idea as to whether it represents an interface (in which case you mark the appropriate overrideable methods and destructor virtual) OR it's intended to be used as-is, possibly composing or composed with other objects.
In other words your intent for the class should be your guide. Making everything virtual is often overkill and sometimes misleading regarding which methods are intended to support runtime polymorphism.
It's a tricky question. But there are some guidelines / rules of thumb to follow.
As long as you do not need to derive from a class, don't write any virtual methods; once you need to derive, only make virtual those methods you need to customize in the child class.
If a class has a virtual method, then the destructor shall be virtual (end of discussion).
Try to follow the NVI (Non-Virtual Interface) idiom: make virtual methods non-public and provide public wrappers in charge of checking pre- and post-conditions, so that derived classes cannot accidentally break them.
I think those are simple enough. I deliberately left the ABI part of the discussion aside; it's only relevant when delivering DLLs.
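For reference, a minimal sketch of the NVI idiom from the guidelines above (names are hypothetical):

class Widget
{
public:
    void draw() const        // public, non-virtual wrapper
    {
        // check preconditions, take locks, log, etc. ...
        do_draw();           // dispatch to the customizable step
        // check postconditions ...
    }

    virtual ~Widget() = default;

private:
    virtual void do_draw() const = 0;   // derived classes override only this
};

class Button : public Widget
{
private:
    void do_draw() const override { /* draw a button */ }
};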
If your code is following a particular design pattern, then your choice should reflect the DP's own principles. For example, if you are coding a Decorator pattern, the function that should be virtual are the ones that belong to the Component interface.
Otherwise, I'd rather follow an evolutionary approach; in other words, I don't add virtual methods until I see that a hierarchy is trying to emerge from the code.
For instance, member functions in Java are 100% virtual. In C++ this is considered a code-size / function-call-time penalty. Additionally, a non-virtual function guarantees that the function implementation will always be the same (when using the base class object/reference). Scott Meyers discusses this in more detail in "Effective C++".
A sanity test I mostly use is: if a class I'm defining is derived from in the future, would the behavior (function) remain the same, or would it need to be redefined? If it would need to be redefined, the function is a strong contender for being virtual; if not, then no; and if I don't know, I probably need to look into the problem domain for a better understanding of the behavior I'm planning to implement. Mostly the problem domain gives me the answer; in cases where it doesn't, the behavior is generally non-critical.
I guess one possible way to determine quickly would be to consider if you are going to deal with a bunch of similar classes that you are going to use to perform the same tasks, with the change being the way you go about doing those tasks.
One trivial example would be the problem of computing areas for various geometric figures. You need the areas of squares, circles, rectangles, triangles etc. and the only thing that changes here is the math formulae (the way) you use to compute the area. Therefore it would be a good decision to have each of these shapes inherit from a common base class and to add a virtual method in the base class that returns the area (which you can then implement in each of the child with the respective math formula).
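A sketch of that shape example (names and structure illustrative):

class Shape
{
public:
    virtual double area() const = 0;   // the one behaviour that varies per shape
    virtual ~Shape() = default;
};

class Circle : public Shape
{
public:
    explicit Circle(double r) : r_(r) {}
    double area() const override { return 3.141592653589793 * r_ * r_; }
private:
    double r_;
};

class Rectangle : public Shape
{
public:
    Rectangle(double w, double h) : w_(w), h_(h) {}
    double area() const override { return w_ * h_; }
private:
    double w_, h_;
};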
Making everything virtual "just in case" will make your objects take up more memory. Additionally, there is a small (but non-zero) overhead when calling virtual functions. So, IMHO, making everything virtual "just in case" would be a bad idea when performance/memory constraints are important (which basically means in every real-world program that you write).
However, this again is debatable based on how clearly the requirements are spelled out and how often code changes are expected. For example, in a quick-and-dirty tool or an initial prototype where a few extra bytes of memory and a few milliseconds of lost time do not mean much, it would be OK to have a bunch of (unnecessarily) virtual functions for the sake of flexibility.
My point is: if you want to use a parent class pointer to point at a child class instance and call its methods, then you should use virtual methods.
Virtual methods are a way to achieve polymorphism. They are used when you want to define some action at a more abstract level, such that it is impossible to actually implement because it is too general; only in derived classes can you tell how to perform that action. But with the definition of a virtual method you create a requirement, which adds rigidity to the hierarchy of classes. This can be advisable or not; it depends on what you are trying to obtain, and on your own taste.
Have a look at Design Patterns. If your code/design is one of these or similar, go use virtual functions. Otherwise, try this.