Performance impact of virtual inheritance - c++

I am considering using virtual inheritance in a real-time application. Does using virtual inheritance have a performance impact similar to that of calling a virtual function? The objects in question would only be created at startup, but I'm concerned about whether all functions from the hierarchy would be dispatched via a vtable or only those from the virtual base class.

Common implementations will make access to data members of virtual base classes use an additional indirection.
As James points out in his comments, calling a member function of a base class in a multiple-inheritance scenario needs an adjustment of the this pointer, and if that base class is virtual, then the offset of the base-class sub-object within the derived object depends on the dynamic type of the derived class, so the adjustment has to be calculated at runtime.
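For concreteness, here is a minimal sketch of the kind of hierarchy being discussed (the class names are invented for illustration): the virtual base carries a data member, so access to it from the derived class goes through a vbase offset that depends on the most-derived type.

#include <iostream>

struct Counter {                 // virtual base with a data member
    int count = 0;
};

struct Reader : virtual Counter { };
struct Writer : virtual Counter { };

struct ReadWriter : Reader, Writer {   // diamond: one shared Counter sub-object
    void touch() {
        // The offset of the Counter sub-object within *this depends on the
        // most-derived type, so a typical implementation loads it indirectly
        // (via the vtable or a vbase-offset table) before touching 'count'.
        ++count;
    }
};

int main() {
    ReadWriter rw;
    rw.touch();
    std::cout << rw.count << '\n';   // prints 1: Reader and Writer share one Counter
}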
Whether this has any visible performance impact on real-world applications depends on many things:
Do virtual bases have data members at all? Often, it's abstract base classes that need to be derived from virtually, and abstract bases that have any data members are often a code smell anyway.
Assuming you have virtual bases with data members, are those accessed in a critical path? If a user clicking on some button in a GUI results in a few dozen additional indirections, nobody will notice.
What would be the alternative if virtual bases are avoided? Not only might the design be inferior, it is also likely that the alternative design has a performance impact, too. It has to achieve the same goal, after all, and TANSTAAFL. Then you traded one performance loss for another plus an inferior design.
Additional note: Have a look at Stan Lippman's Inside the C++ Object Model, which answers such questions quite thoroughly.

Take a look at the following large-scale experimental study, published at OOPSLA'96. I am copy-pasting a BibTeX entry, the abstract, and a link to the paper. I would consider this the most comprehensive experimental study on the topic to date.
@article{driesen1996direct,
title={{The direct cost of virtual function calls in C++}},
author={Driesen, K. and H{\"o}lzle, U.},
journal={ACM Sigplan Notices},
volume={31},
number={10},
pages={306--323},
issn={0362-1340},
year={1996},
publisher={ACM}
}
Abstract:
We study the direct cost of virtual function
calls in C++ programs, assuming the standard
implementation using virtual function tables. We
measure this overhead experimentally for a number of
large benchmark programs, using a combination of
executable inspection and processor simulation. Our
results show that the C++ programs measured spend a
median of 5.2% of their time and 3.7% of their
instructions in dispatch code. For “all virtuals”
versions of the programs, the median overhead rises to
13.7% (13% of the instructions). The “thunk” variant
of the virtual function table implementation reduces
the overhead by a median of 21% relative to the
standard implementation. On future processors, these
overheads are likely to increase moderately.
http://www.cs.ucsb.edu/~urs/oocsb/papers/oopsla96.pdf

Are you sure you mean virtual inheritance? If so, it's identical to the cost of a normal virtual function call. The vtable chained search just follows a specified path.
You said that this was at startup. Your disk overhead (from simply loading your code into memory) is likely to require orders of magnitude more time than the half-dozen instructions or so for vtable lookups. I'd be somewhat surprised if you could profile this and detect a difference.

Without inspecting compilation or runtime details: based on my test using GNU C++17, accessing a data member in a virtual base class has no performance impact.

Related

C++: When is method redefinition preferred over virtual method override? [duplicate]

I know that virtual functions have an overhead of dereferencing to call a method. But I guess with modern architectural speed it is almost negligible.
Is there any particular reason why all functions in C++ are not virtual as in Java?
From my knowledge, defining a function as virtual in a base class is sufficient/necessary. Now when I write a parent class, I might not know which methods will get overridden. So does that mean that while writing a child class someone would have to edit the parent class? That sounds inconvenient and sometimes not possible.
Update:
Summarizing from Jon Skeet's answer below:
It's a trade-off between explicitly making someone realize that they are inheriting functionality (which has potential risks in itself; check Jon's response) plus potential small performance gains, versus less flexibility, more code changes, and a steeper learning curve.
Other reasons from different answers:
Virtual functions generally cannot be inlined because the call target is only known at runtime. This has a performance impact when you expect your functions to benefit from inlining.
There might be potentially other reasons, and I would love to know and summarize them.
There are good reasons for controlling which methods are virtual beyond performance. While I don't actually make most of my methods final in Java, I probably should... unless a method is designed to be overridden, it probably shouldn't be virtual IMO.
Designing for inheritance can be tricky - in particular it means you need to document far more about what might call it and what it might call. Imagine if you have two virtual methods, and one calls the other - that must be documented, otherwise someone could override the "called" method with an implementation which calls the "calling" method, unwittingly creating a stack overflow (or infinite loop if there's tail call optimization). At that point you've then got less flexibility in your implementation - you can't switch it round at a later date.
Note that C# is a similar language to Java in various ways, but chose to make methods non-virtual by default. Some other people aren't keen on this, but I certainly welcome it - and I'd actually prefer that classes were uninheritable by default too.
Basically, it comes down to this advice from Josh Bloch: design for inheritance or prohibit it.
One of the main C++ principles is: you only pay for what you use ("zero overhead principle"). If you don't need the dynamic dispatch mechanism, you shouldn't pay for its overhead.
As the author of the base class, you should decide which methods should be allowed to be overridden. If you're writing both, go ahead and refactor what you need. But it works this way, because there has to be a way for the author of the base class to control its use.
But I guess with modern architectural speed it is almost negligible.
This assumption is wrong, and, I guess, the main reason for this decision.
Consider the case of inlining. C++'s sort function performs much faster than C's otherwise similar qsort in some scenarios because it can inline its comparator argument, while C cannot (due to the use of function pointers). In extreme cases, this can mean performance differences of as much as 700% (Scott Meyers, Effective STL).
The same would be true for virtual functions. We’ve had similar discussions before; for instance, Is there any reason to use C++ instead of C, Perl, Python, etc?
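A rough sketch of that qsort/sort comparison (purely illustrative; actual speed-ups vary by compiler and data): the C comparator is only ever seen through a function pointer, while the lambda's concrete type is visible to std::sort, so the comparison can be inlined into the sorting loop.

#include <algorithm>
#include <cstdlib>
#include <vector>

// C-style comparator: qsort only receives a function pointer, so the call
// usually cannot be inlined into the sorting loop.
int cmp_int(const void* pa, const void* pb) {
    int a = *static_cast<const int*>(pa);
    int b = *static_cast<const int*>(pb);
    return (a > b) - (a < b);            // avoids the overflow risk of a - b
}

void sort_both_ways(std::vector<int>& v) {
    std::qsort(v.data(), v.size(), sizeof(int), cmp_int);

    // C++ comparator: the lambda's type is known at the call site,
    // so std::sort can inline the comparison into the generated code.
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
}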
Most answers deal with the overhead of virtual functions, but there are other reasons not to make every function in a class virtual, such as the fact that it will change the class from standard-layout to, well, non-standard-layout, and that can be a problem if you need to serialize binary data. That is solved differently in C#, for example, by having structs be a different family of types than classes.
From the design point of view, every public function establishes a contract between your type and the users of the type, and every virtual function (public or not) establishes a different contract with the classes that extend your type. The greater the number of such contracts that you sign the less room for changes that you have. As a matter of fact, there are quite a few people, including some well known writers, that defend that the public interface should never contain virtual functions, as your compromise to your clients might be different from the compromises you require from your extensions. That is, the public interfaces shows what you do for your clients, while the virtual interface shows how others might help you in doing it.
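One common way to express that separation of contracts is the non-virtual interface (NVI) idiom, sketched here with invented names: the public non-virtual function is the client contract, and the private virtual function is the extension contract.

#include <iostream>
#include <string>

class Logger {
public:
    // Contract with clients: non-virtual, so checks and invariants live in one place.
    void log(const std::string& msg) {
        if (msg.empty()) return;      // invariant enforced here, not in derived classes
        write(msg);                   // delegate to the customization point
    }
    virtual ~Logger() = default;

private:
    // Contract with extenders: how derived classes help implement logging.
    virtual void write(const std::string& msg) = 0;
};

class ConsoleLogger : public Logger {
    void write(const std::string& msg) override { std::cout << msg << '\n'; }
};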
Another effect of virtual functions is that they always get dispatched to the final overrider (unless you explicitly qualify the call), and that means that any function that is needed to maintain your invariants (think the state of the private variables) should not be virtual: if a class extends it, it will have to either make an explicit qualified call back to the parent or else would break the invariants at your level.
This is similar to the example of the infinite loop/stack overflow that @Jon Skeet mentioned, just in a different way: you have to document in each function whether it accesses any private attributes so that extensions will ensure that the function is called at the right time. And that in turn means that you are breaking encapsulation and you have a leaking abstraction: your internal details are now part of the interface (documentation + requirements on your extensions), and you cannot modify them as you wish.
Then there is performance... there will be an impact in performance, but in most cases that is overrated, and it could be argued that only in the few cases where performance is critical, you would fall back and declare the functions non-virtual. Then again, that might not be simple on a built product, since the two interfaces (public + extensions) are already bound.
You forget one thing. The overhead is also in memory: you add a virtual table per class and a pointer to that table in each object. Now if a class is expected to have a significant number of instances, that is not negligible; for example, a million instances equals 4 megabytes with 4-byte pointers. I agree that for a simple application this is not much, but for real-time devices such as routers this counts.
I'm rather late to the party here, so I'll add one thing that I haven't noticed covered in other answers, and summarise quickly...
Usability in shared memory: a typical implementation of virtual dispatch has a pointer to a class-specific virtual dispatch table in each object. The addresses in these pointers are specific to the process creating them, which means multi-process systems accessing objects in shared memory can't dispatch using another process's object! That's an unacceptable limitation given shared memory's importance in high-performance multi-process systems.
Encapsulation: the ability of a class designer to control the members accessed by client code, ensuring class semantics and invariants are maintained. For example, if you derive from std::string (I may get a few comments for daring to suggest that ;-P) then you can use all the normal insert / erase / append operations and be sure that - provided you don't do anything that's always undefined behaviour for std::string like pass bad position values to functions - the std::string data will be sound. Someone checking or maintaining your code doesn't have to check if you've changed the meaning of those operations. For a class, encapsulation ensures freedom to later modify the implementation without breaking client code. Another perspective on the same statement: client code can use the class any way it likes without being sensitive to the implementation details. If any function can be changed in a derived class, that whole encapsulation mechanism is simply blown away.
Hidden dependencies: when you know neither what other functions are dependent on the one you're overriding, nor that the function was designed to be overridden, then you can't reason about the impact of your change. For example, you think "I've always wanted this", and change std::string::operator[]() and at() to consider negative values (after a type-cast to signed) to be offsets backwards from the end of the string. But, perhaps some other function was using at() as a kind of assertion that an index was valid - knowing it'll throw otherwise - before attempting an insertion or deletion... that code might go from throwing in a Standard-specified way to having undefined (but likely lethal) behaviour.
Documentation: by making a function virtual, you're documenting that it is an intended point of customisation, and part of the API for client code to use.
Inlining - code side & CPU usage: virtual dispatch complicates the compiler's job of working out when to inline function calls, and could therefore provide worse code in terms of both space/bloat and CPU usage.
Indirection during calls: even if an out-of-line call is being made either way, there's a small performance cost for virtual dispatch that may be significant when calling trivially simple functions repeatedly in performance critical systems. (You have to read the per-object pointer to the virtual dispatch table, then the virtual dispatch table entry itself - means the VDT pages are consuming cache too.)
Memory usage: the per-object pointers to virtual dispatch tables may represent significant wasted memory, especially for arrays of small objects. This means fewer objects fit in cache, and can have a significant performance impact (see the sketch after this list).
Memory layout: it's essential for performance, and highly convenient for interoperability, that C++ can define classes with the exact memory layout of member data specified by network or data standards of various libraries and protocols. That data often comes from outside your C++ program, and may be generated in another language. Such communications and storage protocols won't have "gaps" for pointers to virtual dispatch tables, and as discussed earlier - even if they did, and the compiler somehow let you efficiently inject the correct pointers for your process over incoming data, that would frustrate multi-process access to the data. Crude-but-practical pointer/size based serialisation/deserialisation/comms code would also be made more complicated and potentially slower.
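A quick way to see the per-object cost behind the memory usage and memory layout points above (the exact numbers are implementation-specific; the 64-bit figures in the comment are just a typical assumption):

#include <iostream>

struct PlainPoint {                  // standard-layout, no vptr
    double x, y;
};

struct VirtualPoint {                // gains a hidden pointer to the vtable
    double x, y;
    virtual ~VirtualPoint() = default;
};

int main() {
    // Commonly prints 16 and 24 on a 64-bit ABI, but this is not guaranteed.
    std::cout << sizeof(PlainPoint) << ' ' << sizeof(VirtualPoint) << '\n';
}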
Pay per use (in Bjarne Stroustrup's words).
Seems like this question might have some answers: Virtual functions should not be used excessively - Why? In my opinion, the one thing that stands out is that it just adds more complexity in terms of knowing what can be done with inheritance.
Yes, it's because of performance overhead. Virtual methods are called using virtual tables and indirection.
In Java all methods are virtual and the overhead is also present. But, contrary to C++, the JIT compiler profiles the code at runtime and can inline those methods that don't actually need dynamic dispatch. So the JVM knows where it's really needed and where it isn't, thus freeing you from making the decision on your own.
The issue is that while Java compiles to code that runs on a virtual machine, that same guarantee can't be made for C++. It's common to use C++ as a more organized replacement for C, and C has a 1:1 translation to assembly.
If you consider that 9 out of 10 microprocessors in the world are not in a personal computer or a smartphone, you'll see the issue when you further consider that there are a lot of processors that need this low level access.
C++ was designed to avoid that hidden deferencing if you didn't need it, thus keeping that 1:1 nature. Some of the first C++ code actually had an intermediate step of being translated to C before running through a C-to-assembly compiler.
Java method calls are far more efficient than C++ due to runtime optimization.
What we need is to compile C++ into bytecode and run it on JVM.

virtual member functions are good or bad for locality in modern CPUs?

Considering the new CPUs with new instructions for moving data and new memory controllers: if in C++ I have a vector of Derived objects, where Derived has virtual member functions, is this a good or a bad thing for locality?
And what if I have a vector of pointers to the base class, Base*, where I store references to derived objects that are 1, 2, or 3 levels removed from Base?
Basically, dynamic dispatch applies to both cases, but which one is better for caching and memory access?
I have a preference between these two, but I would like to see a complete answer on the subject.
Is there anything new and ground-breaking to consider from the hardware industry in the last 2-3 years?
Storing Derived rather than Base* in a vector is better because it eliminates one extra level of indirection and you have all objects laid out "together" in contiguous memory, which in turn makes life easier for the hardware prefetcher and helps with paging, TLB misses, etc. However, if you do this, make sure you don't introduce a slicing problem.
As for virtual dispatch in this case, it almost does not matter, with the exception of the adjustment required for the "this" pointer. For example, if Derived overrides a virtual function that you are calling and you already have a pointer to Derived*, then no "this" adjustment is required; otherwise it has to be adjusted to one of the base class's "this" values (this also depends on the sizes of the classes in the inheritance hierarchy).
As long as all objects in the vector have the same overrides, the CPU will be able to predict what's going on. However, if you have a mix of different implementations, then the CPU has no clue as to which function will be called for each next object, and that might cause performance issues.
And don't forget to always profile before and after you make changes.
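A minimal sketch of the two storage choices being compared (the class names are invented): iterating a vector of concrete objects touches contiguous memory, while a vector of base pointers adds a pointer chase per element.

#include <memory>
#include <vector>

struct Base {
    virtual int value() const = 0;
    virtual ~Base() = default;
};

struct Derived : Base {
    int n = 1;
    int value() const override { return n; }
};

int sum_contiguous(const std::vector<Derived>& v) {
    int s = 0;
    for (const auto& d : v) s += d.value();    // objects laid out back to back;
                                               // the call may even be devirtualized
    return s;
}

int sum_indirect(const std::vector<std::unique_ptr<Base>>& v) {
    int s = 0;
    for (const auto& p : v) s += p->value();   // extra pointer chase per element
    return s;
}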
Modern CPUs know how to optimise data-dependent jump instructions, much as they do for data-dependent "branch" instructions - the processor will "learn" that "last time I went through here, I went THIS way", and if it has enough confidence (it has gone through several times with the same result) it will keep going that way.
Of course that doesn't help if the instances are a completely random selection of different classes that each have their own virtual functions.
Cache-locality is of course a slightly different matter, and it really depends on whether you are storing the object instances or the pointers/references to instances in the vector.
And of course, an important factor is "what is the alternative?" - if you are using virtual functions "correctly", it means that there is (at least) one less conditional check in the code path, because the decision was taken at a much earlier stage. If you solve the same problem by some other method, that decision has the same branch probability, so it will be at least as bad for performance as the virtual function - and chances are it's worse, because we now have an if (x) foo(); else bar(); type scenario, so we first have to evaluate x and then choose the path. obj->vfunc() is only unpredictable because fetching from the vtable gives an unpredictable result - but at least the vtable itself is cached.

Are there any implementations of C++ that don't use vtables and vptrs? [duplicate]

C++ supports dynamic binding through virtual mechanism. But as I understand the virtual mechanism is an implementation detail of the compiler and the standard just specifies the behaviors of what should happen under specific scenarios. Most compilers implement the virtual mechanism through the virtual table and virtual pointer. This is not about implementation detail of virtual pointers and table. My questions are:
Are there any compilers which implement dynamic dispatch of virtual functions in any other way other than the virtual pointer and virtual table mechanism? As far as I have seen most (read G++, Microsoft Visual Studio) implement it through virtual table, pointer mechanism. So practically are there any other compiler implementations at all?
The sizeof of any class with just a virtual function will be the size of a pointer (the vptr inside it) on that compiler. Given that the virtual pointer and table mechanism is itself a compiler implementation detail, will this statement I made above always be true?
It is not true that vtable pointers in objects are always the most efficient. My compiler for another language used to use in-object pointers for similar reasons but no longer does: instead it uses a separate data structure which maps the object address to the required meta-data: in my system this happens to be shape information for use by the garbage collector.
This implementation costs a bit more storage for a single simple object, is more efficient for complex objects with many bases, and it is vastly more efficient for arrays, since only a single entry is required in the mapping table for all objects in the array. My particular implementation can also find the meta-data given a pointer to any point interior to the object.
The actual lookup is extremely fast, and the storage requirements very modest, because I am using the best data structure on the planet: Judy arrays.
I also know of no C++ compiler using anything other than vtable pointers, but it is not the only way. In fact, the initialisation semantics for classes with bases make any implementation messy. This is because the complete type has to see-saw around as the object is constructed. As a consequence of these semantics, complex mixin objects lead to massive sets of vtables being generated, large objects, and slow object initialisation. This probably isn't a consequence of the vtable technique as much as needing to slavishly follow the requirement that the run-time type of a subobject be correct at all times. Actually there's no good reason for this during construction, since constructors are not methods and can't sensibly use virtual dispatch: this isn't so clear to me for destruction since destructors are real methods.
To my knowledge, all C++ implementations use a vtable pointer, although it would be quite easy (and perhaps not so bad perf wise as you might think given caches) to keep a small type-index in the object (1-2 B) and subsequently obtain the vtable and type information with a small table lookup.
Another interesting approach might be BIBOP (http://foldoc.org/BIBOP) -- big bag of pages -- although it would have issues for C++. Idea: put objects of the same type on a page. Get a pointer to the type descriptor / vtable at the top of the page by simply and'ing off the less significant bits of the object pointer. (Doesn't work well for objects on the stack, of course!)
Another other approach is to encode certain type tags/indices in the object pointers themselves. For example, if by construction all objects are 16-byte aligned, you can use the 4 LSBs to put a 4-bit type tag in there. (Not really enough.) Or (particularly for embedded systems) if you have guaranteed unused more-significant-bits in addresses, you can put more tag bits up there, and recover them with a shift and mask.
While both these schemes are interesting (and sometimes used) for other language implementations, they are problematic for C++. Certain C++ semantics, such as which base class virtual function overrides are called during (base class) object construction and destruction, drive you to a model where there is some state in the object that you modify as you enter base class ctors/dtors.
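As a purely illustrative sketch of the tag-bits idea above (no conforming C++ compiler does this for you; it assumes every object pointer is 16-byte aligned so the low 4 bits are free):

#include <cassert>
#include <cstdint>

// Pack a 4-bit type tag into the low bits of a 16-byte-aligned pointer.
inline void* tag_pointer(void* p, unsigned tag) {
    auto bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & 0xF) == 0 && tag < 16);    // alignment guarantees 4 free bits
    return reinterpret_cast<void*>(bits | tag);
}

inline unsigned pointer_tag(const void* p) {
    return static_cast<unsigned>(reinterpret_cast<std::uintptr_t>(p) & 0xF);
}

inline void* untag_pointer(void* p) {
    auto bits = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<void*>(bits & ~std::uintptr_t{0xF});
}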
You may find my old tutorial on the Microsoft C++ object model implementation interesting.
http://www.openrce.org/articles/files/jangrayhood.pdf
Happy hacking!
I don't think there are any modern compilers with an approach other than vptr/vtable. Indeed, it would be hard to figure out something else that is not just plain inefficient.
However, there is still a pretty large room for design tradeoffs within that approach. Maybe especially regarding how virtual inheritance is handled. So it makes sense to make this implementation-defined.
If you are interested in this kind of stuff, I strongly suggest reading Inside the C++ Object Model.
sizeof class depends on the compiler. If you want portable code, don't make any assumptions.
Are there any compilers which implement Virtual Mechanism in any other way other than the virtual pointer and virtual table mechanism? As far as i have seen most(read g++,Microsoft visual studio) implement it through virtual table, pointer mechanism. So practically are there any other compiler implementations at all?
All current compilers that I know of use the vtable mechanism.
This is an optimization that's possible because C++ is statically type checked.
In some more dynamic languages there is instead a dynamic search up the base class chain(s), searching for an implementation of a member function that's called virtually, starting in the most derived class of the object. For example, that's how it worked in original Smalltalk. And the C++ standard describes the effect of a virtual call as if such a search had been used.
In Borland/Turbo Pascal in the 1990's such dynamic search was employed for finding handlers of Windows API "window messages". And I think possibly the same in Borland C++. It was in addition to the normal vtable mechanism, used solely for message handlers.
If it was used in Borland/Turbo C++ – I can't remember – then it was in support of a language extension that allowed you to associate message ids with message handler functions.
The sizeof of any class with just a virtual function will be size of an pointer(vptr inside the this) on that compiler, So given that virtual ptr and tbl mechanism itself is compiler implementation, will this statement I made above be always true?
Formally no (even with assumption of vtable mechanism), it depends on the compiler. Since the standard doesn't require the vtable mechanism it says nothing about placement of vtable pointer in each object. And other rules let the compiler freely add padding, unused bytes, at the end.
But in practice perhaps. ;-)
However it's not something that you should rely on, or that you need to rely on. But in the other direction you can require this, for example if you're defining an ABI. Then any compiler that doesn't, simply doesn't conform to your requirement.
Cheers & hth.,
Are there any compilers which implement Virtual Mechanism in any other way other than the virtual pointer and virtual table mechanism? As far as i have seen most(read g++,Microsoft visual studio) implement it through virtual table, pointer mechanism. So practically are there any other compiler implementations at all?
None that I'm aware of C++ compilers using, though you might find it interesting to read about Binary Tree Dispatch. If you're interested in exploiting the expectation of virtual dispatch tables in any way, you should be aware that compilers can - where the types are known at compile time - sometimes resolve virtual function calls at compile time, so may not consult the table.
The sizeof of any class with just a virtual function will be size of an pointer(vptr inside the this) on that compiler, So given that virtual ptr and tbl mechanism itself is compiler implementation, will this statement I made above be always true?
Assuming no base classes with their own virtual members, and no virtual base classes, it's overwhelmingly likely to be true. Alternatives can be envisaged - such as whole-program analysis revealing only one member in the class hierarchy, and a switch to compile-time dispatch. If run-time dispatch is required, it's hard to imagine why any compiler would introduce further indirection. Still, the Standard deliberately doesn't stipulate these things precisely so that implementations can vary, or be varied in future.
In trying to imagine an alternative scheme, I have come up with the following, along the lines of Yttril's answer. As far as I'm aware, no compiler uses it!
Given a sufficiently large virtual address space and flexible OS memory allocation routines, it would be possible for new to allocate objects of different types in fixed, non-overlapping address ranges. Then the type of an object could be inferred quickly from its address using a right-shift operation, and the result used to index a table of vtables, thus saving 1 vtable pointer per object.
At first glance this scheme might seem to run into problems with stack-allocated objects, but this can be handled cleanly:
For each stack-allocated object, the compiler adds code that adds a record to a global array of (address range, type) pairs when the object is created and removes the record when it is destroyed.
The address range comprising the stack would map to a single vtable containing a large number of thunks that read the this pointer, scan the array to find the corresponding type (vptr) for the object at that address, and call the corresponding method in the vtable pointed to. (I.e. the 42nd thunk will call the 42nd method in the vtable -- if the most virtual functions used in any class is n, then at least n thunks are required.)
This scheme obviously incurs non-trivial overhead (at least O(log n) for the lookup) for virtual method calls on stack-based objects. In the absence of arrays or composition (containment within another object) of stack-based objects, a simpler and faster approach can be used in which the vptr is placed on the stack immediately before the object (note that it is not considered part of the object and does not contribute to its size as measured by sizeof). In this case thunks simply subtract sizeof (vptr) from this to find the correct vptr to use, and forward as before.
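To make the heap-allocation part of that hypothetical scheme concrete, here is an entirely illustrative sketch (no real compiler is claimed to work this way): if each type's objects are allocated in their own fixed region, a shift on the address recovers a type index that selects the vtable.

#include <cstddef>
#include <cstdint>

struct VTable { /* function pointers would live here */ };

// Hypothetical runtime support: each type's heap objects are placed in a
// dedicated 1 GiB region, so the address bits above bit 30 identify the type.
constexpr unsigned kRegionShift = 30;
constexpr std::size_t kMaxTypes = 64;

VTable* g_vtables[kMaxTypes] = {};   // filled in by the (imaginary) runtime

inline VTable* vtable_for(const void* obj) {
    auto addr = reinterpret_cast<std::uintptr_t>(obj);
    return g_vtables[(addr >> kRegionShift) % kMaxTypes];  // type inferred from address
}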
IIRC Eiffel uses a different approach and all overrides of a method end up merged and compiled in the same address with a prologue where the object type is checked (so every object must have a type ID, but it's not a pointer to a VMT). This for C++ would require of course that the final function is created at link time.
I don't know any C++ compiler that uses this approach, however.
I've never heard of or seen any compiler that uses any alternative implementation. The reason that vtables are so popular is that not only is it the most efficient implementation, it's also the easiest design and the most obvious implementation.
On pretty much any compiler you care to use, it's almost certainly true. However, it's not guaranteed and not always true - you can't depend on it, even though it's pretty much always the case. Your favourite compiler could also alter its alignment, increasing its size, for funsies, without telling you. From memory, it can also insert whatever debug information and whatever else it likes.
C++/CLI deviates from both assumptions. If you define a ref class, it doesn't get compiled into machine code at all; instead, the compiler compiles it into .NET managed code. In the intermediate language, classes are a built-in feature, and the set of virtual methods is defined in the metadata, rather than a method table.
The specific strategy to implement object layout and dispatch depends on the VM. In Mono, an object containing just one virtual method does not have the size of one pointer but needs two pointers in the MonoObject struct; the second one is for the synchronization of the object. As this is implementation-defined and also not really useful to know, sizeof is not supported for ref classes in C++/CLI.
First, Borland's proprietary extension to C++, Dynamic Dispatch Virtual Tables (DDVT), was already mentioned; you can read something about it in a file named DDISPATC.ZIP. Borland Pascal had both virtual and dynamic methods, and Delphi introduced yet another "message" syntax, similar to dynamic, but for messages. At this point I'm not sure if Borland C++ had the same features. There was no multiple inheritance in either Pascal or Delphi, so Borland C++ DDVT might be different from both Pascal and Delphi.
Second, in the 1990s and a bit earlier there was experimenting with different object models, and Borland was not the most advanced one. I personally think that shutting down IBM SOMobjects did damage to the world that we are all still suffering from. Before SOM was shut down there were experiments with Direct-to-SOM C++ compilers. So instead of C++'s way of invoking methods, SOM is used. It is in many ways similar to a C++ vtable, with several exceptions. First, to prevent the fragile base class problem, programs do not use offsets inside the vtable, because they don't know this offset. It can change if the base class introduces new methods. Instead, callers invoke a thunk created at runtime that has this knowledge in its assembly code. And there is one more difference. In C++, when multiple inheritance is used, an object can contain several VMTs IIRC. In contrast to C++, each SOM object has just one VMT, so the dispatch code has to be different from "call dword ptr [VMT+offset]".
There is a document related to SOM, Release-to-Release Binary Compatibility in SOM. You can find a comparison of SOM with other projects I know little of, like Delta/C++ and Sun OBI. They solve a subset of the problems that SOM solves, and by doing so they also have somewhat tweaked invocation code.
I have recently found a Visual Age C++ v3.5 for Windows compiler fragment that is enough to get things running and actually touch it. Most users are not likely to get an OS/2 VM just to play with DTS C++, but having a Windows compiler is another matter entirely. VAC v3.5 is the first and the last version to support the Direct-to-SOM C++ feature. VAC v3.6.5 and v4.0 are not appropriate.
Download VAC 3.5 fixpak 9 from the IBM FTP. This fixpak contains many files, so you don't even need the full compiler (I have the 3.5.7 distro, but fixpak 9 was big enough to do some tests).
Unpack to, e.g., C:\home\OCTAGRAM\DTS
Start a command line and run the subsequent commands there
Run: set SOMBASE=C:\home\OCTAGRAM\DTS\ibmcppw
Run: C:\home\OCTAGRAM\DTS\ibmcppw\bin\SOMENV.BAT
Run: cd C:\home\OCTAGRAM\DTS\ibmcppw\samples\compiler\dts
Run: nmake clean
Run: nmake
hhmain.exe and its dll are in different directories, so we must make them find each other somehow; since I was doing several experiments, I executed "set PATH=%PATH%;C:\home\OCTAGRAM\DTS\ibmcppw\samples\compiler\dts\xhmain\dtsdll" once, but you can just copy the dll next to hhmain.exe
Run: hhmain.exe
I got this output:
Local anInfo->x = 5
Local anInfo->_get_x() = 5
Local anInfo->y = A
Local anInfo->_get_y() = B
{An instance of class info at address 0092E318
}
Tony D's answer correctly points out that compilers are allowed to use whole-program analysis to replace a virtual function call with a static call to the unique possible function implementation; or to compile obj->method() into the equivalent of
if (auto frobj = dynamic_cast<FrequentlyOccurringType>(obj)) {
frobj->FrequentlyOccurringType::method(); // static dispatch on hot path
} else {
obj->method(); // vtable dispatch on cold path
}
Karel Driesen and Urs Hölzle wrote a really fascinating paper way back in 1996 in which they simulated the effect of perfect whole-program optimization on typical C++ applications: "The Direct Cost of Virtual Function Calls in C++". (The PDF is available for free if you Google for it.) Unfortunately, they only benchmarked vtable dispatch versus perfect static dispatch; they didn't compare it to binary tree dispatch.
They did point out that there are actually two kinds of vtables, when you're talking about languages (like C++) that support multiple inheritance. With multiple inheritance, when you call a virtual method that is inherited from the second base class, you need to "fix up" the object pointer so it points to an instance of the second base class. This fixup offset can be stored as data in the vtable, or it can be stored as code in a "thunk". (See the paper for more details.)
I believe all decent compilers these days use thunks, but it did take 10 or 20 years for that market penetration to reach 100%.

in OO programming, what are some negative runtime impacts of inheritance?

I know there are some positive aspects of inheritance, but I don't know the negative runtime impacts of inheritance. Can anybody tell me about that? Thanks!
Large inheritance-based systems usually use more memory and have worse data layout than composition-based systems; this has a runtime cost in terms of speed due to how the cache behaves (you want everything related to be as tightly packed as possible).
Virtual function calls require a trip to a virtual function table in order to retrieve the correct function to call; this can be costly due to cache behavior, as the vtable might be far from the calling function.
Multiple inheritance increases the cost of virtual function calls further, as first an offset might need to be computed in order to get the correct vtable.
If you're using RTTI, then you'll usually see additional data at a fixed location in relation to the vtable. This affects the vtable's locality, which once again hurts the cache.
If a base class contains virtual functions then instances of it and its descendants will each have a pointer to a virtual function table, increasing their memory footprint by the size of one pointer. Calls to virtual functions will have an extra level of indirection compared to non-virtual functions, so there is a small call time cost there.
Otherwise, there is no negative impact. Deriving one class from another but not using polymorphism (so, no virtual functions, always calling methods through pointers to the derived class) has no cost over a class with no parent.
Update: I have addressed the performance impact of inheritance here. Other answers have more to say on OO-correctness.
The benefits of using inheritance greatly outweigh the downfalls.
The first downfall is the object size in memory, which, when using virtual functions, has an extra pointer to the virtual function table.
Virtual function calls also require a few extra steps in the assembly compared to regular calls.
Non-virtual function calls cost the same in terms of performance.
Object size can also increase as an object of class A, if A is derived from B, contains all information from B. Of course, with a well-thought design, this doesn't happen, because even without inheritance, A would contain all information in B.
One more issue would be the use of dynamic_cast or static_cast, which you wouldn't encounter in an inheritance-free environment, but these can also be avoided even using inheritance.
The only runtime impact could be performance in terms of memory and speed. Considering that, functionality-wise, everything can be done without inheritance, the only question is how well it performs as opposed to the alternatives. That will depend on the specific scenarios you want to compare and the compiler's generated code.
Inheritance can negatively impact data locality, which is a big deal when you have a lot of numbers to crunch. You also get less control over data layout than when you use composition, so your objects might take up more memory.
If you also use polymorphism, then you spend additional cycles on indirect function calls and get even worse data locality, as you reference virtual function tables.
Generally, the overhead cost of object-oriented programming is fairly small and you only have to think about it when you are processing large amounts of data. Check Sony's Pitfalls of Object Oriented Programming presentation — it looks at OOP performance from a game developer's perspective.
After reading the other (informative!) responses, I believe one potential negative impact wasn't mentioned yet:
Inheritance is often used to achieve polymorphism. In C++, this means that you pass references (C++ references or pointers) to the base type around, instead of passing objects by value, to avoid the slicing problem. In practice, passing references around often means that the scope of an object should no longer define its lifetime - so people start using dynamic memory management (say, new and delete). And this can open a whole can of worms itself.
To make a long story short: very often, inheritance goes hand in hand with dynamic memory allocation, which opens a whole new class of issues.
Since you tagged your post with C++, I'd like to add that one of the most important runtime impacts when you use virtual functions in C++ is related to the impossibility of expanding them inline.
In fact, the heaviest performance impact is not due to the virtual function table lookup, but to the fact that the compiler cannot expand a virtual function even if you declare it as inline. This prevents an important optimization that could make your code much faster.
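A small illustration of that point (whether the compiler actually inlines is, of course, implementation- and flag-dependent): when the static type is a concrete class marked final, many compilers can resolve the call at compile time and consider inlining it, which a call through a plain base pointer usually prevents.

struct Shape {
    virtual double area() const = 0;
    virtual ~Shape() = default;
};

struct Square final : Shape {            // 'final': no further overrides possible
    double side = 1.0;
    double area() const override { return side * side; }
};

double total_area(const Square* squares, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += squares[i].area();      // static type is Square (final), so the call
                                         // can be devirtualized and inlined
    return total;
}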
I would think inheritance would only improve runtime. If you instead rewrite the code in several places, that code has to be compiled that many more times.

Can you cache a virtual function lookup in C++?

Say I have a virtual function call foo() on an abstract base class pointer, mypointer->foo(). When my app starts up, based on the contents of a file, it chooses to instantiate a particular concrete class and assigns mypointer to that instance. For the rest of the app's life, mypointer will always point to objects of that concrete type. I have no way to know what this concrete type is (it may be instantiated by a factory in a dynamically loaded library). I only know that the type will stay the same after the first time an instance of the concrete type is made. The pointer may not always point to the same object, but the object will always be of the same concrete type. Notice that the type is technically determined at 'runtime' because it's based on the contents of a file, but that after 'startup' (file is loaded) the type is fixed.
However, in C++ I pay the virtual function lookup cost every time foo is called for the entire duration of the app. The compiler can't optimize the lookup away because there's no way for it to know that the concrete type won't vary at runtime (even if it was the most amazing compiler ever, it can't speculate on the behavior of dynamically loaded libraries). In a JIT-compiled language like Java or .NET the JIT can detect that the same type is being used over and over and do inline caching. I'm basically looking for a way to manually do that for specific pointers in C++.
Is there any way in C++ to cache this lookup? I realize that solutions might be pretty hackish. I'm willing to accept ABI/compiler specific hacks if it's possible to write configure tests that discover the relevant aspects of the ABI/compiler so that it's "practically portable" even if not truly portable.
Update: To the naysayers: If this wasn't worth optimizing, then I doubt modern JITs would do it. Do you think Sun's and MS's engineers were wasting their time implementing inline caching, and didn't benchmark it to ensure there was an improvement?
There are two costs to a virtual function call: The vtable lookup and the function call.
The vtable lookup is already taken care of by the hardware. Modern CPUs (assuming you're not working on a very simple embedded CPU) will predict the address of the virtual function in their branch predictor and speculatively execute it in parallel with the array lookup. The fact that the vtable lookup happens in parallel with the speculative execution of the function means that, when executed in a loop in the situations you describe, virtual function calls have next to zero overhead compared to direct, non-inlined function calls.
I've actually tested this in the past, albeit in the D programming language, not C++. When inlining was disabled in the compiler settings and I called the same function in a loop several million times, the timings were within epsilon of each other whether the function was virtual or not.
The second and more important cost of virtual functions is that they prevent inlining of the function in most cases. This is even more important than it sounds because inlining is an optimization that can enable several other optimizations such as constant folding in some cases. There's no way to inline a function without recompiling the code. JITs get around this because they're constantly recompiling code during the execution of your application.
Why is a virtual call expensive? Because you simply don't know the branch target until the code is executed at runtime. Even modern CPUs still don't handle virtual calls and indirect calls perfectly. One can't simply say it costs nothing because we just have faster CPUs. No, it is not free.
1. How can we make it fast?
You already have a pretty deep understanding of the problem. The only thing I can say is that if the virtual function call is easy to predict, then you can perform software-level optimization. But if it's not (i.e., you really have no idea what the target of the virtual function will be), then I don't think there is a good solution for now. Even for the CPU, it is hard to predict in such an extreme case.
Actually, compilers such as Visual C++ with PGO (profile-guided optimization) have a virtual call speculation optimization (Link). If the profiling result can enumerate hot virtual function targets, the compiler translates them into direct calls that can be inlined. This is also called devirtualization. It can also be found in some Java dynamic optimizers.
2. To those one who say it's not necessary
If you're using scripting languages or C# and care mainly about coding efficiency, then yes, it's worthless. However, for anyone who is eager to save a single cycle to obtain better performance, the indirect branch is still an important problem. Even the latest CPUs are not good at handling virtual calls. One good example would be a virtual machine or interpreter, which usually has a very large switch-case. Its performance depends heavily on the correct prediction of the indirect branch. So, you can't simply say it's too low-level or not necessary. There are hundreds of people trying to improve performance at the bottom. That's why you can't simply ignore such details :)
3. Some boring computer architectural facts related to virtual functions
dsimcha has written a good answer on how the CPU can handle virtual calls effectively. But it's not exactly correct. First, all modern CPUs have a branch predictor, which literally predicts the outcome of a branch to increase pipeline throughput (or, more parallelism at the instruction level, i.e. ILP. I would even say that single-thread CPU performance depends solely on how much ILP you can extract from a single thread. Branch prediction is the most critical factor for obtaining higher ILP).
In branch prediction, there are two predictions: (1) direction (i.e., is the branch taken or not taken? a binary answer), and (2) branch target (i.e., where will I go? not a binary answer). Based on the prediction, the CPU speculatively executes the code. If the speculation is not correct, then the CPU rolls back and restarts from the mis-predicted branch. This is completely hidden from the programmer's view. So you don't really know what's going on inside the CPU unless you're profiling with VTune, which gives branch misprediction rates.
In general, branch direction prediction is highly accurate (95%+), but it is still hard to predict branch targets, especially for virtual calls and switch-cases (i.e., jump tables). A virtual call is an indirect branch, which requires an extra memory load, and the CPU also needs branch target prediction. Modern CPUs like Intel's Nehalem and AMD's Phenom have a specialized indirect branch target table.
However, I don't think looking up the vtable incurs a lot of overhead. Yes, it requires an extra memory load, which can cause a cache miss. But once the vtable is loaded into the cache, it's pretty much a cache hit. If you're also concerned with that cost, you may add prefetching code to load the vtable in advance. But the real difficulty of a virtual function call is that the CPU can't do a great job of predicting the target of the virtual call, which may result in frequent pipeline drains due to target misprediction.
So assuming that this is a fundamental issue you want to solve (to avoid premature optimization arguments), and ignoring platform and compiler specific hackery, you can do one of two things, at opposite ends of complexity:
Provide a function as part of the .dll that internally simply calls the right member function directly (see the sketch after these two options). You pay the cost of an indirect jump, but at least you don't pay the cost of a vtable lookup. Your mileage may vary, but on certain platforms, you can optimize the indirect function call.
Restructure your application such that instead of calling a member function per instance, you call a single function that takes a collection of instances. Mike Acton has a wonderful post (with a particular platform and application type bent) on why and how you should do this.
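A sketch of the first option, with entirely invented names for the hypothetical factory DLL: the library exposes a plain function bound to the concrete implementation, so callers pay one indirect call through a cached function pointer but no per-call vtable lookup.

// Interface shared between the app and the plug-in DLL (hypothetical).
struct MyBase {
    virtual void foo() = 0;
    virtual ~MyBase() = default;
};

// Plain function type the DLL exports alongside its factory.
using FooFn = void (*)(MyBase*);

// Inside the DLL: a free function that calls the concrete type directly,
// bypassing the vtable with a qualified (non-virtual) call.
struct ConcreteImpl : MyBase {
    void foo() override { /* real work */ }
};

void concrete_foo(MyBase* obj) {
    static_cast<ConcreteImpl*>(obj)->ConcreteImpl::foo();
}

// In the app: obtain and cache the pointer once at startup...
FooFn cached_foo = &concrete_foo;    // in reality fetched from the DLL

// ...then call through it instead of obj->foo() on the hot path.
void hot_loop(MyBase* obj) {
    for (int i = 0; i < 1000000; ++i)
        cached_foo(obj);             // one indirect jump, no vtable lookup
}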
All answers are dealing with the most simple scenario, where calling a virtual method only requires getting the address of the actual method to call. In the general case, when multiple and virtual inheritance come into play, calling a virtual method requires shifting the this pointer.
The method dispatch mechanism can be implemented in more than one way, but it is common to find that the entry in the virtual table is not the actual method to call, but rather some intermediate 'trampoline' code inserted by the compiler that relocates the this pointer prior to calling the actual method.
When the dispatch is the simplest, just an extra pointer redirection, then trying to optimize it does not make sense. When the problem is more complex, then any solution will be compiler dependent and hackerish. Moreover, you do not even know in what scenario you are: if the objects are loaded from dlls then you don't really know whether the actual instance returned belongs to a simple linear inheritance hierarchy or a more complex scenario.
I have seen situations where avoiding a virtual function call is beneficial. This does not look to me to be one of those cases because you really are using the function polymorphically. You are just chasing one extra address indirection, not a huge hit, and one that might be partially optimized away in some situations. If it really does matter, you may want to restructure your code so that type-dependent choices such as virtual function calls are made fewer times, pulled outside of loops.
If you really think it's worth giving it a shot, you can set a separate function pointer to a non-virtual function specific to the class. I might (but probably wouldn't) consider doing it this way.
class MyBase
{
public:
    virtual void foo() = 0;
    virtual ~MyBase() = default;
};

class MyConcrete : public MyBase
{
public:
    // Non-virtual entry point that can be reached through a plain function pointer.
    static void foo_nonvirtual(MyBase* obj);
    virtual void foo()
    { foo_nonvirtual(this); }
};

void (*f_ptr)(MyBase* obj) = &MyConcrete::foo_nonvirtual;
// Call f_ptr instead of obj->foo() in your code.
// Still not as good a solution as restructuring the algorithm.
Other than making the algorithm itself a bit wiser, I suspect any attempt to manually optimize the virtual function call will cause more problems than it solves.
You can't use a method pointer because pointers to member functions aren't considered covariant return types. See the example below:
#include <iostream>

struct base;
struct der;

typedef void (base::*pt2base)();
typedef void (der::*pt2der)();

struct base {
    virtual pt2base method() = 0;
    virtual void testmethod() = 0;
    virtual ~base() {}
};

struct der : base {
    void testmethod() {
        std::cout << "Hello from der" << std::endl;
    }
    pt2der method() { // this is invalid because pt2der isn't a covariant of pt2base
        return &der::testmethod;
    }
};
The other option would be to have the method declared pt2base method() but then the return would be invalid because der::testmethod is not of type pt2base.
Also even if you had a method that received a ptr or reference to the base type you would have to dynamically cast it to the derived type in that method to do anything particularly polymorphic which adds back in the cost we're trying to save.
So, what you basically want to do is convert runtime polymorphism into compile time polymorphism. Now you still need to build your app so that it can handle multiple "cases", but once it's decided which case is applicable to a run, that's it for the duration.
Here's a model of the runtime polymorphism case:
struct Base {
    virtual void doit(int&) = 0;
};

struct Foo : public Base {
    virtual void doit(int& n) { --n; }
};

struct Bar : public Base {
    virtual void doit(int& n) { ++n; }
};

void work(Base* it, int& n) {
    for (unsigned int i = 0; i < 4000000000u; i++) it->doit(n);
}

int main(int argc, char**) {
    int n = 0;
    if (argc > 1)
        work(new Foo, n);
    else
        work(new Bar, n);
    return n;
}
This takes ~14s to execute on my Core2, compiled with gcc 4.3.2 (32 bit Debian), -O3 option.
Now suppose we replace the "work" version with a templated version (templated on the concrete type it's going to be working on):
template <typename T> void work(T* it, int& n) {
    for (unsigned int i = 0; i < 4000000000u; i++) it->T::doit(n);
}
main doesn't actually need to be updated, but note that the 2 calls to work now trigger instantiations of and calls to two different and type-specific functions (c.f the one polymorphic function previously).
Hey presto, it runs in 0.001s. Not a bad speed-up factor for a 2-line change! However, note that the massive speed-up is entirely due to the compiler: once the possibility of runtime polymorphism in the work function is eliminated, it just optimizes away the loop and compiles the result directly into the code. But that actually makes an important point: in my experience the main gains from using this sort of trick come from the opportunities for improved inlining and optimisation they allow the compiler when a less-polymorphic, more specific function is generated, not from the mere removal of vtable indirection (which really is very cheap).
But I really don't recommend doing stuff like this unless profiling absolutely indicates runtime polymorphism is really hitting your performance. It'll also bite you as soon as someone subclasses Foo or Bar and tries to pass that into a function actually intended for its base.
You might find this related question interesting too.
I asked a very similar question recently, and got the answer that it's possible as a GCC extension, but not portably:
C++: Pointer to monomorphic version of virtual member function?
In particular, I also tried it with Clang and it doesn't support this extension (even though it supports many other GCC extensions).
Could you use a method pointer?
The objective here is that the compiler would load the pointer with the location of the resolved method or function. This would occur once. After the assignment, the code would access the method in a more direct fashion.
I know that a pointer to an object and accessing the method via the object pointer invokes run-time polymorphism. However, there should be a way to load a method pointer to a resolved method, avoiding the polymorphism and directly calling the function.
I've checked the community wiki to introduce more discussion.