C++ double dispatch "extensible" without RTTI - c++

Does anyone know a way to have double dispatch handled correctly in C++ without using RTTI and dynamic_cast<> and also a solution, in which the class hierarchy is extensible, that is the base class can be derived from further and its definition/implementation does not need to know about that?
I suspect there is no way, but I'd be glad to be proven wrong :)

The first thing to realize is that double (or higher order) dispatch doesn't scale. With single
dispatch, and n types, you need n functions; for double dispatch n^2, and so on. How you
handle this problem partially determines how you handle double dispatch. One obvious solution is to
limit the number of derived types, by creating a closed hierarchy; in that case, double dispatch can
be implemented easily using a variant of the visitor pattern. If you don't close the hierarchy,
then you have several possible approaches.
If you insist that every pair corresponds to a function, then you basically need a:
std::map<std::pair<std::type_index, std::type_index>, void (*)(Base const& lhs, Base const& rhs)>
dispatchMap;
(Adjust the function signature as necessary.) You also have to implement the n^2 functions, and
insert them into the dispatchMap. (I'm assuming here that you use free functions; there's no
logical reason to put them in one of the classes rather than the other.) After that, you call:
(*dispatchMap[std::make_pair( std::type_index( typeid( obj1 ) ), std::type_index( typeid( obj2 ) ) )])( obj1, obj2 );
(You'll obviously want to wrap that into a function; it's not the sort of thing you want scattered
all over the code.)
A minor variant would be to say that only certain combinations are legal. In this case, you can use
find on the dispatchMap, and generate an error if you don't find what you're looking for.
(Expect a lot of errors.) The same solution could be used if you can define some sort of default
behavior.
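A minimal sketch of the map-based approach described above, wrapped into a function and using find so that an unregistered combination is reported instead of calling through a null pointer. The class names Asteroid and Ship, the helpers registerHandler and collide, and the std::string return type are assumptions made for illustration only. Note that this version still relies on typeid, so it needs RTTI.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

struct Base { virtual ~Base() = default; };
struct Asteroid : Base {};   // hypothetical derived types
struct Ship : Base {};

// Signature adjusted to return a string so the result is observable.
using Handler = std::string (*)(Base const& lhs, Base const& rhs);
using Key = std::pair<std::type_index, std::type_index>;

// Function-local static avoids initialization-order problems.
std::map<Key, Handler>& dispatchMap() {
    static std::map<Key, Handler> table;
    return table;
}

// Registers the free function that handles the pair (L, R).
template <class L, class R>
void registerHandler(Handler h) {
    dispatchMap()[Key(std::type_index(typeid(L)), std::type_index(typeid(R)))] = h;
}

// The wrapper: looks up the dynamic types of both arguments.
std::string collide(Base const& lhs, Base const& rhs) {
    auto it = dispatchMap().find(
        Key(std::type_index(typeid(lhs)), std::type_index(typeid(rhs))));
    if (it == dispatchMap().end())
        throw std::logic_error("no handler registered for this pair of types");
    return it->second(lhs, rhs);
}

std::string asteroidShip(Base const&, Base const&) { return "asteroid hits ship"; }
```

After registerHandler<Asteroid, Ship>(&asteroidShip), calling collide on an Asteroid and a Ship (through Base references) returns the handler's string, while the reverse order throws because that pair was never registered.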
If you want to do it 100% correctly, with some of the functions able to handle an intermediate class
and all of its derivatives, you then need some sort of more dynamic searching, and ordering to
control overload resolution. Consider for example:
          Base
         /    \
        /      \
      I1        I2
     /  \      /  \
    /    \    /    \
  D1a    D1b D2a    D2b
If you have an f(I1, D2a) and an f(D1a, I2), which one should be chosen? The simplest solution
is just a linear search, selecting the first which can be called (as determined by dynamic_cast on
pointers to the objects), and manually managing the order of insertion to define the overload
resolution you wish. With n^2 functions, this could become slow fairly quickly, however. Since
there is an ordering, it should be possible to use std::map, but the ordering function is going to
be decidedly non-trivial to implement (and will still have to use dynamic_cast all over the
place).
All things considered, my suggestion would be to limit double dispatch to small, closed hierarchies,
and stick to some variant of the visitor pattern.

The "visitor pattern" in C++ is often equated with double dispatch. It uses no RTTI or dynamic_casts.
See also the answers to this question.
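For a closed hierarchy, double dispatch through the visitor pattern might look like the sketch below. It uses two virtual calls and no RTTI. The names Shape, Circle, Square, FirstStage, SecondStage and describePair are hypothetical, and the "operation" is just building a string that names both dynamic types.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Circle;
class Square;

// The visitor must list every concrete type: this is what closes the hierarchy.
class ShapeVisitor {
public:
    virtual ~ShapeVisitor() = default;
    virtual void visit(Circle const&) = 0;
    virtual void visit(Square const&) = 0;
};

class Shape {
public:
    virtual ~Shape() = default;
    virtual void accept(ShapeVisitor& v) const = 0;
};

class Circle : public Shape {
public:
    void accept(ShapeVisitor& v) const override { v.visit(*this); }
};

class Square : public Shape {
public:
    void accept(ShapeVisitor& v) const override { v.visit(*this); }
};

// Second dispatch: the left-hand type is already known (carried as a name).
class SecondStage : public ShapeVisitor {
public:
    explicit SecondStage(std::string lhs) : lhs_(std::move(lhs)) {}
    void visit(Circle const&) override { result = lhs_ + "/Circle"; }
    void visit(Square const&) override { result = lhs_ + "/Square"; }
    std::string result;
private:
    std::string lhs_;
};

// First dispatch: resolves the left-hand type, then dispatches on the right.
class FirstStage : public ShapeVisitor {
public:
    explicit FirstStage(Shape const& rhs) : rhs_(rhs) {}
    void visit(Circle const&) override { second("Circle"); }
    void visit(Square const&) override { second("Square"); }
    std::string result;
private:
    void second(std::string lhs) {
        SecondStage s(std::move(lhs));
        rhs_.accept(s);
        result = s.result;
    }
    Shape const& rhs_;
};

std::string describePair(Shape const& lhs, Shape const& rhs) {
    FirstStage first(rhs);
    lhs.accept(first);
    return first.result;
}
```

Adding a new shape means adding a visit overload to ShapeVisitor and to every visitor, which is exactly the non-extensibility the question wants to avoid.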

The first problem is trivial. dynamic_cast involves two things: a run-time check and a type cast. The former requires RTTI, the latter does not. All you need to do to replace dynamic_cast with something that does the same without requiring RTTI is to have your own method to check the type at run-time. To do this, all you need is a simple virtual function that returns some sort of identification of what type it is or what more-specific interface it complies with (that can be an enum, an integer ID, even a string). For the cast, you can safely do a static_cast once you have already done the run-time check yourself and you are sure that the type you are casting to is in the object's hierarchy. So, that solves the problem of emulating the "full" functionality of dynamic_cast without needing the built-in RTTI. Another, more involved solution is to create your own RTTI system (as is done in several projects, such as LLVM, which Matthieu mentioned).
The second problem is a big one: how to create a double dispatch mechanism that scales well with an extensible class hierarchy. That's hard. At compile-time (static polymorphism), this can be done quite nicely with function overloads (and/or template specializations). At run-time, this is much harder. As far as I know, the only solution, as mentioned by Konrad, is to keep a dispatch table of function pointers (or something of that nature). With some use of static polymorphism and splitting dispatch functions into categories (like function signatures and stuff), you can avoid having to violate type safety, in my opinion. But, before implementing this, you should think very hard about your design to see if this double dispatch is really necessary, if it really needs to be a run-time dispatch, and if it really needs to have a separate function for each combination of two classes involved (maybe you can come up with a reduced and fixed number of abstract classes that capture all the truly distinct methods you need to implement).
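A minimal sketch of the "own run-time check plus static_cast" idea from the first problem. The names type_id, is_a and checked_cast are hypothetical. The id of a class is taken to be the address of a function-local static, which is one possible choice among the enum, integer or string ids mentioned above. The static_cast step assumes no virtual inheritance between Base and the target type.

```cpp
#include <cassert>

using TypeId = const void*;

// One distinct static object per instantiation gives one distinct address per type.
template <class T>
TypeId type_id() {
    static char tag;
    return &tag;
}

class Base {
public:
    virtual ~Base() = default;
    // Answers "is the dynamic type this class or derived from it?" without RTTI.
    virtual bool is_a(TypeId id) const { return id == type_id<Base>(); }
};

class Derived : public Base {
public:
    bool is_a(TypeId id) const override {
        return id == type_id<Derived>() || Base::is_a(id);
    }
    int value() const { return 42; }
};

class Other : public Base {
public:
    bool is_a(TypeId id) const override {
        return id == type_id<Other>() || Base::is_a(id);
    }
};

// Run-time check first; static_cast only once the check has succeeded.
template <class To>
To* checked_cast(Base* p) {
    return (p && p->is_a(type_id<To>())) ? static_cast<To*>(p) : nullptr;
}
```

With a Derived object behind a Base*, checked_cast<Derived> returns the object and checked_cast<Other> returns a null pointer. Each new class only has to override is_a and chain to its parent, so the base class does not need to know about it.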

You may want to check how LLVM implements isa<>, dyn_cast<> and cast<> as a template system, since it's compiled without RTTI.
It is a bit cumbersome (requires tidbits of code in every class involved) but very lightweight.
LLVM Programmer's Manual has a nice example and a reference to the implementation.
(All 3 methods share the same tidbit of code)
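To give a rough idea of the shape of that scheme, here is a simplified imitation, not LLVM's actual code. Each class supplies a static classof function, and small templates build on it. The names isa_sketch and dyn_cast_sketch are deliberately different from LLVM's to make clear that this is only an illustration, and the Shape hierarchy is an assumption.

```cpp
#include <cassert>

class Shape {
public:
    // In this simplified form the base class enumerates all kinds up front.
    enum Kind { K_Circle, K_Square };
    explicit Shape(Kind k) : kind_(k) {}
    Kind kind() const { return kind_; }
private:
    const Kind kind_;
};

class Circle : public Shape {
public:
    Circle() : Shape(K_Circle) {}
    // The per-class "tidbit": tells whether a Shape is really a Circle.
    static bool classof(const Shape* s) { return s->kind() == K_Circle; }
};

class Square : public Shape {
public:
    Square() : Shape(K_Square) {}
    static bool classof(const Shape* s) { return s->kind() == K_Square; }
};

// Type test built on classof; no virtual functions or RTTI involved.
template <class To, class From>
bool isa_sketch(const From* p) { return To::classof(p); }

// Checked downcast: null when the test fails.
template <class To, class From>
To* dyn_cast_sketch(From* p) {
    return isa_sketch<To>(p) ? static_cast<To*>(p) : nullptr;
}
```

Because the Kind enum lives in the base class, this particular sketch is lightweight but not extensible in the sense the question asks for.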

You can fake the behaviour by implementing the compile-time logic of multiple dispatch yourself. However, this is extremely tedious. Bjarne Stroustrup has co-authored a paper describing how this could be implemented in a compiler.
The underlying mechanism – a dispatch table – could be dynamically generated. However, using this approach you would of course lose all syntactical support. You’d need to maintain a two-dimensional matrix of method pointers and manually look up the correct method depending on the argument types. This would render a simple (hypothetical) call
collision(foo, bar);
at least as complicated as
DynamicDispatchTable::lookup(collision_signature, FooClass, BarClass)(foo, bar);
since you didn’t want to use RTTI. And this is assuming that all your methods take only two arguments. As soon as more arguments are required (even if those aren’t part of the multiple dispatch) this becomes more complicated still, and would require circumventing type safety.
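One way such a table could look without RTTI is sketched below, under several assumptions: every class reports a hand-rolled integer id through a virtual function, handlers take exactly two arguments and return a std::string, and all names (typeIdOf, thunk, DispatchTable, Foo, Bar, collideFooBar) are hypothetical. The casts are confined to one generated trampoline, so the handlers themselves keep their precise parameter types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hand-rolled ids: each type gets the next integer the first time it is asked.
inline int nextTypeId() { static int counter = 0; return counter++; }

template <class T>
int typeIdOf() { static const int id = nextTypeId(); return id; }

class Base {
public:
    virtual ~Base() = default;
    virtual int typeId() const = 0;   // the only thing a new class must provide
};

class Foo : public Base {
public:
    int typeId() const override { return typeIdOf<Foo>(); }
};

class Bar : public Base {
public:
    int typeId() const override { return typeIdOf<Bar>(); }
};

using Handler = std::string (*)(Base const&, Base const&);

// Trampoline generated per handler; the only place where a cast happens.
template <class L, class R, std::string (*Fn)(L const&, R const&)>
std::string thunk(Base const& l, Base const& r) {
    return Fn(static_cast<L const&>(l), static_cast<R const&>(r));
}

class DispatchTable {
public:
    template <class L, class R, std::string (*Fn)(L const&, R const&)>
    void add() {
        table_[std::make_pair(typeIdOf<L>(), typeIdOf<R>())] = &thunk<L, R, Fn>;
    }
    // Returns a null pointer when the combination was never registered.
    Handler lookup(Base const& l, Base const& r) const {
        auto it = table_.find(std::make_pair(l.typeId(), r.typeId()));
        return it == table_.end() ? nullptr : it->second;
    }
private:
    std::map<std::pair<int, int>, Handler> table_;
};

std::string collideFooBar(Foo const&, Bar const&) { return "foo-bar"; }
```

A call then reads table.lookup(foo, bar)(foo, bar) after table.add<Foo, Bar, &collideFooBar>(), which illustrates the loss of syntactic support mentioned above. New classes can be added without touching Base, but every pair still needs its own registration.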

Related

Are there any more useful use-cases of functors?

I am trying to understand cases that require using functors. Most of the answers on Stack Overflow and other websites emphasize being able to define different adders or multipliers when describing the benefits of functors.
Can the use of functors go beyond them? What are some other uses of functors?
More often than not, functors are used with other API calls that need some kind of function object. For example, sorting vectors of user-defined objects which don't have operator() or operator< (etc.) defined.
There are some cases where a set of functors may prove useful. One such case comes when you have several algorithms which functionally do the same thing, but achieve varying levels of accuracy. This happens a lot with some numeric optimization problems: given the general form of a matrix, we might use a different technique to find the solution of a linear equation (e.g., sparse vs dense problem matrices can employ different algorithms to invert the matrix).
In particular, you should consider functors versus lambdas. In modern versions of C++, there really isn't a need to specify a functor unless you're implementing a function/method that needs a functor (or lambda) as an argument. There are some cases to consider: Do you need a unit-test? Is the functor itself a prototype of future functionality? etc.
ADDENDUM: The key thing to consider is that the use of a functor/lambda ultimately boils down to a design decision. As @t.niese noted in the comments, you could just use functions in combination with template arguments. In addition to the considerations above, consider whether you can make a compile-time or run-time assessment of the needed functionality.
Additionally, as you make design decisions, you may want to consider "Is there a need for this function to be used outside of this specific context?" If the answer is no, that's a compelling argument to choose a lambda over a free function. With regard to functors specifically, this was an important pattern before lambdas were added to the standard. Typically they're defined in a somewhat private context (frequently in the implementation files, and thus, once compiled into a library, hidden from users of the API). Now with lambdas, you can simply define them within another function or even as a function argument, instead of pre-defining them prior to need.
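To make the functor-versus-lambda trade-off concrete, here is a small sketch built around std::sort. Person, ByAge, sortByAge and sortByName are hypothetical names. The functor carries state and can be reused or tested on its own, while the lambda is defined right where it is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Person {
    std::string name;
    int age;
};

// Functor: a named, reusable comparison that carries state (the direction).
struct ByAge {
    bool descending;
    bool operator()(Person const& a, Person const& b) const {
        return descending ? b.age < a.age : a.age < b.age;
    }
};

void sortByAge(std::vector<Person>& people, bool descending) {
    std::sort(people.begin(), people.end(), ByAge{descending});
}

// Lambda: a one-off comparison that is not needed outside this function.
void sortByName(std::vector<Person>& people) {
    std::sort(people.begin(), people.end(),
              [](Person const& a, Person const& b) { return a.name < b.name; });
}
```

If ByAge turns out to be used in only one place, it could equally be written as a lambda capturing descending; that is the design decision described above.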

Compile time vs run time polymorphism in C++ advantages/disadvantages

In C++ when it is possible to implement the same functionality using either run time (sub classes, virtual functions) or compile time (templates, function overloading) polymorphism, why would you choose one over the other?
I would think that the compiled code would be larger for compile time polymorphism (more method/class definitions created for template types), and that compile time would give you more flexibility, while run time would give you "safer" polymorphism (i.e. harder to be used incorrectly by accident).
Are my assumptions correct? Are there any other advantages/disadvantages to either? Can anyone give a specific example where both would be viable options but one or the other would be a clearly better choice?
Also, does compile time polymorphism produce faster code, since it is not necessary to call functions through vtable, or does this get optimized away by the compiler anyway?
Example:
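class Base
{
public:
    virtual ~Base() = default;
    virtual void print() = 0;
};
class Derived1 : public Base
{
public:
    void print() override
    {
        // do something different
    }
};
class Derived2 : public Base
{
public:
    void print() override
    {
        // do something different
    }
};
// Run time: take the base by reference (an abstract class cannot be passed by value)
void print(Base& o)
{
    o.print();
}
// Compile time
template<typename T>
void print(T& o)
{
    o.print();
}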
Static polymorphism produces faster code, mostly because of the possibility of aggressive inlining. Virtual functions can rarely be inlined, and mostly in "non-polymorphic" scenarios. See this item in C++ FAQ. If speed is your goal, you basically have no choice.
On the other hand, not only compile times, but also the readability and debuggability of the code are much worse when using static polymorphism. For instance: abstract methods are a clean way of enforcing implementation of certain interface methods. To achieve the same goal using static polymorphism, you need to resort to concept checking or the curiously recurring template pattern.
The only situation when you really have to use dynamic polymorphism is when the implementation is not available at compile time; for instance, when it's loaded from a dynamic library. In practice though, you may want to exchange performance for cleaner code and faster compilation.
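As an illustration of the static alternative mentioned in this answer, here is a minimal sketch of the curiously recurring template pattern. Printable, printImpl and describe are hypothetical names, and print returns a string rather than writing output so that the result is easy to inspect.

```cpp
#include <cassert>
#include <string>

// CRTP base: the "interface" forwards to the derived class at compile time.
template <class Derived>
class Printable {
public:
    std::string print() const {
        // No virtual call. A derived class without printImpl() fails to
        // compile as soon as print() is used on it.
        return static_cast<Derived const&>(*this).printImpl();
    }
};

class Derived1 : public Printable<Derived1> {
public:
    std::string printImpl() const { return "Derived1"; }
};

class Derived2 : public Printable<Derived2> {
public:
    std::string printImpl() const { return "Derived2"; }
};

// Works for any CRTP-derived type; each instantiation can be fully inlined.
template <class T>
std::string describe(Printable<T> const& p) {
    return "<" + p.print() + ">";
}
```

Compared with a pure virtual function, the missing-implementation error appears later and is harder to read, which is the readability cost described above.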
After you filter out obviously bad and suboptimal cases I believe you're left with almost nothing. IMO it is pretty rare when you're facing that kind of choice. You could improve the question by stating an example, and for that a real comparison can be provided.
Assuming we have that realistic choice, I'd go for the compile-time solution: why waste runtime on something not absolutely necessary? Also, if something is decided at compile time, it is easier to think about, follow in your head, and evaluate.
Virtual functions, just like function pointers, make it impossible to create accurate call graphs. You can review the code from the bottom up, but not easily from the top down. Virtual functions are supposed to follow certain rules, but if one doesn't, you have to look through all of the overrides to find the offender.
There is also some loss of performance, probably not a big deal in the majority of cases, but if there is nothing to balance it on the other side, why accept it?
In C++ when it is possible to implement the same functionality using either run time (sub classes, virtual functions) or compile time (templates, function overloading) polymorphism, why would you choose one over the other?
I would think that the compiled code would be larger for compile time polymorphism (more method/class definitions created for template types)...
Often yes - due to multiple instantiations for different combinations of template parameters, but consider:
with templates, only the functions actually called are instantiated
dead code elimination
constant array dimensions allowing member variables such as T mydata[12]; to be allocated with the object, automatic storage for local variables etc., whereas a runtime polymorphic implementation might need to use dynamic allocation (i.e. new[]) - this can dramatically impact cache efficiency in some cases
inlining of function calls, which makes trivial things like small-object get/set operations about an order of magnitude faster on the implementations I've benchmarked
avoiding virtual dispatch, which amounts to following a pointer to a table of function pointers, then making an out-of-line call to one of them (it's normally the out-of-line aspect that hurts performance most)
...and that compile time would give you more flexibility...
Templates certainly do:
given the same template instantiated for different types, the same code can mean different things: for example, T::f(1) might call a void f(int) noexcept function in one instantiation, a virtual void f(double) in another, a T::f functor object's operator()(float) in yet another; looking at it from another perspective, different parameter types can provide what the templated code needs in whatever way suits them best
SFINAE lets your code adjust at compile time to use the most efficient interfaces an object supports, without the objects actively having to make a recommendation
due to the instantiate-only-functions-called aspect mentioned above, you can "get away" with instantiating a class template with a type for which only some of the class template's functions would compile: in some ways that's bad because programmers may expect that their seemingly working Template<MyType> will support all the operations that the Template<> supports for other types, only to have it fail when they try a specific operation; in other ways it's good because you can still use Template<> if you're not interested in all the operations
if Concepts [Lite] make it into a future C++ Standard, programmers will have the option of putting stronger up-front constraints on the semantic operations that types used as template parameters must support, which will avoid nasty surprises as a user finds their Template<MyType>::operationX broken, and generally give simpler error messages earlier in the compile
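A small sketch of the SFINAE point from the list above: the helper uses reserve when the container offers it and silently falls back otherwise. The name reserveIfPossible and the int/long tag trick used to rank the overloads are choices made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Preferred overload: only viable when c.reserve(n) is a valid expression.
template <class C>
auto reserveIfPossible(C& c, std::size_t n, int) -> decltype(c.reserve(n), bool()) {
    c.reserve(n);
    return true;
}

// Fallback: always viable, but needs an int-to-long conversion, so it loses
// overload resolution whenever the overload above is available.
template <class C>
bool reserveIfPossible(C&, std::size_t, long) {
    return false;
}

// Public entry point; returns whether reserve was actually called.
template <class C>
bool reserveIfPossible(C& c, std::size_t n) {
    return reserveIfPossible(c, n, 0);
}
```

The containers themselves do nothing special: std::vector gets the fast path because it happens to have reserve, and std::list does not.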
...while run time would give you "safer" polymorphism (i.e. harder to be used incorrectly by accident).
Arguably, as they're more rigid given the template flexibility above. The main "safety" problems with runtime polymorphism are:
some problems end up encouraging "fat" interfaces (in the sense Stroustrup mentions in The C++ Programming Language): APIs with functions that only work for some of the derived types, so that algorithmic code needs to keep "asking" the derived types "should I do this for you", "can you do this", "did that work" etc.
you need virtual destructors: some classes don't have them (e.g. std::vector) - making it harder to derive from them safely, and the in-object pointers to virtual dispatch tables aren't valid across processes, making it hard to put runtime polymorphic objects in shared memory for access by multiple processes
Can anyone give a specific example where both would be viable options but one or the other would be a clearly better choice?
Sure. Say you're writing a quick-sort function: you could only support data types that derive from some Sortable base class with a virtual comparison function and a virtual swap function, or you could write a sort template that uses a Less policy parameter defaulting to std::less<T>, and std::swap<>. Given the performance of a sort is overwhelmingly dominated by the performance of these comparison and swap operations, a template is massively better suited to this. That's why C++ std::sort clearly outperforms the C library's generic qsort function, which uses function pointers for what's effectively a C implementation of virtual dispatch. See here for more about that.
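The two styles in this example can be sketched side by side. For brevity the template below is an insertion sort rather than a quick-sort, and the names insertionSort, compareInts and sortWithQsort are hypothetical. The point is only how the comparison is supplied: as a Less policy type the compiler can inline, versus a function pointer called through std::qsort.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>
#include <vector>

// Compile-time polymorphism: the comparison is a template parameter.
template <class T, class Less = std::less<T>>
void insertionSort(std::vector<T>& v, Less less = Less()) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        for (std::size_t j = i; j > 0 && less(v[j], v[j - 1]); --j) {
            using std::swap;
            swap(v[j], v[j - 1]);
        }
    }
}

// Run-time indirection, C style: every comparison is an out-of-line call
// through a function pointer operating on untyped memory.
int compareInts(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

void sortWithQsort(std::vector<int>& v) {
    if (!v.empty())
        std::qsort(v.data(), v.size(), sizeof(int), compareInts);
}
```

Passing std::greater<int>() as the second argument to insertionSort sorts in descending order without any change to the algorithm.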
Also, does compile time polymorphism produce faster code, since it is not necessary to call functions through vtable, or does this get optimized away by the compiler anyway?
It's very often faster, but very occasionally the sum impact of template code bloat may overwhelm the myriad ways compile time polymorphism is normally faster, such that on balance it's worse.

Is boost::variant rocket science? (And should I therefore avoid it for simple problems?)

OK, so I have this tiny little corner of my code where I'd like my function to return one of (int, double, CString) to clean up the code a bit.
So I think: No problem to write a little union-like wrapper struct with three members etc. But wait! Haven't I read of boost::variant? Wouldn't this be exactly what I need? This would save me from messing around with a wrapper struct myself! (Note that I already have the boost library available in my project.)
So I fire up my browser, navigate to Chapter 28. Boost.Variant and lo and behold:
The variant class template is a safe, generic, stack-based discriminated union container, offering a simple solution for manipulating an object from a heterogeneous set of types [...]
Great! Exactly what I need!
But then it goes on:
Boost.Variant vs. Boost.Any
Boost.Any makes little use of template metaprogramming techniques (avoiding potentially hard-to-read error messages and significant compile-time processor and memory demands).
[...]
Troubleshooting
"Internal heap limit reached" -- Microsoft Visual C++ -- The compiler option /ZmNNN can increase the memory allocation limit. The NNN is a scaling percentage (i.e., 100 denotes the default limit). (Try /Zm200.)
[...]
Uh oh. So using boost::variant may significantly increase compile-time and generate hard-to-read error messages. What if someone moves my use of boost::variant to a common header, will our project suddenly take lots longer to compile? Am I introducing an (unnecessarily) complex type?
Should I use boost::variant for my simple tiny problem?
Generally, use boost::variant if you do want a discriminated union (any is for unknown types -- think of it as some kind of equivalent to how void* is used in C).
Some advantages include exception handling, potential usage of less space than the sum of the type sizes, type discriminated "visiting". Basically, stuff you'd want to perform on the discriminated union.
However, for boost::variant to be efficient, at least one of the types used must be "easily" constructed (read the documentation for more details on what "easily" means).
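To show what "type discriminated visiting" looks like in practice, here is a sketch that uses std::variant from C++17 as a stand-in, so that it needs nothing beyond the standard library. It is not Boost code: with Boost.Variant the corresponding pieces would be boost::variant, boost::static_visitor and boost::apply_visitor. The names Value, lookup and Describe are hypothetical, and std::string replaces CString.

```cpp
#include <cassert>
#include <string>
#include <variant>

// A function result that is exactly one of three types.
using Value = std::variant<int, double, std::string>;

// Hypothetical function returning a different alternative per input.
Value lookup(int which) {
    if (which == 0) return 42;
    if (which == 1) return 2.5;
    return std::string("text");
}

// Visitor: one overload per alternative; a missing overload is a compile error.
struct Describe {
    std::string operator()(int i) const { return "int:" + std::to_string(i); }
    std::string operator()(double) const { return "double"; }
    std::string operator()(std::string const& s) const { return "string:" + s; }
};

std::string describe(Value const& v) {
    return std::visit(Describe{}, v);
}
```

The caller never inspects a tag by hand; forgetting to handle one of the three types is caught at compile time rather than at run time.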
Boost.Variant is not that complex, IMHO. Yes, it is template based, but it doesn't use any really complex feature of C++. I've used it quite a bit with no problems at all. I think in your case it would help to describe better what your code is doing.
Another way of thinking is transforming what that function returns into a more semantically rich structure/class that allows interpreting which inner element is interesting, but that depends on your design.
This kind of boost element comes from functional programming, where you have variants around every corner.
It should be a way to have a type-safe approach to returning a kind of value that can be of many precise types. This means that it is useful for solving your problem, BUT you should consider whether it's really what you need to do.
The added value compared to other approaches that try to solve the same problem should be the type safety (you won't be able to place whatever you want inside a variant without noticing, as opposed to a void*).
I don't use it because, to me, it's a symptom of bad design.
Either your method should return an object that implements a specific interface, or it should be split into more than one method. Either way, the design should be reviewed.

Large scale usage of Meyers' advice to prefer non-member, non-friend functions?

For some time I've been designing my class interfaces to be minimal, preferring namespace-wrapped non-member functions over member functions. Essentially following Scott Meyers' advice in the article How Non-Member Functions Improve Encapsulation.
I've been doing this with good effect in a few small scale projects, but I'm wondering how well it works on a larger scale. Are there any large, well regarded open-source C++ projects that I can take a look at and perhaps reference where this advice is strongly followed?
Update: Thanks for all the input, but I'm not really interested in opinion so much as finding out how well it works in practice on a larger scale. Nick's answer is closest in this regard, but I'd like to be able to see the code. Any sort of detailed description of practical experiences (positives, negatives, practical considerations, etc) would be acceptable as well.
I do this quite a bit on the project I work on; the largest of which at my current company is around 2M lines, but it's not open source, so I can't provide it as a reference. However, I will say that I agree with the advice, generally speaking. The more you can separate the functionality which is not strictly contained to just one object from that object, the better your design will be.
By way of an example, consider the classic polymorphism example: a Shape base class with subclasses, and a virtual Draw() function. In the real world, Draw() would need to take some drawing context, and potentially be aware of the state of other things being drawn, or the application in general. Once you put all that into each subclass implementation of Draw(), you're likely to have some code overlap, or most of your actual Draw() logic will be in the base class, or somewhere else. Then consider that if you want to re-use some of that code, you'll need to provide more entry points into the interface, and possibly pollute the functions with other code not related to drawing shapes (eg: multi-shape drawing correlation logic). Before long, it'll be a mess, and you'll wish you had a draw function which took a Shape (and context, and other data) instead, and Shape just had functions/data which were entirely encapsulated and not using or referencing external objects.
Anyway, that's my experience/advice, for what it's worth.
I'd argue that the benefit of non-member functions increases as the size of the project increases. The standard library containers, iterators, and algorithms library are proof of this.
If you can decouple algorithms from data structures (or, to phrase it another way, if you can decouple what you do with objects from how their internal state is manipulated), you can decrease coupling between your classes and take greater advantage of generic code.
Scott Meyers isn't the only author who has argued in favor of this principle; Herb Sutter has too, especially in Monoliths Unstrung, which ends with the guideline:
Where possible, prefer writing functions as nonmember nonfriends.
I think one of the best examples of an unnecessary member function from that article is std::basic_string::find; there is no reason for it to exist, really, as std::find provides exactly the same functionality.
The OpenCV library does this. They have a cv::Mat class that represents a 3D matrix (or images). Then they have all the other functions in the cv namespace.
The OpenCV library is huge and is widely regarded in its field.
One practical advantage of writing functions as nonmember nonfriends is that doing so can significantly reduce the time it takes to thoroughly test and verify the code.
Consider, for example, the sequence container member functions insert and push_back. There are at least two approaches to implementing push_back:
It can simply call insert (its behavior is defined in terms of insert anyway)
It can do all the work that insert would do (possibly calling private helper functions) without actually calling insert
Obviously, when implementing a sequence container, you probably want to use the first approach. push_back is just a special form of insert and (to the best of my knowledge) you can't really get any performance benefit by implementing push_back some other way (at least not for list, deque, or vector).
However, to thoroughly test such a container, you have to test push_back separately: since push_back is a member function, it can modify any and all of the internal state of the container. From a testing standpoint, you should (must?) assume that push_back is implemented using the second approach because it is possible that it could be implemented using the second approach. There is no guarantee that it is implemented in terms of insert.
If push_back is implemented as a nonmember nonfriend, it can't touch any of the internal state of the container; it must use the first approach. When you write tests for it, you know that it can't break the internal state of the container (assuming the actual container member functions are implemented correctly). You can use that knowledge to significantly reduce the number of tests that you need to write to fully exercise the code.
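The argument above can be illustrated with a short sketch of push_back written as a nonmember nonfriend. The name pushBack is hypothetical. Because the function can only reach the container through its public interface, it is forced into the first approach.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Nonmember nonfriend: has no access to the container's internals, so the
// only way to append is through the public insert member.
template <class Container>
void pushBack(Container& c, typename Container::value_type const& value) {
    c.insert(c.end(), value);
}
```

Once insert has been tested for a container, a test for pushBack only has to confirm that it passes the right position and value along.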
(I don't have time to write this up nicely, the following's a 5 minute brain dump which doubtless can be ripped apart at various trivial levels, but please address the concepts and general thrust.)
I have considerable sympathy for the position taken by Jonathan Grynspan, but want to say a bit more about it than can reasonably be done in comments.
First - a "well said" to Alf Steinbach, who chipped in with "It's only over-simplified caricatures of their viewpoints that might seem to be in conflict. For what it's worth I don't agree with Scott Meyers on this matter; as I see it he's over-generalizing here, or he was."
Scott, Herb etc. were making these points when few people understood the trade-offs or alternatives, and they did so with disproportionate strength. Some nagging hassles people had during evolution of code were analysed and a new design approach addressing those issues was rationally derived. Let's return to the question of whether there were downsides later, but first - worth saying that the pain in question was typically small and infrequent: non-member functions are just one small aspect of designing reusable code, and in enterprise scale systems I've worked on, simply writing the same kind of code you'd have put into a member function as a non-member is rarely enough to make the non-members reusable. It's pretty rare for them to even express algorithms that are both complex enough to be worth reusing and yet not tightly bound to the specifics of the class they were designed for, that being weird enough that it's practically inconceivable some other class will happen along supporting the same operations and semantics. Often, you also need to templatize the arguments, or introduce a base class to abstract the set of operations required. Both have significant implications in terms of performance, being inline vs out-of-line, and client-code recompilation.
That said, there are often fewer code changes and less impact study required when changing implementation if operations have been implemented in terms of a public interface, and being a non-friend non-member systematically enforces that. Occasionally though, it makes the initial implementation more verbose or in some other way less desirable and maintainable.
But, as a litmus test - how many of these non-member functions sit in the same header as the only class for which they're currently applicable? How many want to abstract their arguments via templates (which means inlining, compilation dependencies) or base classes (virtual function overheads) to allow reuse? Both discourage people from seeing them as reusable, but when not the case, the operations available on a class are delocalised, which can frustrate developers' perception of a system: the developer often has to work out for themselves the rather disappointing fact that - "oh - that will only work for class X".
Bottom line: most member functions aren't potentially reusable. Much corporate code isn't broken into clean algorithm versus data with potential for reuse of the former. That kind of division just isn't required or useful or conceivably useful 20 years down the road. It's much the same as get/set methods - they're needed at certain API boundaries, but can constitute needless verbosity when ownership and use of the code is localised.
Personally, I don't have an all or nothing approach to this, but decide what to make a member function or non-member based on whether there's any likely benefit to either, potential reusability versus locality of interface.
I also do this a lot, where it seems to make sense, and it causes absolutely no problems with scaling (although my current project is only 40000 LOC). In fact, I think it makes the code more scalable - it slims down classes and reduces dependencies.
It sometimes requires you to refactor your functions to make them independent of members of the class - thereby often creating a library of more general helper functions, which you can easily reuse elsewhere. I'd also mention that one of the common problems with many large projects is the bloating of classes - and I think preferring non-member, non-friend functions also helps here.
Prefer non-member non-friend functions for encapsulation UNLESS you want implicit conversions to work for non-member functions of class templates (in which case you'd better make them friend functions):
That is, if you have a class template type<T>:
template<class T>
struct type {
    friend void foo(type<T> a, type<T> b) {}
};
and a type implicitly convertible to type<T>, e.g.:
template<class T>
struct convertible_to_type {
    operator type<T>() const { return {}; }
};
The following works as expected (foo is found through argument-dependent lookup on the first argument):
type<int> a;
auto t = convertible_to_type<int>{};
foo(a, t); // t is converted to type<int>
However, if you make foo a non-friend function template:
template<class T>
void foo(type<T> a, type<T> b) {}
then the following doesn't work:
type<int> a;
auto t = convertible_to_type<int>{};
foo(a, t); // FAILS: cannot deduce T from convertible_to_type<int>
Template argument deduction does not consider implicit conversions, so T cannot be deduced from the second argument; foo is therefore removed from the overload resolution set, that is: no function is found, which means that the implicit conversion does not trigger.

How is dynamic_cast typically implemented?

Is the type check a mere integer comparison? Or would it make sense to have a GetTypeId virtual function to distinguish types, which would make it an integer comparison?
(Just don't want things to be a string comparison on the class names)
EDIT: What I mean is, if I'm often expecting the wrong type, would it make sense to use something like:
struct Token
{
enum {
AND,
OR,
IF
};
virtual std::size_t GetTokenId() = 0;
};
struct AndToken : public Token
{
std::size_t GetTokenId() { return AND; }
};
And use the GetTokenId member instead of relying on dynamic_cast.
The functionality of the dynamic_cast goes far beyond a simple type check. If it was just a type check, it would be very easy to implement (something like what you have in your original post).
In addition to type checking, dynamic_cast can perform casts to void * and hierarchical cross-casts. These kinds of casts conceptually require some ability to traverse class hierarchy in both directions (up and down). The data structures needed to support such casts are more complicated than a mere scalar type id. The information the dynamic_cast is using is a part of RTTI.
Trying to describe it here would be counterproductive. I used to have a good link that described one possible implementation of RTTI... will try to find it.
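The two extra capabilities mentioned in this answer, cross-casts and casts to void *, can be seen in a short sketch. The class names Left, Right and Both and the helper functions are hypothetical. A single integer type id could not produce these results, because the cast has to know how the base subobjects are laid out inside the complete object.

```cpp
#include <cassert>

struct Left  { virtual ~Left() = default;  int l = 1; };
struct Right { virtual ~Right() = default; int r = 2; };
struct Both : Left, Right { int b = 3; };

// Cross-cast: Left and Right are unrelated, yet the cast succeeds when the
// complete object happens to contain both of them.
Right* crossCast(Left* p) { return dynamic_cast<Right*>(p); }

// Cast to void*: yields the address of the most derived object.
void* mostDerived(Right* p) { return dynamic_cast<void*>(p); }
```

Starting from a Left* that points into a Both object, crossCast returns the Right subobject of that same object; for a plain Left object it returns a null pointer.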
I don't know the exact implementation, but here is an idea how I would do it:
Casting from Derived* to Base* can be done at compile time. Casting between two unrelated polymorphic types can be done at compile time too (just return NULL).
Casting from Base* to Derived* needs to be done at run-time, because multiple derived classes are possible. The identification of the dynamic type can be done using the virtual method table bound to the object (that's why it requires polymorphic classes).
This VMT probably contains extra information about the base classes and their data offsets. These data offsets are relevant when multiple inheritance is involved and are added to the source pointer to make it point to the right location.
If the desired type was not found among the base classes, dynamic_cast would return null.
You are correct that some of the original compilers used string comparison.
As a result dynamic_cast<> was very slow (relatively speaking): as the class hierarchy was traversed, each step up/down the hierarchy chain required a string compare against the class name.
This led a lot of people to develop their own casting techniques. This was nearly always ultimately futile, as it required each class to be annotated correctly, and when things went wrong it was nearly impossible to trace the error.
But that is also ancient history.
I am not sure how it is done now but it definitely does not involve string comparison. Doing it yourself is also a bad idea (never do work that the compiler is already doing). Any attempt you make will not be as fast or as accurate as the compiler, remember that years of development have gone into making the compiler code as quick as possible (and it will always be correct).
The compiler cannot divine additional information you may have and stick it in dynamic_cast. If you know certain invariants about your code and you can show that your manual casting mechanism is faster, do it yourself. It doesn't really matter how dynamic_cast is implemented in that case.