Why is RTTI necessary?

Why is RTTI (Runtime Type Information) necessary?

RTTI, Run-Time Type Information, introduces a [mild] form of reflection for C++.
It lets you discover, for example, the actual type of an object known only through a base-class pointer, and hence lets you handle a heterogeneous collection of objects, all derived from the same base type, in ways that are specific to the individual derived classes. (Say you have an array of "Vehicle" objects and need to deal differently with the "Truck" objects found amid the array.)
Whether RTTI is necessary is, however, an open question. Story has it that Bjarne Stroustrup purposefully excluded this feature from the original C++ specification, for fear that it would be misused.
There are indeed opportunities to overuse/misuse reflection features, and this may have been even more of a factor when C++ was initially introduced, because there wasn't such an OOP culture in the mainstream programmer community.
This said, with a more OOP-savvy community, with the effective demonstration of all the good things reflection can do (e.g. with languages such as Java or C#), and with the fancy design patterns in use nowadays, I strongly believe that RTTI and reflection features at large are very important, even if sometimes misused.

I can think of exactly one case when it would be appropriate to use RTTI, and it doesn't even work.
It is fairly common for C-compatible APIs which perform callbacks to provide a user-defined void* to communicate a state structure back to the caller. When calling such an API from C++, it is quite common to pass the this pointer through said void* argument. From the callback, one might want to invoke virtual functions on the passed pointer.
In some cases when the callback parameters are insecure (such as LPARAM of a Windows message), it is obviously desirable to validate the pointer before using it for a virtual call, by checking the hidden vfptr. dynamic_cast is the natural way to do this, but results in undefined behavior exactly when the object is invalid (IIRC, it is undefined behavior if the pointer is to anything except an object with a virtual table). So RTTI is utterly useless for preventing a shatter attack in this way.
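For concreteness, here is a minimal sketch of the callback pattern being described (the register_callback API, the Handler class and the trampoline name are all made up for illustration); it shows the this pointer travelling through the void* parameter and coming back for a virtual call:

#include <iostream>

// Imaginary C-style API: for this sketch it simply invokes the callback at once.
typedef void (*callback_t)(void* user_data);
void register_callback(callback_t cb, void* user_data) { cb(user_data); }

class Handler {
public:
    virtual ~Handler() {}
    virtual void on_event() { std::cout << "event\n"; }

    void install() {
        // Pass ourselves as the opaque user data.
        register_callback(&Handler::trampoline, this);
    }

private:
    static void trampoline(void* user_data) {
        // We have to trust that user_data really points at a Handler;
        // as noted above, RTTI cannot validate an arbitrary pointer.
        static_cast<Handler*>(user_data)->on_event();
    }
};

int main() {
    Handler h;
    h.install(); // prints "event" via the virtual call
}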
Feel free to present any other valid use cases for RTTI, because I'm totally unconvinced.
EDIT: boost::any got mentioned. As far as boost::any is concerned, you can disable RTTI and use the following typeid implementation:
typedef const void* typeinfo_nonrtti;

template <typename T> typeinfo_nonrtti typeid_nonrtti();

template <typename T> class typeinfo_nonrtti_helper
{
    friend typeinfo_nonrtti typeid_nonrtti<T>();
    static char unique;  // one static object per T; its address identifies the type
};

template <typename T> char typeinfo_nonrtti_helper<T>::unique;

template <typename T>
typeinfo_nonrtti typeid_nonrtti() { return &typeinfo_nonrtti_helper<T>::unique; }
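A hypothetical usage sketch (relying on the definitions above): distinct types yield distinct addresses, so the values can be compared much like typeid results, without RTTI:

#include <cassert>

int main() {
    assert(typeid_nonrtti<int>() == typeid_nonrtti<int>());
    assert(typeid_nonrtti<int>() != typeid_nonrtti<double>());
}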


Void pointer in C++

I learned that templates are the void* equivalents in C++. Is this true?
I ran into this issue when polling "events" from some procedure: I have an EventType variable, and I may also need to pass raw data related to that event.
struct WindowEvent {
    enum Type {WINDOW_SIZE_CHANGE, USER_CLICK, ...};
    void* data;
};
The user may then cast data to the necessary type, depending on the event type.
Is this approach okay in C++? Are there any better approaches?
In C, which generally lacks support for polymorphism, void* pointers can be used to accept data of any type, along with some run-time representation of the actual type, or just knowledge that the data will be casted back to the correct type.
In C++ and other languages with support for polymorphism one will generally instead use either dynamic polymorphism (classes with virtual functions) or static polymorphism (function overloads and templates).
The main difference is that the C approach yields dynamic (run-time) manual type checking, while the C++ approaches yield mostly static (compile-time), fully automated type checking. This means less time spent on testing and hunting down silly, easily preventable bugs. The cost is more verbose code, which means there's a code-size threshold somewhere, below which the C approach probably rules for productivity, and above which the C++ approaches rule.
"I learned that templates are the void* equivalents in C++. Is this true?"
No - Templates maintain type safety
"Is this approach okay in C++?"
No
"Are there any better approaches?"
Depending on the use case one could use (for example)
class EventData {
public:
    virtual int getData() = 0;
};
And then use the appropriate inherited class. Perhaps using smart pointers.
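For illustration, a small self-contained sketch of that suggestion (ResizeData and the field names are made up): each event owns a polymorphic payload behind a smart pointer instead of a raw void*.

#include <iostream>
#include <memory>

class EventData {
public:
    virtual ~EventData() = default;
    virtual int getData() = 0;
};

class ResizeData : public EventData {
public:
    explicit ResizeData(int width) : width_(width) {}
    int getData() override { return width_; }
private:
    int width_;
};

struct WindowEvent {
    enum Type { WINDOW_SIZE_CHANGE, USER_CLICK };
    Type type;
    std::unique_ptr<EventData> data;   // owns the payload, no manual casts needed
};

int main() {
    WindowEvent e{WindowEvent::WINDOW_SIZE_CHANGE,
                  std::unique_ptr<EventData>(new ResizeData(800))};
    std::cout << e.data->getData() << "\n";   // prints 800
}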

Easy way to implement small buffer optimization for arbitrary type erasure (like in std::function.)

I tend to use type erasure technique quite a bit.
It typically looks like this:
class YetAnotherTypeErasure
{
public:
    // interface redirected to pImpl

private:
    // Adapting function
    template ...
    friend YetAnotherTypeErasure make_YetAnotherTypeErasure (...);

    class Interface {...};

    template <typename Adaptee>
    class Concrete final : public Interface {
        // redirecting Interface to Adaptee
    };

    std::unique_ptr<Interface> pImpl_; // always on the heap
};
std::function does something similar, but it has a small buffer optimization: if Concrete<Adaptee> is smaller than some threshold and has nothrow move operations, it will be stored inside the buffer. Is there some generic library solution that makes this fairly easy to do? Or to enforce small-buffer-only storage at compile time? Maybe something has been proposed for standardisation?
I don't know of any standard requirement or proposal that mandates the small buffer optimization, though it is often allowed and encouraged.
Note that some (conditionally) non-throwing requirements on such types effectively require the optimization in practice because alternatives (like non-throwing allocation from emergency buffers) seem insane here.
On the other hand, you can just make your own solution from scratch, based on the standard library (e.g. std::aligned_storage). This may still be verbose from the user's point of view, but it is not too hard.
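To make that concrete, below is a minimal sketch of such a from-scratch solution, written here as a stripped-down std::function<void()>-like wrapper (the class name SmallErased and the buffer size are arbitrary choices). Small, nothrow-movable callables are placement-new'ed into an aligned_storage buffer; anything else falls back to the heap. Copying and moving are omitted to keep the sketch short.

#include <new>
#include <type_traits>
#include <utility>

class SmallErased {
public:
    template <typename T,
              typename D = typename std::decay<T>::type,
              typename = typename std::enable_if<!std::is_same<D, SmallErased>::value>::type>
    SmallErased(T&& value) {
        if (fits<D>()) {
            // Construct in place inside the internal buffer.
            obj_ = new (&buffer_) Model<D>(std::forward<T>(value));
            on_heap_ = false;
        } else {
            obj_ = new Model<D>(std::forward<T>(value));
            on_heap_ = true;
        }
    }

    ~SmallErased() {
        if (on_heap_) delete obj_;
        else obj_->~Concept();  // destroy in place, no deallocation
    }

    SmallErased(const SmallErased&) = delete;             // copying/moving omitted here
    SmallErased& operator=(const SmallErased&) = delete;

    void call() const { obj_->call(); }  // the erased interface

private:
    struct Concept {
        virtual ~Concept() {}
        virtual void call() const = 0;
    };

    template <typename T>
    struct Model final : Concept {
        explicit Model(T v) : value(std::move(v)) {}
        void call() const override { value(); }
        T value;
    };

    using Buffer = std::aligned_storage<3 * sizeof(void*)>::type;

    template <typename T>
    static constexpr bool fits() {
        return sizeof(Model<T>) <= sizeof(Buffer)
            && alignof(Model<T>) <= alignof(Buffer)
            && std::is_nothrow_move_constructible<T>::value;
    }

    Buffer   buffer_;
    Concept* obj_;
    bool     on_heap_;
};

int main() {
    int hits = 0;
    SmallErased small([&hits] { ++hits; });   // small capture: lives in the buffer
    small.call();

    struct Big { char blob[128]; } big = {};
    SmallErased large([big, &hits] { (void)big; ++hits; });  // too big: heap fallback
    large.call();

    return hits == 2 ? 0 : 1;
}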
I actually implemented an any with such an optimization (before it was proposed), along with some related utilities, several years ago. More recently, libstdc++'s implementation of std::experimental::any uses almost exactly this technique (though its __-prefixed internal names are certainly not meant for ordinary library users).
My implementation now uses some common helpers to deal with the storage. These helpers do make it easier to implement the type-erasure storage strategy (at least well enough for something similar to any). But I am still interested in a more general high-level solution to simplify the interface redirection.
There is also a function implementation based directly on the any implementation above. They support move-only types and a sane allocator interface, while the std ones do not. The function implementation has better performance than std::function in libstdc++ in some cases, thanks to the (partially no-op) default initialization of the underlying any object.
I found a reasonably nice solution for everyday code - use std::function. With tiny library support to help with const correctness, the code gets down to 20 lines:
https://gcc.godbolt.org/z/GtewFI
I think polymorphic_value (proposed for C++20 in wg21.link/p0201) comes closest to what we can do in modern C++.
Basically it's like std::any, but all of your types have to inherit from the same interface.
It is Semiregular; they decided to drop equality.
This has some overhead: one vptr in the class itself and a separate dispatch mechanism in the polymorphic_value. It also has a pointer-like interface rather than a value-like one.
However, considering how easy it is to use compared to writing your own type-erased adapter, I'd say for most use cases it would be more than good enough.

Why doesn't std::unary_function contain a virtual destructor?

I came across class template std::unary_function and std::binary_function.
template <class Arg, class Result>
struct unary_function {
    typedef Arg argument_type;
    typedef Result result_type;
};

template <class Arg1, class Arg2, class Result>
struct binary_function {
    typedef Arg1 first_argument_type;
    typedef Arg2 second_argument_type;
    typedef Result result_type;
};
Both of these can be used as base classes for specific purposes, but still there is no virtual destructor in either. One reason I could guess is that they are not meant to be treated polymorphically, i.e.
std::unary_function<Arg, Result>* ptr;
// initialize it
// do something
delete ptr;
But if that is so, shouldn't the destructor be declared with a protected access specifier, so that the compiler would reject any attempt to do that?
In a well-balanced C++ design philosophy the idea of "preventing" something from happening is mostly applicable when there's a good chance of accidental and not-easily-detectable misuse. And even in that case the preventive measures are only applicable when they don't impose any significant penalties. The purpose of such classes as unary_function, binary_function, iterator etc. should be sufficiently clear to anyone who knows about them. It would take a completely clueless user to use them incorrectly.
In the case of classes that implement the well-established idiom of "group member injection" through public inheritance, adding a virtual destructor to them would be a major design error. Turning a non-polymorphic class into a polymorphic one is a major qualitative change. Paying such a price for the ability to use this idiom would be prohibitively unacceptable.
A non-virtual protected destructor is a different story... I don't know why they didn't go that way. Maybe it just looked unnecessarily excessive to add a member function for that purpose alone (since otherwise these classes contain only typedefs).
Note that even though unary_function, binary_function are deprecated, iterator is not. The deprecation does not target the idiom itself. The idiom is widely used within other larger-scale design approaches, like C++ implementation of Mixins and such.
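As an illustration of the member-injection idiom being discussed, here is a made-up CountingIterator that inherits from std::iterator purely to pull in the iterator typedefs; there is no polymorphism and no deletion through the base involved (std::iterator itself was later deprecated in C++17, but the shape of the idiom is the same):

#include <iterator>

class CountingIterator
    : public std::iterator<std::forward_iterator_tag, int>  // injects value_type etc.
{
public:
    explicit CountingIterator(int start = 0) : value_(start) {}

    int operator*() const { return value_; }
    CountingIterator& operator++() { ++value_; return *this; }

    friend bool operator==(CountingIterator a, CountingIterator b) { return a.value_ == b.value_; }
    friend bool operator!=(CountingIterator a, CountingIterator b) { return !(a == b); }

private:
    int value_;
};

int main() {
    CountingIterator it(0), end(3);
    int sum = 0;
    while (it != end) { sum += *it; ++it; }
    return sum == 3 ? 0 : 1;  // 0 + 1 + 2
}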
Because std::unary_function and std::binary_function are, by design, not supposed to be used for polymorphic deletion: they exist only to provide typedefs to the child classes, and have no other intent.
Being a base class in C++ does not mean that the class must exhibit any particular polymorphic behaviour.
I.e. you should never see code such as:
void foo(std::unary_function* f)
{
    delete f; // illegal
}
Note that both classes are deprecated since C++11 (See N3145)
The basic reason the various type tags (e.g., there is also std::iterator<...>) don't play nicely with the belief that everything one derives from is meant to be a polymorphic base class is that the overall design of the STL, where they are used, frowns upon the use of inheritance for polymorphism. That is, the people who proposed these classes wouldn't see any reason why anybody would want to treat anything as dynamically polymorphic, especially not any of these empty-by-design type tags. Thus, little effort was made to prevent silly mistakes.
When these classes were accepted as part of the STL at large, a lot more effort was spent on removing the rough edges of the STL and not so much on unimportant details. Also, having the type tags be empty could be useful, as they wouldn't interfere with some of the constraints placed upon classes using any access specifiers. Thus, the type tags were left empty.
Since with C++11 it is specifically no longer necessary to use any of these type tags (the return type can be determined at the point of use and the arguments can be perfectly forwarded), these types are being deprecated rather than getting "fixed" (assuming they are considered broken).

Why is it bad to impose type constraints on templates in C++?

In this question the OP asked about limiting what classes a template will accept. A summary of the sentiment that followed is that the equivalent facility in Java is bad; and don't do this.
I don't understand why this is bad. Duck typing is certainly a powerful tool, but in my mind it lends itself to confusing runtime issues when a class looks close (same function names) but has slightly different behavior. And you can't necessarily rely on compile-time checking because of examples like this:
#include <iostream>
using std::cout;
using std::endl;

struct One { int a; int b; };
struct Two { int a; };

template <class T>
class Worker {
public:
    T data;
    void print() { cout << data.a << endl; }
    void usually_important() { int a = data.a; int b = data.b; }
};

int main() {
    Worker<Two> w;
    w.print();                 // compiles: Two has a member 'a'
    // w.usually_important();  // would not compile: Two has no member 'b'
}
Type Two will allow Worker to compile only if usually_important is not called. This could lead to some instantiations of Worker compiling and others not, even within the same program.
In a case like this, though, the responsibility is put on the designer of ENGINE to ensure that it is a valid type (after which they should inherit from ENGINE_BASE). If they don't, there will be a compiler error. To me this seems much safer, while not imposing any restrictions or adding much additional work.
#include <boost/static_assert.hpp>
#include <boost/type_traits/is_base_of.hpp>

class ENGINE_BASE {}; // Empty class, all engines should extend this

template <class ENGINE>
class NeedsAnEngine {
    BOOST_STATIC_ASSERT((boost::is_base_of<ENGINE_BASE, ENGINE>::value));
    // Do stuff with ENGINE...
};
This is too long, but it might be informative.
Generics in Java are a type-erasure mechanism, with automatic generation of type casts and type checks.
Templates in C++ are a code-generation and pattern-matching mechanism.
You can use C++ templates to do what Java generics do, with a bit of effort. std::function< A(B) > behaves in a covariant/contravariant fashion with regard to the A and B types and conversions to other std::function< X(Y) > instances.
But the primary design of the two is not the same.
A Java List<X> will be a List<Object> with some thin wrapping on it so users don't have to do type casts on extraction. If you pass it as a List<? extends Bar>, it again is getting a List<Object> in essence, it just has some extra type information that changes how the casts work and which methods can be invoked. This means you can extract elements from the List into a Bar and know it works (and check it). Only one method is generated for all List<? extends Bar>.
A C++ std::vector<X> is not in essence a std::vector<Object> or std::vector<void*> or anything else. Each instance of a C++ template is an unrelated type (except template pattern matching). In fact, std::vector<bool> uses a completely different implementation than any other std::vector (this is now considered a mistake because the implementation differences "leak" in annoying ways in this case). Each method and function is generated independently for the particular type you pass it.
In Java, it is assumed that all objects will fit into some hierarchy. In C++, that is sometimes useful, but it has been discovered it is often ill fitting to a problem.
A C++ container need not inherit from a common interface. A std::list<int> and std::vector<int> are unrelated types, but you can act on them uniformly -- they both are sequential containers.
The question "is the argument a sequential container" is a good question. This allows anyone to implement a sequential container, and such sequential containers can as high performance as hand-crafted C code with utterly different implementations.
If you created a common root std::container<T> which all containers inherited from, it would either be full of virtual table cruft or it would be useless other than as a tag type. As a tag type, it would intrusively inject itself into all non-std containers, requiring that they inherit from std::container<T> to be a real container.
The traits approach instead means that there are specifications as to what a container (sequential, associative, etc) is. You can test these specifications at compile time, and/or allow types to note that they qualify for certain axioms via traits of some kind.
The C++03/11 standard library does this with iterators. std::iterator_traits<T> is a traits class that exposes iterator information about an arbitrary type T. Someone completely unconnected to the standard library can write their own iterator, and use std::iterator<...> to auto-work with std::iterator_traits, add their own type aliases manually, or specialize std::iterator_traits to pass on the information required.
C++11 goes a step further. for( auto&& x : y ) can work with things that were written long before range-based iteration was designed, without touching the class itself. You simply write free begin and end functions in the namespace that the class belongs to that return a valid forward iterator (note: even invalid forward iterators that are close enough work), and suddenly for ( auto&& x : y ) starts working.
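A tiny sketch of that point (the LegacyRange type and the legacy namespace are invented): a pre-existing class gains range-based for support purely through free begin/end functions found by argument-dependent lookup, without touching the class itself.

#include <iostream>

namespace legacy {
    struct LegacyRange {
        int values[3] = {1, 2, 3};
    };
    // Free functions in the same namespace, found via ADL.
    const int* begin(const LegacyRange& r) { return r.values; }
    const int* end(const LegacyRange& r)   { return r.values + 3; }
}

int main() {
    legacy::LegacyRange r;
    for (auto&& x : r)        // begin/end found by ADL
        std::cout << x << ' ';
}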
std::function< A(B) > is an example of using these techniques together with type erasure. It has a constructor that accepts anything that can be copied, destroyed, invoked with (B) and whose return type can be converted to A. The types it can take can be completely unrelated -- only that which is required is tested for.
Because of std::function's design, we can have lambda invokables that are unrelated types yet can be type-erased into a common std::function if needed; when not type-erased, their invocation behavior is known from their type. So a template function that takes a lambda knows at the point of invocation what will happen, which makes inlining an easy local operation.
This technique is not new -- it has been in C++ since std::sort, a high-level algorithm that is faster than C's qsort due to the ease of inlining invokable objects passed as comparators.
In short, if you need a common runtime type, type erase. If you need certain properties, test for those properties, don't force a common base. If you need certain axioms to hold (untestable properties), either document or require callers to claim those properties via tags or traits classes (see how the standard library handles iterator categories -- again, not inheritance). When in doubt, use free functions with ADL enabled to access properties of your arguments, and have your default free functions use SFINAE to look for a method and invoke if it exists, and fail otherwise.
Such a mechanism removes the central responsibility of a common base class, allows existing classes to be adapted without modification to pass your requirements (if reasonable), places type erasure only where it is needed, avoids virtual overhead, and ideally generates clear errors when properties are found to not hold.
If your ENGINE has certain properties it needs to satisfy, write a traits class that tests for those.
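A minimal sketch of what such a traits test might look like (C++11; the trait name is_engine_like and the tested member start() are placeholders for whatever ENGINE actually needs):

#include <type_traits>
#include <utility>

template <typename T, typename = void>
struct is_engine_like : std::false_type {};

// Matches only when e.start() is a valid expression for an object e of type T.
template <typename T>
struct is_engine_like<T, decltype(void(std::declval<T&>().start()))> : std::true_type {};

template <class ENGINE>
class NeedsAnEngine {
    static_assert(is_engine_like<ENGINE>::value,
                  "ENGINE must provide a start() member");
    // Do stuff with ENGINE...
};

struct GoodEngine { void start() {} };

int main() {
    NeedsAnEngine<GoodEngine> ok;   // compiles; a type without start() would trip the assert
    (void)ok;
}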
If there are properties that cannot be tested for, create tags that describe such properties. Use specialization of a traits class, or canonical typedefs, to let the class describe which axioms hold for the type. (See iterator tags).
If you have a type like ENGINE_BASE, don't demand it, but instead use it as a helper for said tags and traits and axiom typedefs, like std::iterator<...> (you never have to inherit from it, it simply acts as a helper).
Avoid over specifying requirements. If usually_important is never invoked on your Worker<X>, probably your X doesn't need a b in that context. But do test for properties in a way clearer than "method does not compile".
And sometimes, just punt. Following such practices might make things harder for you -- so take the easier way. Most code is written and discarded. Know when your code will persist, and write it better, more extensibly and more maintainably. Know that you need to practice those techniques on disposable code so you can write it correctly when you have to.
Let me turn the question around on you: Why is it bad that the code compiles for Two if usually_important isn't called? The type you gave it meets all the needs for that particular instantiation and the compiler will immediately tell you if a particular instantiation no longer meets the interface needed for the needed functionality in the template.
That said, if you insist that you need an Engine object, don't do it with templates at all; instead treat it as a sort of strategy pattern with a non-template class (this approach enforces at compile time that the user-defined type adheres to a specific interface, not just that it looks like a duck):
// EngineBase and ConcreteEngine are assumed to be defined elsewhere.
class Worker
{
public:
    explicit Worker(EngineBase* data) : data_(data) {}
    void print() { cout << data_->a() << endl; }
    void usually_important() { int a = data_->a(); int b = data_->b(); }
private:
    EngineBase* data_;
};

int main()
{
    Worker w(new ConcreteEngine);
    w.print();
}
I don't understand why this is bad. Duck typing is certainly a powerful tool; but in my mind it lends itself to confusing runtime issues when a class looks close (same function names) but has slightly different behavior.
The probability that you can define a non-trivial interface and then by accident have another interface that has different semantics but can be substituted is minimal. This never, ever happens.
Type Two will allow Worker to compile only if usually_important is not called.
That is a good thing. We depend on it all the time. It makes class templates more flexible.
Matching a compile-time interface is strictly superior to a run-time one. This is because run-time interfaces can't differ in key ways that compile-time ones can (e.g. different types in the interface), and require a bunch of run-time abstraction like dynamic allocation that may be unnecessary.
In a case like this, though, the responsibility is put on the designer of ENGINE to ensure that it is a valid type (after which they should inherit from ENGINE_BASE). If they don't, there will be a compiler error. To me this seems much safer while not imposing any restrictions or adding much additional work.
It is not safer. It is utterly pointless. It is stupendously unlikely that the user will accidentally instantiate the class with the wrong type but it will compile successfully due to circumstantial interface match.
What it really boils down to is this: you should only require what you really need -- what you absolutely, definitely must have in order to function. Everything else, don't require it. This is a core tenet of making software maintainable. You cannot possibly imagine what shenanigans I might conceive of, long after you have written this class, to use it in ways you never thought it could be used.

Why is std::type_info polymorphic?

Is there a reason why std::type_info is specified to be polymorphic? The destructor is specified to be virtual (and there's a comment to the effect of "so that it's polymorphic" in The Design and Evolution of C++). I can't really see a compelling reason why. I don't have any specific use case, I was just wondering if there ever was a rationale or story behind it.
Here's some ideas that I've come up with and rejected:
It's an extensibility point - implementations might define subclasses, and programs might then try to dynamic_cast a std::type_info to another, implementation-defined derived type. This is possibly the reason, but it seems that it's just as easy for implementations to add an implementation-defined member, which could possibly be virtual. Programs wishing to test for these extensions would necessarily be non-portable anyway.
It's to ensure that derived types are destroyed properly when deleting through a base pointer. But there are no standard derived types, and users can't define useful derived types, because type_info has no standard public constructors, so deleting a type_info pointer is never both legal and portable. And the derived types aren't useful because they can't be constructed - the only use I know for such non-constructible derived types is in the implementation of things like the is_polymorphic type trait.
It leaves open the possibility of metaclasses with customized types - each real polymorphic class A would get a derived "metaclass" A__type_info, which derives from type_info. Perhaps such derived classes could expose members that call new A with various constructor arguments in a type-safe way, and things like that. But making type_info polymorphic itself actually makes such an idea basically impossible to implement, because you'd have to have metaclasses for your metaclasses, ad infinitum, which is a problem if all the type_info objects have static storage duration. Maybe barring this is the reason for making it polymorphic.
There's some use for applying RTTI features (other than dynamic_cast) to std::type_info itself, or someone thought that it was cute, or embarrassing if type_info wasn't polymorphic. But given that there's no standard derived type, and no other classes in the standard hierarchy which one might reasonably try cross-cast to, the question is: what? Is there a use for expressions such as typeid(std::type_info) == typeid(typeid(A))?
It's because implementers will create their own private derived type (as I believe GCC does). But, why bother specifying it? Even if the destructor wasn't specified as virtual and an implementer decided that it should be, surely that implementation could declare it virtual, because it doesn't change the set of allowed operations on type_info, so a portable program wouldn't be able to tell the difference.
It's something to do with compilers with partially compatible ABIs coexisting, possibly as a result of dynamic linking. Perhaps implementers could recognize their own type_info subclass (as opposed to one originating from another vendor) in a portable way if type_info was guaranteed to be virtual.
The last one is the most plausible to me at the moment, but it's pretty weak.
I assume it's there for the convenience of implementers. It allows them to define extended type_info classes, and delete them through pointers to type_info at program exit, without having to build in special compiler magic to call the correct destructor, or otherwise jump through hoops.
surely that implementation could declare it virtual, because it doesn't change the set of allowed operations on type_info, so a portable program wouldn't be able to tell the difference.
I don't think that's true. Consider the following:
#include <typeinfo>

struct A { int x; };
struct B { int x; };

int main() {
    // Allowed: type_info is polymorphic, so this dynamic_cast compiles
    // (and evaluates to a null pointer).
    const A *a1 = dynamic_cast<const A*>(&typeid(int));

    B b;
    // Ill-formed: B is not polymorphic, so this dynamic_cast does not compile.
    const A *a2 = dynamic_cast<const A*>(&b);
}
Whether it's reasonable or not, the first dynamic cast is allowed (and evaluates to a null pointer), whereas the second dynamic cast is not allowed. So, if type_info was defined in the standard to have the default non-virtual destructor, but an implementation added a virtual destructor, then a portable program could tell the difference[*].
Seems simpler to me to put the virtual destructor in the standard, than to either:
a) put a note in the standard that, although the class definition implies that type_info has no virtual functions, it is permitted to have a virtual destructor.
b) determine the set of programs which can distinguish whether type_info is polymorphic or not, and ban them all. They may not be very useful or productive programs, I don't know, but to ban them you have to come up with some standard language that describes the specific exception you're making to the normal rules.
Therefore I think that the standard has to either mandate the virtual destructor, or ban it. Making it optional is too complex (or perhaps I should say, I think it would be judged unnecessarily complex. Complexity never stopped the standards committee in areas where it was considered worthwhile...)
If it was banned, though, then an implementation could:
add a virtual destructor to some derived class of type_info
derive all of its typeinfo objects from that class
use that internally as the polymorphic base class for everything
that would solve the situation I described at the top of the post, but the static type of a typeid expression would still be const std::type_info, so it would be difficult for implementations to define extensions where programs can dynamic_cast to various targets to see what kind of type_info object they have in a particular case. Perhaps the standard hoped to allow that, although an implementation could always offer a variant of typeid with a different static type, or guarantee that a static_cast to a certain extension class will work, and then let the program dynamic_cast from there.
In summary, as far as I know the virtual destructor is potentially useful to implementers, and removing it doesn't gain anyone anything other than that we wouldn't be spending time wondering why it's there ;-)
[*] Actually, I haven't demonstrated that. I've demonstrated that an illegal program would, all else being equal, compile. But an implementation could perhaps work around that by ensuring that all isn't equal, and that it doesn't compile. Boost's is_polymorphic isn't portable, so while it's possible for a program to test that a class which should be polymorphic is, there may be no way for a conforming program to test that a class which shouldn't be polymorphic isn't. I think, though, that even if that's impossible, proving it, just in order to remove one line from the standard, is quite a lot of effort.
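(As a small aside from today's perspective: with C++11's <type_traits>, at least the positive check is trivial and portable, e.g.:)

#include <type_traits>
#include <typeinfo>

// type_info is specified with a virtual destructor, so this holds.
static_assert(std::is_polymorphic<std::type_info>::value,
              "std::type_info is polymorphic");

int main() {}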
The C++ standard says that typeid returns an object of type type_info, OR AN IMPLEMENTATION-DEFINED subclass thereof. So... I guess this is pretty much the answer. So I don't see why you reject your points 1 and 2.
Paragraph 5.2.8, clause 1, of the current C++ standard reads:
The result of a typeid expression is an lvalue of static type const std::type_info (18.5.1) and dynamic type const std::type_info or const name where name is an implementation-defined class derived from std::type_info which preserves the behavior described in 18.5.1. The lifetime of the object referred to by the lvalue extends to the end of the program. Whether or not the destructor is called for the type_info object at the end of the program is unspecified.
Which in turn means that the following code is legal and fine:
const type_info& x = typeid(expr);
and this may require that type_info be polymorphic.
3/ It leaves open the possibility of metaclasses with customized types - each real polymorphic class A would get a derived "metaclass" A__type_info, which derives from type_info. Perhaps such derived classes could expose members that call new A with various constructor arguments in a type-safe way, and things like that. But making type_info polymorphic itself actually makes such an idea basically impossible to implement, because you'd have to have metaclasses for your metaclasses, ad infinitum, which is a problem if all the type_info objects have static storage duration. Maybe barring this is the reason for making it polymorphic.
Clever...
Anyway, I disagree with this reasoning: such an implementation could easily rule out metaclasses for types derived from type_info, including type_info itself.
About the simplest "global" id you can have in C++ is a class name, and type_info provides a way to compare such ids for equality. But the design is so awkward and limited that you then need to wrap type_info in some wrapper class, e.g. to be able to put instances in collections. Andrei Alexandrescu did that in his "Modern C++ Design", and I think that type_info wrapper is part of the Loki library; there's probably one also in Boost; and it's pretty easy to roll your own, e.g. see my own wrapper.
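For illustration, a minimal sketch of such a wrapper (the name TypeKey is made up; since C++11 the standard's own std::type_index plays exactly this role):

#include <typeinfo>
#include <map>
#include <string>

class TypeKey {
public:
    TypeKey(const std::type_info& info) : info_(&info) {}

    bool operator==(const TypeKey& other) const { return *info_ == *other.info_; }
    bool operator<(const TypeKey& other) const  { return info_->before(*other.info_) != 0; }

    const char* name() const { return info_->name(); }

private:
    const std::type_info* info_; // the referenced object lives until program exit
};

int main() {
    std::map<TypeKey, std::string> names;
    names[typeid(int)]    = "int";
    names[typeid(double)] = "double";
    return names.count(typeid(int)) == 1 ? 0 : 1;
}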
But even for such a wrapper there's not, in general, any need for a virtual destructor in type_info.
The question is therefore not so much "huh, why is there a virtual destructor" but rather, as I see it, "huh, why is the design so backward, awkward and not directly usable"? And I'd put that down to the standardization process. For example, iostreams are not exactly examples of superb design, either; not something to emulate.