Why can't I cast a Box with an extended trait to a Box with the base trait? [duplicate]

This question already has answers here:
Why doesn't Rust support trait object upcasting?
(5 answers)
Closed 4 years ago.
Given the code
trait Base { }
trait Derived : Base { }
struct Foo { }
impl Base for Foo { }
impl Derived for Foo { }
fn main()
{
    let b : Box<Derived> = Box::new( Foo { } );
    let a : Box<Base> = b;
}
When I compile this, as I'm sure you know, I get the following error message:
error[E0308]: mismatched types
  --> src/main.rs:14:25
   |
14 |     let a : Box<Base> = b;
   |                         ^ expected trait `Base`, found trait `Derived`
   |
   = note: expected type `std::boxed::Box<Base>`
              found type `std::boxed::Box<Derived>`
Why am I not allowed to do this? If a Box contains a Derived, it is guaranteed that it also contains a Base. Is there any way to do this? And if not, what is a common way to, for instance, store a vector of different traits that all have the same base trait?

The short answer is because traits are not interfaces.
The long answer is because a &Base trait object and a &Derived trait object are not the same thing. The vtables are different because Derived and Base are different traits. The vtable for Derived includes all of Derived's methods as well as all of Base's, while the vtable for &Base only includes Base's methods.
Now, obviously, Base's methods are in &Derived's vtable. So perhaps you could do something clever and get the behavior you want:
If Base's methods were listed first in &Derived's vtable, then you could just cast &Derived to &Base and it would work. However, the &Derived and &Base vtables have different lengths, and such a cast would chop off everything past the end of &Base's vtable. So if you then tried to call a method on that object which is part of Derived, you'd invoke undefined behavior.
You could run some magic code which would analyze the definitions of &Base and &Derived and be able to construct a vtable for &Base from &Derived. This would require additional information at runtime about these types and their layout. This would also have a non-zero performance cost in addition to the additional memory usage. One of the basic principles of Rust is "zero cost abstractions" which generally means that potentially expensive operations are explicit and not implicit (if let a: Box<Base> = b; did this, it would generally be considered too implicit).
It's difficult to say in general what a better pattern is. If you are modeling a closed set of items, enums are generally a better way to go:
enum Animal {
    Dog { name: String, age: u8 },
    Cat { name: String, age: u8, sleeping: bool },
    Fish { name: String, age: u8, in_ocean: bool },
}
If you are trying to do something more complicated, Entity Component Systems like specs can give you a lot more flexibility than a simple enum.

Related

Deduce return type of a function on derived class automatically on base class

I would like to achieve something like the following in C++14; basically, each derived class can have a different return type (e.g. int, double, string, etc.):
class Base {
public:
    virtual auto value() = 0; // I know this won't compile
};

class Derived1 : public Base {
public:
    double value() override { return 1.0; }
};

class Derived2 : public Base {
public:
    int value() override { return 1; }
};
I know the above code won't compile, but I'm trying to achieve something like that using any possible way or pattern (I've tried templates, CRTP, and the visitor pattern, but nothing satisfies the following code):
Derived1 d1;
Derived2 d2;
std::vector<Base*> base = { &d1, &d2 };
for (const auto* b : base)
    std::cout << b->value();
The best I can get with templates is something like
Derived1 d1;
Derived2 d2;
std::vector<Base*> base = { &d1, &d2 };
for (const auto* b : base)
    if (dynamic_cast<const Derived1*>(b))
        std::cout << b->value<double>();
    else if (dynamic_cast<const Derived2*>(b))
        std::cout << b->value<int>();
but if I have 100 types of Derived class, it won't look that pretty :D
This is not possible, fundamentally, in C++. C++ does not work this way, for the following simple reason. Let's just pretend that this works somehow. Consider the following simple function:
void my_function(Base *p)
{
    auto value = p->value();
}
Now, ask yourself: what is value's type here? You may not be aware of this, but there is no actual type called auto in C++. auto is a placeholder telling the C++ compiler to deduce, i.e. determine, the actual type at compile time. auto basically says: whatever the expression's type evaluates to be, that's the type of this object. If your C++ compiler determined that p->value() returns an int, then value is an int, and the above is 100% equivalent to declaring int value = p->value();.
Here, it is impossible to determine value's actual type. Is it an int? Is it a double? Or something else?
Unfortunately, it's a mystery that will remain unsolved forever. The actual type of value depends on the derived object that the pointer to Base actually points to, which is unknown at compile time and can only be determined at run time.
It is a fundamental property of C++ that the types of all objects must be known at compile time. This is baked into C++. There are no workarounds. There are no alternatives. What you are trying to do cannot be done in C++.
However, there's a little bit of good news: if the number of possible return types is limited, just declare an ordinary virtual method that returns a std::variant, and each derived class can then return an appropriate value. It will be up to the caller to make use of it.
In the case above, this would be:
class Base {
public:
    virtual std::variant<int, double> value() = 0;
};
If the type of the actual value being returned is completely unknown, then I suppose you can use std::any. In either case, as you attempt to implement either approach you will discover that C++ will force you to figure out and check for each possible type (in ways that depend on whether you use std::variant or std::any), each time you attempt to use the value returned from this method.
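To make that suggestion concrete, here is a minimal, self-contained sketch of the std::variant approach (the std::visit loop is my addition, not from the original question; note that std::variant is C++17, with boost::variant as the usual C++14 stand-in):
#include <iostream>
#include <variant>
#include <vector>

class Base {
public:
    virtual ~Base() = default;
    virtual std::variant<int, double> value() = 0;
};

class Derived1 : public Base {
public:
    std::variant<int, double> value() override { return 1.0; }
};

class Derived2 : public Base {
public:
    std::variant<int, double> value() override { return 1; }
};

int main() {
    Derived1 d1;
    Derived2 d2;
    std::vector<Base*> base = { &d1, &d2 };
    for (auto* b : base)
        // std::visit dispatches on whichever alternative is currently held
        std::visit([](auto v) { std::cout << v << '\n'; }, b->value());
}
The per-type check the answer mentions is exactly what the visit lambda absorbs here; with std::any you would instead need an explicit any_cast per candidate type.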
An abstract base class is generally intended as a public interface to the implementation class.
In this case your interface changes with every child: when the return type changes, the function signature changes, and that is why override results in a compilation error, as you might have already realized.
Visitor is useful provided your class system is stable as described here:
Confirm that the current hierarchy (known as the Element hierarchy) will be fairly stable and that the public interface of these classes is sufficient for the access the Visitor classes will require. If these conditions are not met, then the Visitor pattern is not a good match.
To implement Visitor, you would generally define multiple functions with different input parameter types instead of using a dynamic cast (as described at the link above).
You can also do away with class inheritance altogether. Check out Sean Parent's talk. He describes a similar use case and uses templates to do what you might be trying to do. The trick is to define a single class with a templated constructor and an internal object type that the constructor wraps.
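A rough, hedged sketch of the technique from that talk (commonly called type erasure; the names object_t, concept_t, and model_t follow the talk's conventions, and the printing interface is invented here for illustration):
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// One concrete class that can hold any printable type, with no inheritance
// required from the stored types themselves.
class object_t {
    struct concept_t {
        virtual ~concept_t() = default;
        virtual void print_(std::ostream&) const = 0;
    };
    template <typename T>
    struct model_t final : concept_t {
        T data_;
        model_t(T x) : data_(std::move(x)) {}
        void print_(std::ostream& out) const override { out << data_; }
    };
    std::shared_ptr<const concept_t> self_;
public:
    template <typename T>
    object_t(T x) : self_(std::make_shared<model_t<T>>(std::move(x))) {}
    friend void print(std::ostream& out, const object_t& o) { o.self_->print_(out); }
};

int main() {
    std::vector<object_t> items = { 1, 2.5, object_t(std::string("three")) };
    for (const auto& o : items) {
        print(std::cout, o);
        std::cout << '\n';
    }
}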

Why does the C++ standard require compilers to ignore calls to conversion operators for base types?

Take the following code:
#include <iostream>

struct Base {
    char x = 'b';
};

struct Derived : Base {
    operator Base() { return Base { 'a' }; }
};

int main() {
    Derived derived;
    auto base = static_cast<Base>(derived);
    std::cout << "BASE -> " << base.x << std::endl;
}
Under both g++ and clang++, this produces:
BASE -> b
I was expecting the following:
BASE -> a
Why? Because when I read this code, I see a conversion operator inside of Derived that returns an instance of Base containing 'a'.
clang++ did me the courtesy of emitting a warning:
main.cpp:9:5: warning: conversion function converting 'Derived' to its base class 'Base' will never be used
operator Base() { return Base { 'a' }; }
Researching this warning, I found that this was by design (paraphrased for clarity):
class.conv.fct
The type of the conversion function ([dcl.fct]) is “function taking no parameter returning conversion-type-id”. A conversion function is never used to convert a (possibly cv-qualified) object [...] to a (possibly cv-qualified) base class of that type (or a reference to it) [...].
So it would seem that both compilers are doing the right thing here.
My question is, why does the standard require this behaviour?
If you could override the conversion-to-base-class in C++, then you could break lots and lots of stuff. For example, how exactly would you get access to the actual base class instance of a class? You would need some baseof template, similar to the std::addressof that's used to bypass ill-conceived operator& overloads.
Allowing this would create confusion as to what code means. With this rule in place, it's clear that converting a class to its base class copies the base class instance, in all cases.
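A minimal sketch of that rule in action, reusing the question's types (the reference case is my addition, for contrast):
#include <iostream>

struct Base {
    char x = 'b';
};

struct Derived : Base {
    operator Base() { return Base { 'a' }; } // never considered
};

int main() {
    Derived d;
    Base by_value = d; // always copies the Base subobject: by_value.x == 'b'
    Base& by_ref = d;  // always binds to the Base subobject, no copy at all
    std::cout << by_value.x << by_ref.x << std::endl; // prints "bb"
}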
Let's say I have a hierarchy of Animals and I want to write a function that takes a modifiable Animal, without slicing. How would I do that? I have two options:
void by_ref(Animal& );
void by_ptr(Animal* );
Which I could call like:
Dog dog = ...;
by_ref(dog);
by_ptr(&dog);
Today, the only difference between those two calls is the syntax used inside the two functions and possibly a check against nullptr. This is because the Dog to Animal& and Dog* to Animal* conversions are guaranteed standard derived-to-base conversions. There is no alternative.
But imagine if I could actually write:
struct Dog : Animal {
    operator Animal&() { ... };
};
Now those two calls could do totally different things! My Dog* to Animal* conversion is still the same dog, but my Dog to Animal& conversion is an entirely different Dog. It could even be a Cat. Which would make all of this code basically impossible to reason about.
You would need a special mechanism to definitely give you the base subobject of a particular type:
by_ref(std::base<Animal>(dog));
which would basically have to be used everywhere to guarantee correctness of any code that relies upon inheritance. Which is a lot of code.
And to what benefit? If you want a different base subobject, you can just write a differently named function:
struct Dog : Animal {
    Animal& foo();
};
Naming may be one of the two hard things about programming, but it's better to just come up with your own name than to open up the bag of deplorables that letting people write their own derived-to-base-but-not-really conversions would be.
The designer of C++ decided that this general principle should apply (Stroustrup, The C++ Programming Language, chapter 18):
User-defined conversions are considered only if a call cannot be resolved without them (i.e., using only built-in conversions).
For better or worse, there is a built-in conversion from Derived to Base, so the user-defined operator is never considered.
The standard generally follows the path devised by Stroustrup unless there is a very good reason not to.
Stroustrup cites the following rationale for the overall type conversion design (ibid.):
The rules for conversion are neither the simplest to implement, nor the simplest to document, nor the most general that could be devised. They are, however, considerably safer, and the resulting resolutions are typically less surprising than alternatives. It is far easier to manually resolve an ambiguity than to find an error caused by an unsuspected conversion.

What is the advantage of using dynamic_cast instead of conventional polymorphism?

We can use Polymorphism (inheritance + virtual functions) in order to generalize different types under a common base-type, and then refer to different objects as if they were of the same type.
Using dynamic_cast appears to be the exact opposite approach, as in essence we are checking the specific type of an object before deciding what action we want to take.
Is there any known example for something that cannot be implemented with conventional polymorphism as easily as it is implemented with dynamic_cast?
Whenever you find yourself wanting a member function like "IsConcreteX" in a base class (edit: or, more precisely, a function like "ConcreteX *GetConcreteX"), you are basically implementing your own dynamic_cast. For example:
class Movie
{
public:
    // ...
    virtual bool IsActionMovie() const = 0;
};

class ActionMovie : public Movie
{
public:
    // ...
    virtual bool IsActionMovie() const { return true; }
};

class ComedyMovie : public Movie
{
public:
    // ...
    virtual bool IsActionMovie() const { return false; }
};

void f(Movie const &movie)
{
    if (movie.IsActionMovie())
    {
        // ...
    }
}
This may look cleaner than a dynamic_cast, but on closer inspection, you'll soon realise that you've not gained anything except for the fact that the "evil" dynamic_cast no longer appears in your code (provided you're not using an ancient compiler which doesn't implement dynamic_cast! :)). It's even worse - the "self-written dynamic cast" approach is verbose, error-prone and repetitive, while dynamic_cast will work just fine with no additional code whatsoever in the class definitions.
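For comparison, here is the same check written with dynamic_cast; it needs no support code in the classes beyond a polymorphic base (a minimal sketch, with the class bodies stripped down):
#include <iostream>

class Movie {
public:
    virtual ~Movie() = default; // any virtual function makes dynamic_cast usable
};

class ActionMovie : public Movie { };
class ComedyMovie : public Movie { };

void f(Movie const &movie)
{
    if (dynamic_cast<ActionMovie const*>(&movie))
    {
        std::cout << "action movie\n";
    }
}

int main()
{
    ActionMovie am;
    ComedyMovie cm;
    f(am); // prints "action movie"
    f(cm); // prints nothing
}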
So the real question should be whether there are situations where it makes sense that a base class knows about a concrete derived class. The answer is: usually it doesn't, but you will doubtlessly encounter such situations.
Think, in very abstract terms, about a component of your software which transmits objects from one part (A) to another (B). Those objects are of type Class1 or Class2, where Class2 is-a Class1.
        Class1
          ^
          |
          |
        Class2

A - - - - - - - -> B
     (objects)
B, however, has some special handling only for Class2. B may be a completely different part of the system, written by different people, or legacy code. In this case, you want to reuse the A-to-B communication without any modification, and you may not be in a position to modify B, either. It may therefore make sense to explicitly ask whether you are dealing with Class1 or Class2 objects at the other end of the line.
void receiveDataInB(Class1 &object)
{
    normalHandlingForClass1AndAnySubclass(object);
    if (typeid(object) == typeid(Class2))
    {
        additionalSpecialHandlingForClass2(dynamic_cast<Class2 &>(object));
    }
}
Here is an alternative version which does not use typeid:
void receiveDataInB(Class1 &object)
{
    normalHandlingForClass1AndAnySubclass(object);
    Class2 *ptr = dynamic_cast<Class2 *>(&object);
    if (ptr != 0)
    {
        additionalSpecialHandlingForClass2(*ptr);
    }
}
This might be preferable if Class2 is not a leaf class (i.e. if there may be classes further deriving from it).
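To see why, consider a hypothetical Class3 deriving from Class2 (Class3 is my addition; Class1 and Class2 are stubbed out here): the exact-type typeid comparison misses it, while dynamic_cast still matches:
#include <iostream>
#include <typeinfo>

struct Class1 { virtual ~Class1() = default; };
struct Class2 : Class1 { };
struct Class3 : Class2 { }; // further derived from Class2

int main()
{
    Class3 object;
    Class1 &ref = object;
    // prints 0: typeid compares the exact dynamic type, which is Class3
    std::cout << (typeid(ref) == typeid(Class2)) << '\n';
    // prints 1: dynamic_cast succeeds for Class2 and anything derived from it
    std::cout << (dynamic_cast<Class2*>(&ref) != nullptr) << '\n';
}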
In the end, it often comes down to whether you are designing a whole system with all its parts from the beginning or have to modify or adapt parts of it at a later stage. But if you ever find yourself confronted with a problem like the one above, you may come to appreciate dynamic_cast as the right tool for the right job in the right situation.
It allows you to do things which you can only do to the derived type. But this is usually a hint that a redesign is in order.
struct Foo
{
    virtual ~Foo() {}
};

struct Bar : Foo
{
    void bar() const {}
};

int main()
{
    Foo* f = new Bar();
    Bar* b = dynamic_cast<Bar*>(f);
    if (b) b->bar();
    delete f;
}
I can't think of any case where it's not possible to use virtual functions (other than such things as boost::any and similar "lost the original type" work).
However, I have found myself using dynamic_cast a few times in the Pascal compiler I'm currently writing in C++, mostly because it's a "better" solution than adding a dozen virtual functions to the base class that are ONLY used in one or two places, when you already (should) know what type the object is. Currently, out of roughly 4300 lines of code, there are 6 instances of dynamic_cast - one of which can probably be "fixed" by actually storing the type as the derived type rather than the base type.
In a couple of places, I use things like ArrayDecl* a = dynamic_cast<ArrayDecl*>(type); to determine that type is indeed an array declaration, and not someone using a non-array type as a base, when accessing an index (and I also need a to access the array type information later). Again, adding all the virtual functions to the base TypeDecl class would give lots of functions that mostly return nothing useful (e.g. NULL) and aren't called except when you already know that the class is (or at least should be) one of the derived types. For example, getting the range/size of an array is useless for types that aren't arrays.
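A hedged sketch of that pattern (TypeDecl and ArrayDecl are the answerer's class names, but the members and the checking function are invented here for illustration):
struct TypeDecl {
    virtual ~TypeDecl() = default;
};

struct ArrayDecl : TypeDecl {
    int low = 0, high = 0; // index range, meaningful only for arrays
};

// Called where an indexing expression is compiled: the compiler already
// expects an array here, so a failed cast is simply a type error.
bool checkIndexable(TypeDecl* type)
{
    if (ArrayDecl* a = dynamic_cast<ArrayDecl*>(type))
    {
        return a->low <= a->high; // real code would range-check the index here
    }
    return false; // error: indexing a non-array type
}

int main()
{
    ArrayDecl arr;
    TypeDecl plain;
    return checkIndexable(&arr) && !checkIndexable(&plain) ? 0 : 1;
}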
No advantages really. Sometimes dynamic_cast is useful for a quick hack, but generally it is better to design classes properly and use polymorphism. There may be cases where, for some reason, it is not possible to modify the base class to add the necessary virtual functions (e.g. it comes from a third party which we do not want to modify), but even then dynamic_cast usage should be the exception, not the rule.
An often-used argument - that it is not convenient to add everything to the base class - does not really hold, since the Visitor pattern (see e.g. http://sourcemaking.com/design_patterns/visitor/cpp/2) solves this problem in a more organised way, purely with polymorphism: using Visitor you can keep the base class small and still use virtual functions without casting.
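For concreteness, a minimal Visitor sketch along those lines (the shape classes are mine, purely for illustration):
#include <iostream>

struct Circle;
struct Square;

struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(Circle&) = 0;
    virtual void visit(Square&) = 0;
};

// The base class stays small: a single accept() instead of a growing
// collection of mostly-useless virtual functions.
struct Shape {
    virtual ~Shape() = default;
    virtual void accept(Visitor& v) = 0;
};

struct Circle : Shape {
    void accept(Visitor& v) override { v.visit(*this); }
};

struct Square : Shape {
    void accept(Visitor& v) override { v.visit(*this); }
};

struct Printer : Visitor {
    void visit(Circle&) override { std::cout << "circle\n"; }
    void visit(Square&) override { std::cout << "square\n"; }
};

int main() {
    Circle c;
    Square s;
    Printer p;
    Shape* shapes[] = { &c, &s };
    for (Shape* sh : shapes)
        sh->accept(p); // double dispatch, no casting
}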
dynamic_cast is used on a base class pointer to downcast when a member function is available only in the derived class, not in the base class. There is no advantage beyond that: it is simply a way to downcast safely when no virtual function in the base class covers the operation. Check the returned pointer for null. You are correct that it is used where there is no virtual function to dispatch through.

Why isn't 'virtual' inheritance the default behaviour? [duplicate]

This question already has answers here:
Why is the virtual keyword needed?
(4 answers)
Closed 8 years ago.
I understand the need for the virtual keyword when deriving from base classes, to avoid the ambiguity problems of diamond inheritance.
But my question is: why is this not the default behaviour in C++ when deriving classes, regardless of whether the diamond problem could be there or not?
Is there any 'harm' in using the virtual keyword in a situation where diamond inheritance is not present?
Virtual inheritance has a run-time overhead: converting pointers requires an adjustment that's only known at run time while, with non-virtual inheritance, it can be known at compile time. It can also make a class more complicated to derive from, since virtual base classes are initialised by the final derived class, not (necessarily) the class that inherits directly from them.
You would therefore only want it when you specifically want a diamond structure; it would be a pain to have to remember to specify non-virtual inheritance to avoid a hidden overhead. C++ generally follows the principle that you shouldn't pay for features you don't need.
There's an overhead, try it:
#include <iostream>

struct Foo {
    int a;
};
struct Bar : Foo {
    int b;
};
struct Baz : virtual Foo {
    int b;
};

int main() {
    std::cout << sizeof(Foo) << " ";
    std::cout << sizeof(Bar) << " ";
    std::cout << sizeof(Baz) << "\n";
}
On my implementation I get 4 8 16. Virtual inheritance requires a vptr or equivalent mechanism, because the class Baz does not know at what offset the Foo base class sub-object will appear relative to the Baz sub-object. It depends on whether the most-derived type also inherits Foo by another route.
Since the vptr is there, one also expects that in certain circumstances it will be used, which is more overhead :-) That is, one or more additional indirections are required in order to access Foo::a via a Baz* or Baz&. The compiler might choose to avoid that if it somehow knows the most-derived type of the referent.
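A small sketch of why that offset cannot be a compile-time constant (the second virtual deriver Qux and the combined type Both are hypothetical additions to the example above):
#include <iostream>

struct Foo { int a = 0; };
struct Baz : virtual Foo { int b = 0; };
struct Qux : virtual Foo { int c = 0; };
struct Both : Baz, Qux { };

// The offset from a Baz& to its Foo subobject depends on the most-derived
// type, so this access goes through the vptr (or equivalent) at run time.
int read_a(Baz& z) { return z.a; }

int main() {
    Baz alone;
    Both combined;
    alone.a = 1;
    combined.a = 2;
    std::cout << read_a(alone) << read_a(combined) << "\n"; // prints 12
}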

Can the 'this' pointer be different from the object's pointer?

I recently came across this strange function in some class:
void* getThis() {return this;}
And later in the code it is sometimes used like so: bla->getThis() (where bla is a pointer to an object of the class in which this function is defined).
And I can't seem to figure out what this can be good for. Is there any situation where a pointer to an object would be different from the object's this (i.e. where bla != bla->getThis())?
It seems like a stupid question but I wonder if I'm missing something here..
Of course, the pointer values can be different! Below is an example which demonstrates the issue (you may need to use derived1 on your system instead of derived2 to get a difference). The point is that the this pointer typically gets adjusted when virtual or multiple inheritance is involved. This may be a rare case, but it happens.
One potential use case of this idiom is being able to restore objects of a known type after storing them as void const* (or void*; the const-correctness doesn't matter here): if you have a complex inheritance hierarchy, you can't just cast any odd pointer to void* and hope to restore it to its original type! You can only reverse the implicit conversion to void* by casting back to the exact type the original pointer had. That is, to obtain, e.g., a pointer to base (from the example below) as a void*, you'd call p->getThis(), which is a lot easier than writing static_cast<base*>(p); the resulting void* can then be safely turned back into a base* with static_cast<base*>(v). In other words, static_cast<base*>(static_cast<void*>(d)), where d is a pointer to an object of a type derived from base, is illegal, but static_cast<base*>(d->getThis()) is legal.
Now, why is the address changing in the first place? In the example, base is a virtual base class of two derived classes, but there could be more. All subobjects whose class virtually inherits from base will share one common base subobject in an object of a further derived class (concrete in the example below). The location of this base subobject may differ relative to the respective derived subobject, depending on how the different classes are ordered. As a result, the pointer to the base subobject is generally different from the pointers to the subobjects of classes virtually inheriting from base. The relevant offset is computed at compile time when possible, or comes from something like a vtable at run time. The offsets are adjusted when converting pointers along the inheritance hierarchy.
#include <iostream>

struct base
{
    void const* getThis() const { return this; }
};

struct derived1
    : virtual base
{
    int a;
};

struct derived2
    : virtual base
{
    int b;
};

struct concrete
    : derived1
    , derived2
{
};

int main()
{
    concrete c;
    derived2* d2 = &c;
    void const* dptr = d2;
    void const* gptr = d2->getThis();
    std::cout << "dptr=" << dptr << " gptr=" << gptr << '\n';
}
No. Yes, in limited circumstances.
This looks like it is something inspired by Smalltalk, in which all objects have a yourself method. There are probably some situations in which this makes code cleaner. As the comments note, this looks like an odd way to even implement this idiom in C++.
In your specific case, I'd grep for actual usages of the method to see how it is used.
Your class can have a custom operator& (so &a may not return the address of a). That's why std::addressof exists.
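A quick sketch of that corner case (the Tricky class and its deliberately bad operator& are invented for illustration):
#include <iostream>
#include <memory> // std::addressof

struct Tricky {
    Tricky* operator&() { return nullptr; } // hides the real address
    Tricky* getThis() { return this; }
};

int main() {
    Tricky t;
    std::cout << &t << '\n';                // nullptr: the overload wins
    std::cout << std::addressof(t) << '\n'; // the real address
    std::cout << t.getThis() << '\n';       // same as std::addressof(t)
}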
I ran across something like this many (many, many) years ago. If I recall correctly, it was needed when a class manipulates other instances of the same class. One example might be a container class that can contain its own type.
That might be a way to emulate overriding the this keyword.
Let's say that you have a memory pool, fully initialized at the start of your program; for instance, you know that at any time you can deal with a maximum of 50 messages, CMessage.
You create a pool of size 50 * sizeof(CMessage) (whatever this class might be), and CMessage implements the getThis function.
That way, instead of overriding the new keyword, you just override the "this", accessing the pool.
It can also mean that the object might live in different memory spaces, let's say in SRAM during boot, and then in SDRAM.
It might be that the same instance will return different values from getThis through the program in such a situation - on purpose, of course, when overridden.