Context
My goal is to have a base container class that contains and manipulates several base class objects, and then a derived container class that contains and manipulates several derived class objects. On the advice of this answer, I attempted to do this by having each contain an array of pointers (a Base** and a Derived**), and cast from Derived** to Base** when initializing the base container class.
However, I ran into a problem – despite compiling just fine, when manipulating the contained objects, I'd have segfaults or the wrong methods would be called.
Problem
I have boiled the problem down to the following minimal case:
#include <iostream>
class Base1 {
public:
virtual void doThing1() {std::cout << "Called Base1::doThing1" << std::endl;}
};
class Base2 {
public:
virtual void doThing2() {std::cout << "Called Base2::doThing2" << std::endl;}
};
// Whether this inherits "virtual public" or just "public" makes no difference.
class Derived : virtual public Base1, virtual public Base2 {};
int main() {
Derived derived;
Derived* derivedPtrs[] = {&derived};
((Base2**) derivedPtrs)[0]->doThing2();
}
You may expect this to print "Called Base2::doThing2", but…
$ g++ -Wall -Werror main.cpp -o test && ./test
Called Base1::doThing1
Indeed – the code calls Base2::doThing2, but Base1::doThing1 ends up being called. I've also had this segfault with more complex classes, so I assume it's address-related hijinks (perhaps vtable-related – the error doesn't seem to occur without virtual methods). You can run it here, and see the assembly it compiles to here.
You can see my actual structure here – it's more complex, but it ties it into the context and explains why I need something along the lines of this.
Why does the Derived**→Base** cast not work properly when Derived*→Base* does, and more importantly, what is the right way to handle an array of derived objects as an array of base objects (or, failing that, another way to make a container class that can contain multiple derived objects)?
I can't index before upcasting (((Base2*) derivedPtrs[0])->doThing2()), I'm afraid, since in the full code that array is a class member – and I'm not sure it's a good idea (or even possible) to cast manually in every place contained objects are used in the container classes. Correct me if that is the way to handle this, though.
(I don't think it makes a difference in this case, but I'm in an environment where std::vector is unavailable.)
Edit: Solution
Many of the answers suggested that casting each pointer individually is the only way to have an array that can contain derived objects – and that does indeed seem to be the case. For my particular use case, though, I managed to solve the problem using templates! By supplying a type parameter for what the container class is supposed to contain, instead of having to contain an array of derived objects, the type of the array can be set to the derived type at compile time (e.g. BaseContainer<Derived> container(length, arrayOfDerivedPtrs);).
Here's a version of the broken "actual structure" code above, fixed with templates.
There are many things that make this code pretty awful and contribute to this issue:
Why are we dealing with two-star types in the first place? If std::vector doesn't exist, why not write your own?
Don't use C-style casts. You can cast pointers to completely unrelated types into each other and the compiler is not allowed to stop you (coincidentally, this is exactly what is happening here). Use static_cast/dynamic_cast instead.
Let's assume we had std::vector, for ease of notation. You are trying to cast a std::vector<Derived*> to std::vector<Base*>. Those are unrelated types (the same is true for Derived** and Base**), and casting one to the other is not legal in any way.
Pointer casts from/to derived are not necessarily trivial. If you have struct X : A, B {}, then a pointer to the B base will be different from a pointer to the A base† (and with a vtable in play, possibly also different from a pointer to X). They must be, because the (sub)objects cannot reside at the same memory address. The compiler will adjust the pointer value when you cast the pointer. This of course does not/cannot happen for each individual pointer if you (try to) cast an array of pointers.
If you have an array of pointers to Derived and want to get an array of those pointers to Base, then you have to manually cast each one. Since the pointer values will generally be different in both arrays, there is no way to "reuse" the same array.
†(Unless the conditions for empty base optimization are fulfilled, which is not the case for you).
Like others said, the problem is that you are not letting the compiler do its job of adjusting pointer offsets. Suppose the layout of Derived in memory is something like this (not guaranteed by the standard, just a possible implementation):
| vtable_Base1 | Base1 | vtable_Base2 | Base2 | vtable_Derived | Derived |
Then &derived points to the start of the object. When you normally do
Base2* base = static_cast<Base2*>(&derived);
the compiler knows the offset of the Base2 subobject inside Derived and adjusts the address to point to the start of it.
If, instead, you cast an array of pointers directly, the compiler simply coerces the type, assuming that your array already stores pointers to Base2, without adjusting them.
A dirty hack which may or may not work in your situation is to have a method that returns a pointer to itself, e.g.:
class Base2 {
public:
Base2* base2() { return this; }
};
so that you can do derivedPtrs[0]->base2()->doThing2().
It doesn't work for the same reason this doesn't work:
struct Base {};
struct Derived : Base { int i; };
int main() {
Derived d[6];
Derived* d2 = d;
Base** b = &d2; // ERROR!
}
Your C-style cast is bad practice because it didn't warn you of the error. Just don't do that in your code. Your C-style cast was actually a reinterpret_cast in disguise, which is totally wrong in this case.
But why can't you cast an array of derived into an array of base? Simple: they have different layouts.
You see, when you iterate over an array of a type, the elements of the array are contiguous in memory. The Derived class may have a size of, let's say, 24 bytes, and Base a size of 8:
Derived d[4];
------------------------------------------------------
| D1 | D2 | D3 | D4 |
------------------------------------------------------
Base b[4];
---------------------
| B1 | B2 | B3 | B4 |
---------------------
As you can see, Derived[4] and Base[4] are different types with different layouts.
Then what can you do?
There are many solutions, in fact. The easiest would be to create a new array of pointers to base and cast each derived pointer to a base pointer. You have to adjust the pointer to each object anyway.
It would look like this:
std::vector<Base*> bases;
bases.reserve(std::size(derived_arr));
std::transform(
std::begin(derived_arr), std::end(derived_arr),
std::back_inserter(bases),
[](Derived* d) {
// You must use dynamic cast because the
// pointer offset is only known at runtime
// when using virtual inheritance
return dynamic_cast<Base*>(d);
}
);
The other solution, which works in place in memory, would be to create your own type of iterator that does the cast when operator* and operator-> are called. This is a bit harder to do, but it can save you an allocation at the cost of making iteration a bit slower.
On top of all that, and this may be unrelated, I would advise against virtual inheritance. It is not good practice, and in my experience it has resulted in more pain than anything. I would suggest using the adapter pattern and wrapping non-polymorphic types instead.
When you cast from Derived* to Base* the compiler will adjust the value. When you cast Derived** to Base** you defeat that.
This is a good reason to always use static_cast. Example error with your code changed:
test.cpp:19:5: error: static_cast from 'Derived **' to 'Base2 **' is not
allowed
static_cast<Base2**>(derivedPtrs)[0]->doThing2();
(Base2 **) is in fact a reinterpret_cast (you can confirm this by trying all four casts), and that expression causes UB. The fact that you can implicitly convert a pointer to derived to a pointer to base doesn't mean that they are the same, just as int and float are not. And here you are referring to an object through a type which it isn't, which in this case causes UB.
Calling a virtual function this way results in the final overrider for that object being called. How this is achieved depends on the compiler.
Assume that the compiler uses a "vtable". Finding the vtable of Base2 from the memory address of a Derived (which might mean adding an offset to the address; see the assembly) is not trivial.
Basically, you must perform a dynamic cast on every pointer, and either store the results somewhere or cast again whenever needed.
Related
Why does static cast allow an upcast or downcast between pointers to objects derived or base as below, but in the case of casting between a char* and int* or vice versa int* to char*, there is a compilation error?
Casting between pointers to different object types is just as bad, I believe.
// compiles fine
class Base {};
class Derived: public Base {};
Base * a = new Base;
Derived * bc = static_cast<Derived*>(a);
// Gives an invalid static cast error during compilation
char charVar = 8;
char* charPtr = &charVar;
int* intPtr = static_cast<int*>(charPtr);
C++ is strongly performance oriented. So as long as there is some use case for that you can gain performance, C++ will allow you to do it. Consider std::vector: Sure, there is the safe element access via function at, which does range checking for you. But if you know that your indices are in range (e. g. in a for loop), these range checks are just dead weight. So you additionally get the (less safe) operator[] which just omits these checks.
Similarly, if you have a pointer of type Base, it could, in reality, point to an object of type Derived. If in doubt, you would dynamic_cast from Base* to Derived*. But this comes with some overhead. But if you know 100% for sure (by whatever means...) what the sub class actually is, would you want this overhead? As there is a natural (even implicit!) way from Derived* to Base*, we want to have some low-cost way back.
On the other hand, there is no such natural cast between pointers of totally unrelated types (such as char and int or two unrelated classes) and thus no such low-cost way back (compared to dynamic_cast, which isn't available either, of course). The only way to convert between them is reinterpret_cast.
Actually, reinterpret_cast comes with no cost either: it just interprets the pointer as a different type, with all the risks! And a reinterpret_cast can even fail where a static_cast would have been required (just to preempt the question "why not just always use ..."):
class A { int a; };
class B { int b; }; // non-empty, so it cannot share A's address
class C : public A, public B { };
B* b = new C();
C* c = reinterpret_cast<C*>(b); // FAILING!!!
From view of memory layout, C looks like this (even if hidden away from you):
class C
{
A baseA;
B baseB; // this is what pointer b will point to!
};
Obviously, we'll get an offset when casting between C* and B* (either direction), which is considered by both static_cast and dynamic_cast, but not by reinterpret_cast...
Why does static cast allow an upcast ...
There is no reason to prevent upcast. In fact, a derived pointer is even implicitly convertible to a base pointer - no cast is necessary (except in convoluted cases where there are multiple bases of same type). A derived class object always contains a base class sub object.
Upcasting is particularly useful because it allows runtime polymorphism through the use of virtual functions.
or downcast between pointers to objects derived or base as below
A base pointer may point to a base sub object of a derived object, as a consequence of an upcast. Like for example here:
Derived d;
Base *b = &d;
There are cases where you may want to gain access to members of the derived object that you know is being pointed at. Static cast makes that possible without the cost of run time type information.
It is not possible in general for the compiler to find out (at compile time) the concrete type of the pointed object (i.e. whether the pointer points to a sub object and if it does, what is the type of the container object). It is the responsibility of the programmer to ensure that the requirements of the cast are met. If the programmer cannot prove the correctness, then writing the static cast is a bug.
That's because what you're trying to do is a reinterpretation - for which there's a reinterpret_cast<> operator. static_cast<> for pointers is only used for down-casting:
If new_type is a pointer or reference to some class D and the type of expression is a pointer or reference to its non-virtual base B, static_cast performs a downcast. This downcast is ill-formed if B is ambiguous, inaccessible, or virtual base (or a base of a virtual base) of D. Such static_cast makes no runtime checks to ensure that the object's runtime type is actually D, and may only be used safely if this precondition is guaranteed by other means, such as when implementing static polymorphism. Safe downcast may be done with dynamic_cast.
See:
When should static_cast, dynamic_cast, const_cast and reinterpret_cast be used?
for a detailed discussion of when to use each casting operator.
I recently had a problem with casts and multiple inheritance: I needed to cast a Base* to Unrelated*, because a specific Derived class derives the Unrelated class.
This is a short example:
#include <iostream>
struct Base{
virtual ~Base() = default;
};
struct Unrelated{
float test = 111;
};
struct Derived : Base,Unrelated{};
int main(){
Base* b = new Derived;
Unrelated* u1 = (Unrelated*)b;
std::cout << u1->test << std::endl; //outputs garbage
Unrelated* y = dynamic_cast<Unrelated*>(b);
std::cout << y->test << std::endl; //outputs 111
}
The first cast clearly doesn't work, but the second one did work.
My question is: why did the second cast work? Shouldn't dynamic_cast only work for casts to a related class type? I thought there wasn't any information about Unrelated at runtime because it is not polymorphic.
Edit: I used Coliru's gcc for the example.
dynamic_cast works because the dynamic type of object pointed to by a pointer to its base class is related to Unrelated.
Keep in mind that dynamic_cast requires a virtual table to inspect the inheritance tree of the object at run-time.
The C-style cast (Unrelated*)b does not work because the C-style cast does const_cast, static_cast, reinterpret_cast and more, but it does not do dynamic_cast.
I would suggest avoiding C-style casts in C++ code because they do so many things as opposed to precise C++ casts. My colleagues who insist on using C-style cast still occasionally get them wrong.
The first cast (Unrelated*)b doesn't work because you're treating the Base class sub-object, containing probably just a vtable pointer, as an Unrelated, containing a float.
Instead you can cast down and up, static_cast<Unrelated*>( static_cast<Derived*>( b ) ).
And this is what dynamic_cast does for you, since Base is a polymorphic type (at least one virtual method) which allows dynamic_cast to inspect the type of the most derived object.
In passing, dynamic_cast<void*>( b ) would give you a pointer to the most derived object.
However, since you know the types there's no need to invoke the slight overhead of a dynamic_cast: just do the down- and up-casts.
Instead of the C style cast (Unrelated*)b you should use the corresponding C++ named cast or casts, because C style casts of pointers can do things you'd not expect, and because the effect can change completely when types are changed during maintenance.
A C style cast will do at most two C++ named casts. In this case the C style cast corresponds to a reinterpret_cast. The compiler will not allow any other named cast here.
Which is a warning sign. ;-)
In contrast, the down- and up-casts are static_casts, which usually are benign casts.
All that said, the best is to almost completely avoid casts by using the top secret technique:
Don't throw away type information in the first place.
I.e., in the example code, just use Derived* as the type of the pointer.
With multiple inheritance, the Derived object consists of two sub-objects, one Base and one Unrelated. The compiler knows how to access whichever part of the object it needs, typically by adding an offset to the pointer (but that's an implementation detail). It can only do that when it knows the actual type of a pointer. By using a C-style cast, you've told the compiler to ignore the actual type and treat that pointer value as a pointer to something else. It no longer has the information necessary to properly access the sub-object you desire, and it fails.
dynamic_cast allows the compiler to use run-time information about the object to locate the proper sub-object contained within it. If you were to output or examine the pointer values themselves, you'd see that they are different.
Here is some code that illustrates the question:
#include <iostream>
class Base {
};
class Derived : public Base {
};
void doThings(Base* bases[], int length)
{
for (int i = 0; i < length; ++i)
std::cout << "Do ALL the things\n";
}
int main(int argc, const char * argv[])
{
Derived* arrayOfDerived[2] = { new Derived(), new Derived() };
doThings(arrayOfDerived, 2); // Candidate function not viable: no known conversion from 'Derived *[2]' to 'Base **' for 1st argument
// Attempts to work out the correct cast
Derived** deriveds = &arrayOfDerived[0];
Base** bases = dynamic_cast<Base**>(deriveds); // 'Base *' is not a class
Base** bases = dynamic_cast<Base**>(arrayOfDerived); // 'Base *' is not a class
// Just pretend that it should work
doThings(reinterpret_cast<Base**>(arrayOfDerived), 2);
return 0;
}
Clang produces the errors given in the comments. The question is: is there a correct way to cast arrayOfDerived to something that doThings can take?
Bonus marks:
Why does clang produce the error "'Base *' is not a class" on the given lines? I know that Base* isn't a class; it's a pointer. What is the error trying to tell me? Why has dynamic_cast been designed so that in dynamic_cast<T*> the thing T must be a class?
What are the dangers of using the reinterpret_cast to force everything to work?
Thanks as always :)
No. What you’re asking for here is a covariant array type, which is not a feature of C++.
The risk with reinterpret_cast, or a C-style cast, is that while this will work for simple types, it will fail miserably if you use multiple or virtual inheritance, and may also break in the presence of virtual functions (depending on the implementation). Why? Because in those cases a static_cast or dynamic_cast may actually change the pointer value. That is, given
class A {
int a;
};
class B {
string s;
};
class C : public A, B {
double f[4];
};
C *ptr = new C();
unless the classes are empty, ptr == (A *)ptr and/or ptr == (B *)ptr will be false. It only takes a moment to work out why that must be the case; if you're using single inheritance and there is no vtable, it’s guaranteed that the layout of a subclass is the same as the layout of the superclass, followed by member variables defined in the subclass. If, however, you have multiple inheritance, the compiler must choose to lay the class out in some order or other; I'm not sure what the C++ standard has to say about it (you could check - it might define or constrain the layout order somehow), but whatever order it picks, it should be apparent that one of the casts must result in a change in the pointer value.
The first error in your code is that you are using a syntax that should be avoided:
void doThings(Base* bases[], int length)
if you look at the error message, it tells you that bases is actually a Base**, and that is also what you should write if you really want to pass such a pointer.
The second problem has to do with type safety. Imagine this code, where I reduced the array of pointers to a single pointer:
class Base {...};
class Derived1: public Base {...};
class Derived2: public Base {...};
Derived1* p1 = 0;
Base*& pb = p1; // reference to a pointer
pb = new Derived2(); // implicit conversion Derived -> Base
If this code compiled, p1 would now suddenly point to a Derived2! The important difference to a simple conversion of a pointer-to-derived to a pointer-to-base is that we have a reference here, and the pointer in your case is no different.
You can use two variations that work:
Base* pb = p1; // pointer value conversion
Base* const& pb = p1; // reference-to-const pointer
Applied to your code, this involves copying the array of pointers-to-derived to an array of pointers-to-base, which is probably what you want to avoid. Unfortunately, the C++ type system doesn't provide any different means to achieve what you want directly. I'm not exactly sure about the reason, but I think that at least according to the standard pointers can have different sizes.
There are two things I would consider:
Convert doThings() to a function template taking two iterators. This follows the spirit of the STL and allows calls with different pointer types or others that provide an iterator interface.
Just write a loop. If the body of the loop is large, you could also consider extracting just that as a function.
In answer to your main question, the answer is not really,
because there's no way of reasonably implementing it. There's
no guarantee that the actual addresses in the Base* will be
the same as those in the Derived*, and in fact, there are many
cases where they aren't. In the case of multiple inheritance,
it's impossible that both bases have the same address as the
derived, because they must have different addresses from each
other. If you have an array of Derived*, whether it be
std::vector<Derived*> or a Derived** pointing to the first
element of an array, the only way to get an array of Base* is
by copying: using std::vector<Derived*>, this would look
something like:
std::vector<Base*> vectBase( vectDerived.size() );
std::transform(
vectDerived.cbegin(),
vectDerived.cend(),
vectBase.begin(),
[]( Derived* ptr ) { return static_cast<Base*>( ptr ); } );
(You can do exactly the same thing with Derived** and
Base**, but you'll have to tweak it with the known lengths.)
As for your "bonus" questions:
Clang (and, I suspect, every other compiler in existence)
produces the error Base* is not a class because it isn't
a class. You're trying to dynamic_cast between Base**
and Derived**, the pointed to types are Base* and
Derived*, and dynamic_cast requires the pointed to types
to be classes.
The only danger with reinterpret_cast is that it won't
work. You'll get undefined behavior, which will possibly
work in a few simple cases, but won't work generally. You'll
end up with something that the compiler thinks is a Base*,
but which doesn't physically point to a Base*.
I would like to cast a pointer to a member of a derived class to void* and from there to a pointer of the base class, like in the example below:
#include <iostream>
class Base
{
public:
void function1(){std::cout<<"1"<<std::endl;}
virtual void function2()=0;
};
class Derived : public Base
{
public:
virtual void function2(){std::cout<<"2"<<std::endl;}
};
int main()
{
Derived d;
void* ptr = static_cast<void*>(&d);
Base* baseptr=static_cast<Base*>(ptr);
baseptr->function1();
baseptr->function2();
}
This compiles and gives the desired result (prints 1 and 2 respectively), but is it guaranteed to work? The description of static_cast I found here: http://en.cppreference.com/w/cpp/language/static_cast
only mentions conversion to void* and back to a pointer to the same class (point 10).
In the general case, converting a base to void to derived (or vice versa) via static casting is not safe.
There will be cases where it will almost certainly work: if everything involved is a POD, or standard layout, and only single inheritance is involved, then things should be fine, at least in practice. I do not have chapter and verse from the standard, but the general idea is that the base in that case is guaranteed to be the prefix of the derived, and they will share addresses.
If you want to start seeing this fail, mix in virtual inheritance, multiple inheritance (both virtual and not), and multiple implementation inheritance that are non trivial. Basically when the address of the different type views of this differ, the void cast from and back to a different type is doomed. I have seen this fail in practice, and the fact it can fail (due to changes in your code base far away from the point of casting) is why you want to be careful about always casting to and from void pointer with the exact same type.
In general, no, it is not safe.
Suppose that casting Derived* directly to Base* results in a different address (for example, if multiple inheritance or virtual inheritance is involved).
Now if you inserted a cast to void* in between, how would the compiler know how to convert that void* to an appropriate Base* address?
If you need to cast a Derived* to a void*, you should explicitly cast the void* back to the original Derived* type first. (And from there, the cast from Derived* to Base* is implicit anyway, so you end up with the same number of casts, and thus it's not actually any less convenient.)
From the link you supplied yourself
9) A pointer to member of some class D can be upcast to a pointer to member of its base class B. This static_cast makes no checks to ensure the member actually exists in the runtime type of the pointed-to object.
Meaning that, as long as you know the cast is safe before you do it, it is guaranteed to work. That's why you should be using dynamic_cast, which returns nullptr if unsuccessful.
Think of it this way. Suppose you have:
Type * t1;
Whether a static_cast of t1 to one of the classes deriving from Type is valid cannot be known at compile time without in-depth analysis of your program (which the compiler obviously does not and should not do), so even if it ends up being correct you have no way of checking. dynamic_cast does extra work at run time to check whether the conversion was successful, hence the prefix dynamic.
I recently came across this strange function in some class:
void* getThis() {return this;}
And later in the code it is sometimes used like so: bla->getThis() (Where bla is a pointer to an object of the class where this function is defined.)
And I can't figure out what this could be good for. Is there any situation where a pointer to an object would be different from the object's this (where bla != bla->getThis())?
It seems like a stupid question but I wonder if I'm missing something here..
Of course, the pointer values can be different! Below is an example which demonstrates the issue (you may need to use derived1 on your system instead of derived2 to get a difference). The point is that the this pointer typically gets adjusted when virtual, multiple inheritance is involved. This may be a rare case but it happens.
One potential use case of this idiom is to be able to restore objects of a known type after storing them as void const* (or void*; the const correctness doesn't matter here): if you have a complex inheritance hierarchy, you can't just cast any odd pointer to a void* and hope to be able to restore it to its original type! That is, to easily obtain, e.g., a pointer to base (from the example below) and convert it to void*, you'd call p->getThis(), which is a lot easier than static_cast<base*>(p), and get a void* which can be safely cast to a base* using static_cast<base*>(v): you can reverse the implicit conversion, but only if you cast back to the exact type the original pointer came from. That is, static_cast<base*>(static_cast<void*>(d)), where d is a pointer to an object of a type derived from base, is illegal, but static_cast<base*>(d->getThis()) is legal.
Now, why is the address changing in the first place? In the example base is a virtual base class of two derived classes but there could be more. All subobjects whose class virtually inherits from base will share one common base subobject in an object of a further derived class (concrete in the example below). The location of this base subobject may be different relative to the respective derived subobject depending on how the different classes are ordered. As a result, the pointer to the base object is generally different from the pointers to the subobjects of classes virtually inheriting from base. The relevant offset will be computed at compile-time, when possible, or come from something like a vtable at run-time. The offsets are adjusted when converting pointers along the inheritance hierarchy.
#include <iostream>
struct base
{
void const* getThis() const { return this; }
};
struct derived1
: virtual base
{
int a;
};
struct derived2
: virtual base
{
int b;
};
struct concrete
: derived1
, derived2
{
};
int main()
{
concrete c;
derived2* d2 = &c;
void const* dptr = d2;
void const* gptr = d2->getThis();
std::cout << "dptr=" << dptr << " gptr=" << gptr << '\n';
}
No. Yes, in limited circumstances.
This looks like it is something inspired by Smalltalk, in which all objects have a yourself method. There are probably some situations in which this makes code cleaner. As the comments note, this looks like an odd way to even implement this idiom in c++.
In your specific case, I'd grep for actual usages of the method to see how it is used.
Your class can have custom operator& (so &a may not return this of a). That's why std::addressof exists.
I ran across something like this many (many many) years ago. If I recall correctly, it was needed when a class is manipulating other instances of the same class. One example might be a container class that can contain its own type/(class?).
That might be a way to override the this keyword.
Let's say that you have a memory pool, fully initialized at the start of your program; for instance, you know that at any time you will deal with a maximum of 50 messages of type CMessage.
You create a pool of size 50 * sizeof(CMessage) (whatever this class might be), and CMessage implements the getThis function.
That way, instead of overriding the new keyword, you just override "this" to access the pool.
It can also mean that the object might be defined in different memory spaces, let's say in SRAM in boot mode, and then in SDRAM.
It might be that the same instance will return different values for getThis over the course of the program in such a situation, on purpose of course, when overridden.