[embedded C person here, boggled by embedded C++]
There is a class that derives from another, as follows:
class DerivedThing: public BaseThing {
...
};
There is some code, which I have no control over, which takes in a pointer to DerivedThing and, eventually, casts that pointer to BaseThing. I'm led to believe this works. Anyway, I need to implement my own DerivedThing but it needs to subclass some other stuff:
class MyDerivedThing: public BaseThing, public AnotherThing {
...
};
What do I need to do to make sure that the cast will still work? I'm asking because in my case the wrong functions are most definitely being invoked.
EDIT: The cast in the code I have no control over is:
int setInterface(void* context)
{
interface[0] = (BaseThing *) context;
...
I was overly pessimistic while commenting!
Casting to BaseThing* yourself, and passing that to the function taking void*, should work.
The problem with just passing the address of derived object, as-is, is that its address gets implicitly converted to a void*. As a result, all information about its type and the involved class hierarchy is lost.
The receiving function then assumes the now-void* pointer pointed directly at a BaseThing. This is where things start to go loopy, manifesting as symptoms like the wrong functions being called - if you're lucky - because assuming that the derived object and any particular base subobject have the same address is not (always/reliably) a valid assumption where multiple and/or virtual inheritance are at play.
Intermission: It's not immediately clear why this quoted order of bases would cause a problem:
class MyDerivedThing: public BaseThing, public AnotherThing {
...but there are various possibilities. For example, if none of the three classes has virtual methods, then there shouldn't be a problem. But if, say, BaseThing has no virtual methods while either of the other two does, the compiler might put a vtable pointer at the start of the object, which will blow up anything that just takes that address and assumes there's a BaseThing there.
Anyway - by casting yourself before passing, the compiler can do a proper type-aware cast, performing any adjusting arithmetic that might be required, to the address of the BaseThing within your object - then pass that to the function. It still goes through the conversion to void* and back, but that'll now be guaranteed to represent the address of a BaseThing in the end.
I'd still question the author on why this function takes a void*. One of the key things about C++ is type safety. void* makes a mockery of that. Worse, they just cast said void* to a BaseThing* immediately anyway. So, why not just... take a BaseThing* in the first place? Then the compiler could implicitly perform the mentioned safe typecast at any call site, rather than making you do it.
Related
My base class is called Account while the derived class Businessaccount has an additional int variable called x, as well as a getter-method (int getx()) for it.
Is slicing supposed to occur in the following situation? It obviously happens in my case:
vector<shared_ptr<Account>> vec;
shared_ptr<Businessaccount> sptr = make_shared<Businessaccount>();
vec.push_back(sptr);
After that, if I do this:
(*vec.at(0)).getx();
it says that class<Account> has no member named getx()!
I'd be thankful if somebody would tell me why this occurs and how to fix it.
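(*vec.at(0)) is an Account (more precisely, an Account&), and Account doesn't know about x. You need to cast the Account pointer to a Businessaccount pointer to reach that member (this requires Account to have at least one virtual function, e.g. a virtual destructor):
shared_ptr<Businessaccount> bA = dynamic_pointer_cast<Businessaccount>(vec.at(0));
if (bA) bA->getx(); // bA is null if vec.at(0) does not point to a Businessaccount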
No, slicing does not happen in this situation: your pointer is just converted to a pointer to the base class, i.e. your data stays the same and only the type of the pointer changes. When slicing happens, you lose all the data of the derived class.
To solve the issue, you either need to provide a virtual method in the base class that is overridden in Businessaccount, or use dynamic_cast, or static_cast (if you know by other means that the object really is a Businessaccount). That said, needing such a cast is usually a sign of a poorly designed program.
In C++ the static and dynamic type of an object pointed to can differ.
Your static type of what the shared pointers in that vec point to is Account.
The dynamic type of what the shared pointers in that vec point to varies. In your case, you put a Businessaccount in it.
When you want to access members or call methods, you only have access to those of the static type.
The static type is what you have proven to the compiler the type contains at that line.
If you know better, you can do a static_cast<Businessaccount*>(vec.at(0).get())->getx(). By doing so you are promising to the compiler that you have certain knowledge that the data at that location is actually a Businessaccount. If you are wrong, your program's behavior is undefined (if you are lucky you get a crash).
You can also use RTTI (run-time type information) to ask whether a particular object is of a particular subtype, provided the base class is polymorphic (has at least one virtual function):
Account* paccount = vec.at(0).get();
Businessaccount* pba = dynamic_cast<Businessaccount*>(paccount);
if (pba)
pba->getx();
The above checks whether paccount actually points to a Businessaccount, and if so calls getx on it. If not, it does nothing.
Often dynamic casting is a sign you didn't design your object use properly; having to drill down past the interface into which implementation means maybe your interface wasn't rich enough.
In some scripting and bytecode-compiled languages, they let you go off and call getx and proceed to crash/throw an exception/etc. if that method isn't there.
C++ instead lets you use what you have claimed to be there (via the type system), then lets you write your own handler if you want dynamic type checking.
I have looked at the related questions such as here and here about this topic, and they all describe object slicing, but none of them address whether it is safe, reliable and predictable.
Is there a guarantee from either the standard or the compiler, that if I pass a subclass object by value to a method which wants a superclass, the parts that are sliced off are exactly the members of the subclass, and I will be able to use the sliced superclass object without any concerns about undefined behavior?
Yes, it is safe, reliable, and predictable, because it is well defined by the standard (it will just copy construct a base class object from the derived class object).
But no, it is not safe, it should not be relied on, and generally be treated as unpredictable, because your readers won't know what's going on. This will trigger a lot of bugs when others try to modify your code later (including your own future self). It is basically a no-no, much in the same way as the goto statement, which is perfectly well defined, reliable, and predictable as well.
When they say "parts are sliced off" they do not mean that these parts somehow "disappear": all they mean is that these parts are not copied into the object presented to your function for the corresponding parameter. In this sense, object slicing is not dangerous or poorly defined.
What happens there is rather straightforward: in order to construct the value of the parameter that you are passing by value, the object of the derived class used for the actual parameter is given to the constructor of the base class to make a copy. Once the copy constructor has completed its work, you have an object of the base class.
At this point you have a fully functioning object of the base class. The compiler won't let you take an abstract class (one with pure virtual members) by value, so you can't slice your object into an instance with missing pure virtual functions.
A more important question is whether you want the slicing behavior to happen implicitly: there is nothing the compiler does here that you wouldn't be able to do in your code:
void foo(bar b) {
... // Payload logic
}
gives you the same functionality as
void foo(const bar &r) {
bar b(r);
... // Payload logic
}
With the first snippet, it is very easy to miss the fact that an ampersand is missing after the name of a type, leading the readers to think that the polymorphic behavior is retained, while it is, in fact, lost. The second snippet is easier to understand to people who maintain your code, because it makes a copy explicitly.
I was just reading a new C++ challenge:
http://blogs.msdn.com/b/vcblog/archive/2014/02/04/challenge-vulnerable-code.aspx
The supplied code is full of issues, some obvious to anybody with good programming habits, some visible only to C++ natives :-)
The comments point out that one particular line (37) is especially dangerous:
ImageFactory::DebugPrintDimensions((ImageFactory::CustomImage*)image);
The function then calls a virtual method of CustomImage (one first declared in CustomImage).
This allegedly causes the first member of CustomImage (actually a unique_ptr) to be treated as the instance's vptr, and the binary data it points to to be treated as executable (perhaps malicious) code.
While I can understand this, I wonder why it actually works.
CustomImage is a polymorphic class, so (probably) its first 4 bytes (just assume x86) are THE vptr, and the unique_ptr member comes next. And since the cast doesn't seem to shift anything...
... how would it be possible to execute data held by unique_ptr?
My take (and I'm more than happy to be corrected):
Here, CustomImage is a polymorphic class (with a vptr as the first "member" under the Windows ABI), but Image is not. The order of the definitions means that the ImageFactory functions know that CustomImage is an Image, but main() does not.
So when the factory does:
Image* ImageFactory::LoadFromDisk(const char* imageName)
{
return new (std::nothrow) CustomImage(imageName);
}
the CustomImage* pointer is converted to an Image* and everything is fine. Because Image is not polymorphic, the pointer is adjusted to point to the (POD) Image instance inside the CustomImage -- but crucially, this is after the vptr, because that always comes first in a polymorphic class in the MS ABI (I assume).
However, when we get to
ImageFactory::DebugPrintDimensions((ImageFactory::CustomImage*)image);
the compiler sees a C-style cast from one class it knows nothing about to another. All it does is take the address it has and pretend it's a CustomImage* instead. This address in fact points to the Image within the custom image; but because Image is empty, and presumably the empty base class optimisation is in effect, it ends up pointing to the first member within CustomImage, i.e. the unique_ptr.
Now, ImageFactory::DebugPrintDimensions() assumes it's been handed a pointer to a fully-complete CustomImage, so that the address is equal to the address of the vptr. But it hasn't -- it's been handed the address of the unique_ptr, because at the point at which it was called, the compiler didn't know any better. So now it dereferences what it thinks is the vptr (really data we're in control of), looks up the offset of the virtual function and blindly executes that -- and now we're in trouble.
There are a couple of things that could have helped mitigate this. Firstly, since we're manipulating a derived class via a base class pointer, Image should have a virtual destructor. This would have made Image polymorphic, and in all probability we wouldn't have had a problem (and we wouldn't leak memory, either).
Secondly, because we're casting from base-to-derived, dynamic_cast should have been used rather than the C style cast, which would have involved a run-time check and correct pointer adjustment.
Lastly, if the compiler had all the information to hand when compiling main(), it might have been able to warn us (or performed the cast correctly, adjusting for the polymorphic nature of CustomImage). So moving the class definitions above main() is recommended too.
Presumably, the memory layout is such that the vptr comes before the base sub-object, like this:
class CustomImage {
void * __vptr;
Image __base; // empty
unique_ptr<whatever> evil;
};
This means that a valid conversion from Image* to CustomImage* requires subtracting a few bytes from the pointer. However, the evil cast you've posted comes before the class definitions, so it doesn't know how to correctly adjust the pointer. Instead, it acts like reinterpret_cast, simply pretending that the pointer points to CustomImage without adjusting its value.
Now, since the base class is empty, the pointer inside unique_ptr will be misinterpreted as the vptr. This points to another pointer, which will be misinterpreted as the vtable's pointer to the first virtual member function. This in turn points to data loaded from the file, which will be executed as code when that virtual function is called. As the icing on the cake, the memory protection flags are loaded from the file, and not adjusted to prevent execution.
Some lessons here are:
Avoid C-style casts, especially on pointer or reference types. They fall back to reinterpret_cast, leading to a minefield of undefined behaviour, if the conversion isn't valid. (To compound the evil, their syntax is also ungreppable, and easy to miss if you don't read the code too carefully.)
Avoid non-polymorphic base classes. As well as being conceptually dubious, and making deletion more awkward and error-prone, it can do surprising things to the memory layout as we see here. If the base class were polymorphic, we could use dynamic_cast (or avoid the cast altogether by providing suitable virtual functions), with no possibility of an invalid conversion.
Avoid unnecessary levels of indirection - there's no particular need for m_imageData to be a pointer.
Never put user data in executable memory.
I don't see what the following macro is doing. If anyone can help me understand it, I'd appreciate it.
#define BASE_OFFSET(ClassName,BaseName)\
(DWORD(static_cast<BaseName*>(reinterpret_cast\
<ClassName*>(0x10000000)))-0x10000000)
If anyone is curious where it comes from: it's from the 3rd chapter of Don Box's book Essential COM, where he builds a QueryInterface function using interface tables. The above macro is somehow used to find the pointer to the interface vtable of the class, where ClassName is the class implementing BaseName, although I don't know how it does that.
It tells the compiler: "imagine there's a ClassName object at 0x10000000. Where would the BaseName data begin in that object, relative to 0x10000000?"
Think of a memory layout of a class object with multiple bases:
class A: B, C{};
In the memory block that constitutes an A object, there's the chunk of data that belong to B, also a chunk of data that belongs to C, and the data that are specific to A. Since the address of at least one base's data cannot be the same as the address of the class instance as a whole, the numeric value of the this pointer that you pass to different methods needs to vary. The macro retrieves the value of the difference.
EDIT: The pointer to the vtable is, by convention, the first data member in any class with virtual functions. So by finding the address of the base data, one finds the address of its vtable pointer.
Now, about the type conversion. Normally, when you typecast pointers, the operation is internally trivial: the numeric value of the address does not depend on what type it points to; the very notion of a datatype only exists at the source-language level. There's one important exception, though: when you cast object pointers with multiple inheritance. As we've just discussed, the this pointer that you need to pass to a base class method might be numerically different from the derived object's.
So the distinction between static_cast and reinterpret_cast captures this difference neatly. When you use reinterpret_cast, you're telling the compiler: "I know better. Take this numeric value and interpret it as a pointer to what I say". This is a deliberate subversion of the type system, dangerous, but occasionally necessary. This kind of cast is by definition trivial, because you say so.
By "trivial" I mean - the numeric value of the pointer does not change.
The static_cast is a more high level construct. In this particular case, you're casting between an object and its base. That's a reasonable, safe cast under C++ class rules - BUT it might be numerically nontrivial. That's why the macro uses two different typecasts. static_cast does NOT violate the type system.
To recap:
reinterpret_cast<ClassName*>(0x10000000)
is an unsafe operation. It returns a bogus pointer to a bogus object, but it's OK because we never dereference it.
static_cast<BaseName*>(...)
is a safe operation (with an unsafe pointer, the irony). It's the part where the nontrivial pointer typecast happens.
(DWORD(...)-0x10000000)
is pure arithmetic. That's where the unsafety doubles back on itself: rather than use the pointer as a pointer, we cast it back to an integer and forget that it ever was a pointer.
The last stage could be equivalently rephrased as:
((char*)(...)-(char*)0x10000000)
if that makes more sense.
Remark about magic 0x10000000 constant.
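If that constant were 0, GCC would show the warning -Winvalid-offsetof (if it is enabled, of course). Maybe other compilers do something like that. More fundamentally, 0 would not work at all: a static_cast of a null pointer must yield a null pointer, with no adjustment applied, so the macro would always evaluate to 0.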
I am reviewing C++ cast operators and I have the following doubt:
For polymorphic classes:
I should use polymorphic_cast.
I should never use static_cast, since down-casting might lead to undefined behavior. The code compiles in this case anyway.
Now suppose that I have the following situation:
class CBase{};
class CDerived : public CBase{};
int main( int argc, char** argv ){
CBase* p = new CDerived();
//.. do something
CDerived*pd = static_cast<CDerived*>( p );
}
Since there is no polymorphism involved, I will not use polymorphic_cast (with it, the code would not even compile).
If at some point someone introduces some virtual functions in the inheritance tree and I am not aware of it, I am in danger: how can I notice it?
I should move to polymorphic_cast to avoid any risk, but the code will still compile without any notification.
What do you do to realize about such kind of changes or prevent these case?
Thanks
AFG
Background you didn't include - boost has polymorphic_cast as a wrapper around dynamic_cast<> that throws when the cast fails. static_cast<> is fine if you're certain that the data is of the type you're casting to... there is no problem with or without virtual members, and the code you include saying it won't compile will compile and run just fine as is.
I guess you're thinking about the possibility to accidentally cast to another derived class? That's the utility/danger of casting, isn't it? You can add a virtual destructor then use dynamic_cast<>, as strictly speaking RTTI is only available for types with one or more virtual functions.
Code written with static_cast<> will continue to handle the same type safely irrespective of the introduction of virtual functions... it's just that if you start passing that code other types (i.e. not CDerived or anything publicly derived therefrom) then you will need the dynamic_cast<> or some other change to prevent incompatible operations.
While you deal with pointer p (of type CBase*), the pointed-to object will be treated as a CBase, but all virtual functions will do the right thing. Pointer pd treats the same object as a CDerived. Downcasting in this way is dangerous since, if the object is not actually of the target type (or derived from it), any extra member data of the target type will be missing (meaning you'll be poking around in some other data), and virtual function lookup will be all messed up. This is the opposite of upcasting an object by value, where you might get slicing.
To avoid this you need to change your programming style. Treating the same object as two different types is a dubious practice. C++ can be very good at enforcing type safety, but it will let you get away with nasty things if you really want to, or just don't know better. If you are wanting to do different things depending upon an object type, and can't do it through virtual functions (such as through double dispatch), you should look more thoroughly into RTTI (look here, or see some good examples here).
polymorphic_cast is not defined in standard C++. Are you thinking about dynamic_cast?
Anyway, you cannot do anything to prevent it.