I don't see what the following macro is doing. If anyone can help me understand it, I would appreciate it.
#define BASE_OFFSET(ClassName,BaseName)\
(DWORD(static_cast < BaseName* >( reinterpret_cast\
< ClassName* >(0x10000000)))-0x10000000)
If anyone is curious where it comes from: it is from chapter 3 of Don Box's book Essential COM, where he builds a QueryInterface function using interface tables. The macro is somehow used to find the pointer to the interface vtable of the class, where the class is the ClassName implementing BaseName, although I don't know how it does that.
It tells the compiler: "Imagine there is a ClassName object at 0x10000000. Where would the BaseName data begin in that object, relative to 0x10000000?"
Think of a memory layout of a class object with multiple bases:
class A: B, C{};
In the memory block that constitutes an A object, there's a chunk of data that belongs to B, a chunk of data that belongs to C, and the data that is specific to A. Since the address of at least one base's data cannot be the same as the address of the class instance as a whole, the numeric value of the this pointer that you pass to different methods needs to vary. The macro retrieves the value of that difference.
EDIT: The pointer to the vtable is, by convention, the first data member in any class with virtual functions. So by finding the address of the base data, one finds the address of its vtable pointer.
Now, about the type conversion. Normally, when you typecast pointers, the operation is internally trivial: the numeric value of the address does not depend on what type it points to; the very notion of a datatype only exists at the language level. There's one important exception though: when you cast object pointers under multiple inheritance. As we've just discussed, the this pointer that you need to pass to a base class method might be numerically different from that of the derived object.
So the distinction between static_cast and reinterpret_cast captures this difference neatly. When you use reinterpret_cast, you're telling the compiler: "I know better. Take this numeric value and interpret it as a pointer to what I say". This is a deliberate subversion of the type system, dangerous, but occasionally necessary. This kind of cast is by definition trivial, because you say so.
By "trivial" I mean - the numeric value of the pointer does not change.
The static_cast is a more high level construct. In this particular case, you're casting between an object and its base. That's a reasonable, safe cast under C++ class rules - BUT it might be numerically nontrivial. That's why the macro uses two different typecasts. static_cast does NOT violate the type system.
To recap:
reinterpret_cast<ClassName* >(0x10000000)
is an unsafe operation. It returns a bogus pointer to a bogus object, but it's OK because we never dereference it.
static_cast<BaseName*>(...)
is a safe operation (with an unsafe pointer, the irony). It's the part where the nontrivial pointer typecast happens.
(DWORD(...)-0x10000000)
is pure arithmetic. That's where the unsafety doubles back on itself: rather than use the pointer as a pointer, we cast it back to an integer and forget that it ever was a pointer.
The last stage could be equivalently rephrased as:
((char*)(...)-(char*)0x10000000)
if that makes more sense.
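To make the offset idea concrete, here is a minimal sketch that measures the same quantity on a real object instead of a bogus address. The classes A, B, C and the helper base_offset are hypothetical names for illustration, not anything from the book.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classes; each base has a data member so the layout is non-trivial.
struct B { int b; };
struct C { int c; };
struct A : B, C { int a; };

// Byte offset of the Base subobject inside an actual Derived object.
template <class Derived, class Base>
std::ptrdiff_t base_offset(const Derived& d) {
    const Base* base = static_cast<const Base*>(&d);        // may adjust the address
    const std::ptrdiff_t offset =
        reinterpret_cast<const char*>(base)
      - reinterpret_cast<const char*>(&d);                  // plain byte arithmetic
    assert(offset >= 0 && offset < static_cast<std::ptrdiff_t>(sizeof(Derived)));
    return offset;
}

void demo() {
    A obj{};
    // B and C are distinct non-empty subobjects, so they cannot share an address.
    assert((base_offset<A, B>(obj) != base_offset<A, C>(obj)));
}
```

The exact numbers depend on the compiler's layout, so the sketch only relies on the two offsets being different.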
A remark about the magic 0x10000000 constant:
If that constant were 0, GCC would show the -Winvalid-offsetof warning (if it is enabled, of course). Other compilers may do something similar.
I was just reading a new C++ challenge:
http://blogs.msdn.com/b/vcblog/archive/2014/02/04/challenge-vulnerable-code.aspx
The supplied code is full of issues, some obvious to anybody with good programming habits, some visible only to C++ natives :-)
The comments point out that one line (37) is particularly dangerous:
ImageFactory::DebugPrintDimensions((ImageFactory::CustomImage*)image);
The function then calls a virtual method of CustomImage (one first declared in CustomImage).
This allegedly causes the first member of CustomImage (actually a unique_ptr) to be treated as the vptr of the instance, so that the binary data it points to is treated as executable (perhaps malicious) code.
While I can understand this, I wonder why it really works.
CustomImage is a virtual class, so (probably) its first 4 bytes (just assume x86) are THE vptr, and the unique_ptr member is next. And since the cast doesn't seem to be shifting anything...
... how would it be possible to execute data held by unique_ptr?
My take (and I'm more than happy to be corrected):
Here, CustomImage is a polymorphic class (with a vptr as the first "member" under the Windows ABI), but Image is not. The order of the definitions means that the ImageFactory functions know that CustomImage is an Image, but main() does not.
So when the factory does:
Image* ImageFactory::LoadFromDisk(const char* imageName)
{
return new (std::nothrow) CustomImage(imageName);
}
the CustomImage* pointer is converted to an Image* and everything is fine. Because Image is not polymorphic, the pointer is adjusted to point to the (POD) Image instance inside the CustomImage -- but crucially, this is after the vptr, because that always comes first in a polymorphic class in the MS ABI (I assume).
However, when we get to
ImageFactory::DebugPrintDimensions((ImageFactory::CustomImage*)image);
the compiler sees a C-style cast from one class it knows nothing about to another. All it does is take the address it has and pretend it's a CustomImage* instead. This address in fact points to the Image within the custom image; but because Image is empty, and presumably the empty base class optimisation is in effect, it ends up pointing to the first member within CustomImage, i.e. the unique_ptr.
Now, ImageFactory::DebugPrintDimensions() assumes it's been handed a pointer to a fully-complete CustomImage, so that the address is equal to the address of the vptr. But it hasn't -- it's been handed the address of the unique_ptr, because at the point at which it was called, the compiler didn't know any better. So now it dereferences what it thinks is the vptr (really data we're in control of), looks for the offset of the virtual function and blindly executes that -- and now we're in trouble.
There are a couple of things that could have helped mitigate this. Firstly, since we're manipulating a derived class via a base class pointer, Image should have a virtual destructor. This would have made Image polymorphic, and in all probability we wouldn't have had a problem (and we wouldn't leak memory, either).
Secondly, because we're casting from base-to-derived, dynamic_cast should have been used rather than the C style cast, which would have involved a run-time check and correct pointer adjustment.
Lastly, if the compiler had all the information to hand when compiling main(), it might have been able to warn us (or performed the cast correctly, adjusting for the polymorphic nature of CustomImage). So moving the class definitions above main() is recommended too.
Presumably, the memory layout is such that the vptr comes before the base sub-object, like this:
class CustomImage {
void * __vptr;
Image __base; // empty
unique_ptr<whatever> evil;
};
This means that a valid conversion from Image* to CustomImage* requires subtracting a few bytes from the pointer. However, the evil cast you've posted comes before the class definitions, so it doesn't know how to correctly adjust the pointer. Instead, it acts like reinterpret_cast, simply pretending that the pointer points to CustomImage without adjusting its value.
Now, since the base class is empty, the pointer inside unique_ptr will be misinterpreted as the vptr. This points to another pointer, which will be misinterpreted as the vtable's pointer to the first virtual member function. This in turn points to data loaded from the file, which will be executed as code when that virtual function is called. As the icing on the cake, the memory protection flags are loaded from the file, and not adjusted to prevent execution.
Some lessons here are:
Avoid C-style casts, especially on pointer or reference types. They fall back to reinterpret_cast, leading to a minefield of undefined behaviour, if the conversion isn't valid. (To compound the evil, their syntax is also ungreppable, and easy to miss if you don't read the code too carefully.)
Avoid non-polymorphic base classes. As well as being conceptually dubious, and making deletion more awkward and error-prone, it can do surprising things to the memory layout as we see here. If the base class were polymorphic, we could use dynamic_cast (or avoid the cast altogether by providing suitable virtual functions), with no possibility of an invalid conversion.
Avoid unnecessary levels of indirection - there's no particular need for m_imageData to be a pointer.
Never put user data in executable memory.
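As a rough sketch of the first two lessons, this is what the checked version could look like once the base class is polymorphic. The classes below are simplified stand-ins I made up, not the challenge's actual code; checked_width is a hypothetical helper.

```cpp
#include <cassert>

// Simplified stand-ins for the challenge's classes (assumed, not the real code).
struct Image {
    virtual ~Image() = default;   // makes Image polymorphic; delete via Image* is safe
};

struct CustomImage : Image {
    int width = 640;
    virtual int Width() const { return width; }
};

struct OtherImage : Image {};

// Returns the width, or -1 when image is not actually a CustomImage.
int checked_width(const Image* image) {
    // dynamic_cast checks the dynamic type and adjusts the pointer if needed.
    if (const CustomImage* custom = dynamic_cast<const CustomImage*>(image)) {
        return custom->Width();
    }
    return -1;
}

void demo() {
    CustomImage custom;
    OtherImage other;
    assert(checked_width(&custom) == 640);
    assert(checked_width(&other) == -1);   // wrong type is detected, not executed
}
```

Unlike the C-style cast, this needs the full class definitions to be visible at the point of the cast, which is exactly the information main() was missing.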
I had always thought that checking the pointer after casting a void* to a struct* was a valid way to avoid invalid casts. Something like
MyStructOne* pStructOne = (MyStructOne*)someVoidPointer;
if(!pStructOne)
return 0;
It appears that this is not the case as I can cast the same data to two different structs and get the same valid address. The program is then happy to populate my struct fields with whatever random data is in there.
What is a safe way of casting struct pointers?
I can't use dynamic_cast<> as it's not a class.
Thanks for the help!
If you have any control over the struct layout you can put your own type enumeration at the front of every struct to verify the type. This works in both C and C++.
If you can't use an enumeration because not all types are known ahead of time, you can use a GUID. Or a pointer to static variable or member that is unique per struct.
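A minimal sketch of the tag-at-the-front idea, assuming you control every struct that can arrive through the void*. StructTag, MyStructTwo and as_struct_one are hypothetical names; the check is only as good as the guarantee that every object really starts with the tag.

```cpp
#include <cassert>

enum class StructTag { One, Two };

// Every struct that travels through a void* starts with its own tag.
struct MyStructOne { StructTag tag = StructTag::One; int value = 0; };
struct MyStructTwo { StructTag tag = StructTag::Two; double ratio = 0.0; };

// Returns the pointer as MyStructOne* only if the leading tag says so.
MyStructOne* as_struct_one(void* p) {
    if (p == nullptr) return nullptr;
    // Assumption: p points at a struct whose first member is a StructTag.
    const StructTag tag = *static_cast<const StructTag*>(p);
    return tag == StructTag::One ? static_cast<MyStructOne*>(p) : nullptr;
}

void demo() {
    MyStructOne one;
    MyStructTwo two;
    assert(as_struct_one(&one) == &one);
    assert(as_struct_one(&two) == nullptr);   // wrong struct is rejected
}
```

This cannot detect a pointer to memory that never held a tagged struct at all; it only tells tagged structs apart from each other.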
You can use dynamic_cast with structs or classes, as long as it has a virtual method. I would suggest you redesign your broader system to not have void*s anywhere. It's very bad practice/design.
There is no "safe way of casting" in general, because casting pointers is inherently an unsafe procedure. Casting says that you know better than the type system, so you can't expect the type system to be of any help after you started casting pointers.
In C++, you should never use C-style casts (like (T) x), and instead use the C++ casts. Now a few simple rules let you determine whether casting a pointer or reference is OK:
If you use const_cast to remove const and then modify the object, you must be sure that the object is actually mutable.
You can only static_cast pointers or references within a class hierarchy or from/to a void pointer. You must be sure that the dynamic type of the object is a subtype of the cast target, or in the case of void pointers that the pointer is the address of an object of the correct type.
reinterpret_cast should only be used to or from a char * type (possibly signed or unsigned), or to convert a pointer to and from an (u)intptr_t.
In every case, it is your responsibility to ensure that the pointers or references in question refer to an object of the type that you claim in the cast. There is no check that anyone else can do for you to verify this.
The (C-style) cast you are using is a compile-time operation - that is to say, the compiler generates instructions to modify the pointer to one thing so that it points to another.
With inheritance relationships, this is simply addition or subtraction from the pointer.
In the case of your code, the compiler generates precisely no code whatsoever. The cast merely serves to tell the compiler that you know what you're doing.
The compiler does not generate any code that checks the validity of your operation. If someVoidPointer is null, so will be pStructOne after the cast.
Using dynamic_cast<>() doesn't validate that the thing being cast is actually an object at all - it merely tells you that an object with RTTI is (or can be converted to) the type you expect. If it's not an object to start with, you'll most likely get a crash.
There isn't one. And frankly, there can't be.
struct is simply an instruction for the compiler to treat the next sizeof() bytes in a particular semantic fashion - nothing less, nothing more.
You can cast any pointer into any pointer - all that changes is how the compiler would interpret the contents.
Using dynamic_cast<> is the only way, but it invokes RTTI (run-time type information) to consider the potential legality of the assignment. Yeah, it's no longer a reinterpret_cast<>.
It sounds like you want to make sure the object passed as a void* to your function is really the type you expect. The best approach would be to declare the function prototype with MyStructOne* instead of void* and let the compiler do the type checking.
If you really are trying to do something more dynamic (as in different types of objects can be passed to your function) you need to enable RTTI. This will allow you to interrogate the passed in object and ask it what type it is.
What is a safe way of casting struct pointers?
First, try to avoid needing to do this in the first place. Use forward declarations for structs if you don't want to include their headers. In general, you should only need to hide the data type from the signature if a function could take multiple types of data. The example for something like that is a message passing system, where you want to be able to pass arbitrary data. The sender and receiver know what types they expect, but the message system itself doesn't need to know.
Assuming you have no other alternatives, use a boost::any. This is essentially a type-safe void*; attempts to cast it to the wrong type will throw an exception. Note that this needs RTTI to work (which you generally should have available).
Note that boost::variant is a possibility if there is a fixed, limited set of possible types that can be used.
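For illustration, here is a small sketch using std::any, the C++17 standard library counterpart of boost::any, so that it needs no Boost headers. This is a swapped-in type, not the Boost API itself, and holds_struct_one is a hypothetical helper.

```cpp
#include <any>
#include <cassert>
#include <string>

struct MyStructOne { int value; };

// Returns true only if the payload really holds a MyStructOne.
bool holds_struct_one(const std::any& payload) {
    try {
        // any_cast to a reference throws std::bad_any_cast on a type mismatch.
        (void)std::any_cast<const MyStructOne&>(payload);
        return true;
    } catch (const std::bad_any_cast&) {
        return false;
    }
}

void demo() {
    std::any right = MyStructOne{42};
    std::any wrong = std::string("not a struct");
    assert(holds_struct_one(right));
    assert(!holds_struct_one(wrong));   // mismatch is reported instead of reinterpreted
    assert(std::any_cast<MyStructOne>(right).value == 42);
}
```

If exceptions are unwanted, the pointer overload std::any_cast<T>(&payload) returns a null pointer on mismatch instead of throwing.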
Since you have to use void*, your options are:
create a single base class including a virtual destructor (and/or other virtual methods) and use that exclusively across the libev interface. Wrap the libev interface to enforce this, and only use the wrappers from your C++ code. Then, inside your C++ code, you can dynamic_cast your base class.
accept that you don't have any runtime information about what type your void* really points to, and just structure your code so you always know statically. That is, make sure you cast to the correct type in the first place.
use the void* to store a simple tag/cookie/id structure, and use that to look up your real struct or whatever - this is really just a more manual version of #1 though, and incurs an extra indirection to boot.
And the direct answer to
What is a safe way of casting struct pointers?
is:
cast to the correct type, or a type you know to be layout compatible.
There just isn't any substitute for knowing statically what the correct type is. You presumably passed something in as a void*, so when you get that void* back you should be able to know what type it was.
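The first option could be sketched roughly like this. UserData, TimerState, SocketState, to_opaque and on_timer are all hypothetical names; the void* parameter stands in for whatever opaque slot the C library gives you.

```cpp
#include <cassert>

// Hypothetical common base for everything that travels through the void* slot.
struct UserData {
    virtual ~UserData() = default;
};

struct TimerState  : UserData { int ticks = 0; };
struct SocketState : UserData { int fd = -1; };

// Wrapper that enforces the contract: only a UserData* ever becomes a void*.
void* to_opaque(UserData* data) { return data; }

// Stand-in for a C-style callback. Returns true if the payload was a TimerState.
bool on_timer(void* opaque) {
    // Valid only because to_opaque() is the sole producer of these pointers.
    UserData* base = static_cast<UserData*>(opaque);
    if (TimerState* timer = dynamic_cast<TimerState*>(base)) {
        ++timer->ticks;
        return true;
    }
    return false;
}

void demo() {
    TimerState timer;
    SocketState socket;
    assert(on_timer(to_opaque(&timer)) && timer.ticks == 1);
    assert(!on_timer(to_opaque(&socket)));   // wrong payload type is detected
}
```

The important detail is that the pointer is converted to UserData* before it becomes a void*, so the static_cast back to UserData* restores exactly the pointer that went in.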
I'm writing a delegate class for educational purposes and have run into a little problem. The delegate must be able to call not only functions but also member methods of objects, which means that I need to store a pointer to a method:
void (classname::*methodPtr)(...);
And I need to store pointers to methods from different classes and with different argument lists. At first I just wanted to cast the method pointer to void *, but the compiler dies with an invalid cast error. Turns out that sizeof(methodPtr) == 8 (32-bit system here), but casts to unsigned long long also fail (same compiler error - invalid cast). How do I store the method pointer universally then?
I know it's not safe - I have other safety mechanisms, please just concentrate on my question.
You don't. You use run-time inheritance, if you need abstraction, and create a derived class which is templated on the necessities, or preferably, just create a plain old functor via the use of a function. Check out boost::bind and boost::function (both in the Standard for C++0x) as to how it should be done- if you can read them past all the macro mess, anyway.
You'd better listen to DeadMG. The problem is that the size of a member pointer depends on the class type for which you want to form the member pointer. This is because, depending on the kind of class layout (for example, whether the class has virtual bases and so on), the member pointer has to contain various offset adjustment values - there is no "one size fits all" member pointer type you can count on. It also means that you cannot assume there is a "castable" integral type which can hold every possible member function pointer.
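A minimal sketch of the functor approach, using std::function (the standardized form of boost::function). Counter, Logger, IntDelegate and make_delegate are hypothetical names; one delegate type covers one call signature, so different argument lists would need different delegate types.

```cpp
#include <cassert>
#include <functional>

struct Counter {
    int total = 0;
    void add(int n) { total += n; }
};

struct Logger {
    int calls = 0;
    void note(int) { ++calls; }
};

// One delegate type per call signature; the target's class is erased.
using IntDelegate = std::function<void(int)>;

// Binds an object and one of its methods without casting the method pointer.
// The object must outlive the delegate, since it is captured by reference.
template <class T>
IntDelegate make_delegate(T& object, void (T::*method)(int)) {
    return [&object, method](int n) { (object.*method)(n); };
}

void demo() {
    Counter counter;
    Logger logger;
    IntDelegate d = make_delegate(counter, &Counter::add);
    d(5);
    d(2);
    assert(counter.total == 7);
    d = make_delegate(logger, &Logger::note);   // same delegate, different class
    d(1);
    assert(logger.calls == 1);
}
```

The member pointer keeps its real type inside the lambda, so its size and representation never need to fit into a void* or an integer.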
I need to convert an integral type which contains an address to the actual pointer type. I could use reinterpret_cast as follows:
MyClass *mc1 = reinterpret_cast<MyClass*>(the_integer);
However, this does not perform any run-time checks to see if the address in question actually holds a MyClass object. I want to know if there is any benefit in first converting to a void* (using reinterpret_cast) and then using dynamic_cast on the result. Like this:
void *p = reinterpret_cast<void*>(the_integer);
MyClass *mc1 = dynamic_cast<MyClass*>(p);
assert(mc1 != NULL);
Is there any advantage in using the second method?
Type checking on dynamic_cast is implemented in different ways by different C++ implementations; if you want an answer for your specific implementation you should mention what implementation you are using. The only way to answer the question in general is to refer to ISO standard C++.
By my reading of the standard, calling dynamic_cast on a void pointer is illegal:
dynamic_cast<T>(v)
"If T is a pointer type, v shall be an rvalue of a pointer to complete class type"
(from 5.2.7.2 of the ISO C++ standard). void is not a complete class type, so the expression is illegal.
Interestingly, the type being cast to is allowed to be a void pointer, i.e.
void * foo = dynamic_cast<void *>(some_pointer);
In this case, the dynamic_cast always succeeds, and the resultant value is a pointer to the most-derived object pointed to by v.
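A small sketch of that last point, with hypothetical classes First, Second and Both: casting a pointer to a non-first base to void* with dynamic_cast gives back the address of the complete object.

```cpp
#include <cassert>

struct First  { virtual ~First() = default;  int f = 1; };
struct Second { virtual ~Second() = default; int s = 2; };
struct Both : First, Second { int b = 3; };

// Address of the most-derived object that p is a part of.
const void* most_derived(const Second* p) {
    return dynamic_cast<const void*>(p);
}

void demo() {
    Both obj;
    const Second* second = &obj;   // typically not the same address as &obj
    assert(most_derived(second) == static_cast<const void*>(&obj));
}
```

Note that the source type still has to be polymorphic; this direction works precisely because the object carries its own type information.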
No, there's no specific advantage in doing so. The moment you use reinterpret_cast, all bets are off. It's up to you to be sure the cast is valid.
Actually no serious advantage. If the void* points to something that is not a pointer to a polymorphic object you run into undefined behaviour (usually an access violation) immediately.
The safe way is to keep a record of all live MyClass objects. It's best to keep this record in a std::set<void*>, which means you can easily add, remove and test elements.
The reason for storing them as void*s is that you don't risk nastiness like creating unaligned MyClass* pointers from your integers.
First of all, "reinterpreting" an int as a void * is a bad idea. If sizeof(int) is 4 and sizeof(void *) is 8 (an x64 system), the int cannot hold a full address.
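Roughly, such a registry could look like the sketch below. This is one possible shape, not a definitive implementation: from_integer and live are hypothetical names, and the code is not thread-safe.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

class MyClass {
public:
    MyClass()               { live().insert(this); }
    MyClass(const MyClass&) { live().insert(this); }
    MyClass& operator=(const MyClass&) = default;
    ~MyClass()              { live().erase(this); }

    // Returns the object if value is the address of a live MyClass, else nullptr.
    static MyClass* from_integer(std::uintptr_t value) {
        void* candidate = reinterpret_cast<void*>(value);
        if (live().count(candidate) == 0) return nullptr;
        return static_cast<MyClass*>(candidate);   // known to be a real MyClass
    }

private:
    // Function-local static avoids initialization-order problems.
    static std::set<void*>& live() {
        static std::set<void*> registry;
        return registry;
    }
};

void demo() {
    std::uintptr_t stale = 0;
    {
        MyClass scoped;
        stale = reinterpret_cast<std::uintptr_t>(static_cast<void*>(&scoped));
        assert(MyClass::from_integer(stale) == &scoped);
    }
    assert(MyClass::from_integer(stale) == nullptr);   // destroyed objects are rejected
}
```

The lookup compares void* values only, so no MyClass* is ever formed from an integer that has not been verified first.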
Moreover dynamic_cast is valid only for the case of the polymorphic classes.
Option 1 is your only (semi) portable/valid option.
Option 2 is not valid C++, as dynamic_cast from a void* is not allowed.
At an implementation level it requires type information from the source type to get to the destination type. There is no way (or there may be no way) to get the runtime source type information from a void* so this is not valid either.
dynamic_cast is used to cast up and down the type hierarchy, not from unknown types.
As a side note you should probably be using void* rather than an integer to store an untyped pointer. There is potential for an int not to be large enough to store a pointer.
The safest way to handle pointers in C++ is to handle them typesafe. This means:
Never store pointers in anything other than a pointer
Avoid void pointers
Never pass pointers to other processes
Consider weak_ptr if you plan to use pointers across threads
The reason for this is: what you are planning to do is unsafe and can be avoided unless you're interfacing with unsafe (legacy?) code. In this case consider MSalters' answer, but be aware that it still is a hassle.
If you know for sure that the_integer points to a known base class (that has at least one virtual member), there might in fact be an advantage: knowing that the object is of a specific derived class. But you’d have to reinterpret_cast to your base class first and then do the dynamic_cast:
BaseClass* obj = reinterpret_cast<BaseClass*>(the_integer);
MyClass* myObj = dynamic_cast<MyClass*>(obj);
Using a void* in dynamic_cast is useless and simply wrong. You cannot use dynamic_cast to check if there’s a valid object at some arbitrary location in memory.
You should also pay attention when storing addresses in non-pointer type variables. There are architectures where sizeof(void*) != sizeof(int), e.g. LP64.
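Putting those two remarks together, here is a hedged sketch that stores the address in std::uintptr_t (an integer type meant to hold a pointer) and goes through the polymorphic base before the dynamic_cast. Unrelated, to_integer and as_my_class are hypothetical names, and the precondition that the integer came from a valid BaseClass* cannot be checked by the code.

```cpp
#include <cassert>
#include <cstdint>

struct BaseClass { virtual ~BaseClass() = default; };
struct MyClass   : BaseClass { int id = 7; };
struct Unrelated : BaseClass {};

// Stores a base pointer in an integer wide enough to hold it.
std::uintptr_t to_integer(BaseClass* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Precondition (not checkable here): address was produced by to_integer().
MyClass* as_my_class(std::uintptr_t address) {
    BaseClass* base = reinterpret_cast<BaseClass*>(address);
    return dynamic_cast<MyClass*>(base);   // nullptr if the dynamic type is not MyClass
}

void demo() {
    MyClass mine;
    Unrelated other;
    assert(as_my_class(to_integer(&mine)) == &mine);
    assert(as_my_class(to_integer(&other)) == nullptr);
}
```

The dynamic_cast only distinguishes between classes derived from BaseClass; it says nothing useful if the integer never was a BaseClass* in the first place.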
Let's say I have a type A and a derived type B. When I perform a dynamic cast from A* to B*, what kind of "runtime checks" does the environment perform? How does it know that the cast is legal?
I assume that in .NET it's possible to use the metadata attached to the object's header, but what happens in C++?
The exact algorithm is compiler-specific. Here's how it works according to the Itanium C++ ABI (2.9.7) standard (which GCC follows).
Pointer to base class is a pointer to the middle of the body of the "big" class. The body of a "big" class is assembled in such a way, that whatever base class your pointer points to, you can uniformly access RTTI for that "big" class, which your "base" class in fact is. This RTTI is a special structure that relates to the "big" class information: of what type it is, what bases it has and at what offsets they are.
In fact, it is the "metadata" of the class, but in more "binary" style.
V instance;
Base *v = &instance;
dynamic_cast<T>(v);
Dynamic cast makes use of the fact that when you write dynamic_cast<T>(v), the compiler can immediately identify the metadata for the "big" class of v -- i.e. V! When you write it, you think that T is more derived than Base, so the compiler will have a hard time doing a base-to-derived cast. But the compiler can immediately (at runtime) determine the most derived type--V--and then it only has to traverse the inheritance graph contained in the metadata to check whether it can convert V to T. If it can, it just looks up the offset. If it can't, or the cast is ambiguous, it returns NULL.
Dynamic cast is a two-step process:
Given the vtable of a pointer to an object, use the offset to recover a pointer to the full class. (All adjustments will then be made from this pointer.) This is the equivalent of down-casting to the full class.
Search the type_info of the full class for the type we want - in other words, go through a list of all bases. If we find one, use the offset to adjust the pointer again. If the search in step 2 fails, return NULL.
Dynamic cast performs a runtime check whether this is a valid and doable cast; it'll return NULL when it's not possible to perform the cast.
Refer to your favourite book on RTTI.
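The observable behaviour described in these answers can be seen in a short sketch. Base, Mixin, V and Other are hypothetical classes; the example shows only the results of the casts, not the ABI-level lookup itself.

```cpp
#include <cassert>

struct Base  { virtual ~Base() = default; };
struct Mixin { virtual ~Mixin() = default; };
struct V     : Base, Mixin {};   // the "big" most-derived class
struct Other : Base {};

// Downcast: succeeds only if the object behind p really is a V.
bool is_v(Base* p) { return dynamic_cast<V*>(p) != nullptr; }

// Cross-cast: Base and Mixin are unrelated, but both are bases of V.
bool has_mixin(Base* p) { return dynamic_cast<Mixin*>(p) != nullptr; }

void demo() {
    V big;
    Other other;
    assert(is_v(&big) && has_mixin(&big));        // found via the most-derived type
    assert(!is_v(&other) && !has_mixin(&other));  // search fails, result is null
}
```

The cross-cast is the telling case: nothing in the static types relates Base to Mixin, so the answer can only come from the run-time type information of the complete object.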