C++ struct reinterpret_cast

Suppose there are two structs, A and B, that share a common struct C.
I would like to know whether it is safe to reinterpret_cast a pointer to A or B into a pointer to C.
If not, is there any way to do so without any performance impact?
#include <cstdint>
#include <iostream>
#include <string>
using namespace std;

struct C
{
    string m_c1;
    int32_t m_c2;
    double m_c3;
    string m_c4;
};
struct A
{
    C m_a1;
    string m_a2;
    int32_t m_a3;
};
struct B
{
    C m_b1;
    string m_b2;
    int32_t m_b3;
    double m_b4;
};
int main(int argc, char *argv[])
{
    A a;
    a.m_a1.m_c1 = "A";
    a.m_a1.m_c4 = "AA";
    B b;
    b.m_b1.m_c1 = "B";
    b.m_b1.m_c4 = "BB";
    C* pc = reinterpret_cast<C*>(&a);
    cout << pc->m_c1 << " " << pc->m_c4 << endl;
    pc = reinterpret_cast<C*>(&b);
    cout << pc->m_c1 << " " << pc->m_c4 << endl;
    return 1;
}

As Mike DeSimone points out, the string class is not guaranteed to be a standard-layout class, and thus class C is not standard-layout either, which means you have no guarantees about the memory layout at all. So it is not safe. Only if you change the strings to (const) char* is the layout guaranteed.
Even then it will only be safe as long as the layout of the classes stays the same (you cannot change the order of the members or their access specifiers) and the classes stay without any vtables; it is "safe" in the sense that the compiler will generate code with the behaviour you want.
These are, however, two guarantees that a software developer is seldom able to give. The code is also hard to understand written like this. Another developer (or the same developer a month later) might ignore this code (or simply not understand it), make the changes that seem needed, and suddenly the code is broken and you have some hard-to-catch errors on your hands.
A and B are classes that give access to a C (or to some members of C). More readable, and thus safer, solutions are:
Create an accessor in both A and B; this would probably be inlined and incur no performance penalty (see the sketch after this list).
If there is any reason for inheritance, use simple inheritance: either A and B is-a ClassThatHasAC, or A and B is-a C. As long as there are no virtual functions you would probably not see any performance issues here either. In both cases an accessor would give you the benefits, probably without any performance cost.
Create some simple and readable code at first, and measure the performance. Only if this access to C is costing you too much should you optimize. But if your optimization boils down to the reinterpret_cast trick, make sure there are plenty of warning signs around so that no one steps on this booby trap.
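To make the first suggestion concrete, here is a minimal sketch of the accessor approach (the accessor name c() and the helper print_c are invented for illustration): each class returns a reference to its embedded C, which any decent compiler will inline, so no cast is needed at all.

#include <cstdint>
#include <iostream>
#include <string>
using namespace std;

struct C
{
    string m_c1;
    int32_t m_c2;
    double m_c3;
    string m_c4;
};
struct A
{
    C m_a1;
    string m_a2;
    int32_t m_a3;
    C& c() { return m_a1; }             // trivial accessor, typically inlined
    const C& c() const { return m_a1; }
};
struct B
{
    C m_b1;
    string m_b2;
    int32_t m_b3;
    double m_b4;
    C& c() { return m_b1; }
    const C& c() const { return m_b1; }
};
void print_c(const C& c)                // code that only needs the common part takes a C&
{
    cout << c.m_c1 << " " << c.m_c4 << endl;
}
int main()
{
    A a;
    a.c().m_c1 = "A";
    a.c().m_c4 = "AA";
    B b;
    b.c().m_c1 = "B";
    b.c().m_c4 = "BB";
    print_c(a.c());
    print_c(b.c());
    return 0;
}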

Why don't you inherit both A and B from C and then use static_cast instead? It should be safer/cleaner.
Indeed, in your case you shouldn't need a cast at all; you should be able to assign A or B pointers to a C*.
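A rough sketch of that suggestion, assuming it actually makes sense for A and B to be-a C in your design: with plain (non-virtual) inheritance the derived-to-base conversion is implicit and costs nothing.

#include <cstdint>
#include <iostream>
#include <string>
using namespace std;

struct C
{
    string m_c1;
    int32_t m_c2;
    double m_c3;
    string m_c4;
};
struct A : C { string m_a2; int32_t m_a3; };
struct B : C { string m_b2; int32_t m_b3; double m_b4; };

int main()
{
    A a;
    a.m_c1 = "A";
    a.m_c4 = "AA";
    B b;
    b.m_c1 = "B";
    b.m_c4 = "BB";
    C* pc = &a;   // implicit derived-to-base conversion, no cast needed
    cout << pc->m_c1 << " " << pc->m_c4 << endl;
    pc = &b;
    cout << pc->m_c1 << " " << pc->m_c4 << endl;
    return 0;
}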

Related

Does the size of a binary executable depend on inheritance?

I have two scenarios, one with inheritance, and one without.
First:
class A
{
public:
    int a;
    void prnt() { cout << "class A"; }
};
class B : public A
{
};
Second:
class A
{
public:
    int a;
    void prnt() { cout << "class B"; }
};
class B
{
public:
    int a;
    void prnt() { cout << "class A"; }
};
Does inheritance increase the size of the executable or not?
As in the above example, in the first scenario I derived class B from A, so that B has all the data members of A. In the second scenario I just copy-pasted the code from class A into B.
What is advisable in the above scenarios, where we know that there will be only two classes with the same data members?
Should we use inheritance, or just create two separate classes?
And will there be any difference in the size of the final executable in the two scenarios?
You shouldn't worry too much about this kind of detail; concentrate fully on the design that best represents your application domain.
The real question you should ask is whether B really is an A, i.e. whether every time you use an A you could also use a B.
Nevertheless, for the sake of curiosity:
I compiled both versions with MSVC 2013, adding a main() that creates an A and a B object and invokes their sole function (I used exactly the same prnt body in the second example, to compare comparable things). In both debug and release mode, the exe files ended up EXACTLY the same size.
As pointed out in the comments, compilers do a pretty good job. For instance, in release mode the compiler inlined the code of the function invocation, resulting in the same main() machine code. For the second version, additional code was generated for B::prnt(), but apparently the linker identified the unused function and discarded it.
Of course, with more complex code the result might be different. Inlining will not always be possible, and functions can have small differences which make deduplication more difficult.
As a rule of thumb, the size of the code will depend more on the quantity of code (and copy/pasted code) that you write than on inheritance. Inheritance should in fact tend naturally to produce smaller code (especially with older, less optimizing compilers), because the inheritance relation tells the compiler that it can/shall reuse the common code.

Position of a vpointer in an object

#include <cstddef>
#include <cstdlib>
#include <iostream>

class C
{
public:
    C() : m_x(0) { }
    virtual ~C() { }
public:
    static std::ptrdiff_t member_offset(const C &c)
    {
        const char *p = reinterpret_cast<const char*>(&c);
        const char *q = reinterpret_cast<const char*>(&c.m_x);
        return q - p;
    }
private:
    int m_x;
};
int main(void)
{
    C c;
    std::cout << ((C::member_offset(c) == 0) ? 0 : 1);
    std::cout << std::endl;
    std::system("pause");
    return 0;
}
The program above outputs 1. All it does is check the address of the c object against the address of c's field m_x. It prints 1, which means the addresses are not equal. My guess is that this is because the destructor is virtual, so the compiler has to create a vtable for the class and put a vpointer in the object. If I'm already wrong, please correct me.
Apparently it puts the vpointer at the beginning of the object, pushing the m_x field further back and thus giving it a different address. Is that the case? If so, does the standard specify the vpointer's position in the object? According to the wiki it's implementation-dependent, and its position may change the output of the program.
So can you always predict the output of this program without specifying the target platform?
In reality, it is NEARLY ALWAYS laid out this way. However, the C++ standard allows whatever works to be used, and I can imagine several solutions that don't REQUIRE the above to be true, although they would perhaps not work well as real solutions.
Note, however, that you can have more than one vptr/vtable for an object if you have multiple inheritance.
There are no "vpointers" in C++. The implementation of polymorphism and dynamic dispatch is left to the compiler, and the resulting class layout is not in any way specified. Certainly an object of polymorphic type will have to carry some extra state in order to identify the concrete type when given only a view of a base subobject.
Implementations with vtables and vptrs are common and popular, and putting the vptr at the beginning of the class means that you don't need any pointer adjustments for single inheritance up and downcasts.
Many C++ compilers follow (parts of) the Itanium ABI for C++, which specifies class layout decisions like this. This popular article may also provide some insights.
Yes, it is implementation dependent and no, you can't predict the program's output without knowing the target platform/compiler.
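A small way to observe the (implementation-defined) effect being discussed, assuming a typical vtable-based implementation; the two toy classes below are only for illustration. Adding a virtual function usually grows the object by one pointer, which commonly lands in front of the data members.

#include <iostream>

struct Plain { int m_x; };
struct WithDtor { virtual ~WithDtor() {} int m_x; };

int main()
{
    std::cout << sizeof(Plain) << "\n";    // usually sizeof(int)
    std::cout << sizeof(WithDtor) << "\n"; // usually sizeof(void*) + sizeof(int), plus padding
    return 0;
}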

Is it possible to convert regular pointer to pointer to class field

Given a class instance and a pointer to a member, we can obtain a regular pointer to that field of the class instance, as in the last assignment in the following code:
class A
{
public:
    int i, j;
};
int main()
{
    A a;
    int A::*p = &A::i;
    int* r = &(a.*p); // r now points to a.i
}
Is it possible to invert this conversion: given a class instance A a; and an int* r, obtain an int A::* p (or a null pointer if the given pointer does not point into the given instance), as in this code:
class A
{
public:
    int i, j;
};
int main()
{
    A a;
    int A::*p = &A::i;
    int* r = &(a.*p); // r now points to a.i
    int A::*s = // a--->r : how to turn r back into a member pointer?
}
The only way I can think of doing it would be to write a function that takes every known field of A, calculates its address for the given instance and compares it with the given address. This, however, requires writing custom code for every class and might get difficult to manage. It also has suboptimal performance.
I can imagine that such a conversion could be done by the compiler in a few operations on all the implementations I know of: such a pointer is usually just an offset into the structure, so it would be just a subtraction and a range check to see whether the given pointer actually lies within the storage of this object. Virtual base classes add a bit of complexity, but nothing the compiler couldn't handle, I think. However, it seems that since it's not required by the standard (is it?), no compiler vendor cares.
Or am I wrong, and is there some fundamental problem with such a conversion?
EDIT:
I see that there is a little misunderstanding about what I am asking. In short, I am asking which of the following is true:
There is already some implementation of it (at the compiler level, I mean), but since hardly anybody uses it, almost nobody knows about it.
There is no mention of it in the standard and no compiler vendor has thought of it, but in principle it is possible to implement (once again: by the compiler, not by compiled code).
There is some deep-reaching problem with such an operation that I missed.
My question was: which of those is true? And in the case of the last one, what is the underlying problem?
I am not asking for workarounds.
There is no cross-platform way to do this. Pointer-to-member values are commonly implemented as offsets from the start of the object. Leveraging that fact, I made this (works in VS, haven't tried anything else):
#include <cstddef>
#include <iostream>

class A
{
public:
    int i, j;
};
int main()
{
    A a;
    int A::*p = &A::i;
    int* r = &(a.*p); // r now points to a.i
    union
    {
        std::ptrdiff_t offset;
        int A::*s;
    };
    // Relies on the implementation representing int A::* as a plain offset;
    // writing offset and then reading s is type punning through the union.
    offset = r - reinterpret_cast<int*>(&a);
    a.*s = 7;
    std::cout << a.i << '\n';
}
AFAIK C++ does not provide full reflection, which is what you would need to do that.
One solution is to provide the reflection yourself (the way you describe is one way; it may not be the best one, but it would work).
A totally non-portable solution would be to locate the executable and use any debug information it may contain. Obviously non-portable, and it requires the debug information to be there to begin with.
There's a decent description of the problem of reflection and different possible approaches to it in the introduction section of http://www.garret.ru/cppreflection/docs/reflect.html
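For what it's worth, a minimal sketch of the do-it-yourself route the question describes (the helper name to_member_pointer is invented for illustration): compare the raw pointer against the address of every known member and hand back the corresponding pointer-to-member, or null.

#include <iostream>

class A
{
public:
    int i, j;
};

int A::* to_member_pointer(const A& a, const int* r)
{
    if (r == &a.i) return &A::i;
    if (r == &a.j) return &A::j;
    return nullptr;   // r does not point at an int member of this instance
}

int main()
{
    A a;
    int A::*p = &A::j;
    int* r = &(a.*p);
    int A::*s = to_member_pointer(a, r);  // recovers &A::j
    a.*s = 7;
    std::cout << a.j << '\n';             // prints 7
    return 0;
}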
Edit:
As I wrote above, there's no portable and general solution. But there may be a very non-portable approach. I'm not giving you an implementation here, as I do not have a C++ compiler at the moment to test it, but I'll describe the idea.
The basis is what Dave did in his answer: exploit the fact that a pointer-to-member is often just an offset. The problem is with base classes (especially virtual and multiple-inheritance ones). You can approach it with templates: use dynamic casting to get a pointer to the base class, and eventually diff that pointer with the original one to find out the offset of the base.
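A sketch of that idea, equally untested and assuming the usual layout; the names Base, Derived and base_offset are only illustrative, and a plain static_cast upcast is shown for simplicity. The upcast pointer is diffed against the original address to learn where the base subobject lives.

#include <cstddef>
#include <iostream>

struct Base { virtual ~Base() {} int b; };
struct Derived : Base { int d; };

std::ptrdiff_t base_offset(Derived& obj)
{
    char* derived_addr = reinterpret_cast<char*>(&obj);
    char* base_addr = reinterpret_cast<char*>(static_cast<Base*>(&obj));
    return base_addr - derived_addr;   // 0 with the typical single-inheritance layout
}

int main()
{
    Derived d;
    std::cout << base_offset(d) << "\n";
    return 0;
}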

LTO, Devirtualization, and Virtual Tables

Comparing virtual functions in C++ and virtual tables in C, do compilers in general (and for sufficiently large projects) do as good a job at devirtualization?
Naively, it seems like virtual functions in C++ have slightly more semantics, thus may be easier to devirtualize.
Update: Mooing Duck mentioned inlining devirtualized functions. A quick check shows missed optimizations with virtual tables:
#include <stdio.h>

struct vtab {
    int (*f)();
};
struct obj {
    struct vtab *vtab;
    int data;
};
int f()
{
    return 5;
}
int main()
{
    struct vtab vtab = {f};
    struct obj obj = {&vtab, 10};
    printf("%d\n", obj.vtab->f());
}
My GCC will not inline f, although it is called directly, i.e., devirtualized. The equivalent in C++,
#include <cstdio>

class A
{
public:
    virtual int f() = 0;
};
class B : public A
{
public:
    int f() { return 5; }
};
int main()
{
    B b;
    printf("%d\n", b.f());
}
even inlines f. So there's a first difference between C and C++, although I don't think that the added semantics in the C++ version are relevant in this case.
Update 2: In order to devirtualize in C, the compiler has to prove that the function pointer in the virtual table has a certain value. In order to devirtualize in C++, the compiler has to prove that the object is an instance of a particular class. It would seem that the proof is harder in the first case. However, virtual tables are typically modified in only very few places, and most importantly: just because it looks harder doesn't mean that compilers aren't as good at it (otherwise you might argue that xoring is generally faster than adding two integers).
The difference is that in C++, the compiler can guarantee that the virtual table address never changes. In C, it's just another pointer and you could wreak any kind of havoc with it.
However, virtual tables are typically modified in only very few places
The compiler doesn't know that in C. In C++, it can assume that it never changes.
I tried to summarize in http://hubicka.blogspot.ca/2014/01/devirtualization-in-c-part-2-low-level.html why generic optimizations have a hard time devirtualizing. Your testcase gets inlined for me with GCC 4.8.1, but in a slightly less trivial testcase where you pass the pointer to your "object" out of main it will not be.
The reason is that to prove that the virtual table pointer in obj and the virtual table itself did not change, the alias analysis module has to track all possible places that can point to it. In non-trivial code where you pass things outside of the current compilation unit, this is often a lost game.
C++ gives you more information about when the type of an object may change and when it is known. GCC makes use of it, and will make a lot more use of it in the next release. (I will write on that soon, too.)
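For illustration, a hedged sketch of that "slightly less trivial" case; the function name touch is made up, and in real code it would live in another translation unit. Once the pointer escapes to code the optimizer cannot see, alias analysis can no longer prove that obj.vtab and vtab.f are unchanged, so the indirect call tends to stay.

#include <stdio.h>

struct vtab { int (*f)(void); };
struct obj { struct vtab *vtab; int data; };

int f(void) { return 5; }

void touch(struct obj *o) { (void)o; }  /* imagine this defined in another file */

int main(void)
{
    struct vtab vtab = {f};
    struct obj obj = {&vtab, 10};
    touch(&obj);                        /* obj and its vtab may have been changed */
    printf("%d\n", obj.vtab->f());      /* typically not devirtualized any more */
    return 0;
}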
Yes, if it is possible for the compiler to deduce the exact type of a virtualized type, it can "devirtualize" (or even inline!) the call. A compiler can only do this if it can guarantee that no matter what, this is the function needed.
The major concern is basically threading. In the C++ example, the guarantees hold even in a threaded environment. In C, that can't be guaranteed, because the object could be grabbed by another thread/process, and overwritten (deliberately or otherwise), so the function is never "devirtualized" or called directly. In C the lookup will always be there.
#include <iostream>

struct A {
    virtual void func() { std::cout << "A"; }
};
struct B : A {
    virtual void func() { std::cout << "B"; }
};
int main() {
    B b;
    b.func(); // this will inline in optimized builds.
}
It depends on what you are comparing compiler inlining to. Compared to link-time, profile-guided, or just-in-time optimizations, compilers have less information to use. With less information, the compile-time optimizations will be more conservative (and do less inlining overall).
A compiler will still generally be pretty decent at inlining virtual functions, as it is equivalent to inlining function-pointer calls (say, when you pass a free function to an STL algorithm like sort or for_each).
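As a small illustration of that last point (the comparator descending is only for illustration), a call through a function pointer handed to an STL algorithm, which the optimizer can often inline once it can see which function the pointer holds, much like a devirtualized call:

#include <algorithm>
#include <iostream>
#include <vector>

bool descending(int a, int b) { return a > b; }

int main()
{
    std::vector<int> v = {3, 1, 2};
    std::sort(v.begin(), v.end(), descending); // the call through the pointer is often inlined
    for (int x : v) std::cout << x << " ";     // 3 2 1
    std::cout << "\n";
    return 0;
}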

Static cast vs. dynamic cast for traversing inheritance hierarchies

I saw one book on C++ mentioning that navigating inheritance hierarchies using static cast is more efficient than using dynamic cast.
Example:
#include <iostream>
#include <typeinfo>
using namespace std;

class Shape { public: virtual ~Shape() {}; };
class Circle : public Shape {};
class Square : public Shape {};
class Other {};

int main() {
    Circle c;
    Shape* s = &c; // Upcast: normal and OK
    // More explicit but unnecessary:
    s = static_cast<Shape*>(&c);
    // (Since upcasting is such a safe and common
    // operation, the cast becomes cluttering)
    Circle* cp = 0;
    Square* sp = 0;
    // Static navigation of class hierarchies
    // requires extra type information:
    if(typeid(*s) == typeid(Circle)) // C++ RTTI on the pointed-to object
        cp = static_cast<Circle*>(s);
    if(typeid(*s) == typeid(Square))
        sp = static_cast<Square*>(s);
    if(cp != 0)
        cout << "It's a circle!" << endl;
    if(sp != 0)
        cout << "It's a square!" << endl;
    // Static navigation is ONLY an efficiency hack;
    // dynamic_cast is always safer. However:
    // Other* op = static_cast<Other*>(s);
    // Conveniently gives an error message, while
    Other* op2 = (Other*)s;
    // does not
} ///:~
However, both dynamic cast and static cast (as implemented above) need RTTI enabled for such navigation to work. It's just that dynamic cast requires the class hierarchy to be polymorphic (i.e. base class having at least one virtual function).
Where does this efficiency gain for static cast come from?
The book does mention that dynamic cast is the preferred way for doing type-safe downcasting.
static_cast per se DOESN'T need RTTI -- typeid does (as does dynamic_cast), but that's a completely different issue. Most casts are just telling the compiler "trust me, I know what I'm doing" -- dynamic_cast is the exception, it asks the compiler to check at runtime and possibly fail. That's the big performance difference right there!
It's much better to avoid switching on types at all if possible. This is usually done by moving the relevant code to a virtual method that is implemented differently for different subtypes:
class Shape {
public:
    virtual ~Shape() {};
    virtual void announce() = 0; // And likewise redeclare in Circle and Square.
};
void Circle::announce() {
    cout << "It's a circle!" << endl;
}
void Square::announce() {
    cout << "It's a square!" << endl;
}
// Later...
s->announce();
If you are working with a pre-existing inheritance hierarchy that you can't change, investigate the Visitor pattern for a more extensible alternative to type-switching.
More info: static_cast does not require RTTI, but a downcast using it can be unsafe, leading to undefined behaviour (e.g. crashing). dynamic_cast is safe but slow, because it checks (and therefore requires) RTTI info. The old C-style cast is even more unsafe than static_cast because it will quietly cast across completely unrelated types, where static_cast would object with a compile-time error.
With the static cast (and typeid check) you cannot downcast to an intermediate type (if child derives from father, which derives from grandfather, you cannot downcast from grandfather to father), so its usage is a little more limited. static_cast without the typeid check sacrifices correctness for performance, and then you know what they say:
He who sacrifices correctness for performance deserves neither
Then of course, there are situations where you are in desperate need of a few CPU instructions and there is nowhere else to look for improvements and you are actually safe in what you are doing and you have measured (right?) that the only place to gain performance is using static_cast instead of dynamic_cast... then you know you must rework your design, or your algorithms, or get better hardware.
The restriction you impose by using RTTI + static_cast is that you will not be able to extend your code with new derived classes at a later time without reworking all the places where you have used this trick to gain just a few CPU instructions. That reworking will itself probably take more time (engineering time, which is more expensive) than the CPU time you have gained. If, at any rate, the time devoted to downcasts is noticeable, then rework your design as j_random_hacker suggests; it will improve both the design and the performance.
dynamic_cast would return NULL if you hadn't done the typeid check and the cast couldn't succeed. static_cast would succeed (and lead to undefined behavior, such as an eventual crash). That's likely the speed difference.
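A minimal sketch of that difference, reusing the Shape/Circle/Square classes from the question: the dynamic_cast reports the failed downcast with a null pointer, while the static_cast simply trusts the programmer.

#include <iostream>
using namespace std;

class Shape { public: virtual ~Shape() {} };
class Circle : public Shape {};
class Square : public Shape {};

int main()
{
    Shape* s = new Square;
    if (Circle* cp = dynamic_cast<Circle*>(s))
        cout << "It's a circle!" << endl;     // not reached
    else
        cout << "Not a circle" << endl;       // dynamic_cast returned a null pointer
    Circle* forced = static_cast<Circle*>(s); // compiles, but treating *forced as a
    (void)forced;                             // Circle would be undefined behaviour
    delete s;
    return 0;
}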