How to do a static assert that a pointer cast is trivial? - c++

Let's say I have these types:
struct A {
int a;
};
struct B {
int b;
};
struct C : public A, public B {
int c;
};
A C* pointer can be cast to A* pointer without adjusting the actual address at all. But when C* is cast to B*, the value must change. I'd like to ensure that two related types I have can be cast to each other without a change in address (i.e. that there is no multiple inheritance, or that the base class is the first base of the derived class). This could be checked at run-time, e.g. like so
assert(size_t(static_cast<A*>((C*)0xF000) == 0xF000);
assert(size_t(static_cast<B*>((C*)0xF000) != 0xF000);
That works. But this information is known at compile time, so I'm looking for a way to do a compile-time assert on it. The obvious ways of converting the above to a static assert (e.g. replace assert with BOOST_STATIC_ASSERT give the error "a cast to a type other than an integral or enumeration type cannot appear in a constant-expression" with g++ 4.2.
Portability isn't too important. Using gcc extensions, or hacky template tricks would all be fine.
Update: Found that almost the same question has been asked before: C++, statically detect base classes with differing addresses?. Using offsetof() is the only useful suggestion there too.

"I'd like to ensure that two related types can be cast to each other without a change in address (i.e. that there is no multiple inheritance, or that the base class is the first base of the derived class)."
Your "i.e." isn't correct. For instance, it's entirely possible that the first 4 bytes of Derived are a vtable pointer, even when Base isn't polymorphic. Yes, C++ compilers are easier if (1) the first base subobject is at offset 0, and (2) the vtable pointer is at offset 0. But those two goals are inherently at odds, and there is no clear better choice.
Now, the first part could be tested in theory. With standard-layout types, there would be no difference in offset_of(Base, first_member) and offset_of(Derived, first_member). But in practice, offset_of doesn't work for interesting types; it's UB. And the whole point of this check is to check a type, so it should fail reliably for nonstandard-layout types.

Based on a suggestion from MSalters, and an answer from C++, statically detect base classes with differing addresses?, here is the closest thing to an answer I can come up with. It's probably gcc-specific, and requires knowing some member of the base class:
#pragma GCC diagnostic ignored "-Winvalid-offsetof" // To suppress warning.
BOOST_STATIC_ASSERT(offsetof(C, a) == offsetof(A, a));
BOOST_STATIC_ASSERT(offsetof(C, b) != offsetof(B, b));
#pragma GCC diagnostic warn "-Winvalid-offsetof"
Obviously this is both inconvenient and scary (requires to know a member and to turn off a warning).

Related

How can I detect whether a class has an implicit constructor AND primitive members in c++?

I want to detect during compile time (static assertion) whether a class meets both the following conditions:
Has an implicit default constructor (i.e., no user-defined default constructor).
Has at least one data member which is a pod (i.e., a member whose default initialization is to assume whatever random bytes was in its memory address).
[I hope I used the term pod correctly here]
The idea is to avoid working with objects with uninitialized members. I know there are different methods to do this during coding, but I also want a mechanism to detect this during compilation.
I tried using different std/boost functions, such as is_trivially_constructible, is_pod, but none of these provide the exact terms I need.
For example, let's say I have the following classes:
struct A
{
int a;
}
struct B
{
int* b;
}
struct C
{
bool c;
std::string c_str_;
}
struct D
{
D();
float d;
}
struct E
{
std::string e;
}
Assuming the function I need is called "has_primitive_and_implicit_ctor", I would like the output for each call to be as in the comments:
has_primitive_and_implicit_ctor<A>(); //true - A has at least one pod type member (int)
has_primitive_and_implicit_ctor<B>(); //true - A has at least one pod type member (pointer)
has_primitive_and_implicit_ctor<C>(); //true - A has at least one pod type member (bool), even though there is one non-pod member
has_primitive_and_implicit_ctor<D>(); //false - has a pod member(float), but a user defined ctor
has_primitive_and_implicit_ctor<E>(); //false - doesn't have a default ctor but has no pod members
Firstly, it seems to me like a broken design to expect from the user of a class to care about its member initialisation. You should make sure in the class itself that all its members are initialised, not somewhere else where it is used.
What you are looking for does not exist, and if it would, it would not even help you. The existence of an explicit constructor does not guarantee that a data member is initialised. On the other hand, with C++11 it is even possible to initialise data members without explicitly writing a constructor (using the brace syntax in the class declaration). Also you just seem to care about uninitialised POD members, but what about uninitialised non-POD members?
That said, compiler can generate warnings about uninitialised values, but often you have to enable this warning (e.g. -Wuninitialized option for gcc). Most compilers allow to force treating warnings as an error. In combination this can give you the desired effect even without specifically writing code to test for it, and it would also work for any uninitialised values, not only those in classes. Maybe this is the solution you are looking for.

Why does C++ allow private members to be modified using this approach?

After seeing this question a few minutes ago, I wondered why the language designers allow it as it allows indirect modification of private data. As an example
class TestClass {
private:
int cc;
public:
TestClass(int i) : cc(i) {};
};
TestClass cc(5);
int* pp = (int*)&cc;
*pp = 70; // private member has been modified
I tested the above code and indeed the private data has been modified. Is there any explanation of why this is allowed to happen or this just an oversight in the language? It seems to directly undermine the use of private data members.
Because, as Bjarne puts it, C++ is designed to protect against Murphy, not Machiavelli.
In other words, it's supposed to protect you from accidents -- but if you go to any work at all to subvert it (such as using a cast) it's not even going to attempt to stop you.
When I think of it, I have a somewhat different analogy in mind: it's like the lock on a bathroom door. It gives you a warning that you probably don't want to walk in there right now, but it's trivial to unlock the door from the outside if you decide to.
Edit: as to the question #Xeo discusses, about why the standard says "have the same access control" instead of "have all public access control", the answer is long and a little tortuous.
Let's step back to the beginning and consider a struct like:
struct X {
int a;
int b;
};
C always had a few rules for a struct like this. One is that in an instance of the struct, the address of the struct itself has to equal the address of a, so you can cast a pointer to the struct to a pointer to int, and access a with well defined results. Another is that the members have to be arranged in the same order in memory as they are defined in the struct (though the compiler is free to insert padding between them).
For C++, there was an intent to maintain that, especially for existing C structs. At the same time, there was an apparent intent that if the compiler wanted to enforce private (and protected) at run-time, it should be easy to do that (reasonably efficiently).
Therefore, given something like:
struct Y {
int a;
int b;
private:
int c;
int d;
public:
int e;
// code to use `c` and `d` goes here.
};
The compiler should be required to maintain the same rules as C with respect to Y.a and Y.b. At the same time, if it's going to enforce access at run time, it may want to move all the public variables together in memory, so the layout would be more like:
struct Z {
int a;
int b;
int e;
private:
int c;
int d;
// code to use `c` and `d` goes here.
};
Then, when it's enforcing things at run-time, it can basically do something like if (offset > 3 * sizeof(int)) access_violation();
To my knowledge nobody's ever done this, and I'm not sure the rest of the standard really allows it, but there does seem to have been at least the half-formed germ of an idea along that line.
To enforce both of those, the C++98 said Y::a and Y::b had to be in that order in memory, and Y::a had to be at the beginning of the struct (i.e., C-like rules). But, because of the intervening access specifiers, Y::c and Y::e no longer had to be in order relative to each other. In other words, all the consecutive variables defined without an access specifier between them were grouped together, the compiler was free to rearrange those groups (but still had to keep the first one at the beginning).
That was fine until some jerk (i.e., me) pointed out that the way the rules were written had another little problem. If I wrote code like:
struct A {
int a;
public:
int b;
public:
int c;
public:
int d;
};
...you ended up with a little bit of self contradition. On one hand, this was still officially a POD struct, so the C-like rules were supposed to apply -- but since you had (admittedly meaningless) access specifiers between the members, it also gave the compiler permission to rearrange the members, thus breaking the C-like rules they intended.
To cure that, they re-worded the standard a little so it would talk about the members all having the same access, rather than about whether or not there was an access specifier between them. Yes, they could have just decreed that the rules would only apply to public members, but it would appear that nobody saw anything to be gained from that. Given that this was modifying an existing standard with lots of code that had been in use for quite a while, the opted for the smallest change they could make that would still cure the problem.
Because of backwards-compatability with C, where you can do the same thing.
For all people wondering, here's why this is not UB and is actually allowed by the standard:
First, TestClass is a standard-layout class (§9 [class] p7):
A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference, // OK: non-static data member is of type 'int'
has no virtual functions (10.3) and no virtual base classes (10.1), // OK
has the same access control (Clause 11) for all non-static data members, // OK, all non-static data members (1) are 'private'
has no non-standard-layout base classes, // OK, no base classes
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and // OK, no base classes again
has no base classes of the same type as the first non-static data member. // OK, no base classes again
And with that, you can are allowed to reinterpret_cast the class to the type of its first member (§9.2 [class.mem] p20):
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
In your case, the C-style (int*) cast resolves to a reinterpret_cast (§5.4 [expr.cast] p4).
A good reason is to allow compatibility with C but extra access safety on the C++ layer.
Consider:
struct S {
#ifdef __cplusplus
private:
#endif // __cplusplus
int i, j;
#ifdef __cplusplus
public:
int get_i() const { return i; }
int get_j() const { return j; }
#endif // __cplusplus
};
By requiring that the C-visible S and the C++-visible S be layout-compatible, S can be used across the language boundary with the C++ side having greater access safety. The reinterpret_cast access safety subversion is an unfortunate but necessary corollary.
As an aside, the restriction on having all members with the same access control is because the implementation is permitted to rearrange members relative to members with different access control. Presumably some implementations put members with the same access control together, for the sake of tidiness; it could also be used to reduce padding, although I don't know of any compiler that does that.
The whole purpose of reinterpret_cast (and a C style cast is even more powerful than a reinterpret_cast) is to provide an escape path around safety measures.
The compiler would have given you an error if you had tried int *pp = &cc.cc, the compiler would have told you that you cannot access a private member.
In your code you are reinterpreting the address of cc as a pointer to an int. You wrote it the C style way, the C++ style way would have been int* pp = reinterpret_cast<int*>(&cc);. The reinterpret_cast always is a warning that you are doing a cast between two pointers that are not related. In such a case you must make sure that you are doing right. You must know the underlying memory (layout). The compiler does not prevent you from doing so, because this if often needed.
When doing the cast you throw away all knowledge about the class. From now on the compiler only sees an int pointer. Of course you can access the memory the pointer points to. In your case, on your platform the compiler happened to put cc in the first n bytes of a TestClass object, so a TestClass pointer also points to the cc member.
This is because you are manipulating the memory where your class is located in memory. In your case it just happen to store the private member at this memory location so you change it. It is not a very good idea to do because you do now know how the object will be stored in memory.

Typecasting from bare data to C++ wrapper

Say I have some data allocated somewhere in my program, like:
some_type a;
and I want to wrap this data in a class for access. Is it valid to say,
class Foo {
private:
some_type _val;
public:
inline void doSomething() { c_doSomething(&_val); }
}
Foo *x = reinterpret_cast<Foo *>(&a);
x->double();
The class has no virtual functions, and only includes a single data item of the type I'm trying to wrap. Does the C++ standard specify that this reinterpret_cast is safe and valid? sizeof(Foo) == sizeof(some_type), no address alignment issues, or anything? (In my case, I'd be ensuring that some_type is either a primitive type like int, or a POD structure, but I'm curious what happens if we don't enforce that restriction, too - for example, a derived class of a UIWidget like a UIMenuItem, or something.)
Thanks!
Is it valid to say...
No, this is not valid. There are only a small number of types that a can be treated as; the complete list can be found in an answer I gave to another question.
Does the C++ standard specify that this reinterpret_cast is safe and valid?
The C++ Standards says very little about reinterpret_cast. Its behavior is almost entirely implementation-defined, so use of it is usually non-portable.
The correct way to do this would be to either
have a Foo constructor that takes a some_type argument and makes a copy of it or stores a reference or pointer to it, or
implement your "wrapper" interface as a set of non-member functions that take a some_type object by reference as an argument.
14882/1998/9.2.17:
"A pointer to a PODstruct
object, suitably converted using a reinterpret_cast, points to its initial
member (or if that member is a bitfield,
then to the unit in which it resides) and vice versa. [Note: There
might therefore be unnamed padding within a PODstruct
object, but not at its beginning, as necessary to
achieve appropriate alignment. ]"
So, it would be valid if your wrapper was strictly a POD in itself. However, access specifiers mean that it is not a strictly a POD. That said, I would be interested in knowing whether any current implementation changes object layout due to access specifiers. I think that for all practical purposes, you are good to go.
And for the case when the element is not a POD, it follows that the container is not a POD, and hence all bets are off.
Since your Foo object is already only valid as long as the existing a is valid:
struct Foo {
some_type &base;
Foo(some_type &base) : base (base) {}
void doSomething() { c_doSomething(&base); }
}
//...
Foo x = a;
x.doSomething();
You want to look up the rules governing POD (plain old data) types. If the C++ class is a POD type, then yes, you can cast it.
The details of what actually happens and how aliasing is handled are implementation defined, but are usually reasonable and should match what would happen with a similar C type or struct.
I happen to use this a lot in a project of mine that implements B+ trees in shared memory maps. It has worked in GCC across multiple types of Linux and BSDs including Mac OS X. It also works fine in Windows with MSVC.
Yes, this is valid as long as the wrapper type you are creating (Foo in your example) is a POD-type.

Why C++ virtual function defined in header may not be compiled and linked in vtable?

Situation is following. I have shared library, which contains class definition -
QueueClass : IClassInterface
{
virtual void LOL() { do some magic}
}
My shared library initialize class member
QueueClass *globalMember = new QueueClass();
My share library export C function which returns pointer to globalMember -
void * getGlobalMember(void) { return globalMember;}
My application uses globalMember like this
((IClassInterface*)getGlobalMember())->LOL();
Now the very uber stuff - if i do not reference LOL from shared library, then LOL is not linked in and calling it from application raises exception. Reason - VTABLE contains nul in place of pointer to LOL() function.
When I move LOL() definition from .h file to .cpp, suddenly it appears in VTABLE and everything works just great.
What explains this behavior?! (gcc compiler + ARM architecture_)
The linker is the culprit here. When a function is inline it has multiple definitions, one in each cpp file where it is referenced. If your code never references the function it is never generated.
However, the vtable layout is determined at compile time with the class definition. The compiler can easily tell that the LOL() is a virtual function and needs to have an entry in the vtable.
When it gets to link time for the app it tries to fill in all the values of the QueueClass::_VTABLE but doesn't find a definition of LOL() and leaves it blank(null).
The solution is to reference LOL() in a file in the shared library. Something as simple as &QueueClass::LOL;. You may need to assign it to a throw away variable to get the compiler to stop complaining about statements with no effect.
I disagree with #sechastain.
Inlining is far from being automatic. Whether or not the method is defined in place or a hint (inline keyword or __forceinline) is used, the compiler is the only one to decide if the inlining will actually take place, and uses complicated heuristics to do so. One particular case however, is that it shall not inline a call when a virtual method is invoked using runtime dispatch, precisely because runtime dispatch and inlining are not compatible.
To understand the precision of "using runtime dispatch":
IClassInterface* i = /**/;
i->LOL(); // runtime dispatch
i->QueueClass::LOL(); // compile time dispatch, inline is possible
#0xDEAD BEEF: I find your design brittle to say the least.
The use of C-Style casts here is wrong:
QueueClass* p = /**/;
IClassInterface* q = p;
assert( ((void*)p) == ((void*)q) ); // may fire or not...
Fundamentally there is no guarantee that the 2 addresses are equal: it is implementation defined, and unlikely to resist change.
I you wish to be able to safely cast the void* pointer to a IClassInterface* pointer then you need to create it from a IClassInterface* originally so that the C++ compiler may perform the correct pointer arithmetic depending on the layout of the objects.
Of course, I shall also underline than the use of global variables... you probably know it.
As for the reason of the absence ? I honestly don't see any apart from a bug in the compiler/linker. I've seen inlined definition of virtual functions a few times (more specifically, the clone method) and it never caused issues.
EDIT: Since "correct pointer arithmetic" was not so well understood, here is an example
struct Base1 { char mDum1; };
struct Base2 { char mDum2; };
struct Derived: Base1, Base2 {};
int main(int argc, char* argv[])
{
Derived d;
Base1* b1 = &d;
Base2* b2 = &d;
std::cout << "Base1: " << b1
<< "\nBase2: " << b2
<< "\nDerived: " << &d << std::endl;
return 0;
}
And here is what was printed:
Base1: 0x7fbfffee60
Base2: 0x7fbfffee61
Derived: 0x7fbfffee60
Not the difference between the value of b2 and &d, even though they refer to one entity. This can be understood if one thinks of the memory layout of the object.
Derived
Base1 Base2
+-------+-------+
| mDum1 | mDum2 |
+-------+-------+
When converting from Derived* to Base2*, the compiler will perform the necessary adjustment (here, increment the pointer address by one byte) so that the pointer ends up effectively pointing to the Base2 part of Derived and not to the Base1 part mistakenly interpreted as a Base2 object (which would be nasty).
This is why using C-Style casts is to be avoided when downcasting. Here, if you have a Base2 pointer you can't reinterpret it as a Derived pointer. Instead, you will have to use the static_cast<Derived*>(b2) which will decrement the pointer by one byte so that it correctly points to the beginning of the Derived object.
Manipulating pointers is usually referred to as pointer arithmetic. Here the compiler will automatically perform the correct adjustment... at the condition of being aware of the type.
Unfortunately the compiler cannot perform them when converting from a void*, it is thus up to the developer to make sure that he correctly handles this. The simple rule of thumb is the following: T* -> void* -> T* with the same type appearing on both sides.
Therefore, you should (simply) correct your code by declaring: IClassInterface* globalMember and you would not have any portability issue. You'll probably still have maintenance issue, but that's the problem of using C with OO-code: C is not aware of any object-oriented stuff going on.
My guess is that GCC is taking the opportunity to inline the call to LOL. I'll see if I can find a reference for you on this...
I see sechastain beat me to a more thorough description and I could not google up the reference I was looking for. So I'll leave it at that.
Functions defined in header files are in-lined on usage. They're not compiled as part of the library; instead where the call is made, the code of the function simply replaces the code of the call, and that is what gets compiled.
So, I'm not surprised to see that you are not finding a v-table entry (what would it point to?), and I'm not surprised to see that moving the function definition to a .cpp file suddenly makes things work. I'm a little surprised that creating an instance of the object with a call in the library makes a difference, though.
I'm not sure if it's haste on your part, but from the code provided IClassInterface does not necessarily contain LOL, only QueueClass. But you're casting to a IClassInterface pointer to make the LOL call.
If this example is simplified, and your actual inheritance tree uses multiple inheritance, this might be easily explained. When you do a typecast on an object pointer, the compiler needs to adjust the pointer so that the proper vtable is referenced. Because you're returning a void *, the compiler doesn't have the necessary information to do the adjustment.
Edit: There is no standard for C++ object layout, but for one example of how multiple inheritance might work see this article from Bjarne Stroustrup himself: http://www-plan.cs.colorado.edu/diwan/class-papers/mi.pdf
If this is indeed your problem, you might be able to fix it with one simple change:
IClassInterface *globalMember = new QueueClass();
The C++ compiler will do the necessary pointer modifications when it makes the assignment, so that the C function can return the correct pointer.

C++ Class or Struct compatiblity with C struct

Is it possible to write a C++ class or struct that is fully compatible with C struct. From compatibility I mean size of the object and memory locations of the variables. I know that its evil to use *(point*)&pnt or even (float*)&pnt (on a different case where variables are floats) but consider that its really required for the performance sake. Its not logical to use regular type casting operator million times per second.
Take this example
Class Point {
long x,y;
Point(long x, long y) {
this->x=x;
this->y=y;
}
float Distance(Point &point) {
return ....;
}
};
C version is a POD struct
struct point {
long x,y;
};
The cleanest was to do this is to inherit from the C struct:
struct point
{
long x, y;
};
class Point : public struct point
{
public:
Point(long x, long y)
{ this->x=x; this->y=y; }
float Distance(Point &point)
{ return ....; }
}
The C++ compiler guarantees the C style struct point has the same layout as with the C compiler. The C++ class Point inherits this layout for its base class portion (and since it adds no data or virtual members, it will have the same layout). A pointer to class Point will be converted to a pointer to struct point without a cast, since conversion to a base class pointer is always supported. So, you can use class Point objects and freely pass pointers to them to C functions expecting a pointer to struct point.
Of course, if there is already a C header file defining struct point, then you can just include this instead of repeating the definition.
Yes.
Use the same types in the same order in both languages
Make sure the class doesn't have anything virtual in it (so you don't get a vtable pointer stuck on the front)
Depending on the compilers used you may need to adjust the structure packing (usually with pragmas) to ensure compatibility.
(edit)
Also, you must take care to check the sizeof() the types with your compilers. For example, I've encountered a compiler that stored shorts as 32 bit values (when most will use 16). A more common case is that an int will usually be 32 bits on a 32-bit architecture and 64 bits on a 64-bit architecture.
POD applies to C++. You can have member functions. "A POD type in C++ is an aggregate class that contains only POD types as members, has no user-defined destructor, no user-defined copy assignment operator, and no nonstatic members of pointer-to-member type"
You should design your POD data structures so they have natural alignment, and then they can be passed between programs created by different compilers on different architectures. Natural alignment is where the memory offset of any member is divisible by the size of that member. IE: a float is located at an address that is divisible by 4, a double is on an address divisible by 8. If you declare a char followed by a float, most architectures will pad 3 bytes, but some could conceivably pad 1 byte. If you declare a float followed by a char, all compilers (I ought to add a source for this claim, sorry) will not pad at all.
C and C++ are different languages but it has always been the C++'s intention that you can have an implementation that supports both languages in a binary compatible fashion. Because they are different languages it is always a compiler implementation detail whether this is actually supported. Typically vendors who supply both a C and C++ compiler (or a single compiler with two modes) do support full compatibility for passing POD-structs (and pointers to POD-structs) between C++ code and C code.
Often, merely having a user-defined constructor breaks the guarantee although sometimes you can pass a pointer to such an object to a C function expecting a pointer to a struct with and identical data structure and it will work.
In short: check your compiler documentation.
Use the same "struct" in both C and C++. If you want to add methods in the C++ implementation, you can inherit the struct and the size should be the same as long as you don't add data members or virtual functions.
Be aware that if you have an empty struct or data members that are empty structs, they are different sizes in C and C++. In C, sizeof(empty-struct) == 0 although in C99, empty-structs are not supposed to be allowed (but may be supported anyway as a "compiler extension"). In C++, sizeof(empty-struct) != 0 (typical value is 1).
In addition to other answers, I would be sure not to put any access specifiers (public:, private: etc) into your C++ class / struct. IIRC the compiler is allowed to reorder blocks of member variables according to visibility, so that private: int a; pubic: int b; might get a and b swapped round. See eg this link: http://www.embedded.com/design/218600150?printable=true
I admit to being baffled as to why the definition of POD does not include a prohibition to this effect.
As long as your class doesn't exhibit some advanced traits of its kind, like growing something virtual, it should be pretty much the same struct.
Besides, you can change Class (which is invalid due to capitalization, anyway) to struct without doing any harm. Except for the members will turn public (they are private now).
But now that I think of your talking about type conversion… There's no way you can turn float into long representing the same value or vice versa by casting pointer type. I hope you only want it these pointers for the sake of moving stuff around.