Related
I want to detect during compile time (static assertion) whether a class meets both the following conditions:
Has an implicit default constructor (i.e., no user-defined default constructor).
Has at least one data member which is a pod (i.e., a member whose default initialization is to assume whatever random bytes was in its memory address).
[I hope I used the term pod correctly here]
The idea is to avoid working with objects with uninitialized members. I know there are different methods to do this during coding, but I also want a mechanism to detect this during compilation.
I tried using different std/boost functions, such as is_trivially_constructible, is_pod, but none of these provide the exact terms I need.
For example, let's say I have the following classes:
struct A
{
int a;
}
struct B
{
int* b;
}
struct C
{
bool c;
std::string c_str_;
}
struct D
{
D();
float d;
}
struct E
{
std::string e;
}
Assuming the function I need is called "has_primitive_and_implicit_ctor", I would like the output for each call to be as in the comments:
has_primitive_and_implicit_ctor<A>(); //true - A has at least one pod type member (int)
has_primitive_and_implicit_ctor<B>(); //true - A has at least one pod type member (pointer)
has_primitive_and_implicit_ctor<C>(); //true - A has at least one pod type member (bool), even though there is one non-pod member
has_primitive_and_implicit_ctor<D>(); //false - has a pod member(float), but a user defined ctor
has_primitive_and_implicit_ctor<E>(); //false - doesn't have a default ctor but has no pod members
Firstly, it seems to me like a broken design to expect from the user of a class to care about its member initialisation. You should make sure in the class itself that all its members are initialised, not somewhere else where it is used.
What you are looking for does not exist, and if it would, it would not even help you. The existence of an explicit constructor does not guarantee that a data member is initialised. On the other hand, with C++11 it is even possible to initialise data members without explicitly writing a constructor (using the brace syntax in the class declaration). Also you just seem to care about uninitialised POD members, but what about uninitialised non-POD members?
That said, compiler can generate warnings about uninitialised values, but often you have to enable this warning (e.g. -Wuninitialized option for gcc). Most compilers allow to force treating warnings as an error. In combination this can give you the desired effect even without specifically writing code to test for it, and it would also work for any uninitialised values, not only those in classes. Maybe this is the solution you are looking for.
I've got the classes:
struct A { // has no pointer members, POD - it's fine
int a, b;
char c;
};
struct B { // has no pointer members, but not POD - it's still fine
int a, b;
std::string s;
};
struct C { // has pointer members, it's not fine
int a,b;
char* cs;
};
I need to detect in compile time if any class has the properties of struct C, i.e. has pointers as members.
Short reasoning: I need to assure a user-defined type can be safely serialized and deserialized to some buffer by copying or assignment (e.g. struct A) or by providing user-defined serialize() and deserialize() methods in the class (e.g. struct B and struct c).
If B or C do not have these methods implemented, then compilation should fail, but if A does not have the methods, then the compilation should succeed.
Update:
The solution from Check if a class has a pointer data member works only for:
struct D {
int* p; // the pointer must be named 'p'
};
This issue is a different case. Can we reopen, please?
As of C++17, this is simply not possible.
You can test for a number of traits, but there is no trait that checks data members the way you need. std::is_pod<> (which will be deprecated in C++20) is the best you can get.
You can try magic_get.
It's a C++14 TMP solution for POD refrection, and it surely has its limitations (e.g. doesn't properly support references, bit fields, etc.), but you can enumerate PODs just the way you want (statically or dynamically), and then apply the std::is_pointer trait to check each field.
Example:
#include <iostream>
#include "boost/pfr/precise.hpp"
struct C {
int a, b;
char* cs;
};
int main() {
C var;
boost::pfr::for_each_field(var, [](const auto& field, std::size_t idx) {
std::cout << "field " << idx << ": is_pointer=" << std::is_pointer<std::remove_reference_t<decltype(field)>>() << '\n';
});
}
Output:
field 0: is_pointer=0
field 1: is_pointer=0
field 2: is_pointer=1
This is an XY problem.
The user that created the class needs to provide the serialize / deserialize functions according to the interface you provide. He knows if the class should be serializabile and if so how it should be done. It's the user responsibility, not yours. All you can do on your part is to provide an easy to use interface and tools to make it easier for the user to make a class serializabile.
Checking if a type has a pointer data member doesn't solve any issue and doesn't help in the slightest. You don't know if the pointer points to something the object owns exclusively, to something shared across objects, or points to a resource not-owned.
I would like to know if my following use of reinterpret_cast is undefined behaviour.
Given a template aggregate such as ...
template<typename T>
struct Container
{
Container(T* p) : ptr(p) { }
...
T* ptr;
};
... and a type hierarchy like ...
struct A { };
struct B : A { };
Is the following cast safe, given that B is a dynamic type of A ...
Container<B>* b = new Container<B>( new B() );
Container<A>* a = reinterpret_cast<Container<A>*>(b);
... in so far as that I can now safely use a->ptr and its (possibly virtual) members?
The code where I use this compiles and executes fine (Clang, OS X) but I'm concerned that I've placed a ticking bomb. I guess every instance of Container<T> shares the same layout and size so it shouldn't be a problem, right?
Looking at what cppreference.com says about reinterpret_cast, there seems to be a statement for legal use that covers what I'm trying to do ...
Type aliasing
When a pointer or reference to object of type T1 is reinterpret_cast (or C-style cast) to a pointer or reference to object of a different type T2, the cast always succeeds, but the resulting pointer or reference may only be accessed if both T1 and T2 are standard-layout types and one of the following is true:
...
T2 is an aggregate type or a union type which holds one of the aforementioned types as an element or non-static member (including, recursively, elements of subaggregates and non-static data members of the contained unions): this makes it safe to cast from the first member of a struct and from an element of a union to the struct/union that contains it.
I appreciate that it looks like I'm going the wrong way about this. That's not what I'm concerned about. I'd just like to know if what I'm doing is safe / legal or not. Thanks in advance for any help.
there seems to be a statement for legal use that covers what I'm trying to do ...
That's not what that exception says or means. That exception says that given
struct S { int i; } s;
you can use *reinterpret_cast<int *>(&s) to access s.i.
There is no similar exception for what you're trying to do. What you're trying to do is simply not valid in C++. Even the below is invalid:
struct S { int i; };
struct T { int i; };
int f(S s) { return ((T &) s).i; }
and compilers optimise based on the assumption that you don't write code like that.
For an actual example that fails at run-time with a current compiler:
#include <cstdlib>
struct S { int i; };
struct T { int i; };
void f(S *s, T *t) { int i = s->i; t->i++; if (s->i == i) std::abort(); }
Here, GCC optimises away the check s->i == i (GCC 4.9.2, with -O2 in the command-line options), and unconditionally calls std::abort(), because the compiler knows that s and t cannot possibly point to the same region of memory. Even though you might try to call it as
int main() { S s = { 0 }; f(&s, reinterpret_cast<T *>(&s)); }
Whether or not the type aliasing is legal according to the standard, you may have other issues.
I guess every instance of Container<T> shares the same layout and
size so it shouldn't be a problem, right?
Actually, not every instance of Container<T> shares the same layout! As explained in this question, template members are only created if they are used, so your Container<A> and Container<B> might have different memory layouts if different members are used for each type.
After seeing this question a few minutes ago, I wondered why the language designers allow it as it allows indirect modification of private data. As an example
class TestClass {
private:
int cc;
public:
TestClass(int i) : cc(i) {};
};
TestClass cc(5);
int* pp = (int*)&cc;
*pp = 70; // private member has been modified
I tested the above code and indeed the private data has been modified. Is there any explanation of why this is allowed to happen or this just an oversight in the language? It seems to directly undermine the use of private data members.
Because, as Bjarne puts it, C++ is designed to protect against Murphy, not Machiavelli.
In other words, it's supposed to protect you from accidents -- but if you go to any work at all to subvert it (such as using a cast) it's not even going to attempt to stop you.
When I think of it, I have a somewhat different analogy in mind: it's like the lock on a bathroom door. It gives you a warning that you probably don't want to walk in there right now, but it's trivial to unlock the door from the outside if you decide to.
Edit: as to the question #Xeo discusses, about why the standard says "have the same access control" instead of "have all public access control", the answer is long and a little tortuous.
Let's step back to the beginning and consider a struct like:
struct X {
int a;
int b;
};
C always had a few rules for a struct like this. One is that in an instance of the struct, the address of the struct itself has to equal the address of a, so you can cast a pointer to the struct to a pointer to int, and access a with well defined results. Another is that the members have to be arranged in the same order in memory as they are defined in the struct (though the compiler is free to insert padding between them).
For C++, there was an intent to maintain that, especially for existing C structs. At the same time, there was an apparent intent that if the compiler wanted to enforce private (and protected) at run-time, it should be easy to do that (reasonably efficiently).
Therefore, given something like:
struct Y {
int a;
int b;
private:
int c;
int d;
public:
int e;
// code to use `c` and `d` goes here.
};
The compiler should be required to maintain the same rules as C with respect to Y.a and Y.b. At the same time, if it's going to enforce access at run time, it may want to move all the public variables together in memory, so the layout would be more like:
struct Z {
int a;
int b;
int e;
private:
int c;
int d;
// code to use `c` and `d` goes here.
};
Then, when it's enforcing things at run-time, it can basically do something like if (offset > 3 * sizeof(int)) access_violation();
To my knowledge nobody's ever done this, and I'm not sure the rest of the standard really allows it, but there does seem to have been at least the half-formed germ of an idea along that line.
To enforce both of those, the C++98 said Y::a and Y::b had to be in that order in memory, and Y::a had to be at the beginning of the struct (i.e., C-like rules). But, because of the intervening access specifiers, Y::c and Y::e no longer had to be in order relative to each other. In other words, all the consecutive variables defined without an access specifier between them were grouped together, the compiler was free to rearrange those groups (but still had to keep the first one at the beginning).
That was fine until some jerk (i.e., me) pointed out that the way the rules were written had another little problem. If I wrote code like:
struct A {
int a;
public:
int b;
public:
int c;
public:
int d;
};
...you ended up with a little bit of self contradition. On one hand, this was still officially a POD struct, so the C-like rules were supposed to apply -- but since you had (admittedly meaningless) access specifiers between the members, it also gave the compiler permission to rearrange the members, thus breaking the C-like rules they intended.
To cure that, they re-worded the standard a little so it would talk about the members all having the same access, rather than about whether or not there was an access specifier between them. Yes, they could have just decreed that the rules would only apply to public members, but it would appear that nobody saw anything to be gained from that. Given that this was modifying an existing standard with lots of code that had been in use for quite a while, the opted for the smallest change they could make that would still cure the problem.
Because of backwards-compatability with C, where you can do the same thing.
For all people wondering, here's why this is not UB and is actually allowed by the standard:
First, TestClass is a standard-layout class (§9 [class] p7):
A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference, // OK: non-static data member is of type 'int'
has no virtual functions (10.3) and no virtual base classes (10.1), // OK
has the same access control (Clause 11) for all non-static data members, // OK, all non-static data members (1) are 'private'
has no non-standard-layout base classes, // OK, no base classes
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and // OK, no base classes again
has no base classes of the same type as the first non-static data member. // OK, no base classes again
And with that, you can are allowed to reinterpret_cast the class to the type of its first member (§9.2 [class.mem] p20):
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
In your case, the C-style (int*) cast resolves to a reinterpret_cast (§5.4 [expr.cast] p4).
A good reason is to allow compatibility with C but extra access safety on the C++ layer.
Consider:
struct S {
#ifdef __cplusplus
private:
#endif // __cplusplus
int i, j;
#ifdef __cplusplus
public:
int get_i() const { return i; }
int get_j() const { return j; }
#endif // __cplusplus
};
By requiring that the C-visible S and the C++-visible S be layout-compatible, S can be used across the language boundary with the C++ side having greater access safety. The reinterpret_cast access safety subversion is an unfortunate but necessary corollary.
As an aside, the restriction on having all members with the same access control is because the implementation is permitted to rearrange members relative to members with different access control. Presumably some implementations put members with the same access control together, for the sake of tidiness; it could also be used to reduce padding, although I don't know of any compiler that does that.
The whole purpose of reinterpret_cast (and a C style cast is even more powerful than a reinterpret_cast) is to provide an escape path around safety measures.
The compiler would have given you an error if you had tried int *pp = &cc.cc, the compiler would have told you that you cannot access a private member.
In your code you are reinterpreting the address of cc as a pointer to an int. You wrote it the C style way, the C++ style way would have been int* pp = reinterpret_cast<int*>(&cc);. The reinterpret_cast always is a warning that you are doing a cast between two pointers that are not related. In such a case you must make sure that you are doing right. You must know the underlying memory (layout). The compiler does not prevent you from doing so, because this if often needed.
When doing the cast you throw away all knowledge about the class. From now on the compiler only sees an int pointer. Of course you can access the memory the pointer points to. In your case, on your platform the compiler happened to put cc in the first n bytes of a TestClass object, so a TestClass pointer also points to the cc member.
This is because you are manipulating the memory where your class is located in memory. In your case it just happen to store the private member at this memory location so you change it. It is not a very good idea to do because you do now know how the object will be stored in memory.
Instead of having to remember to initialize a simple 'C' structure, I might derive from it and zero it in the constructor like this:
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
memset(this, 0, sizeof(MY_STRUCT));
}
};
This trick is often used to initialize Win32 structures and can sometimes set the ubiquitous cbSize member.
Now, as long as there isn't a virtual function table for the memset call to destroy, is this a safe practice?
You can simply value-initialize the base, and all its members will be zero'ed out. This is guaranteed
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct():MY_STRUCT() { }
};
For this to work, there should be no user declared constructor in the base class, like in your example.
No nasty memset for that. It's not guaranteed that memset works in your code, even though it should work in practice.
PREAMBLE:
While my answer is still Ok, I find litb's answer quite superior to mine because:
It teaches me a trick that I did not know (litb's answers usually have this effect, but this is the first time I write it down)
It answers exactly the question (that is, initializing the original struct's part to zero)
So please, consider litb's answer before mine. In fact, I suggest the question's author to consider litb's answer as the right one.
Original answer
Putting a true object (i.e. std::string) etc. inside will break, because the true object will be initialized before the memset, and then, overwritten by zeroes.
Using the initialization list doesn't work for g++ (I'm surprised...). Initialize it instead in the CMyStruct constructor body. It will be C++ friendly:
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct() { n1 = 0 ; n2 = 0 ; }
};
P.S.: I assumed you did have no control over MY_STRUCT, of course. With control, you would have added the constructor directly inside MY_STRUCT and forgotten about inheritance. Note that you can add non-virtual methods to a C-like struct, and still have it behave as a struct.
EDIT: Added missing parenthesis, after Lou Franco's comment. Thanks!
EDIT 2 : I tried the code on g++, and for some reason, using the initialization list does not work. I corrected the code using the body constructor. The solution is still valid, though.
Please reevaluate my post, as the original code was changed (see changelog for more info).
EDIT 3 : After reading Rob's comment, I guess he has a point worthy of discussion: "Agreed, but this could be an enormous Win32 structure which may change with a new SDK, so a memset is future proof."
I disagree: Knowing Microsoft, it won't change because of their need for perfect backward compatibility. They will create instead an extended MY_STRUCTEx struct with the same initial layout as MY_STRUCT, with additionnal members at the end, and recognizable through a "size" member variable like the struct used for a RegisterWindow, IIRC.
So the only valid point remaining from Rob's comment is the "enormous" struct. In this case, perhaps a memset is more convenient, but you will have to make MY_STRUCT a variable member of CMyStruct instead of inheriting from it.
I see another hack, but I guess this would break because of possible struct alignment problem.
EDIT 4: Please take a look at Frank Krueger's solution. I can't promise it's portable (I guess it is), but it is still interesting from a technical viewpoint because it shows one case where, in C++, the "this" pointer "address" moves from its base class to its inherited class.
Much better than a memset, you can use this little trick instead:
MY_STRUCT foo = { 0 };
This will initialize all members to 0 (or their default value iirc), no need to specifiy a value for each.
This would make me feel much safer as it should work even if there is a vtable (or the compiler will scream).
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I'm sure your solution will work, but I doubt there are any guarantees to be made when mixing memset and classes.
This is a perfect example of porting a C idiom to C++ (and why it might not always work...)
The problem you will have with using memset is that in C++, a struct and a class are exactly the same thing except that by default, a struct has public visibility and a class has private visibility.
Thus, what if later on, some well meaning programmer changes MY_STRUCT like so:
struct MY_STRUCT
{
int n1;
int n2;
// Provide a default implementation...
virtual int add() {return n1 + n2;}
};
By adding that single function, your memset might now cause havoc.
There is a detailed discussion in comp.lang.c+
The examples have "unspecified behaviour".
For a non-POD, the order by which the compiler lays out an object (all bases classes and members) is unspecified (ISO C++ 10/3). Consider the following:
struct A {
int i;
};
class B : public A { // 'B' is not a POD
public:
B ();
private:
int j;
};
This can be laid out as:
[ int i ][ int j ]
Or as:
[ int j ][ int i ]
Therefore, using memset directly on the address of 'this' is very much unspecified behaviour. One of the answers above, at first glance looks to be safer:
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I believe, however, that strictly speaking this too results in unspecified behaviour. I cannot find the normative text, however the note in 10/5 says: "A base class subobject may have a layout (3.7) different from the layout of a most derived object of the same type".
As a result, I compiler could perform space optimizations with the different members:
struct A {
char c1;
};
struct B {
char c2;
char c3;
char c4;
int i;
};
class C : public A, public B
{
public:
C ()
: c1 (10);
{
memset(static_cast<B*>(this), 0, sizeof(B));
}
};
Can be laid out as:
[ char c1 ] [ char c2, char c3, char c4, int i ]
On a 32 bit system, due to alighments etc. for 'B', sizeof(B) will most likely be 8 bytes. However, sizeof(C) can also be '8' bytes if the compiler packs the data members. Therefore the call to memset might overwrite the value given to 'c1'.
Precise layout of a class or structure is not guaranteed in C++, which is why you should not make assumptions about the size of it from the outside (that means if you're not a compiler).
Probably it works, until you find a compiler on which it doesn't, or you throw some vtable into the mix.
If you already have a constructor, why not just initialize it there with n1=0; n2=0; -- that's certainly the more normal way.
Edit: Actually, as paercebal has shown, ctor initialization is even better.
My opinion is no. I'm not sure what it gains either.
As your definition of CMyStruct changes and you add/delete members, this can lead to bugs. Easily.
Create a constructor for CMyStruct that takes a MyStruct has a parameter.
CMyStruct::CMyStruct(MyStruct &)
Or something of that sought. You can then initialize a public or private 'MyStruct' member.
From an ISO C++ viewpoint, there are two issues:
(1) Is the object a POD? The acronym stands for Plain Old Data, and the standard enumerates what you can't have in a POD (Wikipedia has a good summary). If it's not a POD, you can't memset it.
(2) Are there members for which all-bits-zero is invalid ? On Windows and Unix, the NULL pointer is all bits zero; it need not be. Floating point 0 has all bits zero in IEEE754, which is quite common, and on x86.
Frank Kruegers tip addresses your concerns by restricting the memset to the POD base of the non-POD class.
Try this - overload new.
EDIT: I should add - This is safe because the memory is zeroed before any constructors are called. Big flaw - only works if object is dynamically allocated.
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
// whatever
}
void* new(size_t size)
{
// dangerous
return memset(malloc(size),0,size);
// better
if (void *p = malloc(size))
{
return (memset(p, 0, size));
}
else
{
throw bad_alloc();
}
}
void delete(void *p, size_t size)
{
free(p);
}
};
If MY_STRUCT is your code, and you are happy using a C++ compiler, you can put the constructor there without wrapping in a class:
struct MY_STRUCT
{
int n1;
int n2;
MY_STRUCT(): n1(0), n2(0) {}
};
I'm not sure about efficiency, but I hate doing tricks when you haven't proved efficiency is needed.
Comment on litb's answer (seems I'm not yet allowed to comment directly):
Even with this nice C++-style solution you have to be very careful that you don't apply this naively to a struct containing a non-POD member.
Some compilers then don't initialize correctly anymore.
See this answer to a similar question.
I personally had the bad experience on VC2008 with an additional std::string.
What I do is use aggregate initialization, but only specifying initializers for members I care about, e.g:
STARTUPINFO si = {
sizeof si, /*cb*/
0, /*lpReserved*/
0, /*lpDesktop*/
"my window" /*lpTitle*/
};
The remaining members will be initialized to zeros of the appropriate type (as in Drealmer's post). Here, you are trusting Microsoft not to gratuitously break compatibility by adding new structure members in the middle (a reasonable assumption). This solution strikes me as optimal - one statement, no classes, no memset, no assumptions about the internal representation of floating point zero or null pointers.
I think the hacks involving inheritance are horrible style. Public inheritance means IS-A to most readers. Note also that you're inheriting from a class which isn't designed to be a base. As there's no virtual destructor, clients who delete a derived class instance through a pointer to base will invoke undefined behaviour.
I assume the structure is provided to you and cannot be modified. If you can change the structure, then the obvious solution is adding a constructor.
Don't over engineer your code with C++ wrappers when all you want is a simple macro to initialise your structure.
#include <stdio.h>
#define MY_STRUCT(x) MY_STRUCT x = {0}
struct MY_STRUCT
{
int n1;
int n2;
};
int main(int argc, char *argv[])
{
MY_STRUCT(s);
printf("n1(%d),n2(%d)\n", s.n1, s.n2);
return 0;
}
It's a bit of code, but it's reusable; include it once and it should work for any POD. You can pass an instance of this class to any function expecting a MY_STRUCT, or use the GetPointer function to pass it into a function that will modify the structure.
template <typename STR>
class CStructWrapper
{
private:
STR MyStruct;
public:
CStructWrapper() { STR temp = {}; MyStruct = temp;}
CStructWrapper(const STR &myStruct) : MyStruct(myStruct) {}
operator STR &() { return MyStruct; }
operator const STR &() const { return MyStruct; }
STR *GetPointer() { return &MyStruct; }
};
CStructWrapper<MY_STRUCT> myStruct;
CStructWrapper<ANOTHER_STRUCT> anotherStruct;
This way, you don't have to worry about whether NULLs are all 0, or floating point representations. As long as STR is a simple aggregate type, things will work. When STR is not a simple aggregate type, you'll get a compile-time error, so you won't have to worry about accidentally misusing this. Also, if the type contains something more complex, as long as it has a default constructor, you're ok:
struct MY_STRUCT2
{
int n1;
std::string s1;
};
CStructWrapper<MY_STRUCT2> myStruct2; // n1 is set to 0, s1 is set to "";
On the downside, it's slower since you're making an extra temporary copy, and the compiler will assign each member to 0 individually, instead of one memset.