I would like to know if my following use of reinterpret_cast is undefined behaviour.
Given a template aggregate such as ...
template<typename T>
struct Container
{
Container(T* p) : ptr(p) { }
...
T* ptr;
};
... and a type hierarchy like ...
struct A { };
struct B : A { };
Is the following cast safe, given that B is a dynamic type of A ...
Container<B>* b = new Container<B>( new B() );
Container<A>* a = reinterpret_cast<Container<A>*>(b);
... in so far as that I can now safely use a->ptr and its (possibly virtual) members?
The code where I use this compiles and executes fine (Clang, OS X) but I'm concerned that I've placed a ticking bomb. I guess every instance of Container<T> shares the same layout and size so it shouldn't be a problem, right?
Looking at what cppreference.com says about reinterpret_cast, there seems to be a statement for legal use that covers what I'm trying to do ...
Type aliasing
When a pointer or reference to object of type T1 is reinterpret_cast (or C-style cast) to a pointer or reference to object of a different type T2, the cast always succeeds, but the resulting pointer or reference may only be accessed if both T1 and T2 are standard-layout types and one of the following is true:
...
T2 is an aggregate type or a union type which holds one of the aforementioned types as an element or non-static member (including, recursively, elements of subaggregates and non-static data members of the contained unions): this makes it safe to cast from the first member of a struct and from an element of a union to the struct/union that contains it.
I appreciate that it looks like I'm going the wrong way about this. That's not what I'm concerned about. I'd just like to know if what I'm doing is safe / legal or not. Thanks in advance for any help.
there seems to be a statement for legal use that covers what I'm trying to do ...
That's not what that exception says or means. That exception says that given
struct S { int i; } s;
you can use *reinterpret_cast<int *>(&s) to access s.i.
There is no similar exception for what you're trying to do. What you're trying to do is simply not valid in C++. Even the below is invalid:
struct S { int i; };
struct T { int i; };
int f(S s) { return ((T &) s).i; }
and compilers optimise based on the assumption that you don't write code like that.
For an actual example that fails at run-time with a current compiler:
#include <cstdlib>
struct S { int i; };
struct T { int i; };
void f(S *s, T *t) { int i = s->i; t->i++; if (s->i == i) std::abort(); }
Here, GCC optimises away the check s->i == i (GCC 4.9.2, with -O2 in the command-line options) and unconditionally calls std::abort(), because the compiler is allowed to assume that s and t cannot point to the same region of memory. That happens even though you might try to call it as
int main() { S s = { 0 }; f(&s, reinterpret_cast<T *>(&s)); }
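If the goal is only to read the bytes of an S as a T, a commonly used alternative is to copy the object representation instead of aliasing it. The following is a minimal sketch under the assumption that both types are trivially copyable and have the same size; the helper name to_T is made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct S { int i; };
struct T { int i; };

// Both checks are assumptions this sketch relies on.
static_assert(sizeof(S) == sizeof(T), "S and T must have the same size");
static_assert(std::is_trivially_copyable<S>::value &&
              std::is_trivially_copyable<T>::value,
              "byte-wise copy needs trivially copyable types");

// Hypothetical helper: builds a separate T from the bytes of an S.
// No T* ever points at an S object, so the aliasing rule is not involved.
T to_T(const S& s)
{
    T t;
    std::memcpy(&t, &s, sizeof t);
    return t;
}
```

The result is a distinct object, so modifying it does not change the original S.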
Whether or not the type aliasing is legal according to the standard, you may have other issues.
I guess every instance of Container<T> shares the same layout and
size so it shouldn't be a problem, right?
Actually, not every instance of Container<T> shares the same layout! As explained in this question, template members are only created if they are used, so your Container<A> and Container<B> might have different memory layouts if different members are used for each type.
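One way to get a Container<A> from a Container<B> without any reinterpret_cast is a converting constructor. This is only a sketch: the converting constructor is an addition that is not in the question's Container, and the virtual id() function is invented here to show that virtual dispatch still works through the converted pointer.

```cpp
#include <cassert>
#include <type_traits>

// id() is a made-up virtual function used only for this illustration.
struct A { virtual ~A() {} virtual int id() const { return 1; } };
struct B : A { int id() const override { return 2; } };

template <typename T>
struct Container
{
    Container(T* p) : ptr(p) { }

    // Hypothetical addition: accepts Container<U> only when U* converts to T*.
    template <typename U,
              typename = typename std::enable_if<
                  std::is_convertible<U*, T*>::value>::type>
    Container(const Container<U>& other) : ptr(other.ptr) { }

    T* ptr;
};
```

The conversion produces a new Container<A> object that holds the same pointer; it does not reinterpret the existing Container<B>.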
Simple question: How do I get this to work?
struct A {
double whatever;
std::unordered_map<std::string, A> mapToMoreA;
};
g++ error: std::pair<_T1, _T2>::second has incomplete type
As far as I understand, when instantiating the map, the compiler needs to know the size of A, but it doesn't know this because the map is declared in A's declaration, so is the only way to get around this to use pointers to A (don't feel like doing that)?
Most of the time it will depend on the container implementation details (more precisely, on what gets instantiated at the point of container declaration and what doesn't). Apparently, std::unordered_map implementation requires the types to be complete. At the same time GCC's implementation of std::map compiles perfectly fine with incomplete type.
To illustrate the source of such difference, consider the following example. Let's say we decided to make our own naive implementation of std::vector-like functionality and declared our vector class as follows
template <typename T> class my_vector {
T *begin;
T *end;
...
};
As long as our class definition contains only pointers to T, the type T is not required to be complete for the class definition itself. We can instantiate my_vector itself for an incomplete T without any problems
class X;
my_vector<X> v; // OK
The "completeness" of the type would be required later, when we begin to use (and therefore instantiate) the individual methods of my_vector.
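A small sketch of that point, with an invented size() member and an invented element_count() function: the class itself is instantiated while X is still incomplete, and the member that needs sizeof(X) is only instantiated after X has been defined.

```cpp
#include <cassert>
#include <cstddef>

template <typename T>
class my_vector {
    T* begin_ = nullptr;
    T* end_ = nullptr;
public:
    // Pointer subtraction needs sizeof(T), so T must be complete
    // at the point where size() is actually instantiated.
    std::size_t size() const
    {
        return static_cast<std::size_t>(end_ - begin_);
    }
};

class X;                 // incomplete here
my_vector<X> v;          // OK: the class only stores pointers to X

class X { public: int payload; };  // X is complete from here on

// size() is instantiated by this use, after X has been completed.
std::size_t element_count() { return v.size(); }
```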
However, if for some reason we decide to include a direct instance of T in our vector class, things will change
template <typename T>
class my_vector {
T *begin;
T *end;
T dummy_element;
...
};
Now the completeness of T will be required very early, at the point of instantiation of my_vector itself
class X;
my_vector<X> v; // ERROR, incomplete type
Something like that must be happening in your case. The definition of unordered_map you are dealing with somehow contains a direct instance of A. Which is the reason why it is impossible to instantiate (obviously, you would end up with infinitely recursive type in that case).
A better thought through implementation of unordered_map would make sure not to include A into itself as a direct member. Such implementation would not require A to be complete. As you noted yourself, Boost's implementation of unordered_map is designed better in this regard.
I don't know of any STL containers other than smart pointers that work with incomplete types. You can use a wrapper struct however if you don't want to use pointers:
struct A {
struct B { double whatever; };
std::unordered_map<std::string, B> mapToB;
};
Edit: Here is a pointer alternative if the above doesn't meet your use case.
struct A {
double whatever;
std::unordered_map<std::string, std::unique_ptr<A>> mapToMoreA;
};
You can also just use boost::unordered_map which not only supports incomplete types but also has far greater debug performance in Visual Studio as Microsoft's implementation of std::unordered_map is incredibly inefficient due to excessive iterator debugging checks. I am unaware of any performance concerns on gcc for either container.
Boost.Variant has a handy utility explicitly for this purpose – boost::recursive_wrapper<>. The following should work:
struct A {
double whatever;
std::unordered_map<std::string, boost::recursive_wrapper<A>> mapToMoreA;
};
The only notable drawback is that Boost.Variant has not yet been updated to support C++11 move semantics. Update: added in Boost 1.56.
If having the map hold pointers isn't acceptable, perhaps this will work for you:
struct A {
struct hidden;
std::unique_ptr<hidden> pimpl;
};
struct A::hidden {
double whatever;
std::unordered_map<std::string, A> mapToMoreA;
};
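A slightly fuller sketch of the same pimpl idea, in case it helps. The accessor names whatever() and child() are invented for illustration. The constructor, destructor and move operations are declared in A and defined only after A::hidden is complete, because std::unique_ptr needs the complete type wherever it may delete the object.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

struct A {
    A();
    ~A();
    A(A&&) noexcept;
    A& operator=(A&&) noexcept;

    double& whatever();                 // hypothetical accessor
    A& child(const std::string& key);   // hypothetical accessor

private:
    struct hidden;
    std::unique_ptr<hidden> pimpl;
};

// A is complete at this point, so the map can hold A by value.
struct A::hidden {
    double whatever = 0.0;
    std::unordered_map<std::string, A> mapToMoreA;
};

A::A() : pimpl(new hidden) { }
A::~A() = default;
A::A(A&&) noexcept = default;
A& A::operator=(A&&) noexcept = default;

double& A::whatever() { return pimpl->whatever; }
A& A::child(const std::string& key) { return pimpl->mapToMoreA[key]; }
```

In this sketch A is movable but not copyable; a copy constructor would have to clone *pimpl explicitly.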
In C++ you usually use pointers, which have predefined constant size, for incomplete types:
This of course changes how you use the map: you'll have to dereference with the * or -> operators to access members and have to delete the pointers at some point.
struct A
{
double bla;
std::map<std::string, A*> mapToMoreA;
};
Member functions of A should be split into a prototype inside the struct block and implemented later, otherwise A and its members are not yet completely defined:
struct A
{
double bla;
std::map<std::string, A*> mapToMoreA;
void doStuff(const std::string& str);
};
void A::doStuff(const std::string& str)
{
mapToMoreA[str] = new A();
}
Or use a pointer to the map. The pointer must be of type void* in this case (can be hidden behind a set of functions). Maybe there are alternatives to std::unordered_map that can cope with incomplete value types.
I think you can just forward declare struct A; prior to its definition and the compiler should be happy.
EDIT: So after being downvoted several times, I wrote the following to see what I was missing:
#include <boost/unordered_map.hpp>
#include <string>
#include <iostream>
struct A;
struct A {
double whatever;
boost::unordered_map<std::string, A> mapToMoreA;
};
int main(void)
{
A b;
b.whatever = 2.5;
b.mapToMoreA["abc"] = b;
std::cerr << b.mapToMoreA["abc"].whatever << std::endl;
return 0;
}
This compiles fine using g++ 4.2.1 on my mac, and prints out "2.5" when it's run (as expected).
Sorry that I don't have unordered_map without boost. Is that the issue? (i.e., does std::unordered_map somehow place more constraints on the compiler than boost does?) Otherwise, I'm not sure what I'm missing here about the question. Those downvoting this, please enlighten me with comments. Thanks!
I've found a strange-looking piece of code in a project I have to maintain. There's an empty array member of a class which doesn't lead to a compiler error. I've tested some variations of such code with MSVC 10.0:
template<class T> struct A {
int i[];
}; // warning C4200: nonstandard extension used : zero-sized array in struct/union
template<class T> struct B { static int i[]; };
template<class T> int B<T>::i[];
struct C {
int i[];
}; //warning C4200: nonstandard extension used : zero-sized array in struct/union
template<class T> struct D { static int i[]; };
template<class T> int D<T>::i[4];
template<> int D<int>::i[] = { 1 };
int main()
{
A<void> a;
B<void> b;
C c;
D<void> d0;
D<int> d1;
a.i[0] = 0; // warning C4739: reference to variable 'a' exceeds its storage space
b.i[0] = 0; // warning C4789: destination of memory copy is too small
c.i[0] = 0; // warning C4739: reference to variable 'c' exceeds its storage space
int i[]; // error C2133: 'i' : unknown size
d0.i[0] = 0; // ok
d0.i[1] = 0; // ok
return 0;
}
The error message at int i[] is absolutely sensible to me. The code shown with class D is well-formed standard C++. But what about classes A, B and C? What kind of types are the member variables int i[] in these classes?
EDIT:
Your doubt is explained by the definition of the language extension, which allows zero-sized arrays at the end of structs/unions. I have not tried it, but if you declare another member after the zero-sized array, it should fail.
So, if you allocate a variable on the stack, you have to know its size; the exception to the rule is an array at the end of a struct/union, where some C-typical trickery is possible.
In C++ this raises a warning because the default copy constructor and assignment operator will probably not work.
PREVIOUS ANSWER:
The compiler warns you about the fact that you are trying to define an array with zero size. This is not allowed in standard C/C++.
Let's see the differences class by class.
In class D:
template<class T> struct D { static int i[]; };
it works because you are just declaring the type of a static member variable. For this to link, you also need to define the actual array, in a definition statement like the one you wrote:
template<> int D<int>::i[] = { 1 };
here you also specify the size of the array through the initializer.
With class B, you are doing something similar, but the definition is:
template<class T> int B<T>::i[];
i.e., you don't specify the size and get the warning.
With class A, more of the same, you are defining a member variable of type array without the size.
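If the zero-sized member in A or C is meant to hold a variable amount of trailing data, a standard-C++ way to express that is a std::vector member. This is a sketch only; the struct name Packet and its members are invented for illustration and do not come from the question's project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: a header followed by a variable number of ints.
struct Packet {
    int header;
    std::vector<int> payload;   // takes the place of a trailing `int i[];`

    Packet(int h, std::size_t n) : header(h), payload(n, 0) { }
};
```

Unlike the zero-sized array extension, the compiler-generated copy constructor and assignment operator copy the payload correctly.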
Good one. Just to be certain, you are wondering why the compiler isn't flagging it as an error right? In that case, I think this problem is unpredictable across compilers but I'm aware of this happening on MSVC all the time.
http://support.microsoft.com/kb/98409
Let me see if I can explain it like they did. If I were to declare a struct with an empty array like this,
struct a
{
int x;
char empty[];
};
the compiler might allocate 4 bytes for x and probably another 4 bytes for the char pointer. empty will contain the address 4 bytes past the start of struct a.
Since it is a character array of no length, trying to access it would be an error since there is no trailing 0 to signify the end of the string.
I could choose to initialize the struct later to point to the start of an actual string to overcome this error.
struct a myStruct = { 1, "hello world"}; // empty now points to the start of "hello world"
Since a struct is basically a class, it turns out you can do the same thing with a class if you make sure it's an aggregate and not a full class.
So there you go. MSVC compilers treat arrays with no fixed size as a pointer when declared within a struct/class. Remember that class definitions are merely declarations. The compiler doesn't allocate space for them until you create an instance. When you start to think about it, it sort of makes sense: how would the compiler know whether you plan to allocate storage for it later? It becomes a run-time artifact, but the compiler was still smart enough to warn you about the problem.
I want to initialize constant in child-class, instead of base class. And use it to get rid of dynamic memory allocation (I know array sizes already, and there will be a few child-classes with different constants).
So I try:
class A {
public:
const int x;
A() : x(0) {}
A(int x) : x(x) {}
void f() {
double y[this->x];
}
};
class B : A {
B() : A(2) {}
};
Pretty simple, but compiler says:
error C2057: expected constant expression
How can I say to compiler, that it is really a constant?
It isn't a constant though. It can still be set to any value by the constructor. Only a compile-time constant is allowed for the size of an array. When the compiler says "constant expression", it does not mean an expression which returns a constant value, but a value known at compile time, such as "52" or "45" or something along those lines.
Use std::vector instead.
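A minimal sketch of what that could look like for the classes in the question. The return value of f() was added only so that the effect can be observed; the vector is sized at run time from the const member.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class A {
public:
    const int x;
    A() : x(0) {}
    explicit A(int x) : x(x) {}

    // Returns the number of elements, purely for illustration.
    std::size_t f() const
    {
        std::vector<double> y(static_cast<std::size_t>(x));  // run-time size
        return y.size();
    }
};

class B : public A {
public:
    B() : A(2) {}
};
```

This still allocates dynamically inside f(), which the question wanted to avoid.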
EDIT: In response to "I know array sizes already, and there will be a few child-classes with different constants"
The only way to do that is to use a template.
template<size_t x>
class A {
public:
void f() {
double y[x];
}
};
typedef A<2> B;
The behaviour you expect could be achieved using the following template.
Note that this is actually unreliable, disgusting and could be used only as "a sample". Use std::vector instead.
template <size_t a = 0>
class A {
public:
A() { }
void f() {
int y[a];
y[0] = 5;
}
};
class B : A<2> {
B() { }
};
int main() {
A<1> a;
a.f();
// Undefined behaviour - creating an array of size 0
// At least, MSVS2008 treats it as an error :)
// A<0> a_;
}
There's "constant", and then there's "constant". If you want to allocate an array on the stack like that, the compiler needs the length of the array at compile time, and based on what you've given there it can't figure that out. Interestingly, gcc supports an extension (not supported in standard C++) that allows for stack allocation for variable lengths.
I don't know if it will work for your purposes, but one possibility would be to make it a template parameter:
template <int size>
class A {
double y[size];
};
In this case, you'd probably want to create an instance of A in B instead of using inheritance.
The other obvious possibility would be to use a tr1::array object instead. This is also a template, so the idea is pretty much the same, but it's already written, tested and working so you can avoid all that. If your compiler doesn't supply TR1 classes, Boost has a mostly conforming implementation (boost::array).
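With a C++11 compiler the same idea can be written with std::array, which is the standardized form of tr1::array. A rough sketch, where the class name FixedBuffer and the sum() member are invented for illustration, and B holds an instance rather than inheriting:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical class: the size is a template parameter,
// so no dynamic allocation is needed.
template <std::size_t N>
class FixedBuffer {
public:
    std::array<double, N> y{};   // all elements start at 0.0

    double sum() const
    {
        double total = 0.0;
        for (double v : y) total += v;
        return total;
    }
};

// Composition instead of inheritance.
class B {
public:
    FixedBuffer<2> buf;
};
```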
Instead of having to remember to initialize a simple 'C' structure, I might derive from it and zero it in the constructor like this:
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
memset(this, 0, sizeof(MY_STRUCT));
}
};
This trick is often used to initialize Win32 structures and can sometimes set the ubiquitous cbSize member.
Now, as long as there isn't a virtual function table for the memset call to destroy, is this a safe practice?
You can simply value-initialize the base, and all its members will be zero'ed out. This is guaranteed
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct():MY_STRUCT() { }
};
For this to work, there should be no user declared constructor in the base class, like in your example.
No nasty memset for that. It's not guaranteed that memset works in your code, even though it should work in practice.
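The same approach also covers the cbSize case mentioned in the question. A sketch, using a made-up struct SIZED_STRUCT as a stand-in for a Win32-style structure:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a Win32-style struct with a size field.
struct SIZED_STRUCT
{
    std::size_t cbSize;
    int n1;
    int n2;
};

class CSizedStruct : public SIZED_STRUCT
{
public:
    // Value-initialize the base (all members become zero),
    // then fill in the size field.
    CSizedStruct() : SIZED_STRUCT()
    {
        cbSize = sizeof(SIZED_STRUCT);
    }
};
```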
PREAMBLE:
While my answer is still Ok, I find litb's answer quite superior to mine because:
It teaches me a trick that I did not know (litb's answers usually have this effect, but this is the first time I write it down)
It answers exactly the question (that is, initializing the original struct's part to zero)
So please, consider litb's answer before mine. In fact, I suggest that the question's author consider litb's answer the right one.
Original answer
Putting a true object (i.e. std::string) etc. inside will break, because the true object will be initialized before the memset, and then, overwritten by zeroes.
Using the initialization list doesn't work for g++ (I'm surprised...). Initialize it instead in the CMyStruct constructor body. It will be C++ friendly:
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct() { n1 = 0 ; n2 = 0 ; }
};
P.S.: I assumed you had no control over MY_STRUCT, of course. With control, you would have added the constructor directly inside MY_STRUCT and forgotten about inheritance. Note that you can add non-virtual methods to a C-like struct, and still have it behave as a struct.
EDIT: Added missing parenthesis, after Lou Franco's comment. Thanks!
EDIT 2 : I tried the code on g++, and for some reason, using the initialization list does not work. I corrected the code using the body constructor. The solution is still valid, though.
Please reevaluate my post, as the original code was changed (see changelog for more info).
EDIT 3 : After reading Rob's comment, I guess he has a point worthy of discussion: "Agreed, but this could be an enormous Win32 structure which may change with a new SDK, so a memset is future proof."
I disagree: Knowing Microsoft, it won't change because of their need for perfect backward compatibility. They will create instead an extended MY_STRUCTEx struct with the same initial layout as MY_STRUCT, with additional members at the end, and recognizable through a "size" member variable like the struct used for a RegisterWindow, IIRC.
So the only valid point remaining from Rob's comment is the "enormous" struct. In this case, perhaps a memset is more convenient, but you will have to make MY_STRUCT a variable member of CMyStruct instead of inheriting from it.
I see another hack, but I guess this would break because of possible struct alignment problem.
EDIT 4: Please take a look at Frank Krueger's solution. I can't promise it's portable (I guess it is), but it is still interesting from a technical viewpoint because it shows one case where, in C++, the "this" pointer "address" moves from its base class to its inherited class.
Much better than a memset, you can use this little trick instead:
MY_STRUCT foo = { 0 };
This will initialize all members to 0 (or their default value iirc), no need to specify a value for each.
This would make me feel much safer as it should work even if there is a vtable (or the compiler will scream).
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I'm sure your solution will work, but I doubt there are any guarantees to be made when mixing memset and classes.
This is a perfect example of porting a C idiom to C++ (and why it might not always work...)
The problem you will have with using memset is that in C++, a struct and a class are exactly the same thing except that by default, a struct has public visibility and a class has private visibility.
Thus, what if later on, some well meaning programmer changes MY_STRUCT like so:
struct MY_STRUCT
{
int n1;
int n2;
// Provide a default implementation...
virtual int add() {return n1 + n2;}
};
By adding that single function, your memset might now cause havoc.
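One way to have the compiler catch that kind of change is to guard the memset with a static_assert. This is a sketch; the helper name zero_pod is invented, and it is applied to a complete MY_STRUCT object rather than to a base-class subobject.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct MY_STRUCT
{
    int n1;
    int n2;
};

// Hypothetical helper: refuses to compile for types that are not
// trivial and standard-layout (for example, once a virtual function is added).
template <typename T>
void zero_pod(T& obj)
{
    static_assert(std::is_trivial<T>::value &&
                  std::is_standard_layout<T>::value,
                  "zero_pod requires a POD-like type");
    std::memset(&obj, 0, sizeof(T));
}
```

If someone later adds virtual int add() to MY_STRUCT, the call to zero_pod stops compiling instead of silently overwriting the vtable pointer.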
There is a detailed discussion in comp.lang.c+
The examples have "unspecified behaviour".
For a non-POD, the order in which the compiler lays out an object (all base classes and members) is unspecified (ISO C++ 10/3). Consider the following:
struct A {
int i;
};
class B : public A { // 'B' is not a POD
public:
B ();
private:
int j;
};
This can be laid out as:
[ int i ][ int j ]
Or as:
[ int j ][ int i ]
Therefore, using memset directly on the address of 'this' is very much unspecified behaviour. One of the answers above, at first glance looks to be safer:
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I believe, however, that strictly speaking this too results in unspecified behaviour. I cannot find the normative text, however the note in 10/5 says: "A base class subobject may have a layout (3.7) different from the layout of a most derived object of the same type".
As a result, a compiler could perform space optimizations with the different members:
struct A {
char c1;
};
struct B {
char c2;
char c3;
char c4;
int i;
};
class C : public A, public B
{
public:
C ()
{
c1 = 10; // c1 belongs to base A, so it is assigned here rather than in the initializer list
memset(static_cast<B*>(this), 0, sizeof(B));
}
};
Can be laid out as:
[ char c1 ] [ char c2, char c3, char c4, int i ]
On a 32-bit system, due to alignment etc. for 'B', sizeof(B) will most likely be 8 bytes. However, sizeof(C) can also be 8 bytes if the compiler packs the data members. Therefore the call to memset might overwrite the value given to 'c1'.
Precise layout of a class or structure is not guaranteed in C++, which is why you should not make assumptions about the size of it from the outside (that means if you're not a compiler).
Probably it works, until you find a compiler on which it doesn't, or you throw some vtable into the mix.
If you already have a constructor, why not just initialize it there with n1=0; n2=0; -- that's certainly the more normal way.
Edit: Actually, as paercebal has shown, ctor initialization is even better.
My opinion is no. I'm not sure what it gains either.
As your definition of CMyStruct changes and you add/delete members, this can lead to bugs. Easily.
Create a constructor for CMyStruct that takes a MyStruct as a parameter.
CMyStruct::CMyStruct(MyStruct &)
Or something of that sort. You can then initialize a public or private 'MyStruct' member.
From an ISO C++ viewpoint, there are two issues:
(1) Is the object a POD? The acronym stands for Plain Old Data, and the standard enumerates what you can't have in a POD (Wikipedia has a good summary). If it's not a POD, you can't memset it.
(2) Are there members for which all-bits-zero is invalid ? On Windows and Unix, the NULL pointer is all bits zero; it need not be. Floating point 0 has all bits zero in IEEE754, which is quite common, and on x86.
Frank Krueger's tip addresses your concerns by restricting the memset to the POD base of the non-POD class.
Try this - overload new.
EDIT: I should add - This is safe because the memory is zeroed before any constructors are called. Big flaw - only works if object is dynamically allocated.
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
// whatever
}
void* operator new(size_t size)
{
// dangerous: malloc may return a null pointer
// return memset(malloc(size),0,size);
// better
if (void *p = malloc(size))
{
return (memset(p, 0, size));
}
else
{
throw std::bad_alloc();
}
}
void operator delete(void *p, size_t size)
{
free(p);
}
};
If MY_STRUCT is your code, and you are happy using a C++ compiler, you can put the constructor there without wrapping in a class:
struct MY_STRUCT
{
int n1;
int n2;
MY_STRUCT(): n1(0), n2(0) {}
};
I'm not sure about efficiency, but I hate doing tricks when you haven't proved efficiency is needed.
Comment on litb's answer (seems I'm not yet allowed to comment directly):
Even with this nice C++-style solution you have to be very careful that you don't apply this naively to a struct containing a non-POD member.
Some compilers then don't initialize correctly anymore.
See this answer to a similar question.
I personally had the bad experience on VC2008 with an additional std::string.
What I do is use aggregate initialization, but only specifying initializers for members I care about, e.g:
STARTUPINFO si = {
sizeof si, /*cb*/
0, /*lpReserved*/
0, /*lpDesktop*/
"my window" /*lpTitle*/
};
The remaining members will be initialized to zeros of the appropriate type (as in Drealmer's post). Here, you are trusting Microsoft not to gratuitously break compatibility by adding new structure members in the middle (a reasonable assumption). This solution strikes me as optimal - one statement, no classes, no memset, no assumptions about the internal representation of floating point zero or null pointers.
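A portable sketch of the same technique, using a made-up struct WINDOW_INFO in place of STARTUPINFO so that it compiles without the Windows headers. Only the first three members get explicit initializers; the rest are zero-initialized according to their type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a Win32-style struct.
struct WINDOW_INFO
{
    std::size_t cb;
    int reserved;
    const char* title;
    double scale;     // no initializer given below: becomes 0.0
    void* handle;     // no initializer given below: becomes a null pointer
};

WINDOW_INFO make_info()
{
    WINDOW_INFO wi = {
        sizeof wi,    /*cb*/
        0,            /*reserved*/
        "my window"   /*title*/
    };
    return wi;
}
```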
I think the hacks involving inheritance are horrible style. Public inheritance means IS-A to most readers. Note also that you're inheriting from a class which isn't designed to be a base. As there's no virtual destructor, clients who delete a derived class instance through a pointer to base will invoke undefined behaviour.
I assume the structure is provided to you and cannot be modified. If you can change the structure, then the obvious solution is adding a constructor.
Don't over engineer your code with C++ wrappers when all you want is a simple macro to initialise your structure.
#include <stdio.h>
#define MY_STRUCT(x) MY_STRUCT x = {0}
struct MY_STRUCT
{
int n1;
int n2;
};
int main(int argc, char *argv[])
{
MY_STRUCT(s);
printf("n1(%d),n2(%d)\n", s.n1, s.n2);
return 0;
}
It's a bit of code, but it's reusable; include it once and it should work for any POD. You can pass an instance of this class to any function expecting a MY_STRUCT, or use the GetPointer function to pass it into a function that will modify the structure.
template <typename STR>
class CStructWrapper
{
private:
STR MyStruct;
public:
CStructWrapper() { STR temp = {}; MyStruct = temp;}
CStructWrapper(const STR &myStruct) : MyStruct(myStruct) {}
operator STR &() { return MyStruct; }
operator const STR &() const { return MyStruct; }
STR *GetPointer() { return &MyStruct; }
};
CStructWrapper<MY_STRUCT> myStruct;
CStructWrapper<ANOTHER_STRUCT> anotherStruct;
This way, you don't have to worry about whether NULLs are all 0, or floating point representations. As long as STR is a simple aggregate type, things will work. When STR is not a simple aggregate type, you'll get a compile-time error, so you won't have to worry about accidentally misusing this. Also, if the type contains something more complex, as long as it has a default constructor, you're ok:
struct MY_STRUCT2
{
int n1;
std::string s1;
};
CStructWrapper<MY_STRUCT2> myStruct2; // n1 is set to 0, s1 is set to "";
On the downside, it's slower since you're making an extra temporary copy, and the compiler will assign each member to 0 individually, instead of one memset.
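If the extra temporary is a concern, one possible variation is to value-initialize the member in the constructor's initializer list instead. This is a sketch and not a drop-in replacement: it gives up the compile-time error for non-aggregates, and, as noted in another answer here, some older compilers (VC2008 was mentioned) did not zero the members correctly when the struct contained a non-POD member.

```cpp
#include <cassert>
#include <string>

struct MY_STRUCT
{
    int n1;
    int n2;
};

struct MY_STRUCT2
{
    int n1;
    std::string s1;
};

template <typename STR>
class CStructWrapper
{
private:
    STR MyStruct;
public:
    // Value-initialization: members without a constructor are zeroed,
    // members with a default constructor are default-constructed.
    CStructWrapper() : MyStruct() {}
    CStructWrapper(const STR &myStruct) : MyStruct(myStruct) {}
    operator STR &() { return MyStruct; }
    operator const STR &() const { return MyStruct; }
    STR *GetPointer() { return &MyStruct; }
};
```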