Is this C++ structure initialization trick safe? - c++

Instead of having to remember to initialize a simple 'C' structure, I might derive from it and zero it in the constructor like this:
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
memset(this, 0, sizeof(MY_STRUCT));
}
};
This trick is often used to initialize Win32 structures and can sometimes set the ubiquitous cbSize member.
Now, as long as there isn't a virtual function table for the memset call to destroy, is this a safe practice?

You can simply value-initialize the base, and all its members will be zero'ed out. This is guaranteed
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct():MY_STRUCT() { }
};
For this to work, there should be no user declared constructor in the base class, like in your example.
No nasty memset for that. It's not guaranteed that memset works in your code, even though it should work in practice.

PREAMBLE:
While my answer is still Ok, I find litb's answer quite superior to mine because:
It teaches me a trick that I did not know (litb's answers usually have this effect, but this is the first time I write it down)
It answers exactly the question (that is, initializing the original struct's part to zero)
So please, consider litb's answer before mine. In fact, I suggest the question's author to consider litb's answer as the right one.
Original answer
Putting a true object (i.e. std::string) etc. inside will break, because the true object will be initialized before the memset, and then, overwritten by zeroes.
Using the initialization list doesn't work for g++ (I'm surprised...). Initialize it instead in the CMyStruct constructor body. It will be C++ friendly:
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct() { n1 = 0 ; n2 = 0 ; }
};
P.S.: I assumed you did have no control over MY_STRUCT, of course. With control, you would have added the constructor directly inside MY_STRUCT and forgotten about inheritance. Note that you can add non-virtual methods to a C-like struct, and still have it behave as a struct.
EDIT: Added missing parenthesis, after Lou Franco's comment. Thanks!
EDIT 2 : I tried the code on g++, and for some reason, using the initialization list does not work. I corrected the code using the body constructor. The solution is still valid, though.
Please reevaluate my post, as the original code was changed (see changelog for more info).
EDIT 3 : After reading Rob's comment, I guess he has a point worthy of discussion: "Agreed, but this could be an enormous Win32 structure which may change with a new SDK, so a memset is future proof."
I disagree: Knowing Microsoft, it won't change because of their need for perfect backward compatibility. They will create instead an extended MY_STRUCTEx struct with the same initial layout as MY_STRUCT, with additionnal members at the end, and recognizable through a "size" member variable like the struct used for a RegisterWindow, IIRC.
So the only valid point remaining from Rob's comment is the "enormous" struct. In this case, perhaps a memset is more convenient, but you will have to make MY_STRUCT a variable member of CMyStruct instead of inheriting from it.
I see another hack, but I guess this would break because of possible struct alignment problem.
EDIT 4: Please take a look at Frank Krueger's solution. I can't promise it's portable (I guess it is), but it is still interesting from a technical viewpoint because it shows one case where, in C++, the "this" pointer "address" moves from its base class to its inherited class.

Much better than a memset, you can use this little trick instead:
MY_STRUCT foo = { 0 };
This will initialize all members to 0 (or their default value iirc), no need to specifiy a value for each.

This would make me feel much safer as it should work even if there is a vtable (or the compiler will scream).
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I'm sure your solution will work, but I doubt there are any guarantees to be made when mixing memset and classes.

This is a perfect example of porting a C idiom to C++ (and why it might not always work...)
The problem you will have with using memset is that in C++, a struct and a class are exactly the same thing except that by default, a struct has public visibility and a class has private visibility.
Thus, what if later on, some well meaning programmer changes MY_STRUCT like so:
struct MY_STRUCT
{
int n1;
int n2;
// Provide a default implementation...
virtual int add() {return n1 + n2;}
};
By adding that single function, your memset might now cause havoc.
There is a detailed discussion in comp.lang.c+

The examples have "unspecified behaviour".
For a non-POD, the order by which the compiler lays out an object (all bases classes and members) is unspecified (ISO C++ 10/3). Consider the following:
struct A {
int i;
};
class B : public A { // 'B' is not a POD
public:
B ();
private:
int j;
};
This can be laid out as:
[ int i ][ int j ]
Or as:
[ int j ][ int i ]
Therefore, using memset directly on the address of 'this' is very much unspecified behaviour. One of the answers above, at first glance looks to be safer:
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I believe, however, that strictly speaking this too results in unspecified behaviour. I cannot find the normative text, however the note in 10/5 says: "A base class subobject may have a layout (3.7) different from the layout of a most derived object of the same type".
As a result, I compiler could perform space optimizations with the different members:
struct A {
char c1;
};
struct B {
char c2;
char c3;
char c4;
int i;
};
class C : public A, public B
{
public:
C ()
: c1 (10);
{
memset(static_cast<B*>(this), 0, sizeof(B));
}
};
Can be laid out as:
[ char c1 ] [ char c2, char c3, char c4, int i ]
On a 32 bit system, due to alighments etc. for 'B', sizeof(B) will most likely be 8 bytes. However, sizeof(C) can also be '8' bytes if the compiler packs the data members. Therefore the call to memset might overwrite the value given to 'c1'.

Precise layout of a class or structure is not guaranteed in C++, which is why you should not make assumptions about the size of it from the outside (that means if you're not a compiler).
Probably it works, until you find a compiler on which it doesn't, or you throw some vtable into the mix.

If you already have a constructor, why not just initialize it there with n1=0; n2=0; -- that's certainly the more normal way.
Edit: Actually, as paercebal has shown, ctor initialization is even better.

My opinion is no. I'm not sure what it gains either.
As your definition of CMyStruct changes and you add/delete members, this can lead to bugs. Easily.
Create a constructor for CMyStruct that takes a MyStruct has a parameter.
CMyStruct::CMyStruct(MyStruct &)
Or something of that sought. You can then initialize a public or private 'MyStruct' member.

From an ISO C++ viewpoint, there are two issues:
(1) Is the object a POD? The acronym stands for Plain Old Data, and the standard enumerates what you can't have in a POD (Wikipedia has a good summary). If it's not a POD, you can't memset it.
(2) Are there members for which all-bits-zero is invalid ? On Windows and Unix, the NULL pointer is all bits zero; it need not be. Floating point 0 has all bits zero in IEEE754, which is quite common, and on x86.
Frank Kruegers tip addresses your concerns by restricting the memset to the POD base of the non-POD class.

Try this - overload new.
EDIT: I should add - This is safe because the memory is zeroed before any constructors are called. Big flaw - only works if object is dynamically allocated.
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
// whatever
}
void* new(size_t size)
{
// dangerous
return memset(malloc(size),0,size);
// better
if (void *p = malloc(size))
{
return (memset(p, 0, size));
}
else
{
throw bad_alloc();
}
}
void delete(void *p, size_t size)
{
free(p);
}
};

If MY_STRUCT is your code, and you are happy using a C++ compiler, you can put the constructor there without wrapping in a class:
struct MY_STRUCT
{
int n1;
int n2;
MY_STRUCT(): n1(0), n2(0) {}
};
I'm not sure about efficiency, but I hate doing tricks when you haven't proved efficiency is needed.

Comment on litb's answer (seems I'm not yet allowed to comment directly):
Even with this nice C++-style solution you have to be very careful that you don't apply this naively to a struct containing a non-POD member.
Some compilers then don't initialize correctly anymore.
See this answer to a similar question.
I personally had the bad experience on VC2008 with an additional std::string.

What I do is use aggregate initialization, but only specifying initializers for members I care about, e.g:
STARTUPINFO si = {
sizeof si, /*cb*/
0, /*lpReserved*/
0, /*lpDesktop*/
"my window" /*lpTitle*/
};
The remaining members will be initialized to zeros of the appropriate type (as in Drealmer's post). Here, you are trusting Microsoft not to gratuitously break compatibility by adding new structure members in the middle (a reasonable assumption). This solution strikes me as optimal - one statement, no classes, no memset, no assumptions about the internal representation of floating point zero or null pointers.
I think the hacks involving inheritance are horrible style. Public inheritance means IS-A to most readers. Note also that you're inheriting from a class which isn't designed to be a base. As there's no virtual destructor, clients who delete a derived class instance through a pointer to base will invoke undefined behaviour.

I assume the structure is provided to you and cannot be modified. If you can change the structure, then the obvious solution is adding a constructor.
Don't over engineer your code with C++ wrappers when all you want is a simple macro to initialise your structure.
#include <stdio.h>
#define MY_STRUCT(x) MY_STRUCT x = {0}
struct MY_STRUCT
{
int n1;
int n2;
};
int main(int argc, char *argv[])
{
MY_STRUCT(s);
printf("n1(%d),n2(%d)\n", s.n1, s.n2);
return 0;
}

It's a bit of code, but it's reusable; include it once and it should work for any POD. You can pass an instance of this class to any function expecting a MY_STRUCT, or use the GetPointer function to pass it into a function that will modify the structure.
template <typename STR>
class CStructWrapper
{
private:
STR MyStruct;
public:
CStructWrapper() { STR temp = {}; MyStruct = temp;}
CStructWrapper(const STR &myStruct) : MyStruct(myStruct) {}
operator STR &() { return MyStruct; }
operator const STR &() const { return MyStruct; }
STR *GetPointer() { return &MyStruct; }
};
CStructWrapper<MY_STRUCT> myStruct;
CStructWrapper<ANOTHER_STRUCT> anotherStruct;
This way, you don't have to worry about whether NULLs are all 0, or floating point representations. As long as STR is a simple aggregate type, things will work. When STR is not a simple aggregate type, you'll get a compile-time error, so you won't have to worry about accidentally misusing this. Also, if the type contains something more complex, as long as it has a default constructor, you're ok:
struct MY_STRUCT2
{
int n1;
std::string s1;
};
CStructWrapper<MY_STRUCT2> myStruct2; // n1 is set to 0, s1 is set to "";
On the downside, it's slower since you're making an extra temporary copy, and the compiler will assign each member to 0 individually, instead of one memset.

Related

C++: Are Structs really the same as Classes? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What are the differences between struct and class in C++
This question has been asked and answered a lot but every once in a while I come across something confusing.
To sum up, the difference between C++ structs and classes is the famous default public vs. private access. Apart from that, a C++ compiler treats a struct the same way it would treat a class. Structs could have constructors, copy constructors, virtual functions. etc. And the memory layout of a struct is same ad that of a class. And the reason C++ has structs is for backwward compatibility with C.
Now since people confuse as to which one to use, struct or class, the rule of thumb is if you have just plain old data, use a struct. Otherwise use a class. And I have read that structs are good in serialization but don't where this comes from.
Then the other day I came across this article: http://www.codeproject.com/Articles/468882/Introduction-to-a-Cplusplus-low-level-object-model
It says that if we have (directly quoting):
struct SomeStruct
{
int field1;
char field2;
double field3;
bool field4;
};
then this:
void SomeFunction()
{
SomeStruct someStructVariable;
// usage of someStructVariable
...
}
and this:
void SomeFunction()
{
int field1;
char field2;
double field3;
bool field4;
// usage of 4 variables
...
}
are the same.
It says the machine code generated is the same if we have a struct or just write down the variables inside the function. Now of course this only applies if your struct if a POD.
This is where I get confused. In Effective C++ Scott Meyers says that there no such thing as an empty class.
If we have:
class EmptyClass { };
It is actually laid out by the compiler for example as:
class EmptyClass
{
EmptyClass() {}
~EmptyClass() {}
...
};
So you would not have an empty class.
Now if we change the above struct to a class:
class SomeClass
{
int field1;
char field2
double field3;
bool field4;
};
does it mean that:
void SomeFunction()
{
someClass someClassVariable;
// usage of someClassVariable
...
}
and this:
void SomeFunction()
{
int field1;
char field2
double field3;
bool field4;
// usage of 4 variables
...
}
are the same in terms of machine instructions? That there is no call to someClass constructor? Or that the memory allocated is the same as instantiating a class or defining the variables individually? And what about padding? structs and classes do padding. Would padding be the same in these cases?
I'd really appreciate if somebody can shed some light on to this.
I believe the author of that article is mistaken. Although there is probably no difference between the struct and the non-member variable layout version of the two functions, I don't think this is guaranteed. The only things I can think of that are guaranteed here is that since it's a POD, the address of the struct and the first member are the same...and each member follows in memory after that at some point.
In neither case, since it's a POD (and classes can be too, don't make THAT mistake) will the data be initialized.
I would recommend not making such an assumption anyway. If you wrote code that leveraged it, and I can't imagine why you'd want to, most other developers would find it baffling anyway. Only break out the legal books if you HAVE to. Otherwise prefer to code in manners that people are used to. The only important part of all this that you really should keep in mind that POD objects are not initialized unless you do so explicitly.
The only difference is that the members of structs are public by default, while the members of classes are private by default (when I say by default, I mean "unless specified otherwise"). Check out this code:
#include <iostream>
using namespace std;
struct A {
int x;
int y;
};
class A obj1;
int main() {
obj1.x = 0;
obj1.y = 1;
cout << obj1.x << " " << obj1.y << endl;
return 0;
}
The code compiles and runs just fine.
There is no difference between structs and classes besides the default for protection (note that default protection type for base classes is different also). Books and my own 20+ years experience tells this.
Regarding default empty ctor/dector. Standard is not asking for this. Nevertheless some compiler may generate this empty pair of ctor/dector. Every reasonable optimizer would immediately throw them away. If at some place a function that is doing nothing is called, how can you detect this? How this can affect anything besides consuming CPU cycles?
MSVC is not generating useless functions. It is reasonable to think that every good compiler will do the same.
Regarding the examples
struct SomeStruct
{
int field1;
char field2;
double field3;
bool field4;
};
void SomeFunction()
{
int field1;
char field2;
double field3;
bool field4;
...
}
The padding rules, order in memory, etc may be and most likely will be completely different. Optimizer may easily throw away unused local variable. It is much less likely (if possible at all) that optimizer will remove a data field from the struct. For this to happen the struct should be in defined in cpp file, certain flags should be set, etc.
I am not sure you will find any docs about padding of local vars on the stack. AFAIK, this is 100% up to compiler for making this layout. On the contrary, layout of the structs/classes are described, there are #pargma and command line keys that control this, etc.
are the same in terms of machine instructions?
There is no reason not to be. But there is no gurantee from the standard.
That there is no call to someClass constructor?
Yes there is a call to the constructor. But the constructor does no work (as all the members are POD and the way you declare someClass someClassVariable; causes value initialization which does nothing for POD members). So since there is no work to be done there is no need to plant any instructions.
Or that the memory allocated is the same as instantiating a class or defining the variables individually?
The class may contain padding that declaring the variables individually does not.
Also I am sure that the compiler will have an easier time optimizing away individual variables.
And what about padding?
Yes there is a possibility of padding in the structure (struct/class).
structs and classes do padding. Would padding be the same in these cases?
Yes. Just make sure you compare apples to apples (ie)
struct SomeStruct
{
int field1;
char field2;
double field3;
bool field4;
};
class SomeStruct
{
public: /// Make sure you add this line. Now they are identical.
int field1;
char field2;
double field3;
bool field4;
};

Why does C++ allow private members to be modified using this approach?

After seeing this question a few minutes ago, I wondered why the language designers allow it as it allows indirect modification of private data. As an example
class TestClass {
private:
int cc;
public:
TestClass(int i) : cc(i) {};
};
TestClass cc(5);
int* pp = (int*)&cc;
*pp = 70; // private member has been modified
I tested the above code and indeed the private data has been modified. Is there any explanation of why this is allowed to happen or this just an oversight in the language? It seems to directly undermine the use of private data members.
Because, as Bjarne puts it, C++ is designed to protect against Murphy, not Machiavelli.
In other words, it's supposed to protect you from accidents -- but if you go to any work at all to subvert it (such as using a cast) it's not even going to attempt to stop you.
When I think of it, I have a somewhat different analogy in mind: it's like the lock on a bathroom door. It gives you a warning that you probably don't want to walk in there right now, but it's trivial to unlock the door from the outside if you decide to.
Edit: as to the question #Xeo discusses, about why the standard says "have the same access control" instead of "have all public access control", the answer is long and a little tortuous.
Let's step back to the beginning and consider a struct like:
struct X {
int a;
int b;
};
C always had a few rules for a struct like this. One is that in an instance of the struct, the address of the struct itself has to equal the address of a, so you can cast a pointer to the struct to a pointer to int, and access a with well defined results. Another is that the members have to be arranged in the same order in memory as they are defined in the struct (though the compiler is free to insert padding between them).
For C++, there was an intent to maintain that, especially for existing C structs. At the same time, there was an apparent intent that if the compiler wanted to enforce private (and protected) at run-time, it should be easy to do that (reasonably efficiently).
Therefore, given something like:
struct Y {
int a;
int b;
private:
int c;
int d;
public:
int e;
// code to use `c` and `d` goes here.
};
The compiler should be required to maintain the same rules as C with respect to Y.a and Y.b. At the same time, if it's going to enforce access at run time, it may want to move all the public variables together in memory, so the layout would be more like:
struct Z {
int a;
int b;
int e;
private:
int c;
int d;
// code to use `c` and `d` goes here.
};
Then, when it's enforcing things at run-time, it can basically do something like if (offset > 3 * sizeof(int)) access_violation();
To my knowledge nobody's ever done this, and I'm not sure the rest of the standard really allows it, but there does seem to have been at least the half-formed germ of an idea along that line.
To enforce both of those, the C++98 said Y::a and Y::b had to be in that order in memory, and Y::a had to be at the beginning of the struct (i.e., C-like rules). But, because of the intervening access specifiers, Y::c and Y::e no longer had to be in order relative to each other. In other words, all the consecutive variables defined without an access specifier between them were grouped together, the compiler was free to rearrange those groups (but still had to keep the first one at the beginning).
That was fine until some jerk (i.e., me) pointed out that the way the rules were written had another little problem. If I wrote code like:
struct A {
int a;
public:
int b;
public:
int c;
public:
int d;
};
...you ended up with a little bit of self contradition. On one hand, this was still officially a POD struct, so the C-like rules were supposed to apply -- but since you had (admittedly meaningless) access specifiers between the members, it also gave the compiler permission to rearrange the members, thus breaking the C-like rules they intended.
To cure that, they re-worded the standard a little so it would talk about the members all having the same access, rather than about whether or not there was an access specifier between them. Yes, they could have just decreed that the rules would only apply to public members, but it would appear that nobody saw anything to be gained from that. Given that this was modifying an existing standard with lots of code that had been in use for quite a while, the opted for the smallest change they could make that would still cure the problem.
Because of backwards-compatability with C, where you can do the same thing.
For all people wondering, here's why this is not UB and is actually allowed by the standard:
First, TestClass is a standard-layout class (§9 [class] p7):
A standard-layout class is a class that:
has no non-static data members of type non-standard-layout class (or array of such types) or reference, // OK: non-static data member is of type 'int'
has no virtual functions (10.3) and no virtual base classes (10.1), // OK
has the same access control (Clause 11) for all non-static data members, // OK, all non-static data members (1) are 'private'
has no non-standard-layout base classes, // OK, no base classes
either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and // OK, no base classes again
has no base classes of the same type as the first non-static data member. // OK, no base classes again
And with that, you can are allowed to reinterpret_cast the class to the type of its first member (§9.2 [class.mem] p20):
A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa.
In your case, the C-style (int*) cast resolves to a reinterpret_cast (§5.4 [expr.cast] p4).
A good reason is to allow compatibility with C but extra access safety on the C++ layer.
Consider:
struct S {
#ifdef __cplusplus
private:
#endif // __cplusplus
int i, j;
#ifdef __cplusplus
public:
int get_i() const { return i; }
int get_j() const { return j; }
#endif // __cplusplus
};
By requiring that the C-visible S and the C++-visible S be layout-compatible, S can be used across the language boundary with the C++ side having greater access safety. The reinterpret_cast access safety subversion is an unfortunate but necessary corollary.
As an aside, the restriction on having all members with the same access control is because the implementation is permitted to rearrange members relative to members with different access control. Presumably some implementations put members with the same access control together, for the sake of tidiness; it could also be used to reduce padding, although I don't know of any compiler that does that.
The whole purpose of reinterpret_cast (and a C style cast is even more powerful than a reinterpret_cast) is to provide an escape path around safety measures.
The compiler would have given you an error if you had tried int *pp = &cc.cc, the compiler would have told you that you cannot access a private member.
In your code you are reinterpreting the address of cc as a pointer to an int. You wrote it the C style way, the C++ style way would have been int* pp = reinterpret_cast<int*>(&cc);. The reinterpret_cast always is a warning that you are doing a cast between two pointers that are not related. In such a case you must make sure that you are doing right. You must know the underlying memory (layout). The compiler does not prevent you from doing so, because this if often needed.
When doing the cast you throw away all knowledge about the class. From now on the compiler only sees an int pointer. Of course you can access the memory the pointer points to. In your case, on your platform the compiler happened to put cc in the first n bytes of a TestClass object, so a TestClass pointer also points to the cc member.
This is because you are manipulating the memory where your class is located in memory. In your case it just happen to store the private member at this memory location so you change it. It is not a very good idea to do because you do now know how the object will be stored in memory.

C++: Use memset or a struct constructor? What's the fastest?

I'm having trouble figuring out which is better in C++:
I use a struct to manage clients in a message queue, the struct looks like this:
typedef struct _MsgClient {
int handle;
int message_class;
void *callback;
void *private_data;
int priority;
} MsgClient;
All of these being POD entities.
Now, I have an array of these structs where I store my clients (I use an array for memory constraints, I have to limit fragmentation). So in my class I have something like this:
class Foo
{
private:
MsgClient _clients[32];
public:
Foo()
{
memset(_clients, 0x0, sizeof(_clients));
}
}
Now, I read here and there on SO that using memset is bad in C++, and that I'd rather use a constructor for my structure.
I figured something like this:
typedef struct _MsgClient {
int handle;
int message_class;
void *callback;
void *private_data;
int priority;
// struct constructor
_MsgClient(): handle(0), message_class(0), callback(NULL), private_data(NULL), priority(0) {};
} MsgClient;
...would eliminate the need of the memset. But my fear is that when foo is initialized, the struct constructor will be called 32 times, instead of optimizing it as a simple zero out of the memory taken by the array.
What's your opinion on this?
I just found this: Can a member struct be zero-init from the constructor initializer list without calling memset? , is it appropriate in my case (which is different: I have an array, not a single instance of the structure)?
Also, according to this post, adding a constructor to my structure will automatically convert it into a non-POD structure, is it right?
On a conforming implementation, it's perfectly valid to value-initialize an array in the constructor initializer list with an empty member initializer. For your array members, this will have the effect of zero-initializing the members of each array element.
The compiler should be able to make this very efficient and there's no need for you to add a constructor to your struct.
E.g.
Foo() : _clients() {}
You can you memset freely even in C++, as long as you understand what you are doing.
About the performance - the only way to see which way is really faster is to build your program in release configuration, and then see in the disassembler the code generated.
Using memset sounds somewhat faster than per-object initialization. However there's a chance the compiler will generated the same code.

memset for initialization in C++

memset is sometimes used to initialize data in a constructor like the example below. Does it work in general ? Is it a good idea in general?
class A {
public:
A();
private:
int a;
float f;
char str[35];
long *lp;
};
A::A()
{
memset(this, 0, sizeof(*this));
}
Don't use memset. It's a holdover from C and won't work on non-PODs. Specifically, using it on a derived class that contains any virtual functions -- or any class containing a non-builtin -- will result in disaster.
C++ provides a specific syntax for initialization:
class A {
public:
A();
private:
int a;
float f;
char str[35];
long *lp;
};
A::A()
: a(0), f(0), str(), lp(NULL)
{
}
To be honest, I'm not sure, but memset might also be a bad idea on floating-points since their format is unspecified.
It's a terrible idea. You're just tromping over data, paying no heed to how objects should be initialized. If your class is virtual, you're likely to wipe out the vtable pointer as well.
memset works on raw data, but C++ isn't about raw data. C++ creates abstractions, so if you want to be safe you use those abstractions. Use the initializer list to initialize members.
You can do it to POD types:
struct nothing_fancy_here
{
bool b;
int i;
void* p;
};
nothing_fancy_here x;
memset(&x, 0, sizeof(x));
But if you're doing it on this, that means you're in a user-defined constructor and no longer qualify as a POD type. (Though if all your members are POD it might work, as long as none contain 0 as a trap value. I'm sure not sure if any other sources of undefined behavior come into play here.)

Is it possible to subclass a C struct in C++ and use pointers to the struct in C code?

Is there a side effect in doing this:
C code:
struct foo {
int k;
};
int ret_foo(const struct foo* f){
return f.k;
}
C++ code:
class bar : public foo {
int my_bar() {
return ret_foo( (foo)this );
}
};
There's an extern "C" around the C++ code and each code is inside its own compilation unit.
Is this portable across compilers?
This is entirely legal. In C++, classes and structs are identical concepts, with the exception that all struct members are public by default. That's the only difference. So asking whether you can extend a struct is no different than asking if you can extend a class.
There is one caveat here. There is no guarantee of layout consistency from compiler to compiler. So if you compile your C code with a different compiler than your C++ code, you may run into problems related to member layout (padding especially). This can even occur when using C and C++ compilers from the same vendor.
I have had this happen with gcc and g++. I worked on a project which used several large structs. Unfortunately, g++ packed the structs significantly looser than gcc, which caused significant problems sharing objects between C and C++ code. We eventually had to manually set packing and insert padding to make the C and C++ code treat the structs the same. Note however, that this problem can occur regardless of subclassing. In fact we weren't subclassing the C struct in this case.
I certainly not recommend using such weird subclassing. It would be better to change your design to use composition instead of inheritance.
Just make one member
foo* m_pfoo;
in the bar class and it will do the same job.
Other thing you can do is to make one more class FooWrapper, containing the structure in itself with the corresponding getter method. Then you can subclass the wrapper. This way the problem with the virtual destructor is gone.
“Never derive from concrete classes.” — Sutter
“Make non-leaf classes abstract.” — Meyers
It’s simply wrong to subclass non-interface classes. You should refactor your libraries.
Technically, you can do what you want, so long as you don’t invoke undefined behavior by, e. g., deleting a pointer to the derived class by a pointer to its base class subobject. You don’t even need extern "C" for the C++ code. Yes, it’s portable. But it’s poor design.
This is perfectly legal, though it might be confusing for other programmers.
You can use inheritance to extend C-structs with methods and constructors.
Sample :
struct POINT { int x, y; }
class CPoint : POINT
{
public:
CPoint( int x_, int y_ ) { x = x_; y = y_; }
const CPoint& operator+=( const POINT& op2 )
{ x += op2.x; y += op2.y; return *this; }
// etc.
};
Extending structs might be "more" evil, but is not something you are forbidden to do.
Wow, that's evil.
Is this portable across compilers?
Most definitely not. Consider the following:
foo* x = new bar();
delete x;
In order for this to work, foo's destructor must be virtual which it clearly isn't. As long as you don't use new and as long as the derived objectd don't have custom destructors, though, you could be lucky.
/EDIT: On the other hand, if the code is only used as in the question, inheritance has no advantage over composition. Just follow the advice given by m_pGladiator.
This is perfectly legal, and you can see it in practice with the MFC CRect and CPoint classes. CPoint derives from POINT (defined in windef.h), and CRect derives from RECT. You are simply decorating an object with member functions. As long as you don't extend the object with more data, you're fine. In fact, if you have a complex C struct that is a pain to default-initialize, extending it with a class that contains a default constructor is an easy way to deal with that issue.
Even if you do this:
foo *pFoo = new bar;
delete pFoo;
then you're fine, since your constructor and destructor are trivial, and you haven't allocated any extra memory.
You also don't have to wrap your C++ object with 'extern "C"', since you're not actually passing a C++ type to the C functions.
I don't think it is necessarily a problem. The behaviour is well defined, and as long as you are careful with life-time issues (don't mix and match allocations between the C++ and C code) will do what you want. It should be perfectly portable across compilers.
The problem with destructors is real, but applies any time the base class destructor isn't virtual not just for C structs. It is something you need to be aware of but doesn't preclude using this pattern.
It will work, and portably BUT you cannot use any virtual functions (which includes destructors).
I would recommend that instead of doing this you have Bar contain a Foo.
class Bar
{
private:
Foo mFoo;
};
I don't get why you don't simply make ret_foo a member method. Your current way makes your code awfully hard to understand. What is so difficult about using a real class in the first place with a member variable and get/set methods?
I know it's possible to subclass structs in C++, but the danger is that others won't be able to understand what you coded because it's so seldom that somebody actually does it. I'd go for a robust and common solution instead.
It probably will work but I do not believe it is guaranteed to. The following is a quote from ISO C++ 10/5:
A base class subobject might have a layout (3.7) different from the layout of a most derived object of the same type.
It's hard to see how in the "real world" this could actually be the case.
EDIT:
The bottom line is that the standard has not limited the number of places where a base class subobject layout can be different from a concrete object with that same Base type. The result is that any assumptions you may have, such as POD-ness etc. are not necessarily true for the base class subobject.
EDIT:
An alternative approach, and one whose behaviour is well defined is to make 'foo' a member of 'bar' and to provide a conversion operator where it's necessary.
class bar {
public:
int my_bar() {
return ret_foo( foo_ );
}
//
// This allows a 'bar' to be used where a 'foo' is expected
inline operator foo& () {
return foo_;
}
private:
foo foo_;
};