memset for initialization in C++

memset is sometimes used to initialize data in a constructor, as in the example below. Does it work in general? Is it a good idea in general?
class A {
public:
A();
private:
int a;
float f;
char str[35];
long *lp;
};
A::A()
{
memset(this, 0, sizeof(*this));
}

Don't use memset. It's a holdover from C and won't work on non-PODs. Specifically, using it on a derived class that contains any virtual functions -- or any class containing a non-builtin -- will result in disaster.
C++ provides a specific syntax for initialization:
class A {
public:
A();
private:
int a;
float f;
char str[35];
long *lp;
};
A::A()
: a(0), f(0), str(), lp(NULL)
{
}
To be honest, I'm not sure, but memset might also be a bad idea on floating-points since their format is unspecified.

It's a terrible idea. You're just tromping over data, paying no heed to how objects should be initialized. If your class is virtual, you're likely to wipe out the vtable pointer as well.
memset works on raw data, but C++ isn't about raw data. C++ creates abstractions, so if you want to be safe you use those abstractions. Use the initializer list to initialize members.
You can do it to POD types:
struct nothing_fancy_here
{
bool b;
int i;
void* p;
};
nothing_fancy_here x;
memset(&x, 0, sizeof(x));
But if you're doing it on this, you're in a user-defined constructor, so the class no longer qualifies as a POD type. (If all your members are POD it might still work, as long as none of them treats all-bits-zero as a trap representation. I'm not sure whether any other sources of undefined behavior come into play here.)
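If you do keep a memset for plain data, one way to stop it from silently spreading to unsuitable classes is a compile-time guard. The following is only a sketch, assuming C++11 type traits; zero_pod, PlainData and HasVirtual are hypothetical names made up for this illustration. Note that it checks triviality and layout only; it cannot check that all-bits-zero is a valid null pointer or floating-point zero on your platform.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical types used only for this illustration.
struct PlainData {
    bool b;
    int i;
    void* p;
};

struct HasVirtual {
    virtual ~HasVirtual() {}
    int i;
};

// Zeroes an object, but only compiles for trivial, standard-layout types.
template <typename T>
void zero_pod(T& obj) {
    static_assert(std::is_trivial<T>::value && std::is_standard_layout<T>::value,
                  "zero_pod requires a trivial, standard-layout type");
    std::memset(&obj, 0, sizeof(T));
}

// A class with a virtual function is not trivial, so zero_pod rejects it.
static_assert(!std::is_trivial<HasVirtual>::value,
              "HasVirtual must not count as trivial");

inline void zero_pod_demo() {
    PlainData x;
    x.b = true;
    x.i = 42;
    x.p = &x;
    zero_pod(x);
    assert(!x.b && x.i == 0);
    // HasVirtual v; zero_pod(v);  // would fail to compile
}
```

The point of the guard is that adding a virtual function or a std::string member later turns the memset into a compile error instead of a runtime surprise.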

Related

memset() to initialize object in constructor?

I found this piece of C++ code that uses memset() to initialize an object:
struct Message
{
Message()
{
memset(this, 0, sizeof(Message));
}
unsigned int a, b, c;
};
Since this is a POD structure, this code should be fine.
Is there any advantage in using memset instead of a constructor such as:
Message() : a(0), b(0), c(0) {}
There is no advantage in using memset() like this. Leaving aside all the obvious disadvantages and future pain, there is one disadvantage that makes it less efficient than
Message() : a(0), b(0), c(0) {}
This is because PODs are usually stored in arrays. A good (smart) compiler can replace the initialization of multiple objects in an array with a single memset(), in the case of
Message * messages_1 = new Message[100];
Or
std::vector<Message> messages_2;
messages_2.resize(100);
Even when only a single object is being constructed, a good compiler will use memset() behind the curtain.
Note that in C++11 and newer you have a nicer option than either of those:
struct Message
{
unsigned int a = 0;
unsigned int b = 0;
unsigned int c = 0;
};
This should produce identical code (and optimisation opportunities) to the constructor-with-initialisation-list approach while:
Using less code.
Being more readable.
Removing the need to worry about updating initialisation list when members are added.
Removing the need to think about what to do if b needs to default to -1 later.
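To make the last two points concrete, here is a small sketch, assuming C++11. It changes b to a signed int so that a default of -1 is meaningful, and adds a hypothetical one-argument constructor; neither is part of the original Message. Every constructor picks up the defaults for members it does not mention.

```cpp
#include <cassert>

struct Message {
    unsigned int a = 0;
    int b = -1;          // assumption: b made signed so that -1 is a sensible default
    unsigned int c = 0;

    Message() = default;
    // Hypothetical extra constructor: only a is listed, b and c use their defaults.
    explicit Message(unsigned int a_) : a(a_) {}
};

inline void message_defaults_demo() {
    Message m;
    assert(m.a == 0 && m.b == -1 && m.c == 0);

    Message n(7);
    assert(n.a == 7 && n.b == -1 && n.c == 0);
}
```

With memset, a member that needs a non-zero default would have to be patched up after the call in every constructor.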

how to set everything inside a structure to 0?

I have a structure that has 3 different types of values in it (char, float, int). I need to set everything to 0 at the beginning of the program. How do I do that?
There are two usual ways:
A a = A();
or
A a = {};
The first has the advantage that if you later provide a constructor, it still works (as long as you provide a default constructor).
With regards to the suggestion to use memset: memset is only guaranteed to work for integral types. I can't imagine it not working for a float, but formally, it's not guaranteed. And of course, if you later modify the struct, it might stop working. It is a solution to avoid.
struct A
{
int a;
char b;
float c;
};
struct A is an aggregate whose members are all built-in types, so value-initializing it sets every member to 0. You can initialize an A in two ways:
//Class method pass to function
int main()
{
A a = {}; // initialize single A to 0
A b = A(); // same effect
A c[10] = {}; // initialize array to 0
return 0;
}
You could use a constructor to initialise all of your members. That way you are not limited to 0 and can supply other values as well.
struct A
{
int a;
char b;
float c;
A(int _a=0,char _b=0,float _c=0.0) : a(_a), b(_b), c(_c) {}
};
int main()
{
A a;
// work with a
return 0;
}
There are several ways:
Use initializer mystruct x = { '\0', 0.0f, 0 };
Use mystruct x; memset(&x, 0, sizeof(x));
Write a function (or, in C++, a constructor) that sets each value to zero.
Generally, the first one is the most obvious, but if you have a large number of structs, you may find option 2 or 3 more suitable.
Note: using memset is ONLY safe on data structures that ONLY contain data. In C++, a struct and a class are almost identical, and a struct that has member functions, has other struct or class members or has inherited from another class or struct, will definitely not be safe to use memset on. And of course, this is particularly dangerous if you start out with a plain data struct, and then ADD functionality into the struct that "breaks" the 'only data' promise.

Can I get away with this C++ downcasting fib?

I have a C library that has types like this:
typedef struct {
// ...
} mytype;
mytype *mytype_new() {
mytype *t = malloc(sizeof(*t));
// [Initialize t]
return t;
}
void mytype_dosomething(mytype *t, int arg);
I want to provide C++ wrappers to provide a better syntax. However, I want to avoid the complication of having a separately-allocated wrapper object. I have a relatively complicated graph of objects whose memory-management is already more complicated than I would like (objects are refcounted in such a way that all reachable objects are kept alive). Also the C library will be calling back into C++ with pointers to this object and the cost of a new wrapper object to be constructed for each C->C++ callback (since C doesn't know about the wrappers) is unacceptable to me.
My general scheme is to do:
class MyType : public mytype {
public:
static MyType* New() { return (MyType*)mytype_new(); }
void DoSomething(int arg) { mytype_dosomething(this, arg); }
};
This will give C++ programmers nicer syntax:
// C Usage:
mytype *t = mytype_new();
mytype_dosomething(t, arg);
// C++ Usage:
MyType *t = MyType::New();
t->DoSomething(arg);
The fib is that I'm downcasting a mytype* (which was allocated with malloc()) to a MyType*, which is a lie. But if MyType has no members and no virtual functions, it seems like I should be able to depend on sizeof(mytype) == sizeof(MyType), and besides MyType has no actual data to which the compiler could generate any kind of reference.
So even though this probably violates the C++ standard, I'm tempted to think that I can get away with this, even across a wide array of compilers and platforms.
My questions are:
Is it possible that, by some streak of luck, this does not actually violate the C++ standard?
Can anyone think of any kind of real-world, practical problem I could run into by using a scheme like this?
EDIT: @James McNellis asks a good question: why can't I define MyType as:
class MyType {
public:
MyType() { mytype_init(this); }
private:
mytype t;
};
The reason is that I have C callbacks that will call back into C++ with a mytype*, and I want to be able to convert this directly into a MyType* without having to copy.
You're downcasting a mytype* to a MyType*, which is legal C++. But here it's problematic since the mytype* pointer doesn't actually point to a MyType. It actually points to a mytype. Thus, if you downcast it to a MyType and attempt to access its members, it'll almost certainly not work. Even if there are no data members or virtual functions now, there might be in the future, and it's still a huge code smell.
Even if it doesn't violate the C++ standard (which I think it does), I would still be a bit suspicious about the code. Typically if you're wrapping a C library the "modern C++ way" is through the RAII idiom:
class MyType
{
public:
// Constructor
MyType() : myType(::mytype_new()) {}
// Destructor
~MyType() { ::my_type_delete(myType); /* or something similar to this */ }
mytype* GetRawMyType() { return myType; }
const mytype* GetConstRawMyType() const { return myType; }
void DoSomething(int arg) { ::mytype_dosomething(myType, arg); }
private:
// MyType is not copyable.
MyType(const MyType&);
MyType& operator=(const MyType&);
mytype* myType;
};
// Usage example:
{
MyType t; // constructor called here
t.DoSomething(123);
} // destructor called when scope ends
Is it possible that, by some streak of luck, this does not actually violate the C++ standard?
I'm not advocating this style, but as MyType and mytype are both PODs, I believe the cast does not violate the Standard. I believe MyType and mytype are layout-compatible (2003 version, Section 9.2, clause 14: "Two POD-struct ... types are layout-compatible if they have the same number of nonstatic data members, and corresponding nonstatic data members (in order) have layout-compatible types (3.9)."), and as such can be cast around without trouble.
EDIT: I had to test things, and it turns out I'm wrong. This is not Standard, as the base class makes MyType non-POD. The following doesn't compile:
#include <cstdio>
namespace {
extern "C" struct Foo {
int i;
};
extern "C" int do_foo(Foo* f)
{
return 5 + f->i;
}
struct Bar : Foo {
int foo_it_up()
{
return do_foo(this);
}
};
}
int main()
{
Bar f = { 5 };
std::printf("%d\n", f.foo_it_up());
}
Visual C++ gives the error message that "Types with a base are not aggregate." Since "Types with a base are not aggregate," then the passage I quoted simply doesn't apply.
I believe that you're still safe in that most compilers will make MyType layout-compatible with mytype. The cast will "work," but it's not Standard.
I think it would be much safer and more elegant to have a mytype* data member of MyType, and initialize it in the constructor of MyType rather than having a New() method (which, by the way, has to be static if you do want to have it).
It does violate the C++ standard; however, it should work on most compilers (all that I know of).
You're relying on a specific implementation detail here (that the compiler doesn't care what the actual object is, just what type you gave it), but I don't think any compiler has a different implementation detail. Be sure to check it on every compiler you use, or it might catch you unprepared.
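One way to get the nicer syntax inside callbacks without the downcast and without a heap-allocated wrapper is a non-owning handle that just holds the mytype*. The sketch below is an assumption-laden illustration, not the library's API: the counter field, mytype_free, MyTypeRef and on_event are hypothetical, and the C functions are stubbed in so that the example is self-contained.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in for the C library (hypothetical bodies, for illustration only).
extern "C" {
typedef struct {
    int counter;
} mytype;

mytype* mytype_new(void) {
    mytype* t = static_cast<mytype*>(std::malloc(sizeof(mytype)));
    if (t) t->counter = 0;
    return t;
}

void mytype_dosomething(mytype* t, int arg) { t->counter += arg; }

void mytype_free(mytype* t) { std::free(t); }
}

// Non-owning view: one pointer wide, lives on the stack, nothing to allocate.
class MyTypeRef {
public:
    explicit MyTypeRef(mytype* raw) : raw_(raw) {}
    void DoSomething(int arg) { mytype_dosomething(raw_, arg); }
    mytype* get() const { return raw_; }
private:
    mytype* raw_;
};

// A callback as the C library might invoke it: wrap the pointer, then use it.
extern "C" void on_event(mytype* t) {
    MyTypeRef ref(t);
    ref.DoSomething(5);
}

inline void mytype_ref_demo() {
    mytype* t = mytype_new();
    assert(t != 0);
    on_event(t);
    assert(t->counter == 5);
    mytype_free(t);
}
```

Constructing such a handle per callback costs a pointer copy, so it should not run into the allocation cost that the question is worried about; ownership stays with whatever refcounting scheme the C library already uses.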

C++: Is it possible to call an object's function before constructor completes?

In C++, is it possible to call a function of an instance before the constructor of that instance completes?
e.g. if A's constructor instantiates B and B's constructor calls one of A's functions.
Yes, that's possible. However, you are responsible for making sure that the invoked function doesn't try to access any sub-objects whose constructors haven't been called yet. This is quite error-prone, which is why it should usually be avoided.
This is very possible:
class A;
class B {
public:
B(A* pValue);
};
class A {
public:
A() {
B value(this);
}
void SomeMethod() {}
};
B::B(A* pValue) {
pValue->SomeMethod();
}
It's possible and sometimes practically necessary (although it amplifies the ability to level a city block inadvertently). For example, in C++98, instead of defining an artificial base class for common initialization, one often sees that done by an init function called from each constructor. I'm not talking about two-phase construction, which is just Evil, but about factoring out common initialization.
C++0x provides constructor forwarding which will help to alleviate the problem.
In practice it is dangerous: one has to be extra careful about what is initialized and what is not. On the purely formal side, there is some unnecessarily vague wording in the standard which can be construed as saying that the object doesn't really exist until a constructor has completed successfully. However, since that interpretation would make it UB to use e.g. an init function to factor out common initialization, which is a common practice, it can just be disregarded.
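The constructor forwarding mentioned above ended up in C++11 as delegating constructors. A minimal sketch of how it replaces a shared init function follows; Widget and its members are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <string>

class Widget {
public:
    // Both of these delegate to the three-line "real" constructor below,
    // so there is no init() member to call on a half-built object.
    Widget() : Widget(0, "unnamed") {}
    explicit Widget(int id) : Widget(id, "unnamed") {}

    // The target constructor does all the member initialization.
    Widget(int id, const std::string& name) : id_(id), name_(name) {}

    int id() const { return id_; }
    const std::string& name() const { return name_; }

private:
    int id_;
    std::string name_;
};

inline void widget_demo() {
    Widget a;
    assert(a.id() == 0 && a.name() == "unnamed");

    Widget b(5);
    assert(b.id() == 5 && b.name() == "unnamed");

    Widget c(9, "custom");
    assert(c.id() == 9 && c.name() == "custom");
}
```

Because all members are initialized by the target constructor's initializer list, the body of a delegating constructor only ever sees a fully constructed object.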
Why would you want to do that? No, it cannot be done, as you need to have an object as one of its parameters. A C++ member function and a C function are different things.
c++ code
class foo
{
int data;
void DoSomething()
{
data++;
}
};
int main()
{
foo a; //an object
a.data = 0; //set the data member to 0
a.DoSomething(); //the object is doing something with itself and is using 'data'
}
Here is a simple way to do it in C.
typedef void (*pDoSomething) ();
typedef struct __foo
{
int data;
pDoSomething ds; //<--pointer to DoSomething function
}foo;
void DoSomething(foo* this)
{
this->data++; //<-- C++ compiler won't compile this as C++ compiler uses 'this' as one of its keywords.
}
int main()
{
foo a;
a.ds = DoSomething; // you have to set the function.
a.data = 0;
a.ds(&a); //this is the same as C++ a.DoSomething code above.
}
Finally, the answer to your question is the code below.
void DoSomething(foo* this);
int main()
{
DoSomething( ?? ); //WHAT!?? We need to pass something here.
}
See, you need an object to pass to it. The answer is no.

Is this C++ structure initialization trick safe?

Instead of having to remember to initialize a simple 'C' structure, I might derive from it and zero it in the constructor like this:
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
memset(this, 0, sizeof(MY_STRUCT));
}
};
This trick is often used to initialize Win32 structures and can sometimes set the ubiquitous cbSize member.
Now, as long as there isn't a virtual function table for the memset call to destroy, is this a safe practice?
You can simply value-initialize the base, and all its members will be zeroed out. This is guaranteed.
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct():MY_STRUCT() { }
};
For this to work, there should be no user declared constructor in the base class, like in your example.
No nasty memset for that. It's not guaranteed that memset works in your code, even though it should work in practice.
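The same approach also covers the cbSize case from the question: value-initialize the base, then set the size member in the constructor body. This is a sketch with a made-up structure; SIZED_STRUCT and CSizedStruct are hypothetical stand-ins, not real Win32 types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C-style structure with a leading size member.
struct SIZED_STRUCT {
    std::size_t cbSize;
    int flags;
    void* reserved;
};

class CSizedStruct : public SIZED_STRUCT {
public:
    // SIZED_STRUCT() value-initializes the base, which zeroes every member;
    // the body then fills in the one member that must not be zero.
    CSizedStruct() : SIZED_STRUCT() { cbSize = sizeof(SIZED_STRUCT); }
};

inline void sized_struct_demo() {
    CSizedStruct s;
    assert(s.cbSize == sizeof(SIZED_STRUCT));
    assert(s.flags == 0);
    assert(s.reserved == 0);
}
```

Unlike memset, the pointer member here is initialized to a null pointer value regardless of how null pointers are represented.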
PREAMBLE:
While my answer is still Ok, I find litb's answer quite superior to mine because:
It teaches me a trick that I did not know (litb's answers usually have this effect, but this is the first time I write it down)
It answers exactly the question (that is, initializing the original struct's part to zero)
So please, consider litb's answer before mine. In fact, I suggest the question's author to consider litb's answer as the right one.
Original answer
Putting a true object (i.e. std::string) etc. inside will break, because the true object will be initialized before the memset, and then, overwritten by zeroes.
Using the initialization list doesn't work for g++ (I'm surprised...). Initialize it instead in the CMyStruct constructor body. It will be C++ friendly:
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct() { n1 = 0 ; n2 = 0 ; }
};
P.S.: I assumed you had no control over MY_STRUCT, of course. With control, you would have added the constructor directly inside MY_STRUCT and forgotten about inheritance. Note that you can add non-virtual methods to a C-like struct, and still have it behave as a struct.
EDIT: Added missing parenthesis, after Lou Franco's comment. Thanks!
EDIT 2 : I tried the code on g++, and for some reason, using the initialization list does not work. I corrected the code using the body constructor. The solution is still valid, though.
Please reevaluate my post, as the original code was changed (see changelog for more info).
EDIT 3 : After reading Rob's comment, I guess he has a point worthy of discussion: "Agreed, but this could be an enormous Win32 structure which may change with a new SDK, so a memset is future proof."
I disagree: Knowing Microsoft, it won't change because of their need for perfect backward compatibility. They will instead create an extended MY_STRUCTEx struct with the same initial layout as MY_STRUCT, with additional members at the end, and recognizable through a "size" member variable like the struct used for a RegisterWindow, IIRC.
So the only valid point remaining from Rob's comment is the "enormous" struct. In this case, perhaps a memset is more convenient, but you will have to make MY_STRUCT a variable member of CMyStruct instead of inheriting from it.
I see another hack, but I guess this would break because of possible struct alignment problem.
EDIT 4: Please take a look at Frank Krueger's solution. I can't promise it's portable (I guess it is), but it is still interesting from a technical viewpoint because it shows one case where, in C++, the "this" pointer "address" moves from its base class to its inherited class.
Much better than a memset, you can use this little trick instead:
MY_STRUCT foo = { 0 };
This will initialize all members to 0 (or their default value, IIRC); there is no need to specify a value for each.
This would make me feel much safer as it should work even if there is a vtable (or the compiler will scream).
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I'm sure your solution will work, but I doubt there are any guarantees to be made when mixing memset and classes.
This is a perfect example of porting a C idiom to C++ (and why it might not always work...)
The problem you will have with using memset is that in C++, a struct and a class are exactly the same thing except that by default, a struct has public visibility and a class has private visibility.
Thus, what if later on, some well meaning programmer changes MY_STRUCT like so:
struct MY_STRUCT
{
int n1;
int n2;
// Provide a default implementation...
virtual int add() {return n1 + n2;}
};
By adding that single function, your memset might now cause havoc.
There is a detailed discussion in comp.lang.c+
The examples have "unspecified behaviour".
For a non-POD, the order in which the compiler lays out an object (all base classes and members) is unspecified (ISO C++ 10/3). Consider the following:
struct A {
int i;
};
class B : public A { // 'B' is not a POD
public:
B ();
private:
int j;
};
This can be laid out as:
[ int i ][ int j ]
Or as:
[ int j ][ int i ]
Therefore, using memset directly on the address of 'this' is very much unspecified behaviour. One of the answers above, at first glance looks to be safer:
memset(static_cast<MY_STRUCT*>(this), 0, sizeof(MY_STRUCT));
I believe, however, that strictly speaking this too results in unspecified behaviour. I cannot find the normative text, however the note in 10/5 says: "A base class subobject may have a layout (3.7) different from the layout of a most derived object of the same type".
As a result, a compiler could perform space optimizations with the different members:
struct A {
char c1;
};
struct B {
char c2;
char c3;
char c4;
int i;
};
class C : public A, public B
{
public:
C ()
{
c1 = 10; // assigned in the body: a base-class member cannot appear in C's initializer list
memset(static_cast<B*>(this), 0, sizeof(B));
}
};
Can be laid out as:
[ char c1 ] [ char c2, char c3, char c4, int i ]
On a 32-bit system, due to alignment etc. for 'B', sizeof(B) will most likely be 8 bytes. However, sizeof(C) can also be 8 bytes if the compiler packs the data members. Therefore the call to memset might overwrite the value given to 'c1'.
Precise layout of a class or structure is not guaranteed in C++, which is why you should not make assumptions about the size of it from the outside (that means if you're not a compiler).
Probably it works, until you find a compiler on which it doesn't, or you throw some vtable into the mix.
If you already have a constructor, why not just initialize it there with n1=0; n2=0; -- that's certainly the more normal way.
Edit: Actually, as paercebal has shown, ctor initialization is even better.
My opinion is no. I'm not sure what it gains either.
As your definition of CMyStruct changes and you add/delete members, this can lead to bugs. Easily.
Create a constructor for CMyStruct that takes a MyStruct as a parameter.
CMyStruct::CMyStruct(MyStruct &)
Or something of that sort. You can then initialize a public or private 'MyStruct' member.
From an ISO C++ viewpoint, there are two issues:
(1) Is the object a POD? The acronym stands for Plain Old Data, and the standard enumerates what you can't have in a POD (Wikipedia has a good summary). If it's not a POD, you can't memset it.
(2) Are there members for which all-bits-zero is invalid? On Windows and Unix, the NULL pointer is all bits zero, but it need not be. Floating-point 0 has all bits zero in IEEE 754, which is quite common and is what x86 uses.
Frank Krueger's tip addresses your concerns by restricting the memset to the POD base of the non-POD class.
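Part of those two checks can be written down as compile-time assertions next to the memset. This is only a sketch, assuming C++11; Sample and make_zeroed_sample are hypothetical names. It verifies the POD-like property and IEEE 754 floating point, but there is no portable compile-time test for the null pointer representation, so that assumption stays undocumented by the compiler.

```cpp
#include <cassert>
#include <cstring>
#include <limits>
#include <type_traits>

// Hypothetical struct with integer and floating-point members.
struct Sample {
    int n;
    float f;
    double d;
};

// (1) The type must be trivial and standard-layout before memset is considered.
static_assert(std::is_trivial<Sample>::value && std::is_standard_layout<Sample>::value,
              "Sample must stay a POD-like type");

// (2) All-bits-zero is +0.0 when floating point follows IEEE 754 (IEC 559).
static_assert(std::numeric_limits<float>::is_iec559 &&
              std::numeric_limits<double>::is_iec559,
              "memset-to-zero of floating point assumes IEEE 754");

inline Sample make_zeroed_sample() {
    Sample s;
    std::memset(&s, 0, sizeof s);
    return s;
}

inline void zeroed_sample_demo() {
    Sample s = make_zeroed_sample();
    assert(s.n == 0 && s.f == 0.0f && s.d == 0.0);
}
```

If either assertion ever fires, value-initialization (Sample s = Sample();) is the fallback that does not depend on representation.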
Try this - overload new.
EDIT: I should add - This is safe because the memory is zeroed before any constructors are called. Big flaw - only works if object is dynamically allocated.
struct MY_STRUCT
{
int n1;
int n2;
};
class CMyStruct : public MY_STRUCT
{
public:
CMyStruct()
{
// whatever
}
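// needs <cstdlib>, <cstring> and <new>
void* operator new(std::size_t size)
{
// dangerous: memset is handed a null pointer if malloc fails
// return memset(malloc(size), 0, size);
// better: check the allocation before zeroing it
if (void *p = malloc(size))
{
return memset(p, 0, size);
}
else
{
throw std::bad_alloc();
}
}
void operator delete(void *p, std::size_t)
{
free(p);
}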
};
If MY_STRUCT is your code, and you are happy using a C++ compiler, you can put the constructor there without wrapping in a class:
struct MY_STRUCT
{
int n1;
int n2;
MY_STRUCT(): n1(0), n2(0) {}
};
I'm not sure about efficiency, but I hate doing tricks when you haven't proved efficiency is needed.
Comment on litb's answer (seems I'm not yet allowed to comment directly):
Even with this nice C++-style solution you have to be very careful that you don't apply this naively to a struct containing a non-POD member.
Some compilers then don't initialize correctly anymore.
See this answer to a similar question.
I personally had the bad experience on VC2008 with an additional std::string.
What I do is use aggregate initialization, but only specifying initializers for members I care about, e.g:
STARTUPINFO si = {
sizeof si, /*cb*/
0, /*lpReserved*/
0, /*lpDesktop*/
"my window" /*lpTitle*/
};
The remaining members will be initialized to zeros of the appropriate type (as in Drealmer's post). Here, you are trusting Microsoft not to gratuitously break compatibility by adding new structure members in the middle (a reasonable assumption). This solution strikes me as optimal - one statement, no classes, no memset, no assumptions about the internal representation of floating point zero or null pointers.
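The rule being relied on is that members without an explicit initializer in an aggregate initializer list are value-initialized. A self-contained sketch follows, using a made-up structure instead of STARTUPINFO so that it compiles without the Windows headers; WindowInfo and make_window_info are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a Win32-style structure.
struct WindowInfo {
    std::size_t cb;
    const char* title;
    int x;
    int y;
    double scale;
    void* reserved;
};

inline WindowInfo make_window_info() {
    // Only the first two members are listed; the rest are value-initialized,
    // which means 0, 0.0 and a null pointer respectively.
    WindowInfo wi = { sizeof(WindowInfo), "my window" };
    return wi;
}

inline void window_info_demo() {
    WindowInfo wi = make_window_info();
    assert(wi.cb == sizeof(WindowInfo));
    assert(wi.x == 0 && wi.y == 0);
    assert(wi.scale == 0.0);
    assert(wi.reserved == 0);
}
```

The catch is that initializers are positional, so this only reads well when the members you care about come first in the structure.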
I think the hacks involving inheritance are horrible style. Public inheritance means IS-A to most readers. Note also that you're inheriting from a class which isn't designed to be a base. As there's no virtual destructor, clients who delete a derived class instance through a pointer to base will invoke undefined behaviour.
I assume the structure is provided to you and cannot be modified. If you can change the structure, then the obvious solution is adding a constructor.
Don't over engineer your code with C++ wrappers when all you want is a simple macro to initialise your structure.
#include <stdio.h>
#define MY_STRUCT(x) MY_STRUCT x = {0}
struct MY_STRUCT
{
int n1;
int n2;
};
int main(int argc, char *argv[])
{
MY_STRUCT(s);
printf("n1(%d),n2(%d)\n", s.n1, s.n2);
return 0;
}
It's a bit of code, but it's reusable; include it once and it should work for any POD. You can pass an instance of this class to any function expecting a MY_STRUCT, or use the GetPointer function to pass it into a function that will modify the structure.
template <typename STR>
class CStructWrapper
{
private:
STR MyStruct;
public:
CStructWrapper() { STR temp = {}; MyStruct = temp;}
CStructWrapper(const STR &myStruct) : MyStruct(myStruct) {}
operator STR &() { return MyStruct; }
operator const STR &() const { return MyStruct; }
STR *GetPointer() { return &MyStruct; }
};
CStructWrapper<MY_STRUCT> myStruct;
CStructWrapper<ANOTHER_STRUCT> anotherStruct;
This way, you don't have to worry about whether NULLs are all 0, or floating point representations. As long as STR is a simple aggregate type, things will work. When STR is not a simple aggregate type, you'll get a compile-time error, so you won't have to worry about accidentally misusing this. Also, if the type contains something more complex, as long as it has a default constructor, you're ok:
struct MY_STRUCT2
{
int n1;
std::string s1;
};
CStructWrapper<MY_STRUCT2> myStruct2; // n1 is set to 0, s1 is set to "";
On the downside, it's slower since you're making an extra temporary copy, and the compiler will assign each member to 0 individually, instead of one memset.
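The extra temporary can probably be avoided by value-initializing the member directly in the constructor's initializer list. The sketch below is a variation on the wrapper above, not the original author's code; ZeroedStruct and value_ are hypothetical names. It assumes a compiler that implements value-initialization correctly, which, as noted earlier in this thread, some older ones did not for structs with non-POD members.

```cpp
#include <cassert>
#include <string>

struct MY_STRUCT {
    int n1;
    int n2;
};

struct MY_STRUCT2 {
    int n1;
    std::string s1;
};

template <typename STR>
class ZeroedStruct {
public:
    // value_() value-initializes the member in place: built-in members become
    // zero and class-type members are default-constructed, with no temporary copy.
    ZeroedStruct() : value_() {}
    ZeroedStruct(const STR& s) : value_(s) {}

    operator STR&() { return value_; }
    operator const STR&() const { return value_; }
    STR* GetPointer() { return &value_; }

private:
    STR value_;
};

inline void zeroed_struct_demo() {
    ZeroedStruct<MY_STRUCT> a;
    assert(a.GetPointer()->n1 == 0 && a.GetPointer()->n2 == 0);

    ZeroedStruct<MY_STRUCT2> b;
    assert(b.GetPointer()->n1 == 0);
    assert(b.GetPointer()->s1.empty());
}
```

Unlike the aggregate-based version, this one also accepts types that have their own default constructor, in which case that constructor decides what "zeroed" means.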