memset() or value initialization to zero out a struct? - c++

In Win32 API programming it's typical to use C structs with multiple fields. Usually only a couple of them have meaningful values and all others have to be zeroed out. This can be achieved in either of the two ways:
STRUCT theStruct;
memset( &theStruct, 0, sizeof( STRUCT ) );
or
STRUCT theStruct = {};
The second variant looks cleaner - it's a one-liner, it doesn't have any parameters that could be mistyped and lead to an error being planted.
Does it have any drawbacks compared to the first variant? Which variant to use and why?

Those two constructs a very different in their meaning. The first one uses a memset function, which is intended to set a buffer of memory to certain value. The second to initialize an object. Let me explain it with a bit of code:
Lets assume you have a structure that has members only of POD types ("Plain Old Data" - see What are POD types in C++?)
struct POD_OnlyStruct
{
int a;
char b;
};
POD_OnlyStruct t = {}; // OK
POD_OnlyStruct t;
memset(&t, 0, sizeof t); // OK as well
In this case writing a POD_OnlyStruct t = {} or POD_OnlyStruct t; memset(&t, 0, sizeof t) doesn't make much difference, as the only difference we have here is the alignment bytes being set to zero-value in case of memset used. Since you don't have access to those bytes normally, there's no difference for you.
On the other hand, since you've tagged your question as C++, let's try another example, with member types different from POD:
struct TestStruct
{
int a;
std::string b;
};
TestStruct t = {}; // OK
{
TestStruct t1;
memset(&t1, 0, sizeof t1); // ruins member 'b' of our struct
} // Application crashes here
In this case using an expression like TestStruct t = {} is good, and using a memset on it will lead to crash. Here's what happens if you use memset - an object of type TestStruct is created, thus creating an object of type std::string, since it's a member of our structure. Next, memset sets the memory where the object b was located to certain value, say zero. Now, once our TestStruct object goes out of scope, it is going to be destroyed and when the turn comes to it's member std::string b you'll see a crash, as all of that object's internal structures were ruined by the memset.
So, the reality is, those things are very different, and although you sometimes need to memset a whole structure to zeroes in certain cases, it's always important to make sure you understand what you're doing, and not make a mistake as in our second example.
My vote - use memset on objects only if it is required, and use the default initialization x = {} in all other cases.

Depending on the structure members, the two variants are not necessarily equivalent. memset will set the structure to all-bits-zero whereas value initialization will initialize all members to the value zero. The C standard guarantees these to be the same only for integral types, not for floating-point values or pointers.
Also, some APIs require that the structure really be set to all-bits-zero. For instance, the Berkeley socket API uses structures polymorphically, and there it is important to really set the whole structure to zero, not just the values that are apparent. The API documentation should say whether the structure really needs to be all-bits-zero, but it might be deficient.
But if neither of these, or a similar case, applies, then it's up to you. I would, when defining the structure, prefer value initialization, as that communicates the intent more clearly. Of course, if you need to zeroize an existing structure, memset is the only choice (well, apart from initializing each member to zero by hand, but that wouldn't normally be done, especially for large structures).

If your struct contains things like :
int a;
char b;
int c;
Then bytes of padding will be inserted between b and c. memset will zero those, the other way will not, so there will be 3 bytes of garbage (if your ints are 32 bits). If you intend to use your struct to read/write from a file, this might be important.

I would use value initialization because it looks clean and less error prone as you mentioned. I don't see any drawback in doing it.
You might rely on memset to zero out the struct after it has been used though.

Not that it's common, but I guess the second way also has the benefit of initializing floats to zero, while doing a memset would certainly not.

The value initialization is prefered because it can be done at compile time.
Also it correctly 0 initializes all POD types.
The memset is done at runtime.
Also using memset is suspect if the struct is not POD.
Does not correctly initialize (to zero) non int types.

In some compilers STRUCT theStruct = {}; would translate to memset( &theStruct, 0, sizeof( STRUCT ) ); in the executable. Some C functions are already linked in to do runtime setup so the compiler have these library functions like memset/memcpy available to use.

If there are lots of pointer members and you are likely to add more in the future, it can help to use memset. Combined with appropriate assert(struct->member) calls you can avoid random crashes from trying to deference a bad pointer that you forgot to initialize. But if you're not as forgetful as me, then member-initialization is probably the best!
However, if your struct is being used as part of a public API, you should get client code to use memset as a requirement. This helps with future proofing, because you can add new members and the client code will automatically NULL them out in the memset call, rather than leaving them in a (possibly dangerous) uninitialized state. This is what you do when working with socket structures for example.

Related

Is it safe to memset the plain struct with user-defined default constructor?

I know that if a C++ struct is Plain Old Data ("POD") then this guarantees there is no magic in its memory structure, so it means a memcpy to an array of bytes and memcpy back is safe.
I also know that in the standard a POD struct should not have user-defined constructor. In the project I own now, there are some plain structs (with only data fields) with a default constructor defined which initializies the data members to 0. I saw other clients would use memset(&obj, 0, sizeof obj); before using the struct.
Is it ok or safe to memset the non-POD struct before I use it?
Having a constructor does not make a struct non-POD.
An aggregate class is called a POD if it has no user-defined copy-assignment operator and destructor and none of its nonstatic members is a non-POD class, array of non-POD, or a reference.
Given that, it is perfectly safe to call
memset(&obj, 0, sizeof obj);
on an object of a struct that has a constructor as long as it is a POD.
Whether it is OK or not, depends. If the default constructor of the struct wants a member to be initialized to 1 for sane behavior, the above call to memset may impact the behavior of code that depends on 1 being the default value.
Take the example of the following struct:
struct Direction
{
Direction() : x(1.0), y(0.0), z(0.0) {}
double x;
double y;
double z;
};
An object of type Direction expects that at least of one of the components will be non-zero. You can't define a direction when all the components are zero. If you use memset to set everything to 0, code will likely break.
EDIT
It appears, from the comments below, as though the definition of a POD has changed from C++03 to C++11.
Using memset(&obj, 0, sizeof obj); may not be safe after all.
IMO this depends on the use case. I have seen memset used to set the data with white space character on few mainframe appications i.e
memset(&obj, ' ', sizeof(obj));
In case the struct defines a const variable and initializes the value, memset would override such value. So it depends and most cases safe to use memset to initialize for PODS. thats my 2 cents.

Why can it be dangerous to use this POD struct as a base class?

I had this conversation with a colleague, and it turned out to be interesting. Say we have the following POD class
struct A {
void clear() { memset(this, 0, sizeof(A)); }
int age;
char type;
};
clear is intended to clear all members, setting to 0 (byte wise). What could go wrong if we use A as a base class? There's a subtle source for bugs here.
The compiler is likely to add padding bytes to A. So sizeof(A) extends beyond char type (until the end of the padding). However in case of inheritance the compiler might not add the padded bytes. So the call to memset will overwrite part of the subclass.
In addition to the other notes, sizeof is a compile-time operator, so clear() will not zero out any members added by derived classes (except as noted due to padding weirdness).
There's nothing really "subtle" about this; memset is a horrible thing to be using in C++. In the rare cases where you really can just fill memory with zeros and expect sane behaviour, and you really need to fill the memory with zeros, and zero-initializing everything via the initializer list the civilized way is somehow unacceptable, use std::fill instead.
In theory, the compiler can lay out base classes differently. C++03 ยง10 paragraph 5 says:
A base class subobject might have a layout (3.7) different from the layout of a most derived object of the same type.
As StackedCrooked mentioned, this might happen by the compiler adding padding to the end of the base class A when it exists as its own object, but the compiler might not add that padding when it's a base class. This would cause A::clear() to overwrite the first few bytes of the members of the subclass.
However in practice, I have not been able to get this to happen with either GCC or Visual Studio 2008. Using this test:
struct A
{
void clear() { memset(this, 0, sizeof(A)); }
int age;
char type;
};
struct B : public A
{
char x;
};
int main(void)
{
B b;
printf("%d %d %d\n", sizeof(A), sizeof(B), ((char*)&b.x - (char*)&b));
b.x = 3;
b.clear();
printf("%d\n", b.x);
return 0;
}
And modifying A, B, or both to be 'packed' (with #pragma pack in VS and __attribute__((packed)) in GCC), I couldn't get b.x to be overwritten in any case. Optimizations were enabled. The 3 values printed for the sizes/offsets were always 8/12/8, 8/9/8, or 5/6/5.
The clear method of the base class will only set the values of the class members.
According to alignment rules, the compiler is allowed to insert padding so that the next data member will occur on the aligned boundary. Thus there will be padding after the type data member. The first data member of the descendant will occupy this slot and be free from the effects of memset, since the sizeof the base class does not include the size of the descendant. Size of parent != size of child (unless child has no data members). See slicing.
Packing of structures is not a part of the language standard. Hopefully, with a good compiler, the size of a packed structure does not include any extra bytes after the last. Even so, a packed descendant inheriting from a packed parent should produce the same result: parent sets only the data members in the parent.
Briefly: It seems to me that the only one potentional problem is in that I can not found any info about "padding bytes" guarantees in C89, C2003 standarts....Do they have some extraordinary volatile or readonly behavior - I can not find even what does term "padding bytes" mean by the standarts...
Detailed:
For objects of POD types it is guaranteed by the C++2003 standard that:
when you memcpy the contents of your object into an array of char or unsigned char, and then memcpy the contents back into your object, the object will hold its original value
guaranteed that there will be no padding in the beginning of a POD object
can break C++ rules about: goto statement, lifetime
For C89 there is also exist some guarantees about structures:
When used for a mixture of union structures if structs have same begining, then first compoments have perfect mathing
sizeof structures in C is equal to the amount of memory to store all the components, the place under the padding between the components, place padding under the following structures
In C components of the structure are given addresses. There is a guarantee that the components of the address are in ascending order. And the address of the first component coincides with the start address of the structure. Regardless of which endian the computer where the program runs
So It seems to me that such rules is appropriate to C++ also, and all is fine. I really think that in hardware level nobody will restrict you from write in padding bytes for non-const object.

C++, Resetting class members without per-member-assignment

In plain C it's common to reset a struct after instantiation:
struct MyClass obj;
memset( &obj, 0, sizeof(struct MyClass) );
This is convenient - especially when using an object oriented paradigm, since all members are guaranteed to be reset to null etc. no matter how many members are added over time.
I'm looking for a way to do the same in C++. Obviously you can't simply reset the memory since the vtable is part of it. Also, in my particular case I can't use templates.
One solution I've seen is to declare a struct with all members, which you in turn can reset in a single blow:
class MyClass{
MyClass(){ memset(&m, 0, sizeof(m)); }
struct{
int member;
} m;
};
I'm however not very fond of this solution.
I guess "hacks" are available, and if you know one, please also say something about the risks of using it, e.g. if it can differ between compilers etc.
Thanks
If you want to assure that you allocated a memory block with zeros you can use a placement new operator:
size_t sz = sizeof(MyClass);
char *buf = new char[sz];
memset(buf, 0, sz);
MyClass* instance = new (buf) MyClass;
Not to be a spoilsport, but why don't you simply add an initializer list for the members that need a defined value for the object to be valid, and let the compiler figure out the ideal way to initialize the object.
Depending on the type of the object, this can save quite an amount of code size and/or time, plus it is safe even if the "empty" representation for a certain type is not all-zeros. For example, I have hacked a compiler once that the NULL pointer is 0x02000000, and converting between integers and pointers XORs with that value; your program would then initialize any pointer members to non-NULL values.

Empty Data Member Optimization: would it be possible?

In C++, most of the optimizations are derived from the as-if rule. That is, as long as the program behaves as-if no optimization had taken place, then they are valid.
The Empty Base Optimization is one such trick: in some conditions, if the base class is empty (does not have any non-static data member), then the compiler may elide its memory representation.
Apparently it seems that the standard forbids this optimization on data members, that is even if a data member is empty, it must still take at least one byte worth of place: from n3225, [class]
4 - Complete objects and member subobjects of class type shall have nonzero size.
Note: this leads to the use of private inheritance for Policy Design in order to have EBO kick in when appropriate
I was wondering if, using the as-if rule, one could still be able to perform this optimization.
edit: following a number of answers and comments, and to make it clearer what I am wondering about.
First, let me give an example:
struct Empty {};
struct Foo { Empty e; int i; };
My question is, why is sizeof(Foo) != sizeof(int) ? In particular, unless you specify some packing, chances are due to alignment issues that Foo will be twice the size of int, which seems ridiculously inflated.
Note: my question is not why is sizeof(Foo) != 0, this is not actually required by EBO either
According to C++, it is because no sub-object may have a zero size. However a base is authorized to have a zero size (EBO) therefore:
struct Bar: Empty { int i; };
is likely (thanks to EBO) to obey sizeof(Bar) == sizeof(int).
Steve Jessop seems to be of an opinion that it is so that no two sub-objects would have the same address. I thought about it, however it doesn't actually prevent the optimization in most cases:
If you have "unused" memory, then it is trivial:
struct UnusedPadding { Empty e; Empty f; double d; int i; };
// chances are that the layout will leave some memory after int
But in fact, it's even "worse" than that, because Empty space is never written to (you'd better not if EBO kicks in...) and therefore you could actually place it at an occupied place that is not the address of another object:
struct Virtual { virtual ~Virtual() {} Empty e; Empty f; int i; };
// most compilers will reserve some space for a virtual pointer!
Or, even in our original case:
struct Foo { Empty e; int i; }; // deja vu!
One could have (char*)foo.e == (char*)foo.i + 1 if all we wanted were different address.
It is coming to c++20 with the [[no_unique_address]] attribute.
The proposal P0840r2 has been accepted into the draft standard. It has this example:
template<typename Key, typename Value, typename Hash, typename Pred, typename Allocator>
class hash_map {
[[no_unique_address]] Hash hasher;
[[no_unique_address]] Pred pred;
[[no_unique_address]] Allocator alloc;
Bucket *buckets;
// ...
public:
// ...
};
Under the as-if rule:
struct A {
EmptyThing x;
int y;
};
A a;
assert((void*)&(a.x) != (void*)&(a.y));
The assert must not be triggered. So I don't see any benefit in secretly making x have size 0, when you'd just need to add padding to the structure anyway.
I suppose in theory a compiler could track whether pointers might be taken to the members, and make the optimization only if they definitely aren't. This would have limited use, since there'd be two different versions of the struct with different layouts: one for the optimized case and one for general code.
But for example if you create an instance of A on the stack, and do something with it that is entirely inlined (or otherwise visible to the optimizer), yes, parts of the struct could be completely omitted. This isn't specific to empty objects, though - an empty object is just a special case of an object whose storage isn't accessed, and therefore could in some situations never be allocated at all.
C++ for technical reasons mandates that empty classes should have non-zero size.
This is to enforce that distinct objects have distinct memory addresses. So compilers silently insert a byte into "empty" objects.
This constraint does not apply to base class parts of derived classes as they are not free-standing.
Because Empty is a POD-type, you can use memcpy to overwrite its "representation", so it better not share it with another C++ object or useful data.
Given struct Empty { }; consider what happens if sizeof(Empty) == 0. Generic code that allocates heap for Empty objects could easily behave differently, as - for example - a realloc(p, n * sizeof(T)), where T is Empty, is then equivalent to free(p). If sizeof(Empty) != 0 then things like memset/memcpy etc. would try to work on memory regions that weren't in use by the Empty objects. So, the compiler would need to stitch up things like sizeof(Empty) on the basis of the eventual usage of the value - that sounds close to impossible to me.
Separately, under current C++ rules the assurance that each member has a distinct address means you can use those addresses to encode some state about those fields - e.g. a textual field name, whether some member function of the field object should be visited etc.. If addresses suddenly coincide, any existing code reliant on these keys could break.

C memset seems to not write to every member

I wrote a small coordinate class to handle both int and float coordinates.
template <class T>
class vector2
{
public:
vector2() { memset(this, 0, sizeof(this)); }
T x;
T y;
};
Then in main() I do:
vector2<int> v;
But according to my MSVC debugger, only the x value is set to 0, the y value is untouched. Ive never used sizeof() in a template class before, could that be whats causing the trouble?
No don't use memset -- it zeroes out the size of a pointer (4 bytes on my x86 Intel machine) bytes starting at the location pointed by this. This is a bad habit: you will also zero out virtual pointers and pointers to virtual bases when using memset with a complex class. Instead do:
template <class T>
class vector2
{
public:
// use initializer lists
vector2() : x(0), y(0) {}
T x;
T y;
};
As others are saying, memset() is not the right way to do this.
There are some subtleties, however, about why not.
First, your attempt to use memset() is only clearing sizeof(void *) bytes. For your sample case, that apparently is coincidentally the bytes occupied by the x member.
The simple fix would be to write memset(this, 0, sizeof(*this)), which in this case would set both x and y.
However, if your vector2 class has any virtual methods and the usual mechanism is used to represent them by your compiler, then that memset will destroy the vtable and break the instance by setting the vtable pointer to NULL. Which is bad.
Another problem is that if the type T requires some constructor action more complex than just settings its bits to 0, then the constructors for the members are not called, but their effect is ruined by overwriting the content of the members with memset().
The only correct action is to write your default constructor as
vector2(): x(0), y(0), {}
and to just forget about trying to use memset() for this at all.
Edit: D.Shawley pointed out in a comment that the default constructors for x and y were actually called before the memset() in the original code as presented. While technically true, calling memset() overwrites the members, which is at best really, really bad form, and at worst invokes the demons of Undefined Behavior.
As written, the vector2 class is POD, as long as the type T is also plain old data as would be the case if T were int or float.
However, all it would take is for T to be some sort of bignum value class to cause problems that could be really hard to diagnose. If you were lucky, they would manifest early through access violations from dereferencing the NULL pointers created by memset(). But Lady Luck is a fickle mistress, and the more likely outcome is that some memory is leaked, and the application gets "shaky". Or more likely, "shakier".
The OP asked in a comment on another answer "...Isn't there a way to make memset work?"
The answer there is simply, "No."
Having chosen the C++ language, and chosen to take full advantage of templates, you have to pay for those advantages by using the language correctly. It simply isn't correct to bypass the constructor (in the general case). While there are circumstances under which it is legal, safe, and sensible to call memset() in a C++ program, this just isn't one of them.
The problem is this is a Pointer type, which is 4 bytes (on 32bit systems), and ints are 4 bytes (on 32bit systems). Try:
sizeof(*this)
Edit: Though I agree with others that initializer lists in the constructor are probably the correct solution here.
Don't use memset. It'll break horribly on non-POD types (and won't necessarily be easy to debug), and in this case, it's likely to be much slower than simply initializing both members to zero (two assignments versus a function call).
Moreover, you do not usually want to zero out all members of a class. You want to zero out the ones for which zero is a meaningful default value. And you should get into the habit of initializing your members to a meaningful value in any case. Blanket zeroing everything and pretending the problem doesn't exist just guarantees a lot of headaches later. If you add a member to a class, decide whether that member should be initialized, and how.
If and when you do want memset-like functionality, at least use std::fill, which is compatible with non-POD types.
If you're programming in C++, use the tools C++ makes available. Otherwise, call it C.
dirkgently is correct. However rather that constructing x and y with 0, an explicit call to the default constructor will set intrinsic types to 0 and allow the template to be used for structs and classes with a default constructor.
template <class T>
class vector2
{
public:
// use initializer lists
vector2() : x(), y() {}
T x;
T y;
};
Don't try to be smarter than the compiler. Use the initializer lists as intended by the language. The compiler knows how to efficiently initialize basic types.
If you would try your memset hack on a class with virtual functions you would most likely overwrite the vtable ending up in a disaster. Don't use hack like that, they are a maintenance nightmare.
This might work instead:
char buffer[sizeof(vector2)];
memset(buffer, 0, sizeof(buffer));
vector2 *v2 = new (buffer) vector2();
..or replacing/overriding vector2::new to do something like that.
Still seems weird to me though.
Definitely go with
vector2(): x(0), y(0), {}