C++, Resetting class members without per-member-assignment - c++

In plain C it's common to reset a struct after instantiation:
struct MyClass obj;
memset( &obj, 0, sizeof(struct MyClass) );
This is convenient - especially when using an object oriented paradigm, since all members are guaranteed to be reset to null etc. no matter how many members are added over time.
I'm looking for a way to do the same in C++. Obviously you can't simply reset the memory since the vtable is part of it. Also, in my particular case I can't use templates.
One solution I've seen is to declare a struct with all members, which you in turn can reset in a single blow:
class MyClass {
public:
    // Zero only the nested struct, never the whole object (needs <cstring>).
    MyClass() { memset(&m, 0, sizeof(m)); }
private:
    struct {
        int member;
    } m;
};
I'm however not very fond of this solution.
I guess "hacks" are available, and if you know one, please also say something about the risks of using it, e.g. if it can differ between compilers etc.
Thanks

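If you want to be sure the object lives in a memory block that was zeroed beforehand, you can combine memset with placement new:
size_t sz = sizeof(MyClass);
char *buf = new char[sz];
memset(buf, 0, sz);
MyClass* instance = new (buf) MyClass;
// ... when done: destroy the object by hand, then release the raw buffer
instance->~MyClass();
delete[] buf;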

Not to be a spoilsport, but why don't you simply add an initializer list for the members that need a defined value for the object to be valid, and let the compiler figure out the ideal way to initialize the object?
Depending on the type of the object, this can save quite an amount of code size and/or time, plus it is safe even if the "empty" representation for a certain type is not all-zeros. For example, I once hacked a compiler so that the null pointer was represented as 0x02000000, and converting between integers and pointers XORed with that value; a memset-based reset would then leave every pointer member with a non-null value.
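A minimal sketch of the initializer-list approach suggested above. The member names and the Reset() method are hypothetical, and resetting by assigning a freshly constructed object is one possible idiom, not something the answer itself spells out. Because copy assignment copies members only, the vtable pointer is left alone:

```cpp
#include <cassert>
#include <string>

class MyClass {
public:
    // Every member gets a defined value here; the compiler picks the best
    // way to produce the "empty" representation for each type.
    MyClass() : count(0), ratio(0.0), name(), next(nullptr) {}
    virtual ~MyClass() {}

    // Reset by assigning a freshly constructed object. A member added later
    // is reset automatically once it appears in the constructor.
    void Reset() { *this = MyClass(); }

    // Hypothetical members, public only to keep the sketch short.
    int count;
    double ratio;
    std::string name;
    MyClass* next;
};
```

The cost is one temporary object per reset, which is usually negligible for small classes.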

Related

Reinterpret_cast vs placement new

From reading this post, it is clear that placement new in C++ is used to call a class constructor on a pre-allocated memory location.
In the case that the memory is already initialized, is a placement new or a reinterpret_cast more appropriate?
For example, let's say I read a raw stream of bytes representing a framed message from a TCP socket. I put this stream into a framesync and retrieve a buffer of a known size that represents my class, which I'll call Message. I know of two ways to proceed.
Create a constructor that takes a flag telling the class not to initialize. Do a placement new on the buffer passing the "don't initialize" flag.
Message::Message( bool initialize )
{
//
// Initialize if requested
//
if( initialize )
{
Reset( );
}
}
void Message::Reset( void )
{
m_member1 = 1;
m_member2 = 2;
}
Message* message = new ( buffer ) Message( false );
Use a reinterpret_cast
Message* message = reinterpret_cast< Message* > ( buffer );
I believe that both of these will produce an identical result. Is one preferred over the other as more correct, more OO, safer, easier to read, or better style?
The only meaningful rule is this:
If an instance of some type T has already been constructed at address a, then reinterpret_cast<T*>(a) to get a pointer to the object that already exists.
If an instance of some type T has not yet been constructed at address a, then use placement new to construct an instance of type T at address a.
They are completely different operations.
The question you need to ask is very, very simple: "does the object already exist?"
If yes, you can access it (via a cast). If no, then you need to construct it (via placement new)
The two operations have nothing to do with each other.
It's not a question of which one you should prefer, because they do different things. You should prefer the one which does what you want.
I would say neither.
Using placement new and having a special method of construction seems like a hack. For one thing the standard says that, for example, an int class member that's not initialized has 'indeterminate value' and accessing it 'may' result in undefined behavior. It's not specified that the int will assume the value of the unmodified underlying bytes interpreted as an int. I don't think that there's anything that prevents a conforming implementation from zero initializing the memory before calling the constructor.
For this use of reinterpret_cast to be well defined you have to jump through some hoops, and even then using the resulting object will probably violate strict aliasing rules.
More practically, if you directly send the implementation-specified representation of a class across the network you'll be relying on the communicating systems having compatible layouts (compatible representations, alignment, etc.).
Instead you should do real serialization and deserialization, for example by using memcpy() and ntohl()/ntohs() to get the data from the buffer into the members of an existing object.
struct Message {
uint32_t m_member1;
uint16_t m_member2;
};
extern char *buffer;
Message m;
memcpy(&m.m_member1, buffer, sizeof m.m_member1);
m.m_member1 = ntohl(m.m_member1);
buffer += sizeof m.m_member1;
memcpy(&m.m_member2, buffer, sizeof m.m_member2);
m.m_member2 = ntohs(m.m_member2);
buffer += sizeof m.m_member2;
If you don't just use a preexisting library you'll probably want to wrap this stuff up in a framework of your own.
This way you don't have to deal with alignment issues, the network representation is well defined and can be passed between differing implementations, and the program doesn't use technically undefined behavior.
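The same idea can be wrapped in a small function. This is only a sketch: read_be32, read_be16 and deserialize are hypothetical helper names, and instead of memcpy() plus ntohl()/ntohs() it assembles the integers byte by byte, which gives the same result regardless of the host's byte order and avoids the platform-specific socket headers. The wire format is assumed to be a 4-byte big-endian field followed by a 2-byte big-endian field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Message {
    std::uint32_t m_member1;
    std::uint16_t m_member2;
};

// Hypothetical helpers: decode big-endian (network order) integers.
inline std::uint32_t read_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

inline std::uint16_t read_be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Fills an existing Message from the wire bytes.
// Returns false when the buffer is too short to hold both fields.
inline bool deserialize(const unsigned char* buf, std::size_t len, Message& out) {
    if (len < 6) return false;
    out.m_member1 = read_be32(buf);
    out.m_member2 = read_be16(buf + 4);
    return true;
}
```

No Message object is ever conjured out of raw bytes here, so neither alignment nor aliasing rules come into play.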

Allocate a struct containing a string in a single allocation

I'm working on a program that stores a vital data structure as an unstructured string with program-defined delimiters (so we need to walk the string and extract the information we need as we go) and we'd like to convert it to a more structured data type.
In essence, this will require a struct with a field describing what kind of data the struct contains and another field that's a string with the data itself. The length of the string will always be known at allocation time. We've determined through testing that doubling the number of allocations required for each of these data types is an unacceptable cost. Is there any way to allocate the memory for the struct and the std::string contained in the struct in a single allocation? If we were using cstrings I'd just have a char * in the struct and point it to the end of the struct after allocating a block big enough for the struct and string, but we'd prefer std::string if possible.
Most of my experience is with C, so please forgive any C++ ignorance displayed here.
If you have such rigorous memory needs, then you're going to have to abandon std::string.
The best alternative is to find or write an implementation of basic_string_ref (a proposal for the C++ standard library, later standardized in C++17 as std::basic_string_view), which is really just a char* coupled with a size. But it has all of the (non-mutating) functions of std::basic_string. Then you use a factory function to allocate the memory you need (your struct size + string data), and then use placement new to initialize the basic_string_ref.
Of course, you'll also need a custom deletion function, since you can't just pass the pointer to "delete".
Given the previously linked to implementation of basic_string_ref (and its associated typedefs, string_ref), here's a factory constructor/destructor, for some type T that needs to have a string on it:
template<typename T> T *Create(..., const char *theString, size_t lenstr)
{
    char *memory = new char[sizeof(T) + lenstr + 1];
    char *copy = memory + sizeof(T);
    memcpy(copy, theString, lenstr);
    copy[lenstr] = '\0'; // terminate the copy; the +1 above reserves room for this
    try
    {
        // Refer to the copy stored after the object, not to the caller's string.
        return new(memory) T(..., string_ref(copy, lenstr));
    }
    catch(...)
    {
        delete[] memory;
        throw;
    }
}
template<typename T> T *Create(..., const std::string & theString)
{
    // T cannot be deduced from the arguments, so it is passed explicitly.
    return Create<T>(..., theString.c_str(), theString.length());
}
template<typename T> T *Create(..., const string_ref &theString)
{
    return Create<T>(..., theString.data(), theString.length());
}
template<typename T> void Destroy(T *pValue)
{
    pValue->~T();
    char *memory = reinterpret_cast<char*>(pValue);
    delete[] memory;
}
Obviously, you'll need to fill in the other constructor parameters yourself. And your type's constructor will need to take a string_ref that refers to the string.
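For a concrete picture, here is a self-contained sketch of the same factory idea for one specific type. Everything named here is hypothetical: StrRef is a minimal stand-in for string_ref, Record is an invented type with a "kind" field, and the sketch uses ::operator new rather than new char[] so that the block is suitably aligned for Record.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <string>

// Hypothetical stand-in for string_ref: a non-owning pointer plus a length.
struct StrRef {
    const char* data;
    std::size_t size;
    std::string str() const { return std::string(data, size); }
};

// Hypothetical record type: a kind tag plus a view of its own text.
struct Record {
    int kind;
    StrRef text;
    Record(int k, StrRef t) : kind(k), text(t) {}
};

// A single allocation holds the Record followed by the characters.
inline Record* CreateRecord(int kind, const char* s, std::size_t len) {
    void* memory = ::operator new(sizeof(Record) + len + 1);
    char* chars = static_cast<char*>(memory) + sizeof(Record);
    std::memcpy(chars, s, len);
    chars[len] = '\0';
    // Record's constructor cannot throw, so no try/catch is needed here.
    return new (memory) Record(kind, StrRef{chars, len});
}

// Counterpart to CreateRecord: plain delete must not be used on the pointer.
inline void DestroyRecord(Record* r) {
    r->~Record();
    ::operator delete(static_cast<void*>(r));
}
```

Copying a Record would copy the view but not the characters, so a real version would probably disable copying as well.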
If you are using std::string, you can't really do one allocation for both structure and string, and you also can't make the allocation of both to be one large block. If you are using old C-style strings it's possible though.
If I understand you correctly, you are saying that through profiling you have determined that the fact that you have to allocate a string and another data member in your data structure imposes an unacceptable cost to you application.
If that's indeed the case I can think of a couple solutions.
You could pre-allocate all of these structures up front, before your program starts. Keep them in some kind of fixed collection so they aren't copy-constructed, and reserve enough buffer in your strings to hold your data.
Controversial as it may seem, you could use old C-style char arrays. It seems like you are forgoing much of the reason to use strings in the first place, which is the memory management. However in your case, since you know the needed buffer sizes at start up, you could handle this yourself. If you like the other facilities that string provides, bear in mind that much of that is still available in <algorithm>.
Take a look at Variable Sized Struct C++ - the short answer is that there's no way to do it in vanilla C++.
Do you really need to allocate the container structs on the heap? It might be more efficient to have those on the stack, so they don't need to be allocated at all.
Indeed two allocations can seem too high. There are two ways to cut them down though:
Do a single allocation
Do a single dynamic allocation
It might not seem so different, so let me explain.
1. You can use the struct hack in C++
Yes this is not typical C++
Yes this requires special care
Technically it requires:
disabling the copy constructor and assignment operator
making the constructor and destructor private and providing factory methods for allocating and deallocating the object
Honestly, this is the hard way.
2. You can avoid allocating the outer struct dynamically
Simple enough:
struct M {
Kind _kind;
std::string _data;
};
and then pass instances of M on the stack. Move operations should guarantee that the std::string is not copied (you can always disable copy to make sure of it).
This solution is much simpler. The only (slight) drawback is in memory locality... but on the other hand the top of the stack is already in the CPU cache anyway.
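A small sketch of how such values can be passed around by move. The Kind enumerators and the make_m helper are hypothetical, and the members are named without the leading underscore. Only the outer struct avoids dynamic allocation; the std::string may still allocate once for its characters.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Text, Number };  // hypothetical kinds

struct M {
    Kind kind;
    std::string data;
};

// Builds an M by value; the string's storage is moved in, not copied.
inline M make_m(Kind kind, std::string text) {
    M m;
    m.kind = kind;
    m.data = std::move(text);
    return m;
}
```

Storing the result with v.push_back(std::move(m)) hands the string's buffer over to the element in the vector instead of duplicating it.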
C-style strings can always be converted to std::string as needed. In fact, there's a good chance that your observations from profiling are due to fragmentation of your data rather than simply the number of allocations, and creating an std::string on demand will be efficient. Of course, not knowing your actual application this is just a guess, and really one can't know this until it's tested anyways. I imagine a class
class my_class {
public:
    std::string data() const { return _data; }
    const char* data_as_c_str() const // In case you really need it!
    { return _data; }
private:
    int _type;
    char _data[1];
};
Note I used a standard clever C trick for data layout: _data is as long as you want it to be, so long as your factory function allocates the extra space for it. IIRC, C99 even gave a special syntax for it:
struct my_struct {
int type;
char data[];
};
which has good odds of working with your C++ compiler. (Is this in the C++11 standard?)
Of course, if you do do this, you really need to make all of the constructors private and friend your factory function, to ensure that the factory function is the only way to actually instantiate my_class -- it would be broken without the extra memory for the array. You'll definitely need to make operator= private too, or otherwise implement it carefully.
Rethinking your data types is probably a good idea.
For example, one thing you can do is, rather than trying to put your char arrays into a structured data type, use a smart reference instead. A class that looks like
class structured_data_reference {
public:
structured_data_reference(const char *data):_data(data) {}
std::string get_first_field() const {
// Do something interesting with _data to get the first field
}
private:
const char *_data;
};
You'll want to do the right thing with the other constructors and assignment operator too (probably disable assignment, and implement something reasonable for move and copy). And you may want reference counted pointers (e.g. std::shared_ptr) throughout your code rather than bare pointers.
Another hack that's possible is to just use std::string, but store the type information in the first entry (or first several). This requires accounting for that whenever you access the data, of course.
I'm not sure if this exactly addresses your problem. One way you can optimize memory allocation in C++ is by using a pre-allocated buffer together with the 'placement new' operator.
I tried to solve your problem as I understood it.
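alignas(std::max_align_t) unsigned char myPool[10000]; // needs <cstddef>, <new>, <string>
std::size_t poolUsed = 0;
// Hands out successive, suitably aligned chunks of myPool (no bounds check).
void* poolAlloc(std::size_t size)
{
    const std::size_t align = alignof(std::max_align_t);
    void* p = myPool + poolUsed;
    poolUsed += (size + align - 1) / align * align;
    return p;
}
struct myStruct
{
    myStruct(const char* aSource1, const char* aSource2)
    {
        original = new (poolAlloc(sizeof(std::string))) std::string(aSource1); //placement new
        data = new (poolAlloc(sizeof(std::string))) std::string(aSource2); //placement new
    }
    ~myStruct()
    {
        // The pool owns the storage, but the strings must still be destroyed:
        // std::string may hold heap memory of its own.
        data->~basic_string();
        original->~basic_string();
    }
    std::string* original;
    std::string* data;
};
int main()
{
    myStruct* aStruct = new (poolAlloc(sizeof(myStruct))) myStruct("h1", "h2");
    // Use the struct
    aStruct->~myStruct(); // run the destructor; the storage belongs to the pool
    return 0;
}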
[Edit] After the comment from NicolBolas, the problem is a bit more clear. I decided to write one more answer, even though in reality it is not much more advantageous than using a raw character array. But I still believe that this is well within the stated constraints.
The idea would be to provide a custom allocator for the string class as specified in this SO question.
In the implementation of the allocate method, use placement new as
pointer allocate(size_type n, void * = 0)
{
// fail if we try to allocate too much
if((n * sizeof(T))> max_size()) { throw std::bad_alloc(); }
//T* t = static_cast<T *>(::operator new(n * sizeof(T)));
T* t = new (/* provide the address of the original character buffer*/) T[n];
return t;
}
The constraint is that for the placement new to work, the original string address must be known to the allocator at run time. This can be achieved by setting it explicitly from outside before the new string member is created. However, this is not so elegant.
In essence, this will require a struct with a field describing what kind of data the struct contains and another field that's a string with the data itself.
I have a feeling that maybe you are not exploiting C++'s type system to its maximum potential here. It looks and feels very C-ish (that is not a proper word, I know). I don't have concrete examples to post here since I don't have any idea about the problem you are trying to solve.
Is there any way to allocate the memory for the struct and the std::string contained in the struct in a single allocation?
I believe that you are worrying about the structure allocation followed by a copy of the string to the structure member? This ideally shouldn't happen (but of course, this depends on how and when you are initializing the members). C++11 supports move construction. This should take care of any extra string copies that you are worried about.
You should really, really post some code to make this discussion worthwhile :)
a vital data structure as an unstructured string with program-defined delimiters
One question: Is this string mutable? If not, you can use a slightly different data-structure. Don't store copies of parts of this vital data structure but rather indices/iterators to this string which point to the delimiters.
// assume that !, [, ], $, % etc. are your program defined delims
const std::string vital = "!id[thisisdata]$[moredata]%[controlblock]%";
// define a special struct
enum Type { ... };
struct Info {
size_t start, end;
Type type;
// define appropriate ctors
};
// parse the string and return Info objects
std::vector<Info> parse(const std::string& str) {
std::vector<Info> v;
// loop through the string looking for delims
for (size_t b = 0, e = str.size(); b < e; ++b) {
// on hitting one such delim create an Info
switch( str[ b ] ) {
case '%':
...
case '$':
// initializing the start and then move until
// you get the appropriate end delim
}
// use push_back/emplace_back to insert this newly
// created Info object back in the vector
v.push_back( Info( start, end, kind ) );
}
return v;
}
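The outline above leaves the delimiter handling open. As a runnable illustration of the index-based idea only, the sketch below makes a simplifying assumption that is not in the original: every field is whatever sits between a '[' and the next ']', and the type tag is dropped. Span, parse_spans and field_text are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Position of one field inside the original string; no characters are copied.
struct Span {
    std::size_t start;  // index of the first character inside the brackets
    std::size_t end;    // index of the closing bracket (one past the last character)
};

// Records where each "[...]" field lives.
inline std::vector<Span> parse_spans(const std::string& str) {
    std::vector<Span> spans;
    std::size_t pos = 0;
    while ((pos = str.find('[', pos)) != std::string::npos) {
        const std::size_t close = str.find(']', pos + 1);
        if (close == std::string::npos) break;  // unterminated field: stop
        spans.push_back(Span{pos + 1, close});
        pos = close + 1;
    }
    return spans;
}

// Materializes a field as a std::string only when it is actually needed.
inline std::string field_text(const std::string& str, const Span& s) {
    return str.substr(s.start, s.end - s.start);
}
```

The only per-parse allocation is the vector's storage, which can be reserved up front if the number of fields is roughly known.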

Creating dynamically sized objects

Removed the C tag, seeing as that was causing some confusion (it shouldn't have been there to begin with; sorry for any inconvenience there). C answers are still welcome though :)
In a few things I've done, I've found the need to create objects that have both a static size and a dynamic size. The static part is the basic object members; the dynamic part is an array/buffer appended directly onto the class. This keeps the memory contiguous, which decreases the number of allocations needed (these are non-reallocatable objects) and decreases fragmentation. The downside is that it may be harder to find a block that is big enough, but that is a lot rarer (if it occurs at all) than heap fragmentation. This is also helpful on embedded devices where memory is at a premium (though I don't currently do anything for embedded devices) and things like std::string need to be avoided, or can't be used at all, as in the case of trivial unions.
Generally the way I'd go about this would be to (ab)use malloc (std::string is not used on purpose, and for various reasons):
struct TextCache
{
uint_32 fFlags;
uint_16 nXpos;
uint_16 nYpos;
TextCache* pNext;
char pBuffer[0];
};
TextCache* pCache = (TextCache*)malloc(sizeof(TextCache) + (sizeof(char) * nLength));
This however doesn't sit too well with me, as firstly I would like to do this using new, and thus in a C++ environment, and secondly, it looks horrible :P
So the next step was a templated C++ variant:
template <const size_t nSize> struct TextCache
{
uint_32 fFlags;
uint_16 nXpos;
uint_16 nYpos;
TextCache<nSize>* pNext;
char pBuffer[nSize];
};
This however has the problem that storing a pointer to a variable sized object becomes 'impossible', so then the next work around:
class DynamicObject {};
template <const size_t nSize> struct TextCache : DynamicObject {...};
This however still requires casting, and having pointers to DynamicObject all over the place becomes ambiguous when more than one dynamically sized object derives from it (it also looks horrible and can suffer from a bug that forces empty classes to still have a size, although that's probably an archaic, extinct bug...).
Then there was this:
class DynamicObject
{
void* operator new(size_t nSize, size_t nLength)
{
return malloc(nSize + nLength);
}
};
struct TextCache : DynamicObject {...};
which looks a lot better, but would interfere with objects that already have overloads of new(it could even affect placement new...).
Finally I came up with placement new abusing:
inline TextCache* CreateTextCache(size_t nLength)
{
char* pNew = new char[sizeof(TextCache) + nLength];
return new(pNew) TextCache;
}
This however is probably the worst idea so far, for quite a few reasons.
So are there any better ways to do this? Or would one of the above versions be better, or at least improvable? Is doing this even considered safe, or is it bad programming practice?
As I said above, I'm trying to avoid double allocations, because this shouldn't need two allocations, and because a single block makes writing (serializing) these things to files a lot easier.
The only exception to the double allocation requirement I have is when it's basically zero overhead. The only case where I have encountered that is where I sequentially allocate memory from a fixed buffer (using this system, that I came up with); however, it's also a special exception to prevent superfluous copying.
I'd go for a compromise with the DynamicObject concept. Everything that doesn't depend on the size goes into the base class.
struct TextBase
{
uint_32 fFlags;
uint_16 nXpos;
uint_16 nYpos;
TextBase* pNext;
};
template <const size_t nSize> struct TextCache : public TextBase
{
char pBuffer[nSize];
};
This should cut down on the casting required.
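A short sketch of why the split helps: code that only needs the size-independent fields can work through TextBase pointers, whatever buffer size each node was instantiated with. It uses std::uint32_t and std::uint16_t in place of the question's uint_32 and uint_16 (assumed to be project typedefs), and chain_length is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct TextBase {
    std::uint32_t fFlags;
    std::uint16_t nXpos;
    std::uint16_t nYpos;
    TextBase* pNext;
};

template <std::size_t nSize>
struct TextCache : public TextBase {
    char pBuffer[nSize];
};

// Walks a chain of differently sized nodes through the common base; no casts.
inline std::size_t chain_length(const TextBase* head) {
    std::size_t n = 0;
    for (const TextBase* p = head; p != nullptr; p = p->pNext) ++n;
    return n;
}
```

Reaching pBuffer through a TextBase pointer still needs a cast, and the buffer size is not recoverable from the base alone, so a length field in TextBase may be worth adding.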
C99 blesses the 'struct hack' - aka flexible array member.
§6.7.2.1 Structure and union specifiers
¶16 As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. With two
exceptions, the flexible array member is ignored. First, the size of the structure shall be
equal to the offset of the last element of an otherwise identical structure that replaces the
flexible array member with an array of unspecified length.106) Second, when a . (or ->)
operator has a left operand that is (a pointer to) a structure with a flexible array member
and the right operand names that member, it behaves as if that member were replaced
with the longest array (with the same element type) that would not make the structure
larger than the object being accessed; the offset of the array shall remain that of the
flexible array member, even if this would differ from that of the replacement array. If this
array would have no elements, it behaves as if it had one element but the behavior is
undefined if any attempt is made to access that element or to generate a pointer one past
it.
¶17 EXAMPLE Assuming that all array members are aligned the same, after the declarations:
struct s { int n; double d[]; };
struct ss { int n; double d[1]; };
the three expressions:
sizeof (struct s)
offsetof(struct s, d)
offsetof(struct ss, d)
have the same value. The structure struct s has a flexible array member d.
106) The length is unspecified to allow for the fact that implementations may give array members different
alignments according to their lengths.
Otherwise, use two separate allocations - one for the core data in the structure and the second for the appended data.
That may look heretical to the economy-minded C or C++ programmer, but the last time I had a similar problem to solve I chose to put a fixed-size static buffer in my struct and access it through a pointer indirection. If the data grew larger than the static buffer, the indirection pointer was pointed at a dynamically allocated buffer instead (and the internal buffer went unused). That was very simple to implement and solved the kind of issues you raised, like fragmentation: the static buffer was used in more than 95% of actual use cases, and the remaining 5% needed really large buffers, so I didn't care much about the small loss of the internal buffer.
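A minimal sketch of that inline-buffer-with-fallback scheme, assuming a 64-byte threshold. The class name, the member names and the threshold are all hypothetical; the answer gives no code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class SmallBuffer {
public:
    static constexpr std::size_t kInlineSize = 64;  // assumed threshold

    // Small requests use the embedded array; large ones fall back to the heap.
    explicit SmallBuffer(std::size_t size)
        : size_(size), data_(size <= kInlineSize ? inline_ : new char[size]) {
        std::memset(data_, 0, size_);
    }

    ~SmallBuffer() {
        if (data_ != inline_) delete[] data_;
    }

    // data_ may point into this very object, so copying is disabled.
    SmallBuffer(const SmallBuffer&) = delete;
    SmallBuffer& operator=(const SmallBuffer&) = delete;

    char* data() { return data_; }
    std::size_t size() const { return size_; }
    bool uses_heap() const { return data_ != inline_; }

private:
    std::size_t size_;
    char* data_;                 // the indirection: inline_ or a heap block
    char inline_[kInlineSize];   // unused whenever the heap block is active
};
```

The threshold would normally be picked from measurements so that the common case stays inline.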
I believe that, in C++, technically this is Undefined Behavior (due to alignment issues), although I suspect it can be made to work for probably every existing implementation.
But why do that anyway?
You could use placement new in C++:
char *buff = new char[sizeof(TextCache) + (sizeof(char) * nLength)];
TextCache *pCache = new (buff) TextCache;
The only caveat is that you need to delete[] buff instead of deleting pCache, and if TextCache has a destructor you'll have to call it manually.
If you are intending to access this extra area using pBuffer I'd recommend doing this:
struct TextCache
{
...
char *pBuffer;
};
...
char *buff = new char[sizeof(TextCache) + (sizeof(char) * nLength)];
TextCache *pCache = new (buff) TextCache;
pCache->pBuffer = new (buff + sizeof(TextCache)) char[nLength];
...
delete [] buff;
There's nothing wrong with managing your own memory.
template<typename DerivedType, typename ElemType> struct appended_array {
    ElemType* buffer;
    int length;
    ~appended_array() {
        // Destroy the appended elements; the storage is released in Destroy().
        for(int i = 0; i < length; i++)
            buffer[i].~ElemType();
    }
    static inline DerivedType* Create(int extra) {
        char* newbuf = new char[sizeof(DerivedType) + (extra * sizeof(ElemType))];
        DerivedType* ptr = new (newbuf) DerivedType();
        // The extra elements start right after the object itself.
        ElemType* extrabuf = (ElemType*)&newbuf[sizeof(DerivedType)];
        for(int i = 0; i < extra; i++)
            new (&extrabuf[i]) ElemType();
        ptr->length = extra;
        ptr->buffer = extrabuf;
        return ptr;
    }
    static inline void Destroy(DerivedType* ptr) {
        ptr->~DerivedType();   // runs ~appended_array(), which destroys the elements
        delete[] (char*)ptr;   // the block was allocated as char[] in Create()
    }
};
struct TextCache : appended_array<TextCache, char>
{
uint_32 fFlags;
uint_16 nXpos;
uint_16 nYpos;
TextCache* pNext;
// IT'S A MIRACLE! We have a buffer of size length and pointed to by buffer of type char that automagically appears for us in the Create function.
};
You should consider, however, that this optimization is premature and there are way better ways of doing it, like having an object pool or managed heap. Also, I didn't account for any alignment; however, it's my understanding that sizeof() returns the aligned size. Also, this will be a bitch to maintain for non-trivial construction. Also, this is totally untested. A managed heap is a way better idea. But you shouldn't be afraid of managing your own memory: if you have custom memory requirements, you need to manage your own memory.
Just occurred to me that I destructed but not deleted the "extra" memory.

Using memset on structures in C++

I am working on fixing older code for my job. It is currently written in C++. They converted static allocation to dynamic but didn't edit the memset/memcmp/memcpy calls. This is my first programming internship, so bear with my newbie-like question.
The following code is in C, but I want to have it in C++ ( I read that malloc isn't good practice in C++). I have two scenarios: First, we have f created. Then you use &f in order to fill with zero. The second is a pointer *pf. I'm not sure how to set pf to all 0's like the previous example in C++.
Could you just do pf = new foo instead of malloc and then call memset(pf, 0, sizeof(foo))?
struct foo { ... } f;
memset( &f, 0, sizeof(f) );
//or
struct foo { ... } *pf;
pf = (struct foo*) malloc( sizeof(*pf) );
memset( pf, 0, sizeof(*pf) );
Yes, but only if foo is a POD. If it's got virtual functions or anything else remotely C++ish, don't use memset on it since it'll stomp all over the internals of the struct/class.
What you probably want to do instead of memset is give foo a constructor to explicitly initialise its members.
If you want to use new, don't forget the corresponding delete. Even better would be to use shared_ptr :)
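One way to keep an existing memset from silently breaking when someone later adds a virtual function or a std::string member is to let the compiler check the precondition. This is a sketch: zero_fill and PlainFoo are hypothetical names, and the check uses the C++11 type traits as an approximation of "POD".

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Zeroes obj byte-wise, but refuses to compile for types where that is unsafe.
template <typename T>
void zero_fill(T& obj) {
    static_assert(std::is_trivially_copyable<T>::value &&
                  std::is_standard_layout<T>::value,
                  "zero_fill requires a plain C-style struct");
    std::memset(&obj, 0, sizeof(T));
}

// Hypothetical plain struct standing in for foo.
struct PlainFoo {
    int id;
    unsigned flags;
    char tag[8];
};
```

Calling zero_fill on a struct that holds a std::string, or that has virtual functions, then fails at compile time instead of corrupting the object at run time.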
Can you? Yes, probably. Should you? No.
While it will probably work, you're losing the state that the constructor has built for you. Adding to this, what happens when you decide to implement a subclass of this struct? Then you lose the advantage of reusable code that C++ OOP offers.
What you ought to do instead is create a constructor that initializes the members for you. This way, when you subclass this struct later on down the line, you just use this constructor to aid you in constructing the subclasses. This is free, safe code! Use it!
Edit: The caveat to this is that if you have a huge code base already, don't change it until you start subclassing the structs. It works as it is now.
Yes, that would work. However, I don't think malloc is necessarily bad practice, and I wouldn't change it just to change it. Of course, you should make sure you always match the allocation mechanisms properly (new->delete, malloc->free, etc.).
You could also add a constructor to the struct and use that to initialize the fields.
You could new foo (as is the standard way in C++) and implement a constructor which initialises foo rather than using memset.
E.g.
struct Something
{
Something()
: m_nInt( 5 )
{
}
int m_nInt;
};
Also don't forget if you use new to call delete when you are finished with the object otherwise you will end up with memory leaks.

memset() or value initialization to zero out a struct?

In Win32 API programming it's typical to use C structs with multiple fields. Usually only a couple of them have meaningful values and all others have to be zeroed out. This can be achieved in either of the two ways:
STRUCT theStruct;
memset( &theStruct, 0, sizeof( STRUCT ) );
or
STRUCT theStruct = {};
The second variant looks cleaner - it's a one-liner, it doesn't have any parameters that could be mistyped and lead to an error being planted.
Does it have any drawbacks compared to the first variant? Which variant to use and why?
Those two constructs are very different in their meaning. The first one uses the memset function, which is intended to set a buffer of memory to a certain value. The second initializes an object. Let me explain it with a bit of code:
Let's assume you have a structure that has members only of POD types ("Plain Old Data" - see What are POD types in C++?)
struct POD_OnlyStruct
{
int a;
char b;
};
POD_OnlyStruct t = {}; // OK
POD_OnlyStruct t2; // a second name, so that both variants compile side by side
memset(&t2, 0, sizeof t2); // OK as well
In this case writing a POD_OnlyStruct t = {} or POD_OnlyStruct t; memset(&t, 0, sizeof t) doesn't make much difference, as the only difference we have here is the alignment bytes being set to zero-value in case of memset used. Since you don't have access to those bytes normally, there's no difference for you.
On the other hand, since you've tagged your question as C++, let's try another example, with member types different from POD:
struct TestStruct
{
int a;
std::string b;
};
TestStruct t = {}; // OK
{
TestStruct t1;
memset(&t1, 0, sizeof t1); // ruins member 'b' of our struct
} // Application crashes here
In this case using an expression like TestStruct t = {} is good, while using memset on it is undefined behavior and will typically lead to a crash. Here's what happens if you use memset: an object of type TestStruct is created, thus creating an object of type std::string, since it's a member of our structure. Next, memset overwrites the memory where the object b is located with a certain value, say zero. Now, once our TestStruct object goes out of scope, it is going to be destroyed, and when the turn comes to its member std::string b you'll likely see a crash, as all of that object's internal structures were ruined by the memset.
So, the reality is, those things are very different, and although you sometimes need to memset a whole structure to zeroes in certain cases, it's always important to make sure you understand what you're doing, and not make a mistake as in our second example.
My vote - use memset on objects only if it is required, and use value initialization (x = {}) in all other cases.
Depending on the structure members, the two variants are not necessarily equivalent. memset will set the structure to all-bits-zero whereas value initialization will initialize all members to the value zero. The C standard guarantees these to be the same only for integral types, not for floating-point values or pointers.
Also, some APIs require that the structure really be set to all-bits-zero. For instance, the Berkeley socket API uses structures polymorphically, and there it is important to really set the whole structure to zero, not just the values that are apparent. The API documentation should say whether the structure really needs to be all-bits-zero, but it might be deficient.
But if neither of these, or a similar case, applies, then it's up to you. I would, when defining the structure, prefer value initialization, as that communicates the intent more clearly. Of course, if you need to zeroize an existing structure, memset is the only choice (well, apart from initializing each member to zero by hand, but that wouldn't normally be done, especially for large structures).
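In C++ there is one more option for an existing object: assign it a value-initialized temporary. The sketch below uses hypothetical names (reset_to_default, Settings). Unlike memset, it stays correct when the struct contains non-POD members, but it makes no promise about padding bytes, so an API that requires all-bits-zero still needs memset.

```cpp
#include <cassert>
#include <string>

// Resets an existing object to its value-initialized state.
template <typename T>
void reset_to_default(T& obj) {
    obj = T();  // T() zero-initializes members that have no constructor of their own
}

// Hypothetical struct mixing plain members with a non-POD one.
struct Settings {
    int width;
    double scale;
    const char* label;
    std::string title;
};
```

For a struct like this one, with no user-provided constructor, T() yields zero for the integer, 0.0 for the double and a null pointer, whatever bit patterns those happen to have on the platform.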
If your struct contains things like :
int a;
char b;
int c;
Then bytes of padding will be inserted between b and c. memset will zero those, the other way will not, so there will be 3 bytes of garbage (if your ints are 32 bits). If you intend to use your struct to read/write from a file, this might be important.
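The amount of padding can be measured rather than assumed. In this sketch Padded and padding_after_b are hypothetical names; on a typical platform with 4-byte, 4-byte-aligned int the function reports 3, but the exact layout is up to the implementation.

```cpp
#include <cassert>
#include <cstddef>

struct Padded {
    int a;
    char b;
    int c;
};

// Number of unused bytes between the end of b and the start of c.
inline std::size_t padding_after_b() {
    return offsetof(Padded, c) - (offsetof(Padded, b) + sizeof(char));
}
```

When such a struct is written to a file byte for byte, those padding bytes go along with it, which is why zeroing them with memset can matter for reproducible output.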
I would use value initialization because it looks clean and less error prone as you mentioned. I don't see any drawback in doing it.
You might rely on memset to zero out the struct after it has been used though.
Not that it's common, but I guess the second way also has the benefit of initializing floats to zero, whereas memset only produces all-bits-zero, which the standard does not guarantee to represent 0.0 (although it does on IEEE 754 platforms).
Value initialization is preferred because it can be done at compile time.
It also correctly zero-initializes all POD types.
memset is done at runtime.
Using memset is also suspect if the struct is not a POD.
memset is also not guaranteed to correctly initialize (to zero) non-integer types.
In some compilers STRUCT theStruct = {}; would translate to memset( &theStruct, 0, sizeof( STRUCT ) ); in the executable. Some C functions are already linked in to do runtime setup, so the compiler has library functions like memset/memcpy available to use.
If there are lots of pointer members and you are likely to add more in the future, it can help to use memset. Combined with appropriate assert(struct->member) calls you can avoid random crashes from trying to dereference a bad pointer that you forgot to initialize. But if you're not as forgetful as me, then member-initialization is probably the best!
However, if your struct is being used as part of a public API, you should get client code to use memset as a requirement. This helps with future proofing, because you can add new members and the client code will automatically NULL them out in the memset call, rather than leaving them in a (possibly dangerous) uninitialized state. This is what you do when working with socket structures for example.