Using memset on structures in C++ - c++

I am working on fixing older code for my job. It is currently written in C++. They converted static allocation to dynamic but didn't edit the memsets/memcmp/memcpy. This is my first programming internship, so bear with my newbie-like question.
The following code is in C, but I want to have it in C++ ( I read that malloc isn't good practice in C++). I have two scenarios: First, we have f created. Then you use &f in order to fill with zero. The second is a pointer *pf. I'm not sure how to set pf to all 0's like the previous example in C++.
Could you just do pf = new foo instead of malloc and then call memset(pf, 0, sizeof(foo))?
struct foo { ... } f;
memset( &f, 0, sizeof(f) );
//or
struct foo { ... } *pf;
pf = (struct foo*) malloc( sizeof(*pf) );
memset( pf, 0, sizeof(*pf) );

Yes, but only if foo is a POD. If it's got virtual functions or anything else remotely C++ish, don't use memset on it since it'll stomp all over the internals of the struct/class.
What you probably want to do instead of memset is give foo a constructor to explicitly initialise its members.
If you want to use new, don't forget the corresponding delete. Even better would be to use shared_ptr :)
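To make the "only if foo is a POD" condition concrete, here is a minimal sketch. The struct Foo and the helpers zero_fill and make_zeroed_foo are made up for illustration (the question elides the real members). The static_assert turns a memset on an unsuitable type into a compile error, and new Foo() with parentheses value-initializes the object, so for a struct without user-provided constructors no memset should be needed at all.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical POD-style struct standing in for the question's foo.
struct Foo {
    int id;
    double weight;
    char tag[8];
};

// Zero-fill an object, but refuse to compile for types where memset is unsafe
// (anything with virtual functions, non-trivial constructors, std::string members, ...).
template <typename T>
void zero_fill(T& obj) {
    static_assert(std::is_trivial<T>::value && std::is_standard_layout<T>::value,
                  "zero_fill requires a POD-like type");
    std::memset(&obj, 0, sizeof(T));
}

// new Foo() (note the parentheses) value-initializes the object, which zeroes
// every member here. The caller owns the result and must delete it.
Foo* make_zeroed_foo() {
    Foo* pf = new Foo();
    assert(pf->id == 0);
    return pf;
}
```

If someone later adds a std::string or a virtual function to Foo, zero_fill stops compiling instead of silently corrupting the object.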

Can you? Yes, probably. Should you? No.
While it will probably work, you're losing the state that the constructor has built for you. Adding to this, what happens when you decide to implement a subclass of this struct? Then you lose the advantage of reusable code that C++ OOP offers.
What you ought to do instead is create a constructor that initializes the members for you. This way, when you subclass this struct later on down the line, you just use this constructor to aid you in constructing the subclasses. This is free, safe code! Use it!
Edit: The caveat to this is that if you have a huge code base already, don't change it until you start subclassing the structs. It works as it is now.

Yes, that would work. However, I don't think malloc is necessarily bad practice, and I wouldn't change it just to change it. Of course, you should make sure you always match the allocation mechanisms properly (new->delete, malloc->free, etc.).
You could also add a constructor to the struct and use that to initialize the fields.

You could new foo (as is the standard way in C++) and implement a constructor which initialises foo rather than using memset.
E.g.
struct Something
{
Something()
: m_nInt( 5 )
{
}
int m_nInt;
};
Also, if you use new, don't forget to call delete when you are finished with the object; otherwise you will end up with memory leaks.
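As a rough sketch of the constructor approach combined with the shared_ptr suggestion from an earlier answer: the member m_pNext and the function make_something are hypothetical additions for illustration. The constructor gives every member a defined value, and std::make_shared takes care of the matching delete.

```cpp
#include <cassert>
#include <memory>

struct Something {
    // The initializer list replaces the memset: each member gets a defined value.
    Something() : m_nInt(5), m_pNext(nullptr) {}

    int m_nInt;
    Something* m_pNext;  // hypothetical extra member, not in the original example
};

// make_shared allocates and constructs in one step; the object is deleted
// automatically when the last shared_ptr to it goes away.
std::shared_ptr<Something> make_something() {
    std::shared_ptr<Something> p = std::make_shared<Something>();
    assert(p->m_pNext == nullptr);
    return p;
}
```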

Related

C++, Resetting class members without per-member-assignment

In plain C it's common to reset a struct after instantiation:
struct MyClass obj;
memset( &obj, 0, sizeof(struct MyClass) );
This is convenient - especially when using an object oriented paradigm, since all members are guaranteed to be reset to null etc. no matter how many members are added over time.
I'm looking for a way to do the same in C++. Obviously you can't simply reset the memory since the vtable is part of it. Also, in my particular case I can't use templates.
One solution I've seen is to declare a struct with all members, which you in turn can reset in a single blow:
class MyClass{
MyClass(){ memset(&m, 0, sizeof(m)); }
struct{
int member;
} m;
};
I'm however not very fond of this solution.
I guess "hacks" are available, and if you know one, please also say something about the risks of using it, e.g. if it can differ between compilers etc.
Thanks
If you want to ensure that the object is constructed in a zero-filled memory block, you can zero a buffer and then use the placement new operator:
size_t sz = sizeof(MyClass);
char *buf = new char[sz];
memset(buf, 0, sz);
MyClass* instance = new (buf) MyClass;
Not to be a spoilsport, but why don't you simply add an initializer list for the members that need a defined value for the object to be valid, and let the compiler figure out the ideal way to initialize the object?
Depending on the type of the object, this can save quite an amount of code size and/or time, plus it is safe even if the "empty" representation for a certain type is not all-zeros. For example, I once hacked a compiler so that the NULL pointer is 0x02000000 and converting between integers and pointers XORs with that value; with memset, your program would then initialize any pointer members to non-NULL values.
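One template-free way to get the "reset everything in one blow" effect is to give each member a default initializer and reset by assigning a freshly constructed object. This is only a sketch: the members ratio, label and name and the functions kind and reset are invented for illustration, and it assumes a C++11 compiler. Assignment copies members one by one, so the vtable pointer is never touched.

```cpp
#include <cassert>
#include <string>

class MyClass {
public:
    // A virtual function, so memset on the whole object would be unsafe.
    virtual const char* kind() const { return "MyClass"; }

    // Assigning a default-constructed temporary resets every member to its
    // default initializer, including members added later.
    void reset() {
        *this = MyClass();
        assert(member == 0);
    }

    int member = 0;
    double ratio = 0.0;
    const char* label = nullptr;  // a real null pointer, whatever its bit pattern
    std::string name;             // default-constructs to the empty string
};
```

A new member only needs its own default initializer; reset() does not have to change.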

c++ string in C struct, is it illegal?

struct run_male_walker_struct {
string male_user_name;
string show_name;
};
typedef struct run_male_walker_struct run_male_walker_struct_t;
in another function:
run_male_walker_struct_t *p = malloc(sizeof(struct run_male_walker_struct));
Question: is this illegal? Since string is a class, its size can't be determined by sizeof().
This is illegal, but not for the reasons you're thinking.
The difference between std::malloc()/std::free() and new/delete is that the latter will call constructors/destructors, while the former won't. The expression
void* p = std::malloc(sizeof(run_male_walker_struct))
will return a blob of uninitialized memory on which no constructor is called. You shouldn't touch it with a ten foot pole - except for invoking a constructor on it:
run_male_walker_struct* pw = new(p) run_male_walker_struct;
If you do this, you will have to do the reverse, too:
pw->~run_male_walker_struct();
before you free the memory:
std::free(p);
However, that leaves the question why you want to do that.
The only reason to do this should be when you want to separate memory allocation from construction (like, for example, in a pool allocator). But if you need that, it's best hidden behind some interface. A natural one would be overloading new and delete per class. Also, std::vector does this internally.
Not really sure what you're asking here... Just to be clear, the struct keyword is a valid C++ designation, that functions nearly identically to class except for the default privacy. So if you're compiling with g++, and including the string library, this is a valid statement.
However, calling malloc() will just give you the memory, not actually construct the values inside that struct. You could more appropriately instantiate it by calling its default constructor.
The struct definition itself is fine. It results in a non-POD aggregate. But you should prefer the use of new and delete over malloc and free because these handle construction and destruction properly. If you want to keep using malloc and free you have to use placement new to properly construct the object and invoke the destructor manually to destroy it before you free it:
#include <new>
...
run_male_walker_struct *p = (run_male_walker_struct*)
malloc(sizeof(run_male_walker_struct));
new(p) run_male_walker_struct; // <-- placement-new
...
p->~run_male_walker_struct(); // <-- pseudo destructor call
free(p);
Or simply:
run_male_walker_struct *p = new run_male_walker_struct;
...
delete p;
BTW: the typedef is not necessary in C++
Try not to use malloc if you are in C++.
Using new is a better alternative. If you browse the implementation of operator new, you will often find that it calls malloc internally.
The advantage of using new is that it calls the constructor of the class being instantiated.
Another minor comment, the code you provided should not be compilable:
run_male_walker_struct_t *p = malloc(sizeof(struct run_male_walker_struct));
Should be
run_male_walker_struct_t *p = (run_male_walker_struct_t*)malloc(sizeof(struct run_male_walker_struct));
This is because malloc returns a void*, which C++ does not implicitly convert to other pointer types.
Using malloc() would work, but using it will only create enough space for your struct.
This means that you will not be able to use your strings properly, because they weren't initialised with their constructors.
Note that string classes usually keep their contents in dynamic memory rather than inside the object itself, so the length of the text doesn't affect the size of the struct. All classes and structs have a static size that is known at compile time (if the struct/class was defined).
I would suggest using new. Using malloc will stuff up the strings.
This raises a question of my own, how did constructors get called on dynamically allocated instantiation in C (were there no such things as constructors in C?). If so, yet another reason against using pure C.
How about
run_male_walker_struct_t * p = new run_male_walker_struct_t;
I'm fairly sure this is legal because the size of the std::string object will be known even if the lengths of the strings are not known. The results may not be what you expect though because malloc won't call constructors.
Try this:
std::string testString1("babab");
std::string testString2("12345678");
std::string testString3;
std::cout <<" sizeof(testString1)" <<sizeof(testString1) << std::endl;
std::cout <<" sizeof(testString2)" <<sizeof(testString2) << std::endl;
std::cout <<" sizeof(testString3)" <<sizeof(testString3) << std::endl;
On my machine this gives me the following output:
sizeof(testString1)8
sizeof(testString2)8
sizeof(testString3)8
Also is there some reason you are not using:
run_male_walker_struct_t *p = new(struct run_male_walker_struct);
This is the correct way to do it in c++, using malloc is almost certainly a mistake.
EDIT: see this page for a more detailed explanation of new vs malloc in c++:
http://www.codeproject.com/KB/tips/newandmalloc.aspx
The answer depends on what you mean by a "C struct".
If you mean "a struct that is valid under the C language", then the answer is obviously: it contains a datatype that isn't valid C, and so the struct itself isn't valid either.
If you mean a C++ POD type, then the answer is no, it is not illegal, but the struct is no longer a POD type (because in order to be POD, all its members must be POD as well, and std::string isn't)
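The POD distinction can be checked by the compiler. In the sketch below, plain_walker, string_walker, is_trivial_type and make_walker are hypothetical names chosen for illustration. The struct with char arrays is trivial, while the one with std::string members is not, which is why the latter needs new (or placement new) rather than a bare malloc.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// C-compatible version: fixed-size character buffers only.
struct plain_walker {
    char male_user_name[32];
    char show_name[32];
};

// C++ version from the question: std::string members need their constructors run.
struct string_walker {
    std::string male_user_name;
    std::string show_name;
};

// Thin wrapper over the standard trait, usable in static_assert.
template <typename T>
constexpr bool is_trivial_type() {
    return std::is_trivial<T>::value;
}

static_assert(is_trivial_type<plain_walker>(), "char arrays keep the struct trivial");
static_assert(!is_trivial_type<string_walker>(), "std::string members make it non-trivial");

// new runs both std::string constructors; the caller must delete the result.
string_walker* make_walker() {
    string_walker* p = new string_walker;
    assert(p->male_user_name.empty());
    return p;
}
```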

Size of class instance

I'm working with a class for which the new operator has been made private, so that the only way to get an instance is to write
Foo foo = Foo()
Writing
Foo* foo = new Foo()
does not work.
But because I really want a pointer to it, I simulate that with the following :
Foo* foo = (Foo*)malloc(sizeof(Foo));
*foo = Foo();
so that I can test whether the pointer is null to know whether it has already been initialized.
It looks like it works, from empirical tests, but is it possible that not enough space had been allocated by malloc ? Or that something else gets funny ?
--- edit ---
I didn't mention the context because I was not actually sure why the new operator was disabled. This class is part of a constraint programming library (gecode), and I thought it may be disabled in order to enforce the documented way of specifying a model.
I didn't know about the Concrete Data Type idiom, which looks like a more plausible reason.
That allocation scheme may be fine when specifying a standard model --- in which everything is specified as CDTs in the Space-derived class --- but in my case, these instances are each created by specific classes and then passed by reference to the constructor of the class that represents the model.
About the reason I'm not using the
Foo f;
Foo *pf = &f;
it would be like doing case 1 below, which throws a "returning reference to local variable" warning
int& f() { int a=5; return a; } // case 1
int& f() { int a=5; int* ap=&a; return *ap; } // case 2
int& f() { int* ap=(int*)malloc(sizeof(int)); *ap=5; return *ap; } // case 3
This warning disappears when adding a pointer in case 2, but I guess that is because the compiler loses track.
So the only option left is case 3 (not to mention that, additionally, ap is a member of a class that will be initialized only once when f is called, will be null otherwise, and f is the only function returning a reference to it). That way, I am sure that ap won't lose its meaning because of the compiler optimizing it away (can that happen?).
But I guess this reaches far too much beyond the scope of the original question now...
Don't use malloc with C++ classes. malloc is different from new in the very important respect that new calls the class' constructor, but malloc does not.
You can get a pointer in a couple ways, but first ask yourself why? Are you trying to dynamically allocate the object? Are you trying to pass pointers around to other functions?
If you're passing pointers around, you may be better off passing references instead:
void DoSomething(Foo& my_foo)
{
my_foo.do_it();
}
If you really need a pointer (maybe because you can't change the implementation of DoSomething), then you can simply take the pointer to an automatic:
Foo foo;
DoSomething(&foo);
If you need to dynamically allocate the Foo object, things get a little trickier. Someone made the new operation private for a reason. Probably a very good reason. There may be a factory method on Foo like:
class Foo
{
public:
static Foo* MakeFoo();
private:
};
..in which case you should call that. Otherwise you're going to have to edit the implementation of Foo itself, and that might not be easy or a good thing to do.
Be careful about breaking the Concrete Data Type idiom.
You are trying to circumvent the fact that the new operator has been made private, i.e. the Concrete Data Type idiom/pattern. The new operator was probably made private for specific reasons, e.g. another part of the design may depend on this restriction. Trying to get around this to dynamically allocate an instance of the class is trying to circumvent the design and may cause other problems or other unexpected behavior. I wouldn't suggest trying to circumvent this without studying the code thoroughly to ensure you understand the impact to other parts of the class/code.
Concrete Data Type
http://users.rcn.com/jcoplien/Patterns/C++Idioms/EuroPLoP98.html#ConcreteDataType
Solutions
...
Objects that represent abstractions that live "inside" the program, closely tied to the computational model, the implementation, or the programming language, should be declared as local (automatic or static) instances or as member instances. Collection classes (string, list, set) are examples of this kind of abstraction (though they may use heap data, they themselves are not heap objects). They are concrete data types--they aren't "abstract," but are as concrete as int and double.
class ScopedLock
{
private:
static void * operator new (std::size_t size); // Disallow dynamic allocation
static void * operator new (std::size_t size, void * mem); // Disallow placement new as well.
};
int main (void)
{
ScopedLock s; // Allowed
ScopedLock * sl = new ScopedLock (); // Standard new and nothrow new are not allowed.
void * buf = ::operator new (sizeof (ScopedLock));
ScopedLock * s2 = new(buf) ScopedLock; // Placement new is also not allowed
}
ScopedLock object can't be allocated dynamically with standard uses of new operator, nothrow new, and the placement new.
The funny thing that would happen results from the constructor not being called for *foo. It will only work if it is a POD (simple built-in types for members + no constructor). Otherwise, when using assignment, it may not work out right, if the left-hand side is not already a valid instance of the class.
It seems, you can still validly allocate an instance on the heap with
Foo* p = ::new Foo;
To restrict how a class instance can be created, you will probably be better off declaring the constructor(s) private and only allowing factory functions to call them.
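A minimal sketch of the private-constructor-plus-factory idea, using a made-up Foo (not the gecode class) and a hypothetical Holder that mirrors the asker's use case: a member that is null until first use and is handed out by reference.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class: instances can only be created through the factory.
class Foo {
public:
    static std::unique_ptr<Foo> create(int value) {
        return std::unique_ptr<Foo>(new Foo(value));
    }
    int value() const { return value_; }

private:
    explicit Foo(int value) : value_(value) {}
    int value_;
};

// Holds a Foo that is created lazily; a null pointer means "not yet initialized".
class Holder {
public:
    Foo& get() {
        if (!foo_) {
            foo_ = Foo::create(5);  // runs the constructor, unlike malloc
        }
        assert(foo_);
        return *foo_;
    }
    bool initialized() const { return static_cast<bool>(foo_); }

private:
    std::unique_ptr<Foo> foo_;  // frees the Foo when Holder is destroyed
};
```

The returned reference stays valid for as long as the Holder lives, so there is no "reference to local variable" problem and nothing to free by hand.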
Wrap it:
struct FooHolder {
Foo foo;
operator Foo*() { return &foo; }
};
I don't have a full understanding of the underlying code. If everything else is OK, the code above is correct: malloc() will allocate enough space and nothing funny will happen. But avoid strange code and keep it straightforward:
Foo f;
Foo *pf = &f;

memset() or value initialization to zero out a struct?

In Win32 API programming it's typical to use C structs with multiple fields. Usually only a couple of them have meaningful values and all others have to be zeroed out. This can be achieved in either of the two ways:
STRUCT theStruct;
memset( &theStruct, 0, sizeof( STRUCT ) );
or
STRUCT theStruct = {};
The second variant looks cleaner - it's a one-liner, it doesn't have any parameters that could be mistyped and lead to an error being planted.
Does it have any drawbacks compared to the first variant? Which variant to use and why?
Those two constructs are very different in their meaning. The first one uses the memset function, which is intended to set a buffer of memory to a certain value. The second initializes an object. Let me explain it with a bit of code:
Let's assume you have a structure that has members only of POD types ("Plain Old Data" - see What are POD types in C++?)
struct POD_OnlyStruct
{
int a;
char b;
};
POD_OnlyStruct t = {}; // OK
POD_OnlyStruct t;
memset(&t, 0, sizeof t); // OK as well
In this case writing a POD_OnlyStruct t = {} or POD_OnlyStruct t; memset(&t, 0, sizeof t) doesn't make much difference, as the only difference we have here is the alignment bytes being set to zero-value in case of memset used. Since you don't have access to those bytes normally, there's no difference for you.
On the other hand, since you've tagged your question as C++, let's try another example, with member types different from POD:
struct TestStruct
{
int a;
std::string b;
};
TestStruct t = {}; // OK
{
TestStruct t1;
memset(&t1, 0, sizeof t1); // ruins member 'b' of our struct
} // Application crashes here
In this case using an expression like TestStruct t = {} is good, and using a memset on it will lead to a crash. Here's what happens if you use memset: an object of type TestStruct is created, thus creating an object of type std::string, since it's a member of our structure. Next, memset sets the memory where the object b was located to a certain value, say zero. Now, once our TestStruct object goes out of scope, it is going to be destroyed, and when the turn comes to its member std::string b you'll likely see a crash, as all of that object's internal structures were ruined by the memset.
So, the reality is, those things are very different, and although you sometimes need to memset a whole structure to zeroes in certain cases, it's always important to make sure you understand what you're doing, and not make a mistake as in our second example.
My vote - use memset on objects only if it is required, and use value initialization x = {} in all other cases.
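For the non-POD case, here is a small sketch of what value initialization gives you, plus one way to reset an existing object without memset. The helper names make_empty and reset are made up for illustration.

```cpp
#include <cassert>
#include <string>

struct TestStruct {
    int a;
    std::string b;
};

// = {} zeroes the int and default-constructs the string.
TestStruct make_empty() {
    TestStruct t = {};
    return t;
}

// Resetting an existing object: assign a value-initialized temporary.
// The string's own assignment operator releases any old buffer safely.
void reset(TestStruct& t) {
    t = TestStruct();
    assert(t.a == 0 && t.b.empty());
}
```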
Depending on the structure members, the two variants are not necessarily equivalent. memset will set the structure to all-bits-zero whereas value initialization will initialize all members to the value zero. The C standard guarantees these to be the same only for integral types, not for floating-point values or pointers.
Also, some APIs require that the structure really be set to all-bits-zero. For instance, the Berkeley socket API uses structures polymorphically, and there it is important to really set the whole structure to zero, not just the values that are apparent. The API documentation should say whether the structure really needs to be all-bits-zero, but it might be deficient.
But if neither of these, or a similar case, applies, then it's up to you. I would, when defining the structure, prefer value initialization, as that communicates the intent more clearly. Of course, if you need to zeroize an existing structure, memset is the only choice (well, apart from initializing each member to zero by hand, but that wouldn't normally be done, especially for large structures).
If your struct contains things like :
int a;
char b;
int c;
Then bytes of padding will be inserted between b and c. memset will zero those, the other way will not, so there will be 3 bytes of garbage (if your ints are 32 bits). If you intend to use your struct to read/write from a file, this might be important.
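The padding can be measured rather than guessed. In this sketch, Padded, padding_after_b, all_bytes_zero and zero_with_memset are illustrative names. The exact amount of padding depends on the platform, so the code computes it with offsetof instead of assuming 3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Padded {
    int a;
    char b;
    int c;
};

// Number of padding bytes the compiler inserted between b and c.
constexpr std::size_t padding_after_b() {
    return offsetof(Padded, c) - (offsetof(Padded, b) + sizeof(char));
}

// Inspect the raw object representation, padding included.
bool all_bytes_zero(const Padded& p) {
    unsigned char raw[sizeof(Padded)];
    std::memcpy(raw, &p, sizeof(Padded));
    for (unsigned char byte : raw) {
        if (byte != 0) {
            return false;
        }
    }
    return true;
}

// memset clears every byte of the object, so padding is zeroed as well.
void zero_with_memset(Padded& p) {
    std::memset(&p, 0, sizeof(Padded));
    assert(all_bytes_zero(p));
}
```

This matters mostly when the raw bytes leave the program, for example when the struct is written to a file or compared with memcmp.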
I would use value initialization because it looks clean and less error prone as you mentioned. I don't see any drawback in doing it.
You might rely on memset to zero out the struct after it has been used though.
Not that it's common, but I guess the second way also has the benefit of initializing floats to zero, while a memset is not guaranteed to do so on every platform.
Value initialization is preferred because it can be done at compile time.
It also correctly zero-initializes all POD types.
memset is done at runtime.
Using memset is also suspect if the struct is not POD.
It is not guaranteed to correctly zero-initialize non-integer types.
In some compilers STRUCT theStruct = {}; would translate to memset( &theStruct, 0, sizeof( STRUCT ) ); in the executable. Some C functions are already linked in to do runtime setup, so the compiler has library functions like memset/memcpy available to use.
If there are lots of pointer members and you are likely to add more in the future, it can help to use memset. Combined with appropriate assert(struct->member) calls you can avoid random crashes from trying to dereference a bad pointer that you forgot to initialize. But if you're not as forgetful as me, then member-initialization is probably the best!
However, if your struct is being used as part of a public API, you should get client code to use memset as a requirement. This helps with future proofing, because you can add new members and the client code will automatically NULL them out in the memset call, rather than leaving them in a (possibly dangerous) uninitialized state. This is what you do when working with socket structures for example.

Erase all members of a class

Yesterday I read some code of a colleague and came across this:
class a_class
{
public:
a_class() {...}
int some_method(int some_param) {...}
int value_1;
int value_2;
float value_3;
std::vector<some_other_class*> even_more_values;
/* and so on */
};
a_class a_instances[10];
void some_function()
{
do_stuff();
do_more_stuff();
memset(a_instances, 0, 10 * sizeof(a_class)); // <===== WTF?
}
Is that legal (the WTF line, not the public attributes)? To me it smells really, really bad...
The code ran fine when compiled with VC8, but it throws an "unexpected exception" when compiled with VC9 when calling a_instances[0].even_more_values.push_back(whatever), but not when accessing any of the other members. Any insights?
EDIT: Changed the memset from memset(&a_instances... to memset(a_instances.... Thanks for pointing it out Eduard.
EDIT2: Removed the ctor's return type. Thanks litb.
Conclusion: Thanks folks, you confirmed my suspicion.
This is a widely accepted method for initialization for C structs.
In C++ it doesn't work, of course, because you can't assume anything about the vector's internal structure. Zeroing it out is very likely to leave it in an illegal state, which is why your program crashes.
He uses memset on a non-POD class type. It's invalid, because C++ only allows it for the simplest cases: Where a class doesn't have a user declared constructor, destructor, no virtual functions and several more restrictions. An array of objects of it won't change that fact.
If he removes the vector he is fine with using memset on it though. One note though. Even if it isn't C++, it might still be valid for his compiler - because if the Standard says something has undefined behavior, implementations can do everything they want - including blessing such behavior and saying what happens. In his case, what happens is probably that you apply memset on it, and it would silently clear out any members of the vector. Possible pointers in it, that would point to the allocated memory, will now just contain zero, without it knowing that.
You can recommend him to clear it out using something like this:
...
for(size_t i=0; i < 10; i++)
a_instances[i].clear();
And write clear using something like:
void clear() {
a_class o;
o.swap(*this);
}
Swapping would just swap the vector of o with the one of *this, and clear out the other variables. Swapping a vector is especially cheap. He of course needs to write a swap function then, that swaps the vector (even_more_values.swap(that.even_more_values)) and the other variables.
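Put together, the swap-based clear could look roughly like this. It is a sketch under assumptions: the constructor body is elided in the question, so zero is assumed as the initial value for the numeric members, and int* stands in for some_other_class*.

```cpp
#include <cassert>
#include <utility>
#include <vector>

class a_class {
public:
    // Assumed initial state; the question does not show the real constructor.
    a_class() : value_1(0), value_2(0), value_3(0.0f) {}

    // Member-wise swap; swapping vectors only exchanges their internal pointers.
    void swap(a_class& other) {
        std::swap(value_1, other.value_1);
        std::swap(value_2, other.value_2);
        std::swap(value_3, other.value_3);
        even_more_values.swap(other.even_more_values);
    }

    // Exchange state with a fresh object; the old state is destroyed with
    // fresh at the end of the function, which frees the old vector buffer.
    void clear() {
        a_class fresh;
        fresh.swap(*this);
        assert(even_more_values.empty());
    }

    int value_1;
    int value_2;
    float value_3;
    std::vector<int*> even_more_values;  // int* stands in for some_other_class*
};
```

After clear() the vector is still a valid, empty vector, so a later push_back works normally.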
I am not sure, but I think the memset would erase internal data of the vector.
When zeroing out a_instances, you also zero out the std::vector within, which probably allocates a buffer when constructed. Now, when you try to push_back, it sees the pointer to the buffer being NULL (or some other internal member being invalid), so it throws an exception.
It's not legitimate if you ask. That's because you can't overload writing via pointers as you can overload assignment operators.
The worst part of it is that if the vector had anything in it, that memory is now lost, because the vector's internal pointers were overwritten without its destructor being called.
NEVER over-write a C++ object. EVER. If it was a derived object (and I don't know the specifics of std::vector), this code also over-writes the object's vtable making it crashy as well as corrupted.
Whoever wrote this doesn't understand what objects are and needs you to explain what they are and how they work so that they don't make this kind of mistake in the future.
You shouldn't do memset on C++ objects, because it doesn't call the proper constructor or destructor.
Specifically in this case, the destructor of even_more_values member of all a_instances's elements is not called.
Actually, at least with the members that you listed (before /* and so on */), you don't need to call memset or create any special destructor or clear() function. All these members are deleted automatically by the default destructor.
You should implement a method 'clear' in your class
void clear()
{
value_1=0;
value_2=0;
value_3=0.0f;
even_more_values.clear();
}
What you have here might not crash, but it probably won't do what you want either! Zeroing out the vector won't call the destructor for each a_class instance. It will also overwrite the internal data for a_class.even_more_values (so if your push_back() is after the memset() you are likely to get an access violation).
I would do two things differently:
Use std::vector for your storage both in a_class and in some_function().
Write a destructor for a_class that cleans up properly
If you do this, the storage will be managed for you by the compiler automatically.
For instance:
class a_class
{
public:
a_class() {...}
~a_class() { /* make sure that even_more_values gets cleaned up properly */ }
int some_method(int some_param) {...}
int value_1;
int value_2;
float value_3;
std::vector<some_other_class*> even_more_values;
/* and so on */
};
void some_function()
{
std::vector<a_class> a_instances( 10 );
// Pass a_instances into these functions by reference rather than by using
// a global. This is re-entrant and more likely to be thread-safe.
do_stuff( a_instances );
do_more_stuff( a_instances );
// a_instances will be cleaned up automatically here. This also allows you some
// weak exception safety.
}
Remember that if even_more_values contains pointers to other objects, you will need to delete those objects in the destructor of a_class. If possible, even_more_values should contain the objects themselves rather than pointers to those objects (that way you may not have to write a destructor for a_class, the one the compiler provides for you may be sufficient).
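If the pointers in even_more_values own their targets, one option is to store std::unique_ptr so that no hand-written destructor is needed. This is a sketch assuming C++11; the payload member and the functions make_instances and reset_all are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct some_other_class {
    int payload = 0;  // hypothetical content
};

class a_class {
public:
    int value_1 = 0;
    int value_2 = 0;
    float value_3 = 0.0f;
    // Each element deletes its some_other_class automatically.
    std::vector<std::unique_ptr<some_other_class>> even_more_values;
};

// Creates count default-constructed instances.
std::vector<a_class> make_instances(std::size_t count) {
    std::vector<a_class> instances(count);
    assert(instances.size() == count);
    return instances;
}

// Resets every element by move-assigning a fresh object; the old pointees
// are released by the unique_ptr destructors.
void reset_all(std::vector<a_class>& instances) {
    for (a_class& item : instances) {
        item = a_class();
    }
}
```

The compiler-generated destructor and move operations are enough here, which is the "may not have to write a destructor" case mentioned above.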