C memset seems to not write to every member - c++

I wrote a small coordinate class to handle both int and float coordinates.
template <class T>
class vector2
{
public:
vector2() { memset(this, 0, sizeof(this)); }
T x;
T y;
};
Then in main() I do:
vector2<int> v;
But according to my MSVC debugger, only the x value is set to 0; the y value is untouched. I've never used sizeof() in a template class before. Could that be what's causing the trouble?

No, don't use memset. sizeof(this) is the size of a pointer (4 bytes on my x86 Intel machine), so it zeroes out only that many bytes starting at the location pointed to by this. It is a bad habit anyway: with a more complex class you would also zero out vtable pointers and pointers to virtual bases. Instead do:
template <class T>
class vector2
{
public:
// use initializer lists
vector2() : x(0), y(0) {}
T x;
T y;
};

As others are saying, memset() is not the right way to do this.
There are some subtleties, however, about why not.
First, your attempt to use memset() is only clearing sizeof(void *) bytes. For your sample case, that apparently is coincidentally the bytes occupied by the x member.
The simple fix would be to write memset(this, 0, sizeof(*this)), which in this case would set both x and y.
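For reference, a sketch of that "simple fix" applied to the question's class. It only changes sizeof(this) to sizeof(*this); the static_cast<void*> is an assumption added to keep GCC's -Wclass-memaccess quiet. As the rest of this answer explains, it is only reasonable while T is something like int or float.

```cpp
#include <cstring>

template <class T>
class vector2
{
public:
    // sizeof(*this) is the size of the whole object (both members);
    // the original sizeof(this) was only the size of a pointer.
    // The void* cast tells GCC's -Wclass-memaccess this is deliberate.
    vector2() { std::memset(static_cast<void*>(this), 0, sizeof(*this)); }
    T x;
    T y;
};
```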
However, if your vector2 class has any virtual methods and the usual mechanism is used to represent them by your compiler, then that memset will destroy the vtable and break the instance by setting the vtable pointer to NULL. Which is bad.
Another problem is that if the type T requires some constructor action more complex than just setting its bits to 0, memset() bypasses whatever the members' constructors are supposed to establish, ruining their effect by overwriting the content of the members.
The only correct action is to write your default constructor as
vector2() : x(0), y(0) {}
and to just forget about trying to use memset() for this at all.
Edit: D.Shawley pointed out in a comment that the default constructors for x and y were actually called before the memset() in the original code as presented. While technically true, calling memset() overwrites the members, which is at best really, really bad form, and at worst invokes the demons of Undefined Behavior.
As written, the vector2 class is POD, as long as the type T is also plain old data as would be the case if T were int or float.
However, all it would take is for T to be some sort of bignum value class to cause problems that could be really hard to diagnose. If you were lucky, they would manifest early through access violations from dereferencing the NULL pointers created by memset(). But Lady Luck is a fickle mistress, and the more likely outcome is that some memory is leaked, and the application gets "shaky". Or more likely, "shakier".
The OP asked in a comment on another answer "...Isn't there a way to make memset work?"
The answer there is simply, "No."
Having chosen the C++ language, and chosen to take full advantage of templates, you have to pay for those advantages by using the language correctly. It simply isn't correct to bypass the constructor (in the general case). While there are circumstances under which it is legal, safe, and sensible to call memset() in a C++ program, this just isn't one of them.
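To make "the general case" concrete, here is a sketch using the C++11 <type_traits> header (assumed available). It shows that whether the bytes of a vector2<T> are "just data" depends entirely on T, which a template cannot know in advance:

```cpp
#include <string>
#include <type_traits>

template <class T>
class vector2
{
public:
    vector2() : x(), y() {}
    T x;
    T y;
};

// With T = int the class is trivially copyable: its bytes are plain data.
static_assert(std::is_trivially_copyable<vector2<int>>::value,
              "vector2<int> is trivially copyable");

// With T = std::string it is not: the members own heap memory, and
// overwriting them with memset() would corrupt that bookkeeping.
static_assert(!std::is_trivially_copyable<vector2<std::string>>::value,
              "vector2<std::string> is not trivially copyable");
```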

The problem is that this is a pointer, so sizeof(this) is 4 bytes (on 32-bit systems), and an int is also 4 bytes (on 32-bit systems). Try:
sizeof(*this)
Edit: Though I agree with others that initializer lists in the constructor are probably the correct solution here.

Don't use memset. It'll break horribly on non-POD types (and won't necessarily be easy to debug), and in this case, it's likely to be much slower than simply initializing both members to zero (two assignments versus a function call).
Moreover, you do not usually want to zero out all members of a class. You want to zero out the ones for which zero is a meaningful default value. And you should get into the habit of initializing your members to a meaningful value in any case. Blanket zeroing everything and pretending the problem doesn't exist just guarantees a lot of headaches later. If you add a member to a class, decide whether that member should be initialized, and how.
If and when you do want memset-like functionality, at least use std::fill, which is compatible with non-POD types.
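A minimal sketch of the std::fill approach; reset_all is a hypothetical helper name, not a standard function. Because std::fill assigns through each element's operator=, it works for class types where memset() would not:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: resets every element of a built-in array to a
// value-initialized T (0 for int, "" for std::string, and so on).
template <class T, std::size_t N>
void reset_all(T (&arr)[N])
{
    std::fill(arr, arr + N, T());
}
```

For an int array the effect matches memset(values, 0, sizeof values); for std::string it stays correct.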
If you're programming in C++, use the tools C++ makes available. Otherwise, call it C.

dirkgently is correct. However, rather than constructing x and y with 0, value-initializing them with empty parentheses sets built-in types to 0 and allows the template to be used for structs and classes with a default constructor.
template <class T>
class vector2
{
public:
// use initializer lists
vector2() : x(), y() {}
T x;
T y;
};
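A short sketch of what x() and y() buy over x(0) and y(0). Scale is a hypothetical class type whose sensible default is not zero; x(0) would not even compile for it, since it has no constructor taking an int:

```cpp
#include <string>

template <class T>
class vector2
{
public:
    // Value-initialization: built-in members are zeroed, class-type
    // members have their default constructor run.
    vector2() : x(), y() {}
    T x;
    T y;
};

// Hypothetical class type whose default state is not all-bits-zero.
struct Scale
{
    Scale() : factor(1.0) {}
    double factor;
};
```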

Don't try to be smarter than the compiler. Use the initializer lists as intended by the language. The compiler knows how to efficiently initialize basic types.
If you tried your memset hack on a class with virtual functions, you would most likely overwrite the vtable pointer and end up in disaster. Don't use hacks like that; they are a maintenance nightmare.

This might work instead:
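alignas(vector2<int>) char buffer[sizeof(vector2<int>)]; // storage suitably aligned for vector2<int>
memset(buffer, 0, sizeof(buffer));
vector2<int> *v2 = new (buffer) vector2<int>(); // placement new, needs <new>
...or overloading vector2::operator new to do something like that.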
Still seems weird to me though.
Definitely go with
vector2() : x(0), y(0) {}

Related

Using memmove to initialize entire object in constructor in C++

Is it safe to use memmove/memcpy to initialize an object with constructor parameters?
No-one seems to use this method but it works fine when I tried it.
Does passing the parameters on the stack cause problems?
Say I have a class foo as follows,
class foo
{
int x,y;
float z;
foo();
foo(int,int,float);
};
Can I initialize the variables using memmove as follows?
foo::foo(int x,int y,float z)
{
memmove(this,&x, sizeof(foo));
}
This is undefined behavior.
The shown code does not attempt to initialize class variables. It attempts to memmove() onto the class pointer, and assumes that the size of the class is 2*sizeof(int)+sizeof(float). The C++ standard does not guarantee that.
Furthermore, the shown code also assumes the layout of the parameters that are passed to the constructor will be the same layout as the layout of the members of this POD. That, again, is not specified by the C++ standard.
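A small sketch of why the size assumption is unsafe: the standard only promises that a struct is at least as large as its members combined, and padding may appear between them. The padded struct is a hypothetical example chosen to make padding likely:

```cpp
#include <cstddef>

struct foo
{
    int x, y;
    float z;
};

// Hypothetical struct: most ABIs insert padding after c so d is aligned.
struct padded
{
    char c;
    double d;
};
```

On common 64-bit platforms sizeof(padded) is 16, not 9, but only the inequalities below are actually guaranteed.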
It is safe to use memmove to initialize individual class members. For example, the following is safe:
foo::foo(int x_,int y_,float z_)
{
memmove(&x, &x_, sizeof(x));
memmove(&y, &y_, sizeof(y));
memmove(&z, &z_, sizeof(z));
}
Of course, this does nothing useful, but this would be safe.
No, it is not safe, because the standard does not guarantee that the members are laid out immediately after each other; there may be alignment padding between them.
After your update, this is even worse because the location of passed arguments and their order are not safe to use.
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. - Donald Knuth
You should not try to optimize code that you are not sure needs it. I would suggest profiling your code before attempting this kind of optimization. This way you don't lose time improving the performance of code that is not going to affect the overall performance of your application.
Usually, compilers are smart enough to figure out what you are trying to do with your code and generate highly efficient code with the same behavior. For that purpose, make sure you are enabling compiler optimizations (an -O<level> flag, or toggling individual optimizations through compiler command-line arguments).
For example, I've seen that some compilers transform std::copy into a memcpy when the compiler is sure that doing so is straightforward (e.g. data is contiguous).
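As an illustration, here is a plain std::copy over contiguous ints; copy_ints is a hypothetical name. Whether a given compiler lowers this to memmove/memcpy depends on the compiler and optimization level, so treat that as likely rather than guaranteed:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: copies one int array into another of the same size.
// Optimizing compilers commonly turn this into a single memmove/memcpy.
template <std::size_t N>
void copy_ints(const int (&src)[N], int (&dst)[N])
{
    std::copy(src, src + N, dst);
}
```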
No, it is not safe. It is undefined behavior.
And the code
foo::foo(int x,int y,float z)
{
memmove(this,&x, sizeof(foo));
}
is not even saving you any typing compared to using an initializer list
foo::foo(int x,int y,float z) : x(x), y(y), z(z)
{ }
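A complete version of the initializer-list approach. The members are made public here only so the example can inspect them, which is an assumption; the question's class keeps them private:

```cpp
class foo
{
public:
    foo() : x(0), y(0), z(0.0f) {}
    // In a mem-initializer, x(x) means "member x initialized from parameter x".
    foo(int x, int y, float z) : x(x), y(y), z(z) {}
    int x, y;
    float z;
};
```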

Is it safe to memset the plain struct with user-defined default constructor?

I know that if a C++ struct is Plain Old Data ("POD") then this guarantees there is no magic in its memory structure, so it means a memcpy to an array of bytes and memcpy back is safe.
I also know that in the standard a POD struct should not have a user-defined constructor. In the project I own now, there are some plain structs (with only data fields) with a default constructor defined which initializes the data members to 0. I've seen other clients use memset(&obj, 0, sizeof obj); before using the struct.
Is it ok or safe to memset the non-POD struct before I use it?
Having a constructor does not make a struct non-POD.
An aggregate class is called a POD if it has no user-defined copy-assignment operator and destructor and none of its nonstatic members is a non-POD class, array of non-POD, or a reference.
Given that, it is perfectly safe to call
memset(&obj, 0, sizeof obj);
on an object of a struct that has a constructor as long as it is a POD.
Whether it is OK or not, depends. If the default constructor of the struct wants a member to be initialized to 1 for sane behavior, the above call to memset may impact the behavior of code that depends on 1 being the default value.
Take the example of the following struct:
struct Direction
{
Direction() : x(1.0), y(0.0), z(0.0) {}
double x;
double y;
double z;
};
An object of type Direction expects that at least one of the components will be non-zero. You can't define a direction when all the components are zero. If you use memset to set everything to 0, code will likely break.
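A sketch of that breakage. zeroed_direction is a hypothetical helper, and reading the zeroed doubles as 0.0 assumes IEEE 754 (all-bits-zero is 0.0 there). Note that Direction is still trivially copyable, so it is the class invariant, not the memory, that memset breaks:

```cpp
#include <cstring>
#include <type_traits>

struct Direction
{
    Direction() : x(1.0), y(0.0), z(0.0) {}
    double x;
    double y;
    double z;
};

static_assert(std::is_trivially_copyable<Direction>::value,
              "raw bytes of Direction are plain data");

// Hypothetical helper: constructs a valid Direction, then memsets it.
inline Direction zeroed_direction()
{
    Direction d;                                       // x == 1.0: valid
    std::memset(static_cast<void*>(&d), 0, sizeof d);  // all-bits-zero
    return d;                                          // no longer a direction
}
```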
EDIT
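It appears, from the comments below, that such a struct is not a POD after all: in C++03 a POD-struct must be an aggregate, and aggregates have no user-declared constructors, while in C++11 a POD must have a trivial default constructor.
So using memset(&obj, 0, sizeof obj); may not be safe after all.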
IMO this depends on the use case. I have seen memset used to fill the data with space characters in a few mainframe applications, i.e.
memset(&obj, ' ', sizeof(obj));
If the struct has a const member that the constructor initializes, memset would overwrite that value. So it depends, but in most cases it is safe to use memset to initialize PODs. That's my 2 cents.
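A sketch of the space-filling pattern. Record is a hypothetical fixed-width record made only of char arrays, so in practice there are no padding bytes, and every byte, including those in name and code, becomes a space:

```cpp
#include <cstring>

// Hypothetical fixed-width record, as used for mainframe-style files.
struct Record
{
    char name[8];
    char code[4];
};

inline Record blank_record()
{
    Record r;
    std::memset(&r, ' ', sizeof r); // every byte becomes a space
    return r;
}
```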

Empty Data Member Optimization: would it be possible?

In C++, most of the optimizations are derived from the as-if rule. That is, as long as the program behaves as-if no optimization had taken place, then they are valid.
The Empty Base Optimization is one such trick: in some conditions, if the base class is empty (does not have any non-static data member), then the compiler may elide its memory representation.
It seems the standard forbids this optimization on data members: even if a data member is empty, it must still take at least one byte worth of space. From n3225, [class]:
4 - Complete objects and member subobjects of class type shall have nonzero size.
Note: this leads to the use of private inheritance for Policy Design in order to have EBO kick in when appropriate
I was wondering if, using the as-if rule, one could still be able to perform this optimization.
edit: following a number of answers and comments, and to make it clearer what I am wondering about.
First, let me give an example:
struct Empty {};
struct Foo { Empty e; int i; };
My question is, why is sizeof(Foo) != sizeof(int) ? In particular, unless you specify some packing, chances are due to alignment issues that Foo will be twice the size of int, which seems ridiculously inflated.
Note: my question is not why is sizeof(Foo) != 0, this is not actually required by EBO either
According to C++, it is because no sub-object may have a zero size. However a base is authorized to have a zero size (EBO) therefore:
struct Bar: Empty { int i; };
is likely (thanks to EBO) to obey sizeof(Bar) == sizeof(int).
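The two layouts side by side. Only sizeof(Foo) > sizeof(int) is guaranteed; sizeof(Bar) == sizeof(int) is what mainstream compilers produce thanks to EBO, which is an observation about GCC, Clang and MSVC rather than a standard guarantee:

```cpp
struct Empty {};
struct Foo { Empty e; int i; };   // member: needs its own address, so at least one byte
struct Bar : Empty { int i; };    // base: EBO lets it take no space at all
```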
Steve Jessop seems to be of the opinion that it is so that no two sub-objects have the same address. I thought about it; however, it doesn't actually prevent the optimization in most cases:
If you have "unused" memory, then it is trivial:
struct UnusedPadding { Empty e; Empty f; double d; int i; };
// chances are that the layout will leave some memory after int
But in fact, it's even "worse" than that, because Empty space is never written to (you'd better not if EBO kicks in...) and therefore you could actually place it at an occupied place that is not the address of another object:
struct Virtual { virtual ~Virtual() {} Empty e; Empty f; int i; };
// most compilers will reserve some space for a virtual pointer!
Or, even in our original case:
struct Foo { Empty e; int i; }; // deja vu!
One could have (char*)&foo.e == (char*)&foo.i + 1 if all we wanted were different addresses.
It is coming to C++20 with the [[no_unique_address]] attribute.
The proposal P0840r2 has been accepted into the draft standard. It has this example:
template<typename Key, typename Value, typename Hash, typename Pred, typename Allocator>
class hash_map {
[[no_unique_address]] Hash hasher;
[[no_unique_address]] Pred pred;
[[no_unique_address]] Allocator alloc;
Bucket *buckets;
// ...
public:
// ...
};
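A minimal comparison, assuming a C++20 compiler. Older modes may ignore the attribute (with a warning), and MSVC is known to ignore the standard spelling in favor of [[msvc::no_unique_address]], so the only portable expectation is that the attribute never makes the struct larger:

```cpp
struct Empty {};

struct Plain
{
    Empty e;  // occupies at least one byte, plus padding before i
    int i;
};

struct Compact
{
    [[no_unique_address]] Empty e;  // may share storage with i
    int i;
};
```

With GCC or Clang in C++20 mode, sizeof(Compact) is typically equal to sizeof(int).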
Under the as-if rule:
struct A {
EmptyThing x;
int y;
};
A a;
assert((void*)&(a.x) != (void*)&(a.y));
The assert must not be triggered. So I don't see any benefit in secretly making x have size 0, when you'd just need to add padding to the structure anyway.
I suppose in theory a compiler could track whether pointers might be taken to the members, and make the optimization only if they definitely aren't. This would have limited use, since there'd be two different versions of the struct with different layouts: one for the optimized case and one for general code.
But for example if you create an instance of A on the stack, and do something with it that is entirely inlined (or otherwise visible to the optimizer), yes, parts of the struct could be completely omitted. This isn't specific to empty objects, though - an empty object is just a special case of an object whose storage isn't accessed, and therefore could in some situations never be allocated at all.
C++ for technical reasons mandates that empty classes should have non-zero size.
This is to enforce that distinct objects have distinct memory addresses. So compilers silently insert a byte into "empty" objects.
This constraint does not apply to base class parts of derived classes as they are not free-standing.
Because Empty is a POD-type, you can use memcpy to overwrite its "representation", so it better not share it with another C++ object or useful data.
Given struct Empty { }; consider what happens if sizeof(Empty) == 0. Generic code that allocates heap for Empty objects could easily behave differently, as - for example - a realloc(p, n * sizeof(T)), where T is Empty, is then equivalent to free(p). If sizeof(Empty) != 0 then things like memset/memcpy etc. would try to work on memory regions that weren't in use by the Empty objects. So, the compiler would need to stitch up things like sizeof(Empty) on the basis of the eventual usage of the value - that sounds close to impossible to me.
Separately, under current C++ rules the assurance that each member has a distinct address means you can use those addresses to encode some state about those fields - e.g. a textual field name, whether some member function of the field object should be visited etc.. If addresses suddenly coincide, any existing code reliant on these keys could break.

memset() or value initialization to zero out a struct?

In Win32 API programming it's typical to use C structs with multiple fields. Usually only a couple of them have meaningful values and all others have to be zeroed out. This can be achieved in either of the two ways:
STRUCT theStruct;
memset( &theStruct, 0, sizeof( STRUCT ) );
or
STRUCT theStruct = {};
The second variant looks cleaner - it's a one-liner, it doesn't have any parameters that could be mistyped and lead to an error being planted.
Does it have any drawbacks compared to the first variant? Which variant to use and why?
Those two constructs are very different in their meaning. The first one uses the memset function, which is intended to set a buffer of memory to a certain value. The second one initializes an object. Let me explain it with a bit of code:
Let's assume you have a structure that has members only of POD types ("Plain Old Data", see What are POD types in C++?)
struct POD_OnlyStruct
{
int a;
char b;
};
POD_OnlyStruct t = {}; // OK
POD_OnlyStruct t2;
memset(&t2, 0, sizeof t2); // OK as well
In this case, writing POD_OnlyStruct t = {} or POD_OnlyStruct t; memset(&t, 0, sizeof t) doesn't make much difference: the only difference is that memset also sets the padding bytes to zero. Since you don't normally access those bytes, it makes no difference to you.
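Both variants wrapped in hypothetical helper functions, to show that the members end up with the same values either way for this all-POD struct:

```cpp
#include <cstring>

struct POD_OnlyStruct
{
    int a;
    char b;
};

inline POD_OnlyStruct by_value_init()
{
    POD_OnlyStruct t = {};          // a and b are zero
    return t;
}

inline POD_OnlyStruct by_memset()
{
    POD_OnlyStruct t;
    std::memset(&t, 0, sizeof t);   // every byte of t zero, padding included
    return t;
}
```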
On the other hand, since you've tagged your question as C++, let's try another example, with member types different from POD:
struct TestStruct
{
int a;
std::string b;
};
TestStruct t = {}; // OK
{
TestStruct t1;
memset(&t1, 0, sizeof t1); // ruins member 'b' of our struct
} // Application crashes here
In this case, using an expression like TestStruct t = {} is fine, while using memset on it leads to a crash. Here's what happens with memset: an object of type TestStruct is created, which also constructs its std::string member b. Next, memset overwrites the memory where b is located with some value, say zero. When the TestStruct object goes out of scope and is destroyed, the destructor of its member std::string b runs on internal state that memset has ruined, and you'll see a crash.
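The safe variant as a sketch, with a C++11 type-trait check added as an assumption. The trait makes the same point as the crash: TestStruct's bytes are not plain data, so they must not be memset. make_test_struct is a hypothetical helper:

```cpp
#include <string>
#include <type_traits>

struct TestStruct
{
    int a;
    std::string b;
};

static_assert(!std::is_trivially_copyable<TestStruct>::value,
              "TestStruct owns a std::string; do not memset it");

inline TestStruct make_test_struct()
{
    TestStruct t = {};   // a is 0, b is a valid empty std::string
    return t;
}
```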
So, the reality is, those things are very different, and although you sometimes need to memset a whole structure to zeroes in certain cases, it's always important to make sure you understand what you're doing, and not make a mistake as in our second example.
My vote: use memset on objects only if it is required, and use value initialization x = {} in all other cases.
Depending on the structure members, the two variants are not necessarily equivalent. memset will set the structure to all-bits-zero whereas value initialization will initialize all members to the value zero. The C standard guarantees these to be the same only for integral types, not for floating-point values or pointers.
Also, some APIs require that the structure really be set to all-bits-zero. For instance, the Berkeley socket API uses structures polymorphically, and there it is important to really set the whole structure to zero, not just the values that are apparent. The API documentation should say whether the structure really needs to be all-bits-zero, but it might be deficient.
But if neither of these, or a similar case, applies, then it's up to you. I would, when defining the structure, prefer value initialization, as that communicates the intent more clearly. Of course, if you need to zeroize an existing structure, memset is the only choice (well, apart from initializing each member to zero by hand, but that wouldn't normally be done, especially for large structures).
If your struct contains things like:
int a;
char b;
int c;
Then padding bytes will typically be inserted between b and c. memset will zero those; the other way is not guaranteed to, so there may be 3 bytes of garbage (if your ints are 32 bits). If you intend to use your struct to read/write from a file, this might be important.
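One way to see that memset reaches the padding is to inspect the object's bytes through an unsigned char pointer, which is allowed for any object. zero_bytes is a hypothetical helper that counts zero bytes over the whole object representation:

```cpp
#include <cstddef>
#include <cstring>

struct Padded
{
    int a;
    char b;  // padding usually follows, so that c is aligned
    int c;
};

// Hypothetical helper: counts zero bytes, padding bytes included.
inline std::size_t zero_bytes(const Padded& p)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&p);
    std::size_t n = 0;
    for (std::size_t i = 0; i < sizeof p; ++i)
        if (bytes[i] == 0)
            ++n;
    return n;
}
```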
I would use value initialization because it looks clean and less error prone as you mentioned. I don't see any drawback in doing it.
You might rely on memset to zero out the struct after it has been used though.
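For resetting a struct after use, C++ also lets you assign a value-initialized temporary, which stays correct if a member with a constructor is added later. This is a sketch; Settings and reset are hypothetical names:

```cpp
struct Settings
{
    int width;
    int height;
    const char* title;
};

// Hypothetical helper: returns an existing struct to its zero state.
// Settings() value-initializes, so the ints become 0 and title becomes nullptr.
inline void reset(Settings& s)
{
    s = Settings();
}
```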
Not that it's common, but I guess the second way also has the benefit of initializing floats to zero, while memset is not guaranteed to (the standard does not promise that all-bits-zero means 0.0).
Value initialization is preferred because it can be done at compile time.
Also, it correctly zero-initializes all POD types.
memset is done at runtime.
Also, using memset is suspect if the struct is not POD.
It is not guaranteed to correctly initialize (to zero) non-integer types.
In some compilers, STRUCT theStruct = {}; would translate to memset( &theStruct, 0, sizeof( STRUCT ) ); in the executable. Some C functions are already linked in to do runtime setup, so the compiler has library functions like memset/memcpy available to use.
If there are lots of pointer members and you are likely to add more in the future, it can help to use memset. Combined with appropriate assert(struct->member) calls, you can avoid random crashes from trying to dereference a bad pointer that you forgot to initialize. But if you're not as forgetful as me, then member initialization is probably the best!
However, if your struct is being used as part of a public API, you should get client code to use memset as a requirement. This helps with future proofing, because you can add new members and the client code will automatically NULL them out in the memset call, rather than leaving them in a (possibly dangerous) uninitialized state. This is what you do when working with socket structures for example.

Ok to provide constructor for behaviorless aggregates (bundle-o-data) in C++?

Please refer to rule #41 of C++ Coding Standards or Sutter's GotW #70, which states that:
Make data members private, except in behaviorless aggregates (C-style structs).
I often would like to add a simple constructor to these C-style structs, for the sake of convenience. For example:
struct Position
{
Position(double lat=0.0, double lon=0.0) : latitude(lat), longitude(lon) {}
double latitude;
double longitude;
};
void travelTo(Position pos) { /* ... */ }
int main()
{
travelTo(Position(12.34, 56.78));
}
While making it easier to construct a Position on the fly, the constructor also kindly zero-initializes default Position objects for me.
Maybe I can follow std::pair's example and provide a "makePosition" free function? NRVO should make it as fast as the constructor, right?
Position makePosition(double lat, double lon)
{
Position p;
p.latitude = lat;
p.longitude = lon;
return p;
}
travelTo(makePosition(12.34, 56.78));
Am I going against the spirit of the "behaviorless aggregate" concept by adding that measly little constructor?
EDIT:
Yes, I was aware of Position p={12.34, 56.78}. But I can't do travelTo({12.34, 56.78}) with pure C structs.
EDIT 2:
For those curious about POD types: What are POD types in C++?
FOLLOW-UP:
I've asked a follow-up question here that is closely related to this one.
We regularly define constructors for our aggregate types, with no adverse effects. In fact the only adverse effects I can think of are that in performance critical situations you cannot avoid default initialisation and that you can't use the type in unions.
The alternatives are the curly brace style of initialisation
Position p = {a,b};
or a free "make" function
Position makePosition(double a, double b)
{
Position p = {a,b};
return p;
}
the problem with the former is that you can't use it to instantiate a temporary to pass into a function
void func(Position p)
{
// ...
}
// func({a,b}) is an error
the latter is fine in this case, but is very slightly more typing for the lazy programmer.
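The error noted in the comment above applies to C++03. Since C++11, a braced list can initialize the by-value parameter directly, which is what the question wanted from travelTo({12.34, 56.78}). A sketch, with sum_coords as a hypothetical function:

```cpp
// Aggregate without a constructor, as in the question.
struct Position
{
    double latitude;
    double longitude;
};

// Hypothetical function taking a Position by value.
inline double sum_coords(Position p)
{
    return p.latitude + p.longitude;
}
```

A call such as sum_coords({1.5, 2.5}) compiles in C++11 and later and returns 4.0.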
The problem with the latter form (a make function) is that it leaves the possibility that you forget to initialise your data structure. Because uninitialised variables leave me feeling rather uncomfortable I prefer to define a constructor for my aggregate types.
The main reason std::make_pair exists is actually not for this reason (std::pair has constructors), but in fact because to call the constructor of a template type you have to pass the template arguments - which is inconvenient:
std::pair<int,int> func()
{
return std::pair<int,int>(1,2);
}
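For comparison, the same pair built with std::make_pair, which deduces the template arguments from its arguments. (Since C++17, class template argument deduction also allows std::pair(1, 2).) The function names are hypothetical:

```cpp
#include <utility>

// Spelling out the template arguments, as in the answer above...
inline std::pair<int, int> explicit_pair()
{
    return std::pair<int, int>(1, 2);
}

// ...versus letting std::make_pair deduce them.
inline std::pair<int, int> deduced_pair()
{
    return std::make_pair(1, 2);
}
```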
Finally, in your example, you should at least make your constructor explicit
explicit Position(double lat=0.0, double lon=0.0)
otherwise you allow an implicit conversion to a Position from a double
Position p = 0.0;
which might lead to unintended behaviour. In fact, I would define two constructors, one to initialise to zero and one to initialise with two values, because a Position probably doesn't make much sense without both a latitude and a longitude.
I routinely provide structs with a constructor, with no problems. However, if the constructor is "non-trivial", then the struct is no longer considered to be a POD type, and there will be restrictions on what you can do with it. If this is an issue for you (it never has been for me), then a make_XXXX function is obviously the way to go.
Note that without the constructor, the following already does what you need:
int main()
{
Position pos = { 12.34, 56.78 };
travelTo(pos);
Position pos2 = {}; // zero initialises
travelTo(pos2);
}
Conceptually, it's fine: you aren't going against the spirit of a "behaviorless aggregate". The problem is that the struct is no longer a POD type, so the standard makes fewer guarantees about its behaviour and it can't be stored in a union.
Have you considered this instead?
Position p = {12.34, 56.78};
Since I posted that question, I've been bitten in the behind for defining constructors for my aggregates. Using my Position example above, GCC complains when I do this:
const Position pos[] =
{
{12.34, 56.78},
{23.45, 67.89},
};
warning: extended initializer lists only available with -std=c++0x or -std=gnu++0x|
Instead I have to do this:
const Position pos[] =
{
Position(12.34, 56.78),
Position(23.45, 67.89)
};
With that workaround, I'm worried that in embedded systems, my constant table would not be stored in flash/ROM.
EDIT:
I tried removing the Position constructor, and the pos array has indeed moved from the .bss to the .rodata segment.
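If you want to keep a constructor, a C++11 constexpr constructor is one option worth trying. A namespace-scope table built from it with constant arguments is constant-initialized, so no start-up code is needed to fill it. Whether it actually lands in .rodata or flash is up to the compiler and linker, so check the map file as you did here:

```cpp
struct Position
{
    // constexpr lets Position(...) appear in constant expressions.
    constexpr Position(double lat, double lon) : latitude(lat), longitude(lon) {}
    double latitude;
    double longitude;
};

// Constant-initialized at compile time; typically placed in read-only data.
constexpr Position pos[] =
{
    Position(12.34, 56.78),
    Position(23.45, 67.89),
};

static_assert(pos[1].latitude == 23.45, "table is usable at compile time");
```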