I have a massive structure where the entire contents is scalar variables, enumerations, and arrays of scalars (stack-based) with the exception of one std::string variable.
Now, here's my question...
Can I memset the structure to 0 for its whole size (like I would if it was simply all scalars), or is that not possible with the std::string being in there? I'm not sure what memset would do to its internal representation.
And if you're going to say it's good/bad please explain why - I'd like to know why it is the way it is :)
No, you can't: it would overwrite the internal state of the string and make bad things happen. You could wrap all the POD stuff in a separate struct and put that in your current one; that way you could memset that and let the string default construct.
Edit: Just to clarify, the string will almost certainly be storing a pointer to the memory it has allocated for storage. The string's constructor will always have run before you can memset it (even if you memset this in the constructor of your type, the string constructor would run first). So you would be overwriting this pointer value, and instead of pointing to its storage, it would be a pointer to NULL, or some other almost definitely invalid value.
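A minimal sketch of the wrapping approach described above. The names PlainPart and Record are hypothetical and do not come from the question; the point is only that memset touches the plain sub-struct and never the string.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical stand-in for "all the POD stuff" in the big structure.
struct PlainPart {
    int count;
    double ratio;
    int values[4];
};

// memset is only appropriate for trivially copyable types.
static_assert(std::is_trivially_copyable<PlainPart>::value,
              "PlainPart must stay trivially copyable to be memset");

// Hypothetical outer structure: plain data plus one std::string.
struct Record {
    PlainPart plain;   // zeroed with memset below
    std::string name;  // default-constructed by the compiler, never memset

    Record() { std::memset(&plain, 0, sizeof plain); }
};
```

Writing `PlainPart plain{};` as the member declaration would give the same zeroed result without calling memset at all.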
Here's an exotic idea: Suppose your class Foo has lots of primitive members which remain uninitialized in Foo's constructor, with the exception of one string:
class Foo
{
int a;
double b;
std::string s;
};
The constructor Foo::Foo() will correctly initialize the string, but it won't care for anything else. So, let's zero out the memory before we construct!
void * addr = ::operator new(sizeof(Foo));
std::memset(addr, 0, sizeof(Foo));
Foo * p = new (addr) Foo;
// later
p->~Foo();
::operator delete(addr);
Of course it would be cleaner to just initialize all the members to zero in the constructor, but perhaps you have your own reasons that you don't want to create a custom constructor.
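If the goal is only to get zeroed members without writing a constructor, empty-brace initialization may be enough. The sketch below assumes C++11 or later and a Foo with no user-provided constructor; the members are made public here purely so they can be inspected, and make_zeroed_foo is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Same members as the Foo above, made public for inspection.
struct Foo {
    int a;
    double b;
    std::string s;
};

// Foo{} zero-initializes the scalar members and default-constructs the string.
inline Foo make_zeroed_foo() {
    return Foo{};
}
```

By contrast, a plain `Foo f;` leaves a and b indeterminate.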
Zeroing a std::string member is a very bad idea: it can cause a memory leak! Never do this!
Why do we need special algorithms to write to uninitialized (but allocated) memory? Won't the normal modifying algorithms do? Or does uninitialized memory mean something different from what the name itself conveys?
Take std::copy and std::uninitialized_copy for a range of std::strings.
Regular copy will assume there already exists a string there. The copy assignment operator of string will try to use any existing space in the string if possible for the copy.
However, if there wasn't already a string there, as in the case of an uninitialized memory, the copy assignment operator will access garbage memory and behavior is undefined.
Uninitialized copy on the other hand will create the string there instead of assigning to it, so it can be used in a memory that does not already have a string in it.
Essentially, the regular versions will have a *it = value; in them, and uninitialized versions will have something like a new (&(*it)) T(value);.
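To make the difference concrete, here is a small sketch that copies strings into raw storage with std::uninitialized_copy and then destroys them by hand. The helper names destroy_range and round_trip_through_raw are hypothetical, and the sketch leaks if constructing the result vector throws.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <vector>

// Runs the destructor of every object in [first, last).
template <class T>
void destroy_range(T* first, T* last) {
    for (; first != last; ++first) first->~T();
}

// Copies src into freshly allocated raw memory, reads it back, and cleans up.
std::vector<std::string> round_trip_through_raw(const std::vector<std::string>& src) {
    const std::size_t n = src.size();
    void* raw = ::operator new(n * sizeof(std::string));  // no strings exist here yet
    std::string* first = static_cast<std::string*>(raw);

    // Constructs each string in place; std::copy would assign to garbage instead.
    std::string* last = std::uninitialized_copy(src.begin(), src.end(), first);

    std::vector<std::string> out(first, last);

    destroy_range(first, last);  // end the lifetimes we started
    ::operator delete(raw);      // then release the raw memory
    return out;
}
```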
It is essentially about the object lifecycle.
After memory is allocated, it must be initialized by running the class' constructor. When the object is finished with, the class' destructor must be run.
The standard algorithms assume they are always accessing initialized memory and so objects can be created, copied, swapped and moved and deleted etc... based on that assumption.
When dealing with uninitialized memory however, the algorithms have to make sure they do not run a destructor on memory that was never initialized with the constructor. They have to avoid moving and swapping with non-existent objects by initializing the memory first when needed etc...
They have to deal with the extra step in the object lifecycle (initialization) that is unnecessary with already initialized memory.
It's the difference between construction and assignment:
struct foo { /* whatever */ };
foo f;
alignas(foo) unsigned char buf[sizeof(foo)]; // sizeof needs parentheses for a type; alignas keeps the buffer suitably aligned
foo *foo_ptr = (foo*) buf;
*foo_ptr = f; // undefined behavior; *foo_ptr does not point at a valid object
new (foo_ptr) foo; // okay; initializes raw memory
*foo_ptr = f; // okay; assignment to an existing object
I have tried some interesting code (at least for me!). Here it is.
#include <iostream>
struct myStruct{
int one;
/*Destructor: Program crashes if the below code uncommented*/
/*
~myStruct(){
std::cout<<"des\n";
}
*/
};
struct finalStruct {
int noOfChars;
int noOfStructs;
union {
myStruct *structPtr;
char *charPtr;
}U;
};
int main(){
finalStruct obj;
obj.noOfChars = 2;
obj.noOfStructs = 1;
int bytesToAllocate = sizeof(char)*obj.noOfChars
+ sizeof(myStruct)*obj.noOfStructs;
obj.U.charPtr = new char[bytesToAllocate];
/*Now both the pointers charPtr and structPtr points to same location*/
delete []obj.U.structPtr;
}
I have allocated memory through charPtr and deleted it through structPtr. It crashes when I add a destructor to myStruct; otherwise there are no issues.
What exactly happens here? As far as I know, delete[] will call the destructor as many times as the number given in new[]. Why is it not crashing when there is no destructor in myStruct?
First off, storing one member of a union and then reading another in the way you're doing it is Undefined Behaviour, plain and simple. It's just wrong and anything could happen.
That aside, it's quite likely the type pun you're attempting with the union actually works (but remember it's not guaranteed). If that's the case, the following happens:
You allocate an array of bytesToAllocate objects of type char and store the address in the unionised pointer.
Then, you call delete[] on the unionised pointer typed as myStruct*. Which means that it assumes it's an array of myStruct objects, and it will invoke the destructor on each of these objects. However, the array does not contain any myStruct objects, it contains char objects. In fact, the size in bytes of the array is not even a multiple of the size of myStruct! The delete implementation must be thoroughly confused. It probably interprets the first sizeof(myStruct) bytes as one myStruct object and calls the destructor in those bytes. Then, there's less than sizeof(myStruct) bytes left, but there are still some left, so the destructor is called on those incomplete bytes, reaches beyond the array, and hilarity ensues.
Of course, since this is just UB, my guess at the behaviour above could be way off. Plain and simple, you've confused it, so it acts confused.
delete does two things: it calls the destructor and deallocates memory.
You allocate data as one type, but delete it while faking another type.
You shouldn't do it. There are many things one could do in C/C++, take a look at IOCCC for more inspirations :-)
A struct in C++ without any function and having only plain old data is itself a POD. It never calls a constructor/destructor when created/deleted.
Even not standard c-tors/d-tors. Just for performance reasons.
A struct with a (EDIT) user-defined copy-assignment operator, virtual function or d-tor is internally a little more complicated. It has a table of member function pointers.
If you allocate the memory block as chars, this table is not initialized. When you try to delete this memory block through a non-POD type, it first calls the destructor. And as the destructor function pointer is not initialized, it jumps to an arbitrary location in your memory space, thinking it is the function. That's why it crashes.
It works because myStruct does not have a destructor. [Edit: I now see that you tried that, and it does crash. I would find the question interesting why it crashes with that dtor, since the dtor does not access any memory of the object.]
As others said, the second function of delete[], besides potentially calling the elements' dtors (which doesn't happen here, as described), is to free the memory.
That works perfectly in your implementation because free store implementations typically allocate a block of memory whose size is kept in a bookkeeping location in that very memory. The size (once allocated) is type independent, i.e. it is not derived from the pointer type on free. Cf. How does delete[] "know" the size of the operand array?. The malloc-like, type-agnostic allocator returns the chunk of memory and is happy.
Note that, of course, what you do is bogus and don't do that at home and don't publish it and make ppl sign non liability disagreements and don't use it in nuclear facilities and always return int from main().
The problem is that obj.U.structPtr points to a struct, which can have a constructor and destructor.
delete also requires the correct type, otherwise it cannot call the destructor.
So it is illegal to create a char array with new and delete it as a struct pointer.
It would be okay if you use malloc and free. This won't call the constructor and destructor.
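For comparison, here is one well-defined way to keep a struct and some chars in a single block. It is a sketch under assumptions, not the asker's code: MyStruct gains a hypothetical static counter so the destructor call can be observed, the struct is placed at the front of the block so it is suitably aligned, and build_and_release is a hypothetical helper. The block is allocated and deleted as char[], and the struct's lifetime is managed explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct MyStruct {
    int one;
    static int destroyed;        // hypothetical counter, for demonstration only
    ~MyStruct() { ++destroyed; }
};
int MyStruct::destroyed = 0;

// Builds one MyStruct followed by two chars in a single char[] block,
// then tears everything down with matching operations.
int build_and_release(int value) {
    const std::size_t n_chars = 2;
    char* block = new char[sizeof(MyStruct) + n_chars];

    MyStruct* s = new (block) MyStruct;  // construct the struct at the front
    s->one = value;

    char* chars = block + sizeof(MyStruct);  // the chars follow the struct
    chars[0] = 'a';
    chars[1] = 'b';

    const int seen = s->one;
    s->~MyStruct();   // destroy exactly the object that was constructed
    delete[] block;   // release with the same type used for new[]
    return seen;
}
```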
Suppose in C++, I have the following code:
class Foo {
private:
double* myData;
public:
Foo(double data[]) {
myData = data;
}
};
int main() {
double mainData[] = {1.0};
Foo myfoo(mainData);
}
As far as my knowledge can tell, mainData is treated as a pointer when passed into the Foo constructor, so the line myData = data only assigns the pointer address. So no extra memory is allocated here, right? But then, is the Foo class responsible for providing a destructor that deallocates myData's memory? Or do we have a dynamic array pointer that actually points to stack memory?
Also, if I want to protect Foo's myData from changing when mainData is changed, is there a simple way to force the Foo constructor to copy it? Ideally myData would be a simple array, not a pointer, but changing the line double* myData to double myData[] doesn't seem to work because the size of the array is unknown until runtime.
The parameter here is not a dynamic array:
Foo(double data[])
In fact the declaration is equivalent to this:
Foo(double * data)
Even decltype will tell you they are the same thing, and those two signatures will conflict as overloads.
So, there is no allocation. You are only passing a pointer to the first element of the array.
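The equivalence can be checked at compile time. In this sketch takes_array and takes_pointer are hypothetical declarations that are never called; only their types are compared.

```cpp
#include <type_traits>

void takes_array(double data[]);   // declared only, never defined or called
void takes_pointer(double* data);  // declared only, never defined or called

// An array parameter is adjusted to a pointer, so both function types are identical.
constexpr bool kSameSignature =
    std::is_same<decltype(takes_array), decltype(takes_pointer)>::value;

static_assert(kSameSignature, "double[] and double* parameters are the same type");
```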
Also, the only place where C++ will automatically copy an array is when it is a member of a class, and the empty bracket [] syntax for indeterminate size is not allowed for members. (Or if it is, the size is already determined by the time the class type is complete, before the copy constructor or assignment operator is generated.)
Also, if I want to protect Foo's myData from changing when mainData is changed, is there a simple way to force the Foo constructor to copy it? Ideally myData would be a simple array, not a pointer, but changing the line double* myData to double myData[] doesn't seem to work because the size of the array is unknown until runtime.
You can keep a copy of the data, but you will need a pointer if its size (or at least an upper bound) is unknown at compile time. I would recommend std::vector over a naked pointer, or at least std::unique_ptr< double[] >.
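A minimal sketch of the std::vector suggestion. The class name VectorFoo, the extra length parameter, and the accessors are assumptions added for illustration; the original Foo takes only a pointer and so cannot know how many elements to copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant of Foo that owns a copy of the caller's data.
class VectorFoo {
public:
    // The length must be passed explicitly: a pointer carries no size.
    VectorFoo(const double* data, std::size_t length)
        : myData(data, data + length) {}  // copies the elements

    double at(std::size_t i) const { return myData.at(i); }
    std::size_t size() const { return myData.size(); }

private:
    std::vector<double> myData;  // released automatically, no destructor needed
};
```

Because the vector manages its own memory, later changes to the caller's array do not affect the copy, and copying a VectorFoo copies the data as well.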
In this case myData points to an address on the stack: mainData has automatic storage, so it is released automatically when main returns, and Foo must not delete it. Generally, arrays are described as being dynamic when you use the keyword new to allocate them.
As for your second question, you're probably going to have to pass into the constructor a pointer to the array and the length of the array. You then need to dynamically create a double array (pointed to by myData), using the length that was passed in, and then make a copy.
Don't forget to delete the memory in the destructor.
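The manual version described above might look like the following sketch. OwningFoo and its accessors are hypothetical names; copying is disabled so that two objects can never delete the same array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical variant of Foo that copies the data with new[] and frees it itself.
class OwningFoo {
public:
    OwningFoo(const double* data, std::size_t length)
        : myData(new double[length]), myLength(length) {
        std::copy(data, data + length, myData);  // deep copy of the caller's array
    }

    ~OwningFoo() { delete[] myData; }  // matches the new[] in the constructor

    // Copying is disabled to avoid a double delete.
    OwningFoo(const OwningFoo&) = delete;
    OwningFoo& operator=(const OwningFoo&) = delete;

    double at(std::size_t i) const { return myData[i]; }
    std::size_t size() const { return myLength; }

private:
    double* myData;
    std::size_t myLength;
};
```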
A pointer only holds a memory address; unless new or delete is involved, a pointer has nothing to do with allocation or deallocation. Thus your code won't invoke any memory allocation.
In order to delete a (dynamically allocated) array you have to do delete[] foo;
Only dynamically allocated objects must be deleted. If your class takes ownership (it manages the array and calls delete[] on destruction), passing an array with automatic storage duration is a very bad idea.
Yes, it does not allocate additional memory.
No, the destructor won't do anything with the class field unless it has been told to.
Having Foo instances hold a pointer to data that is allocated and managed elsewhere is a fragile design. The best thing is to have the Foo constructor make a copy and store that in the pointer, then free the copy in the destructor. This requires passing the length of the array to the Foo constructor. I hope that helps.
In one C++ open source project, I see this.
struct SomeClass {
...
size_t data_length;
char data[1];
...
};
What are the advantages of doing so rather than using a pointer?
struct SomeClass {
...
size_t data_length;
char* data;
...
};
The only thing I can think of is with the size 1 array version, users aren't expected to see NULL. Is there anything else?
With this, you don't have to allocate the memory elsewhere and make the pointer point to that.
No extra memory management
Accesses to the memory are (much) more likely to hit the cache
The trick is to allocate more memory than sizeof (SomeClass), and make a SomeClass* point to it. Then the initial memory will be used by your SomeClass object, and the remaining memory can be used by the data. That is, you can say p->data[0] but also p->data[1] and so on up until you hit the end of memory you allocated.
Points can be made that this use results in undefined behavior though, because you declared your array to only have one element, but access it as if it contained more. But real compilers do allow this with the expected meaning because C++ has no alternative syntax to formulate these means (C99 has, it's called "flexible array member" there).
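One way to get the same single-allocation layout in C++ without indexing past a one-element array is to compute the address of the trailing bytes from the end of the header. This is a hedged sketch, not the project's code: Packet, make_packet and free_packet are hypothetical names, and addressing the trailing bytes this way is still a low-level idiom that relies on the whole block coming from one malloc call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical header; the payload bytes live directly after it in the same block.
struct Packet {
    std::size_t data_length;

    // Address of the first byte following the header.
    char* data() { return reinterpret_cast<char*>(this + 1); }
};

// Allocates header and payload together and copies len bytes from src.
Packet* make_packet(const char* src, std::size_t len) {
    assert(src != nullptr || len == 0);
    void* raw = std::malloc(sizeof(Packet) + len);
    if (raw == nullptr) return nullptr;

    Packet* p = new (raw) Packet;  // start the header's lifetime in the block
    p->data_length = len;
    if (len != 0) std::memcpy(p->data(), src, len);
    return p;
}

// Packet is trivially destructible, so one free releases header and payload.
void free_packet(Packet* p) { std::free(p); }
```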
This is usually a quick(and dirty?) way of avoiding multiple memory allocations and deallocations, though it's more C stylish than C++.
That is, instead of this:
struct SomeClass *foo = malloc(sizeof *foo);
foo->data = malloc(data_len);
memcpy(foo->data,data,data_len);
....
free(foo->data);
free(foo);
You do something like this:
struct SomeClass *foo = malloc(sizeof *foo + data_len);
memcpy(foo->data,data,data_len);
...
free(foo);
In addition to saving (de)allocation calls, this can also save a bit of memory as there's no space for a pointer and you could even use space that otherwise could have been struct padding.
Usually you see this as the final member of a structure. Then whoever mallocs the structure, will allocate all the data bytes consecutively in memory as one block to "follow" the structure.
So if you need 16 bytes of data, you'd allocate an instance like this:
SomeClass * pObj = static_cast<SomeClass *>(malloc(sizeof(SomeClass) + (16 - 1))); // C++ needs the cast; plain C does not
Then you can access the data as if it were an array:
pObj->data[12] = 0xAB;
And you can free all the stuff with one call, of course, as well.
The data member is a single-item array by convention because older C compilers (and apparently the current C++ standard) don't allow a zero-sized array. Nice further discussion here: http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
They are semantically different in your example.
char data[1] is a valid array of char with one uninitialized element allocated on the stack. You could write data[0] = 'w' and your program would be correct.
char* data; simply declares a pointer that is invalid until initialized to point to a valid address.
The structure can be simply allocated as a single block of memory instead of multiple allocations that must be freed.
It actually uses less memory because it doesn't need to store the pointer itself.
There may also be performance advantages with caching due to the memory being contiguous.
The idea behind this particular thing is that the rest of data fits in memory directly after the struct. Of course, you could just do that anyway.