std::string and placement new - c++

I found this example of using placement new in C++, and it doesn't make sense to me.
It is my view that this code is exception-prone, since more memory than what was allocated may be used.
char *buf = new char[sizeof(string)];
string *p = new (buf) string("hi");
If "string" is the C++ std::string class, then buf will get an allocation
the size of an empty string object (which with my compiler gives 28 bytes),
and then the way I see it if you initialize your string with more chars you might
exceed the memory allocated. For example:
string *p = new (buf) string("hiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii");
On my VS this seems to be working nevertheless, and I'm not sure if this is because
the exception is somehow waived or I simply don't understand how string works.
Can someone help clarify?

You're misunderstanding the (typical) internal implementation of std::string. Usually it's implemented something like this:
class string {
protected:
char *buffer;
size_t capacity;
size_t length;
public:
// normal interface methods
};
The key point is that there are two distinct blocks of memory: one for the string object itself, containing the members shown above, and one for the content of the string. When you do your placement new, it's only the string object that is placed into the provided memory, not the memory for buffer, where the content of the string is stored. That is allocated separately, automatically, by the string class as needed.

The size returned by sizeof is the number of bytes required to store the members of the class, with some implementation-defined padding. That memory must be allocated before the constructor of std::string can be called.
However, when the constructor runs, it may allocate a larger amount of memory, which indeed it must in order to store large strings. That amount of memory is not part of the sizeof size, and you don't need to allocate it yourself.

Related

How can an array of doubles be placed in a char buffer using new?

Sorry if this question sounds stupid, but I'm just starting to learn C++ and there is something confusing me about placement new.
I've been reading C++ Primer (which I find is a very good book to learn C++), and in the placement new section there is an example given. The example uses a char array to provide memory space for the placement new:
const int BUF = 512;
const int N = 5;
char buffer[BUF];
double * pd1;
pd1 = new (buffer) double[N];
My question is why is it using a char array to provide memory space for the placement new? Also the last line in the code above is allocating memory for an array of double, how is that possible when the original memory space contains a char array? If the placement new is using the memory space of the char array, does this mean when we allocate the double array it overwrites the char array in that memory?
Again sorry if the question is strange, but hope I've made it quite clear.
why is it using a char array to provide memory space for the placement new?
Why not? char is the smallest type that C++ defines: sizeof(char) is 1 by definition, so a C++ byte is exactly the size of a char. Therefore, it makes a good type to use when you need to allocate a block of memory of a certain size.
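C++ also has very specific rules about how arrays of char (and unsigned char, and in C++17 std::byte, but no other types) are allocated with new. The result of new char[N], for example, is not merely aligned for char: it is suitably aligned for any object type with fundamental alignment whose size is no greater than N. Thus, you can use it to allocate memory and then construct other types into that memory. Note that this guarantee covers new char[] only; a plain array such as char buffer[BUF] is only guaranteed char alignment, so it should be declared alignas(double) before doubles are placed in it.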
Also the last line in the code above is allocating memory for an array of double, how is that possible when the original memory space contains a char array?
It is not allocating anything. It is constructing an array, using the memory you have given it. That's what placement new does, it constructs an object in the memory provided.
If the placement new is using the memory space of the char array, does this mean when we allocate the double array it overwrites the char array in that memory?
Yes.
Yes, the char array and the double array would overlap; more specifically, they would start at the same address in memory, i.e. static_cast<void*>(buffer) and static_cast<void*>(pd1) would compare equal. We can emphasize the overlap even more by making the byte sizes match (sizeof(char) is 1 by definition):
const int N = 5;
alignas(double) char buffer[N * sizeof(double)];
double *pd1 = new (buffer) double[N];
Yes, if you modify the data pd1 points to, then the data buffer points to would also be modified. And the other way round as well. (See also the GCC flag -fstrict-aliasing to learn about how compiler optimizations work with such an overlap.)
There are no stupid questions.
It uses char probably because it makes you think about raw bytes (sizeof(char) is always 1). The last line of the code is not allocating memory; it just places a double array over the mentioned buffer. If it were an array of class type, placement new would also call the constructors. Yes, the char array gets overwritten.
Memory is memory. The machine doesn't care what type of data is stored there, that's up to the language to define and enforce. The answer to "why" is "because C++ is designed to let you."
char buffer[BUF]; is just some memory. There's no type information attached to the bytes composing buffer. Only the compiler knows that this memory region is supposed to hold characters. You could use any type, even double:
double buffer[BUF];
double *pd1 = new (buffer) double[N];

C++ Memory Allocation for Struct With String

If I have a string member within a struct that's then stored into an array, how does memory get allocated?
struct garage {
int ncars;
int nspaces;
int nmechanics;
string name;
};
But for that last member, name, string is basically a typedef of basic_string, so its memory gets allocated when it gets defined, right? For example: garage.name = "Cool Cars";
But if I don't define that member YET, and store the struct in an array:
garage nearby_garages[15];
garage g0, g1, g2;
nearby_garages[0] = g0; nearby_garages[1] = g1; nearby_garages[2] = g2;
garage current;
current = nearby_garages[1];
current.name = "Jack's Garage";
string size can vary depending on the length of the string/data. struct size can vary depending on string size, which means the array size can vary depending on struct size, but then the array would fall apart if it was pre-allocated. The only way I can see this working is if string is a pointer to a memory location not sandwiched within the struct. But I don't think that is what's happening here. Help please?
Your garage holds the std::string object itself (not a reference), but that object has a fixed size, so your array can be allocated on the stack with no problem. Internally, however, std::string uses new/malloc (through its allocator) to obtain memory for your data.
So each garage contains a string, which holds a pointer to a separate chunk of memory containing your characters. Nothing breaks here because sizeof(garage) already includes room for the string object, and therefore for that pointer, at creation time.
When you use literals such as "Jack's Garage", the compiler stores them in a special place (they have static storage duration and typically live in a read-only data section); they are not allocated in the same memory segment as your garages.
Finally, when you write current.name = "Jack's Garage", no separate conversion is needed: std::string has an assignment operator that takes a const char* directly. The effect is the same as
current.name = std::string("Jack's Garage");
but without necessarily constructing a temporary. The assignment copies the characters into current.name. If the string's current capacity is too small, the string allocates new memory (outside the garage object itself, unless the small string optimization applies), and the copy is (probably) done with memcpy or similar at a lower level.
std::string is similar in implementation to an std::vector: Essentially a pointer and size, two pointers (begin and end), or one pointer and the ability to query allocator block sizes.
Many implementations (including current libstdc++, libc++ and MSVC) also implement SSO (Small String Optimization), where the string structure itself has a small buffer for short strings and switches to using a pointer for longer strings.
Without SSO, the backing store for characters owned by an std::string is allocated upon construction or assignment with a literal (or with another string, if the implementation isn't COW), or re-allocated during a concatenation.
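In your code above, current.name = "Jack's Garage" would be the allocation site if SSO is not used; with SSO, a 13-character string like this one typically fits in the internal buffer and no allocation happens.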

C++: struct and new keyword

I'm a beginner to C++, I've got the following piece of code:
struct Airline {
string Name;
int diameter;
int weight;
};
Airline* myPlane = new Airline;
My question is: when I call new, it allocates memory, if I recall correctly. How does the PC know how much memory to allocate, especially given that there is a string type in there?
Thanks
An std::string object is fixed-size; it contains a pointer to an actual buffer of characters along with its length. std::string's definition looks something like
class string
{
char *buffer;
size_t nchars;
public:
// interface
};
It follows that your Airline objects also have a fixed size.
Now, new does not only allocate; it also initializes your object, including the std::string, which means it probably sets the char pointer to 0 because the string is empty.
You can also get the size of the structure, by using sizeof:
cout << "sizeof(Airline) = " << sizeof(Airline) << endl;
This is because the compiler knows the fields inside the structure, and adds up the sizes of each structure member, plus any padding needed for alignment.
The string object is no different from your structure. It is actually a class in the standard library, and not a special type like int or float that is handled by the compiler. Like your structure, the string class contains fields that the compiler knows the size of, and so it knows the size of your complete structure and uses that when you use new.
The call to new will allocate sizeof(Airline) which is what is needed to hold an object of type Airline.
As for the management of strings, the string object holds some internal data to manage the memory of the actual data stored, but not the data itself (unless the small string optimization is in use). The idea pointed out by others, that it stores a pointer to the actual string, is right but not precise enough: implementations will store that pointer plus extra data required to hold the size() and capacity() (and others, like reference counts in reference-counting implementations).
The memory for the string's characters may or may not be within class string. Possibly (and probably), class string will manage its own memory, holding only a pointer to the memory used to store the data. Example:
struct Airline {
struct String {
char *data; // size = 4
size_t size; // size = 4
} Name;
int diameter; // size = 4
int weight; // size = 4
}; // size = 16
Note that those are not necessarily actual sizes, they are just for example.
Also note that in C++ (unlike C99, whose variable-length arrays have a run-time sizeof), for every class T, sizeof(T) is a compile-time constant, meaning that objects can never have dynamic size. In effect: as soon as you need runtime dynamically sized data, there have to be memory areas external to the object. This may imply the use of standard containers like std::string or std::vector, or even manually managed resources.
This in turn means, operator new does not need to know the dynamic size of all members, recursively, but only the size of the outermost class, the one that you allocate. When this outer class needs more memory, it has to manage it itself. Some exemplary p-code:
Airline* myPlane = new Airline {
Name = {
data = new char[some-size]
...
}
...
}
The inner allocations are done by the holding constructors:
Airline::Airline() : Name(), ... {}
string::string() : data(new char[...]), ... {}
operator new does nothing but allocate some fixed-size memory as the "soil" for Airline (see the first p-code); the new-expression then "seeds" Airline's constructor, which has to manage its lifetime in that restricted volume of "soil" by invoking the string constructor (implicitly or explicitly), which itself does another new.
When you allocate Airline, new will allocate enough space on the heap for two ints, string and its fields.
A string object always has the same size, sizeof(std::string), whether it lives on the stack or on the heap. However, internally, the string stores a pointer to a character array.

Why use array size 1 instead of pointer?

In one C++ open source project, I see this.
struct SomeClass {
...
size_t data_length;
char data[1];
...
};
What are the advantages of doing so rather than using a pointer?
struct SomeClass {
...
size_t data_length;
char* data;
...
};
The only thing I can think of is with the size 1 array version, users aren't expected to see NULL. Is there anything else?
With this, you don't have to allocate the memory elsewhere and make the pointer point to that.
No extra memory management
Accesses to the memory are (much) more likely to hit the cache
The trick is to allocate more memory than sizeof (SomeClass), and make a SomeClass* point to it. Then the initial memory will be used by your SomeClass object, and the remaining memory can be used by the data. That is, you can say p->data[0] but also p->data[1] and so on up until you hit the end of memory you allocated.
Strictly speaking, though, this use results in undefined behavior, because you declared your array to have only one element but access it as if it contained more. But real compilers do allow this with the expected meaning, because C++ has no alternative syntax to express it (C99 has one: it's called a "flexible array member" there).
This is usually a quick (and dirty?) way of avoiding multiple memory allocations and deallocations, though it's more C-style than C++.
That is, instead of this:
struct SomeClass *foo = malloc(sizeof *foo);
foo->data = malloc(data_len);
memcpy(foo->data,data,data_len);
....
free(foo->data);
free(foo);
You do something like this:
struct SomeClass *foo = malloc(sizeof *foo + data_len);
memcpy(foo->data,data,data_len);
...
free(foo);
In addition to saving (de)allocation calls, this can also save a bit of memory as there's no space for a pointer and you could even use space that otherwise could have been struct padding.
Usually you see this as the final member of a structure. Then whoever mallocs the structure will allocate all the data bytes consecutively in memory, as one block that "follows" the structure.
So if you need 16 bytes of data, you'd allocate an instance like this:
SomeClass *pObj = (SomeClass *)malloc(sizeof(SomeClass) + (16 - 1));
Then you can access the data as if it were an array:
pObj->data[12] = 0xAB;
And you can free all the stuff with one call, of course, as well.
The data member is a single-item array by convention because older C compilers (before C99's flexible array members) and standard C++ don't allow a zero-sized array. Nice further discussion here: http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
They are semantically different in your example.
char data[1] is a valid array of char with one uninitialized element, allocated as part of the struct wherever the struct itself lives. You could write data[0] = 'w' and your program would be correct.
char* data; simply declares a pointer that is invalid until initialized to point to a valid address.
The structure can be simply allocated as a single block of memory instead of multiple allocations that must be freed.
It actually uses less memory because it doesn't need to store the pointer itself.
There may also be performance advantages with caching due to the memory being contiguous.
The idea behind this particular thing is that the rest of data fits in memory directly after the struct. Of course, you could just do that anyway.

How do I over-allocate memory using new to allocate variables within a struct?

So I have a couple of structs...
struct myBaseStruct
{
};
struct myDerivedStruct : public myBaseStruct
{
int a, b, c, d;
unsigned char* ident;
};
myDerivedStruct* pNewStruct;
...and I want to dynamically allocate enough space so that I can 'memcpy' in some data, including a zero-terminated string. The size of the base struct is apparently '1' (I assume because it can't be zero) and the size of the derived is 20, which seems to make sense (5 x 4).
So, I have a data buffer which is a size of 29, the first 16 bytes being the ints and the remaining 13 being the string.
How can I allocate enough memory for pNewStruct so that there is enough for the string? Ideally, I just want to go:
allocate 29 bytes at pNewStruct;
memcpy from buffer into pNewStruct;
Thanks,
Either go back to C, or abandon these ideas and actually use C++ as it's intended:
Use the constructor to allocate memory and destructor to delete it.
Don't let some other code write into your memory space, create a function that will ensure memory is allocated.
Use a std::string or std::vector to hold the data rather than rolling your own container class.
Ideally you should just say:
myDerivedStruct* foo = new myDerivedStruct(a, b, c, d, ident);
In C++98/03 (the standard current when this was written), myDerivedStruct is non-POD because it has a base class, so the result of memcpying anything into it is undefined.
C++11 relaxed the rules: under them, myDerivedStruct (whose only base is empty) is trivially copyable and standard-layout, so memcpy into it is well-defined there. Also, I doubt that very many compilers would lay out your class in a way that's incompatible with PODs. I expect you'd only have trouble with something that didn't do the empty base class optimisation. But there it is.
If it was POD, or if you're willing to take your chances with your implementation, then you could use malloc(sizeof(myStruct)+13) or new char[sizeof(myStruct)+13] to allocate enough space, basically the same as you would in C. The motivation presumably is to avoid the memory and time overhead of just putting a std::string member in your class, but at the cost of having to write the code for the manual memory management.
You can overallocate for any class instance, but it implies a certain amount of management overhead. The only valid way to do this is by using a custom memory allocation call. Without changing the class definition, you can do this.
void* pMem = ::operator new(sizeof(myDerivedStruct) + n);
myDerivedStruct* pObject = new (pMem) myDerivedStruct;
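Assuming that you don't overload operator delete in the hierarchy, delete pObject will usually work, but strictly it is not guaranteed: since C++14 the delete-expression may call the sized operator delete with sizeof(myDerivedStruct), which doesn't match the size actually allocated. The safe way to destroy pObject and deallocate the memory is pObject->~myDerivedStruct(); followed by ::operator delete(pMem);. Of course, if you allocate any objects in the excess memory area then you must correctly free them before deallocating the memory.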
You then have access to n bytes of raw memory at this address: void* p = pObject + 1. You can memcpy data to and from this area as you like. You can assign to the object itself and shouldn't need to memcpy its data.
You can also provide a custom memory allocator in the class itself that takes an extra size_t describing the amount of excess memory to allocate enabling you to do the allocation in a single new expression, but this requires more overhead in the class design.
myDerivedStruct* pObject = new (n) myDerivedStruct;
and
struct myDerivedStruct
{
// ...
void* operator new(std::size_t objsize, std::size_t excessStorage);
// other operator new and delete overrides to make sure that you have no memory leaks
};
You can allocate any size you want with malloc:
myDerivedStruct* pNewStruct = (myDerivedStruct*) malloc(
sizeof(myDerivedStruct) + sizeof_extra_data);
You have a different problem though, in that myDerivedStruct::ident is a very ambiguous construct. It is a pointer to a char (array), so the struct ends with the address of where the char array starts? ident can point anywhere, and it is very ambiguous who owns the array ident points to. It seems to me that you expect the struct to end with the actual char array itself and the struct to own the extra array. Such structures usually have a size member to keep track of their own size so that API functions can properly manage and copy them, and the extra data starts, by convention, after the structure ends. Or they end with a 0-length array char ident[0], although that creates problems with some compilers. For many reasons, there is no place for inheritance in such structs:
struct myStruct
{
size_t size;
int a, b, c, d;
char ident[0];
};
Mixing memcpy and new seems like a terrible idea in this context. Consider using malloc instead.
You can dynamically allocate space by doing:
myDerivedStruct* pNewStruct = reinterpret_cast<myDerivedStruct*>(new char[size]);
however
Are you sure you want to do this?
Also, note that if you are intending to use ident as the pointer to the start of your string, that would be incorrect. You in fact need &ident, since the ident variable itself is at the start of your unused space; interpreting what is in that space as a pointer is most likely going to be meaningless. Hence, it would make more sense if ident were unsigned char or char rather than unsigned char*.
I'd just like to emphasise that what you're doing is a really, really bad idea.
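const char* buffer = /* some data here */;
myDerivedStruct* pNewStruct = new myDerivedStruct();
// memcpy(dest, src, n): copy the four ints (a..d are contiguous in practice)
memcpy(&pNewStruct->a, buffer, 4 * sizeof(int));
const char* str = buffer + 4 * sizeof(int);
// ident is unsigned char*; +1 leaves room for the terminating '\0'
pNewStruct->ident = new unsigned char[strlen(str) + 1];
strcpy(reinterpret_cast<char*>(pNewStruct->ident), str);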
Something like that.
Is the buffer size known at compile time? A statically allocated array would be an easier solution in that case. Otherwise, see Remus Rusanu's answer above. That's how the Win32 API manages variable-sized structs.
struct myDerivedStruct : public myBaseStruct
{
int a, b, c, d;
unsigned char ident[BUFFER_SIZE];
};
Firstly, I don't get the point of having a myBaseStruct base. You provided no explanation.
Secondly, what you declared in your original post will not work with the data layout you described. For what you described in the OP, you need the last member of the struct to be an array, not a pointer:
struct myDerivedStruct : public myBaseStruct {
int a, b, c, d;
unsigned char ident[1];
};
Array size doesn't matter, but it should be greater than 0. Arrays of size 0 are explicitly illegal in C++.
Thirdly, if for some reason you want to use new specifically, you'll have to allocate a buffer of char objects of the required size and then convert the resulting pointer to your pointer type:
char *raw_buffer = new char[29];
myDerivedStruct* pNewStruct = reinterpret_cast<myDerivedStruct*>(raw_buffer);
After that you can do your memcpy, assuming that the size is right.