C++ string / container allocation


This is probably obvious to a C++ non-noob, but it's stumping me a bit - does a string member of a class allocate a variable amount of space in that class? Or does it just allocate a pointer internally to some other space in memory? E.g. in this example:
#include <string>
#include <vector>
using namespace std;
class Child {
public:
string Name;
};
class Parent {
public:
vector<Child> Children;
};
How is that allocated on the heap if I create a "new Parent()" and add some children with varying length strings? Is Parent 4 bytes, Child 4 bytes (or whatever the pointer size, plus fixed size internal data), and then a random pile of strings somewhere else on the heap? Or is it all bundled together in memory?
I guess in general, are container types always fixed size themselves, and just contain pointers to their variable-sized data, and is that data always on the heap?

Classes in C++ are always fixed size. When there is a variable sized component, e.g., the elements of a vector or the characters in a string, they may be allocated on the heap (for small strings they may also be embedded in the string itself; this is known as the small string optimization). That is, your Parent object would contain a std::vector<Child> where the Child objects are allocated on the heap (the std::vector<...> object itself probably keeps three words to its data but there are several ways things may be laid out). The std::string objects in Child allocate their own memory. That is, there may be quite a few memory allocations.
The C++ 2011 standard thoroughly defines allocators to support passing an allocation mechanism to an object and all its children. Of course, the classes also need to support this mechanism. If your Parent and Child classes had suitable constructors taking an allocator and passed this allocator to all members doing allocations, it would be propagated through the system. This way, objects that belong together can be allocated in reasonably close proximity.
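As a rough sketch of that propagation, the example below uses the C++17 polymorphic allocator types (std::pmr), which are one standard way to get this behaviour; they are not what the answer above had in mind specifically, and the names points_into and children_share_parent_arena are made up for illustration. The vector hands its memory resource to each string it constructs, so both the string objects and their characters end up in the same arena.

```cpp
#include <cstddef>
#include <functional>
#include <memory_resource>
#include <string>
#include <vector>

// Returns true if p points into the byte range [buf, buf + n).
bool points_into(const void* p, const std::byte* buf, std::size_t n) {
    std::less<const void*> before;  // well-defined ordering for unrelated pointers
    const void* begin = buf;
    const void* end = buf + n;
    return !before(p, begin) && before(p, end);
}

// Hypothetical check: do the vector's elements AND the characters of a long
// string both come from the same caller-supplied arena?
bool children_share_parent_arena() {
    alignas(std::max_align_t) std::byte arena[4096];
    // null_memory_resource() as upstream: running out of arena space throws
    // std::bad_alloc instead of silently falling back to the global heap.
    std::pmr::monotonic_buffer_resource pool(arena, sizeof arena,
                                             std::pmr::null_memory_resource());

    std::pmr::vector<std::pmr::string> names(&pool);
    names.emplace_back(200, 'x');  // long enough to need its own buffer

    const void* element = &names[0];
    const void* chars = names[0].data();
    return points_into(element, arena, sizeof arena) &&
           points_into(chars, arena, sizeof arena);
}
```

Calling children_share_parent_arena() from main and printing the result is enough to try it; compile with C++17 or later.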

Classes in C++ always have a fixed size. Therefore vector and string can only contain pointers to heap-allocated memory* (although they typically contain more data than one pointer, since they also need to store the length). Therefore the object itself always has a fixed size.
*For string this is not entirely correct. Often an optimization technique called short string optimization is used. In that case small strings are embedded inside the object (in the place where otherwise the pointer to heap data would be stored) and heap memory is only allocated if the string is too long.
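One way to observe this on a given implementation is to check whether the character data lies inside the bytes of the string object itself. This is a small sketch; stored_inline is a hypothetical helper name, and whether a short string reports true depends on the standard library in use.

```cpp
#include <cstdint>
#include <string>

// Returns true if the characters of s are stored inside the std::string
// object itself (short string optimization), false if they live elsewhere.
bool stored_inline(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(std::string);
    const auto chars = reinterpret_cast<std::uintptr_t>(s.data());
    return chars >= object_begin && chars < object_end;
}
```

A string of, say, 1000 characters cannot fit inside the object, so stored_inline returns false for it everywhere; for something like "hi" the answer varies by implementation.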

Yes -- using your words -- container types are always fixed size themselves, and just contain pointers to their variable-sized data.
If we have vector<int> vi;, the size of vi is always fixed, sizeof(vector<int>) to be exact, irrespective of the number of int's in vi.
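The same check can be made with the Parent and Child classes from the question. The sketch below is only illustrative (parent_bytes_after_adding is a made-up helper): it adds children with names of growing length and reports sizeof for the Parent object, which does not depend on how many children were added.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct Child {
    std::string Name;
};

struct Parent {
    std::vector<Child> Children;
};

// Adds n children with increasingly long names, then reports the size of the
// Parent object itself. The children and their names live in separate
// allocations owned by the vector and the strings.
std::size_t parent_bytes_after_adding(std::size_t n) {
    Parent p;
    for (std::size_t i = 0; i < n; ++i) {
        p.Children.push_back(Child{std::string(i * 10, 'x')});
    }
    return sizeof(p);
}
```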

does a string member of a class allocate a variable amount of space in that class?
No, it does not.
Or does it just allocate a pointer internally to some other space in memory?
No, it does not.
A std::string object occupies whatever sizeof(std::string) is.
Do not confuse the size of an object with the size of the resources that the object is responsible for.

Related

How the memory allocation works for nested containers?

For example, i have std::vector<std::string>, how the allocators for vector and string work together?
Say the allocator for vector allocates a chunk of memory ChunkVec, does the allocator for string allocate memory inside ChunkVec so that the memory allocated for each string sums to ChunkVec? Or the allocator for string allocates memory outside ChunkVec?
Is the answer the same for other nested containers?
And is there a difference between C++ and C++11?

i have std::vector < std::string >
On my Ubuntu 15.04, 64 bit, a std::string is 8 bytes, regardless of contents.
(using std::string s1; I am comparing sizeof(std::string) versus s1.size(). Then append to the string and then print them both again.)
std::string is a typedef for std::basic_string<char, std::char_traits<char>, std::allocator<char>>, so unless you instantiate basic_string with a different allocator, the string gets its heap data from the default std::allocator, which obtains memory through operator new. That standard allocator knows nothing about your vector.
does the allocator for string allocate memory inside ChunkVec so that
the memory allocated for each string sums to ChunkVec?
I believe the part of the string in a vector element is only the 8 byte pointer to where the string 'proper' resides in the heap. So no.
Or the allocator for string allocates memory outside ChunkVec?
Yes, I believe so.
You can confirm this by printing the addresses of the vector elements i, and i+1, and the address of the some of the chars of element i.
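A minimal sketch of that experiment follows. The function names are made up for illustration, and the printed addresses will differ from run to run and between implementations.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Returns true if the characters of element i are stored outside the block
// of memory that holds the vector's std::string objects.
bool chars_live_outside_vector_buffer(const std::vector<std::string>& v,
                                      std::size_t i) {
    const auto buf_begin = reinterpret_cast<std::uintptr_t>(v.data());
    const auto buf_end = buf_begin + v.size() * sizeof(std::string);
    const auto chars = reinterpret_cast<std::uintptr_t>(v[i].data());
    return chars < buf_begin || chars >= buf_end;
}

// Prints where each string object sits and where its characters sit.
void print_layout(const std::vector<std::string>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        std::printf("element %zu at %p, chars at %p\n", i,
                    static_cast<const void*>(&v[i]),
                    static_cast<const void*>(v[i].data()));
    }
}
```

With long strings, consecutive elements are sizeof(std::string) bytes apart while the character data is somewhere else entirely. With very short strings, an implementation that uses the small string optimization may keep the characters inside the element instead.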
By the way, on my implementation (g++ 4.9.2) , sizeof(std::vector) is 24 bytes, regardless of the number of data elements (vec.size()) and regardless of element size. Note also, that I have read about some implementations where some of a small vector might actually reside in the 24 bytes. Implementation details can be tedious, but helpful. Still, some might be interested in why you want to know this.
Be aware we are talking about implementation details (I think) ... so your exploration might vary from mine.
Is the answer the same for other nested containers?
I have not explored every container (but I have used many "std::vector< std::string >").
Generally, and without much thought, I would guess not.
And is there a difference between C++ and C++11?
Implementation details change for various reasons, including language feature changes. What have you tried?

ChunkVec stores only the pointer to the data allocated by the string (more precisely, it stores a std::string object, which in turn stores the pointer). The string's data is a totally different allocation. A good way to understand it is to look at how a tree structure is usually declared:
struct node
{
int data;
struct node* left;
struct node* right;
};
left and right are different memory allocations than node. You can remove them without removing this very node.

std::string has two things to store--the size of the string and the content. If I allocate one on the stack, the size will be on the stack as well. For short strings, the character data itself will also be on the stack. These two items make up the "control structure". std::string only uses its allocator for long strings that don't fit in its fixed-size control structure.
std::vector allocates memory to store the control structure of the std::string. Any allocation required by std::string to store long strings could be in a completely different area of memory than the vector. Short strings will be entirely managed by the allocator of std::vector.

How do strings allocate memory in c++?

I know that dynamic memory has advantages over setting a fixed-size array and using a portion of it. But with dynamic memory you have to specify the amount of data that you want to store in the array. When using strings you can type as many letters as you want (you can even use strings for numbers and then use a function to convert them). This fact makes me think that dynamic memory for character arrays is obsolete compared to strings.
So I want to know: what are the advantages and disadvantages of using strings? When is the space occupied by strings freed? Is the option to free your dynamically allocated memory with delete perhaps an advantage over strings? Please explain.

The short answer is "no, there are no drawbacks, only advantages" with std::string over character arrays.
Of course, strings do USE dynamic memory; they just hide that fact behind the scenes so you don't have to worry about it.
In answer to your question: When is the space occupied by strings freed? this post may be helpful. Basically, the memory of a std::string is freed by its destructor once the string goes out of scope, so you never have to call delete on it yourself.
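To watch that happen, one option is to give the string a custom allocator that counts live blocks. The sketch below is an illustration only: CountingAllocator, CountedString and the two functions are hypothetical names, and the exact number of blocks held while the string is alive can vary between standard libraries.

```cpp
#include <cstddef>
#include <new>
#include <string>

static long g_live_blocks = 0;  // blocks allocated and not yet released

// Minimal C++11 allocator that forwards to operator new/delete and counts.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_live_blocks;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        --g_live_blocks;
        ::operator delete(p);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return false;
}

using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// Number of live blocks while a long string is still in scope.
long live_blocks_inside_scope() {
    CountedString s(1000, 'x');
    return g_live_blocks;
}

// Number of live blocks after the string's scope has ended.
long live_blocks_after_scope() {
    {
        CountedString s(1000, 'x');
    }  // destructor runs here and returns the buffer
    return g_live_blocks;
}
```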

std::string usually contains an internal dynamically allocated buffer. When you assign data, or if you push back new data, and the current buffer size is not sufficient, a new buffer is allocated with an increased size and the old data is copied or moved to the new buffer. The old buffer is then deallocated.
The main buffer is deallocated when the string is destroyed. If the string object is a local variable in a function (on the stack), that happens at the end of the enclosing code block. If it's a function parameter, when the function exits. If it's a class member, when the object that contains it is destroyed.
The advantage of strings is flexibility (increases in size automatically) and safety (harder to go over the bounds of an array). A fixed-size char array on the stack is faster as no dynamic allocation is required. But you should worry about that if you have a performance problem, and not before.
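The growth behaviour described in this answer can be observed by counting how often capacity() changes while characters are appended. This is a small sketch with a made-up function name; the actual count depends on the implementation's growth policy.

```cpp
#include <cstddef>
#include <string>

// Appends n characters one at a time and counts how often the capacity
// changes, i.e. how often the string switched to a larger buffer.
std::size_t count_buffer_changes(std::size_t n) {
    std::string s;
    std::size_t changes = 0;
    std::size_t capacity = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != capacity) {
            ++changes;
            capacity = s.capacity();
        }
    }
    return changes;
}
```

Because implementations typically grow the buffer by a constant factor, the number of changes for 10000 appended characters is usually small compared with the number of characters.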

Well, your question got me thinking, and then I understood that you are talking about syntax differences, because both approaches dynamically allocate char arrays. The only difference is in the need:
If you need to create a string containing a sentence, then you can use a string, and it is fine not to use malloc.
If you want an array to "play" with, meaning changing or setting the cells according to some method, or changing its size, then creating it with malloc would be the appropriate way.
The only reason I see for a statically allocated char a[17] (for example) is a single-purpose string, meaning only when you know the exact size you'll need and it won't change.
And one important point that I found:
In dynamic memory allocation, if memory is continually allocated but the memory for objects that are no longer in use is not released, it can lead to a memory leak and eventually to memory exhaustion, which is a big disadvantage.

c++ maximum std::string length is dictated by stack size or heap size?

as asked in the question.
std::string myVar; the maximum character it can hold is dictated by stack or heap?
Thank you

By default, the memory allocated for std::string is allocated dynamically.
Note that std::string has a max_size() function returning the maximum number of characters supported by the implementation. The usefulness of this is questionable, though, as it's an implementation maximum, and doesn't take into consideration other resources, like memory. Your real limit is much lower. (Try allocating 4GB of contiguous memory, or take into account memory exhaustion elsewhere.)
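To illustrate the difference between the two limits, the sketch below asks for one character more than max_size(). The function name is hypothetical. A request above max_size() is rejected with std::length_error before any memory is requested, whereas a request below it can still fail with std::bad_alloc if the memory is simply not available.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>

// Returns true if asking for more than max_size() characters is rejected
// with std::length_error.
bool reserve_past_max_size_throws() {
    std::string s;
    const std::size_t limit = s.max_size();
    if (limit == std::numeric_limits<std::size_t>::max()) {
        return false;  // limit + 1 would wrap around; nothing to test
    }
    try {
        s.reserve(limit + 1);
    } catch (const std::length_error&) {
        return true;
    }
    return false;
}
```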

A std::string object will be allocated the same way an int or any other type must be: on the stack if it's a local variable, or it might be static, or on the heap if new std::string is used or new X where X contains the string etc..
But, that std::string object may contain at least a pointer to additional memory provided by the allocator with which basic_string<> was instantiated - for the std::string typedef that means heap-allocated memory. Either directly in the original std::string object memory or in pointed-to heap you can expect to find:
a string size member,
possibly some manner of reference counter or links,
the textual data the string stores (if any)
Some std::string implementations have "short string" optimisations where they pack strings of only a few characters directly into the string object itself (for memory efficiency, often using some kind of union with fields that are used for other purposes when the strings are longer). But, for other string implementations, and even for those with short-string optimisations when dealing with strings that are too long to fit directly in the std::string object, they will have to follow pointers/references to the textual data which is stored in the allocator-provided (heap) memory.

Why is it not possible to access the size of a new[]'d array?

When you allocate an array using new [], why can't you find out the size of that array from the pointer? It must be known at run time, otherwise delete [] wouldn't know how much memory to free.
Unless I'm missing something?

In a typical implementation the size of dynamic memory block is somehow stored in the block itself - this is true. But there's no standard way to access this information. (Implementations may provide implementation-specific ways to access it). This is how it is with malloc/free, this is how it is with new[]/delete[].
In fact, in a typical implementation raw memory allocations for new[]/delete[] calls are eventually processed by some implementation-specific malloc/free-like pair, which means that delete[] doesn't really have to care about how much memory to deallocate: it simply calls that internal free (or whatever it is named), which takes care of that.
What delete[] does need to know though is how many elements to destruct in situations when array element type has non-trivial destructor. And this is what your question is about - the number of array elements, not the size of the block (these two are not the same, the block could be larger than really required for the array itself). For this reason, the number of elements in the array is normally also stored inside the block by new[] and later retrieved by delete[] to perform the proper array element destruction. There are no standard ways to access this number either.
(This means that in general case, a typical memory block allocated by new[] will independently, simultaneously store both the physical block size in bytes and the array element count. These values are stored by different levels of C++ memory allocation mechanism - raw memory allocator and new[] itself respectively - and don't interact with each other in any way).
However, note that for the above reasons the array element count is normally only stored when the array element type has non-trivial destructor. I.e. this count is not always present. This is one of the reasons why providing a standard way to access that data is not feasible: you'd either have to store it always (which wastes memory) or restrict its availability by destructor type (which is confusing).
To illustrate the above, when you create an array of ints
int *array = new int[100];
the size of the array (i.e. 100) is not normally stored by new[] since delete[] does not care about it (int has no destructor). The physical size of the block in bytes (like, 400 bytes or more) is normally stored in the block by the raw memory allocator (and used by raw memory deallocator invoked by delete[]), but it can easily turn out to be 420 for some implementation-specific reason. So, this size is basically useless for you, since you won't be able to derive the exact original array size from it.
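One way to see the element count being stored is to give a class its own operator new[] and compare the number of bytes requested with count * sizeof(T). The sketch below assumes nothing about the result: Tracked and overhead_bytes are made-up names, and the amount of extra space (if any) is implementation-specific.

```cpp
#include <cstddef>
#include <new>

static std::size_t g_last_request = 0;  // bytes asked for by the last new[]

struct Tracked {
    int value = 0;
    ~Tracked() {}  // user-provided, so the destructor is non-trivial

    static void* operator new[](std::size_t bytes) {
        g_last_request = bytes;
        return ::operator new[](bytes);
    }
    static void operator delete[](void* p) noexcept {
        ::operator delete[](p);
    }
};

// Returns how many bytes beyond count * sizeof(Tracked) the new[] expression
// requested. Any extra space is where an implementation may keep the count.
std::size_t overhead_bytes(std::size_t count) {
    Tracked* arr = new Tracked[count];
    const std::size_t extra = g_last_request - count * sizeof(Tracked);
    delete[] arr;
    return extra;
}
```

Removing the destructor from Tracked and running overhead_bytes again is a quick way to check whether a given compiler drops the extra bytes for trivially destructible types.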

You most likely can access it, but it would require intimate knowledge of your allocator and would not be portable. The C++ standard doesn't specify how implementations store this data, so there's no consistent method for obtaining it. I believe it's left unspecified because different allocators may wish to store it in different ways for efficiency purposes.

It makes sense, as for example the size of the allocated block may not necessarily be the same size as the array. While it is true that new[] may store the number of elements (so that each element's destructor can be called), it doesn't have to, as the count isn't required for a trivial destructor. There is also no standard way (C++ FAQ Lite 1, C++ FAQ Lite 2) of implementing where new[] stores the array length, as each method has its pros and cons.
In other words, it allows allocations to be as fast and cheap as possible by not specifying anything about the implementation. (If the implementation had to store the size of the array as well as the size of the allocated block every time, it would waste memory that you may not need.)

Simply put, the C++ standard does not require support for this. It is possible that if you know enough about the internals of your compiler, you can figure out how to access this information, but that would generally be considered bad practice. Note that there may be a difference in memory layout for heap-allocated arrays and stack-allocated arrays.
Remember that essentially what you are talking about here are C-style arrays, too -- even though new and delete are C++ operators -- and the behavior is inherited from C. If you want a C++ "array" that is sized, you should be using the STL (e.g. std::vector, std::deque).

Can C++ automatic variables vary in size?

In the following C++ program:
#include <string>
using namespace std;
int main()
{
string s = "small";
s = "bigger";
}
is it more correct to say that the variable s has a fixed size or that the variable s varies in size?

It depends on what you mean by "size".
The static size of s (as returned by sizeof(s)) will be the same.
However, the size occupied on the heap will vary between the two cases.
What do you want to do with the information?

I'll say yes and no.
s will be the same string instance, but its internal buffer (which is preallocated depending on your STL implementation) will contain a copy of the constant string you assigned to it.
Should the constant string (or any other char* or string) be bigger than the internal preallocated buffer of s, the buffer will be reallocated according to the reallocation algorithm in your STL implementation.

This is going to lead to a dangerous discussion because the concept of "size" is not well defined in your question.
The size of the object s is known at compile time; it's simply the sum of the sizes of its members plus whatever extra information needs to be kept for classes, such as padding (I'll admit I don't know all the details). The important thing to get out of this, however, is that sizeof(s) will NOT change between assignments.
HOWEVER, the memory footprint of s can change during runtime through the use of heap allocations. So as you assign the bigger string to s, its memory footprint may increase because it may need more space allocated on the heap. You should probably try to specify what you want.
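The two notions of size can be put side by side with sizeof, size() and capacity(). This is a small illustrative sketch; Footprint and measure are hypothetical names. Note that on implementations with the small string optimization, both "small" and "bigger" may fit inside the object, so a longer string is needed to see the capacity grow.

```cpp
#include <cstddef>
#include <string>

struct Footprint {
    std::size_t object_bytes;  // sizeof the std::string object, fixed at compile time
    std::size_t length;        // number of characters currently stored
    std::size_t capacity;      // characters the current buffer can hold
};

// Captures both the fixed object size and the run-time sizes of s.
Footprint measure(const std::string& s) {
    return Footprint{sizeof(s), s.size(), s.capacity()};
}
```

Measuring s after assigning "small" and again after assigning a string of several hundred characters gives the same object_bytes with a different length and capacity.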

The std::string variable never changes its size. It just refers to a different piece of memory with a different size and different data.

Neither, exactly. The variable s is a string object.
#include <string>
using namespace std;
int main()
{
string s = "small"; // s is a std::string object initialized with a copy of "small"
s = "bigger"; // the contents of s are replaced using the overloaded operator=
}
Edit, corrected some details and clarified point
See: http://www.cplusplus.com/reference/string/string/ and in particular http://www.cplusplus.com/reference/string/string/operator=/
The assignment results in the original content being dropped and the content of the right side of the operation being copied into the object. similar to doing s.assign("bigger"), but assign has a broader range of acceptable parameters.
To get to your original question, the contents of the object s can have variable size. See http://www.cplusplus.com/reference/string/string/resize/ for more details on this.

A variable is an object we refer to by a name. The "physical" size of an object -- sizeof(s) in this case -- doesn't change, ever. The type is still std::string and the size of a std::string is always constant. However, things like strings and vectors (and other containers for that matter) have a "logical size" that tells us how many elements of some type they store. A string "logically" stores characters. I say "logically" because a string object doesn't really contain the characters directly. Usually it has only a couple of pointers as "physical members". Since the string object manages a dynamically allocated array of characters and provides proper copy semantics and convenient access to the characters, we can think of those characters as members ("logical members"). Since growing a string is a matter of reallocating memory and updating pointers, we don't even need sizeof(s) to change.

I would say s is a string object, and it has the capability to grow and shrink dynamically.
