How does memory allocation work for nested containers? - c++

For example, I have std::vector<std::string>. How do the allocators for vector and string work together?
Say the allocator for vector allocates a chunk of memory ChunkVec, does the allocator for string allocate memory inside ChunkVec so that the memory allocated for each string sums to ChunkVec? Or the allocator for string allocates memory outside ChunkVec?
Is the answer the same for other nested containers?
And is there a difference between C++ and C++11?

I have std::vector<std::string>
On my Ubuntu 15.04, 64 bit, a std::string is 8 bytes, regardless of contents.
(using std::string s1; I am comparing sizeof(std::string) versus s1.size(). Then append to the string and then print them both again.)
I have never looked into the std::string code, but std::string is a typedef for std::basic_string<char, std::char_traits<char>, std::allocator<char>>, so when the string allocates its data from the heap it goes through std::allocator<char>, which by default obtains memory with operator new. A different allocator can only be chosen by instantiating std::basic_string yourself. Either way, that allocator knows nothing about your vector.
does the allocator for string allocate memory inside ChunkVec so that
the memory allocated for each string sums to ChunkVec?
I believe the part of the string in a vector element is only the 8 byte pointer to where the string 'proper' resides in the heap. So no.
Or the allocator for string allocates memory outside ChunkVec?
Yes, I believe so.
You can confirm this by printing the addresses of vector elements i and i+1, and the addresses of some of the chars of element i.
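Here is a minimal sketch of that check. The helper names element_stride, chars_inside_vector_buffer and print_layout are made up for illustration, and the printed addresses will differ from run to run and between implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: distance in bytes between element i and element i + 1.
std::size_t element_stride(const std::vector<std::string>& v, std::size_t i) {
    assert(i + 1 < v.size());
    const char* a = reinterpret_cast<const char*>(&v[i]);
    const char* b = reinterpret_cast<const char*>(&v[i + 1]);
    return static_cast<std::size_t>(b - a);
}

// Hypothetical helper: true if the characters of element i lie inside the
// vector's own buffer (the block that holds the std::string objects).
bool chars_inside_vector_buffer(const std::vector<std::string>& v, std::size_t i) {
    const char* first = reinterpret_cast<const char*>(v.data());
    const char* last = first + v.size() * sizeof(std::string);
    std::less<const char*> before;  // gives a total order, even across allocations
    const char* chars = v[i].data();
    return !before(chars, first) && before(chars, last);
}

// Prints the addresses so they can be compared by eye.
void print_layout(const std::vector<std::string>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        std::cout << "element " << i << " at " << static_cast<const void*>(&v[i])
                  << ", chars at " << static_cast<const void*>(v[i].data()) << '\n';
    }
}
```

For strings longer than sizeof(std::string), the characters cannot fit in the element, so chars_inside_vector_buffer should return false. For very short strings the result depends on whether the implementation stores them inline.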
By the way, on my implementation (g++ 4.9.2) , sizeof(std::vector) is 24 bytes, regardless of the number of data elements (vec.size()) and regardless of element size. Note also, that I have read about some implementations where some of a small vector might actually reside in the 24 bytes. Implementation details can be tedious, but helpful. Still, some might be interested in why you want to know this.
Be aware we are talking about implementation details (I think) ... so your exploration might vary from mine.
Is the answer the same for other nested containers?
I have not explored every container (but I have used many "std::vector< std::string >").
Generally, and without much thought, I would guess not.
And is there a difference between C++ and C++11?
Implementation details change for various reasons, including language feature changes. What have you tried?

ChunkVec stores only the std::string objects, each of which holds a pointer to the data allocated by that string. That data is a totally different allocation. A good way to understand it is to look at the tree structure in programming.
struct node
{
    int data;
    struct node* left;
    struct node* right;
};
left and right are different memory allocations than node. You can remove them without removing this very node.

std::string has two things to store--the size of the string and the content. If I allocate one on the stack, the size will be on the stack as well. For short strings, the character data itself will also be on the stack. These two items make up the "control structure". std::string only uses its allocator for long strings that don't fit in its fixed-size control structure.
std::vector allocates memory to store the control structure of the std::string. Any allocation required by std::string to store long strings could be in a completely different area of memory than the vector. Short strings will be entirely managed by the allocator of std::vector.
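To see which case a given string falls into, one can test whether its character pointer lies inside the string object itself. This is only a sketch: stored_inline is a hypothetical helper, and the length at which a string stops being stored inline (if it ever is) depends on the implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: true if s keeps its characters inside its own
// fixed-size object (the "control structure") rather than in separate memory.
bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    std::less<const char*> before;  // total order over unrelated pointers
    return !before(s.data(), obj_begin) && before(s.data(), obj_end);
}
```

A string of 1000 characters can never be stored inline, so stored_inline returns false for it. For something like "hi" the answer is true on implementations with the short-string optimization and false on others.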

Related

delete[] a specific array block allocated with new in c++

I am writing a simple stack program in c++. I have dynamically allocated the array.
I was wondering if there's any way to delete(deallocate) the specific block of an array.
int *arr = NULL ;
arr = new int[some_size];
.
.
.
.
//now suppose I want to deallocate a specific block
//suppose
int i = 4;
//can I do something like
delete[i] arr;
I know this is sort of dumb question but still I would like to know.
Thank you.
It is only possible to deallocate the entire allocation - not parts of it. There is no such syntax as delete[i]. Only a pointer returned directly from new[] may be passed to delete[].
However, you can implement your own allocator that can work on a single block from the global allocator, giving smaller allocations that can be deallocated individually. This is an advanced technique: Not for beginners. It is (probably) not something that you need to do to implement any program. But in some cases it can be useful for optimisation if you know what you are doing.
how am I going to implement my own allocator
Allocate some memory M from the global allocator.
Write a function that takes the number of bytes as argument. In the function, use some data structure to keep track of which parts of M have previously been allocated. Pick a range of bytes from M which has not been allocated. Mark it as allocated in the data structure. Return a pointer to the beginning of that range.
Write another function that takes a pointer as argument. Mark the memory that was reserved for that allocation as being free.
Even better, implement a type using these functions that conforms to the Allocator concept in the standard library so that the allocator can be used as a template argument to standard containers.
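The first three steps could be sketched roughly as follows. ToyPool is a hypothetical name, the first-fit search and the std::map bookkeeping are just one possible choice, and the class does not implement the standard Allocator requirements from the last step.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <new>

// Hypothetical toy allocator: hands out byte ranges from one block M.
class ToyPool {
public:
    explicit ToyPool(std::size_t capacity)
        : capacity_(capacity),
          block_(static_cast<unsigned char*>(::operator new(capacity))) {}
    ~ToyPool() { ::operator delete(block_); }
    ToyPool(const ToyPool&) = delete;
    ToyPool& operator=(const ToyPool&) = delete;

    // First fit: scan the gaps between used ranges for one that is big enough.
    // Returns nullptr if no gap fits.
    void* allocate(std::size_t bytes) {
        if (bytes == 0 || bytes > capacity_) return nullptr;
        // Round up so that every returned pointer stays suitably aligned.
        const std::size_t a = alignof(std::max_align_t);
        bytes = (bytes + a - 1) / a * a;
        std::size_t offset = 0;
        for (const auto& range : used_) {
            if (range.first - offset >= bytes) break;  // the gap before this range fits
            offset = range.first + range.second;       // otherwise skip past it
        }
        if (offset + bytes > capacity_) return nullptr;
        used_[offset] = bytes;  // mark the range as allocated
        return block_ + offset;
    }

    // Marks the range that starts at p as free again.
    void deallocate(void* p) {
        if (p == nullptr) return;
        used_.erase(static_cast<std::size_t>(static_cast<unsigned char*>(p) - block_));
    }

private:
    std::size_t capacity_;
    unsigned char* block_;                     // the memory M from the global allocator
    std::map<std::size_t, std::size_t> used_;  // offset -> size of each allocated range
};
```

With this sketch, ToyPool pool(256); void* a = pool.allocate(64); pool.deallocate(a); frees just that one range while the rest of M stays in use.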
An array is a kind of data structure that stores a fixed-size sequential collection of elements of the same type.
Arrays have the special property of contiguous memory allocation, which means the elements of an array occupy consecutive blocks of memory. This gives arrays advantages such as
random access of elements
binary search
less memory consumption(as no pointer pointing to next element is required)
So, in general, it is not possible to delete a specific block from the middle of a contiguous allocation. For this specific problem the linked list was invented.
But in C++ we have std::vector. Vectors are not static arrays; they can grow and shrink, and they occupy somewhat more space than a static array.
Assuming that the question is changed to use vectors, we can use the pop_back function or the erase function.
1. pop_back: Simply removes the last element of the vector.
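#include <iostream>
#include <vector>

// Prints the elements separated by spaces.
void print(const std::vector<int>& v)
{
    for (int x : v) std::cout << x << ' ';
    std::cout << '\n';
}

int main()
{
    std::vector<int> numbers;
    print(numbers);
    numbers.push_back(5);
    numbers.push_back(3);
    numbers.push_back(4);
    print(numbers);
    numbers.pop_back();
    print(numbers);
}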
2.erase: Erases the specified elements from the container.
Removes the element at pos.
Removes the elements in the range [first, last).
https://www.cplusplus.com/reference/string/string/erase/
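Both forms of erase, applied to a vector, might look like this. The wrappers erase_at and erase_range are hypothetical names that take indices instead of iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Removes the element at index pos; later elements shift down by one.
void erase_at(std::vector<int>& v, std::size_t pos) {
    assert(pos < v.size());
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(pos));
}

// Removes the elements in the half-open index range [first, last).
void erase_range(std::vector<int>& v, std::size_t first, std::size_t last) {
    assert(first <= last && last <= v.size());
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(first),
            v.begin() + static_cast<std::ptrdiff_t>(last));
}
```

Note that erase shrinks size() but does not hand memory back to the allocator; the vector keeps its capacity.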

C++ string / container allocation

This is probably obvious to a C++ non-noob, but it's stumping me a bit - does a string member of a class allocate a variable amount of space in that class? Or does it just allocate a pointer internally to some other space in memory? E.g. in this example:
class Child {
public:
    string Name;
};
class Parent {
public:
    vector<Child> Children;
};
How is that allocated on the heap if I create a "new Parent()" and add some children with varying length strings? Is Parent 4 bytes, Child 4 bytes (or whatever the pointer size, plus fixed size internal data), and then a random pile of strings somewhere else on the heap? Or is it all bundled together in memory?
I guess in general, are container types always fixed size themselves, and just contain pointers to their variable-sized data, and is that data always on the heap?
Classes in C++ are always fixed size. When there is a variable sized component, e.g., the elements of a vector or the characters in a string, they may be allocated on the heap (for small strings they may also be embedded in the string itself; this is known as the small string optimization). That is, your Parent object would contain a std::vector<Child> where the Child objects are allocated on the heap (the std::vector<...> object itself probably keeps three words to its data but there are several ways things may be laid out). The std::string objects in Child allocate their own memory. That is, there may be quite a few memory allocations.
The C++ 2011 standard thoroughly defines allocators to support passing an allocation mechanism to an object and all its children. Of course, the classes need to also support this mechanism. If your Parent and Child classes had suitable constructors taking an allocator and would pass this allocator to all members doing allocations, it would be propagated through the system. This way, allocation of objects that belong together can be arranged to be in reasonably close proximity.
Classes in C++ always have a fixed size. Therefore vector and string can only contain pointers to heap allocated memory* (although they typically contain more data than one pointer, since they also need to store the length). Therefore the object itself always has a fixed length.
*For string this is not entirely correct. Often an optimization technique called short string optimization is used. In that case small strings are embedded inside the object (in the place where otherwise the pointer to heap data would be stored) and heap memory is only allocated if the string is too long.
Yes -- using your words -- container types always fixed size themselves, and just contain pointers to their variable-sized data.
If we have vector<int> vi;, the size of vi is always fixed, sizeof(vector<int>) to be exact, irrespective of the number of int's in vi.
does a string member of a class allocate a variable amount of space in that class?
No, it does not.
Or does it just allocate a pointer internally to some other space in memory?
No, it does not.
A std::string occupies whatever sizeof(std::string) is.
Do not confuse
the size of an object
the size of the resources, that an object is responsible for.
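A small sketch of that distinction, using the Parent and Child classes from the question. The helper child_buffer_bytes is hypothetical and only counts the vector's buffer, not the buffers owned by the individual strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Child {
    std::string Name;
};

struct Parent {
    std::vector<Child> Children;
};

// Hypothetical helper: bytes the vector has reserved for Child objects.
// This is memory the Parent is responsible for, not part of sizeof(Parent).
std::size_t child_buffer_bytes(const Parent& p) {
    return p.Children.capacity() * sizeof(Child);
}
```

However many children are added, sizeof(Parent) and sizeof(Child) are compile-time constants; only the value returned by child_buffer_bytes (and the strings' own buffers) grows.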

Allocate constant strings in container contiguously

Let's say I have a std::vector of const std::strings.
std::vector<const std::string> strs;
Now the default behavior here is that the actual string containers can be allocated anywhere on the heap, which pretty much disables any prefetching of data when iterating over the contained strings.
strs.push_back("Foo"); // allocates char block on heap
strs.push_back("Boo"); // allocates char block on heap
However, since the strings are "const" I would like the char blocks to be allocated contiguously or close to each other (when possible) in order to have the most efficient cache behavior when iterating over the strings.
Is there any way to achieve this behavior?
You need a custom allocator known as a memory region allocator. You can look on Wikipedia or Google for more information, but the basic idea is something akin to the hardware stack: allocate one large chunk and then simply increment a pointer to mark memory as used. It can serve many contiguous requests very quickly but can't free individual allocations; all freeing is done at once.
If it really is that simple - pushing strings that will never change, it is easy to write your own allocator. Allocate a large block of memory, set a pointer free to offset 0 in the block. When you need storage for a new string strncpy it to free and increase free with the strlen. Keep track of the end of the memory block and allocate another block when needed.
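One way this could look when plugged into std::basic_string is sketched below. Arena, ArenaAllocator and ArenaString are hypothetical names, the arena never grows (it throws std::bad_alloc when full instead of allocating another block), and an implementation with the short-string optimization may still keep very short strings inside the string object rather than in the arena.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <string>
#include <vector>

// Hypothetical bump arena: one block plus a "free" offset that only moves forward.
struct Arena {
    char* base;
    std::size_t capacity;
    std::size_t used;

    explicit Arena(std::size_t n)
        : base(static_cast<char*>(::operator new(n))), capacity(n), used(0) {}
    ~Arena() { ::operator delete(base); }
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // True if p points into the arena's block.
    bool contains(const void* p) const {
        const char* c = static_cast<const char*>(p);
        std::less<const char*> before;
        return !before(c, base) && before(c, base + capacity);
    }
};

// Minimal C++11 allocator that takes its memory from an Arena.
template <class T>
struct ArenaAllocator {
    using value_type = T;
    Arena* arena;

    explicit ArenaAllocator(Arena* a) noexcept : arena(a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena(other.arena) {}

    T* allocate(std::size_t n) {
        const std::size_t align = alignof(T);
        const std::size_t start = (arena->used + align - 1) / align * align;
        const std::size_t bytes = n * sizeof(T);
        if (start > arena->capacity || bytes > arena->capacity - start) throw std::bad_alloc();
        arena->used = start + bytes;  // bump the free offset
        return reinterpret_cast<T*>(arena->base + start);
    }
    // Individual frees are ignored; everything is released with the arena.
    void deallocate(T*, std::size_t) noexcept {}
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return !(a == b);
}

using ArenaString = std::basic_string<char, std::char_traits<char>, ArenaAllocator<char>>;
```

A vector of such strings would be filled with something like strs.emplace_back("Foo", ArenaAllocator<char>(&arena)), and the arena must outlive every string that uses it.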
Not really.
std::string isn't a POD, and it doesn't necessarily keep its contents "inside of the object". What's more, before C++11 it wasn't even required to store its contents in a single memory block.
Also a std::vector (as all arrays) needs its contents to be of one type (= of equal size), so you can't make a literal "array" of strings of different lengths.
Your best shot is to assume a length and use std::vector<std::array<char, N> >
If you need really different lengths, an alternative is just a std::vector<char> for the data plus a std::vector<unsigned> for the indices where consecutive strings start.
Rolling your own allocator for the string is a tempting idea, you could base it on std::vector<char> and then roll up your own std::basic_string on it, then make a collection of those.
Note that you are actually depending much on a specific std::string implementation. Some do have an internal buffer of N chars and only allocate memory externally if the string length is bigger than the buffer. If that's the case on your implementation, you still wouldn't get a contiguous memory for the whole buffer of strings.
On those grounds, I conclude that with std::string you generally won't be able to accomplish what you want (unless you rely on a specific STL implementation) and you need to provide another string implementation to suit your needs.
A custom allocator is great, but why not store all the strings in a single std::vector<char> or std::string, and access the original strings by offset?
Simple and effective.
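A rough sketch of the offset-based layout, assuming the strings are only appended and never modified. PackedStrings and its member names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container: all characters live back to back in one buffer,
// and a second vector records where each string starts.
class PackedStrings {
public:
    void add(const std::string& s) {
        starts_.push_back(chars_.size());
        chars_.insert(chars_.end(), s.begin(), s.end());
    }

    std::size_t size() const { return starts_.size(); }

    // Returns a copy of string i; its characters span [start of i, start of i + 1).
    std::string get(std::size_t i) const {
        assert(i < starts_.size());
        const std::size_t begin = starts_[i];
        const std::size_t end = (i + 1 < starts_.size()) ? starts_[i + 1] : chars_.size();
        return std::string(chars_.begin() + static_cast<std::ptrdiff_t>(begin),
                           chars_.begin() + static_cast<std::ptrdiff_t>(end));
    }

private:
    std::vector<char> chars_;          // the character data of every string
    std::vector<std::size_t> starts_;  // offset of each string within chars_
};
```

Iterating over the strings then walks chars_ front to back, which is the cache-friendly access pattern the question asks for. The get function copies for simplicity; a pointer and length pair would avoid that.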
You can always write a custom allocator (the second template parameter of std::vector) that will allocate all the strings from a contiguous pool. You can also use std::basic_string instead of std::string (which is just a particular specialization of std::basic_string); it lets you specify your own allocator in a similar way. Generally I would say it's a case of "premature optimization", but I trust you've measured and seen a performance hit here... I believe the price to pay would be some wasted memory, though.
A vector is guaranteed to be contiguous memory and is interoperable with an array. It is not a singly linked list.
"Contiguity is in fact part of the vector abstraction. It’s so important, in fact, the C++03 standard was amended to explicitly add the guarantee."
Source : http://herbsutter.com/2008/04/07/cringe-not-vectors-are-guaranteed-to-be-contiguous/
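Use reserve() so that the vector does not reallocate while elements are appended. Note that this keeps the std::string objects contiguous; the character data of each string may still be allocated elsewhere.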
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
using namespace std;
int main()
{
    // create empty vector for strings
    // (vector<const string> does not work with the default allocator)
    vector<string> sentence;
    // reserve memory for five elements to avoid reallocation
    sentence.reserve(5);
    // append some elements
    sentence.push_back("Hello,");
    sentence.push_back("how");
    sentence.push_back("are");
    sentence.push_back("you");
    sentence.push_back("?");
    // print elements separated with spaces
    copy(sentence.begin(), sentence.end(),
         ostream_iterator<string>(cout, " "));
    cout << endl;
    return 0;
}

Is this nested array using stack or heap memory?

Say I have this declaration and use of array nested in a vector
const int MAX_LEN = 1024;
typedef std::tr1::array<char, MAX_LEN> Sentence;
typedef std::vector<Sentence> Paragraph;
Paragraph para(256);
std::vector<Paragraph> book(2000);
I assume that the memory for Sentence is on the stack. Is that right?
What about the memory for vector para? Is that on the stack i.e. should I worry if my para gets too large?
And finally what about the memory for book? That has to be on the heap I guess but the nested arrays are on the stack, aren't they?
Additional questions
Is the memory for Paragraph contiguous?
Is the memory for book contiguous?
There is no stack. Don't think about a stack. What matters is whether a given container class performs any dynamic allocation or not.
std::array<T,N> doesn't use any dynamic allocation, it is a very thin wrapper around an automatically allocated T[N].
Anything you put in a vector will however be allocated by the vector's own allocator, which in the default case (usually) performs dynamic allocation with ::operator new().
So in short, vector<array<char,N>> is very similar to vector<int>: The allocator simply allocates memory for as many units of array<char,N> (or int) as it needs to hold and constructs the elements in that memory. Rinse and repeat for nested vectors.
For your "additional questions": vector<vector<T>> is definitely not contiguous for T at all. It is merely contiguous for vector<T>, but that only contains the small book-keeping part of the inner vector. The actual content of the inner vector is allocated by the inner vector's allocator, and separately for each inner vector. In general, vector<S> is contiguous for the type S, and nothing else.
I'm not actually sure about vector<array<U,N>> -- it might be contiguous for U, because the array has no reason to contain any data besides the contained U[N], but I'm not sure if that's mandatory.
You might want to ask that as a separate question, it's a good question!
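In the meantime, whether the chars run on from one array to the next can be checked on a given implementation. This sketch assumes std::array in place of std::tr1::array; chars_run_on and padding_per_sentence are hypothetical helpers, and the result only describes the implementation it is compiled with.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t MAX_LEN = 1024;
typedef std::array<char, MAX_LEN> Sentence;  // std::array assumed instead of std::tr1::array
typedef std::vector<Sentence> Paragraph;

// Hypothetical helper: true if the last char of sentence i is immediately
// followed in memory by the first char of sentence i + 1.
bool chars_run_on(const Paragraph& p, std::size_t i) {
    assert(i + 1 < p.size());
    return p[i].data() + MAX_LEN == p[i + 1].data();
}

// Extra bytes, if any, that the implementation adds to each array object.
std::size_t padding_per_sentence() {
    return sizeof(Sentence) - MAX_LEN;
}
```

If padding_per_sentence() returns 0, the chars of a Paragraph form one unbroken run; otherwise there are gaps between consecutive sentences.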
As a side note, it might be helpful to use gdb. It lets you manually examine your memory, including the locations of your variables. You can check yourself precisely what memory you are using.
Your code example:
const int MAX_LEN = 1024;
typedef std::tr1::array<char, MAX_LEN> Sentence;
typedef std::vector<Sentence> Paragraph;
Paragraph para(256);
std::vector<Paragraph> book(2000);
"I assume that the memory for Sentence is on the stack. Is that right?"
No. Whether something is allocated on the stack depends on the declaration context. You have omitted the context, hence nothing can be said. If an object is local and non-static, then you get stack allocation for the object itself, but not necessarily for parts that it refers to internally.
By the way, since another answer here claimed "there is no stack", just disregard that urban legend about what kinds of systems C++ must support. It came originally from a misunderstanding of how a rather unsuccessful hardware level optimized computer worked, that some people erroneously thought that it didn't have a simple hardware-supported array-like stack implementation. It is quite a stretch from "not simple" to "not there", and even the "not simple" was utterly wrong, not just factually but logically (ultimately a self-contradiction). I.e. it was a not-too-smart beginner's mistake, even though the myth has been propagated by at least one person with some experience. Anyway, C++ guarantees an abstract stack, and on all extant computers that guaranteed abstract stack is implemented in terms of a hardware-assisted array-like simple stack.
"What about the memory for vector para? Is that on the stack"
Again, that depends on the declaration context, which you don't show. And again, even if the object itself is allocated on the stack, parts that it refers to internally will not (in general) be allocated on the stack.
"i.e. should I worry if my para gets too large?"
No, there's no need to worry. A std::vector allocates its buffer dynamically. It's not limited by available stack space.
"And finally what about the memory for book? That has to be on the heap I guess but the nested arrays are on the stack, aren't they?"
No and no.
"Is the memory for Paragraph contiguous?"
No. But the vector's buffer is contiguous. That's because std::array is guaranteed contiguous, and a std::vector's buffer is guaranteed contiguous.
"Is the memory for book contiguous?"
No.

c++ maximum std::string length is dictated by stack size or heap size?

as asked in the question.
std::string myVar; the maximum character it can hold is dictated by stack or heap?
Thank you
By default, the memory allocated for std::string is allocated dynamically.
Note that std::string has a max_size() function returning the maximum number of characters supported by the implementation. The usefulness of this is questionable, though, as it's an implementation maximum, and doesn't take into consideration other resources, like memory. Your real limit is much lower. (Try allocating 4GB of contiguous memory, or take into account memory exhaustion elsewhere.)
A std::string object will be allocated the same way an int or any other type must be: on the stack if it's a local variable, or it might be static, or on the heap if new std::string is used or new X where X contains the string, etc.
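The implementation limit itself is easy to probe: asking for more than max_size() characters is rejected with std::length_error before any memory is requested. A small sketch, where reserve_beyond_max_is_rejected is a hypothetical helper:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: asks for one character more than max_size() and
// reports whether the implementation rejects the request.
bool reserve_beyond_max_is_rejected() {
    std::string s;
    if (s.max_size() == std::string::npos) return true;  // nothing larger can be expressed
    try {
        s.reserve(s.max_size() + 1);
    } catch (const std::length_error&) {
        return true;
    }
    return false;
}
```

Requests below max_size() can still fail with std::bad_alloc when the memory is not available, which is the lower, real limit mentioned above.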
But, that std::string object may contain at least a pointer to additional memory provided by the allocator with which basic_string<> was instantiated - for the std::string typedef that means heap-allocated memory. Either directly in the original std::string object memory or in pointed-to heap you can expect to find:
a string size member,
possibly some manner of reference counter or links,
the textual data the string stores (if any)
Some std::string implementations have "short string" optimisations where they pack strings of only a few characters directly into the string object itself (for memory efficiency, often using some kind of union with fields that are used for other purposes when the strings are longer). But, for other string implementations, and even for those with short-string optimisations when dealing with strings that are too long to fit directly in the std::string object, they will have to follow pointers/references to the textual data which is stored in the allocator-provided (heap) memory.