C++ Growth of containers containing containers?

Suppose I have a std::vector<std::set<int>>. The vector will reallocate if you insert past its capacity. In the case where you have another resizable type inside the vector, is the vector only holding a pointer to that type?
In particular I want to know about how memory is allocated if a vector is holding an arbitrary type.
std::vector<int> a(10); //Size will be sizeof(int) * 10
std::vector<std::set<int>> b(10);
b[0] = {0, 0, 0, 0, 0, 0, 0, .... }; //Is b's size affected by the sets inside?

C++ objects can only have one size, but may include pointers to arbitrarily sized heap memory. So, yes, container objects themselves generally include a pointer to heap memory and probably don't include any actual items. (The only typical exception is string types, which sometimes have a "small string optimization" that allows string objects to contain small strings directly in the object without allocating heap memory.)

The memory that any vector will allocate "by itself" is sizeof(element_type) * vector.capacity(), and capacity() is never smaller than vector.size().
The vector can only allocate memory for element data that is visible at compile time. It doesn't care about any allocations done by the element class.
Think of a vector as an array on steroids. Like an array, a vector consists of a contiguous block of memory where all elements have the same size. To fulfill this requirement it must know at compile time how big each element will be.
Imagine a std::set to have these member variables:
struct SomeSet
{
size_t size;
SomeMagicInternalType* data;
};
So no matter how data will be allocated at runtime, the vector only allocates memory per element for what it knows at compile time:
sizeof(SomeSet::size) + sizeof(SomeSet::data)
Which would be 4 + 4 on a 32-bit machine.

Consider this example:
#include <iostream>
#include <vector>
int main() {
std::vector<int> v;
std::cout << sizeof(v) << "\n";
std::cout << v.size() << "\n";
v.push_back(3);
std::cout << sizeof(v) << "\n";
std::cout << v.size() << "\n";
}
The exact number may differ, but I get as output:
24
0
24
1
The size (size=size of the object) of a vector does not change when you add an element. The same is true for a set, thus a vector<set> does not need to reallocate if one of its elements adds or removes an element.
A set does not store its elements as members, otherwise sets with different number of elements would be different types. They are stored on the heap and as such do not contribute to the size of the set directly.

A std::vector<T> holds objects of type T. When it gets resized it copies or moves those objects as needed. A std::vector<std::set<int>> is no different; it holds objects of type std::set<int>.
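To make the point above concrete, here is a minimal sketch (the function name is made up for illustration). It grows one of the inner sets and checks that the outer vector's buffer address and capacity stay the same, since the set keeps its nodes in its own heap allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: returns true if growing an inner set leaves the
// outer vector's buffer and capacity untouched.
bool inner_growth_leaves_outer_untouched() {
    std::vector<std::set<int>> b(10);
    const std::set<int>* buffer_before = b.data();
    const std::size_t capacity_before = b.capacity();

    for (int i = 0; i < 1000; ++i)
        b[0].insert(i);  // the set allocates its own nodes on the heap

    assert(b[0].size() == 1000);
    return b.data() == buffer_before && b.capacity() == capacity_before;
}
```

Only operations on the vector itself (push_back, resize, and so on) can make the outer buffer move.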

Related

What does std::vector look like in memory?

I read that std::vector should be contiguous. My understanding is, that its elements should be stored together, not spread out across the memory. I have simply accepted the fact and used this knowledge when for example using its data() method to get the underlying contiguous piece of memory.
However, I came across a situation, where the vector's memory behaves in a strange way:
std::vector<int> numbers;
std::vector<int*> ptr_numbers;
for (int i = 0; i < 8; i++) {
numbers.push_back(i);
ptr_numbers.push_back(&numbers.back());
}
I expected this to give me a vector of some numbers and a vector of pointers to these numbers. However, when listing the contents of the ptr_numbers pointers, there are different and seemingly random numbers, as though I am accessing wrong parts of memory.
I have tried to check the contents every step:
for (int i = 0; i < 8; i++) {
numbers.push_back(i);
ptr_numbers.push_back(&numbers.back());
for (auto ptr_number : ptr_numbers)
std::cout << *ptr_number << std::endl;
std::cout << std::endl;
}
The result looks roughly like this:
1
some random number
2
some random number
some random number
3
So it seems as though when I push_back() to the numbers vector, its older elements change their location.
So what does it exactly mean, that std::vector is a contiguous container and why do its elements move? Does it maybe store them together, but moves them all together, when more space is needed?
Edit: Is std::vector contiguous only since C++17? (Just to keep the comments on my previous claim relevant to future readers.)
It roughly looks like this (the original diagram is not reproduced here):
The std::vector instance you have on the stack is a small object containing a pointer to a heap-allocated buffer, plus some extra variables to keep track of the size and capacity of the vector.
So it seems as though when I push_back() to the numbers vector, its older elements change their location.
The heap-allocated buffer has a fixed capacity. When you reach the end of the buffer, a new buffer will be allocated somewhere else on the heap and all the previous elements will be moved into the new one. Their addresses will therefore change.
Does it maybe store them together, but moves them all together, when more space is needed?
Roughly, yes. Iterator and address stability of elements is guaranteed with std::vector only if no reallocation takes place.
I am aware, that std::vector is a contiguous container only since C++17
The memory layout of std::vector hasn't changed since its first appearance in the Standard. ContiguousContainer is just a "concept" that was added to differentiate contiguous containers from others at compile-time.
The Answer
It's a single contiguous storage (a 1d array).
Each time it runs out of capacity it gets reallocated and stored objects are moved to the new larger place — this is why you observe addresses of the stored objects changing.
It has always been this way, not since C++17.
TL; DR
The storage grows geometrically to meet the amortized O(1) requirement of push_back(). The growth factor is 2 (Cap_{n+1} = Cap_n + Cap_n) in most implementations of the C++ Standard Library (GCC, Clang, STLPort) and 1.5 (Cap_{n+1} = Cap_n + Cap_n / 2) in the MSVC variant.
If you pre-allocate it with vector::reserve(N) and sufficiently large N, then addresses of the stored objects won't be changing when you add new ones.
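As a rough sketch of that reserve() advice applied to the loop from the question (the function name is hypothetical): with enough capacity reserved up front, the buffer does not move while the loop runs, so the stored pointers keep pointing at the right elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fills a vector while collecting pointers to its
// elements. reserve(count) guarantees no reallocation during the loop.
bool pointers_survive_with_reserve(std::size_t count) {
    std::vector<int> numbers;
    std::vector<int*> ptr_numbers;
    numbers.reserve(count);
    const int* buffer = numbers.data();

    for (std::size_t i = 0; i < count; ++i) {
        numbers.push_back(static_cast<int>(i));
        ptr_numbers.push_back(&numbers.back());
    }
    assert(numbers.data() == buffer);  // the buffer never moved

    for (std::size_t i = 0; i < count; ++i)
        if (*ptr_numbers[i] != static_cast<int>(i)) return false;
    return true;
}
```

If the final size is not known in advance, storing indices instead of pointers avoids the problem altogether.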
In most practical applications it is usually worth pre-allocating at least 32 elements to skip the first few reallocations, which follow shortly after one another (0→1→2→4→8→16).
It is also sometimes practical to slow it down, switch to the arithmetic growth policy (Cap_{n+1} = Cap_n + Const), or stop entirely after some reasonably large size to ensure the application does not waste or grow out of memory.
Lastly, in some practical applications, like column-based object storages, it may be worth giving up the idea of contiguous storage completely in favor of a segmented one (same as what std::deque does but with much larger chunks). This way the data may be stored reasonably well localized for both per-column and per-row queries (though this may need some help from the memory allocator as well).
std::vector being a contiguous container means exactly what you think it means.
However, many operations on a vector can re-locate that entire piece of memory.
One common case is adding an element: if the vector must grow, it can reallocate and copy all elements to another contiguous piece of memory.
So what does it exactly mean, that std::vector is a contiguous container and why do its elements move? Does it maybe store them together, but moves them all together, when more space is needed?
That's exactly how it works and why appending elements does indeed invalidate all iterators as well as memory locations when a reallocation takes place¹. This has not only been true since C++17; it has always been the case.
There are a couple of benefits from this approach:
It is very cache-friendly and hence efficient.
The data() method can be used to pass the underlying raw memory to APIs that work with raw pointers.
The cost of allocating new memory upon push_back boils down to amortized constant time, because the capacity grows geometrically (each time a reallocation is needed, the capacity is doubled in libc++ and libstdc++, and grows by a factor of approx. 1.5 in MSVC).
It allows for the most capable iterator category, i.e., random access iterators, because classical pointer arithmetic works out well when the data is contiguously stored.
Move construction of a vector instance from another one is very cheap.
These implications can be considered the downside of such a memory layout:
All iterators and pointers to elements are invalidated upon modifications of the vector that imply a reallocation, and erasing invalidates iterators at or after the erased position. This can lead to subtle bugs, e.g. when erasing elements while iterating over the elements of a vector.
Operations like push_front (as std::list or std::deque provide) aren't provided (insert(vec.begin(), element) works, but is possibly expensive¹), and neither is efficient merging/splicing of multiple vector instances.
¹ Thanks to @FrançoisAndrieux for pointing that out.
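For the erase-while-iterating pitfall mentioned in the downsides, one common way around it is to continue from the iterator that erase() returns. This is a minimal sketch with a made-up function name, not the only option (the erase-remove idiom is another).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: removes every even value from v.
void erase_even(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0)
            it = v.erase(it);  // `it` is invalid after erase; use the returned iterator
        else
            ++it;
    }
    // Postcondition: no even value is left.
    assert(std::none_of(v.begin(), v.end(), [](int x) { return x % 2 == 0; }));
}
```

Incrementing the old iterator after the erase, instead of using the returned one, is exactly the kind of subtle bug described above.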
In terms of the actual structure, an std::vector looks something like this in memory:
struct vector { // Simple C struct as example (T is the type supplied by the template)
T *begin; // vector::begin() probably returns this value
T *end; // vector::end() probably returns this value
T *end_capacity; // First non-valid address
// Allocator state might be stored here (most allocators are stateless)
};
Relevant code snippet from the libc++ implementation as used by LLVM
Printing the raw memory contents of an std::vector:
(Don't do this if you don't know what you're doing!)
#include <iostream>
#include <vector>
struct vector {
int *begin;
int *end;
int *end_capacity;
};
int main() {
union vecunion {
std::vector<int> stdvec;
vector myvec;
~vecunion() { /* do nothing */ }
} vec = { std::vector<int>() };
union veciterator {
std::vector<int>::iterator stditer;
int *myiter;
~veciterator() { /* do nothing */ }
};
vec.stdvec.push_back(1); // Add something so we don't have an empty vector
std::cout
<< "vec.begin = " << vec.myvec.begin << "\n"
<< "vec.end = " << vec.myvec.end << "\n"
<< "vec.end_capacity = " << vec.myvec.end_capacity << "\n"
<< "vec's size = " << vec.myvec.end - vec.myvec.begin << "\n"
<< "vec's capacity = " << vec.myvec.end_capacity - vec.myvec.begin << "\n"
<< "vector::begin() = " << (veciterator { vec.stdvec.begin() }).myiter << "\n"
<< "vector::end() = " << (veciterator { vec.stdvec.end() }).myiter << "\n"
<< "vector::size() = " << vec.stdvec.size() << "\n"
<< "vector::capacity() = " << vec.stdvec.capacity() << "\n"
;
}

Dynamic memory allocation in Vector

I have a question regarding memory allocation in vector (STL - C++). As far as I know, its capacity gets doubled dynamically every time the size of the vector becomes equal to its capacity. If this is the case, how can the allocation be contiguous? How does it still allow the [] access operator to be used for O(1) access, just like arrays? Can anyone explain this behavior?
(List also has dynamic memory allocation but we cannot access its elements using [] access operator, how is it still possible with vector? )
#include<iostream>
#include<vector>
using namespace std;
int main() {
// your code goes here
vector<int> v;
for(int i=0;i<10;i++){
v.push_back(i);
cout<<v.size()<<" "<<v.capacity()<<" "<<"\n";
}
return 0;
}
Output:
1 1
2 2
3 4
4 4
5 8
6 8
7 8
8 8
9 16
10 16
As far as I know, its capacity gets doubled dynamically every time the size of vector gets equal to its capacity.
It does not need to double like in your case, it's implementation defined. So it may differ if you use another compiler.
If this is the case, how come the allocation be continuous?
If there is no more contiguous memory into which the vector could grow, the vector has to move its data to a new contiguous memory block which meets its size requirements. The old block will be marked as free, so that others can use it.
How does it still allow to use the [] access operator for O(1) access just like arrays?
Because the data always lives in one contiguous block, access is possible with the [] operator or with a pointer + offset. The access to the data is O(1).
List also has dynamic memory allocation but we cannot access its elements using [] access operator, how is it still possible with vector?
A list (std::list for example) is totally different from a std::vector. A C++ std::list stores nodes, each holding the data, a pointer to the next node and a pointer to the previous node (a doubly linked list). So you have to walk through the list to get to the specific node you want.
Vectors work like said above.
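The "pointer + offset" remark can be checked directly. The two helpers below are hypothetical names for a small sketch: the first verifies that element i sits exactly i slots after the first element, the second does by hand what operator[] does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if every element sits exactly i slots after the first.
bool elements_are_contiguous(const std::vector<int>& v) {
    const int* base = v.data();
    for (std::size_t i = 0; i < v.size(); ++i)
        if (&v[i] != base + i) return false;
    return true;
}

// Hypothetical helper: O(1) access by hand, one pointer addition and no traversal.
int element_at(const std::vector<int>& v, std::size_t i) {
    assert(i < v.size());
    return *(v.data() + i);
}
```

A std::list cannot offer this because its nodes are separate allocations with no fixed distance between them.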
The vector has to store the objects in one continuous memory area. Thus when it needs to increase its capacity, it has to allocate a new (larger) memory area (or expand the one it already has, if that's possible), and either copy or move the objects from the "old, small" area to the newly allocated one.
This can be made apparent by using a class with a copy/move constructor with some side effect (ideone link):
#include <iostream>
#include <vector>
using std::cout;
using std::endl;
using std::vector;
#define V(p) static_cast<void const*>(p)
struct Thing {
Thing() {}
Thing(Thing const & t) {
cout << "Copy " << V(&t) << " to " << V(this) << endl;
}
Thing(Thing && t) noexcept { // without noexcept the vector copies instead of moving
cout << "Move " << V(&t) << " to " << V(this) << endl;
}
};
int main() {
vector<Thing> things;
for (int i = 0; i < 10; ++i) {
cout << "Have " << things.size() << " (capacity " << things.capacity()
<< "), adding another:\n";
things.emplace_back();
}
}
This will lead to output similar to
[..]
Have 2 (capacity 2), adding another:
Move 0x2b652d9ccc50 to 0x2b652d9ccc30
Move 0x2b652d9ccc51 to 0x2b652d9ccc31
Have 3 (capacity 4), adding another:
Have 4 (capacity 4), adding another:
Move 0x2b652d9ccc30 to 0x2b652d9ccc50
Move 0x2b652d9ccc31 to 0x2b652d9ccc51
Move 0x2b652d9ccc32 to 0x2b652d9ccc52
Move 0x2b652d9ccc33 to 0x2b652d9ccc53
[..]
This shows that, when adding a third object to the vector, the two objects it already contains are moved from one contiguous area (look at the addresses increasing by 1, which is sizeof(Thing)) to another contiguous area. Finally, when adding the fifth object, you can see that the third object was indeed placed directly after the second.
When does it move and when does it copy? During reallocation the vector uses the move constructor only if it is declared noexcept (or is implicitly noexcept), or if the type cannot be copied at all. Otherwise it copies: if a move threw halfway through, some objects would already be in the new memory area while the rest were still in the old one, and the vector could not restore its original state.
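The move-versus-copy rule can be observed by counting constructor calls. The following is a sketch under assumptions: Counts, Tracked and relocate_four are made-up names, and reserve() is used only as a convenient way to force one reallocation of four existing elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Counts { int copies = 0; int moves = 0; };

// Hypothetical element type whose move constructor is noexcept only on request.
template <bool NoexceptMove>
struct Tracked {
    static Counts counts;
    Tracked() = default;
    Tracked(const Tracked&) { ++counts.copies; }
    Tracked(Tracked&&) noexcept(NoexceptMove) { ++counts.moves; }
};
template <bool NoexceptMove>
Counts Tracked<NoexceptMove>::counts{};

// Forces one reallocation of four elements and reports how they were relocated.
template <bool NoexceptMove>
Counts relocate_four() {
    using T = Tracked<NoexceptMove>;
    std::vector<T> v(4);              // four default-constructed elements
    T::counts = Counts{};             // only count what happens from here on
    const std::size_t old_capacity = v.capacity();
    v.reserve(old_capacity + 1);      // must allocate a larger buffer
    assert(v.capacity() > old_capacity);
    return T::counts;
}
```

With a noexcept move constructor the four elements are moved; with a potentially throwing one they are copied, so the old buffer stays intact if something goes wrong.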
The question should be considered at 2 different levels.
From the standard's point of view, the vector is required to provide contiguous storage, so that the programmer can use the address of its first element as the address of the first element of an array. It is also required to let its capacity grow by reallocation when you add new elements, keeping the previous elements, but their addresses may change.
From an implementation point of view, it can try to extend the allocated memory in place and, if it cannot, allocate a brand new piece of memory and move or copy construct the existing elements in the newly allocated memory zone. The size increase is not specified by the standard and is left to the implementation. But you are right, doubling the allocated size each time is common practice.
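Since the growth policy is left to the implementation, the easiest way to see what a given standard library does is to record the capacities it goes through. A small sketch (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: records each distinct capacity seen while
// pushing `count` elements into an initially empty vector.
std::vector<std::size_t> observed_capacities(std::size_t count) {
    std::vector<int> v;
    std::vector<std::size_t> steps;
    std::size_t last = v.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last) {
            assert(v.capacity() > last);  // push_back never shrinks the buffer
            last = v.capacity();
            steps.push_back(last);
        }
    }
    return steps;
}
```

The exact sequence differs between standard libraries, but the number of steps stays small compared to the number of elements, which is what makes push_back amortized O(1).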

why constant size of struct despite having a vector of int

I have defined a struct which contains a vector of integer. Then I insert 10 integers in the vector and check for the size of struct. But I see no difference.
Here is my code:
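#include <iostream>
#include <vector>

struct data
{
    std::vector<int> points;
};

int main()
{
    data d;
    std::cout << sizeof(d) << std::endl;  // size before inserting anything
    for (int i = 0; i < 10; ++i)
        d.points.push_back(i);
    std::cout << sizeof(d) << std::endl;  // size after inserting 10 integers
}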
In both the cases I am getting the same result : 16
Why is it so? Shouldn't the size of struct grow?
A vector will store its elements in dynamically allocated memory (on the heap). Internally, this might be represented as:
T* elems; // Pointer memory.
size_t count; // Current number of elements.
size_t capacity; // Total number of elements that can be held.
so sizeof(std::vector) is unaffected by the number of elements it contains, as it is calculated from the sizes of its own members (in this simple example roughly sizeof(T*) + (2 * sizeof(size_t))).
The sizeof operator is a compile time operation that gives you the size of the data structure used to maintain the container, not including the size of the stored elements.
While this might not seem too intuitive at first, consider that when you use a std::vector you are using a small amount of local storage (where the std::vector is created) which maintains pointers to a different region holding the actual data. When the vector grows the data block will grow, but the control structure is still the same.
The fact that the size of an object cannot change during its lifetime is important, as it is the only way of making sure that the compiler can allocate space for points inside data without interfering with other possible members:
struct data2 {
int x;
std::vector<int> points;
int y;
};
If the size of the object (std::vector in this case) was allowed to grow it would expand over the space allocated for y breaking any code that might depend on its location:
data2 d;
int *p = &d.y;
d.points.push_back(5);
// does `p` still point to `&d.y`? or did the vector grow over `y`?
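The question in that last comment can be answered with a quick check. This is a sketch with hypothetical names (Data2 mirrors data2 above): the vector's elements live elsewhere, so the neighbouring members keep both their addresses and their values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Data2 {
    int x;
    std::vector<int> points;
    int y;
};

// sizeof is a compile-time constant, so it cannot depend on the element count.
static_assert(sizeof(Data2) >= sizeof(std::vector<int>) + 2 * sizeof(int),
              "Data2 holds the vector's control structure plus two ints");

// Hypothetical helper: true if growing the vector leaves x and y alone.
bool neighbour_survives_growth() {
    Data2 d{1, {}, 2};
    int* p = &d.y;
    for (int i = 0; i < 1000; ++i)
        d.points.push_back(i);
    assert(d.points.size() == 1000);
    return p == &d.y && d.y == 2 && d.x == 1;
}
```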

basic question on std::vector in C++

C++ textbooks, and threads, like these say that vector elements are physically contiguous in memory.
But when we do operations like v.push_back(3.14) I would assume the STL is using the new operator to get more memory to store the new element 3.14 just introduced into the vector.
Now say the vector of size 4 is stored in computer memory cells labelled 0x7, 0x8, 0x9, 0xA. If cell 0xB contains some other unrelated data, how will 3.14 go into this cell? Does that mean cell 0xB will be copied somewhere else, erased to make room for 3.14?
The short answer is that the entire array holding the vector's data is moved around to a location where it has space to grow. The vector class reserves a larger array than is technically required to hold the number of elements in the vector. For example:
vector< int > vec;
for( int i = 0; i < 100; i++ )
vec.push_back( i );
cout << vec.size(); // prints "100"
cout << vec.capacity(); // prints some value greater than or equal to 100
The capacity() method returns the size of the array that the vector has reserved, while the size() method returns the number of elements in the array which are actually in use. capacity() will always return a number larger than or equal to size(). You can change the size of the backing array by using the reserve() method:
vec.reserve( 400 );
cout << vec.capacity(); // prints a value greater than or equal to 400
Note that size(), capacity(), reserve(), and all related methods count individual instances of the type that the vector is holding. For example, if vec's type parameter T is a struct that takes 10 bytes of memory, then vec.capacity() returning 400 means that the vector actually has at least 4000 bytes of memory reserved (400 x 10 = 4000).
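The element-count-versus-bytes arithmetic can be written down as a small sketch. TenBytes, reserved_bytes and bytes_after_reserve are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type standing in for "a struct that takes 10 bytes".
struct TenBytes { char raw[10]; };

// Bytes of element storage the vector currently has reserved.
template <typename T>
std::size_t reserved_bytes(const std::vector<T>& v) {
    return v.capacity() * sizeof(T);
}

std::size_t bytes_after_reserve(std::size_t n) {
    std::vector<TenBytes> vec;
    vec.reserve(n);
    assert(vec.empty());          // reserve() adds capacity, not elements
    assert(vec.capacity() >= n);  // at least n, possibly more
    return reserved_bytes(vec);
}
```

Note that reserve() only guarantees a lower bound, which is why the result is compared with >= rather than ==.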
So what happens if more elements are added to the vector than it has capacity for? In that case, the vector allocates a new backing array (generally twice the size of the old array), copies the old array to the new array, and then frees the old array. In pseudo-code:
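// Pseudo-code: a real std::vector allocates raw memory through its allocator
// and move- or copy-constructs the elements instead of assigning them.
if (capacity() < size() + items_added)
{
    size_t sz = capacity() > 0 ? capacity() : 1; // doubling zero would never terminate
    while (sz < size() + items_added)
        sz *= 2;                     // grow geometrically
    T* new_data = new T[sz];         // allocate the larger backing array
    for (size_t i = 0; i < size(); i++)
        new_data[i] = old_data[i];   // copy the existing elements over
    delete[] old_data;               // free the old backing array
    old_data = new_data;             // the vector now uses the new array
}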
So the entire data store is moved to a new location in memory that has enough space to store the current data plus a number of new elements. std::vector never shrinks its backing array on its own; if it holds far more space than is actually required, you can ask it to release the excess with shrink_to_fit(), which is a non-binding request.
std::vector first allocates a bigger buffer, then copies existing elements from the "old" buffer to the "new" buffer, then it deletes the "old buffer", and finally adds the new element into the "new" buffer.
Generally, std::vector implementations grow their internal buffer by doubling the capacity each time it's necessary to allocate a bigger buffer.
As Chris mentioned, every time the buffer grows, all existing iterators are invalidated.
When std::vector allocates memory for the values, it allocates more than it needs; you can find out how much by calling capacity. When that capacity is used up, it allocates a bigger chunk, again larger than it needs, and copies everything from the old memory to the new; then it releases the old memory.
If there is not enough space to add the new element, more space will be allocated (as you correctly indicated), and the old data will be copied to the new location. So cell 0xB will still contain the old value (as it might have pointers to it in other places, it is impossible to move it without causing havoc), but the whole vector in question will be moved to the new location.
A vector is an array of memory. A typical implementation grabs more memory than is required. If that footprint needs to expand into memory that is already in use, the whole lot is copied to a new location and the old memory is freed. Note that only the vector object itself may live on the stack; the element memory is allocated on the heap. It is also a good idea to reserve the maximum size up front if you know it.
In C++, which comes from C, memory is not 'managed' the way you describe - Cell 0x0B's contents will not be moved around. If you did that, any existing pointers would be made invalid! (The only way this could be possible is if the language had no pointers and used only references for similar functionality.)
std::vector allocates a new, larger buffer and stores the value of 3.14 to the "end" of the buffer.
Usually, though, to keep push_back() fast, a std::vector allocates about twice the memory its current size() needs. This trades a reasonable amount of memory for performance. So it is not guaranteed that pushing 3.14 will cause a reallocation: if size() < capacity(), the value is simply stored in the next free slot of the existing buffer.

How do I access the internal contiguous buffer of a std::vector and can I use it with memcpy, etc?

How can I access the contiguous memory buffer used within a std::vector so I can perform direct memory operations on it (e.g. memcpy)? Also, is it safe to perform operations like memcpy on that buffer?
I've read that the standard guarantees that a vector uses a contiguous memory buffer internally, but that it is not necessarily implemented as a dynamic array. I figure given that it is definitely contiguous, I should be able to use it as such - but I wasn't sure if the vector implementation stored book-keeping data as part of that buffer. If it did, then something like memcpying the vector buffer would destroy its internal state.
In practice, virtually all compilers implement vector as an array under the hood. You can get a pointer to this array by doing &somevector[0]. If the contents of the vector are POD ('plain-old-data') types, doing memcpy should be safe - however if they're C++ classes with complex initialization logic, you'd be safer using std::copy.
Simply do
&vec[0];
// or Goz's suggestion:
&vec.front();
// or
&*vec.begin();
// but I don't know why you'd want to do that
This returns the address of the first element in the vector (assuming vec has more than 0 elements), which is the address of the array it uses. vector storage is guaranteed by the standard to be contiguous, so this is a safe way to use a vector with functions that expect arrays.
Be aware that if you add, or remove elements from the vector, or [potentially] modify the vector in any way (such as calling reserve), this pointer could become invalid and point to a deallocated area of memory.
You can simply do:
&vect[0]
The memory is guaranteed to be contiguous, so it's safe to work with it using C library functions such as memcpy. However, you shouldn't persist pointers into the contiguous data, because vector resizes may reallocate and copy the memory to a different location. For example, the following would be bad:
std::vector<char> charVect;
// insert a bunch of stuff into charVect
...
char* bufferPtr = &charVect[0];
charVect.push_back('a'); // potential resize
// Now bufferPtr may not be valid since the resize may have moved
// the vector's contents
bufferPtr[0] = 'f'; // undefined behavior if the buffer moved (may crash)
&myvec[0]
But note that using memcpy is really only applicable if this is a vector of PODs or primitive types. Doing direct memory manipulation of anything else leads to undefined behaviour.
The simplest way is to use &v[0], where v is your vector. An example:
int write_vector(int fd, const std::vector<char>& v) {
int rval = write(fd, &v[0], v.size());
return rval;
}
Yes - since the standard guarantees contiguous placement of the vector's internal data, you can access a pointer to the first element in the vector via:
std::vector<int> my_vector;
// initialize...
int* arr = &my_vector[0];
While you can safely read the right amount of data from the underlying storage, writing there may not happen to be a good idea, depending on the design.
vlad:Code ⧴ cat vectortest.cpp
#include <algorithm>
#include <cstring>
#include <vector>
#include <iostream>
int main()
{
using namespace std;
vector<char> v(2);
v.reserve(10);
char c[6]={ 'H', 'e', 'l', 'l', 'o', '\0' };
cout << "Original size is " << v.size();
memcpy( v.data(), c, 6);
cout << ", after memcpy it is " << v.size();
copy(c, c+6, v.begin());
cout << ", and after copy it is " << v.size() << endl;
cout << "Arr: " << c << endl;
cout << "Vec: ";
for (auto i = v.begin(); i!=v.end(); ++i) cout << *i;
cout << endl;
}
vlad:Code ⧴ make vectortest
make: `vectortest' is up to date.
vlad:Code ⧴ ./vectortest
Original size is 2, after memcpy it is 2, and after copy it is 2
Arr: Hello
Vec: He
vlad:Code ⧴
So if you are writing past the size(), then the new data is not accessible by class methods.
You can account for that and ensure the size is enough (e.g. vector<char> v(10)), but do you really want to make software where you are fighting the standard library?
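One way to stop fighting the library is to let the vector create the elements first and only then copy the bytes in. The helper below is a hypothetical sketch, restricted at compile time to trivially copyable types, since memcpy on anything else is undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: replaces the contents of dst with `count` elements
// copied bytewise from src.
template <typename T>
void assign_bytes(std::vector<T>& dst, const T* src, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    dst.resize(count);  // make the elements exist, so size() is correct
    if (count != 0)
        std::memcpy(dst.data(), src, count * sizeof(T));
    assert(dst.size() == count);
}
```

Because resize() runs before the copy, size() and the iterators see all the copied data, unlike the reserve()-then-memcpy version above. For element types that are not trivially copyable, std::copy or assign() is the safer choice.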