Memory taken by a vector of vectors - c++

What is the expected difference (if any) in memory taken by vvint1 and vvint2?
Is vitest1 copied into a new memory position each time a push_back takes place?
Is vitest2 copied into a new memory position each time a push_back takes place?
typedef vector<int> vint_t;
typedef vector<vint_t> vvint_t;
size_t nvec = 2;
size_t nvvec = 3;
vvint_t vvint1(nvvec), vvint2(nvvec);
vint_t vitest2(nvec, 1);
for ( size_t j = 0; j < nvvec; j++ ) {
vint_t vitest1(nvec, 2);
vvint1.push_back(vitest1);
vvint2.push_back(vitest2);
}

Both vvint1 and vvint2 are initially created with nvvec = 3 default-constructed members (i.e. empty vectors of int).
push_back always either copies or moves, but in this case you're not supplying rvalue references so you'll get copies. Look into std::move for more on that.
You're pushing the same number of things to both vectors. Therefore both vvint1 and vvint2 will end up being the same size.

vvint1 and vvint2 memory requirements are:
(on stack, in the example) sizeof(vector<vector<int>>) for the objects themselves, which is the same (vector is 2–3 pointers usually, regardless of the inner type);
(on heap) 2 * nvvec * sizeof(vector<int>) for the contents (nvvec initially and nvvec push_back-ed in the loop); again, that’s the same for vvint1 and vvint2;
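(on heap) contents of each vector stored in these vectors. Since vectors don’t share memory, and you store them by value, nvec * nvvec * sizeof(int) (only the nvvec pushed vectors hold elements; the nvvec initial ones are empty). Again, the same.
So the overall requirements are the same (as a lower bound, since any vector may reserve more capacity than its size):
sizeof(vector<vector<int>>) + 2 * nvvec * sizeof(vector<int>) + nvec * nvvec * sizeof(int)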
A plain vector<int> would of course take less space, as item 2 wouldn’t apply. More importantly, in vvint_t the inner vectors may be of different lengths, and resizing any inner vector doesn’t affect the others. But that adds complexity, so unless you really need that, it’s simpler to use a flat vector and calculate the index; imaging libraries do it that way.
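As a rough illustration of the flat-vector approach, here is a minimal sketch. FlatGrid and its members are hypothetical names made up for this example, and row-major layout is an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols grid of int stored in one contiguous vector.
struct FlatGrid {
    std::size_t rows;
    std::size_t cols;
    std::vector<int> cells;  // one heap buffer holding rows * cols ints

    FlatGrid(std::size_t r, std::size_t c, int value)
        : rows(r), cols(c), cells(r * c, value) {}

    // Row-major index calculation: element (r, c) lives at r * cols + c.
    int& at(std::size_t r, std::size_t c) {
        assert(r < rows && c < cols);
        return cells[r * cols + c];
    }
};
```

With this layout there is one allocation for the whole grid instead of one per row, at the price of all rows having the same length.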
Regarding the second part, both vitest1 and vitest2 are copied on each push_back. But since C++11, you can write vvint1.push_back(std::move(vitest1)); (or vvint1.emplace_back(std::move(vitest1));) to move instead. For vectors that means the newly constructed vector takes ownership of vitest1’s contents without copying them (so vitest1 becomes empty). That doesn’t change memory requirements but reduces allocations, as the space allocated by vitest1 (at construction) is reused instead of being freed (at destruction, at the end of each iteration).
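The copy-versus-move difference can be observed by comparing buffer addresses. This is only a sketch; the function names are invented for the example, and the outer vector is given capacity up front so that it does not reallocate in the middle of the comparison:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns true if push_back(std::move(v)) reused v's heap buffer.
bool move_reuses_buffer() {
    std::vector<std::vector<int>> outer;
    outer.reserve(1);                    // keep the outer vector from reallocating
    std::vector<int> inner(2, 2);
    const int* before = inner.data();    // address of inner's heap buffer
    outer.push_back(std::move(inner));   // move: outer.back() takes over the buffer
    return outer.back().data() == before;
}

// Returns true if push_back(v) made a separate copy of v's elements.
bool copy_allocates_new_buffer() {
    std::vector<std::vector<int>> outer;
    outer.reserve(1);
    std::vector<int> inner(2, 2);
    outer.push_back(inner);              // copy: inner is left untouched
    return outer.back().data() != inner.data() && inner.size() == 2;
}
```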

Related

Memory management when using vector

I am making a game engine and need to use the std::vector container for all of the components and entities in the game.
In a script the user might need to hold a pointer to an entity or component, perhaps to continuously check some kind of state. If something is added to the vector that the pointer points to and the capacity is exceeded, it is my understanding that the vector will allocate new memory and every pointer that points to any element in the vector will become invalid.
Considering this issue I have a couple of possible solutions. After each push_back to the vector, would it be viable to check whether the vector's actual capacity has changed from a previously stored capacity value? And if so, fetch the new pointers and overwrite the old ones? Would this be guaranteed to "catch" every case that invalidates pointers when performing a push_back?
Another solution that I've found is to instead save an index to the element and access it that way, but I suspect that is bad for performance when you need to continuously check the state of that element (every 1/60 second).
I am aware that other containers do not have this issue but I'd really like to make it work with a vector. Also it might be worth noting that I do not know in advance how many entities / components there will be.
Any input is greatly appreciated.
You shouldn't worry about the performance of std::vector when you access its elements only 60 times per second. By the way, in an optimized (Release) build, std::vector::operator[] typically compiles down to simple address arithmetic (often a single lea instruction). In Debug builds, some implementations add runtime range checks.
If the user is going to store pointers to the objects, why even contain them in a vector?
I don't think it is a good idea to create pointers that point to vector elements (i.e. my_ptr = &my_vec[n];). The whole point of a container is to reference the contents in the normal ways that the container supports, not to create outside pointers to elements of the container.
To answer your question about whether you can detect the allocations, yes you could, but it is still probably a bad idea to reference the contents of a vector by pointers to elements.
You could also reserve space in the vector when you create it, if you have some idea of what the maximum size might grow to. Then it would not reallocate as long as the size stays within the reserved capacity.
edit:
After reading other responses, and thinking about what you asked, another thought occurred. If your vector is a vector of pointers to objects, and you pass out the pointers to the objects to your clients, resizing the vector does not invalidate the pointers that the vector holds. The issue becomes keeping track of the lifetime of the object (who owns it), which is why using shared_ptr would be useful.
For example (T stands for your element type):
vector<shared_ptr<T>> my_vec;
my_vec.push_back(stuff);
if you pass out the pointers contained in the vector to clients...
client_ptr = my_vec[3];
There will be no problem when the vector resizes. The contents of the vector will be preserved, and whatever was at my_vec[3] will still be there. The object pointed to by my_vec[3] will still be at the same address, and my_vec[3] will still contain that address. Whomever got a copy of the pointer at my_vec[3] will still have a valid pointer.
However, if you did this:
client_ptr = &my_vec[3];
And the client is dereferencing like this:
(*client_ptr)->whatever();
You have a problem. When my_vec reallocates, &my_vec[3] is probably no longer valid, so client_ptr dangles.
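A small sketch of the safe variant, where the client keeps a copy of the shared_ptr rather than the address of a vector slot. Entity and handle_survives_growth are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical element type standing in for an entity or component.
struct Entity {
    int state = 0;
};

// Returns true if a shared_ptr copied out of the vector still refers to the
// same live object after the vector has grown (and very likely reallocated).
bool handle_survives_growth() {
    std::vector<std::shared_ptr<Entity>> entities;
    for (int i = 0; i < 4; ++i)
        entities.push_back(std::make_shared<Entity>());

    std::shared_ptr<Entity> client = entities[3];  // copy of the pointer, not &entities[3]
    client->state = 42;

    for (int i = 0; i < 1000; ++i)                 // force the vector to grow
        entities.push_back(std::make_shared<Entity>());

    return client.get() == entities[3].get() && entities[3]->state == 42;
}
```

The trade-off is one extra heap allocation and one extra indirection per element, so the objects are no longer stored contiguously.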
If something is added to the vector that the pointer points to and the
capacity is exceeded, it is my understanding that the vector will
allocate new memory and every pointer that points to any element in
the vector will become invalid.
I once wrote some code to analyze what happens when a vector's capacity is exceeded. (Have you done this yet?) What that code demonstrated on my Ubuntu system with g++ 5 was that std::vector simply a) doubles the capacity, b) moves all the elements from the old to the new storage, then c) cleans up the old. Perhaps your implementation is similar. I think the details of capacity expansion are implementation dependent.
And yes, any pointer into the vector would be invalidated when push_back() causes capacity to be exceeded.
1) I simply don't use pointers-into-the-vector (and neither should you). In this way the issue is completely eliminated, as it simply can not occur. (see also, dangling pointers) The proper way to access a std::vector (or a std::array) element is to use an index (via the operator[]() method).
After any capacity-expansion, the index of all elements at indexes less than the previous capacity limit are still valid, as the push_back() installed the new element at the 'end' (I think highest memory addressed.) The elements memory location may have changed, but the element index is still the same.
2) It is my practice that I simply don't exceed the capacity. Yes, by that I mean that I have been able to formulate all my problems such that I know the required maximum-capacity. I have never found this approach to be a problem.
3) If the vector contents can not be contained in system memory (my system's best upper limit capacity is roughly 3.5 GBytes), then perhaps a vector container (or any ram based container) is inappropriate. You will have to accomplish your goal using disk storage, perhaps with vector containers acting as a cache.
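Points 1) and 2) can be sketched as follows. The function names are made up for this example; the second function relies on the guarantee that push_back does not reallocate while size() stays within the reserved capacity:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if an index taken before growth still selects the same element.
bool index_survives_growth() {
    std::vector<int> v{10, 20, 30};
    const std::size_t handle = 1;         // index of the element holding 20
    for (int i = 0; i < 1000; ++i)        // growth may move the elements in memory
        v.push_back(i);
    return v[handle] == 20;
}

// Returns true if push_back within reserved capacity left the buffer in place.
bool reserve_prevents_reallocation() {
    std::vector<int> v;
    v.reserve(100);                       // capacity() is now at least 100
    v.push_back(0);
    const int* first = v.data();
    for (int i = 1; i < 100; ++i)         // stays within the reserved capacity
        v.push_back(i);
    return v.data() == first;             // no reallocation took place
}
```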
update 2017-July-31
Some code to consider from my latest Game of Life.
Each Cell_t (on the 2-d gameboard) has 8 neighbors.
In my implementation, each Cell_t has a neighbor 'list' (either std::array or std::vector, I've tried both), and after the gameboard has been fully constructed, each Cell_t's init() method is run, filling its neighbor 'list'.
// see Cell_t data attributes
std::array<int, 8> m_neighbors;
// ...
void Cell_t::init()
{
int i = 0;
m_neighbors[i] = validCellIndx(m_row-1, m_col-1); // 1 - up left
m_neighbors[++i] = validCellIndx(m_row-1, m_col); // 2 - up
m_neighbors[++i] = validCellIndx(m_row-1, m_col+1); // 3 - up right
m_neighbors[++i] = validCellIndx(m_row, m_col+1); // 4 - right
m_neighbors[++i] = validCellIndx(m_row+1, m_col+1); // 5 - down right
m_neighbors[++i] = validCellIndx(m_row+1, m_col); // 6 - down
m_neighbors[++i] = validCellIndx(m_row+1, m_col-1); // 7 - down left
m_neighbors[++i] = validCellIndx(m_row, m_col-1); // 8 - left
// ^^^^^^^^^^^^^- returns info to quickly find cell
}
The int value in m_neighbors[i] is the index into the gameboard vector. To determine the next state of the cell, the code 'counts the neighbor's states.'
Note - Some cells are at the edge of the gameboard ... in this implementation, validCellIndx() can return a value indicating 'no-neighbor', (above top row, left of left edge, etc.)
// multiplier: for 100x200 cells,20,000 * m_generation => ~20,000,000 ops
void countNeighbors(int& aliveNeighbors, int& totalNeighbors)
{
{ /* ... initialize m_count[]s to 0 */ }
for(auto neighborIndx : m_neighbors ) { // each of 8 neighbors // 123
if(no_neighbor != neighborIndx) // 8-4
m_count[ gBoard[neighborIndx].m_state ] += 1; // 765
}
aliveNeighbors = m_count[ CellALIVE ]; // CellDEAD = 1, CellALIVE
totalNeighbors = aliveNeighbors + m_count [ CellDEAD ];
} // Cell_Arr_t::countNeighbors
init() pre-computes the indexes of this cell's neighbors. The m_neighbors array holds index integers, not pointers. It is trivial to have NO pointers into the gameboard vector.

What is the difference in the following two cases of vector usage in c++?

case 1:
std::vector< Ticker > snap_tickers_ (n_instruments);
and
case 2:
std::vector< Ticker >snap_tickers_;
snap_tickers_.resize(n_instruments);
I am getting a compilation error when I try case 2, whereas in case 1 I am not getting any build failure. Can that be related to the type of object for which the vector is created?
ANSWER:
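resize in case 2 requires the element type to be move- or copy-constructible, because resize may have to relocate existing elements into a new buffer. The copy constructor was deleted for the Ticker class (and no move constructor is available), hence the failure. The sized constructor in case 1 only default-constructs the elements in place, so it compiles.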
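A minimal sketch that reproduces the situation. The Ticker below is a hypothetical stand-in (the real class is not shown in the question), assumed to be default-constructible with a deleted copy constructor and no move constructor:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the Ticker class: default-constructible,
// but with a deleted copy constructor and no move constructor.
struct Ticker {
    Ticker() = default;
    Ticker(const Ticker&) = delete;
    Ticker& operator=(const Ticker&) = delete;
};

static_assert(std::is_default_constructible<Ticker>::value, "needed by vector(n)");
static_assert(!std::is_move_constructible<Ticker>::value, "so resize(n) cannot compile");

std::size_t make_tickers(std::size_t n) {
    std::vector<Ticker> snap_tickers(n);   // OK: default-constructs n elements in place
    // snap_tickers.resize(n);             // error: needs a move or copy constructor
    return snap_tickers.size();
}
```

Giving the class a move constructor (for example Ticker(Ticker&&) = default;) would be one way to make the resize call compile.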
There is no real difference.
case 1:
std::vector<int> vec(5);
allocates 5 int-elements.
case2:
std::vector<int> vec;
vec.resize(5);
here, we begin with an empty vector of ints.
When you then call resize, the function compares the size you passed with the current size (which is 0 in this case). If the new size is smaller, it removes size() - _Newsize elements from the end; if it is larger, it appends _Newsize - size() new elements.
So in the end, resize is slower, because there are more machine cycles (if statements, subtracting sizes...) to do.
If you want to know more, here's the resize function from vector:
void resize(size_type _Newsize)
{ // determine new length, padding as needed
if (_Newsize < size())
_Pop_back_n(size() - _Newsize);
else if (size() < _Newsize)
{ // pad as needed
_Alty _Alval(this->_Getal());
_Reserve(_Newsize - size());
_TRY_BEGIN
_Uninitialized_default_fill_n(this->_Mylast, _Newsize - size(),
_Alval);
_CATCH_ALL
_Tidy();
_RERAISE;
_CATCH_END
this->_Mylast += _Newsize - size();
}
}
as you can see, it does quite a lot.
But in the end, it's just a question about (in most cases not important) micro-seconds...
So no real difference.
According to the C++ standard (since C++03), std::vector is required to store all elements contiguously,
[...] which means that elements can be accessed not only through iterators, but also using offsets on regular pointers to elements. This means that a pointer to an element of a vector may be passed to any function that expects a pointer to an element of an array.
Because of this restriction, resizing can potentially slow down performance because of the necessity of copying elements over to a new preallocated block. In practice, this overhead is usually only seen when resizing existing vectors with a lot of items requiring the vector to copy (or move) all of the objects to a new memory location.
In the example you gave, there is no real difference because the original vector had no items in it (and many compilers pre-allocate a chunk of memory to begin). I wouldn't be surprised if the compiler did an optimization to render equivalent code.

Assigning vector size vs reserving vector size

bigvalue_t result;
result.assign(left.size() + right.size(), 0);
int carry = 0;
for(size_t i = 0; i < left.size(); i++) {
carry = 0;
for(size_t j = 0; j < right.size(); j++) {
int sum = result[i+j] + (left[i]*right[j]) + carry;
result[i+j] = sum%10;
carry = sum/10;
}
result[i+right.size()] = carry;
}
return result;
Here I used assign to allocate size of result, and result passed back normally.
When I use result.reserve(left.size()+right.size()); instead, the function runs normally inside both for loops. But when I print result.size(), it is always 0. Does reserve not allocate any space?
It is specified as
void reserve(size_type n);
Effects: A directive that informs a
vector of a planned change in size, so that it can manage the storage
allocation accordingly. After reserve(), capacity() is greater or
equal to the argument of reserve if reallocation happens; and equal to
the previous value of capacity() otherwise. Reallocation happens at
this point if and only if the current capacity is less than the
argument of reserve(). If an exception
is thrown other than by the move constructor of a non-CopyInsertable type, there are no effects.
Complexity: It does not change the size of the sequence and takes at
most linear time in the size of the sequence.
So, yes, it allocates memory, but it doesn't create any objects within the container. To actually create as many elements in the vector as you want to have later, and to be able to access them via op[], you need to call resize().
reserve() is for when you want to prevent things like the vector reallocation every now and then when doing lots of push_back()s.
reserve allocates space, but doesn't really create anything. It is used in order to avoid reallocations.
For example, if you intend to store 10000 elements by push_back into a vector, the vector will probably have to reallocate several times along the way. If you call reserve(10000) before storing your elements, the vector already has room for at least 10000 elements, so filling it should be faster than without reserve.
resize actually creates elements. Note also that resize value-initializes the new elements (so for an int, it sets every new element to 0).
PS - In fact, when you say reserve(1000), the vector may allocate space for more than 1000 elements. If this happens and you store exactly 1000 elements, the unused space remains unused (it is not de-allocated).
It is the difference between semantically increasing the size of the vector (resize/assign/push_back/etc), and physically creating more underlying memory for it to expand into (reserve).
That you see your code appear to work even with reserve is just because you're not triggering any OS memory errors (because the memory belongs to your vector), but just because you don't see any error messages or crashes doesn't mean your code is safe or correct: as far as the vector is concerned, you are writing into memory that belongs to it and not you.
If you'd used .at() instead of [] you'd have got an exception; as it is, you are simply invoking undefined behaviour.
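To make the reserve/resize distinction concrete, here is a small sketch (function names are invented for the example). It uses at() so that the out-of-range access is reported as an exception instead of being undefined behaviour:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if at(0) throws on a vector that has capacity but no elements.
bool at_throws_after_reserve() {
    std::vector<int> v;
    v.reserve(8);                 // capacity() >= 8, but size() is still 0
    try {
        v.at(0) = 1;              // bounds-checked access: index 0 is out of range
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// Returns size() after resize; the new elements are value-initialized to 0.
std::size_t size_after_resize() {
    std::vector<int> v;
    v.resize(8);
    v[7] = 5;                     // fine: index 7 refers to a real element
    return v.size();
}
```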

basic question on std::vector in C++

C++ textbooks, and threads, like these say that vector elements are physically contiguous in memory.
But when we do operations like v.push_back(3.14) I would assume the STL is using the new operator to get more memory to store the new element 3.14 just introduced into the vector.
Now say the vector of size 4 is stored in computer memory cells labelled 0x7, 0x8, 0x9, 0xA. If cell 0xB contains some other unrelated data, how will 3.14 go into this cell? Does that mean cell 0xB will be copied somewhere else, erased to make room for 3.14?
The short answer is that the entire array holding the vector's data is moved around to a location where it has space to grow. The vector class reserves a larger array than is technically required to hold the number of elements in the vector. For example:
vector< int > vec;
for( int i = 0; i < 100; i++ )
vec.push_back( i );
cout << vec.size(); // prints "100"
cout << vec.capacity(); // prints some value greater than or equal to 100
The capacity() method returns the size of the array that the vector has reserved, while the size() method returns the number of elements in the array which are actually in use. capacity() will always return a number larger than or equal to size(). You can change the size of the backing array by using the reserve() method:
vec.reserve( 400 );
cout << vec.capacity(); // prints a value of at least 400 (typically "400")
Note that size(), capacity(), reserve(), and all related methods refer to individual instances of the type that the vector is holding. For example, if vec's type parameter T is a struct that takes 10 bytes of memory, then vec.capacity() returning 400 means that the vector actually has 4000 bytes of memory reserved (400 x 10 = 4000).
So what happens if more elements are added to the vector than it has capacity for? In that case, the vector allocates a new backing array (generally twice the size of the old array), copies the old array to the new array, and then frees the old array. In pseudo-code:
if(capacity() < size() + items_added)
{
size_t sz = capacity();
while(sz < size() + items_added)
sz*=2;
T* new_data = new T[sz];
for( int i = 0; i < size(); i++ )
new_data[ i ] = old_data[ i ];
delete[] old_data;
old_data = new_data;
}
So the entire data store is moved to a new location in memory that has enough space to store the current data plus a number of new elements. Note that std::vector does not shrink its backing array on its own when elements are removed; since C++11, shrink_to_fit() can be used to make a non-binding request to release unused capacity.
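The growth behaviour can be observed on a real std::vector by watching capacity() while appending. This is a sketch with a made-up function name; the exact count it returns depends on the implementation's growth factor:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times capacity() changes while n values are appended.
std::size_t count_reallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {   // the buffer was replaced
            ++reallocations;
            last_capacity = v.capacity();
        }
        assert(v.size() <= v.capacity());
    }
    return reallocations;
}
```

Because the capacity grows geometrically, the number of reallocations grows only logarithmically with n, which is what makes push_back amortized constant time.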
std::vector first allocates a bigger buffer, then copies existing elements from the "old" buffer to the "new" buffer, then it deletes the "old buffer", and finally adds the new element into the "new" buffer.
Generally, std::vector implementations grow their internal buffer geometrically each time it's necessary to allocate a bigger buffer; doubling is common, but the exact growth factor is implementation-defined (some implementations use 1.5).
As Chris mentioned, every time the buffer grows, all existing iterators are invalidated.
When std::vector allocates memory for the values, it allocates more than it needs; you can find out how much by calling capacity. When that capacity is used up, it allocates a bigger chunk, again larger than it needs, and copies everything from the old memory to the new; then it releases the old memory.
If there is not enough space to add the new element, more space will be allocated (as you correctly indicated), and the old data will be copied to the new location. So cell 0xB will still contain the old value (as it might have pointers to it in other places, it is impossible to move it without causing havoc), but the whole vector in question will be moved to the new location.
A vector manages a dynamically allocated array. A typical implementation grabs more memory than is required. If that footprint needs to expand into memory that is already in use, a new block is allocated, the whole lot is copied over, and the old block is freed. Note that the elements live in dynamically allocated (heap) memory; only the small vector object itself may be on the stack. It is also a good idea to state the maximum size up front with reserve() if you know it.
In C++, which comes from C, memory is not 'managed' the way you describe - Cell 0x0B's contents will not be moved around. If you did that, any existing pointers would be made invalid! (The only way this could be possible is if the language had no pointers and used only references for similar functionality.)
std::vector allocates a new, larger buffer and stores the value of 3.14 to the "end" of the buffer.
Usually, though, to make push_back() cheap, a std::vector allocates more memory than its size() requires (often about twice as much). This trades a reasonable amount of memory for performance. So it is not guaranteed that adding 3.14 will cause a reallocation; it is simply stored at buffer[size()] if and only if size() < capacity().

On different ways of filling a vector

I can think of three ways of filling a std::vector
Suppose we have
vector<int> v(100, 0);
Then I want it to hold (1, 1, 1). We can do:
v.clear();
v.resize(3, 1);
Or
v = vector<int>(3, 1);
And I learned another approach:
vector<int>(3, 1).swap(v);
First question is: Is any of them the best approach?
Second question: suppose v was declared outside the main function. According to this answer, the memory will be allocated in the data segment. If I use the second or third approach, will the memory be allocated on the stack?
How about you use the member of vector that is there for this task?
std::vector<int> v(100);
v.assign(3, 1); // this is what you should do.
So, here's the differences and I'll let you decide what is best for your situation.
v.clear();
v.resize(3, 1);
In this case we have marked the vector as cleared. It still holds whatever it allocated in order to hold 100 elements (which can be more than the space necessary for 100 elements). Then we added 3 items with value of 1. All this did was increase the size counter and reset 3 values, the underlying memory is still the same size.
v = vector<int>(3, 1);
This does pretty much the same thing, except that an extra temporary vector is created, and instead of passing through intermediate states where the counter is 0 and then 3, it simply copies the size and then does an operation similar to a memcpy to copy over the 3 elements. With copy assignment (before C++11), the underlying memory allocated for v is still enough to hold 100 integers. Since C++11 this expression is a move assignment, which typically makes v take over the temporary's smaller buffer instead.
vector<int>(3, 1).swap(v);
This one is significantly different. In this case we create a temporary vector that holds 3 elements that are all initialized to 1. Theoretically it could still have enough memory reserved for 100 elements but the chances are it has much less. Then we swap this vector with our own and let the temporary get destroyed. This has the added benefit of clearing out any extra memory allocated by our old vector that wasn't in the temporary. The way this works is that the two vectors (our v and the temporary) swap more than just counters and values, they also swap buffer pointers.
Before C++11, this was the only way to shrink a vector. Since C++11, v.shrink_to_fit() makes a (non-binding) request to do the same.
To answer the second question first: vector will always dynamically allocate the memory for the objects it contains, so it will end up on the heap.
As to which method of reassigning is better, I'd say your first or second method make your intent most clear, and that's the most important attribute.
Swapping will effectively shrink the vector to 3 elements. The other ones will likely not.
vector<int> v(100);
v.assign(3, 1);
assert(v.size() == 3);
assert(v.capacity() != 3);
v = vector<int>(3, 1);
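// Before C++11 (copy assignment), v.capacity() is likely not 3.
// Since C++11 this is a move assignment, so v typically takes over the temporary's buffer.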
vector<int>(3, 1).swap(v);
assert(v.capacity() == 3);
The other approaches won't resize the vector internally. It still will occupy 100 * sizeof(int) bytes in memory, even if the size() member returns 3. Try displaying v.capacity() to convince yourself.
One issue, not mentioned in previous posts, is important in choosing among these alternatives: exception safety. The form vector<int>(3, 1).swap(v); has the strong exception safety guarantee. The form v = vector<int>(3, 1); might also offer that guarantee if assignment is implemented in terms of swap. The first alternative, v.clear(); v.resize(3, 1);, does not: if resize throws, v has already been cleared.
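The capacity effects discussed in this thread can be checked with a sketch like the following (function names are invented for the example). Only lower bounds are stated in the comments, since the exact capacities are up to the implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Capacity left behind by assign(): the original buffer is kept.
std::size_t capacity_after_assign() {
    std::vector<int> v(100, 0);
    v.assign(3, 1);                    // size() becomes 3, the buffer is reused
    return v.capacity();               // still at least 100
}

// Capacity left behind by the swap idiom: v takes over the temporary's buffer.
std::size_t capacity_after_swap() {
    std::vector<int> v(100, 0);
    std::vector<int>(3, 1).swap(v);    // the old 100-element buffer dies with the temporary
    return v.capacity();               // at least 3; commonly exactly 3
}
```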