Does a std::vector<T> hold only the exact number of elements it needs or extra space? - c++

So the standard says that a vector holds all of its elements in a contiguous block of memory. Further, a std::vector will resize whenever it needs to, up or down. Resizing while keeping all of your memory in one contiguous block means you need to allocate a new block of the new size and then copy all of your data over to it -> O(n).
My understanding was that the standard solves this problem much like hash tables do, by resizing only when a certain percentage of the buckets are full. I.e. a vector, when initialized, grabs enough memory that it can keep adding items without having to expand into a larger array and copy all of its data over, until after enough insertions it resizes yet again. Is this the case? When does the vector actually grab more memory, resize, and copy its elements over, and how does it maintain the run times mentioned in Questions About Running time if it is resizing on each push and erase?

The vector does not reallocate the contiguous area of memory it uses for its values every time a value is added or removed from the vector. A typical implementation of a vector grows the actual allocated memory for the vector by a factor of two when needed, and does not release the allocated space when the vector shrinks.
There's a std::vector method called reserve(). It does not change the actual size of the vector (the vector still contains the same number of elements), but it "reserves" enough space that the vector can grow to hold up to the given number of elements without reallocating. So, if the vector currently has eight elements, reserve(12) grows the vector's storage enough that no reallocation will be needed if four additional elements are added to the vector (it's allowed to reserve even more space, if it feels like it; you're only guaranteed that the vector can grow to size 12 without reallocating).
The std::vector automatically does the equivalent of reserve() whenever it actually needs more space; as I mentioned, typically by doubling the allocated space for the vector. So, if the vector has eight elements and no more available space, the next attempt to insert an element reserves space for 16 values before actually inserting the new value.
Because the vector's capacity grows geometrically, reallocation happens only occasionally, when there's a need for it, and it can be shown that the complexity of inserting a single value into the vector is amortized constant time.
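A quick way to see this for yourself (a minimal sketch; the exact growth factor and starting capacity are implementation-defined, so the output will vary) is to log every point at which capacity() changes while pushing back:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_capacity = v.capacity();
    std::cout << "initial capacity: " << last_capacity << '\n';

    // Push 1000 elements and report each reallocation.
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {
            last_capacity = v.capacity();
            std::cout << "size " << v.size()
                      << " -> capacity " << v.capacity() << '\n';
        }
    }
}

On a typical implementation you'll see only a handful of capacity jumps for the thousand insertions, which is exactly why the amortized cost stays constant.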

Related

What is the space complexity of a vector of empty vectors?

Will the space complexity be O(n) or O(1), considering that all the vectors the vector holds are empty?
vector<vector<int>> mat(n);
I'm not really sure what you mean by "space complexity", but I'll give this a try.
A vector internally, consists of two parts:
The "local part" which is what gets allocated on the stack when you declare a local variable of type (say) vector<int>. This part is constant size - it doesn't change depending on the number of elements in the vector. In many implementations, this is the size of three pointers + an allocator.
The "remote part" which lives (usually) on the heap, but is allocated using the allocator. This changes size depending on the capacity of the vector. Not the size, but the capacity (the capacity is always at least as large as the size, but may be larger). An empty vector (capacity == 0) could have no "remote part".
So a vector of empty vectors would have the following space usage:
A local part for the vector<vector<int>>
A remote part that consists of N (where N is the capacity of the vector) "local parts" of vector<int>.
Each of the vector<int>s could have a remote part (which could be empty).
Does that answer your question?
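To put rough numbers on it (a sketch; the exact sizes are implementation-specific, and the three-pointer layout is just a common choice):

#include <iostream>
#include <vector>

int main() {
    // The "local part": constant size, independent of the number of elements.
    // On many implementations this is three pointers (begin, end, end-of-storage).
    std::cout << "sizeof(std::vector<int>): " << sizeof(std::vector<int>) << '\n';

    // A vector of n empty vectors: the outer vector's remote part holds
    // n local parts; each inner vector typically has no remote part (capacity 0).
    std::size_t n = 1000;
    std::vector<std::vector<int>> mat(n);
    std::cout << "outer remote part, roughly: "
              << mat.capacity() * sizeof(std::vector<int>) << " bytes\n";
}

Either way, the space used is proportional to n.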
An empty vector is an object that exists and thus takes up storage. The outer vector doesn’t really care what it stores, and the standard requires that a vector stores its elements contiguously and that each element is a distinct object.
So, from the point of view of the outer vector, you could replace the inner vector with an int and nothing changes much - the vector still holds N elements. If those elements are empty vector<int>, they’ll be larger than int, but the complexity doesn’t change: N objects exist.
It doesn’t even matter whether the inner vectors allocate or not. That would only add some constant factor to the memory cost, the total cost remains in the same O(N) class.

Why do dynamic arrays have to geometrically increase their capacity to gain O(1) amortized push_back time complexity?

I've learned that dynamic arrays, such as std::vector, double their capacity when their capacity is reached to make the push_back operation O(1) amortized time.
However, why is this needed in the first place? Isn't allocating space for one element at the end of the vector and copying the new element there already O(1)?
If you want to allocate space at the end of the array, that only works if the memory at that location is available. Something else could already be there, or that memory could be unusable. So the way that resizing an array works (in general):
Create a new, bigger array,
Copy the elements from the original array into the bigger array, and
Destroy the original array.
As you can see, when you increase the size of the array, you pay a cost proportional to the original size of the array.
So if you start with an array with one element and add a second element, you have to copy the first element into another array. If you add a third element, you have to copy the other two elements. If you add a fourth element, you have to copy the first three. This adds up to 1 + 2 + 3 + ... + (N-1), which is equal to N(N-1)/2, which is in O(N²). See Arithmetic Progression (Wikipedia)
If you resize the array with a geometric progression, you still have to copy the elements each time, but you are copying them fewer times.
If you resize by doubling the array, then when you reach some power-of-two size N, N/2 of the elements will have been copied 0 times, N/4 will have been copied once, N/8 will have been copied twice, and so on. The sum 0·N/2 + 1·N/4 + 2·N/8 + 3·N/16 + ... is in O(N). See Geometric Series (Wikipedia)
You do not need to pick doubling, you can pick some other factor, like 1.5x. Choosing a different factor does not change the asymptotic complexity but it does change the real performance.
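Here's a minimal sketch of the mechanism (not how std::vector is actually written; it ignores exception safety, move semantics, and allocators, and the type GrowableArray is made up for illustration):

#include <cstddef>

// Toy dynamic array of ints, for illustration only.
struct GrowableArray {
    int*        data     = nullptr;
    std::size_t size     = 0;
    std::size_t capacity = 0;

    void push_back(int value) {
        if (size == capacity) {
            // Geometric growth: double the capacity (starting from 1).
            std::size_t new_capacity = (capacity == 0) ? 1 : capacity * 2;
            int* new_data = new int[new_capacity];
            for (std::size_t i = 0; i < size; ++i)   // copy everything over: O(size),
                new_data[i] = data[i];               // but this branch runs rarely
            delete[] data;
            data = new_data;
            capacity = new_capacity;
        }
        data[size++] = value;   // the common case: no allocation, O(1)
    }

    ~GrowableArray() { delete[] data; }
};

If the growth line were new_capacity = capacity + 1 instead, the copy loop would run on every insertion and the total cost of N insertions would be quadratic, as described above.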
If you could just allocate space for one more element at the end, and copy the new element there, it would indeed be O(1).
But the standard library doesn't provide a way to do that, mostly because you can't depend on actually being able to do it.
So what normally ends up happening is that you allocate a whole new block of memory, copy (or move) the existing data from the existing block to the new block, then add the new element to the new block. And that extra step of copying/moving all the elements from the existing block to the new block is linear, meaning that adding a new element is linear.
The difficulty here is that a std::vector keeps its contents in contiguous memory. It's not enough to allocate memory for a single additional element and move it there. You need to guarantee that you have enough memory for the entire container, all in one place. Each allocation might require copying over all previous contents to maintain contiguity, so it's not necessarily actually O(1) to get enough memory for a single additional element.

Effects of vector pushback on element address

I have a vector of class objects. A function randomly chooses two elements and returns their addresses.
Now using these two elements, I want to generate two new objects of the same class and add them to the vector by using push_back.
Here are the addresses of the two parent elements:
No problem here. The first child object is then generated, and I use push_back to add it to the end of the vector. The problem is, after the push_back command has been executed, it seems like the addresses of the parent objects change. Here is the state of the debugger after push_back:
As you can see, the addresses obviously stay the same, but it seems like they point to garbage values after push_back. To my understanding, push_back adds an element at the end of the vector. Therefore I expect the address of the 2 elements to not change at all.
What's wrong?
TL;DR version:
An insertion operation can invalidate any pointers, references or iterators to elements of a std::vector.
Full explanation:
A std::vector has two useful metrics:
size, which is the number of elements stored.
capacity, which is the number of elements it's currently capable of storing.
capacity >= size at all times.
The capacity is the length of the internal dynamically-allocated array.* When you insert an element, the size increments by 1. But once it reaches capacity, a new, larger array must be allocated (hence increasing the capacity). This requires all the elements to be copied across, and the originals to be deleted. So all their addresses change.
* This is the typical internal implementation of a std::vector.
push_back can cause reallocation and moving of all elements in the vector, if the space that's currently assigned for element storage cannot contain the new element.
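You can watch the invalidation happen (whether and when a particular push_back reallocates is implementation-dependent, so the exact output will vary):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    v.shrink_to_fit();              // ask for capacity == size (not guaranteed)

    int* parent = &v[0];            // address of an element, like the "parents" above
    std::cout << "before push_back: " << static_cast<void*>(parent) << '\n';

    v.push_back(4);                 // may reallocate and move every element

    std::cout << "after  push_back: " << static_cast<void*>(&v[0]) << '\n';
    // If the two addresses differ, 'parent' is now dangling; reading through it
    // is undefined behaviour, which is why it appeared to point at garbage.
}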

c++ inserting elements at the end of a vector

I am experiencing a problem with the vector container. I am trying to improve the performance of inserting a lot of elements into one vector.
Basically I am using vector::reserve to expand my vector _children if needed:
if (_children.capacity() == _children.size())
{
_children.reserve(_children.size() * 2);
}
and using vector::at() to insert a new element at the end of _children instead of vector::push_back():
_children.at(_children.size()) = child;
_children already has one element in it, so the new element should be inserted at position 1, and the capacity at this time is 2.
Despite this, an out_of_range error is thrown. Can someone explain to me, what I misunderstood here? Is it not possible to just insert an extra element even though the chosen position is less than the vector capacity? I can post some more code if needed.
Thanks in advance.
/mads
Increasing the capacity doesn't increase the number of elements in the vector. It simply ensures that the vector has capacity to grow up to the required size without having to reallocate memory. I.e., you still need to call push_back().
Mind you, calling reserve() to increase capacity geometrically is a waste of effort. std::vector already does this.
This causes accesses out of bounds. Reserving memory does not affect the size of the vector.
Basically, you are doing manually what push_back does internally. Why do you think it would be any more efficient?
That's not what at() is for. at() is a checked version of [], i.e. accessing an element. But reserve() does not change the number of elements.
You should just use reserve() followed by push_back or emplace_back or insert (at the end); all those will be efficient, since they will not cause reallocations if you stay under the capacity limit.
Note that the vector already behaves exactly like you do manually: When it reaches capacity, it resizes the allocated memory to a multiple of the current size. This is mandated by the requirement that adding elements have amortized constant time complexity.
Neither at nor reserve increase the size of the vector (the latter increases the capacity but not the size).
Also, your attempted optimization is almost certainly redundant; you should simply push_back the elements into the array and rely on std::vector to expand its capacity in an intelligent manner.
You have to differentiate between the capacity and the size. You can only assign within size, and reserve only affects the capacity.
vector::reserve is only internally reserving space but is not constructing objects and is not changing the external size of the vector. If you use reserve you need to use push_back.
Additionally vector::at does range checking, which makes it a lot slower compared to vector::operator[].
What you are doing is trying to mimic part of the behaviour vector already implements internally. It is going to grow its capacity by a certain factor (usually around 1.5 or 2) every time it runs out of space. If you know that you are pushing back many objects and only want one reallocation, use:
vec.reserve(vec.size() + nbElementsToAdd);
If you are not adding enough elements this is potentially worse than the default behaviour of vector.
The capacity of a vector is not the number of elements it has, but the number of elements it can hold without allocating more memory. The capacity is equal to or larger than the number of elements in the vector.
In your example, _children.size() is 1, but there is no element at position 1. You can only use assignment to replace existing elements, not to add new ones. By definition, the last element is at _children.at(_children.size()-1).
The correct way is just to use push_back(), which is highly optimized, and faster than inserting at an index. If you know beforehand how many elements you want to add, you can of course use reserve() as an optimization.
It's not necessary to call reserve manually, as the vector will automatically resize the internal storage if necessary. Actually, I believe what you do in your example is similar to what the vector does internally anyway - when it reaches its capacity, it reserves twice the current size.
See also http://www.cplusplus.com/reference/stl/vector/capacity/
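Putting the advice together, a sketch of what the insertion loop in the question could look like (Node and append_children are placeholder names, not from the original code):

#include <cstddef>
#include <vector>

struct Node {};  // stand-in for the question's element type

void append_children(std::vector<Node>& children, std::size_t count) {
    // Optional: one allocation up front. reserve() changes capacity, not size.
    children.reserve(children.size() + count);

    for (std::size_t i = 0; i < count; ++i) {
        // push_back/emplace_back grow the size; no reallocation occurs
        // as long as the size stays within the reserved capacity.
        children.emplace_back();
    }
}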

What is a truly empty std::vector in C++?

I've got a two vectors in class A that contain other class objects B and C. I know exactly how many elements these vectors are supposed to hold at maximum. In the initializer list of class A's constructor, I initialize these vectors to their max sizes (constants).
If I understand this correctly, I now have a vector of objects of class B that have been initialized using their default constructor. Right? When I wrote this code, I thought this was the only way to deal with things. However, I've since learned about std::vector::reserve() and I'd like to achieve something different.
I'd like to allocate memory for these vectors to grow as large as possible because adding to them is controlled by user-input, so I don't want frequent resizings. However, I iterate through this vector many, many times per second and I only currently work on objects I've flagged as "active". To have to check a boolean member of class B/C on every iteration is silly. I don't want these objects to even BE there for my iterators to see when I run through this list.
Is reserving the max space ahead of time and using push_back to add a new object to the vector a solution to this?
A vector has capacity and it has size. The capacity is the number of elements for which memory has been allocated. Size is the number of elements which are actually in the vector. A vector is empty when its size is 0. So, size() returns 0 and empty() returns true. That says nothing about the capacity of the vector at that point (that would depend on things like the number of insertions and erasures that have been done to the vector since it was created). capacity() will tell you the current capacity - that is the number of elements that the vector can hold before it will have to reallocate its internal storage in order to hold more.
So, when you construct a vector, it has a certain size and a certain capacity. A default-constructed vector will have a size of zero and an implementation-defined capacity. You can insert elements into the vector freely without worrying about whether the vector is large enough - up to max_size(), max_size() being the maximum capacity/size that a vector can have on that system (typically large enough not to worry about). Each time that you insert an item into the vector, if it has sufficient capacity, no memory allocation takes place. However, if inserting that element would exceed the capacity of the vector, then the vector's memory is internally reallocated so that it has enough capacity to hold the new element as well as an implementation-defined number of additional elements (typically, the vector will double in capacity), and the element is inserted into the new storage. This happens without you having to worry about increasing the vector's capacity. And it happens in amortized constant time, so you don't generally need to worry about it being a performance problem.
If you do find that you're adding to a vector often enough that many reallocations occur, and it's a performance problem, then you can call reserve(), which will set the capacity to at least the given value. Typically, you'd do this when you have a very good idea of how many elements your vector is likely to hold. However, unless you know that it's going to be a performance issue, it's probably a bad idea. It's just going to complicate your code. And amortized constant time will generally be good enough to avoid performance issues.
You can also construct a vector with a given number of default-constructed elements as you mentioned, but unless you really want those elements, then that would be a bad idea. vector is supposed to make it so that you don't have to worry about reallocating the container when you insert elements into it (like you would have to with an array), and default-constructing elements in it for the purposes of allocating memory is defeating that. If you really want to do that, use reserve(). But again, don't bother with reserve() unless you're certain that it's going to improve performance. And as was pointed out in another answer, if you're inserting elements into the vector based on user input, then odds are that the time cost of the I/O will far exceed the time cost in reallocating memory for the vector on those relatively rare occasions when it runs out of capacity.
Capacity-related functions:
capacity() // Returns the number of elements that the vector can hold
reserve() // Sets the minimum capacity of the vector.
Size-related functions:
clear() // Removes all elements from the vector.
empty() // Returns true if the vector has no elements.
resize() // Changes the size of the vector.
size() // Returns the number of items in the vector.
Yes, reserve(n) will allocate space without actually putting elements there - increasing capacity() without increasing size().
BTW, if "adding to them is controlled by user-input" means that the user hits "insert X" and you insert X into the vector, you need not worry about the overhead of resizing. Waiting for user input is many times slower than the amortized constant resizing performance.
Your question is a little confusing, so let me try to answer what I think you asked.
Let's say you have a vector<B> which you default-construct. You then call vec.reserve(100). Now, vec contains 0 elements. It's empty. vec.empty() returns true and vec.size() returns 0. Every time you call push_back, you will insert one element, and until vec contains 100 elements, there will be no reallocation.
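In code, that looks roughly like this (B here is just a placeholder for your class):

#include <cassert>
#include <vector>

struct B {};  // placeholder for the question's class B

int main() {
    std::vector<B> vec;       // default-constructed: size 0
    vec.reserve(100);         // allocates storage, constructs no elements

    assert(vec.empty());
    assert(vec.capacity() >= 100);

    for (int i = 0; i < 100; ++i)
        vec.push_back(B{});   // constructs one element; no reallocation
                              // until the 100 reserved slots are used up

    assert(vec.size() == 100);
}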