What does 'compacting memory' mean when removing items from the front of a std::vector? - c++

Remove first N elements from a std::vector
This question talks about removing items from a vector and 'compacting memory'. What is 'compacting memory' and why is it important here?

Inside the implementation of the std::vector class is some code that dynamically allocates an array of data-elements. Often not all of the elements in this internal array will be in use -- the array is often allocated to be bigger than what is currently needed, in order to avoid having to reallocate a bigger array too often (array-reallocations are expensive!).
Similarly, when items are removed from the std::vector, the internal data-array is not immediately reallocated to be smaller (because doing so would be expensive); rather, the now-extra slots in the array are left "empty" in the expectation that the calling code might want to re-use them in the near future.
However, those empty slots still take up RAM, so if the calling code has just removed a lot of items from the vector, it might want to force the vector to reallocate a smaller internal array that doesn't contain so many empty slots. That is what they are referring to as compacting in that question.
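A minimal sketch of the effect (the exact capacity values are implementation-dependent):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(1000, 42);
    v.erase(v.begin(), v.begin() + 900);   // remove the first 900 elements
    std::cout << v.size() << ' ' << v.capacity() << '\n';   // size is now 100, capacity is still at least 1000
    v.shrink_to_fit();                     // non-binding request to drop the unused slots
    std::cout << v.size() << ' ' << v.capacity() << '\n';   // typically 100 100, but not guaranteed
}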

The OP is talking about shrinking the memory the vector takes up. When you erase elements from a vector, its size decreases but its capacity (the memory it is using) remains the same. When the OP says
(that also compacts memory)
They want the removal of the elements to also shrink the capacity of the vector so it reduces its memory consumption.

It means that the vector shouldn't use more memory than it needs to. In other words, the OP wants:
size() == capacity()
This can be achieved in C++11 and later by calling shrink_to_fit() on a vector. It is only a request, though; it is not binding.
To make sure the memory is actually compacted, you can create a new vector, reserve exactly oldvector.size(), copy the elements over, and swap it with the old one.
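A sketch of that approach (reserving exactly oldvector.size() keeps the new allocation as tight as the allocator allows, though reserve itself may still round up):

#include <vector>

void compact(std::vector<int>& v) {
    std::vector<int> tmp;
    tmp.reserve(v.size());            // allocate just enough for the live elements
    tmp.assign(v.begin(), v.end());   // copy them into the tight buffer
    v.swap(tmp);                      // v now owns the tight allocation; tmp (now holding the old buffer) is destroyed when it goes out of scope
}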

Related

Difference between std::resize(n) and std::shrink_to_fit in C++?

I came across these statements:
resize(n) – Resizes the container so that it contains ‘n’ elements.
shrink_to_fit() – Reduces the capacity of the container to fit its size and destroys all elements beyond the capacity.
Is there any significant difference between these functions? They both relate to vectors in C++.
Vectors have two "length" attributes that mean different things:
size is the number of usable elements in the vector. It is the number of things you have stored. This is a conceptual length.
capacity is how many elements would fit into the amount of memory the vector has currently allocated.
capacity >= size must always be true, but there is no reason for them to be always equal. For example, when you remove an element, shrinking the allocation would require creating a new allocation one bucket smaller and moving the remaining contents over ("allocate, move, free").
Similarly, if capacity == size and you add an element, the vector could grow the allocation by one element (another "allocate, move, free" operation), but usually you're going to add more than one element. If the capacity needs to increase, the vector will increase its capacity by more than one element so that you can add several more elements before needing to move everything again.
With this knowledge, we can answer your question:
std::vector<T>::resize() changes the size of the vector. If you resize it smaller than its current size, the excess objects are destroyed. If you resize it larger than its current size, the "new" objects added at the end are default-inserted (value-initialized, with the default allocator).
std::vector<T>::shrink_to_fit() asks for the capacity to be changed to match the current size. (Implementations may or may not honor this request. They might decrease the capacity but not make it equal to the size. They might not do anything at all.) If the request is fulfilled, this will discard some or all of the unused portion of the vector's allocation. You'd typically use this when you are done building a vector and will never add another item to it. (If you know in advance how many items you will be adding, it would be better to use std::vector<T>::reserve() to tell the vector before adding any items instead of relying on shrink_to_fit doing anything.)
So you use resize() to change how much stuff is conceptually in the vector.
You use shrink_to_fit() to minimize the excess space the vector has allocated internally without changing how much stuff is conceptually in the vector.
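For instance (the capacities shown after shrink_to_fit are only typical, since the request is non-binding):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(100);   // size 100, capacity at least 100
    v.resize(10);              // size 10: the other 90 ints are destroyed, capacity is unchanged
    std::cout << v.size() << ' ' << v.capacity() << '\n';   // e.g. 10 100
    v.shrink_to_fit();         // ask for capacity == size
    std::cout << v.size() << ' ' << v.capacity() << '\n';   // typically 10 10
    v.resize(20);              // size 20: the 10 new ints are value-initialized to 0
    std::cout << v.size() << ' ' << v.capacity() << '\n';   // e.g. 20 20 or more
}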
shrink_to_fit() – Reduces the capacity of the container to fit its size and destroys all elements beyond the capacity.
That is a mischaracterization of what happens. Specifically, the "destroys all elements beyond the capacity" part is not accurate.
In C++, when dynamically allocated memory is used for objects, there are two steps:
Memory is allocated for objects.
Objects are initialized/constructed at the memory locations.
When objects in dynamically allocated memory are deleted, there are also two steps, which mirror the steps of construction but in inverse order:
Objects at the memory locations are destructed (for built-in types, this is a no-op).
Memory used by the objects is deallocated.
The memory allocated beyond the size of the container is just a buffer. Those locations don't hold any properly constructed objects; it's just raw memory. shrink_to_fit() releases that additional memory, but there were never any objects in those locations. Hence, nothing is destroyed; only memory is deallocated.
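The two steps and their inverses can be spelled out by hand; here is a rough sketch using std::allocator and placement new (std::destroy_at requires C++17):

#include <memory>
#include <new>
#include <string>

int main() {
    std::allocator<std::string> alloc;
    std::string* p = alloc.allocate(4);               // 1. raw memory for 4 strings; no objects exist yet
    std::string* s = new (p) std::string("hello");    // 2. construct one object in that memory
    std::destroy_at(s);                               // inverse of 2: destroy the object
    alloc.deallocate(p, 4);                           // inverse of 1: hand the raw memory back
}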
According to the C++ Standard, regarding shrink_to_fit:
Effects: shrink_to_fit is a non-binding request to reduce capacity()
to size().
and regarding resize:
Effects: If sz < size(), erases the last size() - sz elements from the
sequence. Otherwise, appends sz - size() default-inserted elements to
the sequence.
It is evident that the functions do different things. Moreover, the first function takes no arguments, while the second takes one (or two, counting the optional fill value). The function shrink_to_fit does not change the size of the container, though it can reallocate memory.

How do STL containers keep track of the current size of the container versus the total size?

Given
vector<int> a;
If a.push_back() is done, how does the vector know whether to increase the size by reallocating memory or whether there is space available (because a vector allocates some extra space when its size is full, to reduce overhead)?
P.S. Does the same technique apply to other types of containers like stack, queue, etc.?
I think that it does the same thing as "struct" in C.
The method capacity() returns the number of items that can be stored in the vector without a reallocation.
The method size() returns the number of items which are currently stored in the vector.
Prior to inserting another item, it stands to reason that if size() == capacity() then more capacity will need to be made available, and that will involve a reallocation.
Does the same technique apply to other types of containers like stack, queue, etc.?
stack and queue are built on top of other std containers. These underlying containers (normally vector or deque) employ a similar technique.
I think that it does the same thing as "struct" in C.
No.
In general, a vector (and, incidentally, a List in C#) will allocate a block of memory. As you add elements to it, it will mark more and more of that memory as consumed. Then, when the block is full, it will allocate a new, larger block, copy the contents into the new larger block, and delete the old one. Again the new larger block has more free space, and then, again, it can be filled up. The idea is that a vector always occupies contiguous space so it can be used in applications where one would consider an array. Because it has contiguous space, machine instructions for accessing a single element are trivial and so random access is very fast. List in C# has similar semantics. The implementation-dependent part has a lot to do with how much bigger that new block is. Sometimes implementations make it a percentage bigger; sometimes they just double the size.
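You can watch this happen by printing capacity() whenever it changes; the growth factor you observe is implementation-specific:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {   // a reallocation just happened
            last = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << last << '\n';
        }
    }
}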

Deque - how come "reserve" doesn't exist?

The standard STL vector container has a "reserve" function to reserve uninitialized memory that can be used later to prevent reallocations.
How come the deque container doesn't have one?
Increasing the size of a std::vector can be costly. When a vector outgrows its reserved space, the entire contents of the vector must be copied (or moved) to a larger allocation.
It is specifically because std::vector resizing can be costly that vector::reserve() exists. reserve() can prepare a std::vector to anticipate reaching a certain size without exceeding its capacity.
Conversely, a deque can always add more memory without needing to relocate the existing elements. If a std::deque could reserve() memory, there would be little to no noticeable benefit.
For vector and string, reserved space prevents later insertions at the end (up to the capacity) from invalidating iterators and references to earlier elements, by ensuring that elements don't need to be copied/moved. This relocation may also be costly.
With deque and list, earlier references are never invalidated by insertions at the end, and elements aren't moved, so the need to reserve capacity does not arise.
You might think that with vector and string, reserving space also guarantees that later insertions will not throw an exception (unless a constructor throws), since there's no need to allocate memory. You might think that the same guarantee would be useful for other sequences, and hence deque::reserve would have a possible use. There is in fact no such guarantee for vector and string, although in most (all?) implementations it's true. So this is not the intended purpose of reserve.
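A small illustration of the difference in reference stability (the vector part assumes push_back actually triggers a reallocation, which shrink_to_fit makes likely but not certain):

#include <cassert>
#include <deque>
#include <vector>

int main() {
    std::deque<int> d{1, 2, 3};
    int& dfront = d.front();
    d.push_back(4);            // never invalidates references to existing deque elements
    assert(dfront == 1);       // still valid

    std::vector<int> v{1, 2, 3};
    v.shrink_to_fit();         // try to make capacity == size so the next push_back reallocates
    int* vfront = &v.front();
    v.push_back(4);            // may reallocate; if it does, vfront is now dangling
    (void)vfront;              // dereferencing it here would be undefined behaviour after a reallocation
}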
Quoting from C++ Reference
As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays.
The storage of a deque is automatically expanded and contracted as needed. Expansion of a deque is cheaper than the expansion of a std::vector because it does not involve copying of the existing elements to a new memory location.
Deque can allocate new memory anywhere it wants and just point to it, unlike vectors which require a continuous block of memory to hold all their elements.
Only vector has one. There is no need for a reserve function on deque, since its elements are not stored contiguously and there is no need to reallocate and move elements when adding or removing elements.
reserve implies allocation of large blocks of contiguous data (like a vector). There is nothing in the deque that implies contiguous storage - it's generally implemented more like a list (which, you will notice, also doesn't have a 'reserve' function).
Thus, a 'reserve' function would make no sense.
There are two main types of memory layout: containers that allocate a single contiguous chunk, like arrays and vectors, and distributed layouts whose members grab any empty location to fill in. Queues and linked-list structures belong to the second type, and they have some practical advantages, such as that deleting a particular element does not cause a mass memory movement, as opposed to arrays and vectors. Therefore they do not need to reserve any space beforehand; if they need more, they just take it by linking it to the tip.
If you aim for having aligned memory containers you could think about implementing something like this:
#include <array>
#include <cstddef>
#include <deque>
#include <vector>
std::deque<std::vector<float>> dv;        // deque with dynamic-size, memory-aligned vectors (float chosen only for illustration)
using Mem = std::array<std::size_t, 64>;  // fixed-size block (64 is an arbitrary example size); store raw bytes plus a header to loop through and cast using the header information and typeid...
std::deque<Mem> dvf;                      // deque with fixed-size, memory-aligned blocks
// Templates and polymorphism can help when storing raw bytes, e.g. checking the type a pointer points to and creating an interface to access the partially aligned memory.
Alternatively you can use a map to access the vectors instead of a deque...

What happens under the hood of vector::push_back memory wise?

My question is regarding the effect of vector::push_back. I know it adds an element at the end of the vector, but what happens under the hood?
IIRC memory objects are allocated in a sequential manner, so my question is whether vector::push_back simply allocates more memory immediately after the vector, and if so what happens if there is not enough free memory in that location? Or perhaps a pointer is added in the "end" to cause the vector to "hop" to the location it continues? Or is it simply reallocated through copying it to another location that has enough space and the old copy gets discarded? Or maybe something else?
If there is enough space already allocated, the object is copy-constructed from the argument in place. When there is not enough memory, the vector will grow its internal data buffer following some kind of geometric progression (each time the new size will be k*old_size with k > 1 [1]) and all objects present in the original buffer will then be moved to the new buffer. After the operation completes, the old buffer will be released to the system.
In the previous sentence, move is not used in the technical move-constructor/move-assignment sense; the elements could be moved or copied or any equivalent operation.
[1] Growing by a factor k > 1 ensures that the amortized cost of push_back is constant. The actual constant varies from one implementation to another (Dinkumware uses 1.5, gcc uses 2). Amortized constant cost means that even if every so often one push_back will be highly expensive (O(N) in the size of the vector at the time), those cases happen rarely enough that the cost of all operations over the whole set of insertions is linear in the number of insertions, and thus each insertion averages a constant cost.
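As a concrete example, suppose k = 2 and the capacity starts at 1: pushing back N = 1024 elements triggers reallocations when the vector is full at sizes 1, 2, 4, ..., 512, and the total number of elements moved across all of those reallocations is 1 + 2 + 4 + ... + 512 = 1023 < N, so the moving work averages out to roughly one extra move per insertion.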
When a vector is out of space, it will use its allocator to reserve more space.
It is up to the allocator to decide how this is implemented.
However, the vector decides how much space to reserve: the standard requires push_back to have amortized constant cost, which in practice means the capacity grows geometrically, typically by a factor of around 1.5 or 2 [1], thus preventing horrible performance due to repeated 'small' allocations.
On the physical move/copy of elements:
C++11-conforming implementations will move elements if the element type supports move assignment and construction
most implementations I know of (g++ notably) will just use std::copy for POD types; the algorithm specialisation for POD types ensures that this compiles into (essentially) a memcpy operation. This in turn gets compiled into whatever CPU instructions are fastest on your system (e.g. SSE2 instructions)
[1] I tried finding the reference quote for that in the n3242 standard draft document, but I was unable to find it at this time.
A vector guarantees that all elements are contiguous in memory.
Internally you can think of it as defined as three pointers (or what act like pointers):
start: Points at the beginning of the allocated block.
final: Points one past the last element in the vector.
If the vector is empty then start == final
capacity: Points one past the end of allocated memory.
If final == capacity there is no room left.
When you push_back:
If final is smaller than capacity:
the new element is copied into the location pointed at by final
final is incremented to the next location.
If final is the same as capacity then the vector is full and new memory must be allocated.
The vector (via its allocator) will then allocate X*(capacity - start)*sizeof(T) bytes,
where X is usually a value between 1.5 and 2.
It then copies all the values from the old memory buffer to the new memory buffer.
The new value is added to the buffer.
The start/final/capacity pointers are updated.
The old buffer is freed.
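A heavily simplified sketch of that logic, ignoring allocators, exception safety and move semantics (the names below mirror the description above, not any real implementation; 'finish' plays the role of 'final'):

#include <cstddef>
#include <new>

template <typename T>
struct TinyVec {
    T* start  = nullptr;   // beginning of the allocated block
    T* finish = nullptr;   // one past the last constructed element ("final" above)
    T* cap    = nullptr;   // one past the end of the allocated block ("capacity" above)

    void push_back(const T& value) {
        if (finish == cap) {                                      // full: grow geometrically
            std::size_t old_size = static_cast<std::size_t>(finish - start);
            std::size_t new_cap  = old_size ? 2 * old_size : 1;   // X = 2 in this sketch
            T* new_block = static_cast<T*>(::operator new(new_cap * sizeof(T)));
            for (std::size_t i = 0; i < old_size; ++i) {
                new (new_block + i) T(start[i]);                  // copy-construct into the new block
                start[i].~T();                                    // destroy the old copy
            }
            ::operator delete(start);                             // free the old block
            start  = new_block;
            finish = new_block + old_size;
            cap    = new_block + new_cap;
        }
        new (finish) T(value);                                    // construct the new element in place
        ++finish;
    }

    ~TinyVec() {
        for (T* p = start; p != finish; ++p) p->~T();
        ::operator delete(start);
    }
};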
When vector runs out of space, it is reallocated and all the elements are copied over to the new array. The old array is then destroyed.
To avoid an excessive number of allocations and to keep the average push_back() time at O(1), a reallocation requires that the size be increased by at least a constant factor. (1.5 and 2 are common)
When you call vector::push_back, the end pointer is compared to the capacity pointer. If there is enough room for the new object, placement new is called to construct the object in the available space and the end pointer is incremented.
If there isn't enough room, the vector calls its allocator to allocate enough contiguous space for at least the existing elements plus the new element (different implementations may grow the allocated memory by different multipliers). Then all existing elements plus the new one are copied to the newly allocated space.
std::vector over-allocates - it will usually allocate more memory than necessary, automatically. size is not affected by this, but you can observe the extra space through capacity.
std::vector will copy everything if the additional capacity is not sufficient.
The memory allocated by std::vector is raw; no constructors are called up front. Objects are constructed on demand, using placement new.
So, push_back does:
if capacity is not sufficient for the new element, it will
allocate a new block
copy all existing elements (usually using the copy constructor)
increase size by one
copy the new element to the new location
If you have some idea of the final size of your array, try to vector::reserve the memory first. Note that reserve is different from vector::resize: with reserve, the vector::size() of your array is not changed.
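For example (the exact capacity after reserve may be larger than requested, but on common implementations it is exactly what you ask for):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(100);        // capacity is at least 100, size is still 0; no elements are constructed
    std::cout << v.size() << ' ' << v.capacity() << '\n';   // e.g. 0 100
    for (int i = 0; i < 100; ++i)
        v.push_back(i);    // fills the reserved space; no reallocation takes place
    v.resize(10);          // size drops to 10, the other 90 elements are destroyed, capacity stays
    std::cout << v.size() << ' ' << v.capacity() << '\n';   // e.g. 10 100
}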

shrinking a vector

I've got a problem with my terrain engine (using DirectX).
I'm using a vector to hold the vertices of a detail block.
When the block increases in detail, so does the vector.
BUT, when the block decreases its detail, the vector doesn't shrink in size.
So, my question: is there a way to shrink the size of a vector?
I did try this:
vertexvector.reserve(16);
If you pop elements from a vector, it does not free memory (because that would invalidate iterators into the container's elements). You can copy the vector to a new vector, and then swap that with the original. That will then make it not waste space. The swap has constant time complexity, because a swap must not invalidate iterators to elements of the vectors being swapped: so it just has to exchange the internal buffer pointers.
vector<vertex>(a).swap(a);
It is known as the "shrink-to-fit" idiom. Incidentally, C++11 added a shrink_to_fit() member function to std::vector.
The usual trick is to swap with an empty vector:
vector<vertex>(vertexvector.begin(), vertexvector.end()).swap(vertexvector);
The reserved memory is not reduced when the vector size is reduced because it is generally better for performance. Shrinking the amount of memory reserved by the vector is as expensive as increasing the size of the vector beyond the reserved size, in that it requires:
Ask the allocator for a new, smaller memory location,
Copy the contents from the old location, and
Tell the allocator to free the old memory location.
In some cases, the allocator can resize an allocation in-place, but it's by no means guaranteed.
If you have had a very large change in the size required, and you know that you won't want that vector to expand again (the principle of locality suggests you will, but of course there are exceptions), then you can use litb's suggested swap operation to explicitly shrink your vector:
vector<vertex>(a).swap(a);
There is a member function for this, shrink_to_fit. It's more efficient than most other methods, since it will only allocate new memory and copy if there is a need. The details are discussed here:
Is shrink_to_fit the proper way of reducing the capacity a `std::vector` to its size?
If you don't mind the libc allocation functions, realloc is even more efficient: it won't copy the data on a shrink, just mark the extra memory as free, and if you grow the memory and there is free memory after it, it will mark the needed memory as used and not copy either. Be careful though: you are moving out of C++ STL templates into C void pointers, and you need to understand how pointers and memory management work; it's frowned upon by many nowadays as a source of bugs and memory leaks.