Visual C++ vector erase increases memory usage? - c++

It's my very first question here on Stack Overflow. I have searched at length for an explanation of what I'm experiencing with the following lines of code:
unsigned long long _mem1 = getUsedVirtualMemory();
vector.erase(vector.begin() + _idx);
contained = false; // don't stop the loop
_idx--; // an element was removed, so adjust the index to consider
_mem1 = getUsedVirtualMemory() - _mem1;
if (_mem1 > 0) printf("Memory - 2 mem1: %llu\n", _mem1);
I have huge memory consumption in my program, and after an intensive debug session, some printfs and time-consuming analysis, I arrived at this point:
getUsedVirtualMemory is implemented with the following code:
PROCESS_MEMORY_COUNTERS_EX pmc;
GetProcessMemoryInfo(GetCurrentProcess(), (PROCESS_MEMORY_COUNTERS*) &pmc, sizeof(pmc));
SIZE_T virtualMemUsedByMe = pmc.PrivateUsage;
return virtualMemUsedByMe;
to obtain the amount of virtual memory allocated by the process; the vector is a vector of objects (not pointers).
In most cases the vector's erase method seems to work as expected, but in some cases it looks like the erase method of that vector increases the memory used by the process instead of freeing it. I'm using the Windows system function GetProcessMemoryInfo in a lot of situations around the code to debug this problem and it seems to return an actual value for used virtual memory.
I'm using Visual Studio C++ 2010 Professional.
Let me know if more information is needed.
Thanks for any replies.
UPDATE:
Everything you wrote in your replies is correct, but I forgot to mention the following details:
I already know that a vector has a size (actual number of elements) and a capacity (allocated slots to store elements)
I already know that the erase method does not free memory (I looked for a lot of documentation about that method)
finally, I will add other elements to that vector later, so I don't need to shrink that vector.
The actual problem is that in this case the value of "_mem1" in the last line of code shows a difference of 1,600,000 bytes: an unjustified increase in memory, where I expected 0 bytes.
Also, if the amount of used memory after the operation were less than the initial value, I would expect a very large number because of unsigned wrap-around, as explained for instance at Is unsigned integer subtraction defined behavior?
Instead, I get a value greater than 0 but relatively small.
To give a sense of the scale of the problem, iterating over that piece of code some thousands of times unexpectedly allocates about 20 GB of virtual memory.

A vector has:
a size(), which indicates how many active elements are in the container
a capacity(), which tells how many elements are reserved in memory
erase() reduces the size (by the number of elements erased). It does not free the allocated memory capacity.
You can use shrink_to_fit() which makes sure that the capacity is reduced to the size.
Changing the size with resize() or the capacity with reserve() may increase the memory allocated if necessary, but it does not necessarily free memory if the new size/capacity is lower than the existing capacity.
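To see this concretely, here is a minimal sketch (unrelated to the asker's actual types) that prints size() and capacity() around an erase() and a shrink_to_fit(); on typical implementations the capacity is unchanged by the erase and only drops after the shrink request:
#include <cstdio>
#include <vector>
int main() {
    std::vector<int> v(1000, 42);
    std::printf("initial:      size=%zu capacity=%zu\n", v.size(), v.capacity()); // 1000 / >= 1000
    v.erase(v.begin(), v.begin() + 900);
    std::printf("after erase:  size=%zu capacity=%zu\n", v.size(), v.capacity()); // 100 / unchanged
    v.shrink_to_fit(); // non-binding request to drop the unused capacity
    std::printf("after shrink: size=%zu capacity=%zu\n", v.size(), v.capacity()); // 100 / typically 100
}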

It's because erase will not free memory, it just erases elements. Take a look at Herb Sutter's article on the subject.
To (really) release the memory you could do (from the referenced link):
The Right Way To "Shrink-To-Fit" a vector or deque
So, can we write code that does shrink a vector "to fit" so that its capacity is just enough to hold the contained elements? Obviously reserve() can't do the job, but fortunately there is indeed a way:
vector<Customer>( c ).swap( c );
// ...now c.capacity() == c.size(), or
// perhaps a little more than c.size()

vector::erase() is only guaranteed to remove elements from the vector; it is not guaranteed to reduce the size of the underlying array (as that process is rather expensive). I.e. it only destroys the removed data, it doesn't necessarily deallocate the storage.
If you need a vector that is only as large as the elements it contains, use shrink_to_fit() (C++11) or the swap trick shown above; note that vector.resize(vector.size()) is a no-op and will not release capacity.

IIRC, in a Debug build on Windows, new is actually #defined to be DEBUG_NEW, which causes (amongst other things) memory blocks not to be actually freed, but merely marked as 'deleted'.
Do you get the same behaviour with a release build?

One part of the puzzle might be that std::vector cannot delete entries from the underlying memory buffer if they are not at the end of the buffer (which yours aren't), so the kept entries are moved - potentially to an altogether different buffer. Since you're erasing the first element, std::vector is allowed (since the standard states that erase() invalidates all iterators at/after the point of erasure, all of them in your case) to allocate an additional buffer to copy the remaining elements to, and then discard the old buffer after copying. So you may end up with two buffers being in use at the same time, and your memory manager will likely not return the discarded buffer to the operating system, but rather keep the memory around to re-use it in a subsequent allocation. This would explain the memory usage increase for a single one of your loop iterations.
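If you want to check whether that is what your implementation actually does, one small diagnostic sketch (Obj is a hypothetical stand-in for the asker's element type) is to record data() and capacity() around the erase; if data() changes, the elements really were moved into a different buffer:
#include <cstdio>
#include <vector>
struct Obj { char payload[64]; }; // hypothetical stand-in for the real element type
int main() {
    std::vector<Obj> v(10000);
    const Obj*  before_ptr = v.data();
    std::size_t before_cap = v.capacity();
    v.erase(v.begin()); // erase one element, as in the question
    std::printf("buffer moved: %s, capacity: %zu -> %zu\n",
                v.data() == before_ptr ? "no" : "yes", before_cap, v.capacity());
}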


std::vector increasing peak memory

This is a continuation of my last question. I am failing to understand the memory taken up by a vector. Problem skeleton:
Consider a vector which is a collection of lists, and each list is a collection of pointers. Exactly like:
std::vector<std::list<ABC*> > vec;
where ABC is my class. We work on 64-bit machines, so the size of a pointer is 8 bytes.
At the start of my flow in the project, I resize this vector to a number so that I can store lists at the respective indexes.
vec.resize(613284686);
At this point, capacity and size of the vector would be 613284686. Right? After resizing, I am inserting the lists at the corresponding indexes:
// Some where down in the program, make these lists. Simple push for now.
std::list<ABC*> l1;
l1.push_back(<pointer_to_class_ABC>);
l1.push_back(<pointer_to_class_ABC>);
// Copy the list at location
setInfo(613284686, l1);
void setInfo(uint64_t index, std::list<ABC*> list) {
    std::copy(list.begin(), list.end(), std::back_inserter(vec.at(index)));
}
Alright. So inserting is done. Notable things are:
Size of the vector is: 613284686
Entries in the vector: 3638243731 // Calculated by going over the vector indexes and adding up the size of the std::list at each index.
Now, since there are 3638243731 pointer entries, I would expect the memory taken by this vector to be ~30 GB: 3638243731 * 8 (bytes) = ~30 GB.
BUT when I have this data in memory, the memory peaks at 400 GB.
And then I clear up this vector with:
std::vector<std::list<nl_net> >& ccInfo = getVec(); // getVec defined somewhere and return me original vec.
std::vector<std::list<nl_net> >::iterator it = ccInfo.begin();
for(; it != ccInfo.end(); ++it) {
(*it).clear();
}
ccInfo.clear(); // Since it is an reference
std::vector<std::list<nl_net> >().swap(ccInfo); // This makes the capacity of the vector 0.
Well, after clearing up this vector, memory drops down to 100 GB. That is far too much for a vector to hold on to.
Would you all help correct what I am failing to understand here?
P.S. I can not reproduce it on smaller cases and it is coming in my project.
vec.resize(613284686);
At this point, capacity and size of the vector would be 613284686
It would be at least 613284686. It could be more.
std::vector<std::list<nl_net> >().swap(ccInfo); // This makes the capacity of the vector 0.
Technically, there is no guarantee by the standard that a default constructed vector wouldn't have capacity other than 0... But in practice, this is probably true.
Now, since there are 3638243731 pointer entries, I would expect the memory taken by this vector to be ~30 GB: 3638243731 * 8 (bytes)
But the vector doesn't contain pointers. It contains std::list<ABC*> objects. So, you should expect vec.capacity() * sizeof(std::list<ABC*>) bytes used by the buffer of the vector itself. Each list has at least a pointer to the beginning and to the end.
Furthermore, you should expect each element in each of the lists to use memory as well. Since the list is doubly linked, you should expect about two pointers plus the data (a third pointer) worth of memory for each element.
Also, each pointer in the lists apparently points to an ABC object, and each of those use sizeof(ABC) memory as well.
Furthermore, since each element of the linked lists are allocated separately, and each dynamic allocation requires book-keeping so that they can be individually de-allocated, and each allocation must be aligned to the maximum native alignment, and the free store may have fragmented during the execution, there will be much overhead associated with each dynamic allocation.
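To get a feel for those figures on your own platform, a rough sketch like the one below (ABC here is only a hypothetical stand-in for the real class) prints the fixed cost of the vector's own buffer and a lower bound on the per-node cost of the lists; the real numbers are higher once allocator book-keeping and alignment padding are added:
#include <cstddef>
#include <cstdio>
#include <list>
struct ABC { int dummy; }; // hypothetical stand-in for the real class
int main() {
    // Fixed cost of the vector's buffer: one std::list object per slot.
    std::printf("sizeof(std::list<ABC*>) = %zu bytes\n", sizeof(std::list<ABC*>));
    std::printf("vector buffer for 613284686 slots ~ %.1f GB\n",
                613284686.0 * sizeof(std::list<ABC*>) / (1024.0 * 1024.0 * 1024.0));
    // Bare per-node cost: two link pointers plus the stored ABC* pointer.
    std::size_t per_node = 2 * sizeof(void*) + sizeof(ABC*);
    std::printf("bare list node >= %zu bytes\n", per_node);
    std::printf("3638243731 nodes >= %.1f GB (before allocator overhead)\n",
                3638243731.0 * per_node / (1024.0 * 1024.0 * 1024.0));
}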
Well, after clearing up this vector, memory drops down to 100 GB.
It is quite typical for the language implementation to retain (some) memory it has allocated from the OS. If your target system documents an implementation specific function for explicitly requesting release of such memory, then you could attempt using that.
However, if the vector buffer wasn't the latest dynamic allocation, then its deallocation may have left a massive reusable area in the free store, but if there exists later allocations, then all that memory might not be releasable back to the OS.
Even if the language implementation has released the memory to the OS, it is quite typical for the OS to keep the memory mapped for the process until another process actually needs the memory for something else. So, depending on how you're measuring memory use, the results might not necessarily be meaningful.
General rules of thumb that may be useful:
Don't use a vector unless you use all (or most) of the indices. In case where you don't, consider a sparse array instead (there is no standard container for such data structure though).
When using vector, reserve before resize if you know the upper bound of allocation.
Don't use linked lists without a good reason.
Don't rely on getting all memory back from peak usage (back to the OS that is; The memory is still usable for further dynamic allocations).
Don't stress about virtual memory usage.
std::list is a fragmented-memory container. Typically each node MUST hold the data it is storing, plus the 2 prev/next pointers, and then you have to add in the space required within the OS allocation table (typically 16 or 32 bytes per allocation, depending on the OS). You then have to account for the fact that all allocations must be returned on a 16-byte boundary (on Intel/AMD based 64-bit machines anyway).
So using the example of std::list<ABC*>, the size of a pointer is 8 bytes, however you will need at least 48 bytes to store each element.
So memory usage for ONLY the list entries is going to be around: 3638243731 * 48 (bytes) = ~162 GB.
This is of course assuming that there is no memory fragmentation (where there may be a block of 62 bytes free, and the allocator returns the entire block of 62 rather than the 48 requested). We are also assuming here that the OS has a minimum allocation size of 48 bytes (and not, say, 64 bytes, which would not be overly silly, but would push the usage up far higher).
The size of the std::lists themselves within the vector comes to around 18 GB. So in total we are looking at 180 GB at least to store that vector. It would not be beyond the realm of possibility that the extra allocations are additional OS book-keeping info for all of those individual memory allocations (e.g. lists of loaded memory pages, lists of swapped-out memory pages, read/write/mmap permissions, etc.).
As a final note, instead of using swap on a newly constructed vector, you can just use shrink_to_fit:
ccInfo.clear();
ccInfo.shrink_to_fit();
The main vector needs some more consideration. I get the impression it will always be a fixed size. So why not use a std::array instead? A std::vector always allocates more memory than it needs, to allow for growth. The bigger your vector, the bigger the reservation of memory to allow for further growth. The reasoning behind this is to keep relocations in memory to a minimum: relocations on really big vectors take up huge amounts of time, so a lot of extra memory is reserved to prevent them.
None of the vector functions that delete elements (such as vector::clear and vector::erase) also deallocate memory (i.e. lower the capacity). The size will decrease but the capacity doesn't. Again, this is meant to prevent relocations; if you delete, you are also very likely to add again. shrink_to_fit also doesn't guarantee that all of the excess memory is released.*
Next is the choice of a list to store elements. Is a list really applicable? Lists are strong in insertion/removal operations at arbitrary positions, not in random access. Are you really constantly adding and removing ABC objects to the list in random locations? Or is another container type with different properties, but with contiguous memory, more suitable? Another std::vector or std::array perhaps. If the answer is yes then you're pretty much stuck with a list and its scattered memory allocations. If no, then you could win back a lot of memory by using a different container type (see the sketch after the footnote below).
So, what is it you really want to do? Do you really need dynamic growth on both the main container and its elements? Do you really need random manipulation? Or can you use fixed-size arrays for both container and ABC objects and use iteration instead? When contemplating this you might want to read up on the available containers and their properties on en.cppreference.com. It will help you decide what is most appropriate.
*For the fun of it I dug around in VS2017's implementation and it creates an entirely new vector without the growth segment, copies the old elements and then reassigns the internal pointers of the old vector to the new one while deleting the old memory. So at least with that compiler you can count on memory being released.
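For completeness, a hedged sketch of the contiguous alternative discussed above (the names mirror the question, ABC is a stand-in, and it assumes you do not need cheap insertion/removal in the middle of each group) could look like this:
#include <cstddef>
#include <vector>
struct ABC { int dummy; }; // stand-in for the asker's class
// One contiguous block of pointers per slot: a single allocation per inner
// vector instead of one allocation per stored pointer, which removes the
// per-node overhead of the doubly linked list.
std::vector<std::vector<ABC*> > vec;
void setInfo(std::size_t index, const std::vector<ABC*>& items) {
    std::vector<ABC*>& slot = vec.at(index);
    slot.insert(slot.end(), items.begin(), items.end());
}
int main() {
    vec.resize(10);              // small size just for illustration
    std::vector<ABC*> group(2);  // two null ABC* pointers
    setInfo(3, group);
}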

What does 'compacting memory' mean when removing items from the front of a std::vector?

Remove first N elements from a std::vector
This question talks about removing items from a vector and 'compacting memory'. What is 'compacting memory' and why is it important here?
Inside the implementation of the std::vector class is some code that dynamically allocates an array of data-elements. Often not all of the elements in this internal array will be in use -- the array is often allocated to be bigger than what is currently needed, in order to avoid having to reallocate a bigger array too often (array-reallocations are expensive!).
Similarly, when items are removed from the std::vector, the internal data-array is not immediately reallocated to be smaller (because doing so would be expensive); rather, the now-extra slots in the array are left "empty" in the expectation that the calling code might want to re-use them in the near future.
However, those empty slots still take up RAM, so if the calling code has just removed a lot of items from the vector, it might want to force the vector to reallocate a smaller internal array that doesn't contain so many empty slots. That is what they are referring to as compacting in that question.
The OP is talking about shrinking the memory the vector takes up. When you erase elements from a vector, its size decreases but its capacity (the memory it is using) remains the same. When the OP says
(that also compacts memory)
They want the removal of the elements to also shrink the capacity of the vector so it reduces its memory consumption.
It means that the vector shouldn't use more memory than it needs to. In other words, the OP wants:
size() == capacity()
This can be achieved in C++11 and later by calling shrink_to_fit() on a vector. This is only a request though, it is not binding.
To make sure it will compact memory, you can construct a new vector from the old one, which allocates only what is needed, and swap it with the original (the shrink-to-fit swap idiom).
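Both options side by side, as a minimal sketch (shrink_to_fit is only a request; the construct-and-swap variant forces a right-sized reallocation on any standard library):
#include <cstdio>
#include <vector>
int main() {
    std::vector<int> v(1000000, 7);
    v.erase(v.begin(), v.begin() + 990000); // remove most of the elements
    std::printf("after erase:  size=%zu capacity=%zu\n", v.size(), v.capacity());
    v.shrink_to_fit();                      // C++11: non-binding request
    std::printf("after shrink: size=%zu capacity=%zu\n", v.size(), v.capacity());
    std::vector<int>(v).swap(v);            // any standard: right-sized copy swapped in
    std::printf("after swap:   size=%zu capacity=%zu\n", v.size(), v.capacity());
}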

Can you predict where in memory a vector might move when growing?

I'm learning about C++ and have a conceptual question. Let's say I have a vector. I know that my vector is stored in contiguous memory, but let's say my vector keeps growing and runs out of room to keep the memory contiguous. How can I predict where in memory the vector will go? I'm excluding the option of using functions that tell the vector where it should be in memory.
If it "runs out of room to keep the memory contiguous", then it simply won't grow. Attempting to add items past the currently allocated size will (typically) result in its throwing an exception (though technically, it's up to the allocator object to decide what to do--it's responsible for memory allocation, and responding when that's not possible.
Note, however, that this could result from running out of address space (especially on a 32-bit machine) rather than running out of actual memory. A typical virtual memory manager can reallocate physical pages (e.g., 4 KB or 8 KB chunks) and write data to the paging file if necessary to free physical memory if needed--but when/if there's not enough contiguous address space, there's not much that can be done.
The answer depends highly on your allocation strategy, but in general, the answer is no. Most allocators do not provide you with information where the next allocation will occur. If you were writing a custom allocator, then you could potentially make this information accessible, but doing so is not necessarily a good idea unless your use case specifically requires this knowledge.
The realloc function is the only C function which will attempt to grow your memory in place, and it makes no guarantees that it will do so.
Neither new nor malloc provide any information for where the "next" allocation will take place. You could potentially guess, if you knew the exact implementation details for your specific compiler, but this would be very unwise to rely on in a real program. Regarding specifically the std::allocator used for std::vector, it also does not provide details about where future allocations will take place.
Even if you could predict it in a particular situation, it would be extremely fragile: all it takes is for one function you call to change so that it makes another call to new or malloc [unless you are using a very specific allocation method, which is different from the "usual" method] to "break" where the next allocation is made.
If you KNOW that you need a certain size, you can use std::vector::resize() to set the size of the vector up front [or std::vector<int> vec(10000); to create a vector pre-sized to 10000 elements, for example]. That of course is not guaranteed to succeed, but it guarantees that you never need "enough space to hold roughly 3x the current content", which is what happens when you grow a std::vector using push_back. [And if you are REALLY unlucky, your vector ends up with capacity for 2*n elements while nearly half go unused, because the capacity was full at n, you added ONE more element, and the capacity doubled to 2*n even though you only actually required one more slot.]
The internal workings of STL containers are kept private for good reasons. You should never be accessing any container elements through any mechanism other than the appropriate iterators; and it is not possible to acquire one of those on an element that does not yet exist.
You could, however, supply a custom allocator and use that to deterministically place future allocations.
Can you predict where in memory a vector might move when growing?
As others like EJP, Jerry and Mats have said, you cannot determine the location of a "grown" vector until after it grows. There are some corner cases, like the allocator providing a block of memory that's larger than required, so that the vector does not actually move after a grow. But it's not something you should depend on.
In general, stacks grow down and heaps grow up. This is an artifact from the old memory days. Your code segment was sandwiched between them, which ensured that if either overflowed, your program would overwrite its own code segment and eventually cause an illegal instruction. So you might be able to guess that the new vector is going to be higher in memory than the old vector, because the vector is probably using heap memory. But it's not really useful information.
If you are devising a strategy for locating elements after a grow, then use an index and not an iterator. Iterators are invalidated after inserts and deletes (including the grow).
For example, suppose you are parsing the vector and you are looking for the data that follows -----BEGIN CERTIFICATE-----. Once you know the offset of the data (byte 27 in the vector), then you can always relocate it in constant time with v.begin() + 26. If you only have part of the certificate and later add the tail of the data and the -----END CERTIFICATE----- (and the vector grows), then the data is still located at v.begin() + 26.
No, in practical terms you can't predict where it will go if it has to move due to resizing. However, it isn't so random that you could use it as a random number generator ;)
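While you cannot predict the new location, you can easily observe it after the fact; this small sketch prints data() every time a push_back causes a reallocation:
#include <cstdio>
#include <vector>
int main() {
    std::vector<int> v;
    const int* last = v.data();
    for (int i = 0; i < 100; ++i) {
        v.push_back(i);
        if (v.data() != last) { // a reallocation moved the buffer on this push_back
            std::printf("moved to %p at size %zu (capacity %zu)\n",
                        static_cast<const void*>(v.data()), v.size(), v.capacity());
            last = v.data();
        }
    }
}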

Possible memory leak when using a vector of a map of strings C++

I have a pretty complex data object that has a map of strings
typedef std::map<std::string, unsigned int> Event;
typedef std::pair<double, Event> EventGroup;
std::vector<EventGroup> eventVector;
This is a program that's always running in the background listening to incoming messages. Every time a new EventGroup comes in, which can have any number of strings in the map, I add it to the vector.
// New data came in
eventVector.push_back(newEventGroup);
Every now and then I'll do an erase on this vector
//Flush some of the data because it's old
// it's been determined that the index to erase at is flushIndex
eventVector.erase(eventVector.begin(), eventVector.begin()+flushIndex);
Typically this tends to be the first 5% of the data.
What I've been noticing is that there seems to be a memory leak. The memory usage starts out around 50 MB... but ends up near 1 GB before it's too slow and crashes. I've heard that it's an expensive operation to do an erase, but could this be the cause of the memory leak? Am I missing some way of freeing up the memory used by the map?
Without knowing what your custom types do or look like (are THEY leaking memory?) it's hard to say. You should note however that erasing elements from a vector does not actually free any memory, it makes the area the vector has already allocated available for different elements added to THAT vector. The vector's reserved space remains the same in other words.
So, if you grow a vector to some million elements, erase 90% of them, and are expecting to get a bunch of memory back you'll be disappointed. The way you can free up memory reserved by a vector (which will NEVER give anything back until it's destroyed) is to do the swap idiom thing:
std::vector<EventGroup>(eventVector).swap(eventVector);
I don't recall the exact specifics of how the copy constructor works here. It should behave exactly the same as if you'd done this:
std::vector<EventGroup>(eventVector.begin(), eventVector.end()).swap(eventVector);
You still have no control over how much space this uses up, but if you've freed a lot of space up and it will remain freed for a long while...this should give some unknown amount of memory back to the system.
Keep in mind that this is an expensive operation (which is why std::vector doesn't just do it for you) so only do it when you need to.
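As a rough usage sketch (reusing the typedefs from the question, with a made-up flush index), you can log capacity() around the flush to confirm whether the swap actually handed the reserved space back:
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>
typedef std::map<std::string, unsigned int> Event;
typedef std::pair<double, Event> EventGroup;
int main() {
    std::vector<EventGroup> eventVector(100000);
    std::ptrdiff_t flushIndex = 95000; // hypothetical flush point
    eventVector.erase(eventVector.begin(), eventVector.begin() + flushIndex);
    std::printf("after erase: size=%zu capacity=%zu\n",
                eventVector.size(), eventVector.capacity());
    std::vector<EventGroup>(eventVector).swap(eventVector); // the swap idiom from above
    std::printf("after swap:  size=%zu capacity=%zu\n",
                eventVector.size(), eventVector.capacity());
}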

What happens under the hood of vector::push_back memory wise?

My question is regarding the effect of vector::push_back, I know it adds an element in the end of the vector but what happens underneath the hood?
IIRC memory objects are allocated in a sequential manner, so my question is whether vector::push_back simply allocates more memory immediately after the vector, and if so what happens if there is not enough free memory in that location? Or perhaps a pointer is added in the "end" to cause the vector to "hop" to the location it continues? Or is it simply reallocated through copying it to another location that has enough space and the old copy gets discarded? Or maybe something else?
If there is enough space already allocated, the object is copy-constructed from the argument in place. When there is not enough memory, the vector will grow its internal data buffer following some kind of geometric progression (each time the new size will be k*old_size with k > 1 [1]) and all objects present in the original buffer will then be moved to the new buffer. After the operation completes, the old buffer will be released to the system.
In the previous sentence, move is not used in the technical move-constructor/move-assignment sense; they could be moved or copied or any equivalent operation.
[1] Growing by a factor k > 1 ensures that the amortized cost of push_back is constant. The actual constant varies from one implementation to another (Dinkumware uses 1.5, gcc uses 2). The amortized cost means that even if every so often one push_back will be highly expensive (O(N) on the size of the vector at the time), those cases happen rarely enough that the cost of all operations over the whole set of insertions is linear in the number of insertions, and thus each insertion averages a constant cost.
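The geometric growth is easy to observe empirically; this minimal sketch prints the capacity and the growth factor at every reallocation (the exact numbers depend on your standard library):
#include <cstddef>
#include <cstdio>
#include <vector>
int main() {
    std::vector<int> v;
    std::size_t last_cap = v.capacity();
    for (int i = 0; i < 10000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_cap) { // a reallocation happened on this push_back
            std::printf("capacity %zu -> %zu (factor %.2f)\n", last_cap, v.capacity(),
                        last_cap ? double(v.capacity()) / last_cap : 0.0);
            last_cap = v.capacity();
        }
    }
}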
When the vector is out of space, it will use its allocator to reserve more space.
It is up to the allocator to decide how this is implemented.
However, the vector decides how much space to reserve: implementations grow the capacity geometrically, typically by a factor of around 1.5 or 2 [1], thus preventing horrible performance due to repeated 'small' allocations.
On the physical move/copy of elements:
C++11-conforming implementations will move elements if the element type supports move assignment and construction
most implementations I know of (g++ notably) will just use std::copy for POD types; the algorithm specialisation for POD types ensures that this compiles into (essentially) a memcpy operation. This in turn gets compiled in whatever CPU instruction is fastest on your system (e.g. SSE2 instructions)
[1] I tried finding the reference quote for that in the n3242 standard draft document, but I was unable to find it at this time
A vector guarantees that all elements are contiguous in memory.
Internally you can think of it as defined by three pointers (or members that act like pointers):
start: Points at the beginning of the allocated block.
final: Points one past the last element in the vector.
If the vector is empty then start == final
capacity: Points one past the end of allocated memory.
If final == capacity there is no room left.
When you push_back:
If final is smaller than capacity:
the new element is copied into the location pointed at by final
final is incremented to the next location.
If final is the same as capacity then the vector is full
and new memory must be allocated.
The vector will then allocate X*(capacity - start)*sizeof(T) bytes,
where X is usually a value between 1.5 and 2.
It then copies all the values from the old memory buffer to the new memory buffer.
The new value is added to the buffer.
Transfers the start/final/capacity pointers.
Frees up the old buffer.
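A heavily simplified toy version of that start/final/capacity bookkeeping might look like the sketch below; it is not any real library's implementation and ignores allocators, move semantics and exception safety, but it shows the same copy-then-swap-buffers dance on growth:
#include <cstddef>
#include <new>
// Toy vector illustrating the start/final/capacity pointers described above.
template <typename T>
class toy_vector {
    T* start_;    // beginning of the allocated block
    T* final_;    // one past the last constructed element
    T* capacity_; // one past the end of the allocated block
public:
    toy_vector() : start_(0), final_(0), capacity_(0) {}
    ~toy_vector() {
        for (T* p = start_; p != final_; ++p) p->~T();
        operator delete(start_);
    }
    void push_back(const T& value) {
        if (final_ == capacity_) {                          // full: grow geometrically
            std::size_t old_size = static_cast<std::size_t>(final_ - start_);
            std::size_t new_cap  = old_size ? 2 * old_size : 1;
            T* new_start = static_cast<T*>(operator new(new_cap * sizeof(T)));
            for (std::size_t i = 0; i < old_size; ++i) {    // copy over, then destroy old
                new (new_start + i) T(start_[i]);
                start_[i].~T();
            }
            operator delete(start_);                        // free the old buffer
            start_    = new_start;
            final_    = new_start + old_size;
            capacity_ = new_start + new_cap;
        }
        new (final_) T(value);                              // placement-new into the free slot
        ++final_;
    }
    std::size_t size() const     { return static_cast<std::size_t>(final_ - start_); }
    std::size_t capacity() const { return static_cast<std::size_t>(capacity_ - start_); }
private:
    toy_vector(const toy_vector&);            // copying omitted in this sketch
    toy_vector& operator=(const toy_vector&);
};
int main() {
    toy_vector<int> v;
    for (int i = 0; i < 5; ++i) v.push_back(i); // capacity grows 1, 2, 4, 8
}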
When vector runs out of space, it is reallocated and all the elements are copied over to the new array. The old array is then destroyed.
To avoid an excessive number of allocations and to keep the average push_back() time at O(1), a reallocation requires that the size be increased by at least a constant factor. (1.5 and 2 are common)
When you call vector::push_back, the end pointer is compared to the capacity pointer. If there is enough room for the new object, placement new is called to construct the object in the available space and the end pointer is incremented.
If there isn't enough room, the vector calls its allocator to allocate enough contiguous space for at least the existing elements plus the new element (different implementations may grow the allocated memory by different multipliers). Then all existing elements plus the new one are copied to the newly allocated space.
std::vector overallocates: it will usually allocate more memory than immediately necessary. size is not affected by this, but you can observe it through capacity.
std::vector will copy everything to a new buffer if the remaining capacity is not sufficient.
The memory allocated by std::vector is raw; constructors are called on demand, using placement new.
So, push_back does:
if capacity is not sufficient for the new element, it will
allocate a new block
copy all existing elements (usually using the copy constructor)
copy the new element into the new location
increase size by one
If you have some idea of what the final size of your array will be, try to vector::reserve the memory first. Note that reserve is different from vector::resize. With reserve, the vector::size() of your array is not changed.
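A tiny sketch makes the difference visible: reserve() raises only the capacity, while resize() actually creates value-initialized elements:
#include <cstdio>
#include <vector>
int main() {
    std::vector<int> a, b;
    a.reserve(100); // allocates storage, but the vector is still empty
    b.resize(100);  // allocates and value-initializes 100 elements
    std::printf("reserve: size=%zu capacity=%zu\n", a.size(), a.capacity()); // 0 / >= 100
    std::printf("resize:  size=%zu capacity=%zu\n", b.size(), b.capacity()); // 100 / >= 100
}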