How to release the resources of std::map and std::unordered_map manually, as with std::vector, in C++11 and higher - c++

Assume we have a vector std::vector<int> v, and assume that some resources are allocated to it. To my knowledge, v.clear() followed by v.shrink_to_fit() releases all resources allocated to v. I am wondering whether similar operations exist for std::map and std::unordered_map that release all resources manually. I can only find the member function clear() for these two templates. Can someone explain why there is no shrink_to_fit() for the latter two?

There is no shrink_to_fit() in std::map, because it would be useless.
In std::vector, to ensure the amortized constant insertion time required by the standard, the implementation can allocate more memory than is currently necessary, as spare room for future storage (so that it doesn't have to reallocate everything on each push_back()). Many implementations double the capacity when the current capacity() would be exceeded.
shrink_to_fit() asks the implementation to release that extra memory so that size() == capacity() (but it's not guaranteed that this will actually happen).
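A minimal sketch of how that usually plays out (the exact capacities printed are implementation-defined):

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    for (int i = 0; i < 1000; ++i) v.push_back(i);
    std::printf("size=%zu capacity=%zu\n", v.size(), v.capacity());
    v.clear();          // size() becomes 0; capacity() typically stays put
    v.shrink_to_fit();  // non-binding request to release the spare storage
    std::printf("size=%zu capacity=%zu\n", v.size(), v.capacity());
}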
Now, std::map is usually implemented as a red-black tree. Adding an element to such a structure just creates a new node and does a bit of pointer magic. It involves no reallocation of other nodes, and you cannot speed it up by pre-allocating memory. shrink_to_fit() doesn't make sense, because there is nothing to shrink.
Update after dewaffled's comment: for std::unordered_map there is a rehash() method which may decrease the size of the hash table by rebuilding it, but, similarly to shrink_to_fit(), the result is not guaranteed.
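If the actual goal is to release everything a map or unordered_map holds, the usual idiom is to swap with (or assign from) an empty container; here is a sketch of that technique, with no guarantee that the allocator hands the memory back to the operating system:

#include <map>
#include <unordered_map>

int main() {
    std::map<int, int> m{{1, 2}, {3, 4}};
    std::unordered_map<int, int> um{{1, 2}, {3, 4}};

    std::map<int, int>{}.swap(m);  // all of m's nodes are destroyed and freed

    um.clear();    // destroys the elements (nodes)
    um.rehash(0);  // non-binding request to shrink the bucket table too
}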

I'm going to assume you're talking about dynamic maps; shrink_to_fit() wouldn't make sense because the map is only as big as its linked elements. My understanding is that there aren't 'empty' nodes in a map the way you can have empty slots in a vector.

Related

Using vector to minimize heap allocations causes seg faults

Within a function, I have created a vector with a generous amount of space, to which I push a runtime-determined number of objects (Edge). Other objects, however, keep pointers to the Edges within the vector. Occasionally the entire program segfaults because a pointer becomes invalid, and I suspect that this happens when the vector reaches capacity and reallocates, thereby invalidating the memory addresses.
Is there any way around this? Or perhaps is there another solution to grouping together heap allocations?
Note that the primary motivation for this is to minimize heap allocations, since this is what is slowing down my algorithm. Initially I had vector<Edge *> and every element added was individually allocated. Batch allocation increased the speed dramatically, but the vector approach described here invalidates pointers.
Your code example, as requested:
This is the vector I declare as a stack var:
vector<Edge> edgeListTemp(1000);
I then add to it as such, using an rvalue overload:
edgeListTemp.push_back(Edge{edge->movie, first, second});
Node objects keep pointers to these:
first->edges.push_back(&edgeListTemp.back());
second->edges.push_back(&edgeListTemp.back());
Where edges is declared as follows:
std::vector<Edge *> edges; /**< Adjacency list */
There are several possible solutions:
if you already know the maximum number of elements in advance, call reserve on the vector from the start; elements won't be reallocated until you exceed that capacity (see the sketch after this list);
if you don't know the maximum number of elements/don't want to preallocate the maximum size for performance reasons but you only add/remove elements from the end (or from the start) of the vector, use an std::deque instead. std::deque guarantees that pointers to elements aren't invalidated as long as you only push/pop from front/back;
std::list guarantees to never invalidate references to elements, but it introduces several serious performance penalties (no O(1) addressing, one allocation for each node);
if you want to sidestep the problem completely, add a layer of indirection and store in the vector pointers to elements allocated on the heap; even better, make it a vector of std::shared_ptr and always use it to keep references to the elements; this obviously has the disadvantage of needing one allocation per element, which may or may not be acceptable, depending on your use case.
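To illustrate the first option, here is a minimal sketch using a hypothetical, simplified Edge type; after reserve(), addresses stay stable until the reserved capacity is exceeded:

#include <vector>

struct Edge { int movie; int first; int second; };  // hypothetical stand-in

int main() {
    std::vector<Edge> edges;
    edges.reserve(1000);             // capacity set up front, no reallocation yet
    edges.push_back(Edge{1, 2, 3});
    Edge* p = &edges.back();         // stays valid while size() <= 1000
    edges.push_back(Edge{4, 5, 6});  // no reallocation, so p is still valid
    return p->movie;
}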
A std::deque does not move elements once added, so iterators and references are stable as long as you don't delete the referenced element.
Like std::vector, std::deque offers random access iterators. Random access into a deque is a little slower than std::vector, but still O(1). If you need stable references, the slight slow-down is probably acceptable.
Alternatively, instead of the pointer to the element, you could keep a reference to the vector and an index into the vector.
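A sketch of that index-based variant, again with a hypothetical Edge type; unlike a pointer, the index survives reallocation as long as you only append:

#include <cstddef>
#include <vector>

struct Edge { int movie; };  // hypothetical stand-in

int main() {
    std::vector<Edge> edges;
    edges.push_back(Edge{42});
    std::size_t handle = edges.size() - 1;  // index instead of Edge*
    edges.push_back(Edge{7});               // may reallocate; handle stays valid
    return edges[handle].movie;             // still refers to the same element
}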

Memory Allocation in STL C++

I am a little confused about memory reallocation in STL C++. For example, I know that if I declare a vector and keep pushing back elements into it, the vector will at some point need to reallocate its memory and copy all the existing elements over. For linked lists no reallocation is needed, because the elements are not stored consecutively in memory and each element uses a pointer to point to the next element.
My question is: what's the situation for the other STL containers in C++? For example, string, map, unordered_map? Do they need reallocation?
(Disclaimer: the concrete data structures named here are not necessarily required by the standard, but they are useful to remember in order to link the rules to something concrete.)
std::string ~= std::vector: it's a dynamic array; if you keep pushing elements at its end, at some point it will reallocate your elements.
std::list: each element is a new linked list node, so no reallocation ever takes place, wherever you may insert the new element.
std::deque: it's typically made of several pages of elements; when a page is full, a new one is allocated and added to the list of pages; for this reason, no reallocation of your elements ever takes place, and your elements never get moved if you keep pushing stuff at the beginning or at the end.
std::map/std::multimap/std::set/std::multiset: typically a binary tree, each element is allocated on its own; no reallocations are ever performed.
std::unordered_map/std::unordered_multimap/std::unordered_set/std::unordered_multiset: a hash table; when the table gets full enough, a rehash occurs and the bucket array is reallocated (the elements themselves live in separately allocated nodes, so pointers and references to them stay valid, although iterators are invalidated).
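One way to observe the difference is to record element addresses before and after growth; this sketch relies only on standard-mandated behavior (a vector may move its elements, map nodes never move):

#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

int main() {
    std::vector<int> v{1};
    auto before = reinterpret_cast<std::uintptr_t>(&v[0]);
    for (int i = 0; i < 1000; ++i) v.push_back(i);  // almost certainly reallocates
    std::printf("vector element moved: %s\n",
                before != reinterpret_cast<std::uintptr_t>(&v[0]) ? "yes" : "no");

    std::map<int, int> m{{1, 1}};
    auto node = reinterpret_cast<std::uintptr_t>(&m.at(1));
    for (int i = 2; i < 1000; ++i) m[i] = i;  // insertions never move existing nodes
    std::printf("map element moved: %s\n",
                node != reinterpret_cast<std::uintptr_t>(&m.at(1)) ? "yes" : "no");
}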
Almost all STL containers allocate their memory on the heap, even a vector. A plain array and the std::array template are probably the only ones whose storage can live on the stack.
If your question is about contiguous memory allocation (whether on the stack or the heap), then plain arrays, std::array, and std::vector have contiguous memory. Almost all other containers, such as list, deque, map, set, etc., do not allocate their memory contiguously.

What is the time complexity of the clear function in std::map according to big O?

What is the time complexity of the clear function in std::map?
Would I be right in saying that it is O(1)?
The Standard says in [associative.reqmts]/8 Table 102:
a.clear() <=> a.erase(a.begin(), a.end()), linear in a.size()
So it is actually mandated to be O(N).
EDIT: summing up the various bits.
To remove a node, a map does two operations:
Call the allocator destroy method to destroy the element
Call the allocator deallocate method to deallocate the memory occupied by the node
The former can be elided in code (by checking is_trivially_destructible), and this is in fact generally done in vector, for example. The latter is unfortunately trickier, as no trait exists for it, so we must rely on the optimizer.
Unfortunately, even if by inlining the optimizer could completely remove the destroy and deallocate calls, I am afraid it would not be able to realize that the tree traversal is now useless and optimize that away too. Therefore you would end up with a Θ(N) traversal of the tree that does nothing at each step...
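The elision described above boils down to a compile-time trait check; here is a simplified sketch (using C++17 syntax) of the kind of thing library implementations do internally:

#include <type_traits>

template <typename T>
void destroy_range(T* first, T* last) {
    // the "former can be elided" check from the answer, spelled out as a trait
    if constexpr (!std::is_trivially_destructible_v<T>) {
        for (; first != last; ++first)
            first->~T();  // only instantiated for types with a real destructor
    }
    // for trivially destructible types this function compiles down to nothing
}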
The cplusplus.com reference claims it has linear complexity in the container's size, as the destructor of each element must be called.
Because it's a template, it may be known at compile time that destruction is a no-op for the element type (e.g. std::map<int, int>), so the need to destroy members isn't a good basis for deducing a necessary worst-case performance. Still, the implementation must visit every node of the binary tree to release the heap memory, and the number of nodes relates linearly to the number of elements (that erase() only invalidates iterators/references/pointers to the erased element, that insert() doesn't invalidate any, etc., all evidence the 1:1 relationship).
So it's linear, but because of the need to clean up the heap usage even when element destructors aren't needed.
(Interestingly, this implies that a std::map<>-like associative container - or perhaps std::map<> itself with a clever custom allocator - could be made O(1) for elements with trivial no-op destructors if all the memory was allocated from a dedicated memory pool that could be "thrown away" in O(1).)
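C++17's polymorphic allocators make that idea concrete; here is a sketch in which a std::pmr::map draws its nodes from a std::pmr::monotonic_buffer_resource, so the pool can be released in one step (destruction still traverses the tree; the O(1) part is reclaiming the memory):

#include <map>
#include <memory_resource>

int main() {
    std::pmr::monotonic_buffer_resource pool;
    {
        std::pmr::map<int, int> m(&pool);  // every node is carved out of the pool
        for (int i = 0; i < 1000; ++i) m[i] = i;
    }                // the destructor still walks the tree node by node...
    pool.release();  // ...but the pool hands the memory back in one step
}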
As far as I know, every clear operation has O(n) complexity, because you need to destroy the objects one by one.

why not implement c++ std::vector::pop_front() by shifting the pointer to vector[0]?

Why can't pop_front() be implemented for C++ vectors by simply shifting the pointer contained in the vector one spot over? So for a vector wrapping an array foo, where foo is a pointer to foo[0], pop_front() would make foo point to foo[1], and the bracket operator would just do the normal pointer math. Does this have something to do with how C++ keeps track of which memory you're using when it allocates space for an array?
This is similar to other questions I've seen about why std::vector doesn't have a pop_front() function, I'll admit, but I haven't seen anyone asking why you can't just shift the pointer.
The vector wouldn't be able to free its memory if it did this.
Generally, you want the overhead per vector object to be small. That means you only store three items: the pointer to the first element, the capacity, and the length.
In order to implement what you suggest, every vector ever (all of them) would need an additional member variable: the offset from the start pointer at which the zeroth element resides. Otherwise, the memory could not be freed, since the original handle to it would have been lost.
It's a tradeoff, but generally the memory consumption of an object which may have millions of instances is more valuable than the corner case of doing the absolute worst thing you can do performance-wise to the vector.
Because implementers want to optimize the size of a vector. They usually use 3 pointers, one for the beginning, one for the capacity (the allocated end) and one for the end.
Doing what you suggest adds another pointer-sized member to every vector (and there are a lot of those in a C++ program) for very little benefit: the contract of vector is to be fast when pushing back new elements; removing and inserting at the front are unusual operations, and their performance matters less than the size of the class.
I started typing out an elaborate answer explaining how the memory is allocated and freed, but after typing it all out I realized that memory issues alone don't justify why pop_front isn't there, as other answers here suggested.
Having pop_front in a vector, where the extra cost is another pointer, is justifiable in most circumstances. The problem, in my opinion, is push_front. If the container has pop_front then it should also have push_front; otherwise the container is not consistent. push_front is definitely expensive for a vector (unless you match your pushes with your pops, which is not a good design). Without push_front, the vector really wastes memory if one does lots of pop_front operations.
Now, the need for pop_front and push_front does exist for a container that is similar to a vector (constant-time random access), which is why deque exists.
You could, but it would complicate the implementation a bit, and add a pointer of overhead to the type's size (so it could track the actual allocation's address). Is that worth it? Sometimes. First consider other structures which may handle your usage better (maybe deque?).
You could do that, but vector is designed to be a simple container with constant time index lookups and push/pop from the end. Doing what you suggest would complicate the implementation as it would have to track the allocated beginning and the "current" beginning. Not to mention that you still couldn't guarantee constant time insertion at the front but you might get it sometimes.
If you need a container with constant time front and back insertion and removal, that's precisely what deque is for, there's no need to modify vector to handle it.
You can use std::deque instead of std::vector. It's a double-ended queue that also offers vector-like element access, and it implements both front and back push/pop.
http://www.cplusplus.com/reference/stl/deque/
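A minimal sketch of the interface in question:

#include <cassert>
#include <deque>

int main() {
    std::deque<int> d{2, 3};
    d.push_front(1);  // O(1), no shifting and no permanently wasted space
    d.push_back(4);
    assert(d.front() == 1 && d.back() == 4);
    d.pop_front();    // O(1)
    d.pop_back();
    assert(d.front() == 2 && d.back() == 3);
}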
Another shortcoming of your suggestion is that it would waste memory: the space to the left of the array could never be reused after the shift. The more pop_front() calls you make, the more memory you waste, until the vector is destructed.

Why does std::stack use std::deque by default?

Since the only operations required for a container to be used in a stack are:
back()
push_back()
pop_back()
Why is the default container for it a deque instead of a vector?
Don't deque reallocations give a buffer of elements before front() so that push_front() is an efficient operation? Aren't these elements wasted since they will never ever be used in the context of a stack?
If there is no overhead for using a deque this way instead of a vector, why is the default for priority_queue a vector not a deque also? (priority_queue requires front(), push_back(), and pop_back() - essentially the same as for stack)
Updated based on the Answers below:
It appears that deque is usually implemented as a variable-size array of fixed-size arrays. This makes growing faster than for a vector (which requires reallocating and copying), so for something like a stack, which is all about adding and removing elements, deque is likely a better choice.
priority_queue relies heavily on indexing, as every removal and insertion requires you to run pop_heap() or push_heap(). This probably makes vector the better choice there, since adding an element is still amortized constant anyway.
As the container grows, a reallocation for a vector requires copying all the elements into the new block of memory. Growing a deque allocates a new block and links it to the list of blocks - no copies are required.
Of course you can specify that a different backing container be used if you like. So if you have a stack that you know is not going to grow much, tell it to use a vector instead of a deque if that's your preference.
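For example, the backing container is just the second template parameter:

#include <stack>
#include <vector>

int main() {
    std::stack<int, std::vector<int>> s;  // vector-backed stack
    s.push(1);
    s.push(2);
    int top = s.top();  // 2
    s.pop();
    return top;
}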
See Herb Sutter's Guru of the Week 54 for the relative merits of vector and deque where either would do.
I imagine the inconsistency between priority_queue and queue is simply that different people implemented them.