Using vector to minimize heap allocations causes seg faults - c++

Within a function, I have created a vector with generous amounts of space to which I push a runtime determined amount of objects(Edge). Other objects, however, maintain pointers to the Edges within the vector. Occasionally the entire program seg faults because a pointer becomes invalid, and I suspect that this happens when the vector reaches capacity and reallocates, thereby invalidating the memory addresses.
Is there any way around this? Or perhaps is there another solution to grouping together heap allocations?
Note: that the primary motivation for this is to minimize heap allocations, for this is what is slowing down my algorithm. Initially I had vector<Edge *> and every element added was individually allocated. Batch allocation increased the speed dramatically, but the vector method described here invalidates pointers.
Your code example, as requested:
This is the vector I declare as a stack var:
vector<Edge> edgeListTemp(1000);
I then add to it as such, using an rvalue overload:
edgeListTemp.push_back(Edge{edge->movie, first, second});
Node objects keep pointers to these:
first->edges.push_back(&edgeListTemp.back());
second->edges.push_back(&edgeListTemp.back());
Where edges is declared as follows:
std::vector<Edge *> edges; /**< Adjacency list */

There are several possible solutions:
if you already know the maximum number of elements in advance, do a reserve over the vector from the start; elements won't be reallocated until you reach that size;
if you don't know the maximum number of elements/don't want to preallocate the maximum size for performance reasons but you only add/remove elements from the end (or from the start) of the vector, use an std::deque instead. std::deque guarantees that pointers to elements aren't invalidated as long as you only push/pop from front/back;
std::list guarantees to never invalidate references to elements, but it introduces several serious performance penalties (no O(1) addressing, one allocation for each node);
if you want to ignore the problem completely, add a layer of indirection, and store into the vector pointers to elements allocated over the heap; even better, make a vector of std::shared_ptr and always use it to keep references to the elements; this obviously has the disadvantage of needing one allocation for each element, which may or may not be acceptable, depending on your use case.

A std::deque does not move elements once added, so iterators and references are stable as long as you don't delete the referenced element.
Like std::vector, std::deque offers random access iterators. Random access into a deque is a little slower than std::vector, but still O(1). If you need stable references, the slight slow-down is probably acceptable.
Alternatively, instead of the pointer to the element, you could keep a reference to the vector and an index into the vector.

Related

How does deque manage memory?

There are some confusion for me in using deque container.
I compared a vector with a deque, I entered Integer values dynamically and observed that after a few insertions vector's objects start moving around and addresses have been changed which seemed logical. However deque's objects stayed on the same place in the memory even after I entered a few hundred integers.
This observation gives me the idea that deque reserves a much larger memory than vector but if it is true then what is the point of having dynamic memory instead of static? Even if it does, it will run out of memory at somewhere and need to change the place on the memory, So the next question is does it move every object or just start using memory somewhere else and links it with previous location?
deque container supports iterator arithmetic but is it safe to use it? i want to know how deque manage the memory not how one might prefer to use it.
A deque is a double linked list of mini vectors. That means addresses are stable.
Iterator arithmetic is valid unless operation which invalidates iterators is performed.
This is true for vectors too
From this std::deque reference:
... typical implementations use a sequence of individually allocated fixed-size arrays
You could think of it like a list of arrays.
A deque is typically implemented as a sequence of fixed-length pages of elements. If you append elements, when a page is full a new one is allocated and added at the end of the pages index. This guarantees that, if you add or remove elements only at the end or at the beginning, the ones that have already been stored aren't moved around in memory (in standardese speak, references to existing elements aren't invalidated by push_back, pop_back, push_front and pop_front).

Memory Allocation in STL C++

I am a little confused about memory reallocation in STL C++. For example, I know if I declare a vector, and keep pushing back elements into it, the vector will at some point need a reallocation of memory space and copy all the existing elements into it. For linked lists no reallocation is needed, because the elements are not stored consecutively in the stack and each element uses a pointer to point to the next element.
My question is, what's the situation for other STL in C++ ? for example, string, map,unordered_map? Do they need reallocation ?
(disclaimer: all the concrete data structures specified here may not be required by the standard, but they are useful to remember to help link the rules to something concrete)
std::string ~= std::vector; it's a dynamic array, if you keep pushing elements at its end, at some time it will reallocate your elements.
std::list: each element is a new linked list node, so no reallocation ever takes place, wherever you may insert the new element.
std::deque: it's typically made of several pages of elements; when a page is full, a new one is allocated and added to the list of pages; for this reason, no reallocation of your elements ever takes place, and your elements never get moved if you keep pushing stuff at the beginning or at the end.
std::map/std::multimap/std::set/std::multiset: typically a binary tree, each element is allocated on its own; no reallocations are ever performed.
std::unordered_map/std::unordered_multimap/std::unordered_set/std::unordered_multiset: a hash table; when the table is full enough, rehashing and reallocation occurs.
Almost all STL containers memory is allocated on heap, even for a vector. A plain array and std::array template are the only once probably that have memory on stack.
If your questions is about continuous memory allocation (whether on stack or heap), then, probably, plain array, std::array, std::vector have got continuous memory. Almost all other containers such as list, deque, map, set etc. do not have memory allocated continuously.

How does deque have an amortized constant Time Complexity

I read here from the accepted answer that a std::deque has the following characteristic
1- Random access - constant O(1)
2- Insertion or removal of elements at the end or beginning - amortized constant O(1)
3- Insertion or removal of elements - linear O(n)
My question is about point 2. How can a deque have an amortized constant insertion at the end or beginning?
I understand that a std::vector has an amortized constant time complexity for insertions at the end. This is because a vector is continguous and is a dynamic array. So when it runs out of memory for a push_back at the end, it would allocate a whole new block of memory, copy existing items from the old location to the new location and then delete items from the old location. This operation I understand is amortized constant. How does this apply to a deque ? How can insertions at the top and bottom of a deque be amortized constant. I was under the impression that it was supposed to be constant O(1). I know that a deque is composed of memory chunks.
The usual implementation of a deque is basically a vector of pointers to fixed-sized nodes.
Allocating the fixed-size node clearly has constant complexity, so that's pretty easy to handle--you just amortize the cost of allocating a single node across the number of items in that node to get a constant complexity for each.
The vector of pointers part is what's (marginally) more interesting. When we allocate enough of the fixed-size nodes that the vector of pointers is full, we need to increase the size of the vector. Like std::vector, we need to copy its contents to the newly allocated vector, so its growth must follow a geometric (rather than arithmetic) progression. This means that we have more pointers to copy we do the copying less and less frequently, so the total time devoted to copying pointers remains constant.
As a side note: the "vector" part is normally treated as a circular buffer, so if you're using your deque as a queue, constantly adding to one end and removing from the other does not result in re-allocating the vector--it just means moving the head and tail pointers that keep track of which of the pointers are "active" at a given time.
The (profane) answer lies in containers.requirements.general, 23.2.1/2:
All of the complexity requirements in this Clause are stated solely in
terms of the number of operations on the contained objects.
Reallocating the array of pointers is hence not covered by the complexity guarantee of the standard and may take arbitrarily long. As mentioned before, it likely adds an amortized constant overhead to each push_front()/push_back() call (or an equivalent modifier) in any "sane" implementation. I would not recommend using deque in RT-critical code though. Typically, in an RT scenario, you don't want to have unbounded queues or stacks (which in C++ by default use deque as the underlying container) anyway, neither memory allocations that could fail, so you will be most likely using a preallocated ring buffer (e.g. Boost's circular_buffer) instead.

std::vector and pointer predictability

when you push_back() items into an std::vector, and retain the pointers to the objects in the vector via the back() reference -- are you guaranteed (assuming no deletions occur) that the address of the objects in the vector will remain the same?
It seems like my vector changes the pointers of the objects I use, such that if I push 10 items into it, and retain the pointers to those 10 items by remembering the back() reference after each push_back.
if your vector is to store objects, not pointers to objects, are the addresses of those objects subject to constant change upon pushing more items?
Any method that causes the vector to resize itself will invalidate all iterators, pointers, and references to the elements contained within. This can be avoided by reserving mememory, or using boost::stable_vector.
23.3.6.5/1:
Remarks: Causes reallocation if the new size is greater than the old capacity. If no reallocation happens,
all the iterators and references before the insertion point remain valid.
No, std::vector is not a stable container, i.e. pointers and iterators may get invalidated by resizing the vector (or, better, by the corresponding reallocation). If you want to avoid this behaviour, use boost::stable_vector or std::list or std::deque instead (I would prefer the last). Or, more easily, you can simply store your locations by indices.
For more information, consider also the answer to this question here.
It's not guaranteed. If you push_back enough items to exceed the size of the memory buffer that's the backing store of the vector, a new buffer will be created, all the contents will be copied to the new location, and the old buffer will be deleted. At that point, old pointers (as well as iterators!) will be invalid.
If you know exactly how much maximum space you will ever need, you could set the size of the vector's buffer to that size when you create it, to avoid reallocation. However, I prefer to store "references" to elements of a vector as a reference to the vector and a size_t index into the vector, instead of using pointers. It's not necessarily slower than pointers (depending on the CPU type) but, even if it is, it won't be much slower and in my opinion it's worth it for the peace of mind knowing that no matter how the vector is used in the future or reallocated, it'll still refer to the proper element.

why not implement c++ std::vector::pop_front() by shifting the pointer to vector[0]?

Why can't pop_front() be implemented for C++ vectors by simply shifting the pointer contained in the vector's name one spot over? So in a vector containing an array foo, foo is a pointer to foo[0], so pop_front() would make the pointer foo = foo[1] and the bracket operator would just do the normal pointer math. Is this something to do with how C++ keeps track of the memory you're using for what when it allocates space for an array?
This is similar to other questions I've seen about why std::vector doesn't have a pop_front() function, I will admit, but i haven't anyone asking about why you can't shift the pointer.
The vector wouldn't be able to free its memory if it did this.
Generally, you want the overhead per vector object to be small. That means you only store three items: the pointer to the first element, the capacity, and the length.
In order to implement what you suggest, every vector ever (all of them) would need an additional member variable: the offset from the start pointer at which the zeroth element resides. Otherwise, the memory could not be freed, since the original handle to it would have been lost.
It's a tradeoff, but generally the memory consumption of an object which may have millions of instances is more valuable than the corner case of doing the absolute worst thing you can do performance-wise to the vector.
Because implementers want to optimize the size of a vector. They usually use 3 pointers, one for the beginning, one for the capacity (the allocated end) and one for the end.
Doing what you require adds another 4 bytes to every vector (and there are a lot of those in a c++ program) for very little benefit: the contract of vector is to be fast when pushing back new elements, removing and inserting are "unsual" operations and their performance matter less than the size of the class.
I started typing out an elaborate answer explaining how the memory is allocated and freed but after typing it all out I realized that memory issues alone don't justify why pop_front isn't there as other answers here suggested.
Having pop_front in a vector where the extra cost is another pointer is justifiable in most circumstances. The problem, in my opinion, is push_front. If the container has pop_front then it should also have push_front otherwise the container is not being consistent. push_front is definitely expensive for a vector container (unless you match your pushes with your pops which is not a good design). Without push_front the vector is really wasting memory if one does lots of pop_front operations with no push_front functionality.
Now the need for pop_front and push_front is there for a container that is similar to a vector (constant time random access) which is why there is deque.
You could, but it would complicate the implementation a bit, and add a pointer of overhead to the type's size (so it could track the actual allocation's address). Is that worth it? Sometimes. First consider other structures which may handle your usage better (maybe deque?).
You could do that, but vector is designed to be a simple container with constant time index lookups and push/pop from the end. Doing what you suggest would complicate the implementation as it would have to track the allocated beginning and the "current" beginning. Not to mention that you still couldn't guarantee constant time insertion at the front but you might get it sometimes.
If you need a container with constant time front and back insertion and removal, that's precisely what deque is for, there's no need to modify vector to handle it.
You can use std::deque instead of std::vector. It's a double-ended-queue with also the vector-like access members. It implements both front and back push/pop.
http://www.cplusplus.com/reference/stl/deque/
Another shortcoming of your suggestion is that you'll waste memory spaces as you can't make use of those on the left of the array after shifting. The more you execute pop_front(), the more you'll waste until the vector is destructed.