why not implement c++ std::vector::pop_front() by shifting the pointer to vector[0]? - c++

Why can't pop_front() be implemented for C++ vectors by simply shifting the pointer contained in the vector's name one spot over? In a vector backed by an array foo, foo acts as a pointer to foo[0], so pop_front() could just make foo point to foo[1], and the bracket operator would do the normal pointer math from there. Does this have something to do with how C++ keeps track of which memory belongs to which allocation when it allocates space for an array?
This is similar to other questions I've seen about why std::vector doesn't have a pop_front() function, I'll admit, but I haven't seen anyone ask why you can't just shift the pointer.

The vector wouldn't be able to free its memory if it did this.
Generally, you want the overhead per vector object to be small. That means you only store three items: the pointer to the first element, the capacity, and the length.
In order to implement what you suggest, every vector ever (all of them) would need an additional member variable: the offset from the start pointer at which the zeroth element resides. Otherwise, the memory could not be freed, since the original handle to it would have been lost.
It's a tradeoff, but generally the memory consumption of an object that may have millions of instances matters more than the corner case of doing the absolute worst thing you can do to a vector, performance-wise.
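To make the bookkeeping concrete, here is a rough sketch of the two layouts; these structs are purely illustrative (no real implementation is written this way), but they show where the extra member would have to live:

#include <cstddef>

// Typical representation of a vector object: three words.
struct VectorRep {
    int*        data;        // also the address that must be handed back to the allocator
    std::size_t size;
    std::size_t capacity;
};

// What pop_front()-by-pointer-shift would require: the original allocation
// pointer has to be kept around so the memory can still be freed later.
struct ShiftableVectorRep {
    int*        allocation;  // kept only so delete[]/deallocate gets the right address
    int*        first;       // logical element 0 after pop_front()
    std::size_t size;
    std::size_t capacity;
};

static_assert(sizeof(ShiftableVectorRep) > sizeof(VectorRep),
              "every instance pays for the extra member, whether or not it ever calls pop_front()");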

Because implementers want to optimize the size of a vector. They usually use 3 pointers: one to the beginning, one to the end, and one to the capacity (the end of the allocation).
Doing what you require would add another pointer (4 or 8 bytes, depending on the platform) to every vector, and there are a lot of those in a C++ program, for very little benefit: the contract of vector is to be fast when pushing back new elements; removing and inserting at the front are unusual operations, and their performance matters less than the size of the class.

I started typing out an elaborate answer explaining how the memory is allocated and freed but after typing it all out I realized that memory issues alone don't justify why pop_front isn't there as other answers here suggested.
Having pop_front in a vector where the extra cost is another pointer is justifiable in most circumstances. The problem, in my opinion, is push_front. If the container has pop_front then it should also have push_front, otherwise the container is not consistent. push_front is definitely expensive for a vector (unless you match your pushes with your pops, which is not a good design). Without push_front, a vector that sees lots of pop_front operations would simply waste the memory at the front.
Now, the need for pop_front and push_front does exist for a container that is otherwise similar to a vector (constant-time random access), and that is exactly why deque exists.

You could, but it would complicate the implementation a bit, and add a pointer of overhead to the type's size (so it could track the actual allocation's address). Is that worth it? Sometimes. First consider other structures which may handle your usage better (maybe deque?).

You could do that, but vector is designed to be a simple container with constant time index lookups and push/pop from the end. Doing what you suggest would complicate the implementation as it would have to track the allocated beginning and the "current" beginning. Not to mention that you still couldn't guarantee constant time insertion at the front but you might get it sometimes.
If you need a container with constant time front and back insertion and removal, that's precisely what deque is for, there's no need to modify vector to handle it.

You can use std::deque instead of std::vector. It's a double-ended queue that also offers vector-like random access, and it implements push/pop at both the front and the back.
http://www.cplusplus.com/reference/stl/deque/
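A minimal usage sketch:

#include <deque>
#include <iostream>

int main() {
    std::deque<int> d{1, 2, 3};
    d.push_front(0);            // O(1), unlike inserting at the front of a vector
    d.pop_front();              // O(1)
    std::cout << d[0] << '\n';  // random access still works; prints 1
}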

Another shortcoming of your suggestion is that you'd waste memory: the slots to the left of the new start of the array could never be reused after shifting. The more pop_front() calls you make, the more you'd waste, until the vector is destructed.

Related

Removing from the beginning of an std::vector in C++

I might be missing something very basic here but here is what I was wondering -
We know removing an element from the beginning of an std::vector ( vector[0] ) in C++ is an O(n) operation, because all the other elements have to be shifted one position toward the front.
But why isn't it implemented such that the pointer to the first element is moved one position ahead so that now the vector starts from the second element and, in essence, the first element is removed? This would be an O(1) operation.
std::array and C-style arrays are fixed-length, and you can't change their length at all, so I think there's a typo there and you mean std::vector instead.
"Why was it done that way?" is a bit of a historical question. Looking ahead, if your system library allowed giving unused memory back to the operating system, your "shift the start" trick would prevent any reuse of the former first elements' memory later on.
Also, std::vector comes from systems (which are still what essentially every operating system provides) with calls like free and malloc, where you need to keep the pointer to the beginning of an allocated region around to be able to free it later on. Hence, you'd have to lug around another pointer in the std::vector structure just to be able to free the vector, and that would only be useful if someone deleted from the front. And if you're deleting from the front, chances are you'd be better off using a reversed vector (and deleting from the end), or a vector isn't the right data structure altogether.
It is not impossible for a vector to be implemented like that (it wouldn't be std::vector though). You need to keep the pointer to first element in addition to a pointer to the underlying array (alternatively some offset can be stored, but no matter how you put it you need to store more data in the vector).
Consider that this is useful only for one quite specific use case: erasing the first element. Granted, once you have that, inserting an element at the front can also benefit when there is free space left at the front, and if there is, even inserting into the first half could benefit by shifting only the first half.
However, all this does not fit with the concept of capacity. With std::vector you know exactly how many elements you can add before a reallocation occurs: capacity() - size(). With your proposal this wouldn't hold any more. Erasing the first element would affect capacity in an odd way. It would complicate the interface and usage of vectors for all use cases.
Further, erasing elements anywhere else would still not be O(1). In total it would incur a cost and add complexity for any use of the vector, while bringing an advantage only in a very narrow use case.
If you do find yourself in the situation that you need to erase the front element very often, then you can either store the vector in reverse, and erasing the last element is already O(1), or use a different container.
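A sketch of the "store it in reverse" idea: keep the elements in reverse order, so the logical front sits at the physical back and can be removed with pop_back() in O(1):

#include <iostream>
#include <vector>

int main() {
    // Logical order is a, b, c; stored reversed so the logical front is at the back.
    std::vector<char> v{'c', 'b', 'a'};

    char front = v.back();   // the logical front element
    v.pop_back();            // "pop_front" in O(1)

    std::cout << front << ' ' << v.back() << '\n';  // prints: a b
}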

Using vector to minimize heap allocations causes seg faults

Within a function, I have created a vector with generous amounts of space to which I push a runtime determined amount of objects(Edge). Other objects, however, maintain pointers to the Edges within the vector. Occasionally the entire program seg faults because a pointer becomes invalid, and I suspect that this happens when the vector reaches capacity and reallocates, thereby invalidating the memory addresses.
Is there any way around this? Or perhaps is there another solution to grouping together heap allocations?
Note that the primary motivation for this is to minimize heap allocations, since that is what is slowing down my algorithm. Initially I had vector<Edge *> and every element added was individually allocated. Batch allocation increased the speed dramatically, but the vector method described here invalidates pointers.
Your code example, as requested:
This is the vector I declare as a stack var:
vector<Edge> edgeListTemp(1000);
I then add to it as such, using an rvalue overload:
edgeListTemp.push_back(Edge{edge->movie, first, second});
Node objects keep pointers to these:
first->edges.push_back(&edgeListTemp.back());
second->edges.push_back(&edgeListTemp.back());
Where edges is declared as follows:
std::vector<Edge *> edges; /**< Adjacency list */
There are several possible solutions:
if you already know the maximum number of elements in advance, call reserve() on the vector from the start; elements won't be reallocated until you exceed that size (see the sketch after this list);
if you don't know the maximum number of elements/don't want to preallocate the maximum size for performance reasons but you only add/remove elements from the end (or from the start) of the vector, use an std::deque instead. std::deque guarantees that pointers to elements aren't invalidated as long as you only push/pop from front/back;
std::list guarantees to never invalidate references to elements, but it introduces several serious performance penalties (no O(1) addressing, one allocation for each node);
if you want to ignore the problem completely, add a layer of indirection, and store into the vector pointers to elements allocated over the heap; even better, make a vector of std::shared_ptr and always use it to keep references to the elements; this obviously has the disadvantage of needing one allocation for each element, which may or may not be acceptable, depending on your use case.
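A sketch of the first two options, using a simplified stand-in for the question's Edge type:

#include <deque>
#include <vector>

struct Edge { int movie; };  // simplified stand-in for the question's Edge type

int main() {
    // Option 1: reserve capacity up front. Note reserve(1000), not vector<Edge>(1000),
    // which would create 1000 value-initialized elements rather than raw capacity.
    // Pointers into the vector stay valid as long as size() never exceeds the reserved capacity.
    std::vector<Edge> edgeList;
    edgeList.reserve(1000);
    edgeList.push_back(Edge{1});
    Edge* p = &edgeList.back();

    // Option 2: std::deque never relocates existing elements on push_back/push_front,
    // so pointers to elements stay valid until the elements themselves are erased.
    std::deque<Edge> edgeDeque;
    edgeDeque.push_back(Edge{2});
    Edge* q = &edgeDeque.back();
    edgeDeque.push_back(Edge{3});  // does not invalidate q

    (void)p; (void)q;
}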
A std::deque does not move elements once added, so iterators and references are stable as long as you don't delete the referenced element.
Like std::vector, std::deque offers random access iterators. Random access into a deque is a little slower than std::vector, but still O(1). If you need stable references, the slight slow-down is probably acceptable.
Alternatively, instead of the pointer to the element, you could keep a reference to the vector and an index into the vector.
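A sketch of the index-based alternative, again with simplified stand-ins for the question's types:

#include <cstddef>
#include <vector>

struct Edge { int movie; };

struct Node {
    // Indices into the shared edge vector instead of raw pointers; an index stays
    // meaningful across reallocations (as long as earlier elements aren't erased).
    std::vector<std::size_t> edgeIndices;
};

int main() {
    std::vector<Edge> edges;
    Node n;

    edges.push_back(Edge{42});
    n.edgeIndices.push_back(edges.size() - 1);

    Edge& e = edges[n.edgeIndices[0]];  // look the element up through the index when needed
    (void)e;
}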

Resize and copy of c++ array

If I have for example array of pointers (which is full) with 5 elements and I want to insert another element at second position, I would have to allocate another array (with size+1), copy first element from an old array, insert new element, and then copy remaining elements. This application can't waste any space.
This is the code so far:
Sometype **newArray = new Sometype*[++Count];      // allocate space for one more element
size_t s = sizeof(*Array);                          // size of one element (a Sometype*)
memcpy(newArray, Array, s * position);              // copy the elements before the insertion point
newArray[position] = new Sometype();                // insert the new element
memcpy(newArray + position + 1, Array + position,   // copy the remaining elements
       s * (Count - position - 1));
delete [] Array;
Array = newArray;
Is there any more efficient method to do this, since it is a bottleneck for my application? I'm new to C++, so I don't know any advanced stuff.
Could vector be used for this purpose?
I think I read somewhere that it takes double the previously used space when it resizes.
Is this true or this behavior can be modified?
If you can't waste any space and you have to stick to sequential containers, then I'm afraid this is the most efficient way. But I still don't believe that you can't waste any space. If you can anticipate in advance that you will need to add 5 more elements later, then sizing your array accordingly from the beginning will prove much more effective. In any case, you should use vector to avoid this awful C-style code and make your intent clearer. You might want to take a look at the std::vector<T>::reserve() function. Whether vector doubles its previous capacity when it resizes is unspecified and varies across implementations.
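For comparison, a sketch of the same insertion done with std::vector (Sometype here is just a stand-in for the question's type):

#include <cstddef>
#include <vector>

struct Sometype {};  // stand-in for the question's type

int main() {
    std::vector<Sometype*> v;
    v.reserve(8);  // optional: if future growth can be anticipated, no reallocation happens until then

    v.push_back(new Sometype());
    v.push_back(new Sometype());

    std::size_t position = 1;
    v.insert(v.begin() + position, new Sometype());  // same O(n) shifting, but far less code to get wrong
}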
Have a look at the standard containers std::vector, std::list, std::unordered_set and std::unordered_map

Is there an equivalent of vector::reserve() for an std::list?

I have a class that looks like this:
typedef std::list<char*> PtrList;
class Foo
{
public:
    void DoStuff();
private:
    PtrList m_list;
    PtrList::iterator m_it;
};
The function DoStuff() basically adds elements to m_list or erases elements from it, finds an iterator to some special element in it and stores it in m_it. It is important to note that each value of m_it is used in every following call of DoStuff().
So what's the problem?
Everything works, except that profiling shows operator new is invoked too often, due to the list::push_back() calls made from DoStuff().
To increase performance I want to preallocate memory for m_list in the initialization of Foo as I would do if it were an std::vector. The problem is that this would introduce new problems such as:
Less efficient insert and erase of elements.
m_it becomes invalid as soon as the vector is changed from one call to DoStuff() to the next. EDIT: Alan Stokes suggested to use an index instead of an iterator, solving this issue.
My solution: the simplest solution I could think of is to implement a pool of objects that also has a linked-list functionality. This way I get a linked list and can preallocate memory for it.
Am I missing something or is it really the simplest solution? I'd rather not "re-invent the wheel", and use a standard solution instead, if it exists.
Any thoughts, workarounds or enlightening comments would be appreciated!
I think you are using the wrong container.
If you want a fast push_back, don't automatically assume that you need a linked list; a linked list is a slow container that is mainly suitable for reordering.
A better container is std::deque. A deque is basically an array of arrays: it allocates a block of memory and fills it as you push_back, and when that block runs out it allocates another one. This means it allocates very infrequently, and you don't have to know the size of the container ahead of time to make it efficient, as you do with std::vector and reserve().
You can use the splice function in std::list to implement a pool. Add a new member variable PtrList m_Pool. When you want to add a new object and the pool is not empty, assign the value to the first element in the pool and then splice it into the list. To erase an element, splice it from the list to the pool.
But if you don't care about the order of the elements, then a deque can be much faster. If you want to erase an element in the middle, copy the last element onto the element you want to delete, then erase the last element.
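A sketch of the splice-based pool described above (PtrList as in the question; the method names here are made up):

#include <list>

typedef std::list<char*> PtrList;

class Foo {
public:
    void add(char* p) {
        if (!m_pool.empty()) {
            m_pool.front() = p;                                   // reuse an existing node: no allocation
            m_list.splice(m_list.end(), m_pool, m_pool.begin());  // move the node into the live list
        } else {
            m_list.push_back(p);                                  // pool exhausted: allocate a new node
        }
    }

    void remove(PtrList::iterator it) {
        m_pool.splice(m_pool.begin(), m_list, it);  // return the node to the pool instead of freeing it
    }

private:
    PtrList m_list;
    PtrList m_pool;
};

The pool can be pre-filled with dummy nodes up front, so the allocations all happen at a time of your choosing.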
My advice is the same as 111111's, try switching to deque before you write any significant code.
However, to directly answer your question: you could use std::list with a custom allocator. It's a bit fiddly, and I'm not going to work through all the details here, but the gist of it is that you write a class that represents the memory allocation strategy for list nodes. The nodes allocated by list will be a small implementation-defined amount larger than char*, but they will all be the same size, which means you can write an optimized allocator just for that size (a pool of memory blocks rather than a pool of objects), and you can add functions to it that let you reserve whatever space you want in the allocator, at the time you want. Then the list can allocate/free quickly. This saves you needing to re-implement any of the actual list functionality.
If you were (for some reason) going to implement a pool of objects with list functionality, then you could start with boost::intrusive. That might also be useful when writing your own allocator, for keeping track of your list of free blocks.
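As an aside, and assuming a C++17 compiler is available (which postdates this exchange), the standard polymorphic-allocator facilities give a ready-made version of the chunk-allocating strategy described above, without writing the allocator yourself:

#include <array>
#include <cstddef>
#include <list>
#include <memory_resource>

int main() {
    // A fixed buffer that the list's nodes are carved out of; when it is exhausted,
    // the resource falls back to the default (heap) resource.
    std::array<std::byte, 4096> buffer;
    std::pmr::monotonic_buffer_resource pool(buffer.data(), buffer.size());

    std::pmr::list<char*> lst(&pool);  // std::list using a polymorphic allocator
    char text[] = "hello";
    lst.push_back(text);               // node memory comes from buffer, no call to operator new
}

Note that a monotonic resource never recycles freed nodes, so it fits the "reserve up front" pattern rather than heavy insert/erase churn; std::pmr::unsynchronized_pool_resource is the recycling variant.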
List and vector are completely different in the way they manage objects.
Vector constructs elements in place into an allocated buffer of a given capacity. A new allocation happens when the capacity is exhausted.
List allocate elements one by one, each into an individually allocated space.
Vector elements shift when something is inserted / removed, hence, vector indexes and element addresses are not stable.
List element are re-linked when something is inserted / removed, hence, list iterators and elements addresses are stable.
A way to make a list behave similarly to a vector is to replace the default allocator (which goes to the system every time it is invoked) with another one that allocates larger chunks and hands out sub-chunks to the list when it is invoked.
This is not something the standard library provides by default.
You could potentially use list::get_allocator().allocate(). As far as I know, the default behaviour is to acquire memory as and when needed, due to the non-contiguous nature of lists (hence the lack of reserve()), but no major drawbacks of going through the allocator occur to me immediately. Provided you have a non-critical section in your program, at the start or wherever, you can at least choose to take the damage at that point.

Something like a deque on large numbers of items, but small memory usage on small numbers?

I have a whole bunch of objects of a certain type, each of which may allocate a deque to hold other objects of that same type. I am using a deque because I need fast access at both ends, and because any particular object could possibly refer to many other objects.
However, it's likely the case that many or even most of the objects refer to very few other objects. In this case, the memory usage of deque is pretty big. The implementation I'm using is allocating 4096 bytes at a shot, as soon as I do the very first push_back(). Each element in the deque is only 8 bytes. That's a whole lot of wasted space, especially because I'm making many of these objects, and hence many of these deques.
At the same time, I pretty much need a deque (or something like it), because like I said, any particular object can actually refer to many other objects, despite the fact that most objects refer to very few other objects.
My first thought was using capacity() and reserve() to grow the deque myself, but my compiler informed me that there are no such functions on deque.
So, I was thinking perhaps to write a class with a deque-like interface, underlying which is a vector and a deque, with the vector used until (say) sixteen elements exist, after which the vector is thrown away and the deque is used from there on out.
Since the vector is only used when there are only a small number of elements, it shouldn't really matter too much that push_front() and pop_front() are going to be inefficient in terms of speed, and since I can control the vector with capacity() and reserve(), it shouldn't really matter too much that deque uses a lot of memory when more elements exist.
But, before rolling my own class like this, I wanted to check to see if something like this already exists. Also, if anybody knows of any reason I haven't thought of why something like this is a bad idea, or if anybody has any related suggestions, I'd love to hear it.
Thanks in advance.
You don't mention whether you need other capabilities of vector or deque, like random access iterators. If you don't, this actually sounds like a good candidate for list. It has good performance inserting and removing at both ends.
You could use an (intrusive) list if you don't need random access by index. Lists allow quick O(1) push_front/push_back() and pop_front/pop_back().
If objects are not shared, that is, an object is only ever owned by at most one other object, then an intrusive list would be best. And since your objects are of the same type, they can be allocated from one memory pool (a big array) to avoid any memory overhead.
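A minimal hand-rolled sketch of the intrusive idea (no library; the names here are made up): the links live inside the object itself, so beyond the objects there is no per-element allocation at all:

struct Object {
    int data = 0;
    // Intrusive links: the list structure lives inside the object itself.
    Object* prev = nullptr;
    Object* next = nullptr;
};

struct IntrusiveList {
    Object* head = nullptr;
    Object* tail = nullptr;

    void push_back(Object* o) {  // O(1), no allocation
        o->prev = tail;
        o->next = nullptr;
        if (tail) tail->next = o; else head = o;
        tail = o;
    }

    Object* pop_front() {        // O(1), no deallocation
        Object* o = head;
        if (!o) return nullptr;
        head = o->next;
        if (head) head->prev = nullptr; else tail = nullptr;
        o->prev = o->next = nullptr;
        return o;
    }
};

boost::intrusive provides the same idea with a full container interface, without hand-rolling the pointer surgery.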