Removing from the beginning of an std::vector in C++ - c++

I might be missing something very basic here but here is what I was wondering -
We know removing an element from the beginning of an std::vector ( vector[0] ) in C++ is an O(n) operation because all the other elements have to be shifted one place backwards.
But why isn't it implemented such that the pointer to the first element is moved one position ahead so that now the vector starts from the second element and, in essence, the first element is removed? This would be an O(1) operation.

std::array and C-style arrays are fixed-length, and you can't change their length at all, so I think you have a typo there and mean std::vector instead.
"Why was it done that way?" is a bit of a historical question. From a design perspective, if your system library allowed unused memory to be given back to the operating system, your "move the pointer forward" trick would prevent the memory of the former first elements from being reused later on.
Also, std::vector comes from systems built around calls like malloc and free (which are still used in basically every operating system), where you need to keep the pointer to the beginning of an allocated region around to be able to free it later on. Hence, you'd have to lug around another pointer in the std::vector structure to be able to free the vector, and that would only be useful if someone deleted from the front. And if you're deleting from the front, chances are you would be better off using a reversed vector (and deleting from the end), or a vector isn't the right data structure altogether.

It is not impossible for a vector to be implemented like that (it wouldn't be std::vector though). You need to keep the pointer to first element in addition to a pointer to the underlying array (alternatively some offset can be stored, but no matter how you put it you need to store more data in the vector).
Consider that this is useful only for one quite specific use-case: Erasing the first element. Well, once you got that you can also benefit while inserting an element in the front when there is free space left. If there is free space left then even inserting in the first half could benefit by shifting only the first half.
However, all this does not fit with the concept of capacity. With std::vector you know exactly how many elements you can add before a reallocation occurs: capacity() - size(). With your proposal this wouldn't hold any more. Erasing the first element would affect capacity in an odd way. It would complicate the interface and usage of vectors for all use cases.
Further, erasing elements anywhere else would still not be O(1). In total it would incur a cost and add complexity for any use of the vector, while bringing an advantage only in a very narrow use case.
If you do find yourself in the situation that you need to erase the front element very often, then you can either store the vector in reverse, and erasing the last element is already O(1), or use a different container.
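The "store the vector in reverse" idea can be sketched roughly as below. The class name ReversedQueue and its members are made up for illustration and are not part of the standard library; the only assumption is that you always remove from the logical front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the logical front is kept at the physical back of a
// std::vector, so removing the front element is a plain pop_back().
template <class T>
class ReversedQueue {
public:
    // Build from elements given in logical (front-to-back) order.
    explicit ReversedQueue(const std::vector<T>& logical_order)
        : data_(logical_order.rbegin(), logical_order.rend()) {}

    const T& front() const { return data_.back(); }
    void pop_front() { data_.pop_back(); }  // O(1), nothing is shifted
    std::size_t size() const { return data_.size(); }
    bool empty() const { return data_.empty(); }

    // Logical index i maps to physical index size() - 1 - i.
    const T& operator[](std::size_t i) const {
        assert(i < data_.size());
        return data_[data_.size() - 1 - i];
    }

private:
    std::vector<T> data_;
};
```

The price is that appending at the logical back now becomes the O(n) operation, so this only pays off when removal from the front dominates.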

Related

What advantages do vectors have over linked lists

I had the following exchange with my professor which wasn't very satisfying. I included my parts of the exchange which should be enough to get my point across.
"For vectors, does the C++ implementation traverse through each element of the old dynamically allocated array and free it?
(Edit: I mean, when resizing and adding elements, either by push_back or resize)
I am especially curious because the book tries to make the case that linked lists are troublesome because of having to traverse each time. Does not seem to me that vectors have a huge advantage in that regard.
The main benefits of vectors that I can see are convenience and fast access, but not much more. As in, every time you try to do something other than accessing, you will be traversing through everything to move and free memory. Is that correct?"
After his reply, I added.
"Professor xxxx,
I went out to test, and in fact, the addresses change if you either resize or push_back, so my assumption that the old addresses are freed is correct. I can only assume that program will have to go to each element to free it, and if that's correct, won't the insertion of new things be costly in terms of time, even more so than traversing linked lists?
Can you kindly correct the following statement if it states any incorrect facts or assumptions? Using vectors in any other way than using arrays (for any other purpose than accessing already stored data) means that linked lists will almost always be faster because, unlike linked lists, in vectors you would not only traverse through the elements, you would traverse through them, free them, and then create a whole new array to accommodate the new space. That is because the next address after the last element of the current vector could have a pointer variable pointing to it, and using that address would cause such extremely strange behavior that I cannot imagine the misery of the poor soul who tries to figure out what went wrong."
TL;DR:
Disadvantage of linked lists is traversing, but vector uses (push_back, resize(), etc) most often require the traversing anyway, so how are vectors exactly faster?
There are several things that are faster than you expect:
When a vector reallocates, the original elements are destroyed, not freed, one by one. Their storage is then freed all at once. This is as opposed to a linked list, where each node is allocated and freed individually. But this is somewhat moot because:
A vector batches reallocations. std::vector is specified to have an amortized constant insertion cost, which implies that it avoids reallocating every time you push_back, to the extent that this cost becomes negligible when considering complexity. Typical implementations multiply the vector's capacity by a fixed factor every time it is exceeded, so when performing the costly reallocation it provides room for the next several push_backs. These then do not need to traverse the vector or allocate anything.
A vector is extremely cache-friendly. This makes all sequential operations on a vector blazing fast, and can counter-intuitively outperform a linked list in many cases, especially in long-running applications where memory might become fragmented.
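To see the batching for yourself, something like the following sketch can help. It counts how often capacity() changes while appending; the function name count_reallocations is just an illustrative choice, and the exact count depends on your standard library's growth factor, which the standard does not fix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many times the capacity changes (i.e. a reallocation happened)
// while appending n ints one at a time to an initially empty vector.
std::size_t count_reallocations(std::size_t n) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With geometric growth the result for 100,000 appends should be a few dozen at most, rather than 100,000, which is what makes the per-append cost amortized constant.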
As a complement of the already given answer and comments...
The std::vector has its elements stored contiguously in memory while it is not the case of a linked list.
The direct consequence is that element access is trivial for a std::vector while it's not for a linked list.
For example, if I want to access the nth element of a linked list, I have to iterate through the list until reaching the desired element.
But on the other hand, the linked list will perform better if we want to insert a new element inside.
Indeed, for a linked list, we have to iterate until we reach the desired position, then we just have to change the connections between the previous and the next node so that the new element is inserted in-between.
For a std::vector, you have to relocate every element after the desired position (and reallocate if needed, i.e. if adding a new element exceeds the reserved available space).
So the std::vector is better for element access but is less efficient when inserting an element inside (same thing for removal).

Resize and copy of c++ array

If I have for example array of pointers (which is full) with 5 elements and I want to insert another element at second position, I would have to allocate another array (with size+1), copy first element from an old array, insert new element, and then copy remaining elements. This application can't waste any space.
This is the code so far:
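Sometype **newArray = new Sometype*[++Count];   // Count now includes the new element
size_t s = sizeof(Sometype*);                   // size of one array element (a pointer)
memcpy(newArray, Array, s * position);          // copy the elements before the insertion point
newArray[position] = new Sometype();            // place the new element
memcpy(newArray + position + 1, Array + position, s * (Count - position - 1)); // copy the rest
delete [] Array;                                // frees the old pointer array, not the pointed-to objects
Array = newArray;
Is there any more efficient method to do this thing because this is a bottleneck for my application? I'm new to C++ so I don't know any advanced stuff.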
Could vector be used for this purpose?
I think I read somewhere that it takes double of previous used space when it's resizing.
Is this true or this behavior can be modified?
If you can't waste any space and you have to stick to sequential containers, then I'm afraid this is the most efficient way. But I still don't believe that you can't waste any space. If you can anticipate in advance that you will need to later add 5 more elements, then having your array resized from the beginning will prove much more effective. In any case, you should use vector to avoid this awful C-style code and be more clear with your intents. You might want to take a look at std::vector<T>::reserve() function. Whether or not vector takes double of previous when it's resizing is unspecified and varies across implementations.
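As a rough sketch of what the vector version could look like: Sometype below is only a placeholder for the asker's type, and Table and insert_at are names invented for this example. It assumes C++11 and uses std::unique_ptr so the elements are released automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Sometype { int value = 0; };  // placeholder for the asker's type

using Table = std::vector<std::unique_ptr<Sometype>>;

// Inserts a freshly constructed element at `position`. The vector shifts the
// following pointers itself and reallocates only if capacity() is exhausted.
void insert_at(Table& table, std::size_t position) {
    assert(position <= table.size());
    table.insert(table.begin() + static_cast<std::ptrdiff_t>(position),
                 std::unique_ptr<Sometype>(new Sometype()));
}
```

If you know the final element count in advance, calling table.reserve(count) once up front means no later insert has to reallocate, and no more than the requested room is needed.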
Have a look at the standard containers std::vector, std::list, std::unordered_set and std::unordered_map

how c++ vector works

Let's say I have a vector v, which has 10 elements.
If I erase the first element (at index 0) using v.erase(v.begin()), how does the STL vector handle this?
Does it create a new vector, copy the elements from the old vector to the new one and deallocate the old one? Or does it copy each element starting from index 1 to index-1?
If I need to have a vector of size 100,000 at once and later I don't use that much space, lets say I only need a vector of size 10 then does it automatically reduce the size? ( I don't think so)
I looked online and there are only APIs and tutorials how to use STL library.
Is there any good references that I can have an idea of the implementation or complexity of STL library?
Actually, the implementation of vector is visible, since it's a template, so you can look into that for details:
iterator erase(const_iterator _Where)
    {   // erase element at where
    if (_Where._Mycont != this
        || _Where._Myptr < _Myfirst || _Mylast <= _Where._Myptr)
        _DEBUG_ERROR("vector erase iterator outside range");
    _STDEXT unchecked_copy(_Where._Myptr + 1, _Mylast, _Where._Myptr);
    _Destroy(_Mylast - 1, _Mylast);
    _Orphan_range(_Where._Myptr, _Mylast);
    --_Mylast;
    return (iterator(_Where._Myptr, this));
    }
Basically, the line
unchecked_copy(_Where._Myptr + 1, _Mylast, _Where._Myptr);
does exactly what you thought - copies the following elements over (or moves them in C++11 as bames53 pointed out).
To answer your second question, no, the capacity cannot decrease on its own.
The complexities of the algorithms in std can be found at http://www.cplusplus.com/reference/stl/ and the implementation, as previously stated, is visible.
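If you want to check this behaviour without reading library source, a small experiment along these lines may help. The function name erase_front_keeps_buffer is made up for this sketch; it relies on the fact that, in the common implementations, erase never reallocates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Erases the first element and reports whether the vector kept the same
// buffer and capacity, i.e. the elements were shifted in place.
bool erase_front_keeps_buffer(std::vector<int>& v) {
    assert(!v.empty());
    const int* buffer_before = v.data();
    const std::size_t capacity_before = v.capacity();
    v.erase(v.begin());  // remaining elements move one slot toward the front
    return v.data() == buffer_before && v.capacity() == capacity_before;
}
```

For a vector holding 10, 20, 30 this should return true and leave 20, 30 behind, with the capacity untouched.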
Does it copy each element starting from index 1 and copy the element to index-1 ?
Yes (though it actually moves them since C++11).
does it automatically reduce the size?
No. Reducing the capacity would typically invalidate iterators to existing elements, and that's only allowed on certain function calls.
I looked online and there are only APIs and tutorials how to use STL library. Is there any good references that I can have an idea of the implementation or complexity of STL library?
You can read the C++ specification which will tell you exactly what's allowed and what isn't in terms of implementation. You can also go look at your actual implementation.
Vector will copy (move in C++11) the elements to the beginning, that's why you should use deque if you would like to insert and erase from the beginning of a collection. If you want to truly resize the vector's internal buffer you can do this:
vector<Type>(v).swap(v);
This will hopefully make a temporary vector with the correct size, then swap its internal buffer with the old one; then the temporary goes out of scope and the large buffer gets deallocated with it.
As others noted, you may use vector::shrink_to_fit() in C++11.
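Both variants side by side, as a small sketch (the two wrapper function names are invented for illustration). Note that shrink_to_fit() is only a non-binding request, and the standard does not promise that the copy in the swap idiom allocates exactly size() elements, although common implementations do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pre-C++11 idiom: copy into a right-sized temporary, then swap buffers.
// The temporary (now holding the large buffer) is destroyed at the end of
// the statement.
void shrink_with_swap(std::vector<int>& v) {
    std::vector<int>(v).swap(v);
}

// C++11: a non-binding request to reduce capacity() to size().
void shrink_with_request(std::vector<int>& v) {
    v.shrink_to_fit();
}
```

A typical use is a vector that briefly held 100,000 elements and was then resized to 10: resize() alone leaves the capacity where it was, and either function above is how you ask for the memory back.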
That's one of my (many) objections to C++. Everybody says "use the standard libraries" ... but even when you have the STL source (which is freely available from many different places, including, in this case, the header file itself!) ... it's basically an incomprehensible nightmare to dig into and try to understand.
The (C-only) Linux kernel is a paragon of simplicity and clarity in contrast.
But we digress :)
Here's the 10,000-foot answer to your question:
http://www.cplusplus.com/reference/stl/vector/
Vector containers are implemented as dynamic arrays; Just as regular
arrays, vector containers have their elements stored in contiguous
storage locations, which means that their elements can be accessed not
only using iterators but also using offsets on regular pointers to
elements.
But unlike regular arrays, storage in vectors is handled
automatically, allowing it to be expanded and contracted as needed.
Vectors are good at:
Accessing individual elements by their position index (constant time).
Iterating over the elements in any order (linear time).
Add and remove elements from its end (constant amortized time).
Compared to arrays, they provide almost the same performance for these
tasks, plus they have the ability to be easily resized. Although, they
usually consume more memory than arrays when their capacity is handled
automatically (this is in order to accommodate extra storage space for
future growth).
Compared to the other base standard sequence containers (deques and
lists), vectors are generally the most efficient in time for accessing
elements and to add or remove elements from the end of the sequence.
For operations that involve inserting or removing elements at
positions other than the end, they perform worse than deques and
lists, and have less consistent iterators and references than lists.
...
Reallocations may be a costly operation in terms of performance, since
they generally involve the entire storage space used by the vector to
be copied to a new location. You can use member function
vector::reserve to indicate beforehand a capacity for the vector. This
can help optimize storage space and reduce the number of reallocations
when many enlargements are planned.
...
I only need a vector of size 10 then does it automatically reduce the size?
No it doesn't automatically shrink.
Traditionally you swap the vector with a new empty one: reduce the capacity of an stl vector
But C++11 includes std::vector::shrink_to_fit(), which does it directly.

why not implement c++ std::vector::pop_front() by shifting the pointer to vector[0]?

Why can't pop_front() be implemented for C++ vectors by simply shifting the pointer contained in the vector's name one spot over? So in a vector containing an array foo, foo is a pointer to foo[0], so pop_front() would make the pointer foo point to foo[1] and the bracket operator would just do the normal pointer math. Is this something to do with how C++ keeps track of the memory you're using for what when it allocates space for an array?
This is similar to other questions I've seen about why std::vector doesn't have a pop_front() function, I will admit, but I haven't seen anyone asking about why you can't shift the pointer.
The vector wouldn't be able to free its memory if it did this.
Generally, you want the overhead per vector object to be small. That means you only store three items: the pointer to the first element, the capacity, and the length.
In order to implement what you suggest, every vector ever (all of them) would need an additional member variable: the offset from the start pointer at which the zeroth element resides. Otherwise, the memory could not be freed, since the original handle to it would have been lost.
It's a tradeoff, but generally the memory consumption of an object which may have millions of instances is more valuable than the corner case of doing the absolute worst thing you can do performance-wise to the vector.
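To make the size argument concrete, here is a sketch with two made-up layouts. Neither struct is any real library's internals; they only illustrate that the pointer-shifting design needs one more member per vector object.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout similar in spirit to common std::vector implementations.
struct ThreePointerLayout {
    int* first;           // start of the allocation and of the elements
    int* last;            // one past the last element
    int* end_of_storage;  // one past the end of the allocated block
};

// Hypothetical layout that could support an O(1) pop_front().
struct FrontPoppableLayout {
    int* allocation;      // must be kept so the block can be freed later
    int* first;           // moves forward on each pop_front()
    int* last;
    int* end_of_storage;
};

static_assert(sizeof(FrontPoppableLayout) > sizeof(ThreePointerLayout),
              "the extra member makes every vector object larger");
```

On a typical 64-bit platform that is 32 bytes instead of 24 for every vector in the program, whether or not it ever removes from the front.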
Because implementers want to optimize the size of a vector. They usually use 3 pointers, one for the beginning, one for the capacity (the allocated end) and one for the end.
Doing what you require adds another pointer (4 or 8 bytes, depending on the platform) to every vector (and there are a lot of those in a C++ program) for very little benefit: the contract of vector is to be fast when pushing back new elements; removing and inserting elsewhere are "unusual" operations, and their performance matters less than the size of the class.
I started typing out an elaborate answer explaining how the memory is allocated and freed but after typing it all out I realized that memory issues alone don't justify why pop_front isn't there as other answers here suggested.
Having pop_front in a vector where the extra cost is another pointer is justifiable in most circumstances. The problem, in my opinion, is push_front. If the container has pop_front then it should also have push_front otherwise the container is not being consistent. push_front is definitely expensive for a vector container (unless you match your pushes with your pops which is not a good design). Without push_front the vector is really wasting memory if one does lots of pop_front operations with no push_front functionality.
Now the need for pop_front and push_front is there for a container that is similar to a vector (constant time random access) which is why there is deque.
You could, but it would complicate the implementation a bit, and add a pointer of overhead to the type's size (so it could track the actual allocation's address). Is that worth it? Sometimes. First consider other structures which may handle your usage better (maybe deque?).
You could do that, but vector is designed to be a simple container with constant time index lookups and push/pop from the end. Doing what you suggest would complicate the implementation as it would have to track the allocated beginning and the "current" beginning. Not to mention that you still couldn't guarantee constant time insertion at the front but you might get it sometimes.
If you need a container with constant time front and back insertion and removal, that's precisely what deque is for, there's no need to modify vector to handle it.
You can use std::deque instead of std::vector. It's a double-ended-queue with also the vector-like access members. It implements both front and back push/pop.
http://www.cplusplus.com/reference/stl/deque/
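A minimal sketch of the deque alternative; drop_front is a helper name invented for this example, everything else is the standard std::deque interface.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Removes up to `count` elements from the front of the deque.
// Each pop_front() is O(1); no remaining element is shifted.
void drop_front(std::deque<int>& d, std::size_t count) {
    while (count > 0 && !d.empty()) {
        d.pop_front();
        --count;
    }
}
```

Indexing with d[i] still works in constant time afterwards, which is the vector-like part of the interface; what you give up is the guarantee that the elements are contiguous in memory.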
Another shortcoming of your suggestion is that you'll waste memory, as you can't make use of the slots to the left of the new start after shifting. The more you execute pop_front(), the more you'll waste until the vector is destroyed.

array vs vector vs list

I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
array - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item using [] is faster.
stl vector - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item is slower than an array since it is a call to operator[].
stl list - insert and delete takes linear time since you need to iterate to a specific position before applying the insert/delete; additional space is needed for pointers; accessing an item is slower than an array since it is a linked list linear traversal.
Right now, my choice is to use an array. Is it justifiable? Or did I miss something?
Which is faster: traversing a list, then inserting a node or shifting items in an array to produce an empty position then inserting the item in that position?
What is the best way to measure this performance? Can I just display the timestamp before and after the operations?
Use STL vector. It provides an equally rich interface as list and removes the pain of managing memory that arrays require.
You will have to try very hard to expose the performance cost of operator[] - it usually gets inlined.
I do not have any number to give you, but I remember reading performance analysis that described how vector<int> was faster than list<int> even for inserts and deletes (under a certain size of course). The truth of the matter is that these processors we use are very fast - and if your vector fits in L2 cache, then it's going to go really really fast. Lists on the other hand have to manage heap objects that will kill your L2.
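If you would rather measure than take this on faith, a timing harness along these lines is one way to start. It is only a sketch: time_middle_inserts is a made-up name, the workload (inserting at the midpoint) is an arbitrary choice, and a serious benchmark would repeat runs and build with optimizations on.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Performs `inserts` insertions at the midpoint of container `c` and returns
// the elapsed wall-clock time in microseconds (steady_clock is monotonic).
template <class Container>
long long time_middle_inserts(Container& c, std::size_t inserts) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < inserts; ++i) {
        auto pos = c.begin();
        // O(1) for std::vector, an O(n) walk for std::list.
        std::advance(pos, static_cast<std::ptrdiff_t>(c.size() / 2));
        c.insert(pos, static_cast<int>(i));
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Call it once with a std::vector<int> and once with a std::list<int> of the same starting size and compare the two numbers on your own machine; the result depends heavily on element size and container length.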
Premature optimization is the root of all evil.
Based on your post, I'd say there's no reason to make your choice of data structure here a performance based one. Pick whatever is most convenient and return to change it if and only if performance testing demonstrates it's a problem.
It is really worth investing some time in understanding the fundamental differences between lists and vectors.
The most significant difference between the two is the way they store elements and keep track of them.
- Lists -
A list contains elements which have the addresses of the previous and next elements stored in them. This means that, once you have an iterator to the position, you can INSERT or DELETE an element anywhere in the list in constant time O(1) regardless of the list size. You can also splice (insert another list) into the existing list anywhere in constant time. The reason is that the list only needs to change two pointers (the previous and next) for the element being inserted.
Lists are not good if you need random access. If one plans to access the nth element in the list, one has to traverse the list one element at a time, which takes O(n) time.
- Vectors -
A vector contains elements in sequence, just like an array. This is very convenient for random access. Accessing the "nth" element in a vector is a simple pointer calculation (O(1) time). Adding elements to a vector is, however, different. If one wants to add an element in the middle of a vector, all the elements that come after that position have to be shifted one slot toward the end to make room for the new entry. The cost depends on the vector size and on the position of the new element. The worst case is inserting an element at the very beginning of the vector; the best case is appending a new element. Therefore insert takes O(n) time, where "n" is the number of elements that need to be moved, not necessarily the size of the vector.
There are other differences that involve memory requirements etc., but understanding these basic principles of how lists and vectors actually work is really worth spending some time on.
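The split between "finding the position" and "inserting there" for a list can be sketched like this. The helper name list_insert_at is invented for the example; std::advance and std::list::insert are the standard calls.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Inserts `value` before the element at `index` and returns an iterator to
// the new element.
std::list<int>::iterator list_insert_at(std::list<int>& l, std::size_t index, int value) {
    assert(index <= l.size());
    auto pos = l.begin();
    std::advance(pos, static_cast<std::ptrdiff_t>(index));  // O(index) walk
    return l.insert(pos, value);                            // O(1) relink
}
```

Iterators to other elements of the list stay valid across the insert, which is not true for a vector. If you already hold an iterator to the position, you skip the walk and pay only the constant-time part.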
As always ... "Premature optimization is the root of all evil" so first consider what is more convenient and make things work exactly the way you want them, then optimize. For 10 entries that you mention - it really does not matter what you use - you will never be able to see any kind of performance difference whatever method you use.
Prefer a std::vector over an array. Some advantages of vector are:
They allocate memory from the free space when increasing in size.
They are NOT a pointer in disguise.
They can increase/decrease in size run-time.
They can do range checking using at().
A vector knows its size, so you don't have to count elements.
The most compelling reason to use a vector is that it frees you from explicit memory management, and it does not leak memory. A vector keeps track of the memory it uses to store its elements. When a vector needs more memory for elements, it allocates more; when a vector goes out of scope, it frees that memory. Therefore, the user need not be concerned with the allocation and deallocation of memory for vector elements.
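The range checking mentioned in the list above could look like this in practice. The function name access_is_out_of_range is just for illustration; at() throwing std::out_of_range is standard behaviour, whereas operator[] performs no check.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if v.at(index) rejects the index by throwing std::out_of_range.
bool access_is_out_of_range(const std::vector<int>& v, std::size_t index) {
    try {
        (void)v.at(index);  // checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With a plain array, or with v[index], the same mistake is undefined behaviour rather than an exception you can catch.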
You're making assumptions you shouldn't be making, such as "accessing an item is slower than an array since it is a call to operator[]." I can understand the logic behind it, but neither you nor I can know until we profile it.
If you do, you'll find there is no overhead at all when optimizations are turned on. The compiler inlines the function calls. There is a difference in memory performance. An array is statically allocated, while a vector dynamically allocates. A list allocates per node, which can thrash the cache if you're not careful.
Some solutions are to have the vector allocate from the stack, and have a pool allocator for a list, so that the nodes can fit into cache.
So rather than worry about unsupported claims, you should worry about making your design as clean as possible. So, which makes more sense? An array, vector, or list? I don't know what you're trying to do so I can't answer you.
The "default" container tends to be a vector. Sometimes an array is perfectly acceptable too.
First a couple of notes:
A good rule of thumb about selecting data structures: Generally, if you examined all the possibilities and determined that an array is your best choice, start over. You did something very wrong.
STL lists don't support operator[], and if they did the reason that it would be slower than indexing an array has nothing to do with the overhead of a function call.
Those things being said, vector is the clear winner here. The call to operator[] is essentially negligible since the contents of a vector are guaranteed to be contiguous in memory. It supports insert() and erase() operations which you would essentially have to write yourself if you used an array. Basically it boils down to the fact that a vector is essentially an upgraded array which already supports all the operations you need.
I am maintaining a fixed-length table of 10 entries. Each item is a
structure of like 4 fields. There will be insert, update and delete
operations, specified by numeric position. I am wondering which is the
best data structure to use to maintain this table of information:
Based on this description it seems like list might be the better choice since it's O(1) when inserting and deleting in the middle of the data structure. Unfortunately you cannot use numeric positions when using lists to do inserts and deletes like you can for arrays/vectors. This dilemma leads to a slew of questions which can be used to make an initial decision of which structure may be best to use. This structure can later be changed if testing clearly shows it's the wrong choice.
The questions you need to ask are threefold. The first is how often you are planning on doing deletes/inserts in the middle relative to random reads. The second is how important using a numeric position is compared to an iterator. Finally, is order in your structure important?
If the answer to the first question is that random reads will be more prevalent, then a vector/array will probably work well. Note that iterating through a data structure is not considered a random read even if the operator[] notation is used. For the second question, if you absolutely require numeric positions then a vector/array will be required even though this may lead to a performance hit. Later testing may show this performance hit is negligible relative to the easier coding with numerical positions. Finally, if order is unimportant you can insert and delete in a vector/array with an O(1) algorithm. A sample algorithm is shown below.
#include <vector>

// Both helpers assume 0 <= index < vect.size() and do NOT preserve element order.
template <class T>
void erase(std::vector<T> & vect, int index) // note: vector cannot be const since you are changing it
{
    vect[index] = vect.back(); // overwrite the item at index with the item in the back
    vect.pop_back();           // delete the item in the back
}
template <class T>
void insert(std::vector<T> & vect, int index, T value) // note: vector cannot be const since you are changing it
{
    vect.push_back(vect[index]); // copy the item at index to the back of the vector
    vect[index] = value;         // replace the item at index with value
}
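Here is a self-contained variant of the same idea with bounds checks added. The names unordered_erase and unordered_insert are chosen for this sketch, and it assumes C++11 so that elements are moved rather than copied; like the original, it gives up element order to avoid shifting.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Removes the element at `index` in O(1) by moving the last element into its
// slot. Element order is NOT preserved.
template <class T>
void unordered_erase(std::vector<T>& v, std::size_t index) {
    assert(index < v.size());
    if (index + 1 != v.size()) {
        v[index] = std::move(v.back());
    }
    v.pop_back();
}

// Places `value` at `index` in amortized O(1) by moving the element that was
// there to the back. Element order is NOT preserved.
template <class T>
void unordered_insert(std::vector<T>& v, std::size_t index, T value) {
    assert(index <= v.size());
    if (index == v.size()) {
        v.push_back(std::move(value));
        return;
    }
    T displaced = std::move(v[index]);  // take it out before any reallocation
    v[index] = std::move(value);
    v.push_back(std::move(displaced));
}
```

Starting from 1, 2, 3, 4, erasing index 1 leaves 1, 4, 3, and then inserting 9 at index 0 gives 9, 4, 3, 1.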
I believe it depends on your needs. If you need many insertions or deletions at the beginning or in the middle, use a list (doubly linked internally). If you need random access and mostly add elements at the end, use an array. A vector allocates dynamically, so prefer a vector if you also need operations such as sorting or resizing.