why Vector doesn't provide the remove() member function while list provides? - c++

If I want to delete all the elements with a value from vector,I call remove algorithm and then call vector's erase member function to physically delete it.
But in the case of list , simple call remove member function and it will delete all elements with that value.
I am not sure why vector does't provide the remove MF while list does it.
For Exp: I want to delete value '4' from vector v.
vector<int> v;
vector<int> ::iterator Itr;
for (int i=0; i< 6; i++)
v.push_back(i*2);
v.push_back(4);
v.push_back(8);
v.push_back(4);
v.erase(remove(v.begin(),v.end(),4), v.end());
and for list:
list.remove(4); // will delete all the element which has value 4

The question is not why std::vector does not offer the operation, but rather why does std::list offer it. The design of the STL is focused on the separation of the containers and the algorithms by means of iterators, and in all cases where an algorithm can be implemented efficiently in terms of iterators, that is the option.
There are, however, cases where there are specific operations that can be implemented much more efficiently with knowledge of the container. That is the case of removing elements from a container. The cost of using the remove-erase idiom is linear in the size of the container (which cannot be reduced much), but that hides the fact that in the worst case all but one of those operations are copies of the objects (the only element that matches is the first), and those copies can represent quite a big hidden cost.
By implementing the operation as a method in std::list the complexity of the operation will still be linear, but the associated cost for each one of the elements removed is very low, a couple of pointer copies and releasing of a node in memory. At the same time, the implementation as part of the list can offer stronger guarantees: pointers, references and iterators to elements that are not erased do not become invalidated in the operation.
Another example of an algorithm that is implemented in the specific container is std::list::sort, that uses mergesort that is less efficient than std::sort but does not require random-access iterators.
So basically, algorithms are implemented as free functions with iterators unless there is a strong reason to provide a particular implementation in a concrete container.

I believe the rationale is that std::list offers a very efficient remove method (if implemented as a doubly linked listed it just adjusts the pointers to the element and deallocates its storage), while element removal for std::vector is comparably slow.
The remove+erase trick is the standard way which works for all container types that offer the required iterator type. Presumably, the designers of the STL wanted to point out this difference. They could have opted to give all containers a remove method, which would be remove+erase for all containers except those who knew a better way, but they didn't.
This seems to violate the idea of generic code at a first glance, but I don't think it really does. Having a simple, generic remove that is easily accessible makes it easy to write generic code that compiles fine with all container types, but at the end generic code that would run extremely slow in 9 out of 10 cases and blazingly fast in the tenth is not truly generic, it just looks so.
The same pattern can be found at std::map::find vs. the generic std::find.

Removing an element from a vector is much slower than doing so for a list: it is (on average) proportional to the size of the vector, whereas the operation on a list is executed in constant time.
The designers of the standard library decided not to include this feature under the principle of "things that look easy should BE (computationally) easy".
I'm not sure whether I agree with this philosophy, but it's considered a design goal for C++.

Because dropping item from a list is cheap, while doing so on a vector is expensive - all following elements have to be shifted, i.e. copied/moved.

I imagine its due to efficiency, its slower to remove random elements from a vector than it is from a list. It might mean a little more typing for situations like this, but at least its obvious in the interface that the std::vector isn't the best data structure if you need to do this often.

Related

std::vector::insert vs std::list::operator[]

I know that std::list::operator[] is not implemented as it has bad performance. But what about std::vector::insert it is as much inefficient as much std::list::operator[] is. What is the explanation behind?
std::vector::insert is implemented because std::vector has to meet requirements of SequenceContainer concept, while operator[] is not required by any concepts (that I know of), possible that will be added in ContiguousContainer concept in c++17. So operator[] added to containers that can be used like arrays, while insert is required by interface specification, so containers that meet certain concept can be used in generic algorithms.
I think Slava's answer does the best job, but I'd like to offer a supplementary explanation. Generally with data structures, it's far more common to have more accesses than insertions, than vice versa. There are many data structures that may try to optimize access at the cost of insertion, but the inverse is much more rare, because it tends to be less useful in real life.
operator[], if implemented for a linked list, would access an existing value presumably (similar to how it does for vector). insert adds new values. It's much more likely you will be willing to take a big performance hit to insert some new elements into a vector, provided that the subsequent accesses are extremely fast. In many cases, element insertion may be outside of the critical path entirely whereas the critical path consists of a single traversal, or random access of already-present data. So it's simply convenient to have insert take care of the details for you in that case (it's actually a bit annoying to write efficiently and correctly). This is actually a not-uncommon use of a vector.
On the other hand, using operator[] on a linked list would almost always be a sign that you are using the wrong data structure.
std::list::operator[] would require an O(N) traversal and is not really in accordance with what a list is designed to do. If you need operator[] then use a different container type. When C++ folk see a [] they assume an O(1) (or, at worse, an O(Log N)) operation. Supplying [] for a list would break that.
But although std::vector::insert is also O(N), it can be optimised: an at-end insertion can be readily optimised by having the vector's capacity grow in large chunks. An insertion in the middle requires an element-by-element move, but that again can be performed very quickly on modern chipsets.
The [] operator is inherited from plain arrays. It has always been understood as a fast (sub linear time) accessor of the underlying container. Since list is does not support sub linear time access, it does not make sense for it to implement the operator.
auto a = Container [10]; // Ideally I can assume this is quick
The equivalent of your std::list <>::operator [] is std::next <std::list<>::iterator>. Documented at cpp-reference.
auto a = *std::next (Container.begin (), 10); // This may take a little while
This is the truly generic way index a container. If the container supports random access, it will be constant time, other wise it will be linear.

vector vs. list from stl - remove method

std::list has a remove method, while the std::vector doesn't. What is the reason for that?
std::list<>::remove is a physical removal method, which can be implemented by physically destroying list elements that satisfy certain criteria (by physical destruction I mean the end of element's storage duration). Physical removal is only applicable to lists. It cannot be applied to arrays, like std::vector<>. It simply is not possible to physically end storage duration of an individual element of an array. Arrays can only be created and destroyed as a whole. This is why std::vector<> does not have anything similar to std::list<>::remove.
The universal removal method applicable to all modifiable sequences is what one might call logical removal: the target elements are "removed" from the sequence by overwriting their values with values of elements located further down in the sequence. I.e. the sequence is shifted and compacted by copying the persistent data "to the left". Such logical removal is implemented by freestanding functions, like std::remove. Such functions are applicable in equal degree to both std::vector<> and std::list<>.
In cases where the approach based on immediate physical removal of specific elements applies, it will work more efficiently than the generic approach I referred above as logical removal. That is why it was worth providing it specifically for std::list<>.
std::list::remove removes all items in a list that match the provided value.
std::list<int> myList;
// fill it with numbers
myList.remove(10); // physically removes all instances of 10 from the list
It has a similar function, std::list::remove_if, which allows you to specify some other predicate.
std::list::remove (which physically removes the elements) is required to be a member function as it needs to know about the memory structure (that is, it must update the previous and next pointers for each item that needs to be updated, and remove the items), and the entire function is done in linear time (a single iteration of the list can remove all of the requested elements without invalidating any of the iterators pointing to items that remain).
You cannot physically remove a single element from a std::vector. You either reallocate the entire vector, or you move every element after the removed items and adjust the size member. The "cleanest" implementation of that set of operations would be to do
// within some instance of vector
void vector::remove(const T& t)
{
erase(std::remove(t), end());
}
Which would require std::vector to depend on <algorithm> (something that is currently not required).
As the "sorting" is needed to remove the items without multiple allocations and copies being required. (You do not need to sort a list to physically remove elements).
Contrary to what others are saying, it has nothing to do with speed. It has to do with the algorithm needing to know how the data is stored in memory.
As a side note: This is also a similar reason why std::remove (and friends) do not actually remove the items from the container they operate on; they just move all the ones that are not going to be removed to the "front" of the container. Without the knowledge of how to actually remove an object from a container, the generic algorithm cannot actually do the removing.
Consider the implementation details of both containers. A vector has to provide a continuous memory block for storage. In order to remove an element at index n != N (with N being the vector's length), all elements from n+1 to N-1 need to be moved. The various functions in the <algorithm> header implement that behavior, like std::remove or std::remove_if. The advantage of these being free-standing functions is that they can work for any type that offers the needed iterators.
A list on the other hand, is implemented as a linked list structure, so:
It's fast to remove an element from anywhere
It's impossible to do it as efficiently using iterators (since the internal structure has to be known and manipulated).
In general in STL the logic is "if it can be done efficiently - then it's a class member. If it's inefficient - then it's an outside function"
This way they make the distinction between "correct" (i.e. "efficient") use of classes vs. "incorrect" (inefficient) use.
For example, random access iterators have a += operator, while other iterators use the std::advance function.
And in this case - removing elements from an std::list is very efficient as you don't need to move the remaining values like you do in std::vector
It's all about efficiency AND reference/pointer/iterator validity. list items can be removed without disturbing any other pointers and iterators. This is not true for a vector and other containers in all but the most trivial cases. Nothing prevents use the external strategy, but you have a superior options.. That said this fellow said it better than I could on a duplicate question
From another poster on a duplicate question:
The question is not why std::vector does not offer the operation, but
rather why does std::list offer it. The design of the STL is focused
on the separation of the containers and the algorithms by means of
iterators, and in all cases where an algorithm can be implemented
efficiently in terms of iterators, that is the option.
There are, however, cases where there are specific operations that can
be implemented much more efficiently with knowledge of the container.
That is the case of removing elements from a container. The cost of
using the remove-erase idiom is linear in the size of the container
(which cannot be reduced much), but that hides the fact that in the
worst case all but one of those operations are copies of the objects
(the only element that matches is the first), and those copies can
represent quite a big hidden cost.
By implementing the operation as a method in std::list the complexity
of the operation will still be linear, but the associated cost for
each one of the elements removed is very low, a couple of pointer
copies and releasing of a node in memory. At the same time, the
implementation as part of the list can offer stronger guarantees:
pointers, references and iterators to elements that are not erased do
not become invalidated in the operation.
Another example of an algorithm that is implemented in the specific
container is std::list::sort, that uses mergesort that is less
efficient than std::sort but does not require random-access iterators.
So basically, algorithms are implemented as free functions with
iterators unless there is a strong reason to provide a particular
implementation in a concrete container.
std::list is designed to work like a linked list. That is, it is designed (you might say optimized) for constant time insertion and removal ... but access is relatively slow (as it typically requires traversing the list).
std::vector is designed for constant-time access, like an array. So it is optimized for random access ... but insertion and removal are really only supposed to be done at the "tail" or "end", elsewhere they're typically going to be much slower.
Different data structures with different purposes ... therefore different operations.
To remove an element from a container you have to find it first. There's a big difference between sorted and unsorted vectors, so in general, it's not possible to implement an efficient remove method for the vector.

Is there ever a reason to use std::list? [duplicate]

This question already has answers here:
Under what circumstances are linked lists useful?
(17 answers)
Closed 9 years ago.
After having read this question and looking at some results here, it seems like one should altogether completely avoid lists in C++. I always expected that linked lists would be the containers of choice for cases where I only need to iterate over all the contents because insertion is a matter of pointer manipulation and there is never a need to reallocate.
Apparently, because of "cache locality," lists are iterated over very slowly, so any benefit from having to use less reserve memory or faster addition (which is not that much faster, it seems, from the second link) doesn't seem worth it.
Having said that, when should I, from a performance standpoint, use std::list over std::deque or, if possible, std::vector?
On a side note, will std::forward_list also have lots of cache misses?
std::list is useful in a few corner cases.
However, the general rule of C++ sequential containers is "if your algorithm are compatible, use std::vector. If your algorithms are not compatible, modify your algorithms so you can use std::vector."
Exceptions exist, and here is an attempt to exhaustively list reasons that std::list is a better choice:
When you need to (A) insert into the middle of the container and (B) you need each objects location in memory to be stable. Requirement (B) can usually be removed by having a non-stable container consisting of pointers to elements, so this isn't a strong reason to use std::list.
When you need to (A) insert or delete in the middle of the container (B) orders of magnitude more often than you need to iterate over the container. This is also an extreme corner case: in order to find the element to delete from a list, you usually need to iterate!
Which leads to
you need (A) insert or delete in the middle of the container and (B) have iterators to all other elements remain valid. This ends up being a hidden requirement of case 1 and case 2: it is hard to delete or insert more often than you iterate when you don't have persistant iterators, and the stability of iterators and of objects is highly related.
the final case, is the case of splicing was once a reason to use std::list.
Back in C++03, all versions of std::list::splice could (in theory) be done in O(1) time. However, the extremely efficient forms of splice required that size be an O(n) operation. C++11 has required that size on a list be O(1), so splice's extreme efficiency is limited to the "splicing an entire other list" and "self-splicing a sublist" case. In the case of single element splice, this is just an insert and delete. In the case of sub range splice, the code now has to visit every node in the splice just to count them in order to maintain size as O(1) (except in self-splicing).
So, if you are doing only whole-list splices, or self-list subrange splices, list can do these operations much faster than other non-list containers.
When should I, from a performance standpoint, use std::list
From a performance standpoint, rarely. The only situation that comes to mind is if you have many lists that you need to split and join to form other lists; a linked list can do this without allocating memory or moving objects.
The real benefit of list is stability: elements don't need to be movable, and iterators and references are never invalidated unless they refer to an element that's been erased.
On a side note, will std::forward_list also have lots of cache misses?
Yes; it's still a linked list.
The biggest improvement that std::list can provide is when you're moving one or more elements from the middle of one list into another list. This splice operation is extremely efficient on list while it may involve allocation and movement of items in random access containers such as vector. That said, this comes of very rarely and much of the time vector is your best container in terms of performance an simplicity.
Always profile your code if you suspect performance problems with a container choice.
You chose std::list over std::deque and std::vector (and other contiguous memory containers) when you frequently add/remove items in the middle of your container, see also.

Iterating through a sequence while modifying it. Use vector or List ? C++/ STL

Suppose I have a long sequence of unordered elements S s1, s2, s3,.... of a arbitrary but fixed data type through which I wish to iterate and delete certain elements according to some boolean criterion.
Now if after iterating through the sequence if I am not interested in the final ordering of the sequence then I can store my sequence in 2 ways
Use a plain ol' std::list to represent the sequence. Perform removal with the std::list methods.
Use a std::vector to represent the sequence. If a certain element fails the criterion and has to be deleted swap it with the last vector element and perform a pop_back.
My questions are
1.Which would be a better/efficient way timewise and/or memorywise to store my sequence?
2.If I had to venture a guess, then I would say list, because if si 's data-type memory size is large, swapping would be expensive. Would this reasoning be correct?
In practice, std::vector has a great performance advantage over other containers due to its tight memory locality. If your elements are moreover movable (i.e. inexpensive to swap), then your second option should be your first try. Implement it with the standard remove/erase idiom:
v.erase(std::remove_if(v.begin(), v.end(), predicate), v.end());
You should also set up a second version with a std::list and compare the performance:
l.remove_if(predicate);
The list avoids moving any elements around, so in theory it could be efficient, however the practical effects of memory locality cannot be captured by the language standard and you cannot get around measuring and comparing the actual performance.
(Supposedly, if your element type is huge, like sizeof(T) > 10000, the list will probably start being faster than the vector. Test and compare, and keep your code modular such that changing this later is easy.)
If you have a C++11 compiler, or atleast an rvalue reference aware one, using swap will cost you nothing if your data type isn't flat (i.e., contains pointers to external resources or in other words, is expensive to copy) since it will just move your structs around. So if you have such a compiler, create a move constructor (read up on that) for the data type, and you're set. Just use a std::vector from there on.
Now if your structs are flat (no external resources), and are large, you might really want to use a std::list, since the memory overhead would be reasonably small in comparision to your data type's size. Since you only seem to be interested in bidirectional/sequential access to the elements, this might be just the right place to use a list.
A last point, and an important factor, measure. The default container to reach for should always be std::vector. Measure how both perform before blindly deciding on one. Another important factor is if you actually need to do anything else with the containers, like random-access or such stuff.
Edit: Before I forget, you might also just want to create a view over the container holding your data, which might be very cheap.
We can only guess. I'd say that if the objects are easily copiable (e.g. basic types) then std::vector will be more efficient, as removing elements will not alloc/realloc/free any memory. But if the cost of copying elements is significant, then the std::list will be better.
But note that with C++11, the copy will be converted into a move, so you should consider the moving cost, that will be presumably quite less than the copy.
In almost all practical cases, use std::vector initially. As always, write your code first then optimise later, if and when it is needed. If your profiler indicates that vector's inefficiencies are the cause, then try a list. I've almost never seen a performance benefit from it though.

Use of iterators over array indices

I just wanted to know what is the main advantage of using the iterators over the array indices. I have googled but i am not getting the right answer.
I presume you are talking about when using a vector, right?
The main advantage is that iterator code works for all stl containers, while the array indexing operator [] is only available for vectors and deques. This means you are free to change the underlying container if you need to without having to recode every loop. It also means you can put your iteration code in a template and it will work for any container, not just for deques and vectors (and arrays of course).
All of the standard containers provide the iterator concept. An iterator knows how to find the next element in the container, especially when the underlying structure isn't array-like. Array-style operator[] isn't provided by every container, so getting in the habit of using iterators will make more consistent-looking code, regardless of the container you choose.
You can abstract the collection implementation away.
To expand upon previous answers:
Writing a loop with operator[] constrains you to a container that supports [] and uses the same index/size type. Otherwise you'd need to rewrite every loop to change the container.
Even if your container supports [], it may not be optimal for sequential traversing. [] is basically a random-access operator, which is O(1) for vector but could be as bad as O(n) depending on the underlying container.
This is a minor point, but if you use iterators, your loop could be more easily moved to using the standard algorithms, e.g. std::for_each.
There are many data structures, e.g. hash tables and linked lists cannot be naturally or quickly indexed, but they are indeed traversable. Iterators act as an interface that let you walk on any data structure without knowing the actual implementation of the source.
The STL contains algorithms, such as transform and for_each that operate on containers. They don't accept indices, but use iterators.
Iterators help hide the container implementation and allow the programmer to focus more on the algorithm. The for_each function can be applied to anything that supports a forward iterator.
As well as the points in other answers, iterators can also be faster (specifically compared to operator[]), since they are essentially iteration by pointer. If you do something like:
for (int i = 0; i < 10; ++i)
{
my_vector[i].DoSomething();
}
Every iteration of the loop unnecessarily calculates my_vector.begin() + i. If you use iterators, incrementing the iterator means it's already pointing to the next element, so you don't need that extra calculation. It's a small thing, but can make a difference in tight loops.
One other slight difference is that you can't use erase() on an element in a vector by index, you must have an iterator. No big deal since you can always do "vect.begin() + index" as your iterator, but there are other considerations. For example, if you do this then you must always check your index against size() and not some variable you assigned that value.
None of that is really too much worth worrying about but if given the choice I prefer iterator access for the reasons already stated and this one.
I would say it's more a matter of consistency and code reuse.
Consistency in that you will use all other containers with iterators
Code reuse in that algorithms written for iterators cannot be used with the subscript operator and vice versa... and the STL has lots of algorithms so you definitely want to build on it.
Finally, I'd like to say that even C arrays have iterators.
const Foo* someArray = //...
const Foo* otherArray = new Foo[someArrayLength];
std::copy(someArray, someArray + someArrayLength, otherArray);
The iterator_traits class has been specialized so that pointers or a model of RandomAccessIterator.