Why do we have 2 ways like above to search for an element in the set?
Also find algorithm can be used to find an element in a list or a vector but what would be the harm in these providing a member function as well as member functions are expected to be faster than a generic algorithm?
Why do we need remove algorithm and create all the drama about erase remove where remove will just shift the elements and then use erase to delete the actual element..Just like STL list provides a member function remove why cant the other containers just offer a remove function and be done with it?
Binary_search in STL set over set's member function find?
Why do we have 2 ways like above to search for an element in the set?
Binary search returns a bool and set::find() and iterator. In order to compare apples to apples, the algorithm to compare set::find() with is std::lower_bound() which also returns an iterator.
You can apply std::lower_bound() on an arbitrary sorted range specified by a pair of (forward / bidirectional / random access) iterators and not only on a std::set. So having std::lower_bound() is justified. As std::set happens to be a sorted range, you can call
std::lower_bound(mySet.begin(), mySet.end(), value);
but the
mySet.find(value);
call is not only more concise, it is also more efficient. If you look into the implementation of std::lower_bound() you will find something like std::advance(__middle, __half); which has different complexity depending on the iterator (whether forward / bidirectional / random access iterator). In case of std::set, the iterators are bidirectional and advancing them has linear complexity, ouch! In contrast, std::set::find() is guaranteed to perform the search in logarithmic time complexity. The underlying implementation (which is a red and black tree in case of libstdc++) makes it possible. Offering a set::find() is also justified as it is more efficient than calling std::lower_bound() on std::set.
Also find algorithm can be used to find an element in a list or a
vector but what would be the harm in these providing a member function
as well as member functions are expected to be faster than a generic
algorithm?
I don't see how you could provide a faster member function for list or vector, unless the container is sorted (or possesses some special property).
Why do we need remove algorithm and create all the drama about erase
remove where remove will just shift the elements and then use erase to
delete the actual element..Just like STL list provides a member
function remove why cant the other containers just offer a remove
function and be done with it?
I can think of two reasons.
Yes, the STL is seriously lacking many convenience functions. I often feel like I live in a begin-end hell when using algorithms on an entire container; I often proved my own wrappers that accept a container, something like:
template <typename T>
bool contains(const std::vector<T>& v, const T& elem) {
return std::find(v.begin(), v.end(), elem) != v.end();
}
so that I can write
if (contains(myVector, 42)) {
instead of
if (std::find(myVector.begin(), myVector.end(), 42) != myVector.end()) {
Unfortunately, you quite often have to roll your own or use boost. Why? Because standardization is painful and slow so the standardization committee focuses on more important things. The people on the committee often donate their free time and are not paid for their work.
Now deleting elements from a vector can be tricky: Do you care about the order of your elements? Are your elements PODs? What are your exception safety requirements?
Let's assume you don't care about the order of your elements and you want to delete the i-th element:
std::swap(myVector[i], myVector.back());
myVector.pop_back();
or even simpler:
myVector[i] = myVector.back(); // but if operator= throws during copying you might be in trouble
myVector.pop_back();
In C++11 with move semantics:
myVector[i] = std::move(myVector.back());
myVector.pop_back();
Note that these are O(1) operations instead of O(N). These are examples of the efficiency and exception safety considerations that the standard committee leaves up to you. Providing a member function and "one size fits all" is not the C++ way.
Having said all these, I repeat I wish we had more convenience functions; I understand your problem.
I'll answer part of your question. The Erase-Remove idiom is from the book “Effective STL” written by Scott Meye. As to why remove() doesn't actually delete elements from the container, there is a good answer here, I just copy part of the answer:
The key is to realize that remove() is designed to work on not just a
container but on any arbitrary forward iterator pair: that means it
can't actually delete the elements, because an arbitrary iterator pair
doesn't necessarily have the ability to delete elements.
Why STL list provides a member function remove and why can't the other containers just offer a remove function and be done with it? I think it's because the idiom is more efficient than other methods to remove specific values from the contiguous-memory containers.
Related
I came across this answer to the question of removing elements by value in C++:
C++ Erase vector element by value rather than by position?
Basically:
vec.erase(std::remove(vec.begin(), vec.end(), valueToRemove), vec.end());
The answer makes sense, but isn't this bad style? The logic is consists of a double negative... is there a cleaner way to do this?
Deleting an element from a collection consists of two steps:
Moving down all subsequent elements to fill in the holes created by matches
Marking the new end
With the C++ standard library, these are two separate functions, remove and erase, respectively.
One could certainly imagine an erase_if type of function which would be easier to use, but evidently the current code is considered good enough. Of course you can write your own remove_if.
This isn't bad and in fact an efficient way of removing elements from a vector based on a condition in linear time. Watch this video from 35th minute. STL explanation for the Erase and Remove Idiom
Remember that there are different types of containers: Contiguous vs node-based, and sequential vs associative.
Node-based containers allow efficient erase/insert. Sequential containers organize elements by insertion order (i.e. position), while associative containers arrange them by (key) value.
All current associative containers (map/set/unordered) are node-based, and with them you can erase elements directly, and you should use the element-wise member erase function directly. Lists are node-based sequence containers, so you can erase individual elements efficiently, but finding an element by value takes linear time, which is why lists offer a member remove function. Only sequence containers (vector and deque) have no easy way to erase elements by value, and that's where the free remove algorithm comes in, which first rearranges the sequence to then allow the container's member erase to perform an efficient erasure at the end of the container.
Unlike the many generic aspects of the standard library which work without any knowledge of the underlying container, the copy/erase idiom is one of those things which require a bit of detail knowledge about the differences between the containers.
If I want to delete all the elements with a value from vector,I call remove algorithm and then call vector's erase member function to physically delete it.
But in the case of list , simple call remove member function and it will delete all elements with that value.
I am not sure why vector does't provide the remove MF while list does it.
For Exp: I want to delete value '4' from vector v.
vector<int> v;
vector<int> ::iterator Itr;
for (int i=0; i< 6; i++)
v.push_back(i*2);
v.push_back(4);
v.push_back(8);
v.push_back(4);
v.erase(remove(v.begin(),v.end(),4), v.end());
and for list:
list.remove(4); // will delete all the element which has value 4
The question is not why std::vector does not offer the operation, but rather why does std::list offer it. The design of the STL is focused on the separation of the containers and the algorithms by means of iterators, and in all cases where an algorithm can be implemented efficiently in terms of iterators, that is the option.
There are, however, cases where there are specific operations that can be implemented much more efficiently with knowledge of the container. That is the case of removing elements from a container. The cost of using the remove-erase idiom is linear in the size of the container (which cannot be reduced much), but that hides the fact that in the worst case all but one of those operations are copies of the objects (the only element that matches is the first), and those copies can represent quite a big hidden cost.
By implementing the operation as a method in std::list the complexity of the operation will still be linear, but the associated cost for each one of the elements removed is very low, a couple of pointer copies and releasing of a node in memory. At the same time, the implementation as part of the list can offer stronger guarantees: pointers, references and iterators to elements that are not erased do not become invalidated in the operation.
Another example of an algorithm that is implemented in the specific container is std::list::sort, that uses mergesort that is less efficient than std::sort but does not require random-access iterators.
So basically, algorithms are implemented as free functions with iterators unless there is a strong reason to provide a particular implementation in a concrete container.
I believe the rationale is that std::list offers a very efficient remove method (if implemented as a doubly linked listed it just adjusts the pointers to the element and deallocates its storage), while element removal for std::vector is comparably slow.
The remove+erase trick is the standard way which works for all container types that offer the required iterator type. Presumably, the designers of the STL wanted to point out this difference. They could have opted to give all containers a remove method, which would be remove+erase for all containers except those who knew a better way, but they didn't.
This seems to violate the idea of generic code at a first glance, but I don't think it really does. Having a simple, generic remove that is easily accessible makes it easy to write generic code that compiles fine with all container types, but at the end generic code that would run extremely slow in 9 out of 10 cases and blazingly fast in the tenth is not truly generic, it just looks so.
The same pattern can be found at std::map::find vs. the generic std::find.
Removing an element from a vector is much slower than doing so for a list: it is (on average) proportional to the size of the vector, whereas the operation on a list is executed in constant time.
The designers of the standard library decided not to include this feature under the principle of "things that look easy should BE (computationally) easy".
I'm not sure whether I agree with this philosophy, but it's considered a design goal for C++.
Because dropping item from a list is cheap, while doing so on a vector is expensive - all following elements have to be shifted, i.e. copied/moved.
I imagine its due to efficiency, its slower to remove random elements from a vector than it is from a list. It might mean a little more typing for situations like this, but at least its obvious in the interface that the std::vector isn't the best data structure if you need to do this often.
There is a sort() method for lists in STL. Which is absurd, because I would be more inclined to sort an array/vector.
Why isn't sort() provided for vector? Is there some underlying philosophy behind the creation of the vector container or its usage, that sort is not provided for it?
As has already been said, the standard library provides a nonmember function template that can sort any range given a pair of random access iterators.
It would be entirely redundant to have a member function to sort a vector. The following would have the same meaning:
std::sort(v.begin(), v.end());
v.sort();
One of the first principles of the STL is that algorithms are not coupled to containers. How data is stored and how data is manipulated should be as loosely coupled as possible.
Iterators are used as the interface between containers (which store data) and algorithms (which operate on the data). In this way, you can write an algorithm once and it can operate on containers of various types, and if you write a new container, the existing generic algorithms can be used to manipulate its contents.
The reason that std::list provides its own sort function as a member function is that it is not a random accessible container; it only provides bidirectional iterators (since it is intended to represent a doubly linked list, this makes sense). The generic std::sort function requires random access iterators, so you cannot use it with a std::list. std::list provides its own sort function in order that it can be sorted.
In general, there are two cases in which a container should implement an algorithm:
If the generic algorithm cannot operate on the container, but there is a different, container-specific algorithm that can provide the same functionality, as is the case with std::list::sort.
If the container can provide a specific implementation of the algorithm that is more efficient than the generic algorithm, as is the case with std::map::find, which allows an element to be found in the map in logarithmic time (the generic std::find algorithm performs a linear search because it cannot assume the range is sorted).
There are already interesting elements of answer, but there is actually more to be said about the question: while the answer to "why doesn't std::vector has a sort member function?" is indeed "because the standard library provides member functions only when they offer more than generic algorithms", the real interesting question is "why does std::list have a sort member function?", and a few things haven't been explained yet: yes, std::sort only works with random-access iterators and std::list only provides bidirectional iterators, but even if std::sort worked with bidirectional iterators, std::list::sort would still offer more. And here is why:
First of all, std::list::sort is stable, while std::sort isn't. Of course there is still std::stable_sort, but it doesn't work with bidirectional iterators either.
std::list::sort generally implements a mergesort, but it knows that it is sorting a list and can relink nodes instead of copying things. A list-aware mergesort can sort the list in O(n log n) time with only O(log n) additional memory, while your typical mergesort (such as std::stable_sort) uses O(n) additional memory or has a O(n log² n) complexity.
std::list::sort doesn't invalidate the iterators. If an iterator was pointing to a specific object in the list, it will still be pointing to the same object after the sort, even if its position in the list isn't the same than before the sort.
Last but not least, std::list::sort doesn't move or swap the objects around since it only relinks nodes. That means that it might be more performant when you need to sort objects that are expensive to move/swap around, but also that it can sort a list of objects that aren't even moveable, which is totally impossible for std::sort!
Basically, even if std::sort and std::stable_sort worked with bidirectional or forward iterators (and it would totally be possible, we know sorting algorithms that work with them), they still couldn't offer everything std::list::sort has to offer, and they couldn't relink nodes either since the standard library algorithms aren't allowed to modify the container, only the pointed values (relinking nodes counts as modifying the container). On the other hand, a dedicated std::vector::sort method wouldn't offer anything interesting, so the standard library doesn't provide one.
Note that everything that has been said about std::list::sort is also true for std::forward_list::sort.
A vector-specific sort would provide no advantage over std::sort from <algorithm>. However, std::list provides its own sort because it can use the special knowledge of how list is implemented to sort items by manipulating the links instead of copying objects.
You can easily sort a vector with:
sort(v.begin(), v.end());
UPDATE: (answer to the comment): Well, they have certainly provided it by default. The difference is that it's not a member function for vector. std::sort is a generic algorithm that's supposed to work for anything that provides iterators. However, it really expects a random access iterator to sort efficiently. std::list, being a linked list, cannot provide random access to its elements efficiently. That's why it provides its own specialized sort algorithm.
std::sort() in <algorithm> does sorting on containers with random access iterators like std::vector.
There is also std::stable_sort().
edit - why does std::list have its own sort() function versus std::vector?
std::list is different from both std::vector and std::deque (both random access iterable) in how it's implemented, so it contains its own sort algorithm that is specialized for its implementation.
Answering the question of "Why?"
A sorting algorithm for std::vector is the same as sorting a native array and is the same (probably) as sorting a custom vector class.
The STL was designed to separate containers and algorithms, and to have an efficient mechanism for applying an algorithm to data that has the right characteristics.
This lets you write a container that might have specific characteristics, and to get the algorithms free. Only where there is some special characteristic of the data that means the standard algorithm is unsuitable is a custom implementation supplied, as in the case of std::list.
what's the alternative?
Should I write by myself?
There is the std::find() algorithm, which performs a linear search over an iterator range, e.g.,
std::vector<int> v;
// Finds the first element in the vector that has the value 42:
// If there is no such value, it == v.end()
std::vector<int>::const_iterator it = std::find(v.begin(), v.end(), 42);
If your vector is sorted, you can use std::binary_search() to test whether a value is present in the vector, and std::equal_range() to get begin and end iterators to the range of elements in the vector that have that value.
The reason there is no vector::find is because there is no algorithmic advantage over std::find (std::find is O(N) and in general, you can't do better for vectors).
But the reason you have map::find is because it can be more efficient (map::find is O(log N) so you would always want to use that over std::find for maps).
Who told you that? There's is "find" algorithm for vector in C++. Ironically Coincidentally, it is called std::find. Or maybe std::binary_search. Or something else, depending on the properties of the data stored in your vector.
Containers get their own specific versions of generic algorithms (implemented as container methods) only when the effective implementation of the algorithm is somehow tied to the internal details of the container. std::list<>::sort would be one example.
In all other cases, the algorithms are implemented by standalone functions.
Having a 'find' functionality in the container class violates 'SRP' (Single Responsibility Principle). A container's core functionality is to provide interfaces for storage, retrieval of elements in the container. 'Finding', 'Sorting', 'Iterating' etc are not core functionality of any container and hence not part of it's direct interface.
However as 'Herb' states in Namespace Principle, 'find' is a part of the interface by being defined in the same namespace as 'vector' namely 'std'.
Use std::find(vec.begin(), vec.end(), value).
And don't forget to include <algorithm>
what's the alternative?
The standard offers std::find, for sequential search over arbitrary sequences of like-elements (or something like that).
This can be applied to all containers supporting iterators, but for internally sorted containers (like std::map) the search can be optimized. In that case, the container offers it's own find member function.
why there is no find for vector in C++?
There was no point in creating a std::vector<???>::find as the implementation would be identical to std::find(vector.begin(), vector.end(), value_to_find);.
Should I write by myself?
No. Unless you have specific limitations or requirements, you should use the STL implementation whenever possible.
I just wanted to know what is the main advantage of using the iterators over the array indices. I have googled but i am not getting the right answer.
I presume you are talking about when using a vector, right?
The main advantage is that iterator code works for all stl containers, while the array indexing operator [] is only available for vectors and deques. This means you are free to change the underlying container if you need to without having to recode every loop. It also means you can put your iteration code in a template and it will work for any container, not just for deques and vectors (and arrays of course).
All of the standard containers provide the iterator concept. An iterator knows how to find the next element in the container, especially when the underlying structure isn't array-like. Array-style operator[] isn't provided by every container, so getting in the habit of using iterators will make more consistent-looking code, regardless of the container you choose.
You can abstract the collection implementation away.
To expand upon previous answers:
Writing a loop with operator[] constrains you to a container that supports [] and uses the same index/size type. Otherwise you'd need to rewrite every loop to change the container.
Even if your container supports [], it may not be optimal for sequential traversing. [] is basically a random-access operator, which is O(1) for vector but could be as bad as O(n) depending on the underlying container.
This is a minor point, but if you use iterators, your loop could be more easily moved to using the standard algorithms, e.g. std::for_each.
There are many data structures, e.g. hash tables and linked lists cannot be naturally or quickly indexed, but they are indeed traversable. Iterators act as an interface that let you walk on any data structure without knowing the actual implementation of the source.
The STL contains algorithms, such as transform and for_each that operate on containers. They don't accept indices, but use iterators.
Iterators help hide the container implementation and allow the programmer to focus more on the algorithm. The for_each function can be applied to anything that supports a forward iterator.
As well as the points in other answers, iterators can also be faster (specifically compared to operator[]), since they are essentially iteration by pointer. If you do something like:
for (int i = 0; i < 10; ++i)
{
my_vector[i].DoSomething();
}
Every iteration of the loop unnecessarily calculates my_vector.begin() + i. If you use iterators, incrementing the iterator means it's already pointing to the next element, so you don't need that extra calculation. It's a small thing, but can make a difference in tight loops.
One other slight difference is that you can't use erase() on an element in a vector by index, you must have an iterator. No big deal since you can always do "vect.begin() + index" as your iterator, but there are other considerations. For example, if you do this then you must always check your index against size() and not some variable you assigned that value.
None of that is really too much worth worrying about but if given the choice I prefer iterator access for the reasons already stated and this one.
I would say it's more a matter of consistency and code reuse.
Consistency in that you will use all other containers with iterators
Code reuse in that algorithms written for iterators cannot be used with the subscript operator and vice versa... and the STL has lots of algorithms so you definitely want to build on it.
Finally, I'd like to say that even C arrays have iterators.
const Foo* someArray = //...
const Foo* otherArray = new Foo[someArrayLength];
std::copy(someArray, someArray + someArrayLength, otherArray);
The iterator_traits class has been specialized so that pointers or a model of RandomAccessIterator.