Idiomatic way to cheaply remove an element from an arbitrary container? - c++

All C++ standard library containers have an insert() method; yet they don't all have a remove() method which does not take any arguments, but performs the cheapest possible removal at arbitrary order. Now, of course this would act differently for different containers: In a vector we would remove from the back, in a singly-list list we'd remove from the front (unless we kept a pointer to the tail), and so forth according to implementation details.
So, is there a more idiomatic way to do this other than rolling my own template specialization for every container?

Part of the design of the standard containers is that they ONLY provide operations as member functions if it is possible to provide one that is optimal (by chosen measures) for that container.
If a container provides a member function, that is because there is some way of implementing that function is a way that is optimal for that container.
If it is not possible to provide an optimal implementation of an operation (like remove()) then it is not provided.
Only std::list and (C++11 and later) std::forward_list are designed for efficient removal of elements, which is why they are the only containers with a remove() member function.
The other containers are not designed for efficient removal of arbitrary elements;
std::array cannot be resized, so it makes no sense have have either an insert() OR a remove() member function.
std::deque is only optimised for removal at the beginning or end.
Removal of an element from std::vector is less efficient than other containers, except (possibly) from the end.
Implementing a remove() member function for these containers therefore goes against the design philosophy.
So if you want to be able to efficiently remove elements from a container, you need to pick the right container for the job.
Rolling your own wrapper for the standard containers, to emulate operations that some containers don't support is simply misleading - in the sense of encouraging a user of your wrapper classes to believe they don't need to be careful with their choice of container if they have particular requirements of performance or memory usage.

So to answer your question
"So, is there a more idiomatic way to do this other than rolling my own template specialization for every container?
There are lot of ways to do remove
Sequence container and unordered container's erase() returns the next
iterator after the erased item.
Associative container's erase() returns nothing.
/*
* Remove from Vector or Deque
*/
vector<int> vec = {1, 4, 1, 1, 1, 12, 18, 16}; // To remove all '1'
for (vector<int>::iterator itr = vec.begin(); itr != vec.end(); ++itr) {
if ( *itr == 1 ) {
vec.erase(itr);
}
} // vec: { 4, 12, 18, 16}
// Complexity: O(n*m)
remove(vec.begin(), vec.end(), 1); // O(n)
// vec: {4, 12, 18, 16, ?, ?, ?, ?}
vector<int>::iterator newEnd = remove(vec.begin(), vec.end(), 1); // O(n)
vec.erase(newEnd, vec.end());
// Similarly for algorithm: remove_if() and unique()
// vec still occupy 8 int space: vec.capacity() == 8
vec.shrink_to_fit(); // C++ 11
// Now vec.capacity() == 4
// For C++ 03:
vector<int>(vec).swap(vec); // Release the vacant memory
/*
* Remove from List
*/
list<int> mylist = {1, 4, 1, 1, 1, 12, 18, 16};
list<int>::iterator newEnd = remove(mylist.begin(), mylist.end(), 1);
mylist.erase(newEnd, mylist.end());
mylist.remove(1); // faster
/*
* Remove from associative containers or unordered containers
*/
multiset<int> myset = {1, 4, 1, 1, 1, 12, 18, 16};
multiset<int>::iterator newEnd = remove(myset.begin(), myset.end(), 1);
myset.erase(newEnd, myset.end()); // O(n)
myset.erase(1); // O(log(n)) or O(1)

Related

std::vector<T>::assign using a subrange valid?

I want to convert a vector into a sub-range of that vector, for example by removing the first and last values. Is the use of the assign member function valid in this context?
std::vector<int> data = {1, 2, 3, 4};
data.assign(data.begin() + 1, data.end() - 1);
// data is hopefully {2, 3}
Cppreference states that
All iterators, pointers and references to the elements of the container are invalidated. The past-the-end iterator is also invalidated.
This invalidation, however, doesn't appear to happen until the end of assign.
To be safe, I could just go with the following but it seems more verbose:
std::vector<int> data = {1, 2, 3, 4};
data = std::vector<int>{data.begin() + 1, data.end() - 1};
// data is now {2, 3}
The __invalidate_all_iterators function that your link refers to is merely a debugging tool. It doesn't "cause" the iterators to be invalidated; It effectively reports that the iterators have been invalidated by the previous actions. It may be that this debugging tool might not catch a bug caused by this assignment.
It is a precondition of assign that the iterators are not to the same container. A precondition violation results in undefined behaviour.
Standard quote (latest draft):
[sequence.reqmts] a.assign(i,j) Expects: T is Cpp17EmplaceConstructible into X from *i and assignable from *i.
For vector, if the iterator does not meet the forward iterator requirements ([forward.iterators]), T is also Cpp17MoveInsertable into X.
Neither i nor j are iterators into a.
Your safe alternative is correct.
If you want to avoid reallocation (keeping in mind that there will be unused space left), and if you want to avoid copying (which is important for complex types, doesn't matter for int), then following should be efficient:
int begin_offset = 1;
int end_offset = 1;
if (begin_offset)
std::move(data.begin() + begin_offset, data.end() - end_offset, data.begin());
data.erase(data.end() - end_offset - begin_offset, data.end());

How these lines execute [duplicate]

int main()
{
const int SIZE = 10;
int a[SIZE] = {10, 2, 35, 5, 10, 26, 67, 2, 5, 10};
std::ostream_iterator< int > output(cout, " ");
std::vector< int > v(a, a + SIZE);
std::vector< int >::iterator newLastElement;
cout << "contents of the vector: ";
std::copy(v.begin(), v.end(), output);
newLastElement = std::remove(v.begin(), v.end(), 10);
cout << "\ncontents of the vector after remove: ";
//std::copy(v.begin(), newLastElement, output);
//this gives the correct result : 2 35 5 26 67 2 5
std::copy(v.begin(), v.end(), output);
//this gives a 10 which was supposed to be removed : 2 35 5 26 67 2 5 2 5 10
cout << endl;
return 0;
}
There are three 10 in the array a.
why does the array v contains a 10 after we remove the all the 10s with remove function.
you can see the compiled output also here
Actually std::remove doesn't remove the item from the container. Quoted from here
Remove removes from the range [first, last) all elements that are equal to value. That is, remove returns an iterator new_last such that the range [first, new_last) contains no elements equal to value. The iterators in the range [new_last, last) are all still dereferenceable, but the elements that they point to are unspecified. Remove is stable, meaning that the relative order of elements that are not equal to value is unchanged.`
That is, std::remove works with a pair of iterators only and does not know anything about the container which actually contains the items. In fact, it's not possible for std::remove to know the underlying container, because there is no way it can go from a pair of iterators to discover about the container to which the iterators belong. So std::remove doesn't really remove the items, simply because it cannot. The only way to actually remove an item from a container is to invoke a member function on that container.
So if you want to remove the items, then use Erase-Remove Idiom:
v.erase(std::remove(v.begin(), v.end(), 10), v.end());
The erase-remove idiom is so common and useful is that std::list has added another member function called list::remove which produces the same effect as that of the erase-remove idiom.
std::list<int> l;
//...
l.remove(10); //it "actually" removes all elements with value 10!
That means, you don't need to use erase-remove idiom when you work with std::list. You can directly call its member function list::remove.
The reason is that STL algorithms do not modify the size of the sequence. remove, instead of actually erasing items, moves them and returns an iterator to the "new" end. That iterator can then be passed to the erase member function of your container to actually perform the removal:
v.erase(std::remove(v.begin(), v.end(), 10), v.end());
By the way, this is known as the "erase-remove idiom".
EDIT: I was incorrect. See comments, and Nawaz's answer.
Because std::remove doesn't actually shrink the container, it just moves all the elements down to to fill up the spot used by the "removed" element. For example, if you have a sequence 1 2 3 4 5 and use std::remove to remove the value 2, your sequence will look like 1 3 4 5 5. If you then remove the value 4, you'll get 1 3 5 5 5. At no point does the sequence ever get told to be shorter.
C++20 introduces a new non-member function std::erase that simplifies this task for all standard library containers.
The solution suggested by multiple older answers here:
v.erase(std::remove(v.begin(), v.end(), 10), v.end());
Can now be written as:
std::erase(v, 10);

Why is there a separation of algorithms, iterators and containers in C++ STL

I can't figure out why they have separated algorithms, iterators and containers in C++ STL. If it's a heavy use of templates everywhere, then we can have classes having all stuff in one place with template parameters.
Some text that I got explains that iterators helps algorithms to interact with containers data but what if containers expose some mechanism to access the data it possesses?
With M containers + N algorithms, one would normally need M * N pieces of code, but with iterators acting as "glue", this can be reduced to M + N pieces of code.
Example: run 2 algorithms on 3 containers
std::list<int> l = { 0, 2, 5, 6, 3, 1 }; // C++11 initializer lists
std::vector<int> v = { 0, 2, 5, 6, 3, 1 }; // C++11 initializer lists
std::array<int, 5> a = { 0, 2, 5, 6, 3, 1 };
auto l_contains1 = std::find(l.begin(), l.end(), 1) != l.end();
auto v_contains5 = std::find(v.begin(), v.end(), 5) != v.end();
auto a_contains3 = std::find(a.begin(), a.end(), 3) != a.end();
auto l_count1 = std::count(l.begin(), l.end(), 1);
auto v_count5 = std::count(v.begin(), v.end(), 5);
auto a_count3 = std::count(a.begin(), a.end(), 3);
You are calling only 2 different algorithms, and only have code for 3 containers. Each container passes the begin() and end() iterators to the container. Even though you have 3 * 2 lines of code to generate the answers, there are only 3 + 2 pieces of functionality that need to be written.
For more containers and algorithms, this separation is an enormous reduction in the combinatorial explosion in code that would otherwise ensue: there are 5 sequence containers, 8 associative containers and 3 container adapters in the STL, and there are almost 80 algorithms in <algorithm> alone (not even counting those in <numeric>) so that you have only 16 + 80 instead of 16 * 80, an 13-fold reduction in code! (Of course, not every algorithm makes sense on every container, but the point should be clear).
The iterators can be divided into 5 categories (input, output, forward, bidirectional and random access), and some algorithms will delegate to specialized versions depending on the iterator capabilities. This will diminish the code reduction somewhat, but greatly improve efficiency by selecting the best adapted algorithm to the iterator at hand.
Note that the STL is not completely consistent in the separation: std::list has its own sort member function that uses implementation specific details to sort itself, and std::string has an enormous number of member function algorithms, most of which could have been implemented as non-member functions.

STL remove doesn't work as expected?

int main()
{
const int SIZE = 10;
int a[SIZE] = {10, 2, 35, 5, 10, 26, 67, 2, 5, 10};
std::ostream_iterator< int > output(cout, " ");
std::vector< int > v(a, a + SIZE);
std::vector< int >::iterator newLastElement;
cout << "contents of the vector: ";
std::copy(v.begin(), v.end(), output);
newLastElement = std::remove(v.begin(), v.end(), 10);
cout << "\ncontents of the vector after remove: ";
//std::copy(v.begin(), newLastElement, output);
//this gives the correct result : 2 35 5 26 67 2 5
std::copy(v.begin(), v.end(), output);
//this gives a 10 which was supposed to be removed : 2 35 5 26 67 2 5 2 5 10
cout << endl;
return 0;
}
There are three 10 in the array a.
why does the array v contains a 10 after we remove the all the 10s with remove function.
you can see the compiled output also here
Actually std::remove doesn't remove the item from the container. Quoted from here
Remove removes from the range [first, last) all elements that are equal to value. That is, remove returns an iterator new_last such that the range [first, new_last) contains no elements equal to value. The iterators in the range [new_last, last) are all still dereferenceable, but the elements that they point to are unspecified. Remove is stable, meaning that the relative order of elements that are not equal to value is unchanged.`
That is, std::remove works with a pair of iterators only and does not know anything about the container which actually contains the items. In fact, it's not possible for std::remove to know the underlying container, because there is no way it can go from a pair of iterators to discover about the container to which the iterators belong. So std::remove doesn't really remove the items, simply because it cannot. The only way to actually remove an item from a container is to invoke a member function on that container.
So if you want to remove the items, then use Erase-Remove Idiom:
v.erase(std::remove(v.begin(), v.end(), 10), v.end());
The erase-remove idiom is so common and useful is that std::list has added another member function called list::remove which produces the same effect as that of the erase-remove idiom.
std::list<int> l;
//...
l.remove(10); //it "actually" removes all elements with value 10!
That means, you don't need to use erase-remove idiom when you work with std::list. You can directly call its member function list::remove.
The reason is that STL algorithms do not modify the size of the sequence. remove, instead of actually erasing items, moves them and returns an iterator to the "new" end. That iterator can then be passed to the erase member function of your container to actually perform the removal:
v.erase(std::remove(v.begin(), v.end(), 10), v.end());
By the way, this is known as the "erase-remove idiom".
EDIT: I was incorrect. See comments, and Nawaz's answer.
Because std::remove doesn't actually shrink the container, it just moves all the elements down to to fill up the spot used by the "removed" element. For example, if you have a sequence 1 2 3 4 5 and use std::remove to remove the value 2, your sequence will look like 1 3 4 5 5. If you then remove the value 4, you'll get 1 3 5 5 5. At no point does the sequence ever get told to be shorter.
C++20 introduces a new non-member function std::erase that simplifies this task for all standard library containers.
The solution suggested by multiple older answers here:
v.erase(std::remove(v.begin(), v.end(), 10), v.end());
Can now be written as:
std::erase(v, 10);

stl insertion iterators

Maybe I am missing something completely obvious, but I can't figure out why one would use back_inserter/front_inserter/inserter,
instead of just providing the appropriate iterator from the container interface.
And thats my question.
Because those call push_back, push_front, and insert which the container's "normal" iterators can't (or, at least, don't).
Example:
int main() {
using namespace std;
vector<int> a (3, 42), b;
copy(a.begin(), a.end(), back_inserter(b));
copy(b.rbegin(), b.rend(), ostream_iterator<int>(cout, ", "));
return 0;
}
The main reason is that regular iterators iterate over existing elements in the container, while the *inserter family of iterators actually inserts new elements in the container.
std::vector<int> v(3); // { 0, 0, 0 }
int array[] = { 1, 2, 3 };
std::copy( array, array+3, std::back_inserter(v) ); // adds 3 elements
// v = { 0, 0, 0, 1, 2, 3 }
std::copy( array, array+3, v.begin() ); // overwrites 3 elements
// v = { 1, 2, 3, 1, 2, 3 }
int array2[] = { 4, 5, 6 };
std::copy( array2, array2+3, std::inserter(v, v.begin()) );
// v = { 4, 5, 6, 1, 2, 3, 1, 2, 3 }
The iterator points to an element, and doesn't in general know what container it's attached to. (Iterators were based on pointers, and you can't tell from a pointer what data structure it's associated with.)
Adding an element to an STL container changes the description of the container. For example, STL containers have a .size() function, which has to change. Since some metadata has to change, whatever inserts the new element has to know what container it's adding to.
It is all a matter of what you really need. Both kinds of iterators can be used, depending on your intent.
When you use an "ordinary" iterator, it does not create new elements in the container. It simply writes the data into the existing consecutive elements of the container, one after another. It overwrites any data that is already in the container. And if it is allowed to reach the end of the sequence, any further writes make it "fall off the end" and cause undefined behavior. I.e. it crashes.
Inserter iterators, on the other hand, create a new element and insert it at the current position (front, back, somewhere in the middle) every time something is written through them. They never overwrite existing elements, they add new ones.