why doesn't std::remove_copy_if() actually remove? - c++

Could this be the worst named function in the STL? (rhetorical question)
std::remove_copy_if() doesn't actually appear to do any removing. As best I can tell, it behaves more like copy_if_not.
The negation is a bit confusing, but can be worked around with std::not1(), however I might be misunderstanding something as I cannot fathom what this function has to do with removing - am I missing something?
If not, is there an STL algorithm for conditionally removing (moving?) elements from a container & putting them in another container?
Editing to add an example so readers are less confused.
The following program appears to leave the input range (V1) untouched:
#include <vector>
#include <iostream>
#include <algorithm>
#include <iterator>
using std::cout;
using std::endl;
int main (void)
{
std::vector<int> V1, V2;
V1.push_back(-2);
V1.push_back(0);
V1.push_back(-1);
V1.push_back(0);
V1.push_back(1);
V1.push_back(2);
std::copy(V1.begin(), V1.end(), std::ostream_iterator<int>(cout, " "));
cout << endl;
std::remove_copy_if(
V1.begin(),
V1.end(),
std::back_inserter(V2),
std::bind2nd(std::less<int>(), 0));
std::copy(V2.begin(), V2.end(), std::ostream_iterator<int>(cout, " "));
cout << endl;
std::copy(V1.begin(), V1.end(), std::ostream_iterator<int>(cout, " "));
cout << endl;
}
It outputs:
-2 0 -1 0 1 2
0 0 1 2
-2 0 -1 0 1 2
I was expecting so see something like:
-2 0 -1 0 1 2
0 0 1 2
0 0 1 2 ? ? ?
Where ? could be any value. But I was surprised to see that the input range was untouched, & that the return value is not able to be used with (in this case) std::vector::erase(). (The return value is an output iterator.)

Could this be the worst named function in the STL?
A bit of background information: in the standard library (or the original STL), there are three concepts, the containers, the iterators into those containers and algorithms that are applied to the iterators. Iterators serve as a cursor and accessor into the elements of a range but do not have a reference to the container (as mentioned before, there might not even be an underlying container).
This separation has the nice feature that you can apply algorithms to ranges of elements that do not belong to a container (consider iterator adaptors like std::istream_iterator or std::ostream_iterator) or that, belonging to a container do not consider all elements (std::sort( v.begin(), v.begin()+v.size()/2 ) to short the first half of the container).
The negative side is that, because the algorithm (and the iterator) don't really know of the container, they cannot really modify it, they can only modify the stored elements (which is what they can access). Mutating algorithms, like std::remove or std::remove_if work on this premise: they overwrite elements that don't match the condition effectively removing them from the container, but they do not modify the container, only the contained values, that is up to the caller in a second step of the erase-remove idiom:
v.erase( std::remove_if( v.begin(), v.end(), pred ),
v.end() );
Further more, for mutating algorithms (those that perform changes), like std::remove there is a non-mutating version named by adding copy to the name: std::remove_copy_if. None of the XXXcopyYYY algorithms are considered to change the input sequence (although they can if you use aliasing iterators).
While this is really no excuse for the naming of std::remove_copy_if, I hope that it helps understanding what an algorithm does given its name: remove_if will modify contents of the range and yield a range for which all elements that match the predicate have been removed (the returned range is that formed by the first argument to the algorithm to the returned iterator). std::remove_copy_if does the same, but rather than modifying the underlying sequence, it creates a copy of the sequence in which those elements matching the predicate have been removed. That is, all *copy* algorithms are equivalent to copy and then apply the original algorithm (note that the equivalence is logical, std::remove_copy_if only requires an OutputIterator, which means that it could not possibly copy and then walk the copied range applying std::remove_if.
The same line of reasoning can be applied to other mutating algorithms: reverse reverses the values (remember, iterators don't access the container) in the range, reverse_copy copies the elements in the range to separate range in the reverse order.
If not, is there an STL algorithm for conditionally removing (moving?) elements from a container & putting them in another container?
There is no such algorithm in the STL, but it could be easily implementable:
template <typename FIterator, typename OIterator, typename Pred>
FIterator splice_if( FIterator first, FIterator last, OIterator out, Pred p )
{
FIterator result = first;
for ( ; first != last; ++first ) {
if ( p( *first ) ) {
*result++ = *first;
} else {
*out++ = *first;
}
}
return result;
}

is there an STL algorithm for conditionally removing (moving?) elements from a container & putting them in another container?
The closest thing I can think of is std::stable_partition:
std::vector<int> v;
// ...
auto it = std::stable_partition(v.begin(), v.end(), pick_the_good_elements);
std::vector<int> w(std::make_move_iter(it), std::make_move_iter(v.end()));
v.erase(it, v.end());
Now v will contain the "good" elements, and w will contain the "bad" elements.

If not, is there an STL algorithm for conditionally removing (moving?) elements from a container & putting them in another container?
Not really. The idea is that the modifying algorithms are allowed to "move" (not in the C++ sense of the word) elements in a container around but cannot change the length of the container. So the remove algorithms could be called prepare_for_removal.
By the way, C++11 provides std::copy_if, which allows you to copy selected elements from one container to another without playing funny logic games with remove_copy_if.

You are right, that is what it does... std::remove_copy_if copies the vector, removing anything that matches the pred.
std::remove_if ... removes on condition (or rather, shuffles things around).

I agree that remove is not the best name for this family of functions.
But as Luc said, there's a reason for it working the way it does, and the GoTW item that he mentions explains how it works. remove_if works exactly the same way as remove - which is what you would expect.
You might also want to read this Wikibooks article.

Related

STL algorithm to repeat an erase operation on a second container?

Life gave me the following objects:
std::vector<T1> v1;
std::vector<T2> v2;
typename std::vector<T1>::iterator it_first;
typename std::vector<T1>::iterator it_last;
and the following constraints:
v1.size() == v2.size() > 0
v1.begin() <= it_first <= it_last <= v1.end()
Removing from v1 the range pointed by the two iterators is a trivial single line, but how do I remove the same range also from v2?
I can easily solve this for instance by building v2 iterators using a mix of std::distance/advance, but I was wondering if the STL provides some machinery for this. Something like the erase-remove idiom coupled with a transform operation, maybe? It seems beyond my STL-fu...
When you have an iterator, then you can get the index via std::distance to begin().
When you have an index, then you can get the iterator via begin() + index.
Thats bascially all you need to get the same range of incdices in the second vector.
Btw iterators are to abstract away the index. When you need the index, then work with the index. For erase that would be
size_t start = 1;
size_t end = 42;
std::erase( v1.begin() + start, v1.begin() + end );
std::erase( v2.begin() + start, v2.begin() + end );
I can easily solve this for instance by building v2 iterators using a mix of std::distance/advance, but I was wondering if the STL provides some machinery for this. Something like the erase-remove idiom coupled with a transform operation, maybe? It seems beyond my STL-fu...
I saw the edit only after writing the answer. The thing is std::distance / std::advance is the machinery to switch between iterators and indices.
Consider that iterators in general are for things that can be iterated. You do not even necesarily need a container to iterate. For example you could write an iterator type that lets you iterate integers. This iterator type could be used like this:
std::vector<std::string> x(100);
auto begin = int_iterator(0);
auto end = int_iterator(100);
for ( ; begin != end; ++begin) std::cout << x[ *begin ];
This is no C-ism. The example is perhaps not the best, but the int_iterator itself is on par with the general concept of iterators.
Usually iterators have no knowledge about the underlying container (if there is one). There are few exceptions (eg insert_iterator). And in case of erase of course you need to pass iterators that refer to element in the corresponding container.

Finding length of range in C++

I have defied a range and I need to find the number of elements in it. My current code is
size_t c = 0;
for(auto elem : range){
c++;
}
return c;
However, the compiler is whining about the unused variable elem and I can't get rid of it. I though of using something like
std::count_if(range.begin(), range.end(), [](type elem){return ture;});
But I feel it is an overkill and it does not seem right.
I am wondering if there is a nicer systematic way of achieving this without defining an extra variable?
If (for some reason) your range doesn't have a size() member function, you can use std::distance. This should always work since a range is required to have a begin and end iterator.
std::distance(cbegin(range), cend(range));
You can use std::distance like
std::distance(range.begin(), range.end());
Returns the number of hops from first to last.
And note the complexity:
Complexity
Linear.
However, if InputIt additionally meets the requirements of
LegacyRandomAccessIterator,
complexity is constant.
In C++ all the containers implement size method of a constant complexity so if by range you consider a container then there is no need for reinventing the wheel.
However, you can also use std::distance if you want to determine the number of elements from a certain range like:
std::vector<int> v{ 3, 1, 4 };
std::cout << std::distance(v.begin(), v.end() << std::endl; // 3
If you take a look at the possible implementation of std::distance, it is something like
while (first != last) {
++first;
++n;
}
where first and the last are start and end point of you range, and n is a simple counter.

What is the difference between std::transform and std::for_each?

Both can be used to apply a function to a range of elements.
On a high level:
std::for_each ignores the return value of the function, and
guarantees order of execution.
std::transform assigns the return value to the iterator, and does
not guarantee the order of execution.
When do you prefer using the one versus the other? Are there any subtle caveats?
std::transform is the same as map. The idea is to apply a function to each element in between the two iterators and obtain a different container composed of elements resulting from the application of such a function. You may want to use it for, e.g., projecting an object's data member into a new container. In the following, std::transform is used to transform a container of std::strings in a container of std::size_ts.
std::vector<std::string> names = {"hi", "test", "foo"};
std::vector<std::size_t> name_sizes;
std::transform(names.begin(), names.end(), std::back_inserter(name_sizes), [](const std::string& name) { return name.size();});
On the other hand, you execute std::for_each for the sole side effects. In other words, std::for_each closely resembles a plain range-based for loop.
Back to the string example:
std::for_each(name_sizes.begin(), name_sizes.end(), [](std::size_t name_size) {
std::cout << name_size << std::endl;
});
Indeed, starting from C++11 the same can be achieved with a terser notation using range-based for loops:
for (std::size_t name_size: name_sizes) {
std::cout << name_size << std::endl;
}
Your high level overview
std::for_each ignores the return value of the function and guarantees order of execution.
std::transform assigns the return value to the iterator, and does not guarantee the order of execution.
pretty much covers it.
Another way of looking at it (to prefer one over the other);
Do the results (the return value) of the operation matter?
Is the operation on each element a member method with no return value?
Are there two input ranges?
One more thing to bear in mind (subtle caveat) is the change in the requirements of the operations of std::transform before and after C++11 (from en.cppreference.com);
Before C++11, they were required to "not have any side effects",
After C++11, this changed to "must not invalidate any iterators, including the end iterators, or modify any elements of the ranges involved"
Basically these were to allow the undetermined order of execution.
When do I use one over the other?
If I want to manipulate each element in a range, then I use for_each. If I have to calculate something from each element, then I would use transform. When using the for_each and transform, I normally pair them with a lambda.
That said, I find my current usage of the traditional for_each being diminished somewhat since the advent of the range based for loops and lambdas in C++11 (for (element : range)). I find its syntax and implementation very natural (but your mileage here will vary) and a more intuitive fit for some use cases.
Although the question has been answered, I believe that this example would clarify the difference further.
for_each belongs to non-modifying STL operations, meaning that these operations do not change elements of the collection or the collection itself. Therefore, the value returned by for_each is always ignored and is not assigned to a collection element.
Nonetheless, it is still possible to modify elements of collection, for example when an element is passed to the f function using reference. One should avoid such behavior as it is not consistent with STL principles.
In contrast, transform function belongs to modifying STL operations and applies given predicates (unary_op or binary_op) to elements of the collection or collections and store results in another collection.
#include <vector>
#include <iostream>
#include <algorithm>
#include <functional>
using namespace std;
void printer(int i) {
cout << i << ", ";
}
int main() {
int mynumbers[] = { 1, 2, 3, 4 };
vector<int> v(mynumbers, mynumbers + 4);
for_each(v.begin(), v.end(), negate<int>());//no effect as returned value of UnaryFunction negate() is ignored.
for_each(v.begin(), v.end(), printer); //guarantees order
cout << endl;
transform(v.begin(), v.end(), v.begin(), negate<int>());//negates elements correctly
for_each(v.begin(), v.end(), printer);
return 0;
}
which will print:
1, 2, 3, 4,
-1, -2, -3, -4,
Real example of using std::tranform is when you want to convert a string to uppercase, you can write code like this :
std::transform(s.begin(), s.end(), std::back_inserter(out), ::toupper);
if you will try to achieve same thing with std::for_each like :
std::for_each(s.begin(), s.end(), ::toupper);
It wont convert it into uppercase string

Remove elements from a c++ vector where the removal condition is dependent on other elements

The standard way to remove certain elements from a vector in C++ is the remove/erase idiom. However, the predicate passed to remove_if only takes the vector element under consideration as an argument. Is there a good STL way to do this if the predicate is conditional on other elements of the array?
To give a concrete example, consider removing all duplicates of a number immediately following it. Here the condition for removing the n-th element is conditional on the (n-1)-th element.
Before: 11234555111333
After: 1234513
There's a standard algorithm for this. std::unique will remove the elements that are duplicates of those preceding them (actually, just like remove_if it reorganizes the container so that the elements to be removed are gathered at the end of it).
Example on a std::string for simplicity:
#include <string>
#include <iostream>
#include <algorithm>
int main()
{
std::string str = "11234555111333";
str.erase(std::unique(str.begin(), str.end()), str.end());
std::cout << str; // 1234513
}
Others mentioned std::unique already, for your specific example. Boost.Range has the adjacent_filtered adaptor, which passes both the current and the next element in the range to your predicate and is, thanks to the predicate, applicable to a larger range of problems. Boost.Range however also has the uniqued adaptor.
Another possibility would be to simply keep a reference to the range, which is easy to do with a lambda in C++11:
std::vector<T> v;
v.erase(std::remove_if(v.begin(), v.end(),
[&](T const& x){
// use v, with std::find for example
}), v.end());
In my opinion, there will be easier to use simple traversal algorithm(via for) rather then use std::bind. Of course, with std::bind you can use other functions and predicates(which depends on previous elements). But in your example, you can do it via simple std::unique.

C++ equivalent of Python difference_update?

s1 and s2 are sets (Python set or C++ std::set)
To add the elements of s2 to s1 (set union), you can do
Python: s1.update(s2)
C++: s1.insert(s2.begin(), s2.end());
To remove the elements of s2 from s1 (set difference), you can do
Python: s1.difference_update(s2)
What is the C++ equivalent of this? The code
s1.erase(s2.begin(), s2.end());
does not work, for s1.erase() requires iterators from s1.The code
std::set<T> s3;
std::set_difference(s1.begin(), s1.end(), s2.begin(), s2.end(), std::inserter(s3, s3.end());
s1.swap(s3);
works, but seems overly complex, at least compared with Python.
Is there a simpler way?
Using std::set_difference is the idiomatic way to do this in C++. You have stumbled across one of the primary differences (pun intended) between C++/STL and many other languages. STL does not bundle operations directly with the data structures. This is why std::set does not implement a difference routine.
Basically, algorithms such as std::set_difference write the result of the operation to another object. It is interesting to note that the algorithm does not require that either or both of the operands are actually std::set. The definition of the algorithm is:
Effects: Copies the elements of the range [first1, last1) which are not present in the range [first2, last2) to the range beginning at result. The elements in the constructed range are sorted.
Requires: The resulting range shall not overlap with either of the original ranges. Input ranges are required to be order by the same operator<.
Returns: The end of the constructed range.
Complexity: At most 2 * ((last1 - first1) + (last2 - first2)) - 1 comparisons
The interesting difference is that the C++ version is applicable to any two sorted ranges. In most languages, you are forced to coerce or translate the calling object (left-hand operand) into a set before you have access to the set difference algorithm.
This is not really pertinent to your question, but this is the reason that the various set algorithms are modeled as free-standing algorithms instead of member methods.
You should iterate through the second set:
for( set< T >::iterator iter = s2.begin(); iter != s2.end(); ++iter )
{
s1.erase( *iter );
}
This will could be cheaper than using std::set_difference - set_difference copies the unique objects into a new container, but it takes linear time, while .erase will not copy anything, but is O(n * log( n ) ).
In other words, depends on the container, you could choose the way, that will be faster for your case.
Thanks David Rodríguez - dribeas for the remark! (:
EDIT: Doh! I thought about BOOST_FOREACH at the very beginning, but I was wrong that it could not be used.. - you don't need the iterator, but just the value.. As user763305 said by himself/herself.
In c++ there is no difference method in the set. The set_difference looks much more awkward as it is more generic than applying a difference on two sets. Of course you can implement your own version of in place difference on sets:
template <typename T, typename Compare, typename Allocator>
void my_set_difference( std::set<T,Compare,Allocator>& lhs, std::set<T,Compare,Allocator> const & rhs )
{
typedef std::set<T,Comapre,Allocator> set_t;
typedef typename set_t::iterator iterator;
typedef typename set_t::const_iterator const_iterator;
const_iterator rit = rhs.begin(), rend = rhs.end();
iterator it = lhs.begin(), end = lhs.end();
while ( it != end && rit != rend )
{
if ( lhs.key_comp( *it, *rit ) ) {
++it;
} else if ( lhs.key_comp( *rit, *it ) ) {
++rit;
} else {
++rit;
lhs.erase( it++ );
}
}
}
The performance of this algorithm will be linear in the size of the arguments, and require no extra copies as it modifies the first argument in place.
You can also do it with remove_if writing your own functor for testing existence in a set, e.g.
std::remove_if(s1.begin(), s1.end(), ExistIn(s2));
I suppose that set_difference is more efficient though as it probably scans both sets only once
Python set is unordered, and is more of an equivalent of C++ std::unordered_set than std::set, which is ordered.
David Rodríguez's algorithm relies on the fact that std::set is ordered, so the lhs and rhs sets can be traversed in the way as exhibit in the algorithm.
For a more general solution that works for both ordered and unordered sets, Kiril Kirov's algorithm should be the safe one to adopt if you are enforcing/preserving the "unorderedness" nature of Python set.