C++ equivalent of Python difference_update? - c++

s1 and s2 are sets (Python set or C++ std::set)
To add the elements of s2 to s1 (set union), you can do
Python: s1.update(s2)
C++: s1.insert(s2.begin(), s2.end());
To remove the elements of s2 from s1 (set difference), you can do
Python: s1.difference_update(s2)
What is the C++ equivalent of this? The code
s1.erase(s2.begin(), s2.end());
does not work, for s1.erase() requires iterators from s1.The code
std::set<T> s3;
std::set_difference(s1.begin(), s1.end(), s2.begin(), s2.end(), std::inserter(s3, s3.end());
s1.swap(s3);
works, but seems overly complex, at least compared with Python.
Is there a simpler way?

Using std::set_difference is the idiomatic way to do this in C++. You have stumbled across one of the primary differences (pun intended) between C++/STL and many other languages. STL does not bundle operations directly with the data structures. This is why std::set does not implement a difference routine.
Basically, algorithms such as std::set_difference write the result of the operation to another object. It is interesting to note that the algorithm does not require that either or both of the operands are actually std::set. The definition of the algorithm is:
Effects: Copies the elements of the range [first1, last1) which are not present in the range [first2, last2) to the range beginning at result. The elements in the constructed range are sorted.
Requires: The resulting range shall not overlap with either of the original ranges. Input ranges are required to be order by the same operator<.
Returns: The end of the constructed range.
Complexity: At most 2 * ((last1 - first1) + (last2 - first2)) - 1 comparisons
The interesting difference is that the C++ version is applicable to any two sorted ranges. In most languages, you are forced to coerce or translate the calling object (left-hand operand) into a set before you have access to the set difference algorithm.
This is not really pertinent to your question, but this is the reason that the various set algorithms are modeled as free-standing algorithms instead of member methods.

You should iterate through the second set:
for( set< T >::iterator iter = s2.begin(); iter != s2.end(); ++iter )
{
s1.erase( *iter );
}
This will could be cheaper than using std::set_difference - set_difference copies the unique objects into a new container, but it takes linear time, while .erase will not copy anything, but is O(n * log( n ) ).
In other words, depends on the container, you could choose the way, that will be faster for your case.
Thanks David Rodríguez - dribeas for the remark! (:
EDIT: Doh! I thought about BOOST_FOREACH at the very beginning, but I was wrong that it could not be used.. - you don't need the iterator, but just the value.. As user763305 said by himself/herself.

In c++ there is no difference method in the set. The set_difference looks much more awkward as it is more generic than applying a difference on two sets. Of course you can implement your own version of in place difference on sets:
template <typename T, typename Compare, typename Allocator>
void my_set_difference( std::set<T,Compare,Allocator>& lhs, std::set<T,Compare,Allocator> const & rhs )
{
typedef std::set<T,Comapre,Allocator> set_t;
typedef typename set_t::iterator iterator;
typedef typename set_t::const_iterator const_iterator;
const_iterator rit = rhs.begin(), rend = rhs.end();
iterator it = lhs.begin(), end = lhs.end();
while ( it != end && rit != rend )
{
if ( lhs.key_comp( *it, *rit ) ) {
++it;
} else if ( lhs.key_comp( *rit, *it ) ) {
++rit;
} else {
++rit;
lhs.erase( it++ );
}
}
}
The performance of this algorithm will be linear in the size of the arguments, and require no extra copies as it modifies the first argument in place.

You can also do it with remove_if writing your own functor for testing existence in a set, e.g.
std::remove_if(s1.begin(), s1.end(), ExistIn(s2));
I suppose that set_difference is more efficient though as it probably scans both sets only once

Python set is unordered, and is more of an equivalent of C++ std::unordered_set than std::set, which is ordered.
David Rodríguez's algorithm relies on the fact that std::set is ordered, so the lhs and rhs sets can be traversed in the way as exhibit in the algorithm.
For a more general solution that works for both ordered and unordered sets, Kiril Kirov's algorithm should be the safe one to adopt if you are enforcing/preserving the "unorderedness" nature of Python set.

Related

STL algorithm to repeat an erase operation on a second container?

Life gave me the following objects:
std::vector<T1> v1;
std::vector<T2> v2;
typename std::vector<T1>::iterator it_first;
typename std::vector<T1>::iterator it_last;
and the following constraints:
v1.size() == v2.size() > 0
v1.begin() <= it_first <= it_last <= v1.end()
Removing from v1 the range pointed by the two iterators is a trivial single line, but how do I remove the same range also from v2?
I can easily solve this for instance by building v2 iterators using a mix of std::distance/advance, but I was wondering if the STL provides some machinery for this. Something like the erase-remove idiom coupled with a transform operation, maybe? It seems beyond my STL-fu...
When you have an iterator, then you can get the index via std::distance to begin().
When you have an index, then you can get the iterator via begin() + index.
Thats bascially all you need to get the same range of incdices in the second vector.
Btw iterators are to abstract away the index. When you need the index, then work with the index. For erase that would be
size_t start = 1;
size_t end = 42;
std::erase( v1.begin() + start, v1.begin() + end );
std::erase( v2.begin() + start, v2.begin() + end );
I can easily solve this for instance by building v2 iterators using a mix of std::distance/advance, but I was wondering if the STL provides some machinery for this. Something like the erase-remove idiom coupled with a transform operation, maybe? It seems beyond my STL-fu...
I saw the edit only after writing the answer. The thing is std::distance / std::advance is the machinery to switch between iterators and indices.
Consider that iterators in general are for things that can be iterated. You do not even necesarily need a container to iterate. For example you could write an iterator type that lets you iterate integers. This iterator type could be used like this:
std::vector<std::string> x(100);
auto begin = int_iterator(0);
auto end = int_iterator(100);
for ( ; begin != end; ++begin) std::cout << x[ *begin ];
This is no C-ism. The example is perhaps not the best, but the int_iterator itself is on par with the general concept of iterators.
Usually iterators have no knowledge about the underlying container (if there is one). There are few exceptions (eg insert_iterator). And in case of erase of course you need to pass iterators that refer to element in the corresponding container.

mid point using reverse iterators

palindromes implementation using reverse iterators
the error in code is of "operator /", is not defined for iterators
bool isPalindrome( std::string & s)
{
bool check = ( s == std::string{ s.rbegin(), s.rend() } );
return check; // works fine
}
in above there are n comparisons. ( n=s.length )
s == string{ s.rbegin(), s.rbegin() + (s.rend()/2) }
/* error: operator/ not defined */
I'm expecting a one or two lines of code for palindrome check with floor(n/2) comparisons.
Is there an elegant code. Am I missing something about reverse iterators?
and input of std::string{"cac"} should return true and should require 1 comparison
How to get mid-point in O(1) time, using reverse iterators
Dividing an iterator by a number does not really make sense. What you can do is obtain an iterator that is advanced half way the length of the container like this:
string{s.rbegin(), std::next(s.rbegin(), s.size() / 2)}
std::next obtains the iterator after incrementing it the supplied number of times.
This is only going to be efficient O(1) for contiguous containers like std::vector, std::array and std::string.
You are confusing something that references an element with the index of that element. Just think about what s.rend()/2 is supposed to mean. What you actually want is some difference between two indices divided by 2.
Given two iterators you can get their distance via std::distance.

Is there a std::unique-style library algorithm that has user-defined collision handler?

I have a basic std::vector of key/value pairs. It is sorted by key. I would like to reduce all of the adjacent duplicate key entries using a user-defined binary operator while compacting the vector.
This is basically a std::unique application where the user can decide how to handle the collision rather than just keeping the first entry.
Is there a library algorithm that satisfies this requirement? I can write my own but I would prefer to rely on something that an expert has written.
The map-as-sorted-vector is core to other parts of the algorithm and can't be changed. I am limited to C++14.
I can't think of a standard algo for this. std::unique almost satisfies the requirement, but unfortunately the BinaryPredicate you supply to compare elements isn't allowed to modify them ("binary_pred shall not apply any non-constant function through the dereferenced iterators." - [algorithms.requirements] paragraph 7 in the C++17 Standard) - a requirement that lets the implementation optimise more freely (e.g. parallel processing of different parts of the vector).
An implementation's not too hard though...
template <typename Iterator, typename BinaryPredicate, typename Compaction>
Iterator compact(Iterator begin, Iterator end, BinaryPredicate equals, Compaction compaction)
{
if (begin == end) return begin;
Iterator compact_to = begin;
while (++begin != end)
if (equals(*begin, *compact_to))
compaction(*compact_to, *begin);
else
*++compact_to = *begin;
return ++compact_to;
}
The return value will be the new "end" for the compacted vector - you can erase therefrom like you would for remove_if.
You can see it running here.

why doesn't std::remove_copy_if() actually remove?

Could this be the worst named function in the STL? (rhetorical question)
std::remove_copy_if() doesn't actually appear to do any removing. As best I can tell, it behaves more like copy_if_not.
The negation is a bit confusing, but can be worked around with std::not1(), however I might be misunderstanding something as I cannot fathom what this function has to do with removing - am I missing something?
If not, is there an STL algorithm for conditionally removing (moving?) elements from a container & putting them in another container?
Editing to add an example so readers are less confused.
The following program appears to leave the input range (V1) untouched:
#include <vector>
#include <iostream>
#include <algorithm>
#include <iterator>
using std::cout;
using std::endl;
int main (void)
{
std::vector<int> V1, V2;
V1.push_back(-2);
V1.push_back(0);
V1.push_back(-1);
V1.push_back(0);
V1.push_back(1);
V1.push_back(2);
std::copy(V1.begin(), V1.end(), std::ostream_iterator<int>(cout, " "));
cout << endl;
std::remove_copy_if(
V1.begin(),
V1.end(),
std::back_inserter(V2),
std::bind2nd(std::less<int>(), 0));
std::copy(V2.begin(), V2.end(), std::ostream_iterator<int>(cout, " "));
cout << endl;
std::copy(V1.begin(), V1.end(), std::ostream_iterator<int>(cout, " "));
cout << endl;
}
It outputs:
-2 0 -1 0 1 2
0 0 1 2
-2 0 -1 0 1 2
I was expecting so see something like:
-2 0 -1 0 1 2
0 0 1 2
0 0 1 2 ? ? ?
Where ? could be any value. But I was surprised to see that the input range was untouched, & that the return value is not able to be used with (in this case) std::vector::erase(). (The return value is an output iterator.)
Could this be the worst named function in the STL?
A bit of background information: in the standard library (or the original STL), there are three concepts, the containers, the iterators into those containers and algorithms that are applied to the iterators. Iterators serve as a cursor and accessor into the elements of a range but do not have a reference to the container (as mentioned before, there might not even be an underlying container).
This separation has the nice feature that you can apply algorithms to ranges of elements that do not belong to a container (consider iterator adaptors like std::istream_iterator or std::ostream_iterator) or that, belonging to a container do not consider all elements (std::sort( v.begin(), v.begin()+v.size()/2 ) to short the first half of the container).
The negative side is that, because the algorithm (and the iterator) don't really know of the container, they cannot really modify it, they can only modify the stored elements (which is what they can access). Mutating algorithms, like std::remove or std::remove_if work on this premise: they overwrite elements that don't match the condition effectively removing them from the container, but they do not modify the container, only the contained values, that is up to the caller in a second step of the erase-remove idiom:
v.erase( std::remove_if( v.begin(), v.end(), pred ),
v.end() );
Further more, for mutating algorithms (those that perform changes), like std::remove there is a non-mutating version named by adding copy to the name: std::remove_copy_if. None of the XXXcopyYYY algorithms are considered to change the input sequence (although they can if you use aliasing iterators).
While this is really no excuse for the naming of std::remove_copy_if, I hope that it helps understanding what an algorithm does given its name: remove_if will modify contents of the range and yield a range for which all elements that match the predicate have been removed (the returned range is that formed by the first argument to the algorithm to the returned iterator). std::remove_copy_if does the same, but rather than modifying the underlying sequence, it creates a copy of the sequence in which those elements matching the predicate have been removed. That is, all *copy* algorithms are equivalent to copy and then apply the original algorithm (note that the equivalence is logical, std::remove_copy_if only requires an OutputIterator, which means that it could not possibly copy and then walk the copied range applying std::remove_if.
The same line of reasoning can be applied to other mutating algorithms: reverse reverses the values (remember, iterators don't access the container) in the range, reverse_copy copies the elements in the range to separate range in the reverse order.
If not, is there an STL algorithm for conditionally removing (moving?) elements from a container & putting them in another container?
There is no such algorithm in the STL, but it could be easily implementable:
template <typename FIterator, typename OIterator, typename Pred>
FIterator splice_if( FIterator first, FIterator last, OIterator out, Pred p )
{
FIterator result = first;
for ( ; first != last; ++first ) {
if ( p( *first ) ) {
*result++ = *first;
} else {
*out++ = *first;
}
}
return result;
}
is there an STL algorithm for conditionally removing (moving?) elements from a container & putting them in another container?
The closest thing I can think of is std::stable_partition:
std::vector<int> v;
// ...
auto it = std::stable_partition(v.begin(), v.end(), pick_the_good_elements);
std::vector<int> w(std::make_move_iter(it), std::make_move_iter(v.end()));
v.erase(it, v.end());
Now v will contain the "good" elements, and w will contain the "bad" elements.
If not, is there an STL algorithm for conditionally removing (moving?) elements from a container & putting them in another container?
Not really. The idea is that the modifying algorithms are allowed to "move" (not in the C++ sense of the word) elements in a container around but cannot change the length of the container. So the remove algorithms could be called prepare_for_removal.
By the way, C++11 provides std::copy_if, which allows you to copy selected elements from one container to another without playing funny logic games with remove_copy_if.
You are right, that is what it does... std::remove_copy_if copies the vector, removing anything that matches the pred.
std::remove_if ... removes on condition (or rather, shuffles things around).
I agree that remove is not the best name for this family of functions.
But as Luc said, there's a reason for it working the way it does, and the GoTW item that he mentions explains how it works. remove_if works exactly the same way as remove - which is what you would expect.
You might also want to read this Wikibooks article.

STL algorithm for merge with addition

I was using stl::merge to put two sorted collections into one.
But my object has a natural key; and a defined addition semantic, so what I am after is a merge_and_sum that would not just merge the two collections into a single N+M length collection, but if the operator== on the object returned true, would then operator+ them.
I have implemented it thus
template<class _InIt1, class _InIt2, class _OutIt>
_OutIt merge_and_sum(_InIt1 _First1, _InIt1 _Last1, _InIt2 _First2, _InIt2 _Last2, _OutIt _Dest )
{ // copy merging ranges, both using operator<
for (; _First1 != _Last1 && _First2 != _Last2; ++_Dest)
{
if ( *_First2 < *_First1 )
*_Dest = *_First2, ++_First2;
else if ( *_First2 == *_First1)
*_Dest = *_First2 + *_First1, ++_First1, ++_First2;
else
*_Dest = *_First1, ++_First1;
}
_Dest = copy(_First1, _Last1, _Dest); // copy any tail
return (copy(_First2, _Last2, _Dest));
}
But was wondering if I have reinvented something that is composable from the other algorithms.
It sounds like your collections are like multisets with duplicates collapsed by your + operator (maybe just summing the multiplicities instead of keeping redundant copies). I assume so, because you're not changing the sorting order when you +, so + isn't affecting your key.
You should use your implementation. There's nothing in STL that will do it as efficiently. The closest semantic I can think of is standard merge followed by unique_copy. You could almost get unique_copy to work with a side-effectful comparison operator, but that would be extremely ill advised, as the implementation doesn't promise to only compare things directly vs. via a value-copied temporary (or even a given number of times).
Your type and variable names are unpleasantly long ;)
You could use std::merge with an output iterator of your own creation, which does the following in operator=. I think this ends up making more calls to operator== than your version, though, so unless it works out as less code it's probably not worth it.
if ((mylist.size() > 0) && (newvalue == mylist.back())) {
mylist.back() += newvalue;
} else {
mylist.push_back(newvalue);
}
(Actually, writing a proper output iterator might be more fiddly than that, I can't remember. But I hope you get the general idea).
mylist is a reference to the collection you're merging into. If the target doesn't have back(), then you'll have to buffer one value in the output iterator, and only write it once you see a non-equal value. Then define a flush function on the output iterator to write the last value, and call it at the end. I'm pretty sure that in this case it is too much mess to beat what you've already done.
Well, your other option would be to use set_symmetric_difference to get the elements that were different, then use set_intersection to get the ones that are the same, but twice. Then add them together and insert into the first.
typedef set<MyType, MyComp> SetType;
SetType merge_and_add(const SetType& s1, const SetType& s2)
{
SetType diff;
set_symmetric_difference(s1.begin(), s1.end(), s2.begin(), s2.end(), inserter(s2, s2.end());
vector<SetType::value_type> same1, same2;
set_intersection(s1.begin(), s1.end(), s2.begin(), s2.end(), back_inserter(same1));
set_intersection(s2.begin(), s2.end(), s1.begin(), s1.end(), back_inserter(same2));
transform(same1.begin(), same1.end(), same2.begin(), inserter(diff, diff.begin()), plus<SetType::value_type, SetType::value_type>());
return diff;
}
Side note! You should stick to either using operator==, in which case you should use an unordered_set, or you should use operator< for a regular set. A set is required to be partially ordered which means 2 entries are deemed equivalent if !(a < b) && !(b < a). So even if your two objects are unequal by operator==, if they satisfy this condition the set will consider them duplicates. So for your function supplied above I highly recommend refraining from using an == comparison.