How does an STL algorithm identify on what kind of container it is operating on?
For example, sort accepts iterators as arguments.
How does it know what kind of container it has to sort?
It doesn't :-) That's the whole point of iterators -- the algorithms that work on them don't need to know anything about the underlying container, and vice versa.
How does it work then? Well, the iterators themselves have a certain set of well-known properties. For example, 'random-access' iterators allow any algorithm to access an element offset from iterator by a constant:
std::vector<int> vec = { 1, 2, 3, 4 };
assert(*(vec.begin() + 2) == 3);
For a sort, the iterators need to support random access (in order to access all the elements between the first and end iterators in an arbitrary order), and they need to be writable (in order to assign or otherwise swap values around), otherwise known as 'output' iterators.
Example of an output iterator vs. an input (read-only) one:
std::vector<int> vec = { 1, 2, 3, 4 };
*vec.begin() = 9;
assert(vec[0] == 9);
*vec.cbegin() = 10; // Does not compile -- cbegin() yields a const iterator
// that is 'random-access', but not 'output'
It doesn't need to know the type of the container, it just needs to know the type of iterator.
As mentioned earlier, STL uses iterators, not containers. It uses the technique known as "tag dispatch" to deduce proper algorithm flavor.
For example, STL has a function "advance" which moves given iterator it by given n positions
template<class IteratorType,
class IntegerDiffType> inline
void advance(IteratorType& it, IntegerDiffType n)
For bidirectional iterators it has to apply ++ or -- many times; for random access iterators it can jump at once. This function is used in std::binary_search, std::lower_bound and some other algorithms.
Internally, it uses iterator type traits to select the strategy:
template<class IteratorType,
class IntegerDiffType>
void advance(IteratorType& it, IntegerDiffType n)
{
typedef typename iterator_traits<IteratorType>::category category;
advance_impl(it, n, category());
}
Of course, STL has to implement the overloaded "impl" functions:
template<class IteratorType,
class IntegerDiffType>
void advance(IteratorType& it, IntegerDiffType n, bidirectional_iterator_tag)
{ // increment iterator by offset, bidirectional iterators
for (; 0 < n; --n)
++it;
for (; n < 0; ++n)
--it;
}
template<class IteratorType,
class IntegerDiffType>
void advance(IteratorType& it, IntegerDiffType n, random_access_iterator_tag)
{ // increment iterator by offset, random-access iterators
it += n;
}
Related
I have a Visual Studio 2008 C++03 application where I have two standard containers. I would like to remove from one container all of the items that are present in the other container (the intersection of the sets).
something like this:
std::vector< int > items = /* 1, 2, 3, 4, 5, 6, 7 */;
std::set< int > items_to_remove = /* 2, 4, 5*/;
std::some_algorithm( items.begin, items.end(), items_to_remove.begin(), items_to_remove.end() );
assert( items == /* 1, 3, 6, 7 */ )
Is there an existing algorithm or pattern that will do this or do I need to roll my own?
Thanks
Try with:
items.erase(
std::remove_if(
items.begin(), items.end()
, std::bind1st(
std::mem_fun( &std::set< int >::count )
, items_to_remove
)
)
, items.end()
);
std::remove(_if) doesn't actually remove anything, since it works with iterators and not containers. What it does is reorder the elements to be removed at the end of the range, and returns an iterator to the new end of the container. You then call erase to actually remove from the container all of the elements past the new end.
Update: If I recall correctly, binding to a member function of a component of the standard library is not standard C++, as implementations are allowed to add default parameters to the function. You'd be safer by creating your own function or function-object predicate that checks whether the element is contained in the set of items to remove.
Personally, I prefer to create small helpers for this (that I reuse heavily).
template <typename Container>
class InPredicate {
public:
InPredicate(Container const& c): _c(c) {}
template <typename U>
bool operator()(U const& u) {
return std::find(_c.begin(), _c.end(), u) != _c.end();
}
private:
Container const& _c;
};
// Typical builder for automatic type deduction
template <typename Container>
InPredicate<Container> in(Container const& c) {
return InPredicate<Container>(c);
}
This also helps to have a true erase_if algorithm
template <typename Container, typename Predicate>
void erase_if(Container& c, Predicate p) {
c.erase(std::remove_if(c.begin(), c.end(), p), c.end());
}
And then:
erase_if(items, in(items_to_remove));
which is pretty readable :)
One more solution:
There is standard provided algorithm set_difference which can be used for this.
But it requires extra container to hold the result. I personally prefer to do it in-place.
std::vector< int > items;
//say items = [1,2,3,4,5,6,7,8,9]
std::set<int>items_to_remove;
//say items_to_remove = <2,4,5>
std::vector<int>result(items.size()); //as this algorithm uses output
//iterator not inserter iterator for result.
std::vector<int>::iterator new_end = std::set_difference(items.begin(),
items.end(),items_to_remove.begin(),items_to_remove.end(),result.begin());
result.erase(new_end,result.end()); // to erase unwanted elements at the
// end.
You can use std::erase in combination with std::remove for this. There is a C++ idiom called the Erase - Remove idiom, which is going to help you accomplish this.
Assuming you have two sets, A and B, and you want to remove from B, the intersection, I, of (A,B) such that I = A^B, your final results will be:
A (left intact)
B' = B-I
Full theory:
http://math.comsci.us/sets/difference.html
This is quite simple.
Create and populate A and B
Create a third intermediate vector, I
Copy the contents of B into I
For each element a_j of A, which contains j elements, search I for the element a_j; If the element is found in I, remove it
Finally, the code to remove an individual element can be found here:
How do I remove an item from a stl vector with a certain value?
And the code to search for an item is here:
How to find if an item is present in a std::vector?
Good luck!
Here's a more "hands-on" in-place method that doesn't require fancy functions nor do the vectors need to be sorted:
#include <vector>
template <class TYPE>
void remove_intersection(std::vector<TYPE> &items, const std::vector<TYPE> &items_to_remove)
{
for (int i = 0; i < (int)items_to_remove.size(); i++) {
for (int j = 0; j < (int)items.size(); j++) {
if (items_to_remove[i] == items[j]) {
items.erase(items.begin() + j);
j--;//Roll back the iterator to prevent skipping over
}
}
}
}
If you know that the multiplicity in each set is 1 (not a multiset), then you can actually replace the j--; line with a break; for better performance.
My question is twofold:
I have a vector of objects and a vector of integers, I want to iterate on my object vector in the order of the integer vector:
meaning if {water,juice,milk,vodka} is my object vector and {1,0,3,2} is my integer vector I wish to have a const iterator for my object vector that will have juice for the first object, water for the second, vodka and last milk.
is there a simple way of doing this?
suppose I have a function returning const iterator (itr) to a unknown (but accessible) vector
meaning, I can use (itr.getvalue()) but i don't have the size of the vector I'm iterating on, is there a way to make a while loop and know the end or the vector by iterator means?
Question 1:
Omitting most of the boilerplate needed for a proper iterator, the following is how it would work:
template<typename Container, typename Iterator>
class index_iterator
{
public:
typedef typename Container::value_type value_type;
index_iterator(Container& c, Iterator iter):
container(c),
iterator(iter)
{
}
value_type& operator*() { return container[*iterator]; }
index_iterator& operator++() { ++iterator; return *this; }
bool operator==(index_iterator const& other)
{
return &container == &other.container && iterator == other.iterator;
}
// ...
private:
Container& container;
Iterator iterator;
};
template<typename C, typename I>
index_iterator<C, I> indexer(C& container, I iter)
{
return index_iterator<C, I>(container, iter);
}
Then you could write e.g.
std::vector<std::string> vs;
std::vector<int> vi
// fill vs and vi
std::copy(indexer(vs, vi.begin()),
indexer(vs, vi.end()),
std::ostream_iterator<std::string>(std::cout, " "));
Question 2:
No, it isn't possible.
1
#include <iostream>
#include <vector>
std::vector<std::string> foods{"water", "juice", "milk", "vodka"};
std::vector<unsigned int> indexes{1,0,3,2};
for (int i : indexes) { // ranged-for; use normal iteration if you must
std::cout << foods[i] << " ";
}
// Output: juice water vodka milk
Live demo
If you really want to wrap this behaviour into a single iterator for foods, this can be done but it gets a bit more complicated.
2
suppose I have a function returning const iterator (itr) to a unknown (but accessible) vector meaning, I can use (itr.getvalue()) but i don't have the size of the vector I'm iterating on, is there a way to make a while loop and know the end or the vector by iterator means?
If you don't have the vector for its size, and you don't have the vector's end iterator then, no, you can't. You can't reliably iterate over anything with just one iterator; you need a pair or a distance.
Others have already covered number 1. For number 2, it basically comes down to a question of what you're willing to call an iterator. It's certainly possible to define a class that will do roughly what you're asking for -- a single object that both represents a current position and has some way of figuring out when it's been incremented as much as possible.
Most people would call that something like a range rather than an iterator though. You'd have to use it somewhat differently from a normal iterator. Most iterators are used by explicitly comparing them to another iterator representing the end of the range. In this case, you'd pass two separate positions when you created the "iterator" (one for the beginning/current position, the other for the end position) and you'd overload operator bool (for the most obvious choice) to indicate whether the current position had been incremented past the end. You'd use it something like: while (*my_iterator++) operator_on(*my_iterator); -- quite a bit different from using a normal iterator.
I wish to have a const iterator for my object vector that will have
juice for the first object
typedef std::vector<Drink> Drinks;
Drinks drinks;
drinks.push_back("water");
drinks.push_back("juice");
drinks.push_back("milk");
drinks.push_back("vodka");
Drinks::const_iterator i = drinks.begin();
const iterator (itr) to a unknown (but accessible) vector
Drinks::const_iterator itr = some_func();
while (itr != drinks.end()) {
doStuff;
++itr;
}
I have two vector<T> in my program, called active and non_active respectively. This refers to the objects it contains, as to whether they are in use or not.
I have some code that loops the active vector and checks for any objects that might have gone non active. I add these to a temp_list inside the loop.
Then after the loop, I take my temp_list and do non_active.insert of all elements in the temp_list.
After that, I do call erase on my active vector and pass it the temp_list to erase.
For some reason, however, the erase crashes.
This is the code:
non_active.insert(non_active.begin(), temp_list.begin(), temp_list.end());
active.erase(temp_list.begin(), temp_list.end());
I get this assertion:
Expression:("_Pvector == NULL || (((_Myvec*)_Pvector)->_Myfirst <= _Ptr && _Ptr <= ((_Myvect*)_Pvector)->_Mylast)",0)
I've looked online and seen that there is a erase-remove idiom, however not sure how I'd apply that to a removing a range of elements from a vector<T>
I'm not using C++11.
erase expects a range of iterators passed to it that lie within the current vector. You cannot pass iterators obtained from a different vector to erase.
Here is a possible, but inefficient, C++11 solution supported by lambdas:
active.erase(std::remove_if(active.begin(), active.end(), [](const T& x)
{
return std::find(temp_list.begin(), temp_list.end(), x) != temp_list.end();
}), active.end());
And here is the equivalent C++03 solution without the lambda:
template<typename Container>
class element_of
{
Container& container;
element_of(Container& container) : container(container) {}
public:
template<typename T>
bool operator()(const T& x) const
{
return std::find(container.begin(), container.end(), x)
!= container.end();
}
};
// ...
active.erase(std::remove_if(active.begin(), active.end(),
element_of<std::vector<T> >(temp_list)),
active.end());
If you replace temp_list with a std::set and the std::find_if with a find member function call on the set, the performance should be acceptable.
The erase method is intended to accept iterators to the same container object. You're trying to pass in iterators to temp_list to use to erase elements from active which is not allowed for good reasons, as a Sequence's range erase method is intended to specify a range in that Sequence to remove. It's important that the iterators are in that sequence because otherwise we're specifying a range of values to erase rather than a range within the same container which is a much more costly operation.
The type of logic you're trying to perform suggests to me that a set or list might be better suited for the purpose. That is, you're trying to erase various elements from the middle of a container that match a certain condition and transfer them to another container, and you could eliminate the need for temp_list this way.
With list, for example, it could be as easy as this:
for (ActiveList::iterator it = active.begin(); it != active.end();)
{
if (it->no_longer_active())
{
inactive.push_back(*it);
it = active.erase(it);
}
else
++it;
}
However, sometimes vector can outperform these solutions, and maybe you have need for vector for other reasons (like ensuring contiguous memory). In that case, std::remove_if is your best bet.
Example:
bool not_active(const YourObjectType& obj);
active_list.erase(
remove_if(active_list.begin(), active_list.end(), not_active),
active_list.end());
More info on this can be found under the topic, 'erase-remove idiom' and you may need predicate function objects depending on what external states are required to determine if an object is no longer active.
You can actually make the erase/remove idiom usable for your case. You just need to move the value over to the other container before std::remove_if possibly shuffles it around: in the predicate.
template<class OutIt, class Pred>
struct copy_if_predicate{
copy_if_predicate(OutIt dest, Pred p)
: dest(dest), pred(p) {}
template<class T>
bool operator()(T const& v){
if(pred(v)){
*dest++ = v;
return true;
}
return false;
}
OutIt dest;
Pred pred;
};
template<class OutIt, class Pred>
copy_if_predicate<OutIt,Pred> copy_if_pred(OutIt dest, Pred pred){
return copy_if_predicate<OutIt,Pred>(dest,pred);
}
Live example on Ideone. (I directly used bools to make the code shorter, not bothering with output and the likes.)
The function std::vector::erase requires the iterators to be iterators into this vector, but you are passing iterators from temp_list. You cannot erase elements from a container that are in a completely different container.
active.erase(temp_list.begin(), temp_list.end());
You try to erase elements from one list, but you use iterators for second list. First list iterators aren't the same, like in second list.
I would like to suggest that this is an example of where std::list should be used. You can splice members from one list to another. Look at std::list::splice()for this.
Do you need random access? If not then you don't need a std::vector.
Note that with list, when you splice, your iterators, and references to the objects in the list remain valid.
If you don't mind making the implementation "intrusive", your objects can contain their own iterator value, so they know where they are. Then when they change state, they can automate their own "moving" from one list to the other, and you don't need to transverse the whole list for them. (If you want this sweep to happen later, you can get them to "register" themselves for later moving).
I will write an algorithm here now to run through one collection and if a condition exists, it will effect a std::remove_if but at the same time will copy the element into your "inserter".
//fwd iterator must be writable
template< typename FwdIterator, typename InputIterator, typename Pred >
FwdIterator copy_and_remove_if( FwdIterator inp, FwdIterator end, InputIterator outp, Pred pred )
{
for( FwdIterator test = inp; test != end; ++test )
{
if( pred(*test) ) // insert
{
*outp = *test;
++outp;
}
else // keep
{
if( test != inp )
{
*inp = *test;
}
++inp;
}
}
return inp;
}
This is a bit like std::remove_if but will copy the ones being removed into an alternative collection. You would invoke it like this (for a vector) where isInactive is a valid predicate that indicates it should be moved.
active.erase( copy_and_remove_if( active.begin(), active.end(), std::back_inserter(inactive), isInactive ), active.end() );
The iterators you pass to erase() should point into the vector itself; the assertion is telling you that they don't. This version of erase() is for erasing a range out of the vector.
You need to iterate over temp_list yourself and call active.erase() on the result of dereferencing the iterator at each step.
I saw the following code used to delete one selected element from std::vector:
vector<hgCoord>::iterator it;
int iIndex = 0;
const int iSelected = 5;
for( it = vecPoints.begin(); it != vecPoints.end(); ++it, ++iIndex )
{
if( iIndex == iSelected )
{
vecPoints.erase( it );
break;
}
}
I argue that this code is not efficient and should be written as follows:
vector<hgCoord>::iterator it;
int iIndex = 0;
const int iSelected = 5; // we assume the vector has more than 5 elements.
vecPoints.erase( vecPoints.begin() + iSelected );
However, I am not sure whether or not this code is following the C++ STL standard.
To make this code generic, so it works no matter whether the iterator supports operator +, and uses the most efficient available implementation:
template <typename C>
void erase_at(C& container, typename C::size_type index) {
typename C::iterator i = container.begin();
std::advance(i, index);
container.erase(i);
}
Internally, std::advance uses operator + if the iterator type supports it. Otherwise (e.g. for std::list<>::iterator) it advances the iterator one step at a time in a loop, just like the first code you posted.
Random-access iterators support addition and subtraction, and std::vector iterators are random-access.
You argue correctly :)
That should work fine for a vector because vector iterators are random access iterators, so it's OK to add an offset as you have done. The same won't work for some other container types (such as deque or map).
So, your code is better for a vector, but the other code may have been intended to work with other types of container.
Profiling my cpu-bound code has suggested I that spend a long time checking to see if a container contains completely unique elements. Assuming that I have some large container of unsorted elements (with < and = defined), I have two ideas on how this might be done:
The first using a set:
template <class T>
bool is_unique(vector<T> X) {
set<T> Y(X.begin(), X.end());
return X.size() == Y.size();
}
The second looping over the elements:
template <class T>
bool is_unique2(vector<T> X) {
typename vector<T>::iterator i,j;
for(i=X.begin();i!=X.end();++i) {
for(j=i+1;j!=X.end();++j) {
if(*i == *j) return 0;
}
}
return 1;
}
I've tested them the best I can, and from what I can gather from reading the documentation about STL, the answer is (as usual), it depends. I think that in the first case, if all the elements are unique it is very quick, but if there is a large degeneracy the operation seems to take O(N^2) time. For the nested iterator approach the opposite seems to be true, it is lighting fast if X[0]==X[1] but takes (understandably) O(N^2) time if all the elements are unique.
Is there a better way to do this, perhaps a STL algorithm built for this very purpose? If not, are there any suggestions eek out a bit more efficiency?
Your first example should be O(N log N) as set takes log N time for each insertion. I don't think a faster O is possible.
The second example is obviously O(N^2). The coefficient and memory usage are low, so it might be faster (or even the fastest) in some cases.
It depends what T is, but for generic performance, I'd recommend sorting a vector of pointers to the objects.
template< class T >
bool dereference_less( T const *l, T const *r )
{ return *l < *r; }
template <class T>
bool is_unique(vector<T> const &x) {
vector< T const * > vp;
vp.reserve( x.size() );
for ( size_t i = 0; i < x.size(); ++ i ) vp.push_back( &x[i] );
sort( vp.begin(), vp.end(), ptr_fun( &dereference_less<T> ) ); // O(N log N)
return adjacent_find( vp.begin(), vp.end(),
not2( ptr_fun( &dereference_less<T> ) ) ) // "opposite functor"
== vp.end(); // if no adjacent pair (vp_n,vp_n+1) has *vp_n < *vp_n+1
}
or in STL style,
template <class I>
bool is_unique(I first, I last) {
typedef typename iterator_traits<I>::value_type T;
…
And if you can reorder the original vector, of course,
template <class T>
bool is_unique(vector<T> &x) {
sort( x.begin(), x.end() ); // O(N log N)
return adjacent_find( x.begin(), x.end() ) == x.end();
}
You must sort the vector if you want to quickly determine if it has only unique elements. Otherwise the best you can do is O(n^2) runtime or O(n log n) runtime with O(n) space. I think it's best to write a function that assumes the input is sorted.
template<class Fwd>
bool is_unique(In first, In last)
{
return adjacent_find(first, last) == last;
}
then have the client sort the vector, or a make a sorted copy of the vector. This will open a door for dynamic programming. That is, if the client sorted the vector in the past then they have the option to keep and refer to that sorted vector so they can repeat this operation for O(n) runtime.
The standard library has std::unique, but that would require you to make a copy of the entire container (note that in both of your examples you make a copy of the entire vector as well, since you unnecessarily pass the vector by value).
template <typename T>
bool is_unique(std::vector<T> vec)
{
std::sort(vec.begin(), vec.end());
return std::unique(vec.begin(), vec.end()) == vec.end();
}
Whether this would be faster than using a std::set would, as you know, depend :-).
Is it infeasible to just use a container that provides this "guarantee" from the get-go? Would it be useful to flag a duplicate at the time of insertion rather than at some point in the future? When I've wanted to do something like this, that's the direction I've gone; just using the set as the "primary" container, and maybe building a parallel vector if I needed to maintain the original order, but of course that makes some assumptions about memory and CPU availability...
For one thing you could combine the advantages of both: stop building the set, if you have already discovered a duplicate:
template <class T>
bool is_unique(const std::vector<T>& vec)
{
std::set<T> test;
for (typename std::vector<T>::const_iterator it = vec.begin(); it != vec.end(); ++it) {
if (!test.insert(*it).second) {
return false;
}
}
return true;
}
BTW, Potatoswatter makes a good point that in the generic case you might want to avoid copying T, in which case you might use a std::set<const T*, dereference_less> instead.
You could of course potentially do much better if it wasn't generic. E.g if you had a vector of integers of known range, you could just mark in an array (or even bitset) if an element exists.
You can use std::unique, but it requires the range to be sorted first:
template <class T>
bool is_unique(vector<T> X) {
std::sort(X.begin(), X.end());
return std::unique(X.begin(), X.end()) == X.end();
}
std::unique modifies the sequence and returns an iterator to the end of the unique set, so if that's still the end of the vector then it must be unique.
This runs in nlog(n); the same as your set example. I don't think you can theoretically guarantee to do it faster, although using a C++0x std::unordered_set instead of std::set would do it in expected linear time - but that requires that your elements be hashable as well as having operator == defined, which might not be so easy.
Also, if you're not modifying the vector in your examples, you'd improve performance by passing it by const reference, so you don't make an unnecessary copy of it.
If I may add my own 2 cents.
First of all, as #Potatoswatter remarked, unless your elements are cheap to copy (built-in/small PODs) you'll want to use pointers to the original elements rather than copying them.
Second, there are 2 strategies available.
Simply ensure there is no duplicate inserted in the first place. This means, of course, controlling the insertion, which is generally achieved by creating a dedicated class (with the vector as attribute).
Whenever the property is needed, check for duplicates
I must admit I would lean toward the first. Encapsulation, clear separation of responsibilities and all that.
Anyway, there are a number of ways depending on the requirements. The first question is:
do we have to let the elements in the vector in a particular order or can we "mess" with them ?
If we can mess with them, I would suggest keeping the vector sorted: Loki::AssocVector should get you started.
If not, then we need to keep an index on the structure to ensure this property... wait a minute: Boost.MultiIndex to the rescue ?
Thirdly: as you remarked yourself a simple linear search doubled yield a O(N2) complexity in average which is no good.
If < is already defined, then sorting is obvious, with its O(N log N) complexity.
It might also be worth it to make T Hashable, because a std::tr1::hash_set could yield a better time (I know, you need a RandomAccessIterator, but if T is Hashable then it's easy to have T* Hashable to ;) )
But in the end the real issue here is that our advises are necessary generic because we lack data.
What is T, do you intend the algorithm to be generic ?
What is the number of elements ? 10, 100, 10.000, 1.000.000 ? Because asymptotic complexity is kind of moot when dealing with a few hundreds....
And of course: can you ensure unicity at insertion time ? Can you modify the vector itself ?
Well, your first one should only take N log(N), so it's clearly the better worse case scenario for this application.
However, you should be able to get a better best case if you check as you add things to the set:
template <class T>
bool is_unique3(vector<T> X) {
set<T> Y;
typename vector<T>::const_iterator i;
for(i=X.begin(); i!=X.end(); ++i) {
if (Y.find(*i) != Y.end()) {
return false;
}
Y.insert(*i);
}
return true;
}
This should have O(1) best case, O(N log(N)) worst case, and average case depends on the distribution of the inputs.
If the type T You store in Your vector is large and copying it is costly, consider creating a vector of pointers or iterators to Your vector elements. Sort it based on the element pointed to and then check for uniqueness.
You can also use the std::set for that. The template looks like this
template <class Key,class Traits=less<Key>,class Allocator=allocator<Key> > class set
I think You can provide appropriate Traits parameter and insert raw pointers for speed or implement a simple wrapper class for pointers with < operator.
Don't use the constructor for inserting into the set. Use insert method. The method (one of overloads) has a signature
pair <iterator, bool> insert(const value_type& _Val);
By checking the result (second member) You can often detect the duplicate much quicker, than if You inserted all elements.
In the (very) special case of sorting discrete values with a known, not too big, maximum value N.
You should be able to start a bucket sort and simply check that the number of values in each bucket is below 2.
bool is_unique(const vector<int>& X, int N)
{
vector<int> buckets(N,0);
typename vector<int>::const_iterator i;
for(i = X.begin(); i != X.end(); ++i)
if(++buckets[*i] > 1)
return false;
return true;
}
The complexity of this would be O(n).
Using the current C++ standard containers, you have a good solution in your first example. But if you can use a hash container, you might be able to do better, as the hash set will be nO(1) instead of nO(log n) for a standard set. Of course everything will depend on the size of n and your particular library implementation.