I need to construct chains of pairs of numbers where:
In each pair, the first one is smaller than the second
In order to form a chain between two consecutive nodes, they must have one number in common. In other words, the link (a,b) -- (c,d) can be made if and only if either a==c, b==c, a==d or b==d
A pair cannot be made of the same number. In other words, if (a,b) exists, then a!=b
This may look like a Longest increasing subsequence but I actually want to chain consecutive pairs that have one equal member.
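The three rules above can be written down as two small predicates. This is only a sketch: the names pair_t, is_valid_pair and can_link are made up for illustration and do not come from any library.

```cpp
#include <utility>

using pair_t = std::pair<int, int>;  // hypothetical alias for one (a,b) pair

// A pair is valid when its first member is strictly smaller than its second,
// which also rules out pairs made of the same number.
bool is_valid_pair(const pair_t& p)
{
    return p.first < p.second;
}

// Two pairs can be linked when they have at least one number in common:
// a==c, b==c, a==d or b==d for the link (a,b) -- (c,d).
bool can_link(const pair_t& x, const pair_t& y)
{
    return x.first == y.first || x.second == y.first ||
           x.first == y.second || x.second == y.second;
}
```

With these, (0,1) and (1,6) can be linked, while (0,1) and (2,3) cannot.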
Example:
Initial list (unordered):
(0,1)
(2,3)
(1,6)
(4,6)
(8,9)
(2,8)
Result:
----- chain #1
(0,1)
(1,6)
(4,6)
----- chain #2
(2,3)
(2,8)
(8,9)
I could do an algorithm that will iterate over the entire list for each cell (O(n^2)), but I want to make it faster and I have the flexibility of ordering my initial array in any way I want (std::set, std::map, std::unordered_map, etc.). My list is made of tens of thousands of pairs so I need an efficient solution in terms of processing time.
You can solve it in O(N * log(N)) when you manage two lists, one sorted with respect to first, the other sorted with respect to second.
The code has some duplication that I didn't bother to clean up yet.
#include <iostream>
#include <list>
#include <algorithm>
#include <tuple>
#include <any>
#include <iterator> // std::prev
#include <vector>
struct pair_and_iter {
int first;
int second;
std::any other_iter;
};
struct compare_first {
bool operator()(int x,pair_and_iter p){ return x < p.first; }
bool operator()(pair_and_iter p, int x){ return p.first < x; }
};
struct compare_second {
bool operator()(int x,pair_and_iter p){ return x < p.second; }
bool operator()(pair_and_iter p, int x){ return p.second < x; }
};
template <typename Iter,typename Comp>
Iter my_find(Iter first,Iter last,int x, Comp comp) {
auto it = std::lower_bound(first,last,x,comp);
if (it != last && (!comp(x,*it) && !comp(*it,x))){
return it;
} else {
return last;
}
}
int main() {
std::list<pair_and_iter> a {{0,1},{2,3},{1,6},{4,6},{8,9},{2,8}};
std::list<pair_and_iter> b;
for (auto it = a.begin(); it != a.end(); ++it){
b.push_back({it->first,it->second,it});
it->other_iter = std::prev(b.end());
}
a.sort([](const auto& x,const auto& y){
return std::tie(x.first,x.second) < std::tie(y.first,y.second); });
b.sort([](const auto& x,const auto& y){
return std::tie(x.second,x.first) < std::tie(y.second,y.first); });
std::vector<std::vector<pair_and_iter>> result;
std::vector<pair_and_iter> current_result;
current_result.push_back(a.front());
auto current = current_result.begin();
b.erase(std::any_cast<std::list<pair_and_iter>::iterator>(current->other_iter));
a.erase(a.begin());
while (a.size() && b.size()) {
// look for an element with same first
auto it = my_find(a.begin(),a.end(),current->first,compare_first{});
if (it == a.end()) {
// look for element where current->second == elem.first
it = my_find(a.begin(),a.end(),current->second,compare_first{});
}
if (it != a.end()){
current_result.push_back(*it);
current = std::prev(current_result.end());
b.erase(std::any_cast<std::list<pair_and_iter>::iterator>(it->other_iter));
a.erase(it);
continue;
}
// look for element with current->first == elem.second
it = my_find(b.begin(),b.end(),current->first,compare_second{});
if (it == b.end()) {
// look for element with same second
it = my_find(b.begin(),b.end(),current->second,compare_second{});
}
if (it != b.end()) {
current_result.push_back(*it);
current = std::prev(current_result.end());
a.erase(std::any_cast<std::list<pair_and_iter>::iterator>(it->other_iter));
b.erase(it);
continue;
}
// no matching element found
result.push_back(current_result);
current_result.clear();
current_result.push_back(a.front());
current = current_result.begin();
b.erase(std::any_cast<std::list<pair_and_iter>::iterator>(current->other_iter));
a.erase(a.begin());
}
result.push_back(current_result);
for (const auto& chain : result){
for (const auto& elem : chain){
std::cout << elem.first << " " << elem.second << "\n";
}
std::cout << "\n";
}
}
Output:
0 1
1 6
4 6
2 3
2 8
8 9
I used std::list because it has stable iterators and constant-time erase, and std::any for type erasure because each list contains iterators into the other list. a is sorted with respect to first and b is sorted with respect to second, so std::lower_bound can be used to find a match with O(log N) comparisons. (Note that std::list iterators are not random access, so std::lower_bound still needs a linear number of iterator increments; only the number of comparisons is logarithmic.) A single linear search is traded against 2 binary searches to find either current->first or current->second among the first members in a, and 2 binary searches to find either current->first or current->second among the second members in b. In total it is O(N log(N)) for sorting plus O(log(N) + log(N-1) + log(N-2) + ... + log(1)) comparisons, which equals O(log(N!)), that is O(N log(N)).
PS: You didn't mention that you are looking for a longest chain, and this algorithm is not finding the longest chain. It just picks the first element of the remaining ones and uses the next element it finds to continue the chain.
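For comparison, here is a sketch of a different technique from the two sorted lists used above: a hash map from each number to the pairs that contain it. It is greedy in the same sense (it does not look for the longest chain), and the names build_chains and pair_t are invented for this example. Each pair index is stored twice and popped at most twice, so the expected running time should be linear in the number of pairs, assuming well-behaved hashing.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

using pair_t = std::pair<int, int>;  // hypothetical alias for one (a,b) pair

// Greedy chaining: start from the first unused pair and keep appending any
// unused pair that shares a number with the chain's current last pair.
std::vector<std::vector<pair_t>> build_chains(const std::vector<pair_t>& pairs)
{
    // number -> indices of the pairs containing that number
    std::unordered_map<int, std::vector<std::size_t>> by_number;
    for (std::size_t i = 0; i < pairs.size(); ++i) {
        by_number[pairs[i].first].push_back(i);
        by_number[pairs[i].second].push_back(i);
    }
    std::vector<bool> used(pairs.size(), false);

    // Pops candidates for `number` until an unused pair index is found.
    auto take_unused = [&](int number, std::size_t& out) {
        auto& candidates = by_number[number];
        while (!candidates.empty()) {
            const std::size_t idx = candidates.back();
            candidates.pop_back();
            if (!used[idx]) {
                out = idx;
                return true;
            }
        }
        return false;
    };

    std::vector<std::vector<pair_t>> chains;
    for (std::size_t start = 0; start < pairs.size(); ++start) {
        if (used[start]) continue;
        used[start] = true;
        std::vector<pair_t> chain{pairs[start]};
        std::size_t cur = start;
        std::size_t next = 0;
        while (take_unused(pairs[cur].first, next) ||
               take_unused(pairs[cur].second, next)) {
            used[next] = true;
            chain.push_back(pairs[next]);
            cur = next;
        }
        chains.push_back(std::move(chain));
    }
    return chains;
}
```

On the example list from the question this produces the same two chains as shown there, but the order inside a chain depends on the input order, so other inputs may chain differently than the sorted-list version.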
I need to implement the following datastructure for my project. I have a relation of
const MyClass*
to
uint64_t
For every pointer I want to save a counter connected to it, which can be changed over time (in fact only incremented). This would be no problem, I could simply store it in a std::map. The problem is that I need fast access to the pointers which have the highest values.
That is why I came to the conclusion to use a boost::bimap. It is defined as follows for my project:
typedef boost::bimaps::bimap<
boost::bimaps::unordered_set_of< const MyClass* >,
boost::bimaps::multiset_of< uint64_t, std::greater<uint64_t> >
> MyBimap;
MyBimap bimap;
This would work fine, but am I right that I cannot modify the uint64_t of a pair once it has been inserted? The documentation says that multiset_of is constant and therefore I cannot change a value of a pair in the bimap.
What can I do? What would be the correct way to change the value of one key in this bimap? Or is there a simpler data structure possible for this problem?
Here's a simple hand-made solution.
Internally it keeps a map to store the counts indexed by object pointer, and a further multi-set of iterators, ordered by descending count of their pointees.
Whenever you modify a count, you must re-index. I have done this piecemeal, but you could do it as a batch update, depending on requirements.
Note that C++17 adds node handles (extract and insert) for sets and maps, which would make the re-indexing cheaper because a node can be re-inserted without being reallocated.
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>
struct MyClass { };
struct store
{
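std::uint64_t add_value(MyClass* p, std::uint64_t count = 0)
{
auto inserted = _map.emplace(p, count);
// only index newly created entries; an existing entry keeps its count
if (inserted.second) add_index(inserted.first);
return inserted.first->second;
}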
std::uint64_t increment(MyClass* p)
{
auto it = _map.find(p);
if (it == std::end(_map)) {
// in this case, we'll create one - we could throw instead
return add_value(p, 1);
}
else {
remove_index(it);
++it->second;
add_index(it);
return it->second;
}
}
std::uint64_t query(MyClass* p) const {
auto it = _map.find(p);
if (it == std::end(_map)) {
// unknown pointer: report a count of zero - we could throw instead
return 0;
}
else {
return it->second;
}
}
std::vector<std::pair<MyClass*, std::uint64_t>> top_n(std::size_t n)
{
std::vector<std::pair<MyClass*, std::uint64_t>> result;
result.reserve(n);
for (auto idx = _value_index.begin(), idx_end = _value_index.end() ;
n && idx != idx_end ;
++idx, --n) {
result.emplace_back((*idx)->first, (*idx)->second);
}
return result;
}
private:
using map_type = std::map<MyClass*, std::uint64_t>;
struct by_count
{
bool operator()(map_type::const_iterator l, map_type::const_iterator r) const {
// note: greater than orders by descending count
return l->second > r->second;
}
};
using value_index_type = std::multiset<map_type::iterator, by_count>;
void add_index(map_type::iterator iter)
{
// the multiset stores map iterators; by_count orders them by their counts
_value_index.emplace(iter);
}
void remove_index(map_type::iterator iter)
{
for(auto range = _value_index.equal_range(iter);
range.first != range.second;
++range.first)
{
if (*range.first == iter) {
_value_index.erase(range.first);
return;
}
}
}
map_type _map;
value_index_type _value_index;
};
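The C++17 remark above can be illustrated with node handles. The sketch below is not the store class from this answer but a hypothetical variant, counter_index, that keeps the counts in a std::multimap ordered by descending count and re-keys a node with extract and insert instead of erasing and re-creating it. It assumes a C++17 compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

struct MyClass { };

// Hypothetical helper: counters ordered by descending count, plus a hash map
// remembering where each pointer currently lives in that ordering.
class counter_index
{
    using ordered_type =
        std::multimap<std::uint64_t, const MyClass*, std::greater<std::uint64_t>>;

public:
    std::uint64_t increment(const MyClass* p)
    {
        auto found = _where.find(p);
        if (found == _where.end()) {
            // first sighting: start the counter at 1
            _where.emplace(p, _ordered.emplace(1, p));
            return 1;
        }
        // C++17 node handle: detach the node, change its key, re-insert it.
        auto node = _ordered.extract(found->second);
        ++node.key();
        found->second = _ordered.insert(std::move(node));
        return found->second->first;
    }

    // The n entries with the highest counts, highest first.
    std::vector<std::pair<const MyClass*, std::uint64_t>> top_n(std::size_t n) const
    {
        std::vector<std::pair<const MyClass*, std::uint64_t>> result;
        for (auto it = _ordered.begin(); n && it != _ordered.end(); ++it, --n)
            result.emplace_back(it->second, it->first);
        return result;
    }

private:
    ordered_type _ordered;
    std::unordered_map<const MyClass*, ordered_type::iterator> _where;
};
```

Each increment costs one hash lookup plus one O(log N) re-insertion, and no node is allocated or freed after the first sighting of a pointer.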
What's the most widely used existing library in C++ to give all the combination and permutation of k elements out of n elements?
I am not asking the algorithm but the existing library or methods.
Thanks.
I decided to test the solutions by dman and Charles Bailey here. I'll call them solutions A and B respectively. My test visits each combination of a vector<int> of size 100, 5 elements at a time. Here's the test code:
Test Code
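#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>
// plus the header providing the next_combination under test
struct F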
{
unsigned long long count_;
F() : count_(0) {}
bool operator()(std::vector<int>::iterator, std::vector<int>::iterator)
{++count_; return false;}
};
int main()
{
typedef std::chrono::high_resolution_clock Clock;
typedef std::chrono::duration<double> sec;
typedef std::chrono::duration<double, std::nano> ns;
int n = 100;
std::vector<int> v(n);
std::iota(v.begin(), v.end(), 0);
std::vector<int>::iterator r = v.begin() + 5;
F f;
Clock::time_point t0 = Clock::now();
do
{
f(v.begin(), r);
} while (next_combination(v.begin(), r, v.end()));
Clock::time_point t1 = Clock::now();
sec s0 = t1 - t0;
ns pvt0 = s0 / f.count_;
std::cout << "N = " << v.size() << ", r = " << r-v.begin()
<< ", visits = " << f.count_ << '\n'
<< "\tnext_combination total = " << s0.count() << " seconds\n"
<< "\tnext_combination per visit = " << pvt0.count() << " ns";
}
All code was compiled using clang++ -O3 on a 2.8 GHz Intel Core i5.
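As a sanity check on the visit counts reported for solutions B, C and D, the number of 5-element combinations of 100 elements can be worked out by hand:

```latex
\[
\binom{100}{5}
  = \frac{100 \cdot 99 \cdot 98 \cdot 97 \cdot 96}{5!}
  = \frac{9\,034\,502\,400}{120}
  = 75\,287\,520
\]
```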
Solution A
Solution A results in an infinite loop. Even when I make n very small, this program never completes. Subsequently downvoted for this reason.
Solution B
This is an edit. Solution B changed in the course of writing this answer. At first it gave incorrect answers and due to very prompt updating it now gives correct answers. It prints out:
N = 100, r = 5, visits = 75287520
next_combination total = 4519.84 seconds
next_combination per visit = 60034.3 ns
Solution C
Next I tried the solution from N2639 which looks very similar to solution A, but works correctly. I'll call this solution C and it prints out:
N = 100, r = 5, visits = 75287520
next_combination total = 6.42602 seconds
next_combination per visit = 85.3531 ns
Solution C is 703 times faster than solution B.
Solution D
Finally there is a solution D found here. This solution has a different signature / style and is called for_each_combination, and is used much like std::for_each. The driver code above changes between the timer calls like so:
Clock::time_point t0 = Clock::now();
f = for_each_combination(v.begin(), r, v.end(), f);
Clock::time_point t1 = Clock::now();
Solution D prints out:
N = 100, r = 5, visits = 75287520
for_each_combination = 0.498979 seconds
for_each_combination per visit = 6.62765 ns
Solution D is 12.9 times faster than solution C, and over 9000 times faster than solution B.
I consider this a relatively small problem: only 75 million visits. As the number of visits increases into the billions, the discrepancy in the performance between these algorithms continues to grow. Solution B is already unwieldy. Solution C eventually becomes unwieldy. Solution D is the highest performing algorithm to visit all combinations I'm aware of.
The link showing solution D also contains several other algorithms for enumerating and visiting permutations with various properties (circular, reversible, etc.). Each of these algorithms was designed with performance as one of the goals. And note that none of these algorithms requires the initial sequence to be in sorted order. The elements need not even be LessThanComparable.
Combinations: from Mark Nelson's article on the same topic we have next_combination.
Permutations: from the STL we have std::next_permutation.
template <typename Iterator>
inline bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
if ((first == last) || (first == k) || (last == k))
return false;
Iterator itr1 = first;
Iterator itr2 = last;
++itr1;
if (last == itr1)
return false;
itr1 = last;
--itr1;
itr1 = k;
--itr2;
while (first != itr1)
{
if (*--itr1 < *itr2)
{
Iterator j = k;
while (!(*itr1 < *j)) ++j;
std::iter_swap(itr1,j);
++itr1;
++j;
itr2 = k;
std::rotate(itr1,j,last);
while (last != j)
{
++j;
++itr2;
}
std::rotate(k,itr2,last);
return true;
}
}
std::rotate(first,k,last);
return false;
}
This answer provides a minimal implementation effort solution. It may not have acceptable performance if you want to retrieve combinations for large input ranges.
The standard library has std::next_permutation and you can trivially build a next_k_permutation from it and a next_combination from that.
template<class RandIt, class Compare>
bool next_k_permutation(RandIt first, RandIt mid, RandIt last, Compare comp)
{
std::sort(mid, last, std::tr1::bind(comp, std::tr1::placeholders::_2
, std::tr1::placeholders::_1));
return std::next_permutation(first, last, comp);
}
If you don't have tr1::bind or boost::bind you would need to build a function object that swaps the arguments to a given comparison. Of course, if you're only interested in a std::less variant of next_k_permutation then you can use std::greater directly:
template<class RandIt>
bool next_k_permutation(RandIt first, RandIt mid, RandIt last)
{
typedef typename std::iterator_traits<RandIt>::value_type value_type;
std::sort(mid, last, std::greater< value_type >());
return std::next_permutation(first, last);
}
This is a relatively safe version of next_k_permutation. If you can guarantee that the range [mid, last) is in the order it would be in after a call to next_k_permutation, then you can use the simpler:
template<class BiDiIt, class Compare>
bool next_k_permutation(BiDiIt first, BiDiIt mid, BiDiIt last, Compare comp)
{
std::reverse(mid, last);
return std::next_permutation(first, last, comp);
}
This also works with bi-directional iterators as well as random access iterators.
To output combinations instead of k-permutations, we have to ensure that we output each combination only once, so we'll return a k-permutation only if its first k elements are in order.
template<class BiDiIt, class Compare>
bool next_combination(BiDiIt first, BiDiIt mid, BiDiIt last, Compare comp)
{
bool result;
do
{
result = next_k_permutation(first, mid, last, comp);
} while (std::adjacent_find( first, mid,
std::tr1::bind(comp, std::tr1::placeholders::_2
, std::tr1::placeholders::_1) )
!= mid );
return result;
}
Alternatives would be to use a reverse iterator instead of the parameter swapping bind call or to use std::greater explicitly if std::less is the comparison being used.
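Another common way to get combinations out of the standard permutation algorithms, different from the next_k_permutation route above, is to permute a selection mask rather than the elements themselves. The sketch below assumes the items live in a std::vector; visit_combinations is a made-up name, not a library function.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Calls visit(current) once for every k-element combination of items.
// The mask starts as k ones followed by zeros, which is its largest
// arrangement, so std::prev_permutation walks through all of them.
template <class T, class Visitor>
void visit_combinations(const std::vector<T>& items, std::size_t k, Visitor visit)
{
    if (k > items.size()) return;
    std::vector<char> selected(items.size(), 0);
    std::fill_n(selected.begin(), k, 1);
    std::vector<T> current;
    do {
        current.clear();
        for (std::size_t i = 0; i < items.size(); ++i)
            if (selected[i]) current.push_back(items[i]);
        visit(current);
    } while (std::prev_permutation(selected.begin(), selected.end()));
}
```

This copies the selected elements on every visit, so it favours simplicity over speed; the elements themselves never need to be comparable, only the mask is permuted.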
@Charles Bailey above:
I could be wrong, but I think the first two algorithms above do not remove duplicates between first and mid? Maybe I am just not sure how to use them.
4 choose 2 example:
12 34
12 43 (after sort)
13 24 (after next_permutation)
13 42 (after sort)
14 23 (after next_permutation)
14 32 (after sort)
21 34 (after next_permutation)
So I added a check to see whether the values in [first, mid) are in order before returning, but I definitely wouldn't have thought of the part you wrote though (very elegant! thanks!).
Not fully tested, just cursory tests..
template <class RandIt>
bool next_combination(RandIt first, RandIt mid, RandIt last)
{
typedef typename std::iterator_traits< RandIt >::value_type value_type;
std::sort(mid, last, std::greater< value_type >() );
while(std::next_permutation(first, last)){
// accept the permutation only if [first, mid) is in ascending order
if(std::adjacent_find(first, mid, std::greater< value_type >() ) == mid){
return true;
}
std::sort(mid, last, std::greater< value_type >() );
}
return false;
}
Maybe this is already covered by the previous answers, but I could not find a fully generic way to do this with respect to the parameter types, nor did I find one in existing library routines besides Boost. This is a generic approach I needed during test case construction for scenarios covering a wide spread of parameter variations. Maybe it is helpful to you too, at least for similar scenarios. (With minor changes it should be usable for both permutations and combinations.)
#include <cstddef>
#include <iterator>
#include <memory>
#include <stdexcept>
#include <vector>
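class SingleParameterToVaryBase
{
public:
// virtual destructor: instances are destroyed through std::unique_ptr to this base
virtual ~SingleParameterToVaryBase() = default;
virtual bool varyNext() = 0;
virtual void reset() = 0;
};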
template <typename _DataType, typename _ParamVariationContType>
class SingleParameterToVary : public SingleParameterToVaryBase
{
public:
SingleParameterToVary(
_DataType& param,
const _ParamVariationContType& valuesToVary) :
mParameter(param)
, mVariations(valuesToVary)
{
if (mVariations.empty())
throw std::logic_error("Empty variation container for parameter");
reset();
}
// Step to next parameter value, return false if end of value vector is reached
virtual bool varyNext() override
{
++mCurrentIt;
const bool finished = mCurrentIt == mVariations.cend();
if (finished)
{
return false;
}
else
{
mParameter = *mCurrentIt;
return true;
}
}
virtual void reset() override
{
mCurrentIt = mVariations.cbegin();
mParameter = *mCurrentIt;
}
private:
typedef typename _ParamVariationContType::const_iterator ConstIteratorType;
// Iterator to the actual values this parameter can yield
ConstIteratorType mCurrentIt;
_ParamVariationContType mVariations;
// Reference to the parameter itself
_DataType& mParameter;
};
class GenericParameterVariator
{
public:
GenericParameterVariator() : mFinished(false)
{
reset();
}
template <typename _ParameterType, typename _ParameterVariationsType>
void registerParameterToVary(
_ParameterType& param,
const _ParameterVariationsType& paramVariations)
{
mParametersToVary.push_back(
std::make_unique<SingleParameterToVary<_ParameterType, _ParameterVariationsType>>(
param, paramVariations));
}
const bool isFinished() const { return mFinished; }
void reset()
{
mFinished = false;
mNumTotalCombinationsVisited = 0;
for (const auto& upParameter : mParametersToVary)
upParameter->reset();
}
// Step into next state if possible
bool createNextParameterPermutation()
{
if (mFinished || mParametersToVary.empty())
return false;
auto itPToVary = mParametersToVary.begin();
while (itPToVary != mParametersToVary.end())
{
const auto& upParameter = *itPToVary;
// If we are the very first configuration at all, do not vary.
const bool variedSomething = mNumTotalCombinationsVisited == 0 ? true : upParameter->varyNext();
++mNumTotalCombinationsVisited;
if (!variedSomething)
{
// If we were not able to vary the last parameter in our list, we are finished.
if (std::next(itPToVary) == mParametersToVary.end())
{
mFinished = true;
return false;
}
++itPToVary;
continue;
}
else
{
if (itPToVary != mParametersToVary.begin())
{
// Reset all parameters before this one
auto itBackwd = itPToVary;
do
{
--itBackwd;
(*itBackwd)->reset();
} while (itBackwd != mParametersToVary.begin());
}
return true;
}
}
return true;
}
private:
// Linearized parameter set
std::vector<std::unique_ptr<SingleParameterToVaryBase>> mParametersToVary;
bool mFinished;
size_t mNumTotalCombinationsVisited;
};
Possible usage:
GenericParameterVariator paramVariator;
size_t param1;
int param2;
char param3;
paramVariator.registerParameterToVary(param1, std::vector<size_t>{ 1, 2 });
paramVariator.registerParameterToVary(param2, std::vector<int>{ -1, -2 });
paramVariator.registerParameterToVary(param3, std::vector<char>{ 'a', 'b' });
std::vector<std::tuple<size_t, int, char>> visitedCombinations;
while (paramVariator.createNextParameterPermutation())
visitedCombinations.push_back(std::make_tuple(param1, param2, param3));
Generates:
(1, -1, 'a')
(2, -1, 'a')
(1, -2, 'a')
(2, -2, 'a')
(1, -1, 'b')
(2, -1, 'b')
(1, -2, 'b')
(2, -2, 'b')
For sure, this can be further optimized/specialized. For instance, you can simply add a hashing scheme and/or an avoid functor if you want to avoid effective repetitions. Also, since the parameters are held as references, one might consider protecting the generator from error-prone usage by deleting the copy constructor and copy assignment operator.
Time complexity is within the theoretical permutation complexity range.
Well I think the question pretty much sums it up. I have a forward_list of unique items, and want to remove a single item from it:
std::forward_list<T> mylist;
// fill with stuff
mylist.remove_if([](T const& value)
{
return value == condition;
});
I mean, this method works fine but it's inefficient because it continues to search once the item is found and deleted. Is there a better way or do I need to do it manually?
If you only want to remove the first match, you can use std::adjacent_find followed by the member erase_after
#include <algorithm>
#include <cassert>
#include <forward_list>
#include <iostream>
#include <ios>
#include <iterator>
// returns an iterator before first element equal to value, or last if no such element is present
// pre-condition: before_first is incrementable and not equal to last
template<class FwdIt, class T>
FwdIt find_before(FwdIt before_first, FwdIt last, T const& value)
{
assert(before_first != last);
auto first = std::next(before_first);
if (first == last) return last;
if (*first == value) return before_first;
return std::adjacent_find(first, last, [&](auto const&, auto const& R) {
return R == value;
});
}
int main()
{
auto e = std::forward_list<int>{};
std::cout << std::boolalpha << (++e.before_begin() == end(e)) << "\n";
std::cout << (find_before(e.before_begin(), end(e), 0) == end(e)) << "\n";
auto s = std::forward_list<int>{ 0 };
std::cout << (find_before(s.before_begin(), end(s), 0) == s.before_begin()) << "\n";
auto d = std::forward_list<int>{ 0, 1 };
std::cout << (find_before(d.before_begin(), end(d), 0) == d.before_begin()) << "\n";
std::cout << (find_before(d.before_begin(), end(d), 1) == begin(d)) << "\n";
std::cout << (find_before(d.before_begin(), end(d), 2) == end(d)) << "\n";
// erase after
auto m = std::forward_list<int>{ 1, 2, 3, 4, 1, 3, 5 };
auto it = find_before(m.before_begin(), end(m), 3);
if (it != end(m))
m.erase_after(it);
std::copy(begin(m), end(m), std::ostream_iterator<int>(std::cout, ","));
}
This will stop as soon as a match is found. Note that the adjacent_find takes a binary predicate, and by comparing only the second argument, we get an iterator before the element we want to remove, so that erase_after can actually remove it. Complexity is O(N) so you won't get it more efficient than this.
FWIW, here's another short version
template< typename T, class Allocator, class Predicate >
bool remove_first_if( std::forward_list< T, Allocator >& list, Predicate pred )
{
auto oit = list.before_begin(), it = std::next( oit );
while( it != list.end() ) {
if( pred( *it ) ) { list.erase_after( oit ); return true; }
oit = it++;
}
return false;
}
Going to have to roll your own...
template <typename Container, typename Predicate>
void remove_first_of(Container& container, Predicate p)
{
auto it = container.before_begin();
for (auto nit = std::next(it); ; it = nit, nit = std::next(it))
{
if (nit == container.end())
return;
if (p(*nit))
{
container.erase_after(it);
return;
}
}
}
There is nothing in the standard library which would be directly applicable. Actually, there is. See @TemplateRex's answer for that.
You can also write this yourself (especially if you want to combine the search with the erasure), something like this:
template <class T, class Allocator, class Predicate>
bool remove_first_if(std::forward_list<T, Allocator> &list, Predicate pred)
{
auto itErase = list.before_begin();
auto itFind = list.begin();
const auto itEnd = list.end();
while (itFind != itEnd) {
if (pred(*itFind)) {
list.erase_after(itErase);
return true;
} else {
++itErase;
++itFind;
}
}
return false;
}
This kind of stuff used to be a standard exercise when I learned programming way back in the early '80s. It might be interesting to recall the solution, and compare that with what one can do in C++. Actually that was in Algol 68, but I won't impose that on you and give the translation into C. Given
typedef ... T;
typedef struct node *link;
struct node { link next; T data; };
one could write, realising that one needs to pass the address of the list head pointer if it is to be possible to unlink the first node:
void search_and_destroy(link *p_addr, T y)
{
while (*p_addr!=NULL && (*p_addr)->data!=y)
p_addr = &(*p_addr)->next;
if (*p_addr!=NULL)
{
link old = *p_addr;
*p_addr = old->next; /* unlink node */
free(old); /* and free memory */
}
}
There are a lot of occurrences of *p_addr there; it is the last one, where it is the LHS of an assignment, that is the reason one needs the address of a pointer here in the first place. Note that in spite of the apparent complication, the statement p_addr = &(*p_addr)->next; is just replacing a pointer by the value it points to, and then adding an offset (which is 0 here).
One could introduce an auxiliary pointer value to lighten the code up a bit, as follows
void search_and_destroy(link *p_addr, T y)
{
link p=*p_addr;
while (p!=NULL && p->data!=y)
p=*(p_addr = &p->next);
if (p!=NULL)
{
*p_addr = p->next;
free(p);
}
}
but that is fundamentally the same code: any decent compiler should realise that the pointer value *p_addr is used multiple times in succession in the first example, and keep it in a register.
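The same address-of-the-link idea carries over to C++ when one owns the node type. The following is a sketch under that assumption, with a hypothetical node struct holding int data and a std::unique_ptr link in place of the raw pointer and free; remove_first is an invented name.

```cpp
#include <memory>
#include <utility>

// Hypothetical hand-rolled singly linked list node.
struct node {
    int data;
    std::unique_ptr<node> next;
};

// Unlinks and destroys the first node holding `value`.
// Returns true if a node was removed.
bool remove_first(std::unique_ptr<node>& head, int value)
{
    // `link` plays the role of p_addr: the address of the pointer that
    // would have to be rewritten if the node it points to is removed.
    std::unique_ptr<node>* link = &head;
    while (*link && (*link)->data != value)
        link = &(*link)->next;
    if (!*link)
        return false;
    std::unique_ptr<node> old = std::move(*link);  // take the node out
    *link = std::move(old->next);                  // unlink node
    return true;                                   // `old` frees it here
}
```

Because the loop tracks the link rather than the node, removing the head needs no special case, just as in the C version.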
Now with std::forward_list<T>, we are not allowed access to the pointers that link the nodes, and get those awkward "iterators pointing one node before the real action" instead. Our solution becomes
void search_and_destroy(std::forward_list<T>& list, T y) // by reference, so the caller's list is modified
{
std::forward_list<T>::iterator it = list.before_begin();
const std::forward_list<T>::iterator NIL = list.end();
while (std::next(it)!=NIL && *std::next(it)!=y)
++it;
if (std::next(it)!=NIL)
list.erase_after(it);
}
Again we could keep a second iterator variable to hold std::next(it) without having to spell it out each time (not forgetting to refresh its value when we increment it) and arrive at essentially the answer by Daniel Frey. (We could instead try to make that variable a pointer of type T* equal to &*std::next(it), which suffices for the use we make of it, but it would actually be a bit of a hassle to ensure it becomes the null pointer when std::next(it)==NIL, as the standard will not let us take &*NIL).
I cannot help feeling that since the old days the solution to this problem has not become more elegant.
I've a std::vector<int> and I need to remove all elements at given indexes (the vector usually has high dimensionality). I would like to know, which is the most efficient way to do such an operation having in mind that the order of the original vector should be preserved.
Although, I found related posts on this issue, some of them needed to remove one single element or multiple elements where the remove-erase idiom seemed to be a good solution.
In my case, however, I need to delete multiple elements and since I'm using indexes instead of direct values, the remove-erase idiom can't be applied, right?
My code is given below and I would like to know if it's possible to do better than that in terms of efficiency?
bool find_element(const vector<int> & vMyVect, int nElem){
return (std::find(vMyVect.begin(), vMyVect.end(), nElem)!=vMyVect.end()) ? true : false;
}
void remove_elements(){
srand ( time(NULL) );
int nSize = 20;
std::vector<int> vMyValues;
for(int i = 0; i < nSize; ++i){
vMyValues.push_back(i);
}
int nRandIdx;
std::vector<int> vMyIndexes;
for(int i = 0; i < 6; ++i){
nRandIdx = rand() % nSize;
vMyIndexes.push_back(nRandIdx);
}
std::vector<int> vMyResult;
for(int i=0; i < (int)vMyValues.size(); i++){
if(!find_element(vMyIndexes,i)){
vMyResult.push_back(vMyValues[i]);
}
}
}
I think it could be more efficient if you just sort your indices and then delete those elements from your vector from the highest to the lowest. Deleting the highest index first will not invalidate the lower indices you want to delete, because only the elements higher than the deleted ones change their index.
Whether it is really more efficient will depend on how fast the sorting is. One more advantage of this solution is that you don't need a copy of your value vector; you can work directly on the original vector. The code should look something like this (it assumes the indices are unique):
... fill up the vectors ...
std::sort(vMyIndexes.begin(), vMyIndexes.end());
for(int i=vMyIndexes.size() - 1; i >= 0; i--){
vMyValues.erase(vMyValues.begin() + vMyIndexes[i]);
}
To avoid moving the same elements many times, we can move them by ranges between the deleted indexes:
// fill vMyIndexes, take care about duplicated values
vMyIndexes.push_back(-1); // to handle range from 0 to the first index to remove
vMyIndexes.push_back(vMyValues.size()); // to handle range from the last index to remove and to the end of values
std::sort(vMyIndexes.begin(), vMyIndexes.end());
std::vector<int>::iterator last = vMyValues.begin();
for (size_t i = 1; i != vMyIndexes.size(); ++i) {
size_t range_begin = vMyIndexes[i - 1] + 1;
size_t range_end = vMyIndexes[i];
std::copy(vMyValues.begin() + range_begin, vMyValues.begin() + range_end, last);
last += range_end - range_begin;
}
vMyValues.erase(last, vMyValues.end());
P.S. Fixed a bug, thanks to Steve Jessop, who patiently tried to show it to me.
Erase-remove multiple elements at given indices
Update: after the feedback on performance from @kory, I've modified the algorithm not to use flagging and to move/copy elements in chunks (not one-by-one).
Notes:
indices need to be sorted and unique
uses std::move (replace with std::copy for c++98):
Github
Live example
Code:
template <class ForwardIt, class SortUniqIndsFwdIt>
inline ForwardIt remove_at(
ForwardIt first,
ForwardIt last,
SortUniqIndsFwdIt ii_first,
SortUniqIndsFwdIt ii_last)
{
if(ii_first == ii_last) // no indices-to-remove are given
return last;
typedef typename std::iterator_traits<ForwardIt>::difference_type diff_t;
typedef typename std::iterator_traits<SortUniqIndsFwdIt>::value_type ind_t;
ForwardIt destination = first + static_cast<diff_t>(*ii_first);
while(ii_first != ii_last)
{
// advance to an index after a chunk of elements-to-keep
for(ind_t cur = *ii_first++; ii_first != ii_last; ++ii_first)
{
const ind_t nxt = *ii_first;
if(nxt - cur > 1)
break;
cur = nxt;
}
// move the chunk of elements-to-keep to new destination
const ForwardIt source_first =
first + static_cast<diff_t>(*(ii_first - 1)) + 1;
const ForwardIt source_last =
ii_first != ii_last ? first + static_cast<diff_t>(*ii_first) : last;
std::move(source_first, source_last, destination);
// std::copy(source_first, source_last, destination) // c++98 version
destination += source_last - source_first;
}
return destination;
}
Usage example:
std::vector<int> v = /*...*/; // vector to remove elements from
std::vector<int> ii = /*...*/; // indices of elements to be removed
// prepare indices
std::sort(ii.begin(), ii.end());
ii.erase(std::unique(ii.begin(), ii.end()), ii.end());
// remove elements at indices
v.erase(remove_at(v.begin(), v.end(), ii.begin(), ii.end()), v.end());
What you can do is split the vector (actually any non-associative container) in two
groups, one corresponding to the indices to be erased and one containing the rest.
template<typename Cont, typename It>
auto ToggleIndices(Cont &cont, It beg, It end) -> decltype(std::end(cont))
{
int helpIndx(0);
return std::stable_partition(std::begin(cont), std::end(cont),
[&](typename Cont::value_type const& val) -> bool {
return std::find(beg, end, helpIndx++) != end;
});
}
You can then delete up to (or from) the split point to erase (keep only)
the elements corresponding to the indices:
std::vector<int> v;
v.push_back(0);
v.push_back(1);
v.push_back(2);
v.push_back(3);
v.push_back(4);
v.push_back(5);
int ar[] = { 2, 0, 4 };
v.erase(ToggleIndices(v, std::begin(ar), std::end(ar)), v.end());
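If the 'keep only by index' operation is not needed you can use remove_if instead of stable_partition (O(n) vs O(nlogn) complexity). Either way, the index counter in the lambda assumes the predicate is applied to the elements in order, which the standard does not promise for these algorithms.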
To work for C arrays as containers the lambda function should be
[&](decltype(*(std::begin(cont))) const& val) -> bool
{ return std::find(beg, end, helpIndx++) != end; }
but then the .erase() method is no longer an option
If you want to ensure that every element is only moved once, you can simply iterate through each element, copy those that are to remain into a new, second container, do not copy the ones you wish to remove, and then delete the old container and replace it with the new one :)
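A minimal sketch of the move-each-element-once idea follows. Note the assumption: it compacts the vector in place instead of building the second container this answer describes, and erase_at_indices is a hypothetical name. The indices are taken by value so they can be sorted and de-duplicated; indices past the end are ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Removes the elements at the given positions, keeping the order of the rest.
// Every surviving element is moved at most once.
template <class T>
void erase_at_indices(std::vector<T>& values, std::vector<std::size_t> indices)
{
    std::sort(indices.begin(), indices.end());
    indices.erase(std::unique(indices.begin(), indices.end()), indices.end());

    std::size_t write = 0;        // next free slot for a kept element
    auto skip = indices.begin();  // next index to drop
    for (std::size_t read = 0; read < values.size(); ++read) {
        if (skip != indices.end() && *skip == read) {
            ++skip;               // this element is dropped
            continue;
        }
        if (write != read)
            values[write] = std::move(values[read]);
        ++write;
    }
    values.erase(values.begin() + static_cast<std::ptrdiff_t>(write), values.end());
}
```

The cost is O(m log m) for sorting m indices plus a single O(n) pass over the n values.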
This is an algorithm based on Andriy Tylychko's answer, packaged so that it is easier and faster to use without having to pick the answer apart. It also removes the need to have -1 at the beginning of the indices list and the number of items at the end. It also includes some debugging code to make sure the indices are valid (sorted and valid indexes into items).
template <typename Items_it, typename Indices_it>
auto remove_indices(
Items_it items_begin, Items_it items_end
, Indices_it indices_begin, Indices_it indices_end
)
{
static_assert(
std::is_same_v<std::random_access_iterator_tag
, typename std::iterator_traits<Items_it>::iterator_category>
, "Can't remove items this way unless Items_it is a random access iterator");
size_t indices_size = std::distance(indices_begin, indices_end);
size_t items_size = std::distance(items_begin, items_end);
if (indices_size == 0) {
// Nothing to erase
return items_end;
}
// Debug check to see if the indices are already sorted and are less than
// size of items.
assert(indices_begin[0] < items_size);
assert(std::is_sorted(indices_begin, indices_end));
auto last = items_begin;
auto shift = [&last, &items_begin](size_t range_begin, size_t range_end) {
std::copy(items_begin + range_begin, items_begin + range_end, last);
last += range_end - range_begin;
};
size_t last_index = -1;
for (size_t i = 0; i != indices_size; ++i) {
shift(last_index + 1, indices_begin[i]);
last_index = indices_begin[i];
}
shift(last_index + 1, items_size);
return last;
}
Here is an example of usage:
template <typename T>
std::ostream& operator<<(std::ostream& os, std::vector<T>& v)
{
for (auto i : v) {
os << i << " ";
}
os << std::endl;
return os;
}
int main()
{
using std::begin;
using std::end;
std::vector<int> items = { 1, 3, 6, 8, 13, 17 };
std::vector<int> indices = { 0, 1, 2, 3, 4 };
std::cout << items;
items.erase(
remove_indices(begin(items), end(items), begin(indices), end(indices))
, std::end(items)
);
std::cout << items;
return 0;
}
Output:
1 3 6 8 13 17
17
The headers required are:
#include <algorithm> // std::copy, std::is_sorted
#include <cstddef>
#include <iterator>
#include <vector>
#include <iostream> // only needed for output
#include <cassert>
#include <type_traits>
And a Demo can be found on godbolt.org.