Pick out the least recurring number in an array - c++

I need help picking out the least recurring element in an array. I can't think of a robust algorithm; is there a function in the C++ standard library that does this?
If you can come up with an algorithm, please share it. Not necessarily the code, just the idea.
To define 'least recurring': suppose an array a[4] holds 2, 2, 2, 4. Then 4 is the least recurring element.

Uses some C++14 features for brevity but easily adapted to C++11:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <unordered_map>
using namespace std;
template <typename I>
auto leastRecurring(I first, I last) {
unordered_map<typename iterator_traits<I>::value_type, size_t> counts;
for_each(first, last, [&counts](auto e) { ++counts[e]; });
return min_element(begin(counts), end(counts), [](auto x, auto y) { return x.second < y.second; })->first;
}
int main() {
const int a[] = {2, 2, 2, 3, 3, 4};
cout << leastRecurring(begin(a), end(a)) << endl;
}

Using only std goodies (live demo on Coliru):
// Your original vector
auto original = { 2, 2, 2, 4, 4 };
// Sort numbers and remove duplicates (in a copy, because std::unique modifies the contents)
std::vector<int> uniques(original);
std::sort(std::begin(uniques), std::end(uniques));
auto end = std::unique(std::begin(uniques), std::end(uniques));
// Count occurrences of each number in the original vector
// The key is the number of occurrences of a number, the value is the number
std::map<int, int> population;
for (auto i = uniques.begin(); i != end; ++i) {
population.emplace(std::count(std::begin(original), std::end(original), *i), *i);
}
// The map is sorted by key, therefore the first element is the least recurring
std::cout << population.begin()->second;
Note that in the example you gave, the array is already sorted. If you know that this will always be the case, you can get rid of the call to std::sort.
If two numbers have the same population count, the smaller one will be kept: the numbers are inserted in ascending order, and emplace does not overwrite an existing key.
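When several values share the minimum count, approaches that return a single element report only one of them. The following is a minimal sketch, not part of the original answers, that returns every tied value instead. The name least_frequent_all is hypothetical, and the sketch assumes int elements.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <vector>

// Returns every value whose occurrence count equals the minimum count,
// in ascending order. Returns an empty vector for empty input.
// (least_frequent_all is a hypothetical helper name.)
std::vector<int> least_frequent_all(const std::vector<int>& input) {
    std::map<int, std::size_t> counts;            // value -> number of occurrences
    for (int x : input) ++counts[x];

    std::size_t best = std::numeric_limits<std::size_t>::max();
    for (const auto& kv : counts)
        if (kv.second < best) best = kv.second;   // smallest count seen so far

    std::vector<int> result;
    for (const auto& kv : counts)
        if (kv.second == best) result.push_back(kv.first);
    return result;
}
```

Because std::map iterates in key order, the tied values come back sorted, so the caller can take the front or the back to apply whichever tie-breaking rule is wanted.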

from collections import Counter
def leastFrequentToken(tokens):
    counted = Counter(tokens)
    leastFrequent = min(counted, key=counted.get)
    return leastFrequent
Essentially, create a map of token:count, find the smallest value in the map and return its key.
Assuming the 'numbers' are ints:
// functor to compare k,v pair on value
typedef std::pair<int, size_t> MyPairType;
struct CompareSecond
{
bool operator()(const MyPairType& left, const MyPairType& right) const
{
return left.second < right.second;
}
};
vector<int> tokens = { 2, 2, 2, 4 };
map<int, size_t> counted;
for (vector<int>::iterator i=tokens.begin(); i!=tokens.end(); ++i)
{
++counted[*i];
}
MyPairType min
= *min_element(counted.begin(), counted.end(), CompareSecond());
int leastFrequentValue = min.first; // first is the value, second is its count
C++ translation using these SO question answers:
C++ counting instances / histogram using std::map,
Finding minimum value in a Map

In C++11, assuming your type supports a strict weak ordering (for std::sort), the following may help: https://ideone.com/poxRxV
template <typename IT>
IT least_freq_elem(IT begin, IT end)
{
std::sort(begin, end);
IT next = std::find_if(begin, end, [begin](decltype(*begin) el) { return el != *begin; });
IT best_it = begin;
std::size_t best_count = next - begin;
for (IT it = next; it != end; it = next) {
next = std::find_if(it, end, [it](decltype(*begin) el) { return el != *it; });
const std::size_t count = next - it;
if (count < best_count) {
best_count = count;
best_it = it;
}
}
return best_it;
}


How to write iterator wrapper that transforms several values from base container

I have algorithm that uses iterators, but there is a problem with transforming values, when we need more than single source value.
All transform iterators just get some one arg and transforms it. (see similar question from the past)
Code example:
template<typename ForwardIt>
double some_algorithm(ForwardIt begin, ForwardIt end) {
double result = 0;
for (auto it = begin; it != end; ++it) {
double t = *it;
/*
do some calculations..
*/
result += t;
}
return result;
}
int main() {
{
std::vector<double> distances{ 1, 2, 3, 4 };
double t = some_algorithm(distances.begin(), distances.end());
std::cout << t << std::endl;
/* works great */
}
{
/* lets now work with vector of points.. */
std::vector<double> points{ 1, 2, 4, 7, 11 };
/* convert to distances.. */
std::vector<double> distances;
distances.resize(points.size() - 1);
for (size_t i = 0; i + 1 < points.size(); ++i)
distances[i] = points[i + 1] - points[i];
/* invoke algorithm */
double t = some_algorithm(distances.begin(), distances.end());
std::cout << t << std::endl;
}
}
Is there a way (especially using std) to create such an iterator wrapper, to avoid explicitly generating the distances values?
It could be fine to perform something like this:
template<typename BaseIterator, typename TransformOperator>
struct GenericTransformIterator {
GenericTransformIterator(BaseIterator it, TransformOperator op) : it(it), op(op) {}
auto operator*() {
return op(it);
}
GenericTransformIterator& operator++() {
++it;
return *this;
}
BaseIterator it;
TransformOperator op;
friend bool operator!=(GenericTransformIterator a, GenericTransformIterator b) {
return a.it != b.it;
}
};
and use like:
{
/* lets now work with vector of points.. */
std::vector<double> points{ 1, 2, 4, 7, 11 };
/* use generic transform iterator.. */
/* invoke algorithm */
auto distance_op = [](auto it) {
auto next_it = it;
++next_it;
return *next_it - *it;
};
double t = some_algorithm(
GenericTransformIterator(points.begin(), distance_op), // relies on C++17 class template argument deduction
GenericTransformIterator(points.end() - 1, distance_op));
std::cout << t << std::endl;
}
So the general idea is that the transform function is invoked not on the underlying object but on the iterator (or at least receives an index, so that the lambda can capture the whole container and access it by index).
I used to use Boost, which has many iterator-wrapping classes.
But since C++20 and ranges, I'm curious whether something existing in std:: can be used rather than writing my own wrappers.
With C++23, use std::views::pairwise.
In the meantime, you can use iota_view. Here's a solution which will work with any bidirectional iterators (e.g. points could be a std::list):
auto distances =
std::views::iota(points.cbegin(), std::prev(points.cend()))
| std::views::transform([](auto const &it) { return *std::next(it) - *it; });
This can also be made to work with any forward iterators. Example:
std::forward_list<double> points{1, 2, 4, 7, 11};
auto distances =
std::views::iota(points.cbegin())
| std::views::take_while([end = points.cend()](auto const &it) { return std::next(it) != end; })
| std::views::transform([](auto const &it) { return *std::next(it) - *it; })
| std::views::common;
Note that both of these snippets have undefined behaviour if points is empty.
I'm not sure this addresses your problem (let me know if it doesn't and I'll remove the answer), but you may be able to achieve that with ranges (unfortunately, not with standard ranges yet, but Eric Niebler's range-v3).
The code below:
groups the points vector in pairs,
calculates the difference between the second and the first element of each pair, and then
sums all those differences up.
auto t{ accumulate(
points | views::sliding(2) | views::transform([](const auto& v) { return v[1] - v[0]; }),
0.0
)};
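If the algorithm only needs to fold a value over adjacent pairs, as the summing example does, a pre-C++20 option is std::inner_product over two ranges offset by one element. This is a sketch under that assumption, not a general iterator wrapper; the function name sum_of_steps is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

// Sums (points[i + 1] - points[i]) over all adjacent pairs without
// materializing a separate distances vector.
// (sum_of_steps is a hypothetical name used for illustration.)
double sum_of_steps(const std::vector<double>& points) {
    if (points.size() < 2) return 0.0;   // no adjacent pair to visit
    return std::inner_product(
        std::next(points.begin()), points.end(),  // the "next" element of each pair
        points.begin(),                           // the "current" element of each pair
        0.0,                                      // initial accumulator value
        std::plus<double>(),                      // how per-pair results are combined
        std::minus<double>());                    // per pair: next - current
}
```

Unlike the view-based snippets, this guards against an empty input, but it cannot be handed to an algorithm that expects an iterator pair.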

C++ remove_if without iterating through whole vector

I have a vector of pointers to approximately 10 MB of packets. From the first 2 MB, I want to delete all packets that match my predicate. The problem is that remove_if iterates through the whole vector, even though that is not required in my use case. Is there a more efficient way?
void fn_del_first_2MB()
{
uint32 deletedSoFar = 0;
uint32 deleteLimit = 2000000;
auto it = std::remove_if (cache_vector.begin(), cache_vector.end(),[deleteLimit,&deletedSoFar](const rc_vector& item){
if(item.ptr_rc->ref_count <= 0) {
if (deletedSoFar < deleteLimit) {
deletedSoFar += item.ptr_rc->u16packet_size;
delete(item.ptr_rc->packet);
delete(item.ptr_rc);
return true;
}
else
return false;
}
else
return false;
});
cache_vector.erase(it, cache_vector.end());
}
In the above code, once deletedSoFar reaches deleteLimit, any further iteration is unnecessary.
Instead of cache_vector.end() put your own iterator marker myIter. With the remove_if option you should follow the erase-remove idiom. Here is an example that affects only the first 4 elements:
#include <iostream>
#include <vector>
#include <algorithm>
int main()
{
std::vector<int> vec = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
size_t index = 4; // index is something you need to calculate
auto myIter = vec.begin() + index; // Your iterator instead of vec.end()
vec.erase(std::remove_if(vec.begin(), myIter, [](int x){return x < 3; }), myIter);
// modified vector:
for (const auto& a : vec)
{
std::cout << a << std::endl;
}
return 0;
}
You may use your own loop:
void fn_del_first_2MB()
{
const uint32 deleteLimit = 2000000;
uint32 deletedSoFar = 0;
auto dest = cache_vector.begin();
auto it = dest;
for (; it != cache_vector.end(); ++it) {
const auto& item = *it;
if (item.ptr_rc->ref_count <= 0) {
deletedSoFar += item.ptr_rc->u16packet_size;
delete(item.ptr_rc->packet);
delete(item.ptr_rc);
if (deletedSoFar >= deleteLimit) {
++it;
break;
}
} else {
if (dest != it) *dest = std::move(*it); // avoid self-move-assignment
++dest; // a kept element always advances the write position
}
}
cache_vector.erase(dest, it);
}
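To make this kind of early-exit compaction loop easy to try out, here is a self-contained sketch. It assumes a hypothetical Entry struct that holds plain values instead of the owning pointers of rc_vector, so no delete calls are needed; evict_up_to is likewise a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cache entry: plain values, no owning pointers.
struct Entry {
    int ref_count;
    std::size_t size;
};

// Removes unreferenced entries (ref_count <= 0) from the front of the cache
// until at least `limit` bytes were removed, then stops scanning.
// Returns the number of bytes removed.
std::size_t evict_up_to(std::vector<Entry>& cache, std::size_t limit) {
    std::size_t removed = 0;
    auto dest = cache.begin();   // next slot for an element that is kept
    auto it = cache.begin();     // element currently examined
    while (it != cache.end() && removed < limit) {
        if (it->ref_count <= 0) {
            removed += it->size;                      // evicted: not copied forward
        } else {
            if (dest != it) *dest = std::move(*it);   // close the gap
            ++dest;                                   // kept: always advance
        }
        ++it;
    }
    cache.erase(dest, it);       // elements from `it` onward were never examined
    return removed;
}
```

The single erase call at the end shifts the unexamined tail once, which is the part std::remove_if over the whole vector cannot skip.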
There is no need for std::remove_if() to pass the .end() iterator as the second argument: as long as the first argument can reach the second argument by incrementing, any iterators can be passed.
There is somewhat of a complication as your condition depends on the accumulated size of the elements encountered so far. As it turns out, it looks as if std::remove_if() won't be used. Something like this should work (although I'm not sure if this use of std::find_if() is actually legal as it keeps changing the predicate):
std::size_t accumulated_size(0u);
auto send(std::find_if(cache_vector.begin(), cache_vector.end(),
[&](rc_vector const& item) {
bool rc(delete_limit <= accumulated_size); // stop once the limit has been reached
accumulated_size += item.ptr_rc->u16packet_size;
return rc;
}));
std::for_each(cache_vector.begin(), send, [](rc_vector& item) {
delete(item.ptr_rc->packet);
delete(item.ptr_rc);
});
cache_vector.erase(cache_vector.begin(), send);
The std::for_each() could be folded into the use of std::find_if() as well but I prefer to keep things logically separate. For a sufficiently large sequence there could be a performance difference when the memory needs to be transferred to the cache twice. For the tiny numbers quoted I doubt that the difference can be measured.

Find last element in std::vector which satisfies a condition

I have this requirement to find the last element in the vector which is smaller than a value.
Like find_first_of, but I want the last match instead of the first.
I searched and found that there is no find_last_of, although there is a find_first_of.
Why is that? Is the standard way to use find_first_of with reverse iterators?
Use reverse iterators, like this:
#include <iostream>
#include <vector>
int main()
{
std::vector<int> v{1,2,42,42,63};
auto result = std::find_if(v.rbegin(), v.rend(),
[](int i) { return i == 42; });
std::cout << std::distance(result, v.rend()) << '\n';
}
This is how it is done with reverse iterators:
std::vector<int> vec = {2,3,10,5,7,11,3,6};
//below outputs '3':
std::cout << *(std::find_if(vec.rbegin(), vec.rend(), [](int i) { return i < 4; }));
Just one thing. Be careful with the predicate if you're looking to find the tail-end of the range which includes the predicated element:
int main()
{
std::vector<int> x { 0, 1, 2, 3, 4, 5 };
// finds the reverse iterator pointing at '2'
// but using base() to convert back to a forward iterator
// also 'advances' the resulting forward iterator.
// in effect, inverting the sense of the predicate to 'v >= 3'
auto iter = std::find_if(std::make_reverse_iterator(x.end()),
std::make_reverse_iterator(x.begin()),
[](auto& v) { return v < 3; }).base();
std::copy(iter,
x.end(),
std::ostream_iterator<int>(std::cout, ", "));
}
result:
3, 4, 5,
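To get a forward iterator to the matched element itself, rather than to the position one past it, the result of base() can be stepped back by one. A minimal sketch, assuming bidirectional iterators; last_matching is a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Returns an iterator to the last element in [first, last) that satisfies
// pred, or last if no element does.
// (last_matching is a hypothetical helper name.)
template <typename BidirIt, typename Pred>
BidirIt last_matching(BidirIt first, BidirIt last, Pred pred) {
    std::reverse_iterator<BidirIt> rfirst(last), rlast(first);
    std::reverse_iterator<BidirIt> rit = std::find_if(rfirst, rlast, pred);
    if (rit == rlast) return last;   // no match
    return std::prev(rit.base());    // base() is one past the match, so step back
}
```

Returning last on failure mirrors the convention of std::find_if, so callers can test the result against end() as usual.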
From ZenXml:
template <class BidirectionalIterator, class T> inline
BidirectionalIterator find_last(const BidirectionalIterator first, const
BidirectionalIterator last, const T& value)
{
for (BidirectionalIterator it = last; it != first;)
//reverse iteration: 1. check 2. decrement 3. evaluate
{
--it; //
if (*it == value)
return it;
}
return last;
}

Sorting one std::vector based on the content of another [duplicate]

I have several std::vector, all of the same length. I want to sort one of these vectors, and apply the same transformation to all of the other vectors. Is there a neat way of doing this? (preferably using the STL or Boost)? Some of the vectors hold ints and some of them std::strings.
Pseudo code:
std::vector<int> Index = { 3, 1, 2 };
std::vector<std::string> Values = { "Third", "First", "Second" };
Transformation = sort(Index);
Index is now { 1, 2, 3};
... magic happens as Transformation is applied to Values ...
Values are now { "First", "Second", "Third" };
friol's approach is good when coupled with yours. First, build a vector consisting of the indices 0…n−1, each paired with an iterator to the corresponding element of the vector that dictates the sorting order:
typedef vector<int>::const_iterator myiter;
vector<pair<size_t, myiter> > order(Index.size());
size_t n = 0;
for (myiter it = Index.begin(); it != Index.end(); ++it, ++n)
order[n] = make_pair(n, it);
Now you can sort this array using a custom sorter:
struct ordering {
bool operator ()(pair<size_t, myiter> const& a, pair<size_t, myiter> const& b) {
return *(a.second) < *(b.second);
}
};
sort(order.begin(), order.end(), ordering());
Now you've captured the order of rearrangement inside order (more precisely, in the first component of the items). You can now use this ordering to sort your other vectors. There's probably a very clever in-place variant running in the same time, but until someone else comes up with it, here's one variant that isn't in-place. It uses order as a look-up table for the new index of each element.
template <typename T>
vector<T> sort_from_ref(
vector<T> const& in,
vector<pair<size_t, myiter> > const& reference
) {
vector<T> ret(in.size());
size_t const size = in.size();
for (size_t i = 0; i < size; ++i)
ret[i] = in[reference[i].first];
return ret;
}
typedef std::vector<int> int_vec_t;
typedef std::vector<std::string> str_vec_t;
typedef std::vector<size_t> index_vec_t;
class SequenceGen {
public:
SequenceGen (int start = 0) : current(start) { }
int operator() () { return current++; }
private:
int current;
};
class Comp{
int_vec_t& _v;
public:
Comp(int_vec_t& v) : _v(v) {}
bool operator()(size_t i, size_t j){
return _v[i] < _v[j];
}
};
index_vec_t indices(3);
std::generate(indices.begin(), indices.end(), SequenceGen(0));
//indices are {0, 1, 2}
int_vec_t Index = { 3, 1, 2 };
str_vec_t Values = { "Third", "First", "Second" };
std::sort(indices.begin(), indices.end(), Comp(Index));
//now indices are {1,2,0}
Now you can use the "indices" vector to index into "Values" vector.
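As a rough illustration of that last step, the sketch below computes the sorting permutation and then builds reordered copies with it. The names sorted_permutation and apply_permutation are hypothetical, and C++11 is assumed for the lambda and std::iota.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Returns the permutation that sorts keys: result[k] is the position in keys
// of the k-th smallest key. (sorted_permutation is a hypothetical name.)
std::vector<std::size_t> sorted_permutation(const std::vector<int>& keys) {
    std::vector<std::size_t> indices(keys.size());
    std::iota(indices.begin(), indices.end(), std::size_t(0));   // 0, 1, 2, ...
    std::sort(indices.begin(), indices.end(),
              [&keys](std::size_t i, std::size_t j) { return keys[i] < keys[j]; });
    return indices;
}

// Builds a reordered copy: out[k] = values[indices[k]].
// (apply_permutation is a hypothetical name.)
template <typename T>
std::vector<T> apply_permutation(const std::vector<T>& values,
                                 const std::vector<std::size_t>& indices) {
    std::vector<T> out;
    out.reserve(indices.size());
    for (std::size_t i : indices) out.push_back(values[i]);
    return out;
}
```

The same permutation can be applied to any number of vectors of the same length, including the key vector itself.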
Put your values in a Boost Multi-Index container then iterate over to read the values in the order you want. You can even copy them to another vector if you want to.
Only one rough solution comes to my mind: create a vector that is the sum of all other vectors (a vector of structures, like {3,Third,...},{1,First,...}) then sort this vector by the first field, and then split the structures again.
Probably there is a better solution inside Boost or using the standard library.
You can probably define a custom "facade" iterator that does what you need here. It would store iterators to all your vectors or, alternatively, derive the iterators for all but the first vector from the offset of the first. The tricky part is what that iterator dereferences to: think of something like boost::tuple and make clever use of boost::tie. (If you want to extend this idea, you can build these iterator types recursively using templates, but you will probably never want to write down the resulting type, so you need either C++0x auto or a wrapper function for sort that takes ranges.)
I think what you really need (but correct me if I'm wrong) is a way to access elements of a container in some order.
Rather than rearranging my original collection, I would borrow a concept from Database design: keep an index, ordered by a certain criterion. This index is an extra indirection that offers great flexibility.
This way it is possible to generate multiple indices according to different members of a class.
using namespace std;
template< typename Iterator, typename Comparator >
struct Index {
vector<Iterator> v;
Index( Iterator from, Iterator end, const Comparator& c ){ // const& so that a temporary comparator can be passed
v.reserve( std::distance(from,end) );
for( ; from != end; ++from ){
v.push_back(from); // no deref!
}
sort( v.begin(), v.end(), c );
}
};
template< typename Iterator, typename Comparator >
Index<Iterator,Comparator> index ( Iterator from, Iterator end, const Comparator& c ){
return Index<Iterator,Comparator>(from,end,c);
}
struct mytype {
string name;
double number;
};
template< typename Iter >
struct NameLess : public binary_function<Iter, Iter, bool> {
bool operator()( const Iter& t1, const Iter& t2 ) const { return t1->name < t2->name; }
};
template< typename Iter >
struct NumLess : public binary_function<Iter, Iter, bool> {
bool operator()( const Iter& t1, const Iter& t2 ) const { return t1->number < t2->number; }
};
void indices() {
mytype v[] = { { "me" , 0.0 }
, { "you" , 1.0 }
, { "them" , -1.0 }
};
mytype* vend = v + sizeof(v) / sizeof(v[0]); // portable replacement for the MSVC-specific _countof
Index<mytype*, NameLess<mytype*> > byname( v, vend, NameLess<mytype*>() );
Index<mytype*, NumLess <mytype*> > bynum ( v, vend, NumLess <mytype*>() );
assert( byname.v[0] == v+0 );
assert( byname.v[1] == v+2 );
assert( byname.v[2] == v+1 );
assert( bynum.v[0] == v+2 );
assert( bynum.v[1] == v+0 );
assert( bynum.v[2] == v+1 );
}
A slightly more compact variant of xtofl's answer, for when you just want to iterate through all your vectors in the order given by a single keys vector. Create a permutation vector and use it to index into your other vectors.
#include <boost/iterator/counting_iterator.hpp>
#include <vector>
#include <algorithm>
std::vector<double> keys = ...
std::vector<double> values = ...
std::vector<size_t> indices(boost::counting_iterator<size_t>(0u), boost::counting_iterator<size_t>(keys.size()));
std::sort(begin(indices), end(indices), [&](size_t lhs, size_t rhs) {
return keys[lhs] < keys[rhs];
});
// Now to iterate through the values array.
for (size_t i: indices)
{
std::cout << values[i] << std::endl;
}
ltjax's answer is a great approach - which is actually implemented in boost's zip_iterator http://www.boost.org/doc/libs/1_43_0/libs/iterator/doc/zip_iterator.html
It packages together into a tuple whatever iterators you provide it.
You can then create your own comparison function for a sort based on any combination of iterator values in your tuple. For this question, it would just be the first iterator in your tuple.
A nice feature of this approach is that it allows you to keep the memory of each individual vector contiguous (if you're using vectors and that's what you want). You also don't need to store a separate index vector of ints.
This would have been an addendum to Konrad's answer, as it is an approach for an in-place variant of applying the sort order to a vector. Since the edit won't go through, I will put it here.
Here is an in-place variant with a slightly higher time complexity, due to the primitive operation of checking a boolean. The additional space is that of a vector<bool>, which can be a space-efficient, implementation-dependent specialization. This extra vector can be eliminated if the given order itself may be modified. Here is an example of what the algorithm does.
If the order is 3 0 4 1 2, the movement of the elements as indicated by the position indices would be 3--->0; 0--->1; 1--->3; 2--->4; 4--->2.
template<typename T>
struct applyOrderinPlace
{
void operator()(const vector<size_t>& order, vector<T>& vectoOrder)
{
vector<bool> indicator(order.size(),0);
size_t start = 0, cur = 0, next = order[cur];
size_t indx = 0;
T tmp;
while(indx < order.size())
{
//find unprocessed index
if(indicator[indx])
{
++indx;
continue;
}
start = indx;
cur = start;
next = order[cur];
tmp = vectoOrder[start];
while(next != start)
{
vectoOrder[cur] = vectoOrder[next];
indicator[cur] = true;
cur = next;
next = order[next];
}
vectoOrder[cur] = tmp;
indicator[cur] = true;
}
}
};
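The same cycle-following idea can be sketched as a free function that moves elements instead of copying them. This is an illustrative variant under the convention used above (the element at position order[i] ends up at position i); the name apply_order_in_place is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Rearranges v in place so that new v[i] == old v[order[i]].
// Assumes order is a permutation of 0 .. v.size() - 1.
// (apply_order_in_place is a hypothetical name.)
template <typename T>
void apply_order_in_place(const std::vector<std::size_t>& order, std::vector<T>& v) {
    std::vector<bool> done(order.size(), false);   // marks positions already final
    for (std::size_t start = 0; start < order.size(); ++start) {
        if (done[start]) continue;                 // already handled in an earlier cycle
        T tmp = std::move(v[start]);               // hold the first element of the cycle
        std::size_t cur = start;
        while (order[cur] != start) {              // walk the cycle, pulling elements in
            v[cur] = std::move(v[order[cur]]);
            done[cur] = true;
            cur = order[cur];
        }
        v[cur] = std::move(tmp);                   // close the cycle
        done[cur] = true;
    }
}
```

With order 3 0 4 1 2 applied to a b c d e, the two cycles (0, 3, 1) and (2, 4) are each walked once, giving d a e b c.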
Here is a relatively simple implementation using index mapping between the ordered and unordered names that will be used to match the ages to the ordered names:
void ordered_pairs()
{
std::vector<std::string> names;
std::vector<int> ages;
// read input and populate the vectors
populate(names, ages);
// print input
print(names, ages);
// sort pairs
std::vector<std::string> sortedNames(names);
std::sort(sortedNames.begin(), sortedNames.end());
std::vector<int> indexMap;
for(unsigned int i = 0; i < sortedNames.size(); ++i)
{
for (unsigned int j = 0; j < names.size(); ++j)
{
if (sortedNames[i] == names[j])
{
indexMap.push_back(j);
break;
}
}
}
// use the index mapping to match the ages to the names
std::vector<int> sortedAges;
for(size_t i = 0; i < indexMap.size(); ++i)
{
sortedAges.push_back(ages[indexMap[i]]);
}
std::cout << "Ordered pairs:\n";
print(sortedNames, sortedAges);
}
For the sake of completeness, here are the functions populate() and print():
void populate(std::vector<std::string>& n, std::vector<int>& a)
{
std::string prompt("Type name and age, separated by white space; 'q' to exit.\n>>");
std::string sentinel = "q";
while (true)
{
// read input
std::cout << prompt;
std::string input;
getline(std::cin, input);
// exit input loop
if (input == sentinel)
{
break;
}
std::stringstream ss(input);
// extract input
std::string name;
int age;
if (ss >> name >> age)
{
n.push_back(name);
a.push_back(age);
}
else
{
std::cout <<"Wrong input format!\n";
}
}
}
and:
void print(const std::vector<std::string>& n, const std::vector<int>& a)
{
if (n.size() != a.size())
{
std::cerr <<"Different number of names and ages!\n";
return;
}
for (unsigned int i = 0; i < n.size(); ++i)
{
std::cout <<'(' << n[i] << ", " << a[i] << ')' << "\n";
}
}
And finally, main() becomes:
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
void ordered_pairs();
void populate(std::vector<std::string>&, std::vector<int>&);
void print(const std::vector<std::string>&, const std::vector<int>&);
//=======================================================================
int main()
{
std::cout << "\t\tSimple name - age sorting.\n";
ordered_pairs();
}
//=======================================================================
// Function Definitions...
// C++ program to demonstrate sorting in vector
// of pair according to 2nd element of pair
#include <iostream>
#include<string>
#include<vector>
#include <algorithm>
using namespace std;
// Comparison function to sort the vector elements
// by second element of pairs
bool sortbysec(const pair<char,int> &a,
const pair<char,int> &b)
{
return (a.second < b.second);
}
int main()
{
// declaring vector of pairs
vector< pair <char, int> > vect;
// Initialising 1st and 2nd element of pairs
// with array values
//int arr[] = {10, 20, 5, 40 };
//int arr1[] = {30, 60, 20, 50};
char arr[] = { 'a', 'b', 'c' };
int arr1[] = { 4, 7, 1 };
int n = sizeof(arr)/sizeof(arr[0]);
// Entering values in vector of pairs
for (int i=0; i<n; i++)
vect.push_back( make_pair(arr[i],arr1[i]) );
// Printing the original vector(before sort())
cout << "The vector before sort operation is:\n" ;
for (int i=0; i<n; i++)
{
// "first" and "second" are used to access
// 1st and 2nd element of pair respectively
cout << vect[i].first << " "
<< vect[i].second << endl;
}
// Using sort() function to sort by 2nd element
// of pair
sort(vect.begin(), vect.end(), sortbysec);
// Printing the sorted vector(after using sort())
cout << "The vector after sort operation is:\n" ;
for (int i=0; i<n; i++)
{
// "first" and "second" are used to access
// 1st and 2nd element of pair respectively
cout << vect[i].first << " "
<< vect[i].second << endl;
}
getchar();
return 0;
}
With C++14 generic lambdas and the STL algorithms, based on answers from Konrad Rudolph and Gabriele D'Antona:
template< typename T, typename U >
std::vector<T> sortVecAByVecB( std::vector<T> & a, std::vector<U> & b ){
// zip the two vectors (A,B)
std::vector<std::pair<T,U>> zipped(a.size());
for( size_t i = 0; i < a.size(); i++ ) zipped[i] = std::make_pair( a[i], b[i] );
// sort according to B
std::sort(zipped.begin(), zipped.end(), []( auto & lop, auto & rop ) { return lop.second < rop.second; });
// extract sorted A
std::vector<T> sorted;
std::transform(zipped.begin(), zipped.end(), std::back_inserter(sorted), []( auto & pair ){ return pair.first; });
return sorted;
}
Many people have asked this question, and nobody has come up with a satisfactory answer. Here is a std::sort helper that sorts two vectors simultaneously, taking into account the values of only one of them. This solution is based on a custom RandomIt (random-access iterator) and operates directly on the original vector data, without temporary copies, structure rearrangement, or additional indices:
C++, Sort One Vector Based On Another One

Erasing elements in std::vector by using indexes

I've a std::vector<int> and I need to remove all elements at given indexes (the vector usually has high dimensionality). I would like to know, which is the most efficient way to do such an operation having in mind that the order of the original vector should be preserved.
Although, I found related posts on this issue, some of them needed to remove one single element or multiple elements where the remove-erase idiom seemed to be a good solution.
In my case, however, I need to delete multiple elements and since I'm using indexes instead of direct values, the remove-erase idiom can't be applied, right?
My code is given below and I would like to know if it's possible to do better than that in terms of efficiency?
bool find_element(const vector<int> & vMyVect, int nElem){
return (std::find(vMyVect.begin(), vMyVect.end(), nElem)!=vMyVect.end()) ? true : false;
}
void remove_elements(){
srand ( time(NULL) );
int nSize = 20;
std::vector<int> vMyValues;
for(int i = 0; i < nSize; ++i){
vMyValues.push_back(i);
}
int nRandIdx;
std::vector<int> vMyIndexes;
for(int i = 0; i < 6; ++i){
nRandIdx = rand() % nSize;
vMyIndexes.push_back(nRandIdx);
}
std::vector<int> vMyResult;
for(int i=0; i < (int)vMyValues.size(); i++){
if(!find_element(vMyIndexes,i)){
vMyResult.push_back(vMyValues[i]);
}
}
}
I think it could be more efficient to sort your indices and then delete those elements from your vector from the highest index to the lowest. Deleting the highest index first does not invalidate the lower indices you still want to delete, because only the elements after the deleted one change their index.
Whether it is really more efficient depends on how fast the sorting is. Another advantage of this solution is that you don't need a copy of your value vector; you can work directly on the original vector. The code should look something like this:
... fill up the vectors ...
sort (vMyIndexes.begin(), vMyIndexes.end());
for(int i=vMyIndexes.size() - 1; i >= 0; i--){
vMyValues.erase(vMyValues.begin() + vMyIndexes[i]);
}
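One caveat with erasing from highest to lowest is that a repeated index would erase an extra element, and randomly generated indices can repeat. A minimal sketch that removes duplicates first, assuming every index is a valid position; erase_at_indices is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Erases the elements at the given positions. Indices may be unsorted and
// may repeat; all are assumed to be valid positions in values.
// (erase_at_indices is a hypothetical name.)
void erase_at_indices(std::vector<int>& values, std::vector<std::size_t> indices) {
    // Highest index first, so earlier erasures never shift a pending index.
    std::sort(indices.begin(), indices.end(), std::greater<std::size_t>());
    // Drop repeated indices so that no position is erased twice.
    indices.erase(std::unique(indices.begin(), indices.end()), indices.end());
    for (std::size_t idx : indices)
        values.erase(values.begin() + static_cast<std::ptrdiff_t>(idx));
}
```

Each erase still shifts the tail of the vector, so this stays quadratic in the worst case; it is the simple option rather than the fastest one.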
To avoid moving the same elements many times, we can move them in ranges between the deleted indexes:
// fill vMyIndexes, take care about duplicated values
vMyIndexes.push_back(-1); // to handle range from 0 to the first index to remove
vMyIndexes.push_back(vMyValues.size()); // to handle range from the last index to remove and to the end of values
std::sort(vMyIndexes.begin(), vMyIndexes.end());
std::vector<int>::iterator last = vMyValues.begin();
for (size_t i = 1; i != vMyIndexes.size(); ++i) {
size_t range_begin = vMyIndexes[i - 1] + 1;
size_t range_end = vMyIndexes[i];
std::copy(vMyValues.begin() + range_begin, vMyValues.begin() + range_end, last);
last += range_end - range_begin;
}
vMyValues.erase(last, vMyValues.end());
P.S. Fixed a bug, thanks to Steve Jessop, who patiently tried to show it to me.
Erase-remove multiple elements at given indices
Update: after the feedback on performance from #kory, I've modified the algorithm not to use flagging and move/copy elements in chunks (not one-by-one).
Notes:
indices need to be sorted and unique
uses std::move (replace with std::copy for c++98):
Code:
template <class ForwardIt, class SortUniqIndsFwdIt>
inline ForwardIt remove_at(
ForwardIt first,
ForwardIt last,
SortUniqIndsFwdIt ii_first,
SortUniqIndsFwdIt ii_last)
{
if(ii_first == ii_last) // no indices-to-remove are given
return last;
typedef typename std::iterator_traits<ForwardIt>::difference_type diff_t;
typedef typename std::iterator_traits<SortUniqIndsFwdIt>::value_type ind_t;
ForwardIt destination = first + static_cast<diff_t>(*ii_first);
while(ii_first != ii_last)
{
// advance to an index after a chunk of elements-to-keep
for(ind_t cur = *ii_first++; ii_first != ii_last; ++ii_first)
{
const ind_t nxt = *ii_first;
if(nxt - cur > 1)
break;
cur = nxt;
}
// move the chunk of elements-to-keep to new destination
const ForwardIt source_first =
first + static_cast<diff_t>(*(ii_first - 1)) + 1;
const ForwardIt source_last =
ii_first != ii_last ? first + static_cast<diff_t>(*ii_first) : last;
std::move(source_first, source_last, destination);
// std::copy(source_first, source_last, destination) // c++98 version
destination += source_last - source_first;
}
return destination;
}
Usage example:
std::vector<int> v = /*...*/; // vector to remove elements from
std::vector<int> ii = /*...*/; // indices of elements to be removed
// prepare indices
std::sort(ii.begin(), ii.end());
ii.erase(std::unique(ii.begin(), ii.end()), ii.end());
// remove elements at indices
v.erase(remove_at(v.begin(), v.end(), ii.begin(), ii.end()), v.end());
What you can do is split the vector (actually any non-associative container) in two
groups, one corresponding to the indices to be erased and one containing the rest.
template<typename Cont, typename It>
auto ToggleIndices(Cont &cont, It beg, It end) -> decltype(std::end(cont))
{
int helpIndx(0);
return std::stable_partition(std::begin(cont), std::end(cont),
[&](typename Cont::value_type const& val) -> bool {
return std::find(beg, end, helpIndx++) != end;
});
}
you can then delete from (or up to) the split point to erase (keep only)
the elements corresponding to the indices
std::vector<int> v;
v.push_back(0);
v.push_back(1);
v.push_back(2);
v.push_back(3);
v.push_back(4);
v.push_back(5);
int ar[] = { 2, 0, 4 };
v.erase(ToggleIndices(v, std::begin(ar), std::end(ar)), v.end());
If the 'keep only by index' operation is not needed you can use remove_if instead of stable_partition (O(n) vs O(nlogn) complexity)
To work for C arrays as containers the lambda function should be
[&](decltype(*(std::begin(cont))) const& val) -> bool
{ return std::find(beg, end, helpIndx++) != end; }
but then the .erase() method is no longer an option
If you want to ensure that every element is only moved once, you can simply iterate through each element, copy those that are to remain into a new, second container, do not copy the ones you wish to remove, and then delete the old container and replace it with the new one :)
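A minimal sketch of that copy-the-survivors idea, assuming int elements and indices that are already sorted in ascending order; without_indices is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a copy of values without the elements at the given positions.
// Assumes sorted_indices is sorted ascending; repeated indices are tolerated.
// (without_indices is a hypothetical name.)
std::vector<int> without_indices(const std::vector<int>& values,
                                 const std::vector<std::size_t>& sorted_indices) {
    std::vector<int> kept;
    kept.reserve(values.size());
    std::size_t next = 0;                        // cursor into sorted_indices
    for (std::size_t i = 0; i < values.size(); ++i) {
        // Skip indices that are already behind us (this also handles repeats).
        while (next < sorted_indices.size() && sorted_indices[next] < i) ++next;
        if (next < sorted_indices.size() && sorted_indices[next] == i)
            continue;                            // position i is to be removed
        kept.push_back(values[i]);               // each survivor is copied exactly once
    }
    return kept;
}
```

The caller can then replace the original with the result, for example with values.swap(kept) or a move assignment.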
This is an algorithm based on Andriy Tylychko's answer so that this can make it easier and faster to use the answer, without having to pick it apart. It also removes the need to have -1 at the beginning of the indices list and a number of items at the end. Also some debugging code to make sure the indices are valid (sorted and valid index into items).
template <typename Items_it, typename Indices_it>
auto remove_indices(
Items_it items_begin, Items_it items_end
, Indices_it indices_begin, Indices_it indices_end
)
{
static_assert(
std::is_same_v<std::random_access_iterator_tag
, typename std::iterator_traits<Items_it>::iterator_category>
, "Can't remove items this way unless Items_it is a random access iterator");
size_t indices_size = std::distance(indices_begin, indices_end);
size_t items_size = std::distance(items_begin, items_end);
if (indices_size == 0) {
// Nothing to erase
return items_end;
}
// Debug check to see if the indices are already sorted and are less than
// size of items.
assert(indices_begin[indices_size - 1] < items_size); // the largest index is last once sorted
assert(std::is_sorted(indices_begin, indices_end));
auto last = items_begin;
auto shift = [&last, &items_begin](size_t range_begin, size_t range_end) {
std::copy(items_begin + range_begin, items_begin + range_end, last);
last += range_end - range_begin;
};
size_t last_index = -1;
for (size_t i = 0; i != indices_size; ++i) {
shift(last_index + 1, indices_begin[i]);
last_index = indices_begin[i];
}
shift(last_index + 1, items_size);
return last;
}
Here is an example of usage:
template <typename T>
std::ostream& operator<<(std::ostream& os, std::vector<T>& v)
{
for (auto i : v) {
os << i << " ";
}
os << std::endl;
return os;
}
int main()
{
using std::begin;
using std::end;
std::vector<int> items = { 1, 3, 6, 8, 13, 17 };
std::vector<int> indices = { 0, 1, 2, 3, 4 };
std::cout << items;
items.erase(
remove_indices(begin(items), end(items), begin(indices), end(indices))
, std::end(items)
);
std::cout << items;
return 0;
}
Output:
1 3 6 8 13 17
17
The headers required are:
#include <algorithm> // std::copy, std::is_sorted
#include <iterator>
#include <vector>
#include <iostream> // only needed for output
#include <cassert>
#include <type_traits>
And a Demo can be found on godbolt.org.