How to remove non-contiguous elements from a vector in C++

I have a vector std::vector<inputInfo> inputList and another vector std::vector<int> selection.
inputInfo is a struct that has some information stored.
The vector selection corresponds to positions inside inputList vector.
I need to remove elements from inputList which correspond to entries in the selection vector.

Here's my attempt at this removal algorithm.
Assuming the selection vector is sorted and using some (unavoidable?) pointer arithmetic, this can be done in one statement:
template <class T>
inline void erase_selected(std::vector<T>& v, const std::vector<int>& selection)
{
v.resize(std::distance(
v.begin(),
std::stable_partition(v.begin(), v.end(),
[&selection, &v](const T& item) {
return !std::binary_search(
selection.begin(), selection.end(),
static_cast<int>(static_cast<const T*>(&item) - &v[0]));
})));
}
This is based on an idea of Sean Parent (see this C++ Seasoning video) to use std::stable_partition ("stable" means that the relative order of the elements is preserved within each group) to move all selected items to the end of the array.
The line with pointer arithmetic
static_cast<int>(static_cast<const T*>(&item) - &v[0])
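can, in principle, be replaced with an index-free expression built from STL algorithms
std::distance(std::begin(v), std::find(v.begin(), v.end(), item))
but this way we spend O(n) in std::find for every element. It also requires T to be equality-comparable and yields the wrong index when the vector contains duplicate values.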
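For comparison, the same removal can be sketched without pointer arithmetic and without a stateful predicate, by walking the vector once with the sorted selection alongside it. This is a minimal sketch and not the method described above: the name erase_by_index is hypothetical, the selection is copied and sorted inside the function, and indices that are negative, duplicated or out of range are simply ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: removes the elements of v whose positions are listed
// in 'selection'. The selection is taken by value so it can be sorted here.
template <class T>
void erase_by_index(std::vector<T>& v, std::vector<int> selection)
{
    std::sort(selection.begin(), selection.end());
    selection.erase(std::unique(selection.begin(), selection.end()),
                    selection.end());

    // Negative indices sort first and can never match a position: skip them.
    auto next_sel = std::lower_bound(selection.begin(), selection.end(), 0);

    std::size_t write = 0;  // position of the next element to keep
    for (std::size_t read = 0; read < v.size(); ++read) {
        if (next_sel != selection.end() &&
            static_cast<std::size_t>(*next_sel) == read) {
            ++next_sel;     // this position is selected: drop it
            continue;
        }
        if (write != read)
            v[write] = std::move(v[read]);
        ++write;
    }
    // Erase the leftover tail; unlike resize, this does not need T to be
    // default-constructible.
    v.erase(v.begin() + static_cast<std::ptrdiff_t>(write), v.end());
}
```

The loop is O(n) after the O(m log m) sort of the m selected indices, and the kept elements stay in their original order.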
The shortest way to remove non-contiguous elements:
template <class T> void erase_selected(std::vector<T>& v, const std::vector<int>& selection)
{
    std::vector<int> sorted_sel = selection;
    std::sort(sorted_sel.begin(), sorted_sel.end());
    // 1) Define checker lambda
    // 'filter' is called exactly once for every element.
    // NOTE: this relies on the calls happening in the original order of the
    // array, which the standard does not guarantee for std::stable_partition.
    // We manually keep track of the index of the item being filtered
    // so that we can look this index up in the 'sorted_sel' array
    int itemIndex = 0;
    auto filter = [&itemIndex, &sorted_sel](const T& item) {
        return !std::binary_search(
            sorted_sel.begin(),
            sorted_sel.end(),
            itemIndex++);
    };
    // 2) Move all 'not-selected' items to the front (selected ones go to the end)
    auto end_of_kept = std::stable_partition(
        v.begin(),
        v.end(),
        filter);
    // 3) Cut off the selected items at the end of the std::vector
    v.resize(std::distance(v.begin(), end_of_kept));
}
Original code & test
If for some reason the code above does not work due to a strangely behaving std::stable_partition(), then below is a workaround (wrapping the input array values with selected flags).
I do not assume that inputInfo structure contains the selected flag, so I wrap all the items in the T_withFlag structure which keeps pointers to original items.
#include <algorithm>
#include <iostream>
#include <vector>
template <class T>
std::vector<T> erase_selected(const std::vector<T>& v, const std::vector<int>& selection)
{
std::vector<int> sorted_sel = selection;
std::sort(sorted_sel.begin(), sorted_sel.end());
// Packed (data+flag) array
struct T_withFlag
{
T_withFlag(const T* ref = nullptr, bool sel = false): src(ref), selected(sel) {}
const T* src;
bool selected;
};
std::vector<T_withFlag> v_with_flags;
// should be like
// { {0, true}, {0, true}, {3, false},
// {0, true}, {2, false}, {4, false},
// {5, false}, {0, true}, {7, false} };
// for the input data in main()
v_with_flags.reserve(v.size());
// No "beautiful" way to iterate a vector
// and keep track of element index
// We need the index to check if it is selected
// The check takes O(log(n)), so the loop is O(n * log(n))
int itemIndex = 0;
for (auto& ii: v)
v_with_flags.emplace_back(
T_withFlag(&ii,
std::binary_search(
sorted_sel.begin(),
sorted_sel.end(),
itemIndex++)
));
// I. (The bulk of ) Removal algorithm
// a) Define checker lambda
auto filter = [](const T_withFlag& ii) { return !ii.selected; };
// b) Move every item marked as 'not-selected'
// to the front of the array (selected items end up at the back)
auto end_of_selected = std::stable_partition(
v_with_flags.begin(),
v_with_flags.end(),
filter);
// c) Cut off the end of the std::vector
v_with_flags.resize(
std::distance(v_with_flags.begin(), end_of_selected));
// II. Output
std::vector<T> v_out(v_with_flags.size());
std::transform(
// since C++17 you can parallelize this
// with 'std::execution::par' as first parameter
v_with_flags.begin(),
v_with_flags.end(),
v_out.begin(),
[](const T_withFlag& ii) { return *(ii.src); });
return v_out;
}
The test function is
int main()
{
// Obviously, I do not know the structure
// used by the topic starter,
// so I just declare a small structure for a test
// The 'erase_selected' does not assume
// this structure to be 'light-weight'
struct inputInfo
{
int data;
inputInfo(int v = 0): data(v) {}
};
// Source selection indices
std::vector<int> selection { 0, 1, 3, 7 };
// Source data array
std::vector<inputInfo> v{ 0, 0, 3, 0, 2, 4, 5, 0, 7 };
// Output array
auto v_out = erase_selected(v, selection);
for (auto ii : v_out)
std::cout << ii.data << ' ';
std::cout << std::endl;
}

Related

Unite elements that share the same value of a variable in a vector of structs

For example, I have this struct :
struct Time
{
char Day[10];
int pay;
int earn;
};
And suppose that the vector of this Time struct has the following elements:
vector<Time> mySelf = {{"Monday", 20, 40}, {"Tuesday", 15, 20}, {"Monday", 30, 10}, {"Tuesday", 10, 5}};
Is there an algorithm to unite the data so that elements with the same day name appear only once, with the other members of those elements summed, to form a new vector like this:
vector<Time> mySelf = {{"Monday", 50, 50}, {"Tuesday", 25, 25}};
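You can try inserting your elements into an unordered_map and then reconstructing the vector. Search and insertion in the map take constant time on average, so the whole operation is O(n) on average: one pass over the vector and one pass over the map. Note that the snippets below refer to the member as day, whereas the struct in the question declares Day, and the shorter version additionally needs it to be a std::string rather than a char array.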
std::unordered_map<std::string, Time> timeMap;
for (const auto& t : mySelf)
{
if (timeMap.count(t.day) == 0)
{
timeMap[t.day] = t;
}
else
{
timeMap[t.day].pay += t.pay;
timeMap[t.day].earn += t.earn;
}
}
or shorter version, since insert already checks if the element exists and will not overwrite it:
for (const auto& t : mySelf)
{
timeMap.insert({t.day, {t.day,0,0}});
timeMap[t.day].pay += t.pay;
timeMap[t.day].earn += t.earn;
}
and then the vector reconstruction:
std::vector<Time> result;
result.reserve(timeMap.size());
for (const auto&[key, val] : timeMap)
{
result.push_back(val);
}
Alternatively you could use std::unordered_set but then you need some hash function for your struct. Probably you could improve it further with move semantics.
live demo
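The pieces above can be put together into one self-contained function. This is only a sketch under stated assumptions: the struct DayTotals and the function merge_by_day are hypothetical names, and the day is stored as a std::string instead of the char array from the question so that it can serve directly as the map key.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Time struct, with a std::string day.
struct DayTotals {
    std::string day;
    int pay;
    int earn;
};

// Sums pay and earn over all entries that share the same day.
// The order of the returned vector is unspecified (hash map iteration order).
std::vector<DayTotals> merge_by_day(const std::vector<DayTotals>& entries)
{
    std::unordered_map<std::string, DayTotals> totals;
    for (const auto& e : entries) {
        // insert() leaves an existing entry untouched and returns an
        // iterator to it, so one lookup serves both cases.
        auto it = totals.insert({e.day, DayTotals{e.day, 0, 0}}).first;
        it->second.pay += e.pay;
        it->second.earn += e.earn;
    }

    std::vector<DayTotals> result;
    result.reserve(totals.size());
    for (auto& kv : totals)
        result.push_back(std::move(kv.second));
    return result;
}
```

If the output must follow the order of first appearance or be sorted by day, an extra step is needed, for example sorting the result or using a std::map.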

Clustering example in C++

I have an increasing input vector like this {0, 1, 3, 5, 6, 7, 9} and want to cluster the inputs like this {{0, 1}, {3}, {5, 6, 7}, {9}}, i.e. cluster only the integers that are neighbors. The function signature is std::vector<std::vector<int>> solution(const std::vector<int>& input)
I usually advocate for not giving away solutions, but it looks like you're getting bogged down with indices and temporary vectors. Instead, standard iterators and algorithms make this task a breeze:
std::vector<std::vector<int>> solution(std::vector<int> const &input) {
std::vector<std::vector<int>> clusters;
// Special-casing to avoid returning {{}} in case of an empty input
if(input.empty())
return clusters;
// Loop-and-a-half, no condition here
for(auto it = begin(input);;) {
// Find the last element of the current cluster
auto const last = std::adjacent_find(
it, end(input),
[](int a, int b) { return b - a > 1; }
);
if(last == end(input)) {
// We reached the end: register the last cluster and return
clusters.emplace_back(it, last);
return clusters;
}
// One past the end of the current cluster
auto const gap = next(last);
// Register the cluster
clusters.emplace_back(it, gap);
// One past the end of a cluster is the beginning of the next one
it = gap;
}
}
See it live on Coliru (lame output formatting free of charge)
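For comparison, the same clustering can also be sketched as a single plain loop that either extends the current cluster or starts a new one. The function name cluster_neighbors is hypothetical, and the sketch assumes, like the question, that the input is increasing.

```cpp
#include <vector>

// Hypothetical helper: groups an increasing sequence into runs of
// neighboring integers (consecutive values differ by at most 1).
std::vector<std::vector<int>> cluster_neighbors(const std::vector<int>& input)
{
    std::vector<std::vector<int>> clusters;
    for (int value : input) {
        // Start a new cluster for the first value or after a gap.
        if (clusters.empty() || value - clusters.back().back() > 1)
            clusters.emplace_back();
        clusters.back().push_back(value);
    }
    return clusters;
}
```

An empty input yields an empty result without special-casing, because a cluster is only created when there is a value to put into it.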

Order a list depending on other list

Given:
struct Object {
int id;
...
};
list<Object> objectList;
list<int> idList;
What is the best way to order objectList depending on order of idList?
Example (pseudo code):
INPUT
objectList = {o1, o2, o3};
idList = {2, 3, 1};
ACTION
sort(objectList, idList);
OUTPUT
objectList = {o2, o3, o1};
I searched in documentation but I only found methods to order elements comparing among themselves.
You can store the objects in an std::map, with id as key. Then traverse idList, get the object out of map with its id.
std::map<int, Object> objectMap;
for (auto itr = objectList.begin(); itr != objectList.end(); itr++)
{
objectMap.insert(std::make_pair(itr->id, *itr));
}
std::list<Object> newObjectList;
for (auto itr = idList.begin(); itr != idList.end(); itr++)
{
// note: operator[] inserts a default-constructed Object if idList contains an id that does not appear in objectList
newObjectList.push_back(objectMap[*itr]);
}
// now newObjectList is sorted as order in idList
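A variation on the lookup idea avoids copying the objects: store list iterators in a hash map and move the nodes with std::list::splice. This is a sketch under assumptions, not the code above: order_by_ids is a hypothetical name, object ids are assumed to be unique, ids without a matching object are skipped, and objects whose id is missing from the id list are kept at the end.

```cpp
#include <list>
#include <unordered_map>

struct Object {
    int id;
};

// Hypothetical helper: reorders 'objects' to follow the order of 'ids'
// by relinking list nodes, so no Object is copied or moved.
void order_by_ids(std::list<Object>& objects, const std::list<int>& ids)
{
    // Map each id to the list node holding it (assumes unique ids).
    std::unordered_map<int, std::list<Object>::iterator> position;
    for (auto it = objects.begin(); it != objects.end(); ++it)
        position[it->id] = it;

    std::list<Object> ordered;
    for (int id : ids) {
        auto found = position.find(id);
        if (found == position.end())
            continue;  // id not present (or already consumed): skip it
        // splice transfers a single node in O(1).
        ordered.splice(ordered.end(), objects, found->second);
        position.erase(found);
    }
    // Objects that were not mentioned in 'ids' keep their relative order
    // and go to the end.
    ordered.splice(ordered.end(), objects);
    objects.swap(ordered);
}
```

With a hash map the whole reordering is O(n) on average, and the elements never change address.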
Here is another variant, which works in O(n log n). This is asymptotically optimal.
#include <list>
#include <vector>
#include <algorithm>
#include <iostream>
#include <cassert>
int main() {
struct O {
int id;
};
std::list<O> object_list{{1}, {2}, {3}, {4}};
std::list<int> index_list{4, 2, 3, 1};
assert(object_list.size() == index_list.size());
// this vector is optional. It is needed if sizeof(O) is quite large.
std::vector<std::pair<int, O*>> tmp_vector(object_list.size());
// this is O(n)
std::transform(begin(object_list), end(object_list), begin(tmp_vector),
[](auto& o) { return std::make_pair(o.id, &o); });
// this is O(n log n)
std::sort(begin(tmp_vector), end(tmp_vector),
[](const auto& o1, const auto& o2) {
return o1.first < o2.first;
});
// at this point, tmp_vector holds pairs in increasing index order.
// Note that this may not be a contiguous list.
std::list<O> tmp_list(object_list.size());
// this is again O(n log n), because each lower_bound call is O(log n)
// we then insert the objects into a new list (you may also use some
// move semantics here).
std::transform(begin(index_list), end(index_list), begin(tmp_list),
[&tmp_vector](const auto& i) {
return *std::lower_bound(begin(tmp_vector), end(tmp_vector),
std::make_pair(i, nullptr),
[](const auto& o1, const auto& o2) {
return o1.first < o2.first;
})->second;
});
// As we just created a new list, we swap the new list with the old one.
std::swap(object_list, tmp_list);
for (const auto& o : object_list)
std::cout << o.id << std::endl;
}
I assumed that O is quite large and not easily movable. Therefore I first create tmp_vector, which contains only pairs. Then I sort this vector.
Afterwards I can simply go through the index_list and find the matching indices using binary search.
Let me elaborate on why a map is not the best solution even though it gives you quite a small piece of code. If you use a map, the tree needs to be rebalanced after each insertion. This doesn't cost anything asymptotically (because n rebalancings cost the same as sorting once), but the constant is way larger. A "constant map" does not make much sense (except that accessing it may be easier).
I then timed the "simple" map approach against my "not-so-simple" vector approach. I created a randomly sorted index_list with N entries. This is what I get (in microseconds):
N          map       vector
1000       90        75
10000      1400      940
100000     24500     15000
1000000    660000    250000
NOTE: This test shows the worst case, as in my case only index_list was randomly sorted, while the object_list (which is inserted into the map in order) is sorted, so rebalancing shows its full effect. If the object_list is more or less random, the two approaches will perform more similarly, even though the map will always be slower. The vector approach will behave even better when the object list is completely random.
So the difference is already visible with 1000 entries and grows quickly with N. I would therefore strongly vote for a vector-based approach.
Assuming the data is handed to you externally and you don't have a choice of containers:
assert( objectList.size() == idList.size() );
std::vector<std::pair<int,Object>> wrapper( idList.size() );
auto idList_it = std::begin( idList );
auto objectList_it = std::begin( objectList );
for( auto& e: wrapper )
e = std::make_pair( *idList_it++, *objectList_it++ );
std::sort(
std::begin(wrapper),
std::end(wrapper),
[]
(const std::pair<int,Object>& a, const std::pair<int,Object>& b) -> bool
{ return a.first<b.first; }
);
Then, copy back to original container.
{
auto objectList_it = std::begin( objectList );
for( const auto& e: wrapper )
*objectList_it++ = e.second;
}
But this solution is not optimal, I'm sure somebody will come with a better solution.
Edit: The default comparison operator for pairs requires that it is defined both for first and second members. Thus the easiest way is to provide a lambda.
Edit2: this doesn't build if the wrapper is a std::list, because std::sort requires random-access iterators, which std::list does not provide. It works if you use a std::vector (see here).
std::list has a sort member function you can use with a custom comparison functor.
That custom functor has to look up an object's id in the idList and can then use std::distance to calculate the position of the element in idList. It does so for both objects to be compared and returns true if the first position is smaller than the second.
Here is an example:
#include <iostream>
#include <list>
#include <algorithm>
#include <stdexcept>
struct Object
{
int id;
};
int main()
{
Object o1 = { 1 };
Object o2 = { 2 };
Object o3 = { 3 };
std::list<Object> objectList = { o1, o2, o3 };
std::list<int> const idList = { 2, 3, 1 };
objectList.sort([&](Object const& first, Object const& second)
{
auto const id_find_iter1 = std::find(begin(idList), end(idList), first.id);
auto const id_find_iter2 = std::find(begin(idList), end(idList), second.id);
if (id_find_iter1 == end(idList) || id_find_iter2 == end(idList))
{
throw std::runtime_error("ID not found");
}
auto const pos1 = std::distance(begin(idList), id_find_iter1);
auto const pos2 = std::distance(begin(idList), id_find_iter2);
return pos1 < pos2;
});
for (auto const& object : objectList)
{
std::cout << object.id << '\n';
}
}
It's probably not terribly efficient, but chances are you will never notice. If it still bothers you, you might want to look for a solution with std::vector, which unlike std::list provides random-access iterators. That turns std::distance from O(n) to O(1).
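If the repeated linear searches in the comparator do become a concern, one option is to precompute each id's position once and compare those positions. The following is a sketch of that idea, not part of the example above: sort_by_rank and rank are hypothetical names, and when an id occurs more than once in the id list the first occurrence is used.

```cpp
#include <cstddef>
#include <list>
#include <stdexcept>
#include <unordered_map>

struct Object {
    int id;
};

// Hypothetical helper: sorts 'objects' by the position of their id in 'ids'.
void sort_by_rank(std::list<Object>& objects, const std::list<int>& ids)
{
    // Build id -> position once, instead of searching idList per comparison.
    std::unordered_map<int, std::size_t> rank;
    std::size_t next = 0;
    for (int id : ids)
        rank.emplace(id, next++);  // emplace keeps the first occurrence

    // Validate up front so the comparator itself never throws.
    for (const Object& o : objects)
        if (rank.find(o.id) == rank.end())
            throw std::runtime_error("ID not found");

    objects.sort([&rank](const Object& a, const Object& b) {
        return rank.at(a.id) < rank.at(b.id);
    });
}
```

Each comparison is then an average constant-time hash lookup, so the sort as a whole is O(n log n) on average, and std::list::sort keeps objects with equal rank in their original order.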
I would find it strange to end up in this situation, as I would use pointers instead of ids. Still, there might be use cases for this.
Note that in all the examples below, I assume that the id list contains every id exactly once.
Writing it yourself
The problem you want to solve is creating/sorting a list of objects based on the order of the ids in another list.
The naive way of doing this is simply writing it yourself:
void sortByIdVector(std::list<Object> &list, const std::list<int> &ids)
{
auto oldList = std::move(list);
list = std::list<Object>{};
for (auto id : ids)
{
auto itElement = std::find_if(oldList.begin(), oldList.end(), [id](const Object &obj) { return id == obj.id; });
list.emplace_back(std::move(*itElement));
oldList.erase(itElement);
}
}
If you use a sorted vector as input, you can optimize this code to get the best performance out of it. I'm leaving that up to you.
Using sort
For this implementation, I'm going to assume that the ids are in a std::vector instead of a std::list, as this is the better container for looking up the index of an element. (With some more code you can do the same for a list.)
size_t getIntendedIndex(const std::vector<int> &ids, const Object &obj)
{
auto itElement = std::find_if(ids.begin(), ids.end(), [&obj](int id) { return id == obj.id; });
return itElement - ids.begin();
}
void sortByIdVector(std::list<Object> &list, const std::vector<int> &ids)
{
list.sort([&ids](const Object &lhs, const Object &rhs){ return getIntendedIndex(ids, lhs) < getIntendedIndex(ids, rhs); });
}
Insertion
Another approach, also more suitable for std::vector, is simply to move each element directly to its final position, which should be more performant than sorting.
void sortByIdVector(std::vector<Object> &list, const std::vector<int> &ids)
{
auto oldList = std::move(list);
list = std::vector<Object>{};
list.resize(oldList.size());
for (Object &obj : oldList)
{
auto &newLocation = list[getIntendedIndex(ids, obj)];
newLocation = std::move(obj);
}
}
objectList.sort([&idList] (const Object& o1, const Object& o2) -> bool
{ return std::find(++std::find(idList.begin(), idList.end(), o1.id),
idList.end(), o2.id)
!= idList.end();
});
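The idea is to check whether o1.id appears before o2.id in idList.
We search for o1.id, increment the found position, then search for o2.id from there: if it is found, o1 comes before o2. This assumes every object's id is present in idList; otherwise the increment is applied to the end iterator, which is undefined behavior.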
Test
#include <iostream>
#include <string>
#include <list>
#include <algorithm>
struct Object {
    int id;
    std::string name;
};
int main()
{
    std::list<Object> objectList {{1, "one_1"}, {2, "two_1"}, {3, "three_1"}, {2, "two_2"}, {1, "one_2"}, {4, "four_1"}, {3, "Three_2"}, {4, "four_2"}};
    std::list<int> idList {3, 2, 4, 1};
    objectList.sort([&idList] (const Object& o1, const Object& o2) -> bool
    { return std::find(++std::find(idList.begin(), idList.end(), o1.id), idList.end(), o2.id) != idList.end(); });
    for(const auto& o: objectList) std::cout << o.id << " " << o.name << "\n";
}
/* OUTPUT:
3 three_1
3 Three_2
2 two_1
2 two_2
4 four_1
4 four_2
1 one_1
1 one_2
*/

Implementing partition_unique and stable_partition_unique algorithms

I'm looking for a way to partition a set of ordered elements such that all unique elements occur before their respective duplicates. std::unique is not applicable because duplicate elements are overwritten, so I thought of using std::partition. Calling this algorithm partition_unique, I also need the corresponding stable_partition_unique (i.e. like stable_partition).
A basic implementation of partition_unique is:
#include <algorithm>
#include <iterator>
#include <unordered_set>
#include <functional>
template <typename BidirIt, typename BinaryPredicate = std::equal_to<void>>
BidirIt partition_unique(BidirIt first, BidirIt last, BinaryPredicate p = BinaryPredicate {})
{
using ValueTp = typename std::iterator_traits<BidirIt>::value_type;
std::unordered_set<ValueTp, std::hash<ValueTp>, BinaryPredicate> seen {};
seen.reserve(std::distance(first, last));
return std::partition(first, last,
[&p, &seen] (const ValueTp& value) {
return seen.insert(value).second;
});
}
Which can be used like:
#include <vector>
#include <iostream>
int main()
{
std::vector<int> vals {1, 1, 2, 4, 5, 5, 5, 7, 7, 9, 10};
const auto it = partition_unique(std::begin(vals), std::end(vals));
std::cout << "Unique values: ";
std::copy(std::begin(vals), it, std::ostream_iterator<int> {std::cout, " "}); // Unique values: 1 10 2 4 5 9 7
std::cout << '\n' << "Duplicate values: ";
std::copy(it, std::end(vals), std::ostream_iterator<int> {std::cout, " "}); // Duplicate values: 7 5 5 1
}
The corresponding stable_partition_unique can be achieved by replacing std::partition with std::stable_partition.
The problem with these approaches is that they unnecessarily buffer all unique values in the std::unordered_set (which also adds a hash function requirement), which shouldn't be required as the elements are sorted. It's not too much work to come up with a better implementation for partition_unique, but an implementation of stable_partition_unique seems considerably more difficult, and I'd rather not implement this myself if possible.
Is there a way to use existing algorithms to achieve optimal partition_unique and stable_partition_unique algorithms?
Create a queue to hold the duplicates. Then, initialize two indexes, src and dest, starting at index 1, and go through the list. If the current item (list[src]) is equal to the previous item (list[dest-1]), then copy it to the queue. Otherwise, copy it to list[dest] and increment dest.
When you've exhausted the list, copy items from the queue to the tail of the original list.
Something like:
Queue dupQueue
int src = 1
int dest = 1
while (src < list.count)
{
if (list[src] == list[dest-1])
{
// it's a duplicate.
dupQueue.push(list[src])
}
else
{
list[dest] = list[src]
++dest
}
++src
}
while (!dupQueue.IsEmpty)
{
list[dest] = dupQueue.pop()
++dest
}
I know the STL has a queue. Whether it has an algorithm similar to the above, I don't know.
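The pseudocode above translates fairly directly into C++. The following is a sketch under assumptions rather than a standard algorithm: the name stable_partition_unique_sorted is hypothetical, the range must already be sorted (or at least have equal elements adjacent), and a std::vector is used as the duplicate buffer instead of a queue.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: moves the first element of every run of equal
// elements to the front and the remaining duplicates to the back, keeping
// the relative order in both parts. Returns the start of the duplicates.
template <typename ForwardIt, typename BinaryPredicate = std::equal_to<>>
ForwardIt stable_partition_unique_sorted(ForwardIt first, ForwardIt last,
                                         BinaryPredicate eq = BinaryPredicate{})
{
    if (first == last)
        return last;

    using Value = typename std::iterator_traits<ForwardIt>::value_type;
    std::vector<Value> duplicates;  // buffer for the duplicates, in order

    ForwardIt dest = first;  // last unique element kept so far
    for (ForwardIt src = std::next(first); src != last; ++src) {
        if (eq(*dest, *src)) {
            duplicates.push_back(std::move(*src));
        } else {
            ++dest;
            if (dest != src)
                *dest = std::move(*src);
        }
    }
    ++dest;  // one past the last unique element

    // Put the buffered duplicates back behind the unique elements.
    std::move(duplicates.begin(), duplicates.end(), dest);
    return dest;
}
```

This needs no hash function and only buffers the duplicates, at the cost of O(d) extra memory for d duplicates. For the input {1, 1, 2, 4, 5, 5, 5, 7, 7, 9, 10} it leaves 1 2 4 5 7 9 10 in front and 1 5 5 7 behind.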

Issue with CUDA array compaction using thrust zip_iterator [duplicate]

I have two arrays of integers, dmap and dflag, of the same length on the device, and I have wrapped them with thrust device pointers, dmapt and dflagt.
There are some elements in the dmap array with value -1. I want to remove these -1's and the corresponding values from the dflag array.
I am using the remove_if function to do this, but I cannot figure out what the return value of this call is or how I should use this returned value.
(I want to pass these reduced arrays to the reduce_by_key function, where dflagt will be used as the keys.)
I am using the following call for doing the reduction. Please let me know how I can store the returned value in a variable and use it to address the individual arrays dflag and dmap.
thrust::remove_if(
thrust::make_zip_iterator(thrust::make_tuple(dmapt, dflagt)),
thrust::make_zip_iterator(thrust::make_tuple(dmapt+numindices, dflagt+numindices)),
minus_one_equality_test()
);
where the predicate functor used above is defined as
struct minus_one_equality_test
{
typedef typename thrust::tuple<int,int> Tuple;
__host__ __device__
bool operator()(const Tuple& a )
{
return thrust::get<0>(a) == (-1);
}
};
The return value is a zip_iterator which marks the new end of the sequence of tuples that were kept, i.e. those for which your functor returned false during the remove_if call. To access the new end iterator of the underlying array you will need to retrieve a tuple iterator from the zip_iterator; the contents of that tuple are then the new end iterators of the original arrays you used to build the zip_iterator. It is a lot more convoluted in words than in code:
#include <thrust/tuple.h>
#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/remove.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/copy.h>
#include <iostream>
struct minus_one_equality_test
{
typedef thrust::tuple<int,int> Tuple;
__host__ __device__
bool operator()(const Tuple& a )
{
return thrust::get<0>(a) == (-1);
};
};
int main(void)
{
const int numindices = 10;
int mapt[numindices] = { 1, 2, -1, 4, 5, -1, 7, 8, -1, 10 };
int flagt[numindices] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
thrust::device_vector<int> vmapt(10);
thrust::device_vector<int> vflagt(10);
thrust::copy(mapt, mapt+numindices, vmapt.begin());
thrust::copy(flagt, flagt+numindices, vflagt.begin());
thrust::device_ptr<int> dmapt = vmapt.data();
thrust::device_ptr<int> dflagt = vflagt.data();
typedef thrust::device_vector< int >::iterator VIt;
typedef thrust::tuple< VIt, VIt > TupleIt;
typedef thrust::zip_iterator< TupleIt > ZipIt;
ZipIt Zend = thrust::remove_if(
thrust::make_zip_iterator(thrust::make_tuple(dmapt, dflagt)),
thrust::make_zip_iterator(thrust::make_tuple(dmapt+numindices, dflagt+numindices)),
minus_one_equality_test()
);
TupleIt Tend = Zend.get_iterator_tuple();
VIt vmapt_end = thrust::get<0>(Tend);
for(VIt x = vmapt.begin(); x != vmapt_end; x++) {
std::cout << *x << std::endl;
}
return 0;
}
If you compile this and run it, you should see something like this:
$ nvcc -arch=sm_12 remove_if.cu
$ ./a.out
1
2
4
5
7
8
10
In this example I only "retrieve" the shortened contents of the first element of the tuple; the second is accessed in the same way, i.e. the iterator marking the new end of the vector is thrust::get<1>(Tend).