Find the indices and inverse mapping of a unique vector - c++

I have a std::vector<int> with duplicate values. I can find the unique values using std::unique() and std::vector::erase(), but how can I efficiently find the indices of the unique values, together with an inverse mapping vector that reconstructs the original vector from the vector of unique values? Allow me to illustrate with an example:
std::vector<int> vec = {3, 2, 3, 3, 6, 5, 5, 6, 2, 6};
std::vector<int> uvec = {3, 2, 6, 5}; // vector of unique values
std::vector<int> idx_vec = {0, 1, 4, 5}; // vector of indices
std::vector<int> inv_vec = {0, 1, 0, 0, 2, 3, 3, 2, 1, 2}; // inverse mapping
The inverse mapping vector is such that with its indices one can construct the original vector using the unique vector i.e.
std::vector<int> orig_vec(inv_vec.size()); // construct the original vector
std::transform(inv_vec.begin(), inv_vec.end(), orig_vec.begin(),
[&uvec](int j) { return uvec[j]; }); // orig_vec[i] = uvec[inv_vec[i]]
And the indices vector is simply the vector of indices at which each unique value first occurs in the original vector.
My rudimentary solution is far from efficient. It does not use STL algorithms and is O(n^2) at worst.
template <typename T>
inline std::tuple<std::vector<T>,std::vector<int>,std::vector<int>>
unique_idx_inv(const std::vector<T> &a) {
auto size_a = a.size();
std::vector<T> uniques;
std::vector<int> idx; // vector of indices
std::vector<int> inv(size_a); // vector of inverse mapping
for (auto i=0; i<size_a; ++i) {
auto counter = 0;
for (auto j=0; j<uniques.size(); ++j) {
if (uniques[j]==a[i]) {
counter +=1;
break;
}
}
if (counter==0) {
uniques.push_back(a[i]);
idx.push_back(i);
}
}
for (auto i=0; i<size_a; ++i) {
for (auto j=0; j<uniques.size(); ++j) {
if (uniques[j]==a[i]) {
inv[i] = j;
break;
}
}
}
return std::make_tuple(uniques,idx,inv);
}
Comparing this with the typical std::sort+std::erase+std::unique approach (which by the way only computes unique values and not indices or inverse), I get the following timing on my laptop with g++ -O3 [for a vector of size=10000 with only one duplicate value]
Find uniques+indices+inverse: 145ms
Find only uniques using STL's sort+erase+unique 0.48ms
Of course the two approaches are not exactly identical, as the latter one sorts the values, but I still believe the solution I have posted above can be optimised considerably. Any thoughts on how I can achieve this?

If I'm not wrong, the following solution should be O(n log(n))
(I've changed the indexes to std::size_t values)
template <typename T>
inline std::tuple<std::vector<T>,
std::vector<std::size_t>,
std::vector<std::size_t>>
unique_idx_inv(const std::vector<T> &a)
{
std::size_t ind;
std::map<T, std::size_t> m;
std::vector<T> uniques;
std::vector<std::size_t> idx;
std::vector<std::size_t> inv;
inv.reserve(a.size());
ind = 0U;
for ( std::size_t i = 0U ; i < a.size() ; ++i )
{
auto e = m.insert(std::make_pair(a[i], ind));
if ( e.second )
{
uniques.push_back(a[i]);
idx.push_back(i);
++ind;
}
inv.push_back(e.first->second);
}
return std::make_tuple(uniques,idx,inv);
}

The O(n^2) arises from your approach to identify duplicates with nested loops over vectors. However, to find out if an element has already been read, a sorted vector or - imho better - an unordered map is more appropriate.
So, without writing the code here, I'd suggest to use an unordered map of the form
unordered_map<int,int>, which can hold both the unique values and the indices. I'm not sure if you still need the vectors for this information, but you can easily derive these vectors from the map.
The complexity should drop to O(n) on average with an unordered map, or O(n log(n)) with a sorted container such as std::map.
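Since no code was given for the unordered map idea, here is a minimal sketch of how it could look. The function name unique_idx_inv_hashed is made up for illustration, and it assumes T works with std::hash and operator==. The map stores, for each value, its slot in the vector of uniques, so one pass fills all three outputs.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical helper (name chosen for illustration): returns the unique values
// in order of first occurrence, the index of each first occurrence, and the
// inverse mapping. Assumes std::hash<T> and operator== exist for T.
template <typename T>
std::tuple<std::vector<T>, std::vector<std::size_t>, std::vector<std::size_t>>
unique_idx_inv_hashed(const std::vector<T>& a)
{
    std::unordered_map<T, std::size_t> position; // value -> slot in `uniques`
    std::vector<T> uniques;
    std::vector<std::size_t> idx;
    std::vector<std::size_t> inv;
    inv.reserve(a.size());

    for (std::size_t i = 0; i < a.size(); ++i) {
        // Try to register a[i] under the next free slot; no-op if already present.
        auto result = position.emplace(a[i], uniques.size());
        if (result.second) {          // first time this value is seen
            uniques.push_back(a[i]);
            idx.push_back(i);
        }
        inv.push_back(result.first->second);
    }
    assert(uniques.size() == idx.size());
    return std::make_tuple(uniques, idx, inv);
}
```

Each lookup and insertion is constant time on average, so the whole pass is expected to be linear in the size of the input, at the cost of the extra memory for the map.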


An efficient algorithm to sample non-duplicate random elements from an array

I'm looking for an algorithm to pick M random elements from a given array. The prerequisites are:
the sampled elements must be unique,
the array to sample from may contain duplicates,
the array to sample from is not necessarily sorted.
This is what I've managed to come up with. Here I'm also assuming that the number of unique elements in the array is greater than or equal to M.
#include <random>
#include <vector>
#include <algorithm>
#include <iostream>
const std::vector<int> sample(const std::vector<int>& input, size_t n) {
std::random_device rd;
std::mt19937 engine(rd());
std::uniform_int_distribution<int> dist(0, input.size() - 1);
std::vector<int> result;
result.reserve(n);
size_t id;
do {
id = dist(engine);
if (std::find(result.begin(), result.end(), input[id]) == result.end())
result.push_back(input[id]);
} while (result.size() < n);
return result;
}
int main() {
std::vector<int> input{0, 0, 1, 1, 2, 2, 3, 3, 4, 4};
std::vector<int> result = sample(input, 3);
for (const auto& item : result)
std::cout << item << ' ';
std::cout << std::endl;
}
This algorithm does not seem to be the best. Is there a more efficient algorithm (with lower time complexity) to solve this task? It would be good if this algorithm could also assert that the number of unique elements in the input array is not less than M (or pick as many unique elements as possible if this is not the case).
Possible solution
As MSalters suggested, I use std::unordered_set to remove duplicates and std::shuffle to shuffle elements in a vector constructed from the set. Then I resize the vector and return it.
const std::vector<int> sample(const std::vector<int>& input, size_t M) {
std::unordered_set<int> rem_dups(input.begin(), input.end());
if (rem_dups.size() < M) M = rem_dups.size();
std::vector<int> result(rem_dups.begin(), rem_dups.end());
std::mt19937 g(std::random_device{}());
std::shuffle(result.begin(), result.end(), g);
result.resize(M);
return result;
}
The comments already note the use of std::set. The additional request to check for M unique elements in the input makes that a bit more complicated. Here's an alternative implementation:
Put all inputs in a std::set or std::unordered_set. This removes duplicates.
Copy all elements to the return vector
If that has more than M elements, std::shuffle it and resize it to M elements.
Return it.
Use a set S to store the output, initially empty.
i = 0
while |S| < M && i <= n-1
swap the i'th element of the input with a randomly chosen element at index i or later
add the newly swapped i'th element to your set if it isn't already there
i++
This will end with S having M distinct elements from your input array (if there are M distinct elements). However, elements which are more common in the input array are more likely to be in S (unless you go through the additional work of eliminating duplicates from the input first).
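The steps above can be sketched roughly as follows. The name sample_distinct and its signature are assumptions for illustration; the input is taken by value so the caller's vector is not reordered, and the random engine is passed in by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: picks up to m distinct values from `input` using a
// partial Fisher-Yates shuffle plus a set of values already taken.
std::vector<int> sample_distinct(std::vector<int> input, std::size_t m,
                                 std::mt19937& engine)
{
    std::unordered_set<int> seen;  // the set S from the steps above
    std::vector<int> result;
    result.reserve(m);
    for (std::size_t i = 0; i < input.size() && result.size() < m; ++i) {
        // Swap element i with a random element at index >= i.
        std::uniform_int_distribution<std::size_t> dist(i, input.size() - 1);
        std::swap(input[i], input[dist(engine)]);
        // Keep the value only if it has not been picked before.
        if (seen.insert(input[i]).second)
            result.push_back(input[i]);
    }
    assert(result.size() <= m);
    return result;
}
```

If the input has fewer than m distinct values, the loop simply runs to the end and returns all of them, which covers the "pick as many as possible" case.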

Repeat elements in a vector [duplicate]

I have a vector
vector<int>v = {1,2,3,4,5};
I'd like to repeat the elements in the vector, say, 3 times, such that the vector becomes
v = {1,2,3,4,5, 1,2,3,4,5, 1,2,3,4,5};
EDIT: In fact, if I need to repeat the elements many times, say 1000, I obviously have to come up with something quick and light.
How do I do it?
This can be tricky. If you want to avoid creating a temporary working object you have to be careful to avoid invalidating iterators as you go. This should do it:
std::vector<int> v = {1, 2, 3, 4, 5};
// to avoid invalidating iterators, preallocate the memory
v.reserve(v.size() * 3);
// remember the end of the range to be duplicated
// (this is the iterator we don't want to invalidate)
auto end = std::end(v);
// insert two duplicates
v.insert(std::end(v), std::begin(v), end);
v.insert(std::end(v), std::begin(v), end);
for(auto i: v)
std::cout << i << '\n';
More generally you could modify this to add multiple duplicates like this:
std::vector<int> v = {1, 2, 3, 4, 5};
std::size_t const no_of_duplicates = 1000;
// to avoid invalidating iterators, preallocate the memory
v.reserve(v.size() * no_of_duplicates);
// remember the end of the range to be duplicated
// (this is the iterator we don't want to invalidate)
auto end = std::end(v);
// insert duplicates (start from one because already have the first)
for(std::size_t i = 1; i < no_of_duplicates; ++i)
v.insert(std::end(v), std::begin(v), end);
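A note of caution on the insert-based loops: as far as I can tell, the sequence container requirements say the range passed to insert must not point into the same container, and an insert at the end also invalidates a previously saved end iterator, so this relies on behaviour the standard does not guarantee. A sketch that sidesteps both issues is to resize once and copy the original block into each new slot. The name repeat_in_place is made up here, and it assumes T is default-constructible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: makes `v` hold `times` back-to-back copies of its
// original contents. Assumes times >= 1 and a default-constructible T.
template <typename T>
void repeat_in_place(std::vector<T>& v, std::size_t times)
{
    assert(times >= 1);
    const std::size_t n = v.size();
    v.resize(n * times);  // single allocation; new slots are value-initialized
    for (std::size_t k = 1; k < times; ++k) {
        // Source [0, n) and destination [k*n, (k+1)*n) never overlap.
        std::copy_n(v.begin(), n, v.begin() + k * n);
    }
}
```

Calling repeat_in_place(v, 1000) then does one reallocation and 999 block copies, with no iterators kept across a modification of the vector.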
Use the insert method of vector class
v.insert(v.end(), v.begin(), v.end());
Use std::copy
std::vector<int> v = { 1 , 2, 3, 4, 5};
std::vector<int> r;
for (auto i = 0; i < 3; ++i) {
std::copy(v.begin(), v.end(), std::back_inserter(r));
}
v.swap(r);

Selection sort - loop stops too early

I'm trying to write selection sort. Everything works but my algorithm is not looping through whole vector _item leaving my v_sorted too short. Elements are sorted properly.
sort.hpp
template<typename T>
std::vector<T> selection_sort(std::vector<T>);
sort.cpp
template<typename T>
std::vector<T> selection_sort(std::vector<T> _item) {
std::vector<T> v_sorted;
for(int i = 0; i < _item.size(); ++i) {
T smallest = _item[0];
for(auto const& j : _item) {
if(j < smallest) {
smallest = j;
}
}
v_sorted.push_back(smallest);
auto it = std::find(_item.begin(), _item.end(), smallest);
if (it != _item.end()) {
// to prevent moving all of items in vector
// https://stackoverflow.com/a/15998752
std::swap(*it, _item.back());
_item.pop_back();
}
}
return v_sorted;
}
template std::vector<int> selection_sort(std::vector<int> _item);
sort_tests.hpp
BOOST_AUTO_TEST_CASE(selection_sort_int)
{
std::vector<int> v_unsorted = {3, 1, 2, 7, 6};
std::vector<int> v_sorted = {1, 2, 3, 6, 7};
auto v_test = exl::selection_sort(v_unsorted);
BOOST_CHECK_EQUAL_COLLECTIONS(v_sorted.begin(), v_sorted.end(),
v_test.begin(), v_test.end());
}
This test is failing with Collections size mismatch: 5 != 3. Any test is failing with size mismatch. Loop is stopping (in this case) after three iterations. Thanks in advance for any clues.
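The for loop's ++i and the _item.pop_back() work against each other: i grows while _item.size() shrinks, so the loop condition fails after roughly half the elements. You are effectively stepping twice per iteration when you only wanted to step once.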
Change the for loop to a while loop:
while(!_item.empty())
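For completeness, a sketch of the whole function with that change applied. It keeps the swap-and-pop idea from the question and uses std::min_element to locate the smallest element in one step; treat it as one possible way to write it rather than the only fix.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

template <typename T>
std::vector<T> selection_sort(std::vector<T> _item) {
    const auto expected = _item.size();
    std::vector<T> v_sorted;
    v_sorted.reserve(expected);
    // Loop until every element has been moved out, instead of counting with i.
    while (!_item.empty()) {
        auto it = std::min_element(_item.begin(), _item.end());
        v_sorted.push_back(*it);
        // Swap with the last element so pop_back removes it without shifting.
        std::swap(*it, _item.back());
        _item.pop_back();
    }
    assert(v_sorted.size() == expected);
    return v_sorted;
}
```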
You are re-implementing std::min_element, and if you use it, you don't need to find the element again. You also don't want to change the size of _item whilst looping over its size().
You can also sort in place, as follows:
template<typename T>
std::vector<T> selection_sort(std::vector<T> _item) {
for(auto it = _item.begin(); it != _item.end(); ++it) {
auto smallest = std::min_element(it, _item.end());
std::iter_swap(it, smallest);
}
return _item;
}

C++ Sort based on other int array

Suppose I have two vectors
std::vector<int>vec_int = {4,3,2,1,5};
std::vector<Obj*>vec_obj = {obj1,obj2,obj3,obj4,obj5};
How do we sort vec_obj according to the sorted order of vec_int?
So the goal may look like this:
std::vector<int>vec_int = {1,2,3,4,5};
std::vector<Obj*>vec_obj = {obj4,obj3,obj2,obj1,obj5};
I've been trying to create a new vector:
for (int i = 0; i < vec_int.size(); i++) {
new_vec.push_back(vec_obj[vec_int[i]]);
}
But I think it's not the correct solution. How do we do this? Thanks.
The std library may be the best option, but I can't find the correct way to apply std::sort here.
You don't have to call std::sort; what you need can be done in linear time (provided the values in vec_int are exactly 1 to N with no repeats): each value says where the matching object belongs.
std::vector<Obj*> new_vec(vec_obj.size());
for (size_t i = 0; i < vec_int.size(); ++i) {
new_vec[vec_int[i] - 1] = vec_obj[i];
}
But of course for this solution you need the additional new_vec vector.
If the indices are arbitrary and/or you don't want to allocate another vector, you have to use a different data structure:
typedef std::pair<int, Obj*> Item;
std::vector<Item> vec = {{4, obj1}, {3, obj2}, {2, obj3}, {1, obj4}, {5, obj5}};
std::sort(vec.begin(), vec.end(), [](const Item& l, const Item& r) -> bool {return l.first < r.first;});
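If you would rather keep the two vectors separate, another option is to sort a vector of positions by key and then gather the values in that order. This is only a sketch: sorted_by_keys is a made-up name, and it returns a new vector instead of reordering in place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns `values` reordered to follow the ascending
// order of `keys`. Neither input is modified.
template <typename T>
std::vector<T> sorted_by_keys(const std::vector<int>& keys,
                              const std::vector<T>& values)
{
    assert(keys.size() == values.size());
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, ..., N-1
    // Sort the positions by the key found at each position.
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t l, std::size_t r) {
                         return keys[l] < keys[r];
                     });
    std::vector<T> result;
    result.reserve(values.size());
    for (std::size_t i : order)
        result.push_back(values[i]);  // gather in sorted-key order
    return result;
}
```

This works for arbitrary (even repeated) keys, and std::stable_sort keeps objects with equal keys in their original relative order.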
Maybe there is a better solution, but personally I would use the fact that items in a std::map are automatically sorted by key. This gives the following possibility (untested!)
// The vectors have to be the same size for this to work!
if( vec_int.size() != vec_obj.size() ) { return 0; }
std::vector<int>::const_iterator intIt = vec_int.cbegin();
std::vector<Obj*>::const_iterator objIt = vec_obj.cbegin();
// Create a temporary map
std::map< int, Obj* > sorted_objects;
for(; intIt != vec_int.cend(); ++intIt, ++objIt )
{
sorted_objects[ *intIt ] = *objIt;
}
// Iterating through map will be in order of key
// so this adds the items to the vector in the desired order.
std::vector<Obj*> vec_obj_sorted;
for( std::map< int, Obj* >::const_iterator sortedIt = sorted_objects.cbegin();
sortedIt != sorted_objects.cend(); ++sortedIt )
{
vec_obj_sorted.push_back( sortedIt->second );
}
[Not sure this fits your usecase, but putting the elements into a map will store the elements sorted by key by default.]
Coming to your precise solution: if creating the new vector is the issue, you can avoid it with a simple swap trick (like selection sort). This assumes vec_int holds 0-based target positions.
// Move the element at i to its target slot, pulling the displaced element back into i.
for (std::size_t i = 0; i < vec_int.size(); i++) {
while (vec_int[i] != static_cast<int>(i)) {
std::swap(vec_obj[i], vec_obj[vec_int[i]]);
std::swap(vec_int[i], vec_int[vec_int[i]]); // the target slot now holds its final value
}
}
The generic form of this is known as "reorder according to", which is a variation of cycle sort. Unlike your example, the index vector needs to have the values 0 through size-1, instead of {4,3,2,1,5} it would need to be {3,2,1,0,4} (or else you have to adjust the example code below). The reordering is done by rotating groups of elements according to the "cycles" in the index vector or array. (In my adjusted example there are 3 "cycles", 1st cycle: index[0] = 3, index[3] = 0. 2nd cycle: index[1] = 2, index[2] = 1. 3rd cycle index[4] = 4). The index vector or array is also sorted in the process. A copy of the original index vector or array can be saved if you want to keep the original index vector or array. Example code for reordering vA according to vI in template form:
template <class T>
void reorder(std::vector<T>& vA, std::vector<size_t>& vI)
{
size_t i, j, k;
T t;
for(i = 0; i < vA.size(); i++){
if(i != vI[i]){
t = vA[i];
k = i;
while(i != (j = vI[k])){
// every move places a value in its final location
vA[k] = vA[j];
vI[k] = k;
k = j;
}
vA[k] = t;
vI[k] = k;
}
}
}
Simpler still would be to copy vA to another vector vB (of the same size) according to vI:
for(i = 0; i < vA.size(); i++){
vB[i] = vA[vI[i]];
}

Finding consecutive elements in a vector

I am working on a problem where I have to create subvectors from a bigger vector. If the elements in the vector are consecutive I have to create a vector of those elements. If there are elements which are not consecutive then a vector of that single element is created. My logic is as below
vector<int> vect;
for (int nCount=0; nCount < 3; nCount++)
vect.push_back(nCount);
vect.push_back(5);
vect.push_back(8);
vector<int>::iterator itEnd;
itEnd = std::adjacent_find (vect.begin(), vect.end(), NotConsecutive());
The functor NotConsecutive is as below
return (first != second - 1); // body of bool operator()(int first, int second) const
So I am expecting that std::adjacent_find will give me back the iterators such that I can create vector one{0,1,2}, vector two{5} and vector three{8}. But I am not sure if there is any simpler way?
Edit:I forgot to mention that I have std::adjacent_find in a loop as
while(itBegin != vect.end())
{
itEnd = std::adjacent_find (vect.begin(), vect.end(), NotConsecutive());
vector<int> groupe;
if( std::distance(itBegin, itEnd) < 1)
{
groupe.assign(itBegin, itBegin+1);
}
else
{
groupe.assign(itBegin, itEnd);
}
if(boost::next(itEnd) != vect.end())
{
itBegin = ++itEnd;
}
else
{
vector<int> last_element.push_back(itEnd);
}
}
Does it make any sense?
I think this is what is being requested. It does not use adjacent_find() but manually iterates through the vector populating a vector<vector<int>> containing the extracted sub-vectors. It is pretty simple, IMO.
#include <iostream>
#include <vector>
#include <algorithm>
int main()
{
std::vector<int> vect { 0, 1, 2, 3, 5, 8 };
// List of subvectors extracted from 'vect'.
// Initially populated with a single vector containing
// the first element from 'vect'.
//
std::vector<std::vector<int>> sub_vectors(1, std::vector<int>(1, vect[0]));
// Iterate over the elements of 'vect',
// skipping the first as it has already been processed.
//
std::for_each(vect.begin() + 1,
vect.end(),
[&](int i)
{
// If the current int is one more than previous
// append to current sub vector.
if (sub_vectors.back().back() == i - 1)
{
sub_vectors.back().push_back(i);
}
// Otherwise, create a new subvector containing
// a single element.
else
{
sub_vectors.push_back(std::vector<int>(1, i));
}
});
for (auto const& v: sub_vectors)
{
for (auto i: v) std::cout << i << ", ";
std::cout << std::endl;
}
}
Output:
0, 1, 2, 3,
5,
8,
See demo at http://ideone.com/ZM9ssk.
Due to the limitations of std::adjacent_find you can't use it quite the way you want to. However it can still be useful.
What you can do is to iterate over the collection, and use std::adjacent_find in a loop, with the last returned iterator (or your outer loop iterator for the first call) until it returns end. Then you will have a complete set of consecutive elements. Then continue the outer loop from where the last call to std::adjacent_find returned a non-end iterator.
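A rough sketch of that loop, assuming "consecutive" means each element is exactly one more than the one before it. The name split_consecutive is made up for illustration. std::adjacent_find returns an iterator to the first element of the offending pair, so the run ends one past it.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: splits `v` into runs in which every element is
// exactly one more than its predecessor.
std::vector<std::vector<int>> split_consecutive(const std::vector<int>& v)
{
    std::vector<std::vector<int>> groups;
    auto first = v.begin();
    while (first != v.end()) {
        // Find the first pair (a, b) in [first, end) with b != a + 1.
        auto brk = std::adjacent_find(first, v.end(),
                                      [](int a, int b) { return b != a + 1; });
        // The run includes the element `brk` points at, if there is a break.
        auto last = (brk == v.end()) ? brk : std::next(brk);
        groups.emplace_back(first, last);
        first = last;  // continue the search right after the finished run
    }
    assert(v.empty() == groups.empty());
    return groups;
}
```

Each call to std::adjacent_find starts where the previous run ended, so the input is scanned only once overall.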
Honestly, I don't find any clear disadvantage of using a simple hand-crafted loop instead of standard functions:
void split(const std::vector<int> &origin, std::vector<std::vector<int> > &result)
{
result.clear();
if(origin.empty()) return;
result.resize(1);
result[0].push_back(origin[0]);
for(size_t i = 1; i < origin.size(); ++i)
{
if(origin[i] != origin[i-1] + 1) result.push_back(std::vector<int>());
result.back().push_back(origin[i]);
}
}