Performing vector intersection in C++

I have a vector of vectors of unsigned, and I need to find the intersection of all of these vectors. To do so, I wrote the following code:
int func()
{
    vector<vector<unsigned> > t;
    vector<unsigned> intersectedValues;
    bool firstIntersection=true;
    for(int i=0;i<(t).size();i++) {
        if(firstIntersection) { intersectedValues=t[0]; firstIntersection=false; }
        else {
            vector<unsigned> tempIntersectedSubjects;
            set_intersection(t[i].begin(),
                t[i].end(), intersectedValues.begin(),
                intersectedValues.end(),
                std::inserter(tempIntersectedSubjects, tempIntersectedSubjects.begin()));
            intersectedValues=tempIntersectedSubjects;
        }
    }
}
Each individual vector has 9000 elements and there are many such vectors in "t". When I profiled my code I found that set_intersection takes the most time and hence makes the code slow when there are many invocations of func(). Can someone please suggest how I can make the code more efficient?
I am using: gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)
EDIT: Individual vectors in vector "t" are sorted.

I don't have a framework to profile the operations but I'd certainly change the code to reuse the readily allocated vector. In addition, I'd hoist the initial intersection out of the loop. Also, std::back_inserter() should make sure that elements are added in the correct location rather than in the beginning:
int func()
{
    vector<vector<unsigned> > t = some_initialization();
    if (t.empty()) {
        return 0;
    }
    vector<unsigned> intersectedValues(t[0]);
    vector<unsigned> tempIntersectedSubjects;
    for (std::vector<std::vector<unsigned>>::size_type i(1u);
         i < t.size() && !intersectedValues.empty(); ++i) {
        tempIntersectedSubjects.clear(); // keeps the capacity for reuse
        std::set_intersection(t[i].begin(), t[i].end(),
                              intersectedValues.begin(), intersectedValues.end(),
                              std::back_inserter(tempIntersectedSubjects));
        std::swap(intersectedValues, tempIntersectedSubjects);
    }
    return 0;
}
I think this code has a fair chance to be faster. It may also be reasonable to intersect the sets differently: instead of keeping one set and intersecting with that, you could create a new intersection for each pair of adjacent sets and then intersect those results with their respective adjacent ones:
std::vector<std::vector<unsigned>> intersections(
    std::vector<std::vector<unsigned>> const& t) {
    std::vector<std::vector<unsigned>> r;
    std::vector<std::vector<unsigned>>::size_type i(0);
    for (; i + 1 < t.size(); i += 2) {
        // intersect(a, b) is assumed to return the intersection of two sorted vectors
        r.push_back(intersect(t[i], t[i + 1]));
    }
    if (i < t.size()) {
        r.push_back(t[i]); // odd number of sets: carry the last one over
    }
    return r;
}
std::vector<unsigned> func(std::vector<std::vector<unsigned>> const& t) {
    if (t.empty()) { /* deal with t being empty... */ }
    std::vector<std::vector<unsigned>> r(intersections(t));
    return r.size() == 1? r[0]: func(r);
}
Of course, you wouldn't really implement it like this: you'd use Stepanov's binary counter to keep the intermediate sets. This approach assumes that the result is most likely non-empty. If the expectation is that the result will be empty that may not be an improvement.
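For reference, the pairwise idea can be written without recursion. This is only a sketch under my own assumptions: intersect and intersect_pairwise are names made up for this example, the inputs are assumed to be sorted, and an empty input simply yields an empty result. It does not use a binary counter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: intersection of two sorted vectors.
std::vector<unsigned> intersect(const std::vector<unsigned>& a,
                                const std::vector<unsigned>& b)
{
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<unsigned> out;
    out.reserve(std::min(a.size(), b.size()));
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// Repeatedly intersects adjacent pairs until a single vector remains.
// Returns an empty vector for empty input (a choice made for this sketch).
std::vector<unsigned> intersect_pairwise(std::vector<std::vector<unsigned>> level)
{
    if (level.empty()) return {};
    while (level.size() > 1) {
        std::vector<std::vector<unsigned>> next;
        next.reserve((level.size() + 1) / 2);
        std::size_t i = 0;
        for (; i + 1 < level.size(); i += 2) {
            next.push_back(intersect(level[i], level[i + 1]));
        }
        if (i < level.size()) {
            next.push_back(std::move(level[i]));   // odd one out is carried over
        }
        level = std::move(next);
    }
    return std::move(level.front());
}
```

Whether this beats the single running intersection depends on the data, so it is worth profiling both.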

I can't test this but maybe something like this would be faster?
int func()
{
    vector<vector<unsigned> > t;
    vector<unsigned> intersectedValues;
    // remove if() branching from loop
    if(t.empty())
        return -1;
    intersectedValues = t[0];
    // now start from 1
    for(size_t i = 1; i < t.size(); ++i)
    {
        vector<unsigned> tempIntersectedSubjects;
        tempIntersectedSubjects.reserve(intersectedValues.size()); // pre-allocate
        // insert at end() not begin()
        set_intersection(t[i].begin(),
            t[i].end(), intersectedValues.begin(),
            intersectedValues.end(),
            std::inserter(tempIntersectedSubjects, tempIntersectedSubjects.end()));
        // as these are not used again you can move them rather than copy
        intersectedValues = std::move(tempIntersectedSubjects);
        if(intersectedValues.empty())
            break;
    }
    return 0;
}
Another possibility:
Thinking about it using swap() could optimize the exchange of data and remove the need to re-allocate. Also then the temp constructor can be moved out of the loop.
int func()
{
    vector<vector<unsigned> > t;
    vector<unsigned> intersectedValues;
    // remove if() branching from loop
    if(t.empty())
        return -1;
    intersectedValues = t[0];
    // no need to construct this every loop
    vector<unsigned> tempIntersectedSubjects;
    // now start from 1
    for(size_t i = 1; i < t.size(); ++i)
    {
        // should already be the correct size from previous loop
        // but just in case this should be cheap
        // (profile removing this line)
        tempIntersectedSubjects.reserve(intersectedValues.size());
        // insert at end() not begin()
        set_intersection(t[i].begin(),
            t[i].end(), intersectedValues.begin(),
            intersectedValues.end(),
            std::inserter(tempIntersectedSubjects, tempIntersectedSubjects.end()));
        // swap should leave tempIntersectedSubjects preallocated to the
        // correct size
        intersectedValues.swap(tempIntersectedSubjects);
        tempIntersectedSubjects.clear(); // will not deallocate
        if(intersectedValues.empty())
            break;
    }
    return 0;
}

You can make std::set_intersection as well as a bunch of other standard library algorithms run in parallel by defining _GLIBCXX_PARALLEL during compilation. That probably has the best work-gain ratio. For documentation see this.
Obligatory pitfall warning:
Note that the _GLIBCXX_PARALLEL define may change the sizes and behavior of standard class templates such as std::search, and therefore one can only link code compiled with parallel mode and code compiled without parallel mode if no instantiation of a container is passed between the two translation units. Parallel mode functionality has distinct linkage, and cannot be confused with normal mode symbols.
from here.
Another simple, though probably insignificantly small, optimization would be reserving enough space before filling your vectors.
Also, try to find out whether inserting the values at the back instead of the front and then reversing the vector helps. (Although I even think that your code is wrong right now and your intersectedValues is sorted the wrong way. If I'm not mistaken, you should use std::back_inserter instead of std::inserter(...,begin) and then not reverse.) While shifting stuff through memory is pretty fast, not shifting should be even faster.

Copying the elements from one vector to the other in a for loop with emplace_back() may save time. There is also no need for a flag if you start the loop index at 1, so the condition check in each iteration can be removed.
void func()
{
    vector<vector<unsigned > > t;
    vector<unsigned int > intersectedValues;
    intersectedValues=t[0];
    for(unsigned int i=1;i<(t).size();i++) {
        vector<unsigned > tempIntersectedSubjects;
        set_intersection(t[i].begin(),
            t[i].end(), intersectedValues.begin(),
            intersectedValues.end(),
            std::back_inserter(tempIntersectedSubjects));
        intersectedValues.clear();
        for(auto &ele: tempIntersectedSubjects)
            intersectedValues.emplace_back(ele);
        if( intersectedValues.empty()) break;
    }
}

std::set_intersection can be rather slow for large vectors. It's possible to create a similar function that uses lower_bound. Something like this:
template<typename Iterator1, typename Iterator2, typename Function>
void lower_bound_intersection(Iterator1 begin_1, Iterator1 end_1, Iterator2 begin_2, Iterator2 end_2, Function func)
{
    for (; begin_1 != end_1 && begin_2 != end_2;)
    {
        if (*begin_1 < *begin_2)
            begin_1 = std::lower_bound(begin_1, end_1, *begin_2);
        else if (*begin_2 < *begin_1)
            begin_2 = std::lower_bound(begin_2, end_2, *begin_1);
        else // equivalent
        {
            func(*begin_1);
            ++begin_1; ++begin_2;
        }
    }
}
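A complete version of the lower_bound idea might look like the following. Treat it as a sketch: lower_bound_intersect is a name invented here, and it is specialised to vector<unsigned> for brevity. Each skip costs a binary search over the remaining range, so it probably pays off mainly when one input is much larger than the other or the matches are sparse.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Intersection of two sorted vectors that skips ahead with std::lower_bound
// instead of advancing one element at a time.
std::vector<unsigned> lower_bound_intersect(const std::vector<unsigned>& a,
                                            const std::vector<unsigned>& b)
{
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<unsigned> out;
    auto i = a.begin();
    auto j = b.begin();
    while (i != a.end() && j != b.end()) {
        if (*i < *j) {
            i = std::lower_bound(i, a.end(), *j);   // jump to first element >= *j
        } else if (*j < *i) {
            j = std::lower_bound(j, b.end(), *i);   // jump to first element >= *i
        } else {
            out.push_back(*i);                      // common element
            ++i;
            ++j;
        }
    }
    return out;
}
```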


C++ Simplify loop over map and extending/overwriting of vector

std::vector<int> vec1 of size s_vec and capacity c.
std::vector<int> vec2.
std::map<int, int> m of size s_m >= s_vec.
std::unordered_set<int> flags.
bool flag = false
I want to copy as many values of m (in order) into vec1 as possible (overwriting previous values) without exceeding the capacity c. If any values remain, I want to push them to the end of vec2. For each of these remaining values, I want to check whether it is in flags. If it is, I'd like to set flag to true.
This is how I currently achieve this:
int i = 0;
for (auto const& e : m) {
    if(i < c) {
        if(i == vec1.size()) {
            vec1.push_back(e.second);
        } else {
            vec1.at(i) = e.second;
        }
    } else {
        vec2.push_back(e.second);
        if(flags.count(e.second)) flag = true;
    }
}
I am new to C++ coming from python and R. Therefore, I assume that this can be simplified quite a bit (with iterators?). What can I do to improve the code here?
Your code must increment i at the end of each loop for it to work.
If you can use c++20 and its ranges, I would probably rewrite it completely, to something like:
using namespace std::views; // for simplicity here
std::ranges::copy(m | take(c) | values, vec1.begin());
std::ranges::copy(m | drop(c) | values, std::back_inserter(vec2));
flag = std::ranges::any_of(vec2, [&flags](int i){return flags.contains(i);});
The beauty of this is that it matches your requirements much better.
The first line does: "I want to copy as many values of m (in order) into vec1 (overwriting previous values) without exceeding the capacity c."
The second line does: "If any values remain I want to push those values to the end of vec2."
The third line does: "For each of these, values I want to check if they are in flags. If they are, I'd like to set flag to true."
Building on the comments of @PaulMcKenzie and the answers provided by @Nelfeal and @cptFracassa, this is what I ended up with:
size_t new_size = std::min(vec1.capacity(), m.size());
vec1.resize(new_size);
std::transform(m.begin(), std::next(m.begin(), new_size),
               vec1.begin(),
               [](std::pair<int, int> p) { return p.second; });
std::transform(std::next(m.begin(), new_size),
               m.end(),
               std::back_inserter(vec2),
               [&flags, &flag](std::pair<int, int> p) {
                   if(flags.count(p.second)) {
                       flag = true;
                   }
                   return p.second;
               });
In the first part, instead of doing either push_back or assignment to at, you can just clear the vector and push_back everything. clear does not change the capacity.
Your loop is doing two different things, one after the other (and by the way, I assume you forgot to increment i). You should split it into two loops.
With all that, your code becomes:
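vec1.clear();
auto it = m.begin();
for (int i = 0; i < c; ++i)
    vec1.push_back((it++)->second);
while (it != m.end()) {
    if (flags.count(it->second)) { flag = true; }
    vec2.push_back((it++)->second);
}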
At this point, you can also use standard algorithms (std::copy, std::transform as mentioned in the comments).

Rotate elements in a vector and how to return a vector

c++ newbie here. So for an assignment I have to rotate all the elements in a vector to the left one. So, for instance, the elements {1,2,3} should rotate to {2,3,1}.
I'm researching how to do it, and I saw the rotate() function, but I don't think that will work given my code. And then I saw a for loop that could do it, but I'm not sure how to translate that into a return statement. (i tried to adjust it and failed)
This is what I have so far, but it is very wrong (i haven't gotten a single result that hasn't ended in an error yet)
Edit: The vector size I have to deal with is just three, so it doesn't need to account for any sized vector
#include <vector>
using namespace std;
vector<int> rotate(const vector<int>& v)
{
    vector<int> result;
    int size = 3;
    for (auto i = 0; i < size - 1; ++i)
    {
        v.at(i) = v.at(i + 1);
        result.at(i) = v.at(i);
    }
    return result;
}
All my teacher does is upload textbook pages that explain what certain parts of code are supposed to do, but the textbook pages offer NO help in trying to figure out how to actually apply this stuff.
So could someone please give me a few pointers?
Since you know exactly how many elements you have, and it's the smallest number that makes sense to rotate, you don't need to do anything fancy - just place the items in the order that you need, and return the result:
vector<int> rotate3(const vector<int>& x) {
    return vector<int> { x[1], x[2], x[0] };
}
Note that if your collection always has three elements, you could use std::array instead:
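A sketch of what that could look like; the function name rotate3_array is made up for this example.

```cpp
#include <array>

// Fixed-size variant: the element count is part of the type,
// so there is no size to check at run time.
std::array<int, 3> rotate3_array(const std::array<int, 3>& x)
{
    return std::array<int, 3>{{ x[1], x[2], x[0] }};
}
```

Calling it with {1, 2, 3} gives {2, 3, 1}.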
First, note that you have passed v as a const reference (const vector<int>&), so you are forbidden to modify the state of v in v.at(i) = v.at(i + 1);
Although Sergey has already given a straightforward solution, you could correct your code like this:
#include <vector>
using namespace std;
vector<int> left_rotate(const vector<int>& v)
{
    vector<int> result;
    int size = v.size(); // this way you are able to rotate vectors of any size
    for (auto i = 1; i < size; ++i)
        result.push_back(v.at(i));
    // adding first element of v at end of result
    result.push_back(v.front());
    return result;
}
Use Sergey's answer. This answer deals with why what the asker attempted did not work. They're damn close, so it's worth going through it, explaining the problems, and showing how to fix it.
In
v.at(i) = v.at(i + 1);
v is constant. You can't write to it. The naïve solution (which won't work) is to cut out the middle-man and write directly to the result vector because it is NOT const
result.at(i) = v.at(i + 1);
This doesn't work because
vector<int> result;
defines an empty vector. There is no at(i) to write to, so at throws an exception that terminates the program.
As an aside, the [] operator does not check bounds like at does and will not throw an exception. This can lead you to thinking the program worked when instead it was writing to memory the vector did not own. This would probably crash the program, but it doesn't have to [1].
The quick fix here is to ensure usable storage with
vector<int> result(v.size());
The resulting code
vector<int> rotate(const vector<int>& v)
{
    vector<int> result(v.size()); // change here to size the vector
    int size = 3;
    for (auto i = 0; i < size - 1; ++i)
    {
        result.at(i) = v.at(i + 1); // change here to directly assign to result
    }
    return result;
}
almost works. But when we run it on {1, 2, 3} result holds {2, 3, 0} at the end. We lost the 1. That's because v.at(i + 1) never touches the first element of v. We could increase the number of for loop iterations and use the modulo operator
vector<int> rotate(const vector<int>& v)
{
    vector<int> result(v.size());
    int size = 3;
    for (auto i = 0; i < size; ++i) // change here to iterate size times
    {
        result.at(i) = v.at((i + 1) % size); // change here to make i + 1 wrap
    }
    return result;
}
and now the output is {2, 3, 1}. But it's just as easy, and probably a bit faster, to just do what we were doing and tack on the missing element after the loop.
vector<int> rotate(const vector<int>& v)
{
    vector<int> result(v.size());
    int size = 3;
    for (auto i = 0; i < size - 1; ++i)
    {
        result.at(i) = v.at(i + 1);
    }
    result.at(size - 1) = v.at(0); // change here to store first element
    return result;
}
Taking this a step further, the size of three is an unnecessary limitation for this function that I would get rid of and since we're guaranteeing that we never go out of bounds in our for loop, we don't need the extra testing in at
vector<int> rotate(const vector<int>& v)
{
    if (v.empty()) // nothing to rotate.
    {
        return vector<int>{}; // return empty result
    }
    vector<int> result(v.size());
    for (size_t i = 0; i < v.size() - 1; ++i) // Explicitly using size_t because
                                              // 0 is an int, and v.size() is an
                                              // unsigned integer of implementation-
                                              // defined size but cannot be larger
                                              // than size_t
                                              // note v.size() - 1 is only safe because
                                              // we made sure v is not empty above
                                              // otherwise 0 - 1 in unsigned math
                                              // becomes a very, very large positive
                                              // number
    {
        result[i] = v[i + 1];
    }
    result.back() = v.front(); // using direct calls to front and back because it's
                               // a little easier on my mind than math and []
    return result;
}
We can go further still and use iterators and range-based for loops, but I think this is enough for now. Besides at the end of the day, you throw the function out completely and use std::rotate from the <algorithm> library.
[1] This is called Undefined Behaviour (UB), and one of the most fearsome things about UB is that anything could happen, including giving you the expected result. We put up with UB because it makes for very fast, versatile programs. Validity checks are not made where you don't need them (along with where you did) unless the compiler and library writers decide to make those checks and give guaranteed behaviour like an error message and crash. Microsoft, for example, does exactly this in the vector implementation used when you make a debug build. The release version of Microsoft's vector makes no checks and assumes you wrote the code correctly and would prefer the executable to be as fast as possible.
I saw the rotate() function, but I don't think that will work given my code.
Yes it will work.
When learning there is gain in "reinventing the wheel" (e.g. implementing rotate yourself) and there is also gain in learning how to use the existing pieces (e.g. use standard library algorithm functions).
Here is how you would use std::rotate from the standard library:
std::vector<int> rotate_1(const std::vector<int>& v)
{
    std::vector<int> result = v;
    std::rotate(result.begin(), result.begin() + 1, result.end());
    return result;
}
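If the original has to stay untouched anyway, std::rotate_copy can write the rotated sequence straight into a new vector instead of copying first and rotating afterwards. A minimal sketch; the name rotated_left and the parameter k (number of positions) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns a copy of v rotated left by k positions (k is taken modulo v.size()).
std::vector<int> rotated_left(const std::vector<int>& v, std::size_t k)
{
    std::vector<int> result;
    if (v.empty()) return result;          // nothing to rotate
    result.reserve(v.size());
    const auto middle = v.begin() + static_cast<std::ptrdiff_t>(k % v.size());
    // Copies [middle, end) first, then [begin, middle).
    std::rotate_copy(v.begin(), middle, v.end(), std::back_inserter(result));
    return result;
}
```

With {1, 2, 3} and k = 1 this yields {2, 3, 1}.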

How to merge sorted vectors into a single vector in C++

I have 10,000 vector<pair<unsigned,unsigned>> and I want to merge them into a single vector such that it is lexicographically sorted and does not contain duplicates. In order to do so I wrote the following code. However, to my surprise the below code is taking a lot of time. Can someone please suggest as to how can I reduce the running time of my code?
using obj = pair<unsigned, unsigned>;
vector< vector<obj> > vecOfVec; // 10,000 vector<obj>, each sorted with size()=10M
vector<obj> result;
for(auto it=vecOfVec.begin(), l=vecOfVec.end(); it!=l; ++it)
{
    // append vectors
    result.insert(result.end(), it->begin(), it->end());
    // sort result
    std::sort(result.begin(), result.end());
    // remove duplicates from result
    result.erase(std::unique(result.begin(), result.end()), result.end());
}
I think you should use the fact that the vectors in vecOfVec are sorted.
So: detect the minimum among the front values of the individual vectors, push_back() it into result, and remove from the front of every vector all values equal to that minimum (which avoids duplicates in result).
If you can delete the vecOfVec variable, something like (caution: code not tested: just to give an idea)
while ( vecOfVec.size() )
 {
   // detect the minimal front value
   auto itc = vecOfVec.cbegin();
   auto lc = vecOfVec.cend();
   auto valMin = itc->front();
   while ( ++itc != lc )
      valMin = std::min(valMin, itc->front());
   // push_back() the minimal front value in result
   result.push_back(valMin);
   for ( auto it = vecOfVec.begin() ; it != vecOfVec.end() ; )
    {
      // remove all the front values equal to valMin (this removes the
      // duplicates from result)
      while ( (false == it->empty()) && (valMin == it->front()) )
         it->erase(it->begin());
      // when a vector is empty, it is removed
      it = ( it->empty() ? vecOfVec.erase(it) : ++it );
    }
 }
If you can, I suggest you switch vecOfVec from a vector< vector<obj> > to something that permits efficient removal from the front of the individual containers (stacks?) and efficient removal of whole containers (a list?).
If there are lot of duplicates, you should use set rather than vector for your result, as set is the most natural thing to store something without duplicates:
set< pair<unsigned,unsigned> > resultSet;
for (auto it=vecOfVec.begin(); it!=vecOfVec.end(); ++it)
resultSet.insert(it->begin(), it->end());
If you need to turn it into a vector, you can write
vector< pair<unsigned,unsigned> > resultVec(resultSet.begin(), resultSet.end());
Note that since your code runs over 800 billion elements, it would still take a lot of time, no matter what. At least hours, if not days.
Other ideas are:
recursively merge vectors (10000 -> 5000 -> 2500 -> ... -> 1)
to merge 10000 vectors, store 10000 iterators in a heap structure
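The first idea (10000 -> 5000 -> ... -> 1) could be sketched roughly as follows. merge_unique and merge_all are names made up here, and the sketch keeps whole intermediate vectors in memory, which may not be feasible at the sizes mentioned in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using obj = std::pair<unsigned, unsigned>;

// Merges two sorted vectors into one sorted vector without duplicates.
std::vector<obj> merge_unique(const std::vector<obj>& a, const std::vector<obj>& b)
{
    std::vector<obj> out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
}

// Halves the number of vectors each round: n -> n/2 -> ... -> 1.
std::vector<obj> merge_all(std::vector<std::vector<obj>> level)
{
    if (level.empty()) return {};
    while (level.size() > 1) {
        std::vector<std::vector<obj>> next;
        next.reserve((level.size() + 1) / 2);
        std::size_t i = 0;
        for (; i + 1 < level.size(); i += 2) {
            next.push_back(merge_unique(level[i], level[i + 1]));
        }
        if (i < level.size()) {
            next.push_back(std::move(level[i]));   // odd one out is carried over
        }
        level = std::move(next);
    }
    // A single input vector may still contain internal duplicates.
    std::vector<obj> result = std::move(level.front());
    result.erase(std::unique(result.begin(), result.end()), result.end());
    return result;
}
```

Each element takes part in about log2(n) merges, instead of being re-sorted after every append.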
One problem with your code is the excessive use of std::sort. Unfortunately, the quicksort algorithm (which usually is the workhorse behind std::sort) is not particularly faster when it encounters an already sorted array.
Moreover, you're not exploiting the fact that your initial vectors are already sorted. This can be exploited by using a heap of their next values, when you will not need to call sort again. This may be coded as follows (code tested using obj=int), but perhaps it can be made more concise.
#include <algorithm>
#include <functional>
#include <vector>

// represents the next unused entry in one vector<obj>
template<typename obj>
struct feed
{
  typename std::vector<obj>::const_iterator current, end;
  feed(std::vector<obj> const&v)
  : current(v.begin()), end(v.end()) {}
  friend bool operator> (feed const&l, feed const&r)
  { return *(l.current) > *(r.current); }
};

// - returns the smallest element
// - set corresponding feeder to next and re-establish the heap
template<typename obj>
obj get_next(std::vector<feed<obj>>&heap)
{
  // move the feed with the smallest current value to the back
  std::pop_heap(heap.begin(), heap.end(), std::greater<feed<obj>>{});
  auto&f = heap.back();
  auto x = *(f.current++);
  if(f.current == f.end) {
    heap.pop_back();                // this feed is exhausted
  } else
    std::push_heap(heap.begin(), heap.end(), std::greater<feed<obj>>{});
  return x;
}

template<typename obj>
std::vector<obj> merge(std::vector<std::vector<obj>>const&vecOfvec)
{
  // create min heap of feed<obj> and count total number of objects
  std::vector<feed<obj>> input;
  size_t num_total = 0;
  for(auto const&v:vecOfvec)
    if(v.size()) {
      input.emplace_back(v);
      num_total += v.size();
    }
  std::make_heap(input.begin(), input.end(), std::greater<feed<obj>>{});
  // append values in ascending order, avoiding duplicates
  std::vector<obj> result;
  result.reserve(num_total);
  while(!input.empty()) {
    auto x = get_next(input);
    result.emplace_back(x);
    while(!input.empty() &&
          !(*(input[0].current) > x)) // remove duplicates
      get_next(input);
  }
  return result;
}

How to remove elements from a vector based on a condition in another vector?

I have two equal length vectors from which I want to remove elements based on a condition in one of the vectors. The same removal operation should be applied to both so that the indices match.
I have come up with a solution using vector::erase, but it is extremely slow:
vector<myClass> a = ...;
vector<otherClass> b = ...;
assert(a.size() == b.size());
for(size_t i=0; i<a.size(); i++)
{
    if( !a[i].alive() )
    {
        a.erase(a.begin() + i);
        b.erase(b.begin() + i);
        i--;
    }
}
Is there a way that I can do this more efficiently and preferably using stl algorithms?
If order doesn't matter you could swap the elements to the back of the vector and pop them.
for(size_t i=0; i<a.size();) {
    if( !a[i].alive() ) {
        std::swap(a[i], a.back()); a.pop_back();
        std::swap(b[i], b.back()); b.pop_back();
    } else {
        ++i;
    }
}
If you have to maintain the order, you could use std::remove_if. See this answer for how to get the index of the dereferenced element in the remove predicate. Since the condition lives in a, b has to be filtered first, while a still holds every element:
b.erase(remove_if(begin(b), end(b),
    [&a, &b](const otherClass& d) { return !a[&d - &*begin(b)].alive(); }),
    end(b));
a.erase(remove_if(begin(a), end(a),
    [](const myClass& d) { return !d.alive(); }),
    end(a));
The reason it's slow is probably the O(n^2) complexity. Why not use a list instead? Making a pair of a and b is a good idea too.
A quick win would be to run the loop backwards: i.e. start at the end of the vector. This tends to minimise the number of backward shifts due to element removal.
Another approach would be to consider std::vector<std::unique_ptr<myClass>> etc.: then you'll be essentially moving pointers rather than values.
I propose you create two new vectors, reserve memory, and swap the vectors' contents at the end.
vector<myClass> a = ...;
vector<otherClass> b = ...;
vector<myClass> new_a;
vector<otherClass> new_b;
new_a.reserve(a.size());
new_b.reserve(b.size());
assert(a.size() == b.size());
for(size_t i=0; i<a.size(); i++)
{
    if( a[i].alive() )
    {
        new_a.push_back(a[i]);
        new_b.push_back(b[i]);
    }
}
swap(a, new_a);
swap(b, new_b);
This may use more memory, but it should be fast.
Erasing from the middle of a vector is slow because everything after the deletion point has to be shifted. Consider using another container that makes erasing quicker. It depends on your use case: will you be iterating often? Does the data need to be in order? If you aren't iterating often, consider a list. If you need to maintain order, consider a set. If you are iterating often and need to maintain order, then depending on the number of elements it may be quicker to push back all alive elements to a new vector and set a/b to point to that instead.
Also, since the data is intrinsically linked, it seems to make sense to have just one vector containing data a and b in a pair or small struct.
For performance reasons, use
vector<pair<myClass, otherClass>>
as @Basheba says, together with std::sort.
Use the form of std::sort that takes a comparison predicate. And do not enumerate from 0 to n; use std::lower_bound instead, because the vector will be sorted. Insert elements the way CashCow describes in this question: "how do you insert the value in a sorted vector?"
I had a similar problem where I had two vectors:
std::vector<Eigen::Vector3d> points;
std::vector<Eigen::Vector3d> colors;
for 3D pointclouds in Open3D and after removing the floor, I wanted to delete all points and colors if the points' z coordinate is greater than 0.05. I ended up overwriting the points based on the index and resizing the vector afterward.
bool invert = true;
std::vector<bool> mask = std::vector<bool>(points.size(), invert);
size_t pos = 0;
for (auto & point : points) {
    if (point(2) < CONSTANTS::FLOOR_HEIGHT) { mask.at(pos) = false; }
    ++pos;
}
size_t counter = 0;
for (size_t i = 0; i < points.size(); i++) {
    if (mask[i]) {
        points.at(counter) = points.at(i);
        colors.at(counter) = colors.at(i);
        ++counter;
    }
}
points.resize(counter);
colors.resize(counter);
This maintains order and, at least in my case, worked almost twice as fast as the remove_if method from the accepted answer:
for 921600 points the runtimes were:
33 ms for the accepted answer
17 ms for this approach.
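The overwrite-and-shrink idea can also be written generically for any two parallel vectors, without the separate mask. This is a sketch, not the code that produced the timings above: compact_parallel and keep are names invented for this example, and both element types are assumed to be move-assignable.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Keeps a[i] and b[i] only where keep(a[i]) is true, preserving order.
// Single pass, no extra allocation.
template <typename A, typename B, typename Pred>
void compact_parallel(std::vector<A>& a, std::vector<B>& b, Pred keep)
{
    assert(a.size() == b.size());
    std::size_t write = 0;
    for (std::size_t read = 0; read < a.size(); ++read) {
        if (keep(a[read])) {
            if (write != read) {                 // avoid self-move
                a[write] = std::move(a[read]);
                b[write] = std::move(b[read]);
            }
            ++write;
        }
    }
    // Drop the leftover tail in both vectors.
    a.erase(a.begin() + static_cast<std::ptrdiff_t>(write), a.end());
    b.erase(b.begin() + static_cast<std::ptrdiff_t>(write), b.end());
}
```

For the original question the call would be something like compact_parallel(a, b, [](const myClass& x) { return x.alive(); }).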

Reorder vector using a vector of indices [duplicate]

I'd like to reorder the items in a vector, using another vector to specify the order:
char A[] = { 'a', 'b', 'c' };
size_t ORDER[] = { 1, 0, 2 };
vector<char> vA(A, A + sizeof(A) / sizeof(*A));
vector<size_t> vOrder(ORDER, ORDER + sizeof(ORDER) / sizeof(*ORDER));
reorder_naive(vA, vOrder);
// A is now { 'b', 'a', 'c' }
The following is an inefficient implementation that requires copying the vector:
void reorder_naive(vector<char>& vA, const vector<size_t>& vOrder)
{
    assert(vA.size() == vOrder.size());
    vector<char> vCopy = vA; // Can we avoid this?
    for(int i = 0; i < vOrder.size(); ++i)
        vA[i] = vCopy[ vOrder[i] ];
}
Is there a more efficient way, for example, that uses swap()?
This algorithm is based on chmike's, but the vector of reorder indices is const. This function agrees with his for all 11! permutations of [0..10]. The complexity is O(N^2), taking N as the size of the input, or more precisely, the size of the largest orbit.
See below for an optimized O(N) solution which modifies the input.
template< class T >
void reorder(vector<T> &v, vector<size_t> const &order ) {
    for ( int s = 1, d; s < order.size(); ++ s ) {
        for ( d = order[s]; d < s; d = order[d] ) ;
        if ( d == s ) while ( d = order[d], d != s ) swap( v[s], v[d] );
    }
}
Here's an STL style version which I put a bit more effort into. It's about 47% faster (that is, almost twice as fast over [0..10]!) because it does all the swaps as early as possible and then returns. The reorder vector consists of a number of orbits, and each orbit is reordered upon reaching its first member. It's faster when the last few elements do not contain an orbit.
template< typename order_iterator, typename value_iterator >
void reorder( order_iterator order_begin, order_iterator order_end, value_iterator v ) {
    typedef typename std::iterator_traits< value_iterator >::value_type value_t;
    typedef typename std::iterator_traits< order_iterator >::value_type index_t;
    typedef typename std::iterator_traits< order_iterator >::difference_type diff_t;
    diff_t remaining = order_end - 1 - order_begin;
    for ( index_t s = index_t(), d; remaining > 0; ++ s ) {
        for ( d = order_begin[s]; d > s; d = order_begin[d] ) ;
        if ( d == s ) {
            -- remaining;
            value_t temp = v[s];
            while ( d = order_begin[d], d != s ) {
                swap( temp, v[d] );
                -- remaining;
            }
            v[s] = temp;
        }
    }
}
And finally, just to answer the question once and for all, a variant which does destroy the reorder vector (filling it with -1's). For permutations of [0..10], It's about 16% faster than the preceding version. Because overwriting the input enables dynamic programming, it is O(N), asymptotically faster for some cases with longer sequences.
template< typename order_iterator, typename value_iterator >
void reorder_destructive( order_iterator order_begin, order_iterator order_end, value_iterator v ) {
    typedef typename std::iterator_traits< value_iterator >::value_type value_t;
    typedef typename std::iterator_traits< order_iterator >::value_type index_t;
    typedef typename std::iterator_traits< order_iterator >::difference_type diff_t;
    diff_t remaining = order_end - 1 - order_begin;
    for ( index_t s = index_t(); remaining > 0; ++ s ) {
        index_t d = order_begin[s];
        if ( d == (diff_t) -1 ) continue;
        -- remaining;
        value_t temp = v[s];
        for ( index_t d2; d != s; d = d2 ) {
            swap( temp, v[d] );
            swap( order_begin[d], d2 = (diff_t) -1 );
            -- remaining;
        }
        v[s] = temp;
    }
}
In-place reordering of vector
Warning: there is an ambiguity about what the ordering indices mean. Both interpretations are answered here.
move elements of vector to the position of the indices
Interactive version here.
#include <iostream>
#include <vector>
#include <assert.h>
using namespace std;
void REORDER(vector<double>& vA, vector<size_t>& vOrder)
{
    assert(vA.size() == vOrder.size());
    // for all elements to put in place
    for( int i = 0; i < vA.size() - 1; ++i )
    {
        // while the element i is not yet in place
        while( i != vOrder[i] )
        {
            // swap it with the element at its final place
            int alt = vOrder[i];
            swap( vA[i], vA[alt] );
            swap( vOrder[i], vOrder[alt] );
        }
    }
}
int main()
{
    std::vector<double> vec {7, 5, 9, 6};
    std::vector<size_t> inds {1, 3, 0, 2};
    REORDER(vec, inds);
    for (size_t vv = 0; vv < vec.size(); ++vv)
        std::cout << vec[vv] << std::endl;
    return 0;
}
Note that you can save one test, because if n-1 elements are in place the last (nth) element is certainly in place.
On exit vA and vOrder are properly ordered.
This algorithm performs at most n-1 swaps because each swap moves an element to its final position. And we have to do at most 2n tests on vOrder.
draw the elements of vector from the position of the indices
Try it interactively here.
#include <iostream>
#include <vector>
#include <assert.h>
template<typename T>
void reorder(std::vector<T>& vec, std::vector<size_t> vOrder)
{
    assert(vec.size() == vOrder.size());
    for( size_t vv = 0; vv < vec.size() - 1; ++vv )
    {
        if (vOrder[vv] == vv)
        {
            continue; // already in place
        }
        // find the entry that wants the element currently at vv
        size_t oo;
        for(oo = vv + 1; oo < vOrder.size(); ++oo)
        {
            if (vOrder[oo] == vv)
            {
                break;
            }
        }
        std::swap( vec[vv], vec[vOrder[vv]] );
        std::swap( vOrder[vv], vOrder[oo] );
    }
}
int main()
{
    std::vector<double> vec {7, 5, 9, 6};
    std::vector<size_t> inds {1, 3, 0, 2};
    reorder(vec, inds);
    for (size_t vv = 0; vv < vec.size(); ++vv)
        std::cout << vec[vv] << std::endl;
    return 0;
}
It appears to me that vOrder contains a set of indexes in the desired order (for example the output of sorting by index). The code example here follows the "cycles" in vOrder, where following a sub-set (could be all of vOrder) of indexes will cycle through the sub-set, ending back at the first index of the sub-set.
Wiki article on "cycles"
In the following example, every swap places at least one element in its proper place. This code example effectively reorders vA according to vOrder, while "unordering" or "unpermuting" vOrder back to its original state (0 :: n-1). If vA contained the values 0 through n-1 in order, then after reorder, vA would end up where vOrder started.
template <class T>
void reorder(vector<T>& vA, vector<size_t>& vOrder)
{
    assert(vA.size() == vOrder.size());
    // for all elements to put in place
    for( size_t i = 0; i < vA.size(); ++i )
    {
        // while vOrder[i] is not yet in place
        // every swap places at least one element in its proper place
        while( vOrder[i] != vOrder[vOrder[i]] )
        {
            swap( vA[vOrder[i]], vA[vOrder[vOrder[i]]] );
            swap( vOrder[i], vOrder[vOrder[i]] );
        }
    }
}
This can also be implemented a bit more efficiently using moves instead of swaps. A temp object is needed to hold an element during the moves. Example C code that reorders A[] according to the indexes in I[] and also sorts I[]:
void reorder(int *A, int *I, int n)
{
    int i, j, k;
    int tA;
    /* reorder A according to I */
    /* every move puts an element into place */
    /* time complexity is O(n) */
    for(i = 0; i < n; i++){
        if(i != I[i]){
            tA = A[i];
            j = i;
            while(i != (k = I[j])){
                A[j] = A[k];
                I[j] = j;
                j = k;
            }
            A[j] = tA;
            I[j] = j;
        }
    }
}
If it is ok to modify the ORDER array then an implementation that sorts the ORDER vector and at each sorting operation also swaps the corresponding values vector elements could do the trick, I think.
A survey of existing answers
You ask if there is "a more efficient way". But what do you mean by efficient and what are your requirements?
Potatoswatter's answer works in O(N²) time with O(1) additional space and doesn't mutate the reordering vector.
chmike and rcgldr give answers which use O(N) time with O(1) additional space, but they achieve this by mutating the reordering vector.
Your original answer allocates new space and then copies data into it while Tim MB suggests using move semantics. However, moving still requires a place to move things to and an object like an std::string has both a length variable and a pointer. In other words, a move-based solution requires O(N) allocations for any objects and O(1) allocations for the new vector itself. I explain why this is important below.
Preserving the reordering vector
We might want that reordering vector! Sorting costs O(N log N). But, if you know you'll be sorting several vectors in the same way, such as in a Structure of Arrays (SoA) context, you can sort once and then reuse the results. This can save a lot of time.
You might also want to sort and then unsort data. Having the reordering vector allows you to do this. A use case here is for performing genomic sequencing on GPUs where maximal speed efficiency is obtained by having sequences of similar lengths processed in batches. We cannot rely on the user providing sequences in this order so we sort and then unsort.
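To make the reuse concrete, here is a small sketch of "sort once, apply to several arrays, then unsort". The function names sort_order, gather and scatter are made up for this illustration, and the sketch deliberately copies into new vectors to keep the idea visible.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns indices such that keys[order[0]] <= keys[order[1]] <= ...
std::vector<std::size_t> sort_order(const std::vector<int>& keys)
{
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&keys](std::size_t l, std::size_t r) { return keys[l] < keys[r]; });
    return order;
}

// out[i] = in[order[i]]: puts any parallel array into the sorted order.
template <typename T>
std::vector<T> gather(const std::vector<T>& in, const std::vector<std::size_t>& order)
{
    std::vector<T> out;
    out.reserve(in.size());
    for (std::size_t idx : order) out.push_back(in[idx]);
    return out;
}

// out[order[i]] = in[i]: undoes gather() when given the same order.
template <typename T>
std::vector<T> scatter(const std::vector<T>& in, const std::vector<std::size_t>& order)
{
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < order.size(); ++i) out[order[i]] = in[i];
    return out;
}
```

The O(N log N) sort happens once in sort_order; every further array only costs an O(N) gather, and scatter restores the caller's original order afterwards.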
So, what if we want the best of all worlds: O(N) processing without the costs of additional allocation but also without mutating our ordering vector (which we might, after all, want to reuse)? To find that world, we need to ask:
Why is extra space bad?
There are two reasons you might not want to allocate additional space.
The first is that you don't have much space to work with. This can occur in two situations. One is that you're on an embedded device with limited memory; usually this means you're working with small datasets, so the O(N²) solution is probably fine. The other is that you're working with really large datasets; in this case O(N²) is unacceptable and you have to use one of the O(N) mutating solutions.
The other reason extra space is bad is that allocation is expensive. For smaller datasets it can cost more than the actual computation. Thus, one way to achieve efficiency is to eliminate allocation.
When we mutate the ordering vector we are doing so as a way to indicate whether elements are in their permuted positions. Rather than doing this, we could use a bit-vector to indicate that same information. However, if we allocate the bit vector each time that would be expensive.
Instead, we could clear the bit vector each time by resetting it to zero. However, that incurs an additional O(N) cost per function use.
Rather, we can store a "version" value in a vector and increment this on each function use. This gives us O(1) access, O(1) clear, and an amortized allocation cost. This works similarly to a persistent data structure. The downside is that if we use an ordering function too often the version counter needs to be reset, though the O(N) cost of doing so is amortized.
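The version idea can be sketched in isolation like this. The class name `VersionedVisited` and the choice of a 32-bit stamp are assumptions for illustration, not part of the code further down:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical "visited" set whose clear() is O(1) in the common case:
// a slot counts as visited only if its stamp equals the current version.
class VersionedVisited {
public:
    // Starts a new pass; wipes all stamps only when the counter would wrap.
    void clear(std::size_t n)
    {
        if (stamps_.size() < n) stamps_.resize(n, 0);
        if (version_ == std::numeric_limits<std::uint32_t>::max()) {
            std::fill(stamps_.begin(), stamps_.end(), 0);
            version_ = 0;
        }
        ++version_;
    }
    void mark(std::size_t i) { assert(i < stamps_.size()); stamps_[i] = version_; }
    bool visited(std::size_t i) const { assert(i < stamps_.size()); return stamps_[i] == version_; }

private:
    std::vector<std::uint32_t> stamps_; // stamp 0 never equals a live version
    std::uint32_t version_ = 0;
};
```

Stale stamps from earlier passes simply compare unequal to the new version, which is why no per-call reset is needed.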
This raises the question: what is the optimal data type for the version vector? A bit-vector maximizes cache utilization but requires a full O(N) reset after each use. A 64-bit data type probably never needs to be reset, but has poor cache utilization. Experimenting is the best way to figure this out.
Two types of permutations
We can view an ordering vector as having two senses: forward and backward. In the forward sense, the vector tells us where elements go to. In the backward sense, the vector tells us where elements are coming from. Since the ordering vector is implicitly a linked list, the backward sense requires O(N) additional space, but, again, we can amortize the allocation cost. Applying the two senses sequentially brings us back to our original ordering.
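To make the two senses concrete, here is a small copying sketch. The names `scatter_copy` and `gather_copy` are made up, and the element type is assumed to be default-constructible. It ignores the in-place and allocation concerns entirely and only shows what each sense means:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Forward sense (scatter): the element at i goes to position ordering[i].
template <typename T>
std::vector<T> scatter_copy(const std::vector<T>& values,
                            const std::vector<std::size_t>& ordering)
{
    assert(values.size() == ordering.size());
    std::vector<T> out(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) out[ordering[i]] = values[i];
    return out;
}

// Backward sense (gather): position i receives the element from ordering[i].
template <typename T>
std::vector<T> gather_copy(const std::vector<T>& values,
                           const std::vector<std::size_t>& ordering)
{
    assert(values.size() == ordering.size());
    std::vector<T> out(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) out[i] = values[ordering[i]];
    return out;
}
```

With values {a, b, c} and ordering {1, 2, 0}, the forward sense gives {c, a, b}, the backward sense gives {b, c, a}, and applying one after the other restores {a, b, c}.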
Running single-threaded on my "Intel(R) Xeon(R) E-2176M CPU @ 2.70GHz", the forward_reorder()/backward_reorder() code below takes about 0.81ms per reordering for sequences 32,767 elements long.
Fully commented code for both senses with tests:
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <stack>
#include <stdexcept>
#include <utility>
#include <vector>
///@brief Reorder a vector by moving its elements to indices indicated by another
///       vector. Takes O(N) time and O(N) space. Allocations are amortized.
///@param[in,out] values   Vector to be reordered
///@param[in]     ordering A permutation of the vector
///@param[in,out] visited  A black-box vector to be reused between calls and
///                        shared with `backward_reorder()`
template<class ValueType, class OrderingType, class ProgressType>
void forward_reorder(
  std::vector<ValueType>          &values,
  const std::vector<OrderingType> &ordering,
  std::vector<ProgressType>       &visited
){
  if(ordering.size()!=values.size()){
    throw std::runtime_error("ordering and values must be the same size!");
  }
  //Size the visited vector appropriately. Since vectors don't shrink, this will
  //shortly become large enough to handle most of the inputs. The vector is 1
  //larger than necessary because the first element is special.
  if(visited.empty() || visited.size()-1<values.size())
    visited.resize(values.size()+1);
  //If the visitation indicator becomes too large, we reset everything. This is
  //O(N) expensive, but unlikely to occur in most use cases if an appropriate
  //data type is chosen for the visited vector. For instance, an unsigned 32-bit
  //integer provides ~4B uses before it needs to be reset. We subtract one below
  //to avoid having to think too much about off-by-one errors. Note that
  //choosing the biggest data type possible is not necessarily a good idea!
  //Smaller data types will have better cache utilization.
  if(visited[0]==std::numeric_limits<ProgressType>::max()-1)
    std::fill(visited.begin(), visited.end(), 0);
  //We increment the stored visited indicator and make a note of the result. Any
  //value in the visited vector less than `visited_indicator` has not been
  //visited.
  const auto visited_indicator = ++visited[0];
  //For doing an early exit if we get everything in place
  auto remaining = values.size();
  //For all elements that need to be placed
  for(size_t s=0;s<ordering.size() && remaining>0;s++){
    //Ignore already-visited elements
    if(visited[s+1]==visited_indicator)
      continue;
    //Don't rearrange if we don't have to
    if(s==(size_t)ordering[s]){
      --remaining;
      continue;
    }
    //Follow this cycle, putting elements in their places until we get back
    //around. Use move semantics for speed.
    auto temp = std::move(values[s]);
    auto i = s;
    for(;s!=(size_t)ordering[i];i=ordering[i],--remaining){
      std::swap(temp, values[ordering[i]]);
      visited[i+1] = visited_indicator;
    }
    std::swap(temp, values[s]);
    visited[i+1] = visited_indicator;
    --remaining;
  }
}
///@brief Reorder a vector by moving its elements to indices indicated by another
///       vector. Takes O(2N) time and O(2N) space. Allocations are amortized.
///@param[in,out] values   Vector to be reordered
///@param[in]     ordering A permutation of the vector
///@param[in,out] visited  A black-box vector to be reused between calls and
///                        shared with `forward_reorder()`
template<class ValueType, class OrderingType, class ProgressType>
void backward_reorder(
  std::vector<ValueType>          &values,
  const std::vector<OrderingType> &ordering,
  std::vector<ProgressType>       &visited
){
  //The orderings form a linked list. We need O(N) memory to reverse a linked
  //list. We use `thread_local` so that the function is reentrant.
  thread_local std::stack<OrderingType> stack;
  if(ordering.size()!=values.size()){
    throw std::runtime_error("ordering and values must be the same size!");
  }
  //Size the visited vector appropriately. Since vectors don't shrink, this will
  //shortly become large enough to handle most of the inputs. The vector is 1
  //larger than necessary because the first element is special.
  if(visited.empty() || visited.size()-1<values.size())
    visited.resize(values.size()+1);
  //If the visitation indicator becomes too large, we reset everything. This is
  //O(N) expensive, but unlikely to occur in most use cases if an appropriate
  //data type is chosen for the visited vector. For instance, an unsigned 32-bit
  //integer provides ~4B uses before it needs to be reset. We subtract one below
  //to avoid having to think too much about off-by-one errors. Note that
  //choosing the biggest data type possible is not necessarily a good idea!
  //Smaller data types will have better cache utilization.
  if(visited[0]==std::numeric_limits<ProgressType>::max()-1)
    std::fill(visited.begin(), visited.end(), 0);
  //We increment the stored visited indicator and make a note of the result. Any
  //value in the visited vector less than `visited_indicator` has not been
  //visited.
  const auto visited_indicator = ++visited[0];
  //For doing an early exit if we get everything in place
  auto remaining = values.size();
  //For all elements that need to be placed
  for(size_t s=0;s<ordering.size() && remaining>0;s++){
    //Ignore already-visited elements
    if(visited[s+1]==visited_indicator)
      continue;
    //Don't rearrange if we don't have to
    if(s==(size_t)ordering[s]){
      --remaining;
      continue;
    }
    //The orderings form a linked list. We need to follow that list to its end
    //in order to reverse it.
    stack.emplace(s);
    for(auto i=s;s!=(size_t)ordering[i];i=ordering[i]){
      stack.emplace(ordering[i]);
    }
    //Now we follow the linked list in reverse to its beginning, putting
    //elements in their places. Use move semantics for speed.
    auto temp = std::move(values[s]);
    while(!stack.empty()){
      std::swap(temp, values[stack.top()]);
      visited[stack.top()+1] = visited_indicator;
      stack.pop();
      --remaining;
    }
    visited[s+1] = visited_indicator;
  }
}
int main(){
  std::mt19937 gen;
  std::uniform_int_distribution<short> value_dist(0,std::numeric_limits<short>::max());
  std::uniform_int_distribution<short> len_dist  (0,std::numeric_limits<short>::max());
  std::vector<short> data;
  std::vector<short> ordering;
  std::vector<short> original;
  std::vector<size_t> progress;
  for(int i=0;i<1000;i++){
    const int len = len_dist(gen);
    data.clear();
    ordering.clear();
    for(int j=0;j<len;j++){
      data.push_back(value_dist(gen));
      ordering.push_back(j);
    }
    original = data;
    std::shuffle(ordering.begin(), ordering.end(), gen);
    forward_reorder(data, ordering, progress);
    backward_reorder(data, ordering, progress);
    //Applying both senses in sequence must restore the original order
    assert(original==data);
  }
}
Never prematurely optimize. Measure and then determine where you need to optimize and what. Otherwise you can end up with complex code that is hard to maintain and bug-prone in many places where performance is not an issue.
With that being said, do not pessimize early either. Without changing the approach you can remove half of your copies:
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
   std::vector<T> tmp;         // create an empty vector
   tmp.reserve( data.size() ); // ensure memory and avoid moves in the vector
   for ( std::size_t i = 0; i < order.size(); ++i ) {
      tmp.push_back( data[order[i]] );
   }
   data.swap( tmp );           // swap vector contents
}
This code creates an empty (but big enough) vector into which a single copy is performed in order. At the end, the ordered and original vectors are swapped. This will reduce the copies, but still requires extra memory.
If you want to perform the moves in-place, a simple algorithm could be:
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
   for ( std::size_t i = 0; i < order.size(); ++i ) {
      std::size_t original = order[i];
      // positions before i are already final, so the wanted element has moved on
      while ( original < i ) {
         original = order[original];
      }
      std::swap( data[i], data[original] );
   }
}
This code should be checked and debugged. In plain words the algorithm in each step positions the element at the i-th position. First we determine where the original element for that position is now placed in the data vector. If the original position has already been touched by the algorithm (it is before the i-th position) then the original element was swapped to order[original] position. Then again, that element can already have been moved...
This algorithm is roughly O(N^2) in the number of integer operations and thus is theoretically worse in running time compared to the initial O(N) algorithm. But it can compensate if the N^2 integer operations (worst case) cost less than the N copy operations, or if you are really constrained by memory footprint.
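To see where the quadratic behaviour comes from, here is a small sketch that only counts the chain-following steps. The name `count_chain_steps` is hypothetical, and it assumes the chain is followed while the source index lies before the position being filled, as described above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: total number of chain-following steps the in-place
// reorder performs for a given `order` (assumed to be a valid permutation).
std::size_t count_chain_steps(const std::vector<std::size_t>& order)
{
    std::size_t steps = 0;
    for (std::size_t i = 0; i < order.size(); ++i) {
        assert(order[i] < order.size());
        std::size_t src = order[i];
        while (src < i) {   // slot src was already filled, so its old content moved on
            src = order[src];
            ++steps;
        }
    }
    return steps;
}
```

For order = {3, 0, 1, 2} this counts 0 + 1 + 2 + 3 = 6 steps, i.e. N(N-1)/2 for that family of permutations, whereas {1, 2, 3, 0} needs only 3 and the identity needs none.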
It's an interesting intellectual exercise to do the reorder with O(1) space requirement but in 99.9% of the cases the simpler answer will perform to your needs:
template<typename T>
void permute(vector<T>& values, const vector<size_t>& indices) {
    vector<T> out;
    out.reserve(indices.size());
    for(size_t index: indices) {
        assert(0 <= index && index < values.size());
        out.push_back(values[index]);
    }
    values = std::move(out);
}
Beyond memory requirements, the only way I can think of this being slower would be due to the memory of out being in a different cache page than that of values and indices.
You could do it recursively, I guess - something like this (unchecked, but it gives the idea):
// Recursive function
template<typename T>
void REORDER(int oldPosition, vector<T>& vA,
             const vector<int>& vecNewOrder, vector<bool>& vecVisited)
{
    // Keep a record of the value currently in that position,
    // as well as the position we're moving it to.
    // But don't move it yet, or we'll overwrite whatever's at the next
    // position. Instead, we first move what's at the next position.
    // To guard against loops, we look at vecVisited, and set it to true
    // once we've visited a position.
    T oldVal = vA[oldPosition];
    int newPos = vecNewOrder[oldPosition];
    if (vecVisited[oldPosition])
    {
        // We've hit a loop. Set it and return.
        vA[newPos] = oldVal;
        return;
    }
    // Guard against loops:
    vecVisited[oldPosition] = true;
    // Recursively re-order the next item in the sequence.
    REORDER(newPos, vA, vecNewOrder, vecVisited);
    // And, after we've set this new value,
    vA[newPos] = oldVal;
}
// The "main" function
template<typename T>
void REORDER(vector<T>& vA, const vector<int>& newOrder)
{
    // Initialise vecVisited with false values
    vector<bool> vecVisited(vA.size(), false);
    for (int x = 0; x < (int)vA.size(); x++)
    {
        // Skip positions already placed as part of an earlier cycle
        if (!vecVisited[x])
            REORDER(x, vA, newOrder, vecVisited);
    }
}
Of course, you do have the overhead of vecVisited. Thoughts on this approach, anyone?
Iterating through the vector is an O(n) operation. It's sort of hard to beat that.
Your code is broken. You cannot assign to vA and you need to use template parameters.
vector<char> REORDER(const vector<char>& vA, const vector<size_t>& vOrder)
{
    assert(vA.size() == vOrder.size());
    vector<char> vCopy(vA.size());
    for(size_t i = 0; i < vOrder.size(); ++i)
        vCopy[i] = vA[ vOrder[i] ];
    return vCopy; // return the reordered copy, not the untouched input
}
The above is slightly more efficient.
It is not clear from the title and the question whether the vector should be ordered with the same steps it takes to order vOrder, or whether vOrder already contains the indexes of the desired order.
The first interpretation already has a satisfying answer (see chmike and Potatoswatter); I'll add some thoughts about the latter.
If the creation and/or copy cost of object T is relevant:
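template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> & order )
{
   std::size_t i,j,k;
   for(i = 0; i + 1 < order.size(); ++i) {
      j = order[i];
      if(j != i) {
         // find the later slot that wants the element currently at i
         for(k = i + 1; order[k] != i; ++k);
         std::swap(order[i], order[k]);
         std::swap(data[i], data[j]);
      }
   }
}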
If the creation cost of your object is small and memory is not a concern (see dribeas):
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
   std::vector<T> tmp;         // create an empty vector
   tmp.reserve( data.size() ); // ensure memory and avoid moves in the vector
   for ( std::size_t i = 0; i < order.size(); ++i ) {
      tmp.push_back( data[order[i]] );
   }
   data.swap( tmp );           // swap vector contents
}
Note that the two pieces of code in dribeas' answer do different things.
I was trying to use @Potatoswatter's solution to sort multiple vectors by a third one and got really confused by the output from using the above functions on a vector of indices returned by Armadillo's sort_index. To switch from a vector returned by sort_index (the arma_inds vector below) to one that can be used with @Potatoswatter's solution (new_inds below), you can do the following:
vector<int> new_inds(arma_inds.size());
for (int i = 0; i < new_inds.size(); i++) new_inds[arma_inds[i]] = i;
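The loop above is just inverting a permutation: an index vector that says where each element comes from is turned into one that says where each element goes. A standalone sketch, assuming the indices have already been copied into a std::vector<std::size_t> (the function name `invert_permutation` is made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns inv with inv[perm[i]] == i.
// Asserts that perm really is a permutation of 0..n-1.
std::vector<std::size_t> invert_permutation(const std::vector<std::size_t>& perm)
{
    const std::size_t n = perm.size();
    std::vector<std::size_t> inv(n, n);   // n marks "not assigned yet"
    for (std::size_t i = 0; i < n; ++i) {
        assert(perm[i] < n && inv[perm[i]] == n); // in range and no duplicates
        inv[perm[i]] = i;
    }
    return inv;
}
```

Inverting twice gives back the original index vector, which is a handy sanity check when you are unsure which convention a library uses.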
I came up with this solution, which has a space complexity of O(max_val - min_val + 1) but can be integrated with std::sort and so benefits from std::sort's decent O(n log n) time complexity.
std::vector<int32_t> dense_vec = {1, 2, 3};
std::vector<int32_t> order = {1, 0, 2};
int32_t max_val = *std::max_element(dense_vec.begin(), dense_vec.end());
std::vector<int32_t> sparse_vec(max_val + 1);
int32_t i = 0;
for(int32_t j: dense_vec)
    sparse_vec[j] = order[i++]; // advance i so each value gets its own target position
std::sort(dense_vec.begin(), dense_vec.end(),
[&sparse_vec](int32_t i1, int32_t i2) {return sparse_vec[i1] < sparse_vec[i2];});
The following assumptions were made while writing this code:
Vector values start from zero.
Vector does not contain repeated values.
We have enough memory to sacrifice in order to use std::sort
This should avoid copying the vector:
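void REORDER(vector<char>& vA, const vector<size_t>& vOrder) {
    assert(vA.size() == vOrder.size());
    for(size_t i = 0; i < vOrder.size(); ++i) {
        // Slots before i are final: follow the chain to where the wanted element now sits.
        size_t j = vOrder[i];
        while (j < i) j = vOrder[j];
        swap(vA[i], vA[j]);
    }
}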