Fastest way to determine if a uint64 has been "seen" already - c++

I've been interested in optimizing "renumbering" algorithms that can relabel an arbitrary array of integers with duplicates into labels starting from 1. Sets and maps are too slow for what I've been trying to do, as are sorts. Is there a data structure that only remembers if a number has been seen or not reliably? I was considering experimenting with a bloom filter, but I have >12M integers and the target performance is faster than a good hashmap. Is this possible?
Here's a simple example pseudo-c++ algorithm that would be slow:
// note: all elements guaranteed > 0
std::vector<uint64_t> arr = { 21942198, 91292, 21942198, ... millions more };
std::unordered_map<uint64_t, uint64_t> renumber;
renumber.reserve(arr.size());
uint64_t next_label = 1;
for (uint64_t i = 0; i < arr.size(); i++) {
uint64_t elem = arr[i];
if (renumber[elem]) {
arr[i] = renumber[elem];
}
else {
renumber[elem] = next_label;
arr[i] = next_label;
++next_label;
}
}
Example input/output:
{ 12, 38, 1201, 120, 12, 39, 320, 1, 1 }
->
{ 1, 2, 3, 4, 1, 5, 6, 7, 7 }

Your algorithm is not bad, but the appropriate data structure to use for the map is a hash table with open addressing.
As explained in this answer, std::unordered_map can't be implemented that way: https://stackoverflow.com/a/31113618/5483526
So if the STL container is really too slow for you, then you can do better by making your own.
Note, however, that:
90% of the time, when someone complains about STL containers being too slow, they are running a debug build with optimizations turned off. Make sure you are running a release build compiled with optimizations on. Running your code on 12M integers should take a few milliseconds at most.
You are accessing the map multiple times when only once is required, like this:
uint64_t next_label = 1;
for (size_t i = 0; i < arr.size(); i++) {
uint64_t elem = arr[i];
uint64_t &label = renumber[elem];
if (!label) {
label = next_label++;
}
arr[i] = label;
}
Note that the unordered_map operator [] returns a reference to the associated value (creating it if it doesn't exist), so you can test and modify the value without having to search the map again.

Updated with bug fix
First, anytime you experience "slowness" with a std:: collection class like vector or map, just recompile with optimizations (release build). There is usually a 10x speedup.
Now to your problem. I'll show a two-pass solution that runs in O(N) time. I'll leave it as an exercise for you to convert to a one-pass solution. But I'll assert that this should be fast enough, even for vectors with millions of items.
First, declare not one, but two unordered maps:
std::unordered_map<uint64_t, uint64_t> element_to_label;
std::unordered_map<uint64_t, std::pair<uint64_t, std::vector<uint64_t>>> label_to_elements;
The first map, element_to_label maps an integer value found in the original array to it's unique label.
The second map, label_to_elements maps to both the element value and the list of indices that element occurs in the original array.
Now to build these maps:
element_to_label.reserve(arr.size());
label_to_elements.reserve(arr.size());
uint64_t next_label = 1;
for (size_t index = 0; index < arr.size(); index++)
{
const uint64_t elem = arr[index];
auto itor = element_to_label.find(elem);
if (itor == element_to_label.end())
{
// new element
element_to_label[elem] = next_label;
auto &p = label_to_elements[next_label];
p.first = elem;
p.second.push_back(index);
next_label++;
}
else
{
// existing element
uint64_t label = itor->second;
label_to_elements[label].second.push_back(index);
}
}
When the above code runs, it's built up a database all values in the array, their labels, and indices where they occur.
So now to renumber the array such that all elements are replaced with their smaller label value:
for (auto itor = label_to_elements.begin(); itor != label_to_elements.end(); itor++)
{
uint64_t label = itor->first;
auto& p = itor->second;
uint64_t elem = p.first; // technically, this isn't needed. It's just useful to know which element value we are replacing from the original array
const auto& vec = p.second;
for (size_t j = 0; j < vec.size(); j++)
{
size_t index = vec[j];
arr[index] = label;
}
}
Notice where I assign variables by reference with the & operator to avoid making an expensive copy of any value in the maps.
So if your original vector or array was this:
{ 100001, 2000002, 300003, 400004, 400004, 300003, 2000002, 100001 };
Then the application of labels would render the array as this:
{1,2,3,4,4,3,2,1}
And what's nice you still have a quick O(1) look operator to map any label in that set back to its original element value using label_to_elements

Related

Manipulating array's values in a certain way

So I was asked to write a function that changes array's values in a way that:
All of the values that are the smallest aren't changed
if, let's assume, the smallest number is 2 and there is no 3's and 4's then all 5's are changed for 3's etc.
for example, for an array = [2, 5, 7, 5] we would get [2, 3, 4, 3], which generalizes to getting a minimal value of an array which remains unchanged, and every other minimum (not including the first one) is changed depending on which minimum it is. On our example - 5 is the first minimum (besides 2), so it is 2 (first minimum) + 1 = 3, 7 is 2nd smallest after 2, so it is 2+2(as it is 2nd smallest).
I've come up with something like this:
int fillGaps(int arr[], size_t sz){
int min = *min_element(arr, arr+sz);
int w = 1;
for (int i = 0; i<sz; i++){
if (arr[i] == min) {continue;}
else{
int mini = *min_element(arr+i, arr+sz);
for (int j = 0; j<sz; j++){
if (arr[j] == mini){arr[j] = min+w;}
}
w++;}
}
return arr[sz-1];
}
However it works fine only for the 0th and 1st value, it doesnt affect any further items. Could anyone please help me with that?
I don't quite follow the logic of your function, so can't quite comment on that.
Here's how I interpret what needs to be done. Note that my example implementation is written to be as understandable as possible. There might be ways to make it faster.
Note that I'm also using an std::vector, to make things more readable and C++-like. You really shouldn't be passing raw pointers and sizes, that's super error prone. At the very least bundle them in a struct.
#include <algorithm>
#include <set>
#include <unordered_map>
#include <vector>
int fillGaps (std::vector<int> & data) {
// Make sure we don't have to worry about edge cases in the code below.
if (data.empty()) { return 0; }
/* The minimum number of times we need to loop over the data is two.
* First to check which values are in there, which lets us decide
* what each original value should be replaced with. Second to do the
* actual replacing.
*
* So let's trade some memory for speed and start by creating a lookup table.
* Each entry will map an existing value to its new value. Let's use the
* "define lambda and immediately invoke it" to make the scope of variables
* used to calculate all this as small as possible.
*/
auto const valueMapping = [&data] {
// Use an std::set so we get all unique values in sorted order.
std::set<int> values;
for (int e : data) { values.insert(e); }
std::unordered_map<int, int> result;
result.reserve(values.size());
// Map minimum value to itself, and increase replacement value by one for
// each subsequent value present in the data vector.
int replacement = *values.begin();
for (auto e : values) { result.emplace(e, replacement++); }
return result;
}();
// Now the actual algorithm is trivial: loop over the data and replace each
// element with its replacement value.
for (auto & e : data) { e = valueMapping.at(e); }
return data.back();
}

Why is my std::unordered_map access time not constant

I wrote some code to test my unordered map performance with a 2 component vector as a key.
std::unordered_map<Vector2i, int> m;
for(int i = 0; i < 1000; ++i)
for(int j = 0; j < 1000; ++j)
m[Vector2i(i,j)] = i*j+27*j;
clock.restart();
auto found = m.find(Vector2i(0,5));
std::cout << clock.getElapsedTime().asMicroseconds() << std::endl;
output for the code above: 56 (microseconds)
When I replace 1000 in the for loops by 100 the outputs is 2 (microseconds)
Isn't the time supposed to be constant ?
hash function for my Vector2i:
namespace std
{
template<>
struct hash<Vector2i>
{
std::size_t operator()(const Vector2i& k) const
{
using std::size_t;
using std::hash;
using std::string;
return (hash<int>()(k.x)) ^ (hash<int>()(k.y) << 1);
}
};
}
EDIT:
I added this code to count the collisions after the for loop:
for (size_t bucket = 0; bucket != m.bucket_count(); ++bucket)
if (m.bucket_size(bucket) > 1)
++collisions;
With 100*100 elements: collisions = 256
1000*1000 elements: collisions = 2048
A hash table guarantees constant amortized time. If the hash table is well balanced (i.e., the hash function is good), then most elements will be evenly distributed. However, if the hash function is not so good, you may have lots of collisions, in which case to access an element you'd need to traverse usually a linked list (where you store the elements that collided). So make sure first the load factor and hash function are OK in your case. Lastly, make sure you compiler your code in release mode, with optimizations turned on (e.g. -O3 for g++/clang++).
This question may be useful also: How to create a good hash_combine with 64 bit output (inspired by boost::hash_combine).

Vector performance suffering

I've been working on state space exploration and was originally using a map to store the assignment of the world states like map<Variable *, int>, where variables are objects in the world with a domain from 0 to n where n is finite. The implementation was extremely quick for performance, but I noticed that it does not scale well with the size of the state space. I changed the states to use vector<int> instead, where I use the id of a variable to find its index in the vector. Memory usage improved greatly, but the efficiency of the solver has tanked (gone from <30 seconds to 400+). The only code that I modified was generating the states and validating if the state is the goal. I can't figure out why using a vector has degraded performance, especially since the vector operations should only take linear time at worst.
Originally this is was how I generated nodes:
State * SuccessorGen::generate_successor(const Operator &op, map<Variable *, int> &var_assignment){
map<Variable *, int> values;
values.insert(var_assignment.begin(), var_assignment.end());
vector<Operator::Effect> effect = op.get_effect();
vector<Operator::Effect>::const_iterator eff_it = effect.begin();
for (; eff_it != effect.end(); eff_it++){
values[eff_it->var] = eff_it->after;
}
return new State(values);
}
And in my new implementation:
State* SuccessorGen::generate_successor(const Operator &op, const vector<int> &assignment){
vector<int> child;
child = assignment;
vector<Operator::Effect> effect = op.get_effect();
vector<Operator::Effect>::const_iterator eff_it = effect.begin();
for (; eff_it != effect.end(); eff_it++){
Variable *v = eff_it->var;
int id = v->get_id();
child[id] = eff_it->after;
}
return new State(child);
}
(The goal checking is similar, just looping over the goal assignment instead of operator effects.)
Are these vector operations really that much slower than using a map? Is there an equally efficient STL container I can use that has a lower overhead? The number of variables is relatively small (<50) and the vector never needs to be resized or modified after the for loop.
Edit:
I tried timing one loop through all the operators to see timing comparisons, with the effect list and assignment the vector version runs one loop in 0.3 seconds, while the map version is a little over 0.4 seconds. When I comment that section out the map was about the same, yet the vector jumped up to closer to 0.5 seconds. I added child.reserve(assignment.size()) but that did not make any change.
Edit 2:
From user63710's answer, I've also been digging through the rest of the code and noticed something really strange going on in the heuristic calculation. The vector version works fine, but for the map I use this line Node *n = new Node(i, transition.value, label_cost); open_list.push(n);, but once the loop finishes filling the queue the node gets totally screwed up. Nodes are a simple struct as:
struct Node{
// Source Value, Destination Value
int from;
int to;
int distance;
Node(int &f, int &t, int &d) : from(f), to(t), distance(d){}
};
Instead of having from, to, distance, it replaces from and to with id with some random number, and that search does not do what it should and is returning much faster then it should. When I tweak the map version to convert the map to a vector and run this:
Node n(i, transition.value, label_cost); open_list.push(n);
the performance is about equal to that of the vector. So that fixes my main issue, but this leaves me wondering why using Node *n gets this behaviour opposed to Node n()?
If as you say, the sizes of these structures are fairly small (~50 elements), I have to think that the issue is somewhere else. At least, I don't think it involves the memory accesses or allocation of the vector/map.
Some example code I made to test: Map version:
unique_ptr<map<int, int>> make_successor_map(const vector<int> &ids,
const map<int, int> &input)
{
auto new_map = make_unique<map<int, int>>(input.begin(), input.end());
for (size_t i = 0; i < ids.size(); ++i)
swap((*new_map)[ids[i]], (*new_map)[i]);
return new_map;
}
int main()
{
auto a_map = make_unique<map<int, int>>();
// ids to access
vector<int> ids;
const int n = 100;
for (int i = 0; i < n; ++i)
{
a_map->insert({i, rand()});
ids.push_back(i);
}
random_shuffle(ids.begin(), ids.end());
for (int i = 0; i < 1e6; ++i)
{
auto temp_map = make_successor_map(ids, *a_map);
swap(temp_map, a_map);
}
cout << a_map->begin()->second << endl;
}
Vector version:
unique_ptr<vector<int>> make_successor_vec(const vector<int> &ids,
const vector<int> &input)
{
auto new_vec = make_unique<vector<int>>(input);
for (size_t i = 0; i < ids.size(); ++i)
swap((*new_vec)[ids[i]], (*new_vec)[i]);
return new_vec;
}
int main()
{
auto a_vec = make_unique<vector<int>>();
// ids to access
vector<int> ids;
const int n = 100;
for (int i = 0; i < n; ++i)
{
a_vec->push_back(rand());
ids.push_back(i);
}
random_shuffle(ids.begin(), ids.end());
for (int i = 0; i < 1e6; ++i)
{
auto temp_vec = make_successor_vec(ids, *a_vec);
swap(temp_vec, a_vec);
}
cout << *a_vec->begin() << endl;
}
The map version takes around 15 seconds to run on my old Core 2 Duo T9600, and the vector version takes 0.406 seconds. Both we're compiled on G++ 4.9.2 with g++ -O3 --std=c++1y. So if your code takes 0.4s per iteration (note that it took my example code 0.4s for 1 million calls), then I'm really thinking your problem is somewhere else.
That's not to say you aren't having performance decreases due to switching from map->vector, but that the code you posted doesn't show much reason for that to happen.
The problem is that you create vectors without reserving space. Vectors store elements contiguously. That ensures constant access to elements.
So everytime you add an item to the vector (for example via your inserter), the vector has to reallocate more space and eventuelly move all the existing elements to a reallocated memory location. This causes slowdown and considerable heap fragmentation.
The solution to this is to reserve() elements if you know in advance how many elements you'll have. Or if you don't reserve() larger chunks and compare size() and capacity() to check if it's time to reserve more.

Remove elements from vector on given indexes, order does not matter

What I have is a vector of elements, I do not care about the order of them.
Than I have N indexes (each addressing unique position in the vector) of elements to be removed from the vector. I want the removal be as fast as possible.
Best I could come up with was to store indexes in set (order indexes):
std::set<unsigned int> idxs;
for (int i=0; i<N; ++i)
idxs.insert(some_index);
And than iterating over the set in reversed order and replacing index to remove by last element of the vector.
std::set<unsigned int>::reverse_iterator rit;
for (rit = idxs.rbegin(); rit != idxs.rend(); ++rit) {
vec[*rit].swap(vec[vec.size() - 1]);
vec.resize(vec.size() - 1);
}
However I was thinking whether there is some more efficient way of doing this since the usage of set seems a bit overkill to me and I would love to avoid sorting at all.
EDIT1:
Let us assume I use vector and sort it afterwards.
std::vector<unsigned int> idxs;
for (int i=0; i<N; ++i)
idxs.push_back(some_index);
std::sort(idxs.begin(), idxs.end());
Can I push it any further?
EDIT2:
I should have mentioned that the vector will have up to 10 elements. However the removal in my program occurs very often (hundreds of thousands times).
set is a good choice. I'd guess using another allocator (e.g. arena)would have the biggest impact. Why not use a set instead of a vector of elements to begin with?
I see the following relevant variations:
Instead of remove, create a new vector and copy preserved elements, then swap back.
This keeps your indices stable (unlike removal, which would require sorting or updating the indices).
Instead of a vector of indices, use a vector of bools of the same length as your data.
With the length of "maximum 10" given, a bit mask seems sufficient
So, roughly:
struct Index
{
DWORD removeMask = 0; // or use bit vector for larger N
void TagForRemove(int idx) { removeMask |= (1<<idx); }
boll DoRemove(int idx) const { return (removeMask & (1<<idx)) != 0; }
}
// create new vector, or remove, as you like
void ApplyRemoveIndex(vector<T> & v, Index remove)
{
vector<T> copy;
copy.reserve(v.size());
for (i=0..v.size())
if (!remove.DoRemove(i))
copy.push_back(v[i]);
copy.swap(v);
}
You can use swap/pop_back to remove an item at a given index, and keep track of which indices you've moved with a hash table. It's linear space & time in the number of removals.
std::vector<T> vec = ...;
std::vector<unsigned int> idxs;
std::unordered_map<unsigned int, unsigned int> map;
for(auto index : idxs) {
unsigned int trueIndex = index;
while (trueIndex >= vec.size()) {
trueIndex = map[trueIndex];
}
// element at index 'vec.size()-1' is being moved to index 'index'
map[vec.size()-1] = index;
swap(vec[trueIndex], vec[vec.size()-1]);
vec.pop_back();
}

Reorder vector using a vector of indices [duplicate]

This question already has answers here:
How do I sort a std::vector by the values of a different std::vector? [duplicate]
(13 answers)
Closed 12 months ago.
I'd like to reorder the items in a vector, using another vector to specify the order:
char A[] = { 'a', 'b', 'c' };
size_t ORDER[] = { 1, 0, 2 };
vector<char> vA(A, A + sizeof(A) / sizeof(*A));
vector<size_t> vOrder(ORDER, ORDER + sizeof(ORDER) / sizeof(*ORDER));
reorder_naive(vA, vOrder);
// A is now { 'b', 'a', 'c' }
The following is an inefficient implementation that requires copying the vector:
void reorder_naive(vector<char>& vA, const vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
vector vCopy = vA; // Can we avoid this?
for(int i = 0; i < vOrder.size(); ++i)
vA[i] = vCopy[ vOrder[i] ];
}
Is there a more efficient way, for example, that uses swap()?
This algorithm is based on chmike's, but the vector of reorder indices is const. This function agrees with his for all 11! permutations of [0..10]. The complexity is O(N^2), taking N as the size of the input, or more precisely, the size of the largest orbit.
See below for an optimized O(N) solution which modifies the input.
template< class T >
void reorder(vector<T> &v, vector<size_t> const &order ) {
for ( int s = 1, d; s < order.size(); ++ s ) {
for ( d = order[s]; d < s; d = order[d] ) ;
if ( d == s ) while ( d = order[d], d != s ) swap( v[s], v[d] );
}
}
Here's an STL style version which I put a bit more effort into. It's about 47% faster (that is, almost twice as fast over [0..10]!) because it does all the swaps as early as possible and then returns. The reorder vector consists of a number of orbits, and each orbit is reordered upon reaching its first member. It's faster when the last few elements do not contain an orbit.
template< typename order_iterator, typename value_iterator >
void reorder( order_iterator order_begin, order_iterator order_end, value_iterator v ) {
typedef typename std::iterator_traits< value_iterator >::value_type value_t;
typedef typename std::iterator_traits< order_iterator >::value_type index_t;
typedef typename std::iterator_traits< order_iterator >::difference_type diff_t;
diff_t remaining = order_end - 1 - order_begin;
for ( index_t s = index_t(), d; remaining > 0; ++ s ) {
for ( d = order_begin[s]; d > s; d = order_begin[d] ) ;
if ( d == s ) {
-- remaining;
value_t temp = v[s];
while ( d = order_begin[d], d != s ) {
swap( temp, v[d] );
-- remaining;
}
v[s] = temp;
}
}
}
And finally, just to answer the question once and for all, a variant which does destroy the reorder vector (filling it with -1's). For permutations of [0..10], It's about 16% faster than the preceding version. Because overwriting the input enables dynamic programming, it is O(N), asymptotically faster for some cases with longer sequences.
template< typename order_iterator, typename value_iterator >
void reorder_destructive( order_iterator order_begin, order_iterator order_end, value_iterator v ) {
typedef typename std::iterator_traits< value_iterator >::value_type value_t;
typedef typename std::iterator_traits< order_iterator >::value_type index_t;
typedef typename std::iterator_traits< order_iterator >::difference_type diff_t;
diff_t remaining = order_end - 1 - order_begin;
for ( index_t s = index_t(); remaining > 0; ++ s ) {
index_t d = order_begin[s];
if ( d == (diff_t) -1 ) continue;
-- remaining;
value_t temp = v[s];
for ( index_t d2; d != s; d = d2 ) {
swap( temp, v[d] );
swap( order_begin[d], d2 = (diff_t) -1 );
-- remaining;
}
v[s] = temp;
}
}
In-place reordering of vector
Warning: there is an ambiguity about the semantic what the ordering-indices mean. Both are answered here
move elements of vector to the position of the indices
Interactive version here.
#include <iostream>
#include <vector>
#include <assert.h>
using namespace std;
void REORDER(vector<double>& vA, vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
// for all elements to put in place
for( int i = 0; i < vA.size() - 1; ++i )
{
// while the element i is not yet in place
while( i != vOrder[i] )
{
// swap it with the element at its final place
int alt = vOrder[i];
swap( vA[i], vA[alt] );
swap( vOrder[i], vOrder[alt] );
}
}
}
int main()
{
std::vector<double> vec {7, 5, 9, 6};
std::vector<size_t> inds {1, 3, 0, 2};
REORDER(vec, inds);
for (size_t vv = 0; vv < vec.size(); ++vv)
{
std::cout << vec[vv] << std::endl;
}
return 0;
}
output
9
7
6
5
note that you can save one test because if n-1 elements are in place the last nth element is certainly in place.
On exit vA and vOrder are properly ordered.
This algorithm performs at most n-1 swapping because each swap moves the element to its final position. And we'll have to do at most 2N tests on vOrder.
draw the elements of vector from the position of the indices
Try it interactively here.
#include <iostream>
#include <vector>
#include <assert.h>
template<typename T>
void reorder(std::vector<T>& vec, std::vector<size_t> vOrder)
{
assert(vec.size() == vOrder.size());
for( size_t vv = 0; vv < vec.size() - 1; ++vv )
{
if (vOrder[vv] == vv)
{
continue;
}
size_t oo;
for(oo = vv + 1; oo < vOrder.size(); ++oo)
{
if (vOrder[oo] == vv)
{
break;
}
}
std::swap( vec[vv], vec[vOrder[vv]] );
std::swap( vOrder[vv], vOrder[oo] );
}
}
int main()
{
std::vector<double> vec {7, 5, 9, 6};
std::vector<size_t> inds {1, 3, 0, 2};
reorder(vec, inds);
for (size_t vv = 0; vv < vec.size(); ++vv)
{
std::cout << vec[vv] << std::endl;
}
return 0;
}
Output
5
6
7
9
It appears to me that vOrder contains a set of indexes in the desired order (for example the output of sorting by index). The code example here follows the "cycles" in vOrder, where following a sub-set (could be all of vOrder) of indexes will cycle through the sub-set, ending back at the first index of the sub-set.
Wiki article on "cycles"
https://en.wikipedia.org/wiki/Cyclic_permutation
In the following example, every swap places at least one element in it's proper place. This code example effectively reorders vA according to vOrder, while "unordering" or "unpermuting" vOrder back to its original state (0 :: n-1). If vA contained the values 0 through n-1 in order, then after reorder, vA would end up where vOrder started.
template <class T>
void reorder(vector<T>& vA, vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
// for all elements to put in place
for( size_t i = 0; i < vA.size(); ++i )
{
// while vOrder[i] is not yet in place
// every swap places at least one element in it's proper place
while( vOrder[i] != vOrder[vOrder[i]] )
{
swap( vA[vOrder[i]], vA[vOrder[vOrder[i]]] );
swap( vOrder[i], vOrder[vOrder[i]] );
}
}
}
This can also be implemented a bit more efficiently using moves instead swaps. A temp object is needed to hold an element during the moves. Example C code, reorders A[] according to indexes in I[], also sorts I[] :
void reorder(int *A, int *I, int n)
{
int i, j, k;
int tA;
/* reorder A according to I */
/* every move puts an element into place */
/* time complexity is O(n) */
for(i = 0; i < n; i++){
if(i != I[i]){
tA = A[i];
j = i;
while(i != (k = I[j])){
A[j] = A[k];
I[j] = j;
j = k;
}
A[j] = tA;
I[j] = j;
}
}
}
If it is ok to modify the ORDER array then an implementation that sorts the ORDER vector and at each sorting operation also swaps the corresponding values vector elements could do the trick, I think.
A survey of existing answers
You ask if there is "a more efficient way". But what do you mean by efficient and what are your requirements?
Potatoswatter's answer works in O(N²) time with O(1) additional space and doesn't mutate the reordering vector.
chmike and rcgldr give answers which use O(N) time with O(1) additional space, but they achieve this by mutating the reordering vector.
Your original answer allocates new space and then copies data into it while Tim MB suggests using move semantics. However, moving still requires a place to move things to and an object like an std::string has both a length variable and a pointer. In other words, a move-based solution requires O(N) allocations for any objects and O(1) allocations for the new vector itself. I explain why this is important below.
Preserving the reordering vector
We might want that reordering vector! Sorting costs O(N log N). But, if you know you'll be sorting several vectors in the same way, such as in a Structure of Arrays (SoA) context, you can sort once and then reuse the results. This can save a lot of time.
You might also want to sort and then unsort data. Having the reordering vector allows you to do this. A use case here is for performing genomic sequencing on GPUs where maximal speed efficiency is obtained by having sequences of similar lengths processed in batches. We cannot rely on the user providing sequences in this order so we sort and then unsort.
So, what if we want the best of all worlds: O(N) processing without the costs of additional allocation but also without mutating our ordering vector (which we might, after all, want to reuse)? To find that world, we need to ask:
Why is extra space bad?
There are two reasons you might not want to allocate additional space.
The first is that you don't have much space to work with. This can occur in two situations: you're on an embedded device with limited memory. Usually this means you're working with small datasets, so the O(N²) solution is probably fine here. But it can also happen when you are working with really large datasets. In this case O(N²) is unacceptable and you have to use one of the O(N) mutating solutions.
The other reason extra space is bad is because allocation is expensive. For smaller datasets it can cost more than the actual computation. Thus, one way to achieve efficiency is to eliminate allocation.
Outline
When we mutate the ordering vector we are doing so as a way to indicate whether elements are in their permuted positions. Rather than doing this, we could use a bit-vector to indicate that same information. However, if we allocate the bit vector each time that would be expensive.
Instead, we could clear the bit vector each time by resetting it to zero. However, that incurs an additional O(N) cost per function use.
Rather, we can store a "version" value in a vector and increment this on each function use. This gives us O(1) access, O(1) clear, and an amoritzed allocation cost. This works similarly to a persistent data structure. The downside is that if we use an ordering function too often the version counter needs to be reset, though the O(N) cost of doing so is amortized.
This raises the question: what is the optimal data type for the version vector? A bit-vector maximizes cache utilization but requires a full O(N) reset after each use. A 64-bit data type probably never needs to be reset, but has poor cache utilization. Experimenting is the best way to figure this out.
Two types of permutations
We can view an ordering vector as having two senses: forward and backward. In the forward sense, the vector tell us where elements go to. In the backward sense, the vector tells us where elements are coming from. Since the ordering vector is implicitly a linked list, the backward sense requires O(N) additional space, but, again, we can amortize the allocation cost. Applying the two senses sequentially brings us back to our original ordering.
Performance
Running single-threaded on my "Intel(R) Xeon(R) E-2176M CPU # 2.70GHz", the following code takes about 0.81ms per reordering for sequences 32,767 elements long.
Code
Fully commented code for both senses with tests:
#include <algorithm>
#include <cassert>
#include <random>
#include <stack>
#include <stdexcept>
#include <vector>
///#brief Reorder a vector by moving its elements to indices indicted by another
/// vector. Takes O(N) time and O(N) space. Allocations are amoritzed.
///
///#param[in,out] values Vector to be reordered
///#param[in] ordering A permutation of the vector
///#param[in,out] visited A black-box vector to be reused between calls and
/// shared with with `backward_reorder()`
template<class ValueType, class OrderingType, class ProgressType>
void forward_reorder(
std::vector<ValueType> &values,
const std::vector<OrderingType> &ordering,
std::vector<ProgressType> &visited
){
if(ordering.size()!=values.size()){
throw std::runtime_error("ordering and values must be the same size!");
}
//Size the visited vector appropriately. Since vectors don't shrink, this will
//shortly become large enough to handle most of the inputs. The vector is 1
//larger than necessary because the first element is special.
if(visited.empty() || visited.size()-1<values.size());
visited.resize(values.size()+1);
//If the visitation indicator becomes too large, we reset everything. This is
//O(N) expensive, but unlikely to occur in most use cases if an appropriate
//data type is chosen for the visited vector. For instance, an unsigned 32-bit
//integer provides ~4B uses before it needs to be reset. We subtract one below
//to avoid having to think too much about off-by-one errors. Note that
//choosing the biggest data type possible is not necessarily a good idea!
//Smaller data types will have better cache utilization.
if(visited.at(0)==std::numeric_limits<ProgressType>::max()-1)
std::fill(visited.begin(), visited.end(), 0);
//We increment the stored visited indicator and make a note of the result. Any
//value in the visited vector less than `visited_indicator` has not been
//visited.
const auto visited_indicator = ++visited.at(0);
//For doing an early exit if we get everything in place
auto remaining = values.size();
//For all elements that need to be placed
for(size_t s=0;s<ordering.size() && remaining>0;s++){
assert(visited[s+1]<=visited_indicator);
//Ignore already-visited elements
if(visited[s+1]==visited_indicator)
continue;
//Don't rearrange if we don't have to
if(s==visited[s])
continue;
//Follow this cycle, putting elements in their places until we get back
//around. Use move semantics for speed.
auto temp = std::move(values[s]);
auto i = s;
for(;s!=(size_t)ordering[i];i=ordering[i],--remaining){
std::swap(temp, values[ordering[i]]);
visited[i+1] = visited_indicator;
}
std::swap(temp, values[s]);
visited[i+1] = visited_indicator;
}
}
///#brief Reorder a vector by moving its elements to indices indicted by another
/// vector. Takes O(2N) time and O(2N) space. Allocations are amoritzed.
///
///#param[in,out] values Vector to be reordered
///#param[in] ordering A permutation of the vector
///#param[in,out] visited A black-box vector to be reused between calls and
/// shared with with `forward_reorder()`
template<class ValueType, class OrderingType, class ProgressType>
void backward_reorder(
std::vector<ValueType> &values,
const std::vector<OrderingType> &ordering,
std::vector<ProgressType> &visited
){
//The orderings form a linked list. We need O(N) memory to reverse a linked
//list. We use `thread_local` so that the function is reentrant.
thread_local std::stack<OrderingType> stack;
if(ordering.size()!=values.size()){
throw std::runtime_error("ordering and values must be the same size!");
}
//Size the visited vector appropriately. Since vectors don't shrink, this will
//shortly become large enough to handle most of the inputs. The vector is 1
//larger than necessary because the first element is special.
if(visited.empty() || visited.size()-1<values.size());
visited.resize(values.size()+1);
//If the visitation indicator becomes too large, we reset everything. This is
//O(N) expensive, but unlikely to occur in most use cases if an appropriate
//data type is chosen for the visited vector. For instance, an unsigned 32-bit
//integer provides ~4B uses before it needs to be reset. We subtract one below
//to avoid having to think too much about off-by-one errors. Note that
//choosing the biggest data type possible is not necessarily a good idea!
//Smaller data types will have better cache utilization.
if(visited.at(0)==std::numeric_limits<ProgressType>::max()-1)
std::fill(visited.begin(), visited.end(), 0);
//We increment the stored visited indicator and make a note of the result. Any
//value in the visited vector less than `visited_indicator` has not been
//visited.
const auto visited_indicator = ++visited.at(0);
//For doing an early exit if we get everything in place
auto remaining = values.size();
//For all elements that need to be placed
for(size_t s=0;s<ordering.size() && remaining>0;s++){
assert(visited[s+1]<=visited_indicator);
//Ignore already-visited elements
if(visited[s+1]==visited_indicator)
continue;
//Don't rearrange if we don't have to
if(s==visited[s])
continue;
//The orderings form a linked list. We need to follow that list to its end
//in order to reverse it.
stack.emplace(s);
for(auto i=s;s!=(size_t)ordering[i];i=ordering[i]){
stack.emplace(ordering[i]);
}
//Now we follow the linked list in reverse to its beginning, putting
//elements in their places. Use move semantics for speed.
auto temp = std::move(values[s]);
while(!stack.empty()){
std::swap(temp, values[stack.top()]);
visited[stack.top()+1] = visited_indicator;
stack.pop();
--remaining;
}
visited[s+1] = visited_indicator;
}
}
int main(){
std::mt19937 gen;
std::uniform_int_distribution<short> value_dist(0,std::numeric_limits<short>::max());
std::uniform_int_distribution<short> len_dist (0,std::numeric_limits<short>::max());
std::vector<short> data;
std::vector<short> ordering;
std::vector<short> original;
std::vector<size_t> progress;
for(int i=0;i<1000;i++){
const int len = len_dist(gen);
data.clear();
ordering.clear();
for(int i=0;i<len;i++){
data.push_back(value_dist(gen));
ordering.push_back(i);
}
original = data;
std::shuffle(ordering.begin(), ordering.end(), gen);
forward_reorder(data, ordering, progress);
assert(original!=data);
backward_reorder(data, ordering, progress);
assert(original==data);
}
}
Never prematurely optimize. Meassure and then determine where you need to optimize and what. You can end with complex code that is hard to maintain and bug-prone in many places where performance is not an issue.
With that being said, do not early pessimize. Without changing the code you can remove half of your copies:
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
std::vector<T> tmp; // create an empty vector
tmp.reserve( data.size() ); // ensure memory and avoid moves in the vector
for ( std::size_t i = 0; i < order.size(); ++i ) {
tmp.push_back( data[order[i]] );
}
data.swap( tmp ); // swap vector contents
}
This code creates and empty (big enough) vector in which a single copy is performed in-order. At the end, the ordered and original vectors are swapped. This will reduce the copies, but still requires extra memory.
If you want to perform the moves in-place, a simple algorithm could be:
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
for ( std::size_t i = 0; i < order.size(); ++i ) {
std::size_t original = order[i];
while ( i < original ) {
original = order[original];
}
std::swap( data[i], data[original] );
}
}
This code should be checked and debugged. In plain words the algorithm in each step positions the element at the i-th position. First we determine where the original element for that position is now placed in the data vector. If the original position has already been touched by the algorithm (it is before the i-th position) then the original element was swapped to order[original] position. Then again, that element can already have been moved...
This algorithm is roughly O(N^2) in the number of integer operations and thus is theoretically worse in performance time as compare to the initial O(N) algorithm. But it can compensate if the N^2 swap operations (worst case) cost less than the N copy operations or if you are really constrained by memory footprint.
It's an interesting intellectual exercise to do the reorder with O(1) space requirement but in 99.9% of the cases the simpler answer will perform to your needs:
void permute(vector<T>& values, const vector<size_t>& indices)
{
vector<T> out;
out.reserve(indices.size());
for(size_t index: indices)
{
assert(0 <= index && index < values.size());
out.push_back(std::move(values[index]));
}
values = std::move(out);
}
Beyond memory requirements, the only way I can think of this being slower would be due to the memory of out being in a different cache page than that of values and indices.
You could do it recursively, I guess - something like this (unchecked, but it gives the idea):
// Recursive function
template<typename T>
void REORDER(int oldPosition, vector<T>& vA,
const vector<int>& vecNewOrder, vector<bool>& vecVisited)
{
// Keep a record of the value currently in that position,
// as well as the position we're moving it to.
// But don't move it yet, or we'll overwrite whatever's at the next
// position. Instead, we first move what's at the next position.
// To guard against loops, we look at vecVisited, and set it to true
// once we've visited a position.
T oldVal = vA[oldPosition];
int newPos = vecNewOrder[oldPosition];
if (vecVisited[oldPosition])
{
// We've hit a loop. Set it and return.
vA[newPosition] = oldVal;
return;
}
// Guard against loops:
vecVisited[oldPosition] = true;
// Recursively re-order the next item in the sequence.
REORDER(newPos, vA, vecNewOrder, vecVisited);
// And, after we've set this new value,
vA[newPosition] = oldVal;
}
// The "main" function
template<typename T>
void REORDER(vector<T>& vA, const vector<int>& newOrder)
{
// Initialise vecVisited with false values
vector<bool> vecVisited(vA.size(), false);
for (int x = 0; x < vA.size(); x++)
{
REORDER(x, vA, newOrder, vecVisited);
}
}
Of course, you do have the overhead of vecVisited. Thoughts on this approach, anyone?
To iterate through the vector is O(n) operation. Its sorta hard to beat that.
Your code is broken. You cannot assign to vA and you need to use template parameters.
vector<char> REORDER(const vector<char>& vA, const vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
vector<char> vCopy(vA.size());
for(int i = 0; i < vOrder.size(); ++i)
vCopy[i] = vA[ vOrder[i] ];
return vA;
}
The above is slightly more efficient.
It is not clear by the title and the question if the vector should be ordered with the same steps it takes to order vOrder or if vOrder already contains the indexes of the desired order.
The first interpretation has already a satisfying answer (see chmike and Potatoswatter), I add some thoughts about the latter.
If the creation and/or copy cost of object T is relevant
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> & order )
{
std::size_t i,j,k;
for(i = 0; i < order.size() - 1; ++i) {
j = order[i];
if(j != i) {
for(k = i + 1; order[k] != i; ++k);
std::swap(order[i],order[k]);
std::swap(data[i],data[j]);
}
}
}
If the creation cost of your object is small and memory is not a concern (see dribeas):
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
std::vector<T> tmp; // create an empty vector
tmp.reserve( data.size() ); // ensure memory and avoid moves in the vector
for ( std::size_t i = 0; i < order.size(); ++i ) {
tmp.push_back( data[order[i]] );
}
data.swap( tmp ); // swap vector contents
}
Note that the two pieces of code in dribeas answer do different things.
I was trying to use #Potatoswatter's solution to sort multiple vectors by a third one and got really confused by output from using the above functions on a vector of indices output from Armadillo's sort_index. To switch from a vector output from sort_index (the arma_inds vector below) to one that can be used with #Potatoswatter's solution (new_inds below), you can do the following:
vector<int> new_inds(arma_inds.size());
for (int i = 0; i < new_inds.size(); i++) new_inds[arma_inds[i]] = i;
I came up with this solution which has the space complexity of O(max_val - min_val + 1), but it can be integrated with std::sort and benefits from std::sort's O(n log n) decent time complexity.
std::vector<int32_t> dense_vec = {1, 2, 3};
std::vector<int32_t> order = {1, 0, 2};
int32_t max_val = *std::max_element(dense_vec.begin(), dense_vec.end());
std::vector<int32_t> sparse_vec(max_val + 1);
int32_t i = 0;
for(int32_t j: dense_vec)
{
sparse_vec[j] = order[i];
i++;
}
std::sort(dense_vec.begin(), dense_vec.end(),
[&sparse_vec](int32_t i1, int32_t i2) {return sparse_vec[i1] < sparse_vec[i2];});
The following assumptions made while writing this code:
Vector values start from zero.
Vector does not contain repeated values.
We have enough memory to sacrifice in order to use std::sort
This should avoid copying the vector:
void REORDER(vector<char>& vA, const vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
for(int i = 0; i < vOrder.size(); ++i)
if (i < vOrder[i])
swap(vA[i], vA[vOrder[i]]);
}