Heapify in logarithmic time using the C++ standard library

I have a heap using std::make_heap:
std::vector<int> v{1,2,3,5,9,20,3};
std::make_heap(v.begin(), v.end());
Now I update the heap by changing one arbitrary element:
v[3] = 35;
Is there a way in the standard library to restore the heap in O(log n) time, where n is the size of the container? Basically I am looking for a heapify function. I know which element has been changed.
I understand that std::make_heap takes O(n) time. I have also gone through the duplicate question, but it is different in the sense that it changes the max element; an O(log n) solution for that case is already given there.
I am trying to change an arbitrary element within the heap.

You can just do it yourself:
void modify_heap_element(std::vector<int> &heap, size_t index, int value)
{
//while value is too large for its position, bubble up
while(index > 0 && heap[(index-1)>>1] < value)
{
size_t parent = (index-1)>>1;
heap[index]=heap[parent];
index = parent;
}
//while value is too small for its position, sift down
for (;;)
{
size_t left=index*2+1;
size_t right=left+1;
if (left >= heap.size())
break;
size_t bigchild = (right >= heap.size() || heap[right] < heap[left] ?
left : right );
if (!(value < heap[bigchild]))
break;
heap[index]=heap[bigchild];
index = bigchild;
}
heap[index] = value;
}

If we look closer at your statement:
now I disturb heap by changing one random element of heap.
For heapifying in O(log n) you can only directly "disturb" the back or the front of the vector (which corresponds somehow to inserting or deleting an element). In these cases, (re)heapification can be then achieved by means of the std::push_heap and std::pop_heap algorithms, which take logarithmic running time.
That is, the back:
v.back() = 35;
std::push_heap(v.begin(), v.end()); // heapify in O(log n)
or the front:
v.front() = 35;
// places the front at the back
std::pop_heap(v.begin(), v.end()); // O(log n)
// v.back() is now 35, but it does not belong to the heap anymore
// make the back belong to the heap again
std::push_heap(v.begin(), v.end()); // O(log n)
Otherwise you need to reheapify the whole vector with std::make_heap, which takes linear running time.
Summary
It's not possible to modify an arbitrary element of the heap and achieve the heapification in logarithmic running time with the standard library (i.e., the function templates std::push_heap and std::pop_heap). However, you can always implement the heap's swim and sink operations by yourself in order to heapify in logarithmic running time.

I have been facing this problem of wanting an "updateable heap" as well. However, in the end, instead of coding a custom updateable heap or anything like that, I solved it a bit differently.
To maintain access to the best element without needing to explicitly go through the heap, you can use versioned wrappers of the elements that you want to order. Each unique, true element has a version counter, which is increased every time the element gets changed. Each wrapper inside the heap then carries a version of the element, being the version at the time the wrapper was created:
struct HeapElemWrapper
{
HeapElem * e;
size_t version;
double priority;
HeapElemWrapper(HeapElem * elem)
: e(elem), version(elem->currentVersion), priority(0.0)
{}
bool upToDate() const
{
return version == e->currentVersion;
}
// operator for ordering with heap / priority queue:
// smaller error -> higher priority
bool operator<(const HeapElemWrapper & other) const
{
return this->priority > other.priority;
}
};
When popping the topmost element from the heap, you can then simply check this wrapper element to see if it's up to date with the original. If not, simply discard it and pop the next one. This method is quite efficient, and I have seen it in other applications as well. The only thing you need to take care of is to do a pass over the heap from time to time (say, every 1000 insertions or so) to clean out outdated elements.
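The pop-and-skip loop described above could look roughly like this. It is only a minimal sketch, not the code from my application: Item, Entry and LazyQueue are hypothetical names, the heap is a plain std::priority_queue, and a smaller priority value is assumed to mean "better".

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Hypothetical element type: the version is bumped on every change.
struct Item {
    double priority = 0.0;
    std::size_t currentVersion = 0;
};

// Snapshot of an Item taken at the time it was pushed.
struct Entry {
    Item* item;
    std::size_t version;
    double priority;
    bool upToDate() const { return version == item->currentVersion; }
    // Smaller priority value sorts to the top of std::priority_queue.
    bool operator<(const Entry& other) const { return priority > other.priority; }
};

class LazyQueue {
    std::priority_queue<Entry> heap_;
public:
    void push(Item* item) {
        assert(item != nullptr);
        heap_.push(Entry{item, item->currentVersion, item->priority});
    }
    // Change an item's priority: bump the version and push a fresh snapshot.
    // The old snapshot stays in the heap and is skipped later.
    void update(Item* item, double newPriority) {
        assert(item != nullptr);
        item->priority = newPriority;
        ++item->currentVersion;
        push(item);
    }
    // Pop entries until an up-to-date one is found; nullptr if none is left.
    Item* popBest() {
        while (!heap_.empty()) {
            Entry top = heap_.top();
            heap_.pop();
            if (top.upToDate()) return top.item;
        }
        return nullptr;
    }
};
```

Each update costs one O(log n) push, and stale snapshots are paid for later when they surface at the top, which is why the heap grows until they are popped or cleaned out.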

It's not possible to modify an arbitrary element of the heap in logarithmic running time without violating the heap property by just using the function templates std::pop_heap() and std::push_heap() that the standard library provides.
However, you can define your own STL-like function template, set_heap_element(), for that purpose:
template<typename RandomIt, typename T, typename Cmp>
void set_heap_element(RandomIt first, RandomIt last, RandomIt pos, T value, Cmp cmp)
{
const auto n = last - first;
*pos = std::move(value); // replace previous value
auto i = pos - first;
using std::swap;
// percolate up
while (i > 0) { // non-root node
auto parent_it = first + (i-1)/2;
if (cmp(*pos, *parent_it))
break; // parent node satisfies the heap-property
swap(*pos, *parent_it); // swap with parent
pos = parent_it;
i = pos - first;
}
// percolate down
while (2*i + 1 < n) { // non-leaf node, since it has a left child
const auto lidx = 2*i + 1, ridx = 2*i + 2;
auto lchild_it = first + lidx;
auto rchild_it = ridx < n? first + ridx: last;
auto it = pos;
if (cmp(*it, *lchild_it))
it = lchild_it;
if (rchild_it != last && cmp(*it, *rchild_it))
it = rchild_it;
if (pos == it)
break; // node satisfies the heap-property
swap(*pos, *it); // swap with child
pos = it;
i = pos - first;
}
}
Then, you can provide the following simplified overload of set_heap_element() for a max heap:
#include <functional> // std::less
template<typename RandomIt, typename T>
void set_heap_element(RandomIt first, RandomIt last, RandomIt pos, T value) {
return set_heap_element(first, last, pos, value, std::less<T>{});
}
This overload uses a std::less<T> object as the comparison function object for the original function template.
Example
In your max-heap example, set_heap_element() could be used as follows:
std::vector<int> v{1,2,3,5,9,20,3};
std::make_heap(v.begin(), v.end());
// set 4th element to 35 in O(log n)
set_heap_element(v.begin(), v.end(), v.begin() + 3, 35);
You could use std::is_heap(), which takes linear time, whenever you want to check whether the max-heap property is still satisfied by v after setting an element with the set_heap_element() function template above:
assert(std::is_heap(v.begin(), v.end()));
What about min heaps?
You can achieve the same for a min heap by passing a std::greater<int> object as the last argument of the function calls to std::make_heap(), set_heap_element() and std::is_heap():
std::vector<int> v{1,2,3,5,9,20,3};
// create a min heap
std::make_heap(v.begin(), v.end(), std::greater<int>{});
// set 4th element to 35 in O(log n)
set_heap_element(v.begin(), v.end(), v.begin() + 3, 35, std::greater<int>{});
// is the min-heap property satisfied?
assert(std::is_heap(v.begin(), v.end(), std::greater<int>{}));

Related

Is there an efficient, time saving method of maintaining a heap in which elements are removed in the middle?

I'm working on a path planning program where I have a priority queue 'U':
using HeapKey = pair<float, float>;
vector<pair<HeapKey, unsigned int>> U;
I order and maintain my priority queue as a binary min-heap (i.e., the cheapest node is first in the queue), using greater as my comparison function (maybe not important). While the program is executing and planning a path, it adds nodes to 'U' with push_back() followed by push_heap() to get each node into the correct order, and everything works fine there...
However, the algorithm I'm using sometimes calls for updating a node already present in 'U' with new values. It does this by removing the node from 'U' (I find it with find_if() and remove it with erase(), if that's important) and then calling the function to re-insert it (again push_back() followed by push_heap()) so the node has its updated values.
This has proved to be a bit of an unexpected problem for me. I'm no expert at this, but as far as I can tell, since the node is removed from somewhere INSIDE 'U', it messes up the order of the heap. I've been able to get the program to work by calling make_heap() after the node is removed. However, this solution brought another issue: the program now takes a lot more time to complete, and longer the larger my map and the more nodes in the heap, presumably because make_heap() re-organizes the entire heap every time I update a node, thus slowing down the overall planning.
Sadly I don't have time to change my program and get new results, I can only make use of simple, easy solutions I can implement fast. I'm mostly here to learn and perhaps see if there are some suggestions I can pass on about how to solve this issue of efficiently maintaining a heap/priority queue when you aren't just removing the first or last elements but elements maybe in the middle. Reducing the time taken to plan is the only thing I am missing.
Attempt at minimal reproducible example without going into the actual algorithm and such:
#include <algorithm>
#include <cstdlib>    // rand
#include <functional> // std::greater
#include <iostream>
#include <utility>    // std::pair
#include <vector>
using namespace std;
using Cost = float;
using HeapKey = pair<Cost, Cost>;
pair<Cost, Cost> PAIR1;
vector<pair<HeapKey, unsigned int>> U;
using KeyCompare = std::greater<std::pair<HeapKey, unsigned int>>;
int in_U[20];
ostream& operator<<(ostream& os, pair<Cost, Cost> const& p) {
return os << "<" << p.first << ", " << p.second << ">";
}
bool has_neightbor(unsigned int id) {
if ( (in_U[id+1]) && (in_U[id-1])) {
return true;
}
return false;
}
void insert(unsigned int id, HeapKey k) {
U.push_back({ k, id });
push_heap(U.begin(), U.end(), KeyCompare());
in_U[id]++;
}
void update(unsigned int id) {
Cost x;
Cost y;
if (id != 21) { //lets say 21 is the goal
x = U[id].first.first;
y = U[id].first.second;
}
if (in_U[id]) {
auto it = find_if(U.begin(), U.end(), [=](auto p) { return p.second == id; });
U.erase(it);
make_heap(U.begin(), U.end(), KeyCompare());
in_U[id]--;
}
int r1 = rand() % 10 + 1;
int r2 = rand() % 10 + 1;
if (x != y) {
insert(id, {x + r1, y + r2});
}
}
int main() {
U.push_back({ {8, 2}, 1 });
in_U[1]++;
U.push_back({ {5, 1}, 2 });
in_U[2]++;
U.push_back({ {6, 1}, 3 });
in_U[3]++;
U.push_back({ {6, 5}, 4 });
in_U[4]++;
U.push_back({ {2, 3}, 5 });
in_U[5]++;
U.push_back({ {2, 9}, 6 });
in_U[6]++;
U.push_back({ {9, 2}, 7 });
in_U[7]++;
U.push_back({ {4, 7}, 8 });
in_U[8]++;
U.push_back({ {11, 4}, 9 });
in_U[9]++;
U.push_back({ {2, 2}, 10 });
in_U[10]++;
U.push_back({ {1, 2}, 11 });
in_U[11]++;
U.push_back({ {7, 2}, 12 });
in_U[12]++;
make_heap(U.begin(), U.end(), KeyCompare());
PAIR1.first = 14;
PAIR1.second = 6;
while (U.front().first < PAIR1) {
cout << "Is_heap?: " << is_heap(U.begin(), U.end(), KeyCompare()) << endl;
cout << "U: ";
for (auto p : U) {
cout << p.second << p.first << " - ";
}
cout << endl;
auto uid = U.front().second;
pop_heap(U.begin(), U.end(), KeyCompare());
U.pop_back();
if (has_neightbor(uid)) {
update(uid - 1);
update(uid + 1);
}
}
//getchar();
}

Yes, the algorithm is relatively simple. Note that when considering an item at index i, its "parent" in a heap is at index (i-1)/2, and its children are at indices i*2+1 and i*2+2.
Swap item_to_pop with the last item in the range. This moves that item to the desired (last) position, but inserts a "small" item in the middle of the heap. This needs to be fixed.
If the "small" item at item_to_pop position is larger than its current parent, then swap with its parent. Repeat until that item is either no longer larger than its current parent or is the new root. Then we're done. Notably, this is the same algorithm as push_heap, except with the shortcut that we start in the middle instead of at the end.
If the "small" item at item_to_pop position is smaller than either current child, then swap with the larger child. Repeat until that item is no smaller than any of its current children (note that near the end it might only have one or no children). Then we're done. Notably, this is the same algorithm as pop_heap, except with the shortcut that we start in the middle instead of at the top.
This algorithm will do at most log2(n)+1 swaps, and log2(n)*2+1 comparisons, making it almost as fast as pop_heap and push_heap. Which isn't really surprising since it's the same algorithm.
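#include <algorithm>  // std::is_heap
#include <cassert>
#include <cstddef>
#include <functional> // std::less
#include <utility>    // std::swap
template< class RandomIt, class Compare >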
constexpr void pop_mid_heap(RandomIt first, RandomIt last, RandomIt item_to_pop, Compare comp) {
assert(std::is_heap(first, last, comp)); //this is compiled out of release builds
assert(first<=item_to_pop);
assert(item_to_pop<last);
using std::swap;
std::size_t new_size = last - first - 1;
if (new_size == 0) return;
//swap the end of the range and item_to_pop, so that item_to_pop is at the end
swap(*item_to_pop, *--last);
if (new_size == 1) return;
//If item_to_pop is bigger than its parent, then swap with the parent
bool moved_up = false;
RandomIt swap_itr;
while (true) {
std::size_t offset = item_to_pop - first;
if (offset == 0) break; //item_to_pop is at root: exit loop
swap_itr = first + (offset-1) / 2;
if (comp(*item_to_pop, *swap_itr))
break; //item_to_pop smaller than its parent: exit loop
swap(*item_to_pop, *swap_itr); //swap with parent and repeat
item_to_pop = swap_itr;
moved_up = true;
}
if (moved_up) return; //if we moved the item up, then heap is complete: exit
//If biggest child is bigger than item_to_pop, then swap with that child
while (true) {
std::size_t offset = item_to_pop - first;
std::size_t swap_idx = offset * 2 + 1;
if (swap_idx >= new_size) break; //no children: exit loop
swap_itr = first + swap_idx;
if (swap_idx+1 < new_size && comp(*swap_itr, *(swap_itr+1))) //if right child exists and is bigger, swap that instead
++swap_itr;
if (!comp(*item_to_pop, *swap_itr)) break; //item_to_pop is not smaller than its biggest child: exit loop
swap(*item_to_pop, *swap_itr); //swap with bigger child and repeat
item_to_pop = swap_itr;
}
}
template< class RandomIt >
constexpr void pop_mid_heap(RandomIt first, RandomIt last, RandomIt item_to_pop) {
pop_mid_heap(first, last, item_to_pop, std::less<>{});
}
https://ideone.com/zNW7h7
Theoretically one can optimize out the "or is the new root" check in the push_heap part, but the checks to detect that case add complexity that doesn't seem worth it.
IMO, this is useful and should be part of the C++ standard library.

In general it's expensive to update a node in the middle of a binary heap not because the update operation is expensive but because finding the node is an O(n) operation. If you know where the node is in the heap, updating its priority is very easy. My answer at https://stackoverflow.com/a/8706363/56778 shows how to delete a node. Updating a node's priority is similar: rather than replacing the node with the last one in the heap, you just sift the node up or down as required.
If you want the ability to find a node quickly, then you have to build an indexed heap. Basically, you have a dictionary entry for each node. The dictionary key is the node's ID (or whatever you use to identify it), and the value is the node's index in the binary heap. You modify the heap code so that it updates the dictionary entry whenever the node is moved around in the heap. It makes the heap a little bit slower (by a constant factor), but makes finding an arbitrary node an O(1) operation.
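As a rough illustration of the indexed heap idea, here is a minimal sketch. It is not taken from any library: IndexedMinHeap and its member names are made up for this example, node IDs are assumed to be unsigned ints and keys doubles, and the dictionary is a std::unordered_map from ID to heap index that is updated on every move.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical indexed binary min-heap: (key, id) entries plus an id -> index map.
class IndexedMinHeap {
    std::vector<std::pair<double, unsigned>> heap_;  // (key, id)
    std::unordered_map<unsigned, std::size_t> pos_;  // id -> index in heap_

    // Store an entry at index i and record where it now lives.
    void place(std::size_t i, std::pair<double, unsigned> e) {
        pos_[e.second] = i;
        heap_[i] = e;
    }
    void siftUp(std::size_t i) {
        auto e = heap_[i];
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (!(e.first < heap_[parent].first)) break;
            place(i, heap_[parent]);  // move parent down
            i = parent;
        }
        place(i, e);
    }
    void siftDown(std::size_t i) {
        auto e = heap_[i];
        for (;;) {
            std::size_t child = 2 * i + 1;
            if (child >= heap_.size()) break;
            if (child + 1 < heap_.size() &&
                heap_[child + 1].first < heap_[child].first)
                ++child;              // pick the smaller child
            if (!(heap_[child].first < e.first)) break;
            place(i, heap_[child]);   // move child up
            i = child;
        }
        place(i, e);
    }
public:
    bool empty() const { return heap_.empty(); }
    bool contains(unsigned id) const { return pos_.count(id) != 0; }
    void push(unsigned id, double key) {
        assert(!contains(id));
        heap_.emplace_back(key, id);
        pos_[id] = heap_.size() - 1;
        siftUp(heap_.size() - 1);
    }
    // Change the key of an existing node: O(1) average lookup, O(log n) repair.
    void update(unsigned id, double key) {
        std::size_t i = pos_.at(id);
        heap_[i].first = key;
        siftUp(i);
        siftDown(pos_.at(id));
    }
    std::pair<double, unsigned> pop() {
        assert(!heap_.empty());
        auto top = heap_.front();
        pos_.erase(top.second);
        auto last = heap_.back();
        heap_.pop_back();
        if (!heap_.empty()) {
            place(0, last);
            siftDown(0);
        }
        return top;
    }
};
```

With this layout, update(id, key) replaces the find_if + erase + make_heap sequence: the map lookup finds the node and only one root-to-leaf path is repaired.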
Or, you can replace the binary heap with a Pairing Heap, skip list, or any of the other "heap" types that work with node pointers. My experience has been that although the theoretical performance of those two isn't as good as the theoretical performance of Fibonacci heap, the real-world performance is much better.
With either of those it's a whole lot easier to maintain an index: you just add a node reference to it when you add a node to the heap, and remove a reference when the node is removed from the heap. Both of those heap types are easy to build and performance will be about the same as for a binary heap although they will use somewhat more memory. From experience I'll say that Pairing heap is easier to build than skip list, but skip list is a more generally useful data structure.

Inserting multiple values into a vector at specific positions

Say I have a vector of integers like this std::vector<int> _data;
I know that if I want to remove multiple items from _data, then I can simply call
_data.erase( std::remove_if( _data.begin(), _data.end(), [condition] ), _data.end() );
This is much faster than erasing multiple elements one by one, as less movement of data is required within the vector. I'm wondering if there's something similar for insertions.
For example, if I have the following pairs
auto pair1 = std::make_pair( _data.begin() + 5, 5 );
auto pair2 = std::make_pair( _data.begin() + 12, 12 );
Can I insert both of these in one iteration using some existing std function? I know I can do something like:
_data.insert( pair2.first, pair2.second );
_data.insert( pair1.first, pair1.second );
But this is (very) slow for large vectors (talking 100,000+ elements).
EDIT: Basically, I have a custom set (and map) which use a vector as the underlying containers. I know I can just use std::set or std::map, but the number of traversals I do far outweighs the insertion/removals. Switching from a set and map to this custom set/map already cut 20% of run-time off. Currently though, insertions take approximately 10% of the remaining run time, so reducing that is important.
The order is also required, unfortunately. As much as possible, I use the unordered_ versions, but in some places the order does matter.

One way is to create another vector with capacity equal to the original size plus the number of the elements being inserted and then do an insert loop with no reallocations, O(N) complexity:
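#include <cstddef>
#include <initializer_list>
#include <iostream>
#include <utility>
#include <vector>
template<class T>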
std::vector<T> insert_elements(std::vector<T> const& v, std::initializer_list<std::pair<std::size_t, T>> new_elements) {
std::vector<T> u;
u.reserve(v.size() + new_elements.size());
auto src = v.begin();
size_t copied = 0;
for(auto const& element : new_elements) {
auto to_copy = element.first - copied;
auto src_end = src + to_copy;
u.insert(u.end(), src, src_end);
src = src_end;
copied += to_copy;
u.push_back(element.second);
}
u.insert(u.end(), src, v.end());
return u;
}
int main() {
std::vector<int> v{1, 3, 5};
for(auto e : insert_elements(v, {{1,2}, {2,4}}))
std::cout << e << ' ';
std::cout << '\n';
}
Output:
1 2 3 4 5

Ok, we need some assumptions. Let old_end be a reverse iterator to the last element of your vector. Assume that your _data has been resized to exactly fit both its current content and what you want to insert. Assume that inp is a container of std::pair containing your data to be inserted, in reverse order (so first the element that is to be inserted at the hindmost position, and so on). Then we can do:
// Sketch only: std::merge expects both ranges to share a value type,
// so this needs adapting before it compiles.
std::merge(old_end, _data.rend(), inp.begin(), inp.end(), _data.rbegin(), [i = inp.size()-1](const T& t, const std::pair<Iter, T>& p) mutable {
if( std::distance(_data.begin(), p.first) == i ) {
--i;
return false;
}
return true;
});
But I think that is not clearer than using a good old for loop. The problem with the STL algorithms is that the predicates work on values and not on iterators, which is a bit annoying for this problem.
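For comparison, the "good old for" variant might be sketched as follows. This is an assumption-laden sketch and not code from the question: insert_at_positions is a made-up name, positions are given as indices into the original (pre-insertion) vector, the list is assumed to be sorted by ascending index, and T must be default constructible because of the resize.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// items: (pre-insertion index, value), sorted by ascending index.
// Every old element is moved at most once, so the whole pass is linear.
template <class T>
void insert_at_positions(std::vector<T>& data,
                         const std::vector<std::pair<std::size_t, T>>& items) {
    const std::size_t old_size = data.size();
    data.resize(old_size + items.size());  // requires T to be default constructible
    std::size_t src = old_size;            // one past the next old element to move
    std::size_t dst = data.size();         // one past the next free slot
    for (std::size_t k = items.size(); k-- > 0;) {
        const std::size_t idx = items[k].first;
        assert(idx <= old_size);
        assert(k + 1 == items.size() || idx <= items[k + 1].first);
        // Shift old elements at positions >= idx to their final place.
        while (src > idx) data[--dst] = std::move(data[--src]);
        data[--dst] = items[k].second;
    }
    // Elements before the smallest index are already where they belong.
}
```

Working from the back means no element is overwritten before it has been moved, so no temporary buffer is needed.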

Here's my take:
template<class Key, class Value>
class LinearSet
{
public:
using Node = std::pair<Key, Value>;
template<class F>
void insert_at_multiple(F&& f)
{
std::queue<Node> queue;
std::size_t index = 0;
for (auto it = _kvps.begin(); it != _kvps.end(); ++it)
{
// The container size is left untouched here, no iterator invalidation.
if (std::optional<Node> toInsert = f(index))
{
queue.push(*it);
*it = std::move(*toInsert);
}
else
{
++index;
// Replace current node with queued one.
if (!queue.empty())
{
queue.push(std::move(*it));
*it = std::move(queue.front());
queue.pop();
}
}
}
// We now have as many displaced items in the queue as were inserted,
// add them to the end.
while (!queue.empty())
{
_kvps.emplace_back(std::move(queue.front()));
queue.pop();
}
}
private:
std::vector<Node> _kvps;
};
https://godbolt.org/z/EStKgQ
This is a linear time algorithm that doesn't need to know the number of inserted elements a priori. For each index, it asks for an element to insert there. If it gets one, it pushes the corresponding existing vector element to a queue and replaces it with the new one. Otherwise, it extracts the current item to the back of the queue and puts the item at the front of the queue into the current position (noop if no elements were inserted yet). Note that the vector size is left untouched during all this. Only at the end do we push back all items still in the queue.
Note that the indices we use for determining inserted item locations here are all pre-insertion. I find this a point of potential confusion (and it is a limitation - you can't add an element at the very end with this algorithm. Could be remedied by calling f during the second loop too, working on that...).
Here's a version that allows inserting arbitrarily many elements at the end (and everywhere else). It passes post-insertion indices to the functor!
template<class F>
void insert_at_multiple(F&& f)
{
std::queue<Node> queue;
std::size_t index = 0;
for (auto it = _kvps.begin(); it != _kvps.end(); ++it)
{
if (std::optional<Node> toInsert = f(index))
queue.push(std::move(*toInsert));
if (!queue.empty())
{
queue.push(std::move(*it));
*it = std::move(queue.front());
queue.pop();
}
++index;
}
// We now have as many displaced items in the queue as were inserted,
// add them to the end.
while (!queue.empty())
{
if (std::optional<Node> toInsert = f(index))
{
queue.push(std::move(*toInsert));
}
_kvps.emplace_back(std::move(queue.front()));
queue.pop();
++index;
}
}
https://godbolt.org/z/DMuCtJ
Again, this leaves potential for confusion over what it means to insert at indices 0 and 1 (do you end up with an original element in between the two? In the first snippet you would, in the second you wouldn't). Can you insert at the same index multiple times? With pre-insertion indices that makes sense, with post-insertion indices it doesn't. You could also write this in terms of passing the current *it (i.e. key value pair) to the functor, but that alone seems not too useful...

This is an attempt I made, which inserts in reverse order. I did get rid of the iterators/indices for this.
template<class T>
void insert( std::vector<T> &vector, const std::vector<T> &values ) {
size_t last_index = vector.size() - 1;
vector.resize( vector.size() + values.size() ); // relies on T being default constructible
size_t move_position = vector.size() - 1;
size_t last_value_index = values.size() - 1;
size_t values_size = values.size();
bool isLastIndex = false;
while ( !isLastIndex && values_size ) {
if ( values[last_value_index] > vector[last_index] ) {
vector[move_position] = std::move( values[last_value_index--] );
--values_size;
} else {
isLastIndex = last_index == 0;
vector[move_position] = std::move( vector[last_index--] );
}
--move_position;
}
if ( isLastIndex && values_size ) {
while ( values_size ) {
vector[move_position--] = std::move( values[last_value_index--] );
--values_size;
}
}
}
Tried with ICC, Clang, and GCC on Godbolt, and vector's insert was faster (for 5 numbers inserted). On my machine, MSVC, same result but less severe. I also compared with Maxim's version from his answer. I realize using Godbolt isn't a good method for comparison, but I don't have access to the 3 other compilers on my current machine.
https://godbolt.org/z/vjV2wA
Results from my machine:
My insert: 659us
Maxim insert: 712us
Vector insert: 315us
Godbolt's ICC
My insert: 470us
Maxim insert: 139us
Vector insert: 127us
Godbolt's GCC
My insert: 815us
Maxim insert: 97us
Vector insert: 97us
Godbolt's Clang:
My insert: 477us
Maxim insert: 188us
Vector insert: 96us

How do I decrease the count of an element in a multiset in C++?

I am using a multiset in C++, which I believe stores an element and the respective count of it when it is inserted.
Here, when I want to delete an element, I just want to decrease the count of that element in the set by 1, as long as the count is greater than 0.
Example C++ code:
multiset<int>mset;
mset.insert(2);
mset.insert(2);
printf("%zu ",mset.count(2)); //this prints 2
// here I need an O(1) constant time function (in-built or whatever )
// to decrease the count of 2 in the set without deleting it
// Remember constant time only
-> Function and its specifications
printf("%zu ",mset.count(2)); // it should print 1 now .
Is there any way to achieve that, or should I delete the element and re-insert it the required (count-1) times?

... I am using a multi-set in c++, which stores an element and the respective count of it ...
No you aren't. You're using a multi-set which stores n copies of a value which was inserted n times.
If you want to store something relating a value to a count, use an associative container like std::map<int, int>, and use map[X]++ to increment the number of Xs.
... i need an O(1) constant time function ... to decrease the count ...
Both map and set have O(log N) complexity just to find the element you want to alter, so this is impossible with them. Use std::unordered_map/set to get average-case O(1) complexity.
... I just want to decrease the count of that element in the set by 1 till it is >0
I'm not sure what that means.
with a multiset:
to remove all copies of an element from the set, use equal_range to get a range (pair of iterators), and then erase that range
to remove all-but-one copies in a non-empty range, just increment the first iterator in the pair and check it's still not equal to the second iterator before erasing the new range.
these both have an O(log N) lookup (equal_range) step followed by a linear-time erase step (although it's linear with the number of elements having the same key, not N).
with a map:
to remove the count from a map, just erase the key
to set the count to one, just use map[key]=1;
both of these have an O(log N) lookup followed by a constant-time erase
with an unordered map ... for your purposes it's identical to the map above, except with average-case O(1) complexity.
Here's a quick example using unordered_map:
template <typename Key>
class Counter {
std::unordered_map<Key, unsigned> count_;
public:
unsigned inc(Key k, unsigned delta = 1) {
auto result = count_.emplace(k, delta);
if (result.second) {
return delta;
} else {
unsigned& current = result.first->second;
current += delta;
return current;
}
}
unsigned dec(Key k, unsigned delta = 1) {
auto iter = count_.find(k);
if (iter == count_.end()) return 0;
unsigned& current = iter->second;
if (current > delta) {
current -= delta;
return current;
}
// else current <= delta means zero
count_.erase(iter);
return 0;
}
unsigned get(Key k) const {
auto iter = count_.find(k);
if (iter == count_.end()) return 0;
return iter->second;
}
};
and use it like so:
int main() {
Counter<int> c;
// test increment
assert(c.inc(1) == 1);
assert(c.inc(2) == 1);
assert(c.inc(2) == 2);
// test lookup
assert(c.get(0) == 0);
assert(c.get(1) == 1);
// test simple decrement
assert(c.get(2) == 2);
assert(c.dec(2) == 1);
assert(c.get(2) == 1);
// test erase and underflow
assert(c.dec(2) == 0);
assert(c.dec(2) == 0);
assert(c.dec(1, 42) == 0);
}
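If you would rather stay with the multiset from the question, removing exactly one copy of a value can be sketched like this. erase_one is a hypothetical helper name, not a standard function; it relies only on std::multiset::find and the iterator overload of erase, and it is O(log N) because of the lookup, not O(1).

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Removes a single copy of value from the multiset.
// Returns true if a copy was removed, false if the value was not present.
template <class T>
bool erase_one(std::multiset<T>& s, const T& value) {
    auto it = s.find(value);   // O(log N): some element equal to value, or end()
    if (it == s.end()) return false;
    const std::size_t before = s.size();
    s.erase(it);               // erases only the element the iterator points to
    assert(s.size() + 1 == before);
    return true;
}
```

Note that s.erase(value) with a key argument would instead remove all copies, which is the behaviour the question wants to avoid.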

Combining arrays/lists in a specific fashion

I'm trying to find a sensible algorithm to combine multiple lists/vectors/arrays as defined below.
Each element contains a float declaring the start of its range of validity and a constant that is used over this range. Where ranges from different lists overlap their constants need to be added to produce one global list.
I've attempted an illustration below to give a good idea of what I mean:
First List:
0.5---------------2------------3.2--------4
         a1             a2           a3
Second List:
1----------2----------3---------------4.5
     b1         b2            b3
Desired Output:
0.5----1----------2----------3-3.2--------4--4.5
    a1    a1+b1      a2+b2    ^    a3+b3   b3
                            b3+a2
I can't think of a sensible way of going about this in the case of n lists; Just 2 is quite easy to brute force.
Any hints or ideas would be welcome. Each list is represented as a C++ std::vector (so feel free to use standard algorithms) and are sorted by start of range value.
Cheers!
Edit: Thanks for the advice, I've come up with a naive implementation, not sure why I couldn't get here on my own first. To my mind the obvious improvement would be to store an iterator for each vector since they're already sorted and not have to re-traverse each vector for each point. Given that most vectors will contain less than 100 elements, but there may be many vectors this may or may not be worthwhile. I'd have to profile to see.
Any thoughts on this?
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
struct DataType
{
double intervalStart;
int data;
// More data here, the data is not just a single int, but that
// works for our demonstration
};
int main(void)
{
// The final "data" of each vector is meaningless as it refers to
// the coming range which won't be used as this is only for
// bounded ranges
std::vector<std::vector<DataType> > input = {{{0.5, 1}, {2.0, 3}, {3.2, 3}, {4.0, 4}},
{{1.0, 5}, {2.0, 6}, {3.0, 7}, {4.5, 8}},
{{-34.7895, 15}, {-6.0, -2}, {1.867, 5}, {340, 7}}};
// Setup output vector
std::vector<DataType> output;
std::size_t inputSize = 0;
for (const auto& internalVec : input)
inputSize += internalVec.size();
output.reserve(inputSize);
// Fill output vector
for (const auto& internalVec : input)
std::copy(internalVec.begin(), internalVec.end(), std::back_inserter(output));
// Sort output vector by intervalStartPoints
std::sort(output.begin(), output.end(),
[](const DataType& data1, const DataType& data2)
{
return data1.intervalStart < data2.intervalStart;
});
// Remove DataTypes with same intervalStart - each interval can only start once
output.erase(std::unique(output.begin(), output.end(),
[](const DataType& dt1, const DataType& dt2)
{
return dt1.intervalStart == dt2.intervalStart;
}), output.end());
// Output now contains all the right intersections, just not with the right data
// Lambda to find the data value associated with an
// intervalStart value in a vector
auto FindDataValue = [&](const std::vector<DataType>& v, double startValue)
{
auto iter = std::find_if(v.begin(), v.end(), [startValue](const DataType& data)
{
return data.intervalStart > startValue;
});
if (iter == v.begin() || iter == v.end())
{
return 0;
}
return (iter-1)->data;
};
// For each interval in the output traverse the input and sum the
// data constants
for (auto& val : output)
{
int sectionData = 0;
for (const auto& iv : input)
sectionData += FindDataValue(iv, val.intervalStart);
val.data = sectionData;
}
for (const auto& i : output)
std::cout << "loc: " << i.intervalStart << " data: " << i.data << std::endl;
return 0;
}
Edit2: @Stas's code is a very good way to approach this problem. I've just tested it on all the edge cases I could think of.
Here's my merge_intervals implementation in case anyone is interested. The only slight change I've had to make to the snippets Stas provided is:
for (auto& v : input)
v.back().data = 0;
Before combining the vectors as suggested. Thanks!
template<class It1, class It2, class OutputIt>
OutputIt merge_intervals(It1 first1, It1 last1,
It2 first2, It2 last2,
OutputIt destBegin)
{
const auto begin1 = first1;
const auto begin2 = first2;
auto CombineData = [](const DataType& d1, const DataType& d2)
{
return DataType{d1.intervalStart, (d1.data+d2.data)};
};
for (; first1 != last1; ++destBegin)
{
if (first2 == last2)
{
return std::copy(first1, last1, destBegin);
}
if (first1->intervalStart == first2->intervalStart)
{
*destBegin = CombineData(*first1, *first2);
++first1; ++first2;
}
else if (first1->intervalStart < first2->intervalStart)
{
if (first2 > begin2)
*destBegin = CombineData(*first1, *(first2-1));
else
*destBegin = *first1;
++first1;
}
else
{
if (first1 > begin1)
*destBegin = CombineData(*first2, *(first1-1));
else
*destBegin = *first2;
++first2;
}
}
return std::copy(first2, last2, destBegin);
}

Unfortunately, your algorithm is inherently slow. Profiling or applying C++-specific tweaks won't help. It will effectively never finish even on moderately sized inputs, such as merging 1000 lists of 10000 elements each.
Let's try to evaluate time complexity of your algo. For the sake of simplicity, let's merge only lists of the same length.
L - length of a list
N - number of lists to be merged
T = L * N - length of a whole concatenated list
Complexity of your algorithm steps:
create output vector - O(T)
sort output vector - O(T*log(T))
filter output vector - O(T)
fix data in output vector - O(T*T)
The last step dominates the complexity of the whole algorithm: O(T*T) = O(L^2*N^2). That is not acceptable for practical use: to merge 1000 lists of 10000 elements each, the algorithm needs on the order of 10^14 steps.
Actually, the task is pretty complex, so do not try to solve it in one step. Divide and conquer!
Write an algorithm that merges two lists into one
Use it to merge a list of lists
Merging two lists into one
This is relatively easy to implement (but be careful with corner cases). The algorithm should have linear time complexity: O(2*L). Take a look at how std::merge is implemented. You just need to write your custom variant of std::merge, let's call it merge_intervals.
Applying a merge algorithm to a list of lists
This is a little bit tricky, but again, divide and conquer! The idea is to merge recursively: split the list of lists into two halves, merge each half, and then merge the two results.
template<class It, class Combine>
auto merge_n(It first, It last, Combine comb)
-> typename std::remove_reference<decltype(*first)>::type
{
if (first == last)
throw std::invalid_argument("Empty range");
auto count = std::distance(first, last);
if (count == 1)
return *first;
auto it = first;
std::advance(it, count / 2);
auto left = merge_n(first, it, comb);
auto right = merge_n(it, last, comb);
return comb(left, right);
}
Usage:
auto combine = [](const std::vector<DataType>& a, const std::vector<DataType>& b)
{
std::vector<DataType> result;
merge_intervals(a.begin(), a.end(), b.begin(), b.end(),
std::back_inserter(result));
return result;
};
auto output = merge_n(input.begin(), input.end(), combine);
The nice property of this recursive approach is its time complexity: O(L*N*log(N)) for the whole algorithm. So, to merge 1000 lists of 10000 elements each, the algorithm needs about 10000 * 1000 * 9.966 = 99,660,000 steps, roughly 1,000,000 times fewer than the original algorithm.
Moreover, the algorithm is inherently parallelizable: it is not hard to write a parallel version of merge_n and run it on a thread pool.

I know I'm a bit late to the party, but when I started writing this there was no suitable answer yet, and my solution should have relatively good time complexity, so here you go:
I think the most straightforward way to approach this is to see each of your sorted lists as a stream of events: At a given time, the value (of that stream) changes to a new value:
template<typename T>
struct Point {
using value_type = T;
float time;
T value;
};
You want to superimpose those streams into a single stream (i.e. having their values summed up at any given point). For that you take the earliest event from all streams, and apply its effect on the result stream. Therefore, you need to first "undo" the effect that the previous value from that stream made on the result stream, and then add the new value to the current value of the result stream.
To be able to do that, you need to remember for each stream its last value, its next value, and where the stream ends (so you can tell when it is empty):
std::vector<std::tuple<Value, StreamIterator, StreamIterator>> streams;
The first element of the tuple is the last effect of that stream onto the result stream, the second is an iterator pointing to the streams next event, and the last is the end iterator of that stream:
transform(from, to, inserter(streams, begin(streams)),
[] (auto & stream) {
return make_tuple(static_cast<Value>(0), begin(stream), end(stream));
});
To be able to always get the earliest event of all the streams, it helps to keep the (information about the) streams in a (min) heap, where the top element is the stream with the next (earliest) event. That's the purpose of the following comparator:
auto heap_compare = [] (auto const & lhs, auto const & rhs) {
bool less = (*get<1>(lhs)).time < (*get<1>(rhs)).time;
return (not less);
};
Then, as long as there are still some events (i.e. some stream that is not empty), first (re)build the heap, take the top element and apply its next event to the result stream, and then remove that element from the stream. Finally, if the stream is now empty, remove it.
// The current value of the result stream.
Value current = 0;
while (streams.size() > 0) {
// Reorder the stream information to get the one with the earliest next
// value into top ...
make_heap(begin(streams), end(streams), heap_compare);
// .. and select it.
auto & earliest = streams[0];
// New value is the current one, minus the previous effect of the selected
// stream plus the new value from the selected stream
current = current - get<0>(earliest) + (*get<1>(earliest)).value;
// Store the new time point with the new value and the time of the used
// time point from the selected stream
*out++ = Point<Value>{(*get<1>(earliest)).time, current};
// Update the effect of the selected stream
get<0>(earliest) = (*get<1>(earliest)).value;
// Advance selected stream to its next time point
++(get<1>(earliest));
// Remove stream if empty
if (get<1>(earliest) == get<2>(earliest)) {
swap(streams[0], streams[streams.size() - 1u]);
streams.pop_back();
}
}
This will return a stream where there might be multiple points with the same time, but a different value. This occurs when there are multiple "events" at the same time. If you only want the last value, i.e. the value after all these events happened, then one needs to combine them:
merge_point_lists(begin(input), end(input), inserter(merged, begin(merged)));
// returns points with the same time, but with different values. remove these
// duplicates, by first making them REALLY equal, i.e. setting their values
// to the last value ...
for (auto write = begin(merged), read = begin(merged), stop = end(merged);
write != stop;) {
for (++read; (read != stop) and (read->time == write->time); ++read) {
write->value = read->value;
}
for (auto const cached = (write++)->value; write != read; ++write) {
write->value = cached;
}
}
// ... and then removing them.
merged.erase(
unique(begin(merged), end(merged),
[](auto const & lhs, auto const & rhs) {
return (lhs.time == rhs.time);}),
end(merged));
Concerning the time complexity: this iterates over all "events", so it depends on the number of events e. The very first make_heap call has to build a completely new heap, which takes at most 3 * s comparisons, where s is the number of streams the function has to merge. On subsequent calls only the very first element is out of place, so restoring the heap property needs about log(s') comparisons. I write s' because the number of streams that still need to be considered decreases to zero. Note that std::make_heap is only specified as linear in the range size, so the log(s') bound is guaranteed only if the heap is repaired with a single sift-down (for example std::pop_heap followed by std::push_heap) rather than rebuilt. This gives
3s + (e - 1) * log(s')
as the complexity. Assuming the worst case, where s' decreases slowly (this happens when the events are evenly distributed across the streams, i.e. all streams have the same number of events):
3s + (e - 1 - s) * log(s) + (sum of log(i) for i = 1 to s)
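To make the logarithmic per-event cost explicit, the same superposition can be sketched with std::pop_heap and std::push_heap instead of calling make_heap in every iteration. This is only a minimal sketch under assumptions: the names Event, Cursor and superimpose are hypothetical and not part of the answer above, values are plain int, and each stream contributes 0 before its first event.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical event type: at `time`, the stream's value becomes `value`.
struct Event { float time; int value; };

// Sums several sorted event streams into one stream (one output per event).
std::vector<Event> superimpose(const std::vector<std::vector<Event>>& streams) {
    // Per-stream state: which stream, index of its next event, last applied value.
    struct Cursor { std::size_t stream; std::size_t pos; int last; };
    // "Greater than" on the next event time turns the max-heap into a min-heap.
    auto later = [&streams](const Cursor& a, const Cursor& b) {
        return streams[a.stream][a.pos].time > streams[b.stream][b.pos].time;
    };

    std::vector<Cursor> heap;
    for (std::size_t s = 0; s < streams.size(); ++s)
        if (!streams[s].empty()) heap.push_back(Cursor{s, 0, 0});
    std::make_heap(heap.begin(), heap.end(), later);  // O(s), done once

    std::vector<Event> out;
    int current = 0;
    while (!heap.empty()) {
        // Move the cursor with the earliest next event to the back: O(log s).
        std::pop_heap(heap.begin(), heap.end(), later);
        Cursor& c = heap.back();
        const Event& e = streams[c.stream][c.pos];
        current += e.value - c.last;  // undo the old effect, apply the new one
        c.last = e.value;
        out.push_back({e.time, current});
        if (++c.pos == streams[c.stream].size())
            heap.pop_back();                                  // stream exhausted
        else
            std::push_heap(heap.begin(), heap.end(), later);  // O(log s)
    }
    return out;
}
```

As in the loop above, simultaneous events still produce several output points with the same time, so the de-duplication step is still needed afterwards.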

Do you really need a data structure as the result? I don't think so. You are actually defining several functions that can be added together. The examples you give are encoded as 'start, value (, implicit end)' tuples. The basic building block is a function that looks up its value at a certain point:
double valueAt(const vector<edge> &starts, float point) {
auto it = std::adjacent_find(begin(starts), end(starts),
[&](edge e1, edge e2) {
return e1.x <= point && point < e2.x;
});
return it->second;
};
The function value for a point is the sum of the function values for all code-series.
If you really need a list in the end, you can join and sort all edge.x values for all series, and create the list from that.
Unless performance is an issue :)
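The "sum of the function values" idea can be sketched as follows. This is a hedged illustration, not the code above: the Edge layout and the name sumAt are assumptions, each series is assumed sorted by x, and a series is taken to be 0 before its first edge. The lookup uses std::upper_bound, so one query costs O(log L) per series.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical edge type: from position x onwards the series has this value.
struct Edge { float x; double value; };

// Value of one step function at `point` (0 before the first edge).
double valueAt(const std::vector<Edge>& starts, float point) {
    // First edge that starts strictly after `point`.
    auto it = std::upper_bound(starts.begin(), starts.end(), point,
                               [](float p, const Edge& e) { return p < e.x; });
    return it == starts.begin() ? 0.0 : std::prev(it)->value;
}

// Value of the summed function: add up the value of every series at `point`.
double sumAt(const std::vector<std::vector<Edge>>& series, float point) {
    double total = 0.0;
    for (const auto& s : series) total += valueAt(s, point);
    return total;
}
```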

If you can combine two of these structures, you can combine many.
First, encapsulate your std::vector into a class. Implement what you know as operator+= (and define operator+ in terms of this if you want). With that in place, you can combine as many as you like, just by repeated addition. You could even use std::accumulate to combine a collection of them.
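A minimal sketch of that suggestion, under assumptions: the names Step, StepFunction and sum_all are made up for illustration, the steps are sorted by start, and a function is 0 before its first step. operator+= does a linear two-way merge, and std::accumulate folds it over a collection.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical step: from `start` onwards the function has this value.
struct Step { float start; int value; };

class StepFunction {
public:
    StepFunction() = default;
    explicit StepFunction(std::vector<Step> s) : steps_(std::move(s)) {}

    // Pointwise sum, computed by merging the two sorted step lists.
    StepFunction& operator+=(const StepFunction& rhs) {
        const std::vector<Step>& l = steps_;
        const std::vector<Step>& r = rhs.steps_;
        std::vector<Step> out;
        std::size_t i = 0, j = 0;
        int a = 0, b = 0;  // current value of the left and right operand
        while (i < l.size() || j < r.size()) {
            // Consume the earlier breakpoint; on a tie consume both.
            const bool takeA = j == r.size() || (i < l.size() && l[i].start <= r[j].start);
            const bool takeB = i == l.size() || (j < r.size() && r[j].start <= l[i].start);
            const float x = takeA ? l[i].start : r[j].start;
            if (takeA) a = l[i++].value;
            if (takeB) b = r[j++].value;
            out.push_back({x, a + b});
        }
        steps_ = std::move(out);
        return *this;
    }

    const std::vector<Step>& steps() const { return steps_; }

private:
    std::vector<Step> steps_;
};

// Combine a whole collection by repeated addition.
StepFunction sum_all(const std::vector<StepFunction>& fs) {
    return std::accumulate(fs.begin(), fs.end(), StepFunction{},
                           [](StepFunction acc, const StepFunction& f) {
                               acc += f;
                               return acc;
                           });
}
```

Folding left to right like this costs more than a balanced recursive merge when there are many long lists, so it is best suited to a small number of inputs.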

Reorder vector using a vector of indices [duplicate]

This question already has answers here:
How do I sort a std::vector by the values of a different std::vector? [duplicate]
(13 answers)
Closed 12 months ago.
I'd like to reorder the items in a vector, using another vector to specify the order:
char A[] = { 'a', 'b', 'c' };
size_t ORDER[] = { 1, 0, 2 };
vector<char> vA(A, A + sizeof(A) / sizeof(*A));
vector<size_t> vOrder(ORDER, ORDER + sizeof(ORDER) / sizeof(*ORDER));
reorder_naive(vA, vOrder);
// vA is now { 'b', 'a', 'c' }
The following is an inefficient implementation that requires copying the vector:
void reorder_naive(vector<char>& vA, const vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
vector vCopy = vA; // Can we avoid this?
for(int i = 0; i < vOrder.size(); ++i)
vA[i] = vCopy[ vOrder[i] ];
}
Is there a more efficient way, for example, that uses swap()?

This algorithm is based on chmike's, but the vector of reorder indices is const. This function agrees with his for all 11! permutations of [0..10]. The complexity is O(N^2), taking N as the size of the input, or more precisely, the size of the largest orbit.
See below for an optimized O(N) solution which modifies the input.
template< class T >
void reorder(vector<T> &v, vector<size_t> const &order ) {
for ( int s = 1, d; s < order.size(); ++ s ) {
for ( d = order[s]; d < s; d = order[d] ) ;
if ( d == s ) while ( d = order[d], d != s ) swap( v[s], v[d] );
}
}
Here's an STL style version which I put a bit more effort into. It's about 47% faster (that is, almost twice as fast over [0..10]!) because it does all the swaps as early as possible and then returns. The reorder vector consists of a number of orbits, and each orbit is reordered upon reaching its first member. It's faster when the last few elements do not contain an orbit.
template< typename order_iterator, typename value_iterator >
void reorder( order_iterator order_begin, order_iterator order_end, value_iterator v ) {
typedef typename std::iterator_traits< value_iterator >::value_type value_t;
typedef typename std::iterator_traits< order_iterator >::value_type index_t;
typedef typename std::iterator_traits< order_iterator >::difference_type diff_t;
diff_t remaining = order_end - 1 - order_begin;
for ( index_t s = index_t(), d; remaining > 0; ++ s ) {
for ( d = order_begin[s]; d > s; d = order_begin[d] ) ;
if ( d == s ) {
-- remaining;
value_t temp = v[s];
while ( d = order_begin[d], d != s ) {
swap( temp, v[d] );
-- remaining;
}
v[s] = temp;
}
}
}
And finally, just to answer the question once and for all, a variant which does destroy the reorder vector (filling it with -1's). For permutations of [0..10], it's about 16% faster than the preceding version. Because overwriting the input enables dynamic programming, it is O(N), asymptotically faster for some cases with longer sequences.
template< typename order_iterator, typename value_iterator >
void reorder_destructive( order_iterator order_begin, order_iterator order_end, value_iterator v ) {
typedef typename std::iterator_traits< value_iterator >::value_type value_t;
typedef typename std::iterator_traits< order_iterator >::value_type index_t;
typedef typename std::iterator_traits< order_iterator >::difference_type diff_t;
diff_t remaining = order_end - 1 - order_begin;
for ( index_t s = index_t(); remaining > 0; ++ s ) {
index_t d = order_begin[s];
if ( d == (diff_t) -1 ) continue;
-- remaining;
value_t temp = v[s];
for ( index_t d2; d != s; d = d2 ) {
swap( temp, v[d] );
swap( order_begin[d], d2 = (diff_t) -1 );
-- remaining;
}
v[s] = temp;
}
}

In-place reordering of vector
Warning: the meaning of the ordering indices is ambiguous. Both interpretations are answered here.
move elements of vector to the position of the indices
#include <iostream>
#include <vector>
#include <assert.h>
using namespace std;
void REORDER(vector<double>& vA, vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
// for all elements to put in place
for( size_t i = 0; i + 1 < vA.size(); ++i ) // i + 1 avoids unsigned underflow on an empty vector
{
// while the element i is not yet in place
while( i != vOrder[i] )
{
// swap it with the element at its final place
size_t alt = vOrder[i];
swap( vA[i], vA[alt] );
swap( vOrder[i], vOrder[alt] );
}
}
}
int main()
{
std::vector<double> vec {7, 5, 9, 6};
std::vector<size_t> inds {1, 3, 0, 2};
REORDER(vec, inds);
for (size_t vv = 0; vv < vec.size(); ++vv)
{
std::cout << vec[vv] << std::endl;
}
return 0;
}
output
9
7
6
5
Note that you can save one test: if n-1 elements are in place, the last (nth) element is certainly in place too.
On exit, vA is reordered and vOrder is sorted (the identity permutation).
This algorithm performs at most n-1 swaps, because each swap moves an element to its final position, and at most 2n tests on vOrder.
draw the elements of vector from the position of the indices
#include <iostream>
#include <vector>
#include <assert.h>
template<typename T>
void reorder(std::vector<T>& vec, std::vector<size_t> vOrder)
{
assert(vec.size() == vOrder.size());
for( size_t vv = 0; vv + 1 < vec.size(); ++vv ) // vv + 1 avoids unsigned underflow on an empty vector
{
if (vOrder[vv] == vv)
{
continue;
}
size_t oo;
for(oo = vv + 1; oo < vOrder.size(); ++oo)
{
if (vOrder[oo] == vv)
{
break;
}
}
std::swap( vec[vv], vec[vOrder[vv]] );
std::swap( vOrder[vv], vOrder[oo] );
}
}
int main()
{
std::vector<double> vec {7, 5, 9, 6};
std::vector<size_t> inds {1, 3, 0, 2};
reorder(vec, inds);
for (size_t vv = 0; vv < vec.size(); ++vv)
{
std::cout << vec[vv] << std::endl;
}
return 0;
}
Output
5
6
7
9

It appears to me that vOrder contains a set of indexes in the desired order (for example the output of sorting by index). The code example here follows the "cycles" in vOrder, where following a sub-set (could be all of vOrder) of indexes will cycle through the sub-set, ending back at the first index of the sub-set.
Wiki article on "cycles"
https://en.wikipedia.org/wiki/Cyclic_permutation
In the following example, every swap places at least one element in its proper place. This code example effectively reorders vA according to vOrder, while "unordering" or "unpermuting" vOrder back to its original state (0 :: n-1). If vA contained the values 0 through n-1 in order, then after reorder, vA would end up where vOrder started.
template <class T>
void reorder(vector<T>& vA, vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
// for all elements to put in place
for( size_t i = 0; i < vA.size(); ++i )
{
// while vOrder[i] is not yet in place
// every swap places at least one element in its proper place
while( vOrder[i] != vOrder[vOrder[i]] )
{
swap( vA[vOrder[i]], vA[vOrder[vOrder[i]]] );
swap( vOrder[i], vOrder[vOrder[i]] );
}
}
}
This can also be implemented a bit more efficiently using moves instead of swaps. A temp object is needed to hold an element during the moves. Example C code that reorders A[] according to the indexes in I[] and also sorts I[]:
void reorder(int *A, int *I, int n)
{
int i, j, k;
int tA;
/* reorder A according to I */
/* every move puts an element into place */
/* time complexity is O(n) */
for(i = 0; i < n; i++){
if(i != I[i]){
tA = A[i];
j = i;
while(i != (k = I[j])){
A[j] = A[k];
I[j] = j;
j = k;
}
A[j] = tA;
I[j] = j;
}
}
}

If it is ok to modify the ORDER array then an implementation that sorts the ORDER vector and at each sorting operation also swaps the corresponding values vector elements could do the trick, I think.

A survey of existing answers
You ask if there is "a more efficient way". But what do you mean by efficient and what are your requirements?
Potatoswatter's answer works in O(N²) time with O(1) additional space and doesn't mutate the reordering vector.
chmike and rcgldr give answers which use O(N) time with O(1) additional space, but they achieve this by mutating the reordering vector.
Your original answer allocates new space and then copies data into it while Tim MB suggests using move semantics. However, moving still requires a place to move things to and an object like an std::string has both a length variable and a pointer. In other words, a move-based solution requires O(N) allocations for any objects and O(1) allocations for the new vector itself. I explain why this is important below.
Preserving the reordering vector
We might want that reordering vector! Sorting costs O(N log N). But, if you know you'll be sorting several vectors in the same way, such as in a Structure of Arrays (SoA) context, you can sort once and then reuse the results. This can save a lot of time.
You might also want to sort and then unsort data. Having the reordering vector allows you to do this. A use case here is for performing genomic sequencing on GPUs where maximal speed efficiency is obtained by having sequences of similar lengths processed in batches. We cannot rely on the user providing sequences in this order so we sort and then unsort.
So, what if we want the best of all worlds: O(N) processing without the costs of additional allocation but also without mutating our ordering vector (which we might, after all, want to reuse)? To find that world, we need to ask:
Why is extra space bad?
There are two reasons you might not want to allocate additional space.
The first is that you don't have much space to work with. This can occur in two situations: you're on an embedded device with limited memory. Usually this means you're working with small datasets, so the O(N²) solution is probably fine here. But it can also happen when you are working with really large datasets. In this case O(N²) is unacceptable and you have to use one of the O(N) mutating solutions.
The other reason extra space is bad is because allocation is expensive. For smaller datasets it can cost more than the actual computation. Thus, one way to achieve efficiency is to eliminate allocation.
Outline
When we mutate the ordering vector we are doing so as a way to indicate whether elements are in their permuted positions. Rather than doing this, we could use a bit-vector to indicate that same information. However, if we allocate the bit vector each time that would be expensive.
Instead, we could clear the bit vector each time by resetting it to zero. However, that incurs an additional O(N) cost per function use.
Rather, we can store a "version" value in a vector and increment this on each function use. This gives us O(1) access, O(1) clear, and an amortized allocation cost. This works similarly to a persistent data structure. The downside is that if we use an ordering function too often the version counter needs to be reset, though the O(N) cost of doing so is amortized.
This raises the question: what is the optimal data type for the version vector? A bit-vector maximizes cache utilization but requires a full O(N) reset after each use. A 64-bit data type probably never needs to be reset, but has poor cache utilization. Experimenting is the best way to figure this out.
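The version-counter idea can be isolated into a small helper. This is a sketch under assumptions: the class name VersionedVisited and its member names are hypothetical, and a 32-bit stamp is chosen arbitrarily; the full code further down stores the counter in the first element of the vector instead of a separate member.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// A "visited" set that is cleared in O(1) by bumping a version number.
class VersionedVisited {
public:
    // Start a new pass over n elements; everything counts as unvisited again.
    void begin_pass(std::size_t n) {
        if (stamp_.size() < n) stamp_.resize(n, 0);  // grows only, so amortized
        if (version_ == std::numeric_limits<std::uint32_t>::max()) {
            // Rare O(N) reset when the counter would overflow.
            std::fill(stamp_.begin(), stamp_.end(), 0);
            version_ = 0;
        }
        ++version_;  // the version is always >= 1, so stamp 0 means "never visited"
    }
    bool visited(std::size_t i) const { return stamp_[i] == version_; }
    void mark(std::size_t i) { stamp_[i] = version_; }

private:
    std::vector<std::uint32_t> stamp_;
    std::uint32_t version_ = 0;
};
```

A narrower stamp type improves cache utilization but hits the O(N) reset more often, which is the trade-off described above.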
Two types of permutations
We can view an ordering vector as having two senses: forward and backward. In the forward sense, the vector tells us where elements go to. In the backward sense, the vector tells us where elements are coming from. Since the ordering vector is implicitly a linked list, the backward sense requires O(N) additional space, but, again, we can amortize the allocation cost. Applying the two senses sequentially brings us back to our original ordering.
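To make the two senses concrete, here is a deliberately simple, copying sketch (not the in-place algorithm; the names apply_forward and apply_backward are made up for illustration, and ordering is assumed to be a valid permutation of 0..N-1):

```cpp
#include <cstddef>
#include <vector>

// Forward sense: element i moves TO position ordering[i].
template <class T>
std::vector<T> apply_forward(const std::vector<T>& v,
                             const std::vector<std::size_t>& ordering) {
    std::vector<T> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[ordering[i]] = v[i];
    return out;
}

// Backward sense: position i is filled FROM position ordering[i].
template <class T>
std::vector<T> apply_backward(const std::vector<T>& v,
                              const std::vector<std::size_t>& ordering) {
    std::vector<T> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[ordering[i]];
    return out;
}
```

With v = {a, b, c} and ordering = {1, 2, 0}, the forward sense gives {c, a, b} and the backward sense gives {b, c, a}; applying one after the other restores {a, b, c}.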
Performance
Running single-threaded on my "Intel(R) Xeon(R) E-2176M CPU @ 2.70GHz", the following code takes about 0.81ms per reordering for sequences 32,767 elements long.
Code
Fully commented code for both senses with tests:
#include <algorithm>
#include <cassert>
#include <limits>
#include <random>
#include <stack>
#include <stdexcept>
#include <vector>
///@brief Reorder a vector by moving its elements to indices indicated by another
/// vector. Takes O(N) time and O(N) space. Allocations are amortized.
///
///@param[in,out] values Vector to be reordered
///@param[in] ordering A permutation of the vector
///@param[in,out] visited A black-box vector to be reused between calls and
/// shared with `backward_reorder()`
template<class ValueType, class OrderingType, class ProgressType>
void forward_reorder(
std::vector<ValueType> &values,
const std::vector<OrderingType> &ordering,
std::vector<ProgressType> &visited
){
if(ordering.size()!=values.size()){
throw std::runtime_error("ordering and values must be the same size!");
}
//Size the visited vector appropriately. Since vectors don't shrink, this will
//shortly become large enough to handle most of the inputs. The vector is 1
//larger than necessary because the first element is special.
if(visited.empty() || visited.size()-1<values.size())
visited.resize(values.size()+1);
//If the visitation indicator becomes too large, we reset everything. This is
//O(N) expensive, but unlikely to occur in most use cases if an appropriate
//data type is chosen for the visited vector. For instance, an unsigned 32-bit
//integer provides ~4B uses before it needs to be reset. We subtract one below
//to avoid having to think too much about off-by-one errors. Note that
//choosing the biggest data type possible is not necessarily a good idea!
//Smaller data types will have better cache utilization.
if(visited.at(0)==std::numeric_limits<ProgressType>::max()-1)
std::fill(visited.begin(), visited.end(), 0);
//We increment the stored visited indicator and make a note of the result. Any
//value in the visited vector less than `visited_indicator` has not been
//visited.
const auto visited_indicator = ++visited.at(0);
//For doing an early exit if we get everything in place
auto remaining = values.size();
//For all elements that need to be placed
for(size_t s=0;s<ordering.size() && remaining>0;s++){
assert(visited[s+1]<=visited_indicator);
//Ignore already-visited elements
if(visited[s+1]==visited_indicator)
continue;
//Don't rearrange if we don't have to
if(s==(size_t)ordering[s])
continue;
//Follow this cycle, putting elements in their places until we get back
//around. Use move semantics for speed.
auto temp = std::move(values[s]);
auto i = s;
for(;s!=(size_t)ordering[i];i=ordering[i],--remaining){
std::swap(temp, values[ordering[i]]);
visited[i+1] = visited_indicator;
}
std::swap(temp, values[s]);
visited[i+1] = visited_indicator;
}
}
///@brief Reorder a vector by moving its elements to indices indicated by another
/// vector. Takes O(2N) time and O(2N) space. Allocations are amortized.
///
///@param[in,out] values Vector to be reordered
///@param[in] ordering A permutation of the vector
///@param[in,out] visited A black-box vector to be reused between calls and
/// shared with `forward_reorder()`
template<class ValueType, class OrderingType, class ProgressType>
void backward_reorder(
std::vector<ValueType> &values,
const std::vector<OrderingType> &ordering,
std::vector<ProgressType> &visited
){
//The orderings form a linked list. We need O(N) memory to reverse a linked
//list. We use `thread_local` so that the function is reentrant.
thread_local std::stack<OrderingType> stack;
if(ordering.size()!=values.size()){
throw std::runtime_error("ordering and values must be the same size!");
}
//Size the visited vector appropriately. Since vectors don't shrink, this will
//shortly become large enough to handle most of the inputs. The vector is 1
//larger than necessary because the first element is special.
if(visited.empty() || visited.size()-1<values.size())
visited.resize(values.size()+1);
//If the visitation indicator becomes too large, we reset everything. This is
//O(N) expensive, but unlikely to occur in most use cases if an appropriate
//data type is chosen for the visited vector. For instance, an unsigned 32-bit
//integer provides ~4B uses before it needs to be reset. We subtract one below
//to avoid having to think too much about off-by-one errors. Note that
//choosing the biggest data type possible is not necessarily a good idea!
//Smaller data types will have better cache utilization.
if(visited.at(0)==std::numeric_limits<ProgressType>::max()-1)
std::fill(visited.begin(), visited.end(), 0);
//We increment the stored visited indicator and make a note of the result. Any
//value in the visited vector less than `visited_indicator` has not been
//visited.
const auto visited_indicator = ++visited.at(0);
//For doing an early exit if we get everything in place
auto remaining = values.size();
//For all elements that need to be placed
for(size_t s=0;s<ordering.size() && remaining>0;s++){
assert(visited[s+1]<=visited_indicator);
//Ignore already-visited elements
if(visited[s+1]==visited_indicator)
continue;
//Don't rearrange if we don't have to
if(s==(size_t)ordering[s])
continue;
//The orderings form a linked list. We need to follow that list to its end
//in order to reverse it.
stack.emplace(s);
for(auto i=s;s!=(size_t)ordering[i];i=ordering[i]){
stack.emplace(ordering[i]);
}
//Now we follow the linked list in reverse to its beginning, putting
//elements in their places. Use move semantics for speed.
auto temp = std::move(values[s]);
while(!stack.empty()){
std::swap(temp, values[stack.top()]);
visited[stack.top()+1] = visited_indicator;
stack.pop();
--remaining;
}
visited[s+1] = visited_indicator;
}
}
int main(){
std::mt19937 gen;
std::uniform_int_distribution<short> value_dist(0,std::numeric_limits<short>::max());
std::uniform_int_distribution<short> len_dist (0,std::numeric_limits<short>::max());
std::vector<short> data;
std::vector<short> ordering;
std::vector<short> original;
std::vector<size_t> progress;
for(int i=0;i<1000;i++){
const int len = len_dist(gen);
data.clear();
ordering.clear();
for(int i=0;i<len;i++){
data.push_back(value_dist(gen));
ordering.push_back(i);
}
original = data;
std::shuffle(ordering.begin(), ordering.end(), gen);
forward_reorder(data, ordering, progress);
assert(original!=data);
backward_reorder(data, ordering, progress);
assert(original==data);
}
}

Never prematurely optimize. Measure, and then determine what you need to optimize and where. Otherwise you can end up with complex code that is hard to maintain and bug-prone in many places where performance is not an issue.
With that being said, do not prematurely pessimize either. Without changing the approach you can remove half of your copies:
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
std::vector<T> tmp; // create an empty vector
tmp.reserve( data.size() ); // ensure memory and avoid moves in the vector
for ( std::size_t i = 0; i < order.size(); ++i ) {
tmp.push_back( data[order[i]] );
}
data.swap( tmp ); // swap vector contents
}
This code creates an empty (but big enough) vector into which a single copy is performed in order. At the end, the ordered and original vectors are swapped. This reduces the number of copies, but still requires extra memory.
If you want to perform the moves in-place, a simple algorithm could be:
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
for ( std::size_t i = 0; i < order.size(); ++i ) {
std::size_t original = order[i];
while ( original < i ) { // that position was already touched: follow where its element went
original = order[original];
}
std::swap( data[i], data[original] );
}
}
This code should be checked and debugged. In plain words the algorithm in each step positions the element at the i-th position. First we determine where the original element for that position is now placed in the data vector. If the original position has already been touched by the algorithm (it is before the i-th position) then the original element was swapped to order[original] position. Then again, that element can already have been moved...
This algorithm is roughly O(N^2) in the number of integer operations and is thus theoretically slower than the initial O(N) algorithm. But it can come out ahead if the N swap operations it performs cost less than the N copy operations, or if you are really constrained by memory footprint.

It's an interesting intellectual exercise to do the reorder with O(1) space requirement but in 99.9% of the cases the simpler answer will perform to your needs:
template<typename T>
void permute(vector<T>& values, const vector<size_t>& indices)
{
vector<T> out;
out.reserve(indices.size());
for(size_t index: indices)
{
assert(0 <= index && index < values.size());
out.push_back(std::move(values[index]));
}
values = std::move(out);
}
Beyond memory requirements, the only way I can think of this being slower would be due to the memory of out being in a different cache page than that of values and indices.

You could do it recursively, I guess - something like this (unchecked, but it gives the idea):
// Recursive function
template<typename T>
void REORDER(int oldPosition, vector<T>& vA,
const vector<int>& vecNewOrder, vector<bool>& vecVisited)
{
// Keep a record of the value currently in that position,
// as well as the position we're moving it to.
// But don't move it yet, or we'll overwrite whatever's at the next
// position. Instead, we first move what's at the next position.
// To guard against loops, we look at vecVisited, and set it to true
// once we've visited a position.
T oldVal = vA[oldPosition];
int newPos = vecNewOrder[oldPosition];
if (vecVisited[oldPosition])
{
// We've hit a loop. Set it and return.
vA[newPos] = oldVal;
return;
}
// Guard against loops:
vecVisited[oldPosition] = true;
// Recursively re-order the next item in the sequence.
REORDER(newPos, vA, vecNewOrder, vecVisited);
// And, after we've set this new value,
vA[newPos] = oldVal;
}
// The "main" function
template<typename T>
void REORDER(vector<T>& vA, const vector<int>& newOrder)
{
// Initialise vecVisited with false values
vector<bool> vecVisited(vA.size(), false);
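for (int x = 0; x < vA.size(); x++)
{
// Skip positions already placed as part of an earlier cycle;
// re-entering a visited position would overwrite a placed value.
if (!vecVisited[x])
REORDER(x, vA, newOrder, vecVisited);
}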
}
Of course, you do have the overhead of vecVisited. Thoughts on this approach, anyone?

Iterating through the vector is an O(n) operation. It's sort of hard to beat that.

Your code is broken. You cannot assign to vA and you need to use template parameters.
vector<char> REORDER(const vector<char>& vA, const vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
vector<char> vCopy(vA.size());
for(int i = 0; i < vOrder.size(); ++i)
vCopy[i] = vA[ vOrder[i] ];
return vCopy;
}
The above is slightly more efficient.

It is not clear by the title and the question if the vector should be ordered with the same steps it takes to order vOrder or if vOrder already contains the indexes of the desired order.
The first interpretation has already a satisfying answer (see chmike and Potatoswatter), I add some thoughts about the latter.
If the creation and/or copy cost of object T is relevant:
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> & order )
{
std::size_t i,j,k;
for(i = 0; i < order.size() - 1; ++i) {
j = order[i];
if(j != i) {
for(k = i + 1; order[k] != i; ++k);
std::swap(order[i],order[k]);
std::swap(data[i],data[j]);
}
}
}
If the creation cost of your object is small and memory is not a concern (see dribeas):
template <typename T>
void reorder( std::vector<T> & data, std::vector<std::size_t> const & order )
{
std::vector<T> tmp; // create an empty vector
tmp.reserve( data.size() ); // ensure memory and avoid moves in the vector
for ( std::size_t i = 0; i < order.size(); ++i ) {
tmp.push_back( data[order[i]] );
}
data.swap( tmp ); // swap vector contents
}
Note that the two pieces of code in dribeas's answer do different things.

I was trying to use @Potatoswatter's solution to sort multiple vectors by a third one and got really confused by the output of the above functions when given a vector of indices produced by Armadillo's sort_index. To convert the output of sort_index (the arma_inds vector below) into a vector that can be used with @Potatoswatter's solution (new_inds below), you can do the following:
vector<int> new_inds(arma_inds.size());
for (int i = 0; i < new_inds.size(); i++) new_inds[arma_inds[i]] = i;

I came up with this solution which has the space complexity of O(max_val - min_val + 1), but it can be integrated with std::sort and benefits from std::sort's O(n log n) decent time complexity.
std::vector<int32_t> dense_vec = {1, 2, 3};
std::vector<int32_t> order = {1, 0, 2};
int32_t max_val = *std::max_element(dense_vec.begin(), dense_vec.end());
std::vector<int32_t> sparse_vec(max_val + 1);
int32_t i = 0;
for(int32_t j: dense_vec)
{
sparse_vec[j] = order[i];
i++;
}
std::sort(dense_vec.begin(), dense_vec.end(),
[&sparse_vec](int32_t i1, int32_t i2) {return sparse_vec[i1] < sparse_vec[i2];});
The following assumptions were made while writing this code:
Vector values start from zero.
Vector does not contain repeated values.
We have enough memory to sacrifice in order to use std::sort

This should avoid copying the vector:
void REORDER(vector<char>& vA, const vector<size_t>& vOrder)
{
assert(vA.size() == vOrder.size());
for(int i = 0; i < vOrder.size(); ++i)
if (i < vOrder[i])
swap(vA[i], vA[vOrder[i]]);
}
