I'm working on a path planning program where I have a priority queue 'U':
using HeapKey = pair<float, float>;
vector<pair<HeapKey, unsigned int>> U;
I order and maintain my priority queue as a binary min-heap (i.e. the cheapest node first in the queue), using std::greater as the comparison function to get a min-heap (maybe not important). While the program is executing and planning a path, it adds nodes to 'U' with push_back() followed by push_heap() to get each node into the correct position, and everything works fine there...
However, the algorithm I'm using sometimes calls for updating a node already present in 'U' with new values. It does this by removing the node from 'U' (I find it with find_if() and remove it with erase(), if that's important) and then calling the insert function again (push_back() followed by push_heap()) so the node has its updated values.
This has proved to be an unexpected problem for me. I'm no expert, but as far as I can tell, since the node is removed from somewhere INSIDE 'U', the removal breaks the heap order. I've been able to get the program to work by calling make_heap() after the node is removed. However, this solution brought another issue: the program now takes much longer to complete, and the larger the map (and thus the heap), the worse it gets, presumably because make_heap() re-organizes the entire heap every time I update a node, slowing down the overall planning.
Sadly I don't have time to restructure my program and get new results; I can only make use of simple solutions I can implement quickly. I'm mostly here to learn, and perhaps to collect suggestions about how to efficiently maintain a heap/priority queue when you aren't removing just the first or last element but possibly one in the middle. Reducing the planning time is the only thing I'm missing.
Attempt at minimal reproducible example without going into the actual algorithm and such:
#include <iostream>
#include <vector>
#include <algorithm>
#include <functional> // std::greater
#include <cstdlib>    // rand()

using namespace std;

using Cost = float;
using HeapKey = pair<Cost, Cost>;

pair<Cost, Cost> PAIR1;
vector<pair<HeapKey, unsigned int>> U;
using KeyCompare = std::greater<std::pair<HeapKey, unsigned int>>;

int in_U[20];

ostream& operator<<(ostream& os, pair<Cost, Cost> const& p) {
    return os << "<" << p.first << ", " << p.second << ">";
}

bool has_neighbor(unsigned int id) {
    if ((in_U[id + 1]) && (in_U[id - 1])) {
        return true;
    }
    return false;
}

void insert(unsigned int id, HeapKey k) {
    U.push_back({ k, id });
    push_heap(U.begin(), U.end(), KeyCompare());
    in_U[id]++;
}

void update(unsigned int id) {
    Cost x = 0; // initialized so the goal case below doesn't read garbage
    Cost y = 0;
    if (id != 21 && id < U.size()) { // let's say 21 is the goal; the size guard keeps this toy example in bounds
        x = U[id].first.first;
        y = U[id].first.second;
    }
    if (in_U[id]) {
        auto it = find_if(U.begin(), U.end(), [=](auto p) { return p.second == id; });
        U.erase(it);
        make_heap(U.begin(), U.end(), KeyCompare()); // <-- the expensive repair step
        in_U[id]--;
    }
    int r1 = rand() % 10 + 1;
    int r2 = rand() % 10 + 1;
    if (x != y) {
        insert(id, { x + r1, y + r2 });
    }
}

int main() {
    U.push_back({ {8, 2}, 1 });   in_U[1]++;
    U.push_back({ {5, 1}, 2 });   in_U[2]++;
    U.push_back({ {6, 1}, 3 });   in_U[3]++;
    U.push_back({ {6, 5}, 4 });   in_U[4]++;
    U.push_back({ {2, 3}, 5 });   in_U[5]++;
    U.push_back({ {2, 9}, 6 });   in_U[6]++;
    U.push_back({ {9, 2}, 7 });   in_U[7]++;
    U.push_back({ {4, 7}, 8 });   in_U[8]++;
    U.push_back({ {11, 4}, 9 });  in_U[9]++;
    U.push_back({ {2, 2}, 10 });  in_U[10]++;
    U.push_back({ {1, 2}, 11 });  in_U[11]++;
    U.push_back({ {7, 2}, 12 });  in_U[12]++;
    make_heap(U.begin(), U.end(), KeyCompare());

    PAIR1.first = 14;
    PAIR1.second = 6;

    while (!U.empty() && U.front().first < PAIR1) { // empty-check added so the toy loop can't pop a drained heap
        cout << "Is_heap?: " << is_heap(U.begin(), U.end(), KeyCompare()) << endl;
        cout << "U: ";
        for (auto p : U) {
            cout << p.second << p.first << " - ";
        }
        cout << endl;

        auto uid = U.front().second;
        pop_heap(U.begin(), U.end(), KeyCompare());
        U.pop_back();

        if (has_neighbor(uid)) {
            update(uid - 1);
            update(uid + 1);
        }
    }
    //getchar();
}
Yes, the algorithm is relatively simple. Note that when considering an item at index i, its "parent" in a heap is at index (i-1)/2, and its children are at indices i*2+1 and i*2+2.
Swap item_to_pop with the last item in the range. This moves that item to the desired (last) position, but inserts a "small" item into the middle of the heap. This needs to be fixed.
If the "small" item at the item_to_pop position is larger than its current parent, then swap it with its parent. Repeat until that item is either no longer larger than its current parent or is the new root. Then we're done. Notably, this is the same algorithm as push_heap, except with the shortcut that we start in the middle instead of at the end.
If the "small" item at the item_to_pop position is smaller than either current child, then swap it with the larger child. Repeat until that item is larger than all of its current children (note that near the end it might have only one child or none). Then we're done. Notably, this is the same algorithm as pop_heap, except with the shortcut that we start in the middle instead of at the top.
This algorithm will do at most log2(n)+1 swaps and log2(n)*2+1 comparisons, making it almost as fast as pop_heap and push_heap. Which isn't really surprising, since it's the same algorithm.
#include <algorithm>  // std::is_heap
#include <cassert>
#include <cstddef>
#include <functional> // std::less
#include <utility>    // std::swap

template< class RandomIt, class Compare >
constexpr void pop_mid_heap(RandomIt first, RandomIt last, RandomIt item_to_pop, Compare comp) {
    assert(std::is_heap(first, last, comp)); // this is compiled out of release builds
    assert(first <= item_to_pop);
    assert(item_to_pop < last);
    using std::swap;
    std::size_t new_size = last - first - 1;
    if (new_size == 0) return;
    // swap the end of the range and item_to_pop, so that item_to_pop is at the end
    swap(*item_to_pop, *--last);
    if (new_size == 1) return;
    // If item_to_pop is bigger than its parent, then swap with the parent
    bool moved_up = false;
    RandomIt swap_itr;
    while (true) {
        std::size_t offset = item_to_pop - first;
        if (offset == 0) break; // item_to_pop is at root: exit loop
        swap_itr = first + (offset - 1) / 2;
        if (comp(*item_to_pop, *swap_itr))
            break; // item_to_pop smaller than its parent: exit loop
        swap(*item_to_pop, *swap_itr); // swap with parent and repeat
        item_to_pop = swap_itr;
        moved_up = true;
    }
    if (moved_up) return; // if we moved the item up, then the heap is complete: exit
    // If the biggest child is bigger than item_to_pop, then swap with that child
    while (true) {
        std::size_t offset = item_to_pop - first;
        std::size_t swap_idx = offset * 2 + 1;
        if (swap_idx >= new_size) break; // no children: exit loop
        swap_itr = first + swap_idx;
        if (swap_idx + 1 < new_size && comp(*swap_itr, *(swap_itr + 1))) // if the right child exists and is bigger, swap that instead
            ++swap_itr;
        if (!comp(*item_to_pop, *swap_itr)) break; // item_to_pop not smaller than its biggest child: exit loop
        swap(*item_to_pop, *swap_itr); // swap with the bigger child and repeat
        item_to_pop = swap_itr;
    }
}

template< class RandomIt >
constexpr void pop_mid_heap(RandomIt first, RandomIt last, RandomIt item_to_pop) {
    pop_mid_heap(first, last, item_to_pop, std::less<>{});
}
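Here's a quick usage sketch of my own (not the linked demo below): start from a valid max-heap, pop a middle element, then shrink the vector:
#include <algorithm> // std::find
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{9, 5, 6, 1, 3}; // already a valid max-heap
    auto mid = std::find(v.begin(), v.end(), 5);
    pop_mid_heap(v.begin(), v.end(), mid);
    v.pop_back(); // the popped element now sits at the back
    for (int x : v)
        std::cout << x << ' '; // prints "9 3 6 1", still a valid heap
    std::cout << '\n';
}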
https://ideone.com/zNW7h7
Theoretically one can optimize out the "or is the new root" check in the push_heap part, but the checks needed to detect that case add complexity that doesn't seem worth it.
IMO, this is useful and should be part of the C++ standard library.
In general it's expensive to update a node in the middle of a binary heap not because the update operation is expensive but because finding the node is an O(n) operation. If you know where the node is in the heap, updating its priority is very easy. My answer at https://stackoverflow.com/a/8706363/56778 shows how to delete a node. Updating a node's priority is similar: rather than replacing the node with the last one in the heap, you just sift the node up or down as required.
If you want the ability to find a node quickly, then you have to build an indexed heap. Basically, you have a dictionary entry for each node. The dictionary key is the node's ID (or whatever you use to identify it), and the value is the node's index in the binary heap. You modify the heap code so that it updates the dictionary entry whenever the node is moved around in the heap. It makes the heap a little bit slower (by a constant factor), but makes finding an arbitrary node an O(1) operation.
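A rough sketch of that bookkeeping (my code, not the answerer's): a min-heap of (priority, id) pairs plus an id -> index map that every swap keeps in sync:
// Minimal indexed min-heap sketch (assumed code): `pos` maps a node id to
// its index in `heap` and is updated on every swap, so lookup stays O(1).
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

struct IndexedHeap {
    std::vector<std::pair<float, int>> heap; // (priority, id), min-heap order
    std::unordered_map<int, std::size_t> pos; // id -> index in heap

    void swap_nodes(std::size_t a, std::size_t b) {
        std::swap(heap[a], heap[b]);
        pos[heap[a].second] = a;
        pos[heap[b].second] = b;
    }
    void sift_up(std::size_t i) {
        while (i > 0 && heap[(i - 1) / 2].first > heap[i].first) {
            swap_nodes(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }
    void sift_down(std::size_t i) {
        for (;;) {
            std::size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < heap.size() && heap[l].first < heap[smallest].first) smallest = l;
            if (r < heap.size() && heap[r].first < heap[smallest].first) smallest = r;
            if (smallest == i) return;
            swap_nodes(i, smallest);
            i = smallest;
        }
    }
    // O(1) lookup, then an O(log n) sift in whichever direction is needed
    void update(int id, float new_priority) {
        std::size_t i = pos.at(id);
        float old = heap[i].first;
        heap[i].first = new_priority;
        if (new_priority < old) sift_up(i); else sift_down(i);
    }
};
(Insertion and pop are omitted for brevity; they are the usual heap operations routed through swap_nodes.)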
Or, you can replace the binary heap with a pairing heap, a skip list, or any of the other "heap" types that work with node pointers. My experience has been that although the theoretical performance of those two isn't as good as that of a Fibonacci heap, the real-world performance is much better.
With either of those it's a whole lot easier to maintain an index: you just add a node reference to it when you add a node to the heap, and remove the reference when the node is removed from the heap. Both of those heap types are easy to build, and performance will be about the same as for a binary heap, although they will use somewhat more memory. From experience I'll say that a pairing heap is easier to build than a skip list, but the skip list is a more generally useful data structure.
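For a feel of why pointer-based heaps make handles easy, here is a minimal max-pairing-heap sketch (my code, raw pointers and no ownership handling, purely illustrative): the PNode* you hold at insert time is the handle, and it stays valid because nodes never move.
#include <utility>
#include <vector>

struct PNode {
    int key;
    PNode* child = nullptr;   // leftmost child
    PNode* sibling = nullptr; // next sibling to the right
};

// meld two heaps: the larger root becomes the parent (max-heap)
PNode* meld(PNode* a, PNode* b) {
    if (!a) return b;
    if (!b) return a;
    if (a->key < b->key) std::swap(a, b);
    b->sibling = a->child; // b becomes a's leftmost child
    a->child = b;
    return a;
}

// insert is O(1); 'n' itself is the handle the caller keeps
PNode* insert(PNode* root, PNode* n) { return meld(root, n); }

// delete-max via the classic two-pass pairwise merge;
// caller disposes of the old root (assumes root != nullptr)
PNode* pop_max(PNode* root) {
    std::vector<PNode*> pairs;
    PNode* c = root->child;
    while (c) { // first pass: meld children pairwise, left to right
        PNode* a = c;
        PNode* b = c->sibling;
        c = b ? b->sibling : nullptr;
        a->sibling = nullptr;
        if (b) b->sibling = nullptr;
        pairs.push_back(meld(a, b));
    }
    PNode* merged = nullptr;
    for (auto it = pairs.rbegin(); it != pairs.rend(); ++it) // second pass: right to left
        merged = meld(merged, *it);
    return merged;
}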
I have a vector std::vector<inputInfo> inputList and another vector std::vector<int> selection.
inputInfo is a struct that has some information stored.
The vector selection corresponds to positions inside inputList vector.
I need to remove elements from inputList which correspond to entries in the selection vector.
Here's my attempt at this removal algorithm.
Assuming the selection vector is sorted, and using some (unavoidable?) pointer arithmetic, this can be done in one line:
template <class T>
inline void erase_selected(std::vector<T>& v, const std::vector<int>& selection)
{
    v.resize(std::distance(
        v.begin(),
        std::stable_partition(v.begin(), v.end(),
            [&selection, &v](const T& item) {
                return !std::binary_search(
                    selection.begin(), selection.end(),
                    static_cast<int>(static_cast<const T*>(&item) - &v[0]));
            })));
}
This is based on an idea of Sean Parent (see this C++ Seasoning video) to use std::stable_partition ("stable" preserves the relative order of the elements) to move all selected items to the end of the array.
The line with pointer arithmetic
static_cast<int>(static_cast<const T*>(&item) - &v[0])
can, in principle, be replaced with the index-free STL expression
std::distance(std::begin(v), std::find(v.begin(), v.end(), item))
but this way we have to spend O(n) in std::find.
The shortest way to remove non-contiguous elements:
template <class T> void erase_selected(std::vector<T>& v, const std::vector<int>& selection)
{
    std::vector<int> sorted_sel = selection;
    std::sort(sorted_sel.begin(), sorted_sel.end());

    // 1) Define the checker lambda
    // 'filter' is called only once for every element,
    // and all the calls respect the original order of the array.
    // We manually keep track of the index of the current item
    // so we can look it up in the 'sorted_sel' array.
    int itemIndex = 0;
    auto filter = [&itemIndex, &sorted_sel](const T& item) {
        return !std::binary_search(
            sorted_sel.begin(),
            sorted_sel.end(),
            itemIndex++);
    };

    // 2) Move all 'not-selected' items to the front
    auto end_of_selected = std::stable_partition(
        v.begin(),
        v.end(),
        filter);

    // 3) Cut off the end of the std::vector
    v.resize(std::distance(v.begin(), end_of_selected));
}
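A quick usage sketch of this in-place version (my example, assuming the signature above):
std::vector<int> v{0, 0, 3, 0, 2, 4, 5, 0, 7};
erase_selected(v, {0, 1, 3, 7}); // v becomes {3, 2, 4, 5, 7}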
Original code & test
If for some reason the code above does not work due to strangely behaving std::stable_partition(), then below is a workaround (wrapping the input array values with 'selected' flags).
I do not assume that the inputInfo structure contains the selected flag, so I wrap all the items in a T_withFlag structure which keeps pointers to the original items.
#include <algorithm>
#include <iostream>
#include <vector>

template <class T>
std::vector<T> erase_selected(const std::vector<T>& v, const std::vector<int>& selection)
{
    std::vector<int> sorted_sel = selection;
    std::sort(sorted_sel.begin(), sorted_sel.end());

    // Packed (data + flag) array
    struct T_withFlag
    {
        T_withFlag(const T* ref = nullptr, bool sel = false) : src(ref), selected(sel) {}
        const T* src;
        bool selected;
    };
    std::vector<T_withFlag> v_with_flags;
    // should be like
    // { {0, true}, {0, true}, {3, false},
    //   {0, true}, {2, false}, {4, false},
    //   {5, false}, {0, true}, {7, false} };
    // for the input data in main()
    v_with_flags.reserve(v.size());

    // No "beautiful" way to iterate a vector
    // and keep track of the element index;
    // we need the index to check if the element is selected.
    // Each check takes O(log n), so the loop is O(n log n).
    int itemIndex = 0;
    for (auto& ii : v)
        v_with_flags.emplace_back(
            T_withFlag(&ii,
                std::binary_search(
                    sorted_sel.begin(),
                    sorted_sel.end(),
                    itemIndex++)
            ));

    // I. (The bulk of the) removal algorithm
    // a) Define the checker lambda
    auto filter = [](const T_withFlag& ii) { return !ii.selected; };
    // b) Move every item marked as 'not-selected'
    //    to the front of the array
    auto end_of_selected = std::stable_partition(
        v_with_flags.begin(),
        v_with_flags.end(),
        filter);
    // c) Cut off the end of the std::vector
    v_with_flags.resize(
        std::distance(v_with_flags.begin(), end_of_selected));

    // II. Output
    std::vector<T> v_out(v_with_flags.size());
    std::transform(
        // since C++17 you can parallelize this
        // with 'std::execution::par' as the first parameter
        v_with_flags.begin(),
        v_with_flags.end(),
        v_out.begin(),
        [](const T_withFlag& ii) { return *(ii.src); });
    return v_out;
}
The test function is
int main()
{
    // Obviously, I do not know the structure
    // used by the topic starter,
    // so I just declare a small structure for a test.
    // 'erase_selected' does not assume
    // this structure to be 'light-weight'.
    struct inputInfo
    {
        int data;
        inputInfo(int v = 0) : data(v) {}
    };

    // Source selection indices
    std::vector<int> selection{ 0, 1, 3, 7 };
    // Source data array
    std::vector<inputInfo> v{ 0, 0, 3, 0, 2, 4, 5, 0, 7 };
    // Output array
    auto v_out = erase_selected(v, selection);

    for (auto ii : v_out)
        std::cout << ii.data << ' ';
    std::cout << std::endl;
}
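For the inputs above, the elements at indices 0, 1, 3 and 7 are removed, so this should print:
3 2 4 5 7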
Say I have a vector of integers like this std::vector<int> _data;
I know that if I want to remove multiple items from _data, then I can simply call
_data.erase( std::remove_if( _data.begin(), _data.end(), [condition] ), _data.end() );
Which is much faster than erasing the elements one at a time, as less movement of data is required within the vector. I'm wondering if there's something similar for insertions.
For example, if I have the following pairs
auto pair1 = std::make_pair( _data.begin() + 5, 5 );
auto pair2 = std::make_pair( _data.begin() + 12, 12 );
Can I insert both of these in one iteration using some existing std function? I know I can do something like:
_data.insert( pair2.first, pair2.second );
_data.insert( pair1.first, pair1.second );
But this is (very) slow for large vectors (talking 100,000+ elements).
EDIT: Basically, I have a custom set (and map) which uses a vector as the underlying container. I know I could just use std::set or std::map, but the number of traversals I do far outweighs the insertions/removals. Switching from a set and map to this custom set/map already cut run-time by 20%. Currently, though, insertions take approximately 10% of the remaining run time, so reducing that is important.
The order is also required, unfortunately. Where possible I use the unordered_ versions, but in some places the order does matter.
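(For context, such a vector-backed set is usually something like the sketch below; this is my guess at its shape, not the asker's actual code.)
// Rough sketch (assumed) of a "flat set": sorted vector storage,
// O(log n) lookup, O(n) insert, but very fast ordered traversal.
#include <algorithm>
#include <vector>

template <class T>
class FlatSet {
    std::vector<T> data_; // kept sorted at all times
public:
    bool contains(const T& x) const {
        return std::binary_search(data_.begin(), data_.end(), x);
    }
    void insert(const T& x) {
        auto it = std::lower_bound(data_.begin(), data_.end(), x);
        if (it == data_.end() || x < *it) // skip duplicates
            data_.insert(it, x);          // O(n): this is the cost being discussed
    }
    auto begin() const { return data_.begin(); }
    auto end() const { return data_.end(); }
};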
One way is to create another vector with capacity equal to the original size plus the number of elements being inserted, and then do an insert loop with no reallocations, for O(N) complexity:
#include <cstddef>
#include <initializer_list>
#include <iostream>
#include <utility>
#include <vector>

template<class T>
std::vector<T> insert_elements(std::vector<T> const& v, std::initializer_list<std::pair<std::size_t, T>> new_elements) {
    std::vector<T> u;
    u.reserve(v.size() + new_elements.size());
    auto src = v.begin();
    std::size_t copied = 0;
    for (auto const& element : new_elements) {
        auto to_copy = element.first - copied;
        auto src_end = src + to_copy;
        u.insert(u.end(), src, src_end); // copy the block before this insertion point
        src = src_end;
        copied += to_copy;
        u.push_back(element.second);     // then the new element
    }
    u.insert(u.end(), src, v.end());     // copy the remaining tail
    return u;
}

int main() {
    std::vector<int> v{1, 3, 5};
    for (auto e : insert_elements(v, {{1, 2}, {2, 4}}))
        std::cout << e << ' ';
    std::cout << '\n';
}
Output:
1 2 3 4 5
Ok, we need some assumptions. Let old_end be a reverse iterator to the last element of your vector. Assume that your _data has been resized to exactly fit both its current content and what you want to insert. Assume that inp is a container of std::pair containing the data to be inserted, ordered in reverse (so first the element that goes at the hindmost position, and so on). Then we can do:
std::merge(old_end, _data.rend(), inp.begin(), inp.end(), _data.rend(),
    [&, i = inp.size() - 1](const T& t, const std::pair<Iter, T>& p) mutable {
        if (std::distance(_data.begin(), p.first) == i) {
            --i;
            return false;
        }
        return true;
    });
But I think that is not clearer than using a good old for loop. The problem with the STL algorithms is that the predicates work on values and not on iterators; that's a bit annoying for this problem.
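For reference, here is roughly what that "good old for" could look like (my sketch, assuming insertions arrive as pre-insertion (index, value) pairs sorted in ascending order): it works from the back and moves every element exactly once.
#include <cstddef>
#include <utility>
#include <vector>

// Sketch (assumed): in-place multi-insert; positions are pre-insertion
// indices sorted ascending. Each original element is moved exactly once.
template<class T>
void insert_in_place(std::vector<T>& v, const std::vector<std::pair<std::size_t, T>>& ins) {
    std::size_t old_size = v.size();
    v.resize(v.size() + ins.size()); // relies on T being default-constructible
    std::size_t write = v.size();    // one past the next slot to fill
    std::size_t read = old_size;     // one past the next source element
    for (auto it = ins.rbegin(); it != ins.rend(); ++it) {
        // shift down the block of elements that belongs after this insertion point
        while (read > it->first)
            v[--write] = std::move(v[--read]);
        v[--write] = it->second;
    }
}
With v = {1, 3, 5} and ins = {{1, 2}, {2, 4}} this also produces 1 2 3 4 5.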
Here's my take:
#include <cstddef>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

template<class Key, class Value>
class LinearSet
{
public:
    using Node = std::pair<Key, Value>;

    template<class F>
    void insert_at_multiple(F&& f)
    {
        std::queue<Node> queue;
        std::size_t index = 0;
        for (auto it = _kvps.begin(); it != _kvps.end(); ++it)
        {
            // The container size is left untouched here, no iterator invalidation.
            if (std::optional<Node> toInsert = f(index))
            {
                queue.push(*it);
                *it = std::move(*toInsert);
            }
            else
            {
                ++index;
                // Replace the current node with a queued one.
                if (!queue.empty())
                {
                    queue.push(std::move(*it));
                    *it = std::move(queue.front());
                    queue.pop();
                }
            }
        }
        // We now have as many displaced items in the queue as were inserted;
        // add them to the end.
        while (!queue.empty())
        {
            _kvps.emplace_back(std::move(queue.front()));
            queue.pop();
        }
    }

private:
    std::vector<Node> _kvps;
};
https://godbolt.org/z/EStKgQ
This is a linear-time algorithm that doesn't need to know the number of inserted elements a priori. For each index, it asks for an element to insert there. If it gets one, it pushes the corresponding existing vector element to a queue and replaces it with the new one. Otherwise, it moves the current item to the back of the queue and puts the item at the front of the queue into the current position (a no-op if no elements have been inserted yet). Note that the vector size is left untouched during all this. Only at the end do we push back all items still in the queue.
Note that the indices we use for determining inserted item locations here are all pre-insertion. I find this a point of potential confusion (and it is a limitation: you can't add an element at the very end with this algorithm. This can be remedied by calling f during the second loop too, which is exactly what the next version does).
Here's a version that allows inserting arbitrarily many elements at the end (and everywhere else). It passes post-insertion indices to the functor!
template<class F>
void insert_at_multiple(F&& f)
{
    std::queue<Node> queue;
    std::size_t index = 0;
    for (auto it = _kvps.begin(); it != _kvps.end(); ++it)
    {
        if (std::optional<Node> toInsert = f(index))
            queue.push(std::move(*toInsert));
        if (!queue.empty())
        {
            queue.push(std::move(*it));
            *it = std::move(queue.front());
            queue.pop();
        }
        ++index;
    }
    // We now have as many displaced items in the queue as were inserted;
    // add them to the end.
    while (!queue.empty())
    {
        if (std::optional<Node> toInsert = f(index))
        {
            queue.push(std::move(*toInsert));
        }
        _kvps.emplace_back(std::move(queue.front()));
        queue.pop();
        ++index;
    }
}
https://godbolt.org/z/DMuCtJ
Again, this leaves potential for confusion over what it means to insert at indices 0 and 1 (do you end up with an original element in between the two? With the first snippet you would; with the second you wouldn't). Can you insert at the same index multiple times? With pre-insertion indices that makes sense; with post-insertion indices it doesn't. You could also write this in terms of passing the current *it (i.e. the key-value pair) to the functor, but that alone seems not too useful...
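To make the calling convention concrete, here is what a functor for the post-insertion-index version might look like (my sketch; it assumes Node is std::pair<int, std::string>, which the snippets above don't pin down):
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical functor: insert {10, "a"} at post-insertion index 1 and
// {20, "b"} at post-insertion index 4, and nothing anywhere else.
auto make_inserter() {
    return [](std::size_t index) -> std::optional<std::pair<int, std::string>> {
        if (index == 1) return std::make_pair(10, std::string("a"));
        if (index == 4) return std::make_pair(20, std::string("b"));
        return std::nullopt;
    };
}
// usage (assumed): LinearSet<int, std::string> set; ... set.insert_at_multiple(make_inserter());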
This is an attempt I made, which inserts in reverse order. I did get rid of the iterators/indices for this; the values to insert are simply given as a sorted vector.
template<class T>
void insert( std::vector<T> &vector, const std::vector<T> &values ) {
    size_t last_index = vector.size() - 1;
    vector.resize( vector.size() + values.size() ); // relies on T being default constructible
    size_t move_position = vector.size() - 1;
    size_t last_value_index = values.size() - 1;
    size_t values_size = values.size();
    bool isLastIndex = false;

    while ( !isLastIndex && values_size ) {
        if ( values[last_value_index] > vector[last_index] ) {
            // note: 'values' is const, so this std::move actually copies
            vector[move_position] = std::move( values[last_value_index--] );
            --values_size;
        } else {
            isLastIndex = last_index == 0;
            vector[move_position] = std::move( vector[last_index--] );
        }
        --move_position;
    }

    if ( isLastIndex && values_size ) {
        while ( values_size ) {
            vector[move_position--] = std::move( values[last_value_index--] );
            --values_size;
        }
    }
}
I tried this with ICC, Clang, and GCC on Godbolt, and vector's insert was faster (for 5 inserted numbers). On my machine with MSVC, same result but less pronounced. I also compared with Maxim's version from his answer. I realize using Godbolt isn't a good method for benchmarking, but I don't have access to the other three compilers on my current machine.
https://godbolt.org/z/vjV2wA
Results from my machine (MSVC):
  My insert:     659us
  Maxim insert:  712us
  Vector insert: 315us
Godbolt's ICC:
  My insert:     470us
  Maxim insert:  139us
  Vector insert: 127us
Godbolt's GCC:
  My insert:     815us
  Maxim insert:   97us
  Vector insert:  97us
Godbolt's Clang:
  My insert:     477us
  Maxim insert:  188us
  Vector insert:  96us
I have an increasing input vector like this {0, 1, 3, 5, 6, 7, 9} and want to cluster the inputs like this {{0, 1}, {3}, {5, 6, 7}, {9}}, i.e. cluster only the integers that are neighbors. The required signature is std::vector<std::vector<int>> solution(const std::vector<int>& input).
I usually advocate for not giving away solutions, but it looks like you're getting bogged down with indices and temporary vectors. Instead, standard iterators and algorithms make this task a breeze:
std::vector<std::vector<int>> solution(std::vector<int> const &input) {
    std::vector<std::vector<int>> clusters;
    // Special-casing to avoid returning {{}} in case of an empty input
    if (input.empty())
        return clusters;
    // Loop-and-a-half, no condition here
    for (auto it = begin(input);;) {
        // Find the last element of the current cluster
        auto const last = std::adjacent_find(
            it, end(input),
            [](int a, int b) { return b - a > 1; }
        );
        if (last == end(input)) {
            // We reached the end: register the last cluster and return
            clusters.emplace_back(it, last);
            return clusters;
        }
        // One past the end of the current cluster
        auto const gap = next(last);
        // Register the cluster
        clusters.emplace_back(it, gap);
        // One past the end of a cluster is the beginning of the next one
        it = gap;
    }
}
See it live on Coliru (lame output formatting free of charge)
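If you just want to see the clusters, a minimal driver of my own (simpler than the Coliru formatting) could be:
// assumes solution() from above is in scope
#include <iostream>
#include <vector>

int main() {
    for (auto const &cluster : solution({0, 1, 3, 5, 6, 7, 9})) {
        for (int x : cluster)
            std::cout << x << ' ';
        std::cout << '\n'; // prints "0 1", "3", "5 6 7", "9" on separate lines
    }
}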
Given:
struct Object {
    int id;
    ...
};

list<Object> objectList;
list<int> idList;
What is the best way to order objectList according to the order of idList?
Example (pseudo code):
INPUT
objectList = {o1, o2, o3};
idList = {2, 3, 1};
ACTION
sort(objectList, idList);
OUTPUT
objectList = {o2, o3, o1};
I searched the documentation but only found methods that order elements by comparing them with each other.
You can store the objects in a std::map, with id as the key. Then traverse idList and get each object out of the map by its id.
std::map<int, Object> objectMap;

for (auto itr = objectList.begin(); itr != objectList.end(); itr++)
{
    objectMap.insert(std::make_pair(itr->id, *itr));
}

std::list<Object> newObjectList;
for (auto itr = idList.begin(); itr != idList.end(); itr++)
{
    // this may fail if idList contains ids which do not appear in objectList
    newObjectList.push_back(objectMap[*itr]);
}
// now newObjectList is ordered as in idList
Here is another variant, which works in O(n log n). This is asymptotically optimal.
#include <list>
#include <vector>
#include <algorithm>
#include <iostream>
#include <cassert>

int main() {
    struct O {
        int id;
    };
    std::list<O> object_list{{1}, {2}, {3}, {4}};
    std::list<int> index_list{4, 2, 3, 1};
    assert(object_list.size() == index_list.size());

    // this vector is optional. It is needed if sizeof(O) is quite large.
    std::vector<std::pair<int, O*>> tmp_vector(object_list.size());

    // this is O(n)
    std::transform(begin(object_list), end(object_list), begin(tmp_vector),
                   [](auto& o) { return std::make_pair(o.id, &o); });

    // this is O(n log n)
    std::sort(begin(tmp_vector), end(tmp_vector),
              [](const auto& o1, const auto& o2) {
                  return o1.first < o2.first;
              });

    // at this point, tmp_vector holds pairs in increasing index order.
    // Note that this may not be a contiguous list.
    std::list<O> tmp_list(object_list.size());

    // this is again O(n log n), because lower_bound is O(log n);
    // we then insert the objects into a new list (you may also use some
    // move semantics here).
    std::transform(begin(index_list), end(index_list), begin(tmp_list),
                   [&tmp_vector](const auto& i) {
                       return *std::lower_bound(begin(tmp_vector), end(tmp_vector),
                                                std::make_pair(i, nullptr),
                                                [](const auto& o1, const auto& o2) {
                                                    return o1.first < o2.first;
                                                })->second;
                   });

    // As we just created a new list, we swap the new list with the old one.
    std::swap(object_list, tmp_list);
    for (const auto& o : object_list)
        std::cout << o.id << std::endl;
}
I assumed that O is quite large and not easily movable; therefore I first create tmp_vector, which consists only of pairs. Then I sort this vector.
Afterwards I can simply go through the index_list and find the matching indices using binary search.
Let me elaborate on why a map is not the best solution, even though you get a quite small piece of code. If you use a map, you need to rebalance your tree after each insertion. This doesn't cost anything asymptotically (because n rebalancings cost the same as sorting once), but the constant is way larger. A "constant map" doesn't make that much sense (except that accessing it may be easier).
I then timed the "simple" map approach against my "not-so-simple" vector approach. I created a randomly shuffled index_list with N entries. And this is what I get (in us):
N        map     vector
1000     90      75
10000    1400    940
100000   24500   15000
1000000  660000  250000
NOTE: This test shows the worst case, as in my setup only index_list was randomly shuffled, while object_list (which is inserted into the map in order) is sorted, so rebalancing shows its full effect. If object_list is somewhat random, performance will be more similar, though the map will always be worse. The vector approach does even better when the object list is completely random.
So already with 1000 entries the difference is quite large. I would therefore strongly vote for a vector-based approach.
Assuming the data is handed to you externally and you don't have a choice of containers:
assert( objectList.size() == idList.size() );

std::vector<std::pair<int,Object>> wrapper( idList.size() );
auto idList_it = std::begin( idList );
auto objectList_it = std::begin( objectList );
for( auto& e: wrapper )
    e = std::make_pair( *idList_it++, *objectList_it++ );

std::sort(
    std::begin(wrapper),
    std::end(wrapper),
    []
    (const std::pair<int,Object>& a, const std::pair<int,Object>& b) -> bool
    { return a.first < b.first; }
);
Then, copy back to the original container.
{
    auto objectList_it = std::begin( objectList );
    for( const auto& e: wrapper )
        *objectList_it++ = e.second; // copy the Object, not the whole pair
}
But this solution is not optimal; I'm sure somebody will come up with a better one.
Edit: The default comparison operator for pairs requires that it be defined for both the first and second members. Thus the easiest way is to provide a lambda that compares only first.
Edit 2: for some reason, this doesn't build when using a std::list for the wrapper (std::sort requires random-access iterators, which std::list does not provide). But it's OK if you use a std::vector (see here).
std::list has a sort member function you can use with a custom comparison functor.
That custom functor has to look up an object's id in the idList and can then use std::distance to calculate the position of the element in idList. It does so for both objects to be compared and returns true if the first position is smaller than the second.
Here is an example:
#include <iostream>
#include <list>
#include <algorithm>
#include <stdexcept>

struct Object
{
    int id;
};

int main()
{
    Object o1 = { 1 };
    Object o2 = { 2 };
    Object o3 = { 3 };
    std::list<Object> objectList = { o1, o2, o3 };
    std::list<int> const idList = { 2, 3, 1 };

    objectList.sort([&](Object const& first, Object const& second)
    {
        auto const id_find_iter1 = std::find(begin(idList), end(idList), first.id);
        auto const id_find_iter2 = std::find(begin(idList), end(idList), second.id);
        if (id_find_iter1 == end(idList) || id_find_iter2 == end(idList))
        {
            throw std::runtime_error("ID not found");
        }
        auto const pos1 = std::distance(begin(idList), id_find_iter1);
        auto const pos2 = std::distance(begin(idList), id_find_iter2);
        return pos1 < pos2;
    });

    for (auto const& object : objectList)
    {
        std::cout << object.id << '\n';
    }
}
It's probably not terribly efficient, but chances are you will never notice. If it still bothers you, you might want to look for a solution with std::vector, which unlike std::list provides random-access iterators. That turns std::distance from O(n) to O(1).
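Along the same lines, a cheaper option (my suggestion, not part of the answer above) is to precompute each id's position once, so every comparison becomes two O(1) hash lookups instead of two O(n) linear searches:
// Sketch (assumed): build an id -> position table once, then sort with O(1) lookups.
#include <unordered_map>

std::unordered_map<int, std::size_t> position;
std::size_t index = 0;
for (int id : idList)
    position[id] = index++;

objectList.sort([&position](Object const& a, Object const& b)
{
    // at() throws std::out_of_range for unknown ids,
    // mirroring the "ID not found" check above
    return position.at(a.id) < position.at(b.id);
});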
I would find it strange to end up in this situation, as I would use pointers instead of the ids. Still, there might be use cases for this.
Note that in all examples below, I assume that the id list contains all ids exactly once.
Writing it yourself
The issue you'd like to solve is creating/sorting a list of objects based on the order of the ids in another list.
The naive way of doing this is simply writing it yourself:
void sortByIdVector(std::list<Object> &list, const std::list<int> &ids)
{
    auto oldList = std::move(list);
    list = std::list<Object>{};
    for (auto id : ids)
    {
        auto itElement = std::find_if(oldList.begin(), oldList.end(), [id](const Object &obj) { return id == obj.id; });
        list.emplace_back(std::move(*itElement));
        oldList.erase(itElement);
    }
}
If you use a sorted vector as input, you can optimize this code to get the best performance out of it. I'm leaving that up to you.
Using sort
For this implementation, I'm going to assume these are std::vector instead of std::list, as vector is the better container for retrieving the index of an element. (With some more code, you can do the same for a list.)
size_t getIntendedIndex(const std::vector<int> &ids, const Object &obj)
{
    auto itElement = std::find_if(ids.begin(), ids.end(), [obj](int id) { return id == obj.id; });
    return itElement - ids.begin();
}

void sortByIdVector(std::list<Object> &list, const std::vector<int> &ids)
{
    list.sort([&ids](const Object &lhs, const Object &rhs){ return getIntendedIndex(ids, lhs) < getIntendedIndex(ids, rhs); });
}
Insertion
Another approach, also more suitable for std::vector, is simply inserting the elements at the right place; this will be more performant than std::sort:
void sortByIdVector(std::vector<Object> &list, const std::vector<int> &ids)
{
    auto oldList = std::move(list);
    list = std::vector<Object>{};
    list.resize(oldList.size());
    for (Object &obj : oldList)
    {
        auto &newLocation = list[getIntendedIndex(ids, obj)];
        newLocation = std::move(obj);
    }
}
objectList.sort([&idList] (const Object& o1, const Object& o2) -> bool
    { return std::find(++std::find(idList.begin(), idList.end(), o1.id),
                       idList.end(), o2.id)
             != idList.end();
    });
The idea is to check whether we find o1.id before o2.id in idList.
We search for o1.id, increment the found position, then search for o2.id from there: if it is found, that implies o1 < o2.
Test
#include <iostream>
#include <string>
#include <list>
#include <algorithm>

using namespace std; // the snippet uses unqualified string/list/cout

struct Object {
    int id;
    string name;
};

int main()
{
    list<Object> objectList {{1, "one_1"}, {2, "two_1"}, {3, "three_1"}, {2, "two_2"}, {1, "one_2"}, {4, "four_1"}, {3, "Three_2"}, {4, "four_2"}};
    list<int> idList {3, 2, 4, 1};

    objectList.sort([&idList] (const Object& o1, const Object& o2) -> bool
        { return std::find(++std::find(idList.begin(), idList.end(), o1.id), idList.end(), o2.id) != idList.end(); });

    for(const auto& o: objectList) cout << o.id << " " << o.name << "\n";
}
/* OUTPUT:
3 three_1
3 Three_2
2 two_1
2 two_2
4 four_1
4 four_2
1 one_1
1 one_2
*/