Break a subtask of parallel_for_each - C++

I have a big vector of items that are sorted based on one of their fields, e.g. a cost attribute, and I want to do a bit of processing on each of these items to find the maximum value of a different attribute... The constraint here is that we cannot use an item to calculate a maximum value if that item's cost exceeds some arbitrary price.
The single threaded for-loop looks like this:
auto maxValue = -MAX_FLT;
for(const auto& foo: foos) {
// Break if the cost is too high.
if(foo.cost() > 46290) {
break;
}
maxValue = max(maxValue , foo.value());
}
I've been able to somewhat convert this into a parallel_for_each. (Disclaimer: I'm new to PPL.)
combinable<float> localMaxValue([]{ return -MAX_FLT; });
parallel_for_each(begin(foos), end(foos), [&](const auto& foo) {
// Attempt to early out if the cost is too high.
if(foo.getCost() > 46290) {
return;
}
localMaxValue.local() = max(localMaxValue.local(), foo.getValue());
});
auto maxValue = localMaxValue.combine(
[](const auto& first, const auto& second) {
return max<float>(first, second);
});
The return statement inside the parallel_for feels inefficient since it's still executing over every item, and in this case, it's quite possible that the parallel_for could end up iterating over multiple portions of the vector whose cost is too high.
How can I take advantage of the fact that the vector is already sorted by cost?
I looked into using a cancellation token, but that approach seems incorrect as it would cause all subtasks of the parallel_for to be cancelled, which means I may get the wrong maximum value.
Is there something like a cancellation token that could cancel that specific subtask of the parallel_for, or is there a better tool than parallel_for in this case?

If the vector is sorted by cost, then you can iterate over only the items whose cost is lower than the cost limit.
If the cost limit is x, find the iterator to the first item whose cost is equal to or larger than x; you can use std::lower_bound for that.
Then run your parallel_for_each from the beginning of the vector to the iterator you found.
combinable<float> localMaxValue([]{ return -MAX_FLT; });
//I'm assuming foos is std::vector.
int cost_limit = 46290;
auto it_end = std::lower_bound(foos.begin(), foos.end(), cost_limit, [](const auto& foo, int cost_limit)
{
return foo.getCost() < cost_limit;
});
parallel_for_each(foos.begin(), it_end, [&](const auto& foo) {
localMaxValue.local() = max(localMaxValue.local(), foo.getValue());
});
auto maxValue = localMaxValue.combine(
[](const auto& first, const auto& second) {
return max<float>(first, second);
});

Related

If find_if() takes too long, are there alternatives that can be used for better program performance?

I'm working on a D* Lite path planner in C++. The program maintains a priority queue of cells (U); each cell has two cost values, and a key can be calculated for a cell which determines its order in the priority queue.
using Cost = float;
using HeapKey = pair<Cost, Cost>;
using KeyCompare = std::greater<std::pair<HeapKey, unsigned int>>;
vector<pair<HeapKey, unsigned int>> U;
When a cell is added it is done so by using:
U.push_back({ k, id });
push_heap(U.begin(), U.end(), KeyCompare());
As part of the path planning algorithm cells sometimes need to be removed, and here lies the current problem as far as I can see. I recently had help on this site to speed my program up quite a bit by using push_heap instead of make_heap, but now it seems that the part of the program that removes cells is the slowest part. Cells are removed from the priority queue by:
void DstarPlanner::updateVertex(unsigned int id) {
...
...
auto it = find_if(U.begin(), U.end(), [=](auto p) { return p.second == id; });
U.erase(it);
...
...
}
From my tests this seems to take roughly 80% of the time my program uses for path planning. It was my hope coming here that a more time-saving method existed.
Thank you.
EDIT - Extra information.
void DstarPlanner::insertHeap(unsigned int id, HeapKey k) {
U.push_back({ k, id });
push_heap(U.begin(), U.end(), KeyCompare());
in_U[id]++;
}
void DstarPlanner::updateVertex(unsigned int id) {
Cell* u = graph.getCell(id);
if (u->id != id_goal) {
Cost mincost = infinity;
for (auto s : u->neighbors) {
mincost = min(mincost, graph.getEdgeCost(u->id, s->id) + s->g);
}
u->rhs = mincost;
}
if (in_U[id]) {
auto it = find_if(U.begin(), U.end(), [=](auto p) { return p.second == id; });
U.erase(it);
in_U[id]--;
}
if (u->g != u->rhs) {
insertHeap(id, u->calculateKey());
}
}
vector<int> DstarPlanner::ComputeShortestPath() {
vector<int> bestPath;
vector<int> emptyPath;
Cell* n = graph.getCell(id_start);
while (U.front().first < n->calculateKey() || n->rhs != n->g) {
auto uid = U.front().second;
Cell* u = graph.getCell(uid);
auto kold = U.front().first;
pop_heap(U.begin(), U.end(), KeyCompare());
U.pop_back();
in_U[u->id]--;
if (kold < u->calculateKey()) {
insertHeap(u->id, u->calculateKey());
} else if (u->g > u->rhs) {
u->g = u->rhs;
for (auto s : u->neighbors) {
if (!occupied(s->id)) {
updateVertex(s->id);
}
}
} else {
u->g = infinity;
for (auto s : u->neighbors) {
if (!occupied(s->id)) {
updateVertex(s->id);
}
}
updateVertex(u->id);
}
}
bestPath=constructPath();
return bestPath;
}
find_if does a linear search. It may be faster to use:
std::map/std::set -> Standard binary search tree implementations
std::unordered_map/std::unordered_set -> Standard hash table implementations
These may use a lot of memory if your elements (key-value pairs) are small integers. To avoid that you can use third-party alternatives like boost::unordered_flat_map.
How do you re-heapify after U.erase(it)? Do you ever delete multiple nodes at once?
If deletions need to be atomic between searches, then you can
swap it with end() - 1,
erase end() - 1, and
re-heapify.
Erasing end() - 1 is O(1) while erasing it is linear in std::distance(it, end).
void DstarPlanner::updateVertex(unsigned int id) {
...
// take the id by reference since this is synchronous
auto it = find_if(U.begin(), U.end(), [&](const auto& p) { return p.second == id; });
*it = std::move(*(U.end() - 1));
U.erase((U.end() - 1));
std::make_heap(U.begin(), U.end(), KeyCompare()); // expensive!!! 3*distance(begin, end)
...
}
If you can delete multiple nodes between searches, then you can use a combination of erase + remove_if to only perform one mass re-heapify. This is important because heapify is expensive.
it = remove_if(begin, end, predicate);
erase(it, end);
re-heapify
void DstarPlanner::updateVertex(const std::vector<unsigned int>& sorted_ids) {
...
auto it = remove_if(U.begin(), U.end(), [&](const auto& p) { return std::binary_search(sorted_ids.begin(), sorted_ids.end(), p.second); });
U.erase(it, U.end());
std::make_heap(U.begin(), U.end(), KeyCompare()); // expensive!!! 3*distance(begin, end)
...
}
Doing better
You can possibly improve on this by replacing std::make_heap (which makes no assumptions about the heapiness of [begin(), end())) with a custom method that re-heapifies a former heap around "poison points" -- it only needs to inspect the elements around the ones that were swapped. This sounds like a pain to write, and I'd only do it if the resulting program was still too slow.
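As an illustration only (reheap_at is a made-up helper, not a standard function): if exactly one position was overwritten by the swap-and-pop above, the heap can be repaired locally by sifting that element up or down, which is O(log n) instead of a full make_heap.
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: restore the heap property after heap[pos] has been
// overwritten (e.g. by the swap-and-pop erase above). O(log n) instead of
// rebuilding the whole heap with std::make_heap.
template <class T, class Compare>
void reheap_at(std::vector<T>& heap, std::size_t pos, Compare comp)
{
    // Sift up while the new element should come before its parent...
    while (pos > 0) {
        std::size_t parent = (pos - 1) / 2;
        if (!comp(heap[parent], heap[pos])) break;
        std::swap(heap[parent], heap[pos]);
        pos = parent;
    }
    // ...then sift down while a child should come before it.
    const std::size_t n = heap.size();
    for (;;) {
        std::size_t left = 2 * pos + 1, right = left + 1, best = pos;
        if (left < n && comp(heap[best], heap[left])) best = left;
        if (right < n && comp(heap[best], heap[right])) best = right;
        if (best == pos) break;
        std::swap(heap[pos], heap[best]);
        pos = best;
    }
}

// Possible use after the swap in updateVertex:
//   auto pos = static_cast<std::size_t>(it - U.begin());
//   *it = std::move(U.back());
//   U.pop_back();
//   if (pos < U.size()) reheap_at(U, pos, KeyCompare());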
Have you thought of...
Just not even removing elements from the heap? The fact that you're using a heap tells me the algorithm designers suggested a heap. If they suggested a heap, then they likely didn't envision random removals. This is speculation on my part; I'm otherwise not familiar with D* Lite.
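If that direction is acceptable, one common pattern is lazy deletion: mark an id as logically removed in O(1) and skip stale entries when they surface at the top. The sketch below is only an illustration of that pattern; LazyOpenList and its members are invented names, reusing the HeapKey/KeyCompare types from the question.
#include <algorithm>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using Cost = float;
using HeapKey = std::pair<Cost, Cost>;
using KeyCompare = std::greater<std::pair<HeapKey, unsigned int>>;

// Illustrative lazy-deletion open list: remove() does no search and no erase;
// stale entries are discarded when they reach the top during pop().
struct LazyOpenList {
    std::vector<std::pair<HeapKey, unsigned int>> U;
    std::unordered_map<unsigned int, int> stale; // id -> logically removed copies

    void push(HeapKey k, unsigned int id) {
        U.push_back({k, id});
        std::push_heap(U.begin(), U.end(), KeyCompare());
    }

    void remove(unsigned int id) { ++stale[id]; } // O(1)

    // Pops the smallest live entry; returns false when no live entries remain.
    bool pop(std::pair<HeapKey, unsigned int>& out) {
        while (!U.empty()) {
            std::pop_heap(U.begin(), U.end(), KeyCompare());
            auto top = U.back();
            U.pop_back();
            auto it = stale.find(top.second);
            if (it != stale.end() && it->second > 0) { --it->second; continue; }
            out = top;
            return true;
        }
        return false;
    }
};
The trade-off is that dead entries stay in the heap until they are popped, so the queue can grow larger than the number of live cells, and a peek operation would need to skip stale entries in the same way.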

C++ Simplify loop over map and extending/overwriting of vector

Given
std::vector<int> vec1 of size s_vec and capacity c.
std::vector<int> vec2.
std::map<int, int> m of size s_m >= s_vec.
std::unordered_set<int> flags.
bool flag = false
I want to copy as many values of m (in order) into vec1 (overwriting previous values) without exceeding the capacity c. If any values remain, I want to push those values to the end of vec2. For each of these values, I want to check if they are in flags. If they are, I'd like to set flag to true.
This is how I currently achieve this:
int i = 0;
for (auto const& e : m) {
if(i < c) {
if(i == vec1.size()) {
vec1.push_back(e.second);
} else {
vec1.at(i) = e.second;
}
} else {
vec2.push_back(e.second);
if(flags.count(e.second)){
flag = true;
}
}
}
I am new to C++, coming from Python and R. Therefore, I assume that this can be simplified quite a bit (with iterators?). What can I do to improve the code here?
Your code must increment i at the end of each loop for it to work.
If you can use C++20 and its ranges, I would probably rewrite it completely, to something like:
using namespace std::views; // for simplicity here
std::ranges::copy(m | take(c) | values, vec1.begin());
std::ranges::copy(m | drop(c) | values, std::back_inserter(vec2));
flag = std::ranges::any_of(vec2, [&flags](int i){return flags.contains(i);});
The beauty of this is that it matches your requirements much better.
The first line does: "I want to copy as many values of m (in order) into vec1 (overwriting previous values) without exceeding the capacity c."
The second line does: "If any values remain, I want to push those values to the end of vec2."
The third line does: "For each of these values, I want to check if they are in flags. If they are, I'd like to set flag to true."
Building on the comments of #PaulMcKenzie and the answers provided by #Nelfeal and #cptFracassa, this is what I ended up with.
size_t new_size = std::min(vec1.capacity(), m.size());
vec1.resize(new_size);
std::transform(m.begin(),
std::next(m.begin(), new_size),
vec1.begin(),
[](std::pair<int, int> p) { return p.second; });
std::transform(std::next(m.begin(), new_size),
m.end(),
std::back_inserter(vec2),
[&flags, &flag](std::pair<int, int> p) {
if(flags.count(p.second)) {
flag = true;
}
return p.second;
});
In the first part, instead of doing either push_back or assignment to at, you can just clear the vector and push_back everything. clear does not change the capacity.
Your loop is doing two different things, one after the other (and by the way, I assume you forgot to increment i). You should split it into two loops.
With all that, your code becomes:
vec1.clear();
auto it = m.begin();
for (int i = 0; i < c && it != m.end(); ++i) {
vec1.push_back(it->second);
++it;
}
while (it != m.end()) {
vec2.push_back(it->second);
if(flags.count(it->second)){
flag = true;
}
++it;
}
At this point, you can also use standard algorithms (std::copy, std::transform as mentioned in the comments).
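For completeness, an untested sketch of those same two phases written with standard algorithms; it assumes <algorithm> and <iterator> are included and C++14 generic lambdas, and the split iterator and old_size bookkeeping are my own additions:
vec1.clear();
auto split = std::next(m.begin(),
                       static_cast<std::ptrdiff_t>(std::min<std::size_t>(c, m.size())));
std::transform(m.begin(), split, std::back_inserter(vec1),
               [](const auto& e) { return e.second; });
const auto old_size = vec2.size();
std::transform(split, m.end(), std::back_inserter(vec2),
               [](const auto& e) { return e.second; });
flag = flag || std::any_of(vec2.begin() + static_cast<std::ptrdiff_t>(old_size), vec2.end(),
                           [&flags](int v) { return flags.count(v) != 0; });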

How does std::remove perform compared to std::find for std::vector?

Background
I recently refactored some code that placed actors in a world. One actor per room.
My original implementation used a closed set std::vector to keep track of indexes of rooms that were no longer available:
void RoomsAndCorridorsMapGenerator::PlaceActors() noexcept {
const int room_count = static_cast<int>(rooms.size());
auto closed_set = std::vector<std::size_t>{};
closed_set.reserve(room_count);
for(auto* actor : _map->_actors) {
const auto room_idx = [&]() {
auto idx = static_cast<std::size_t>(MathUtils::GetRandomIntLessThan(room_count));
while(std::find(std::begin(closed_set), std::end(closed_set), idx) != std::end(closed_set)) {
idx = static_cast<std::size_t>(MathUtils::GetRandomIntLessThan(room_count));
}
closed_set.push_back(idx);
return idx;
}(); //IIIL
actor->SetPosition(IntVector2{rooms[room_idx].CalcCenter()});
}
}
That is:
Generate a random index.
Search the vector for the index generated.
If found, go to step 1.
Add the newly generated room index to the closed set.
For each actor, continue from step 1.
My refactored implementation used an open set:
void RoomsAndCorridorsMapGenerator::PlaceActors() noexcept {
auto open_set = std::vector<std::size_t>{};
open_set.resize(rooms.size());
std::iota(std::begin(open_set), std::end(open_set), std::size_t{0u});
for(auto* actor : _map->_actors) {
const auto room_idx = [&]() {
auto idx = open_set[static_cast<std::size_t>(MathUtils::GetRandomIntLessThan(static_cast<int>(open_set.size())))];
open_set.erase(std::remove(std::begin(open_set), std::end(open_set), idx), std::end(open_set));
return idx;
}(); //IIIL
actor->SetPosition(IntVector2{rooms[room_idx].CalcCenter()});
}
}
i.e.:
Pre-generate the list of available indexes, based on the number of rooms.
Pick a random value from the list.
Remove that value from the list.
For each actor, continue from step 2.
This simplified the code and the algorithm, but I'm not sure of the complexity.
Question
How does std::find on a vector compare to std::remove, both in time and complexity?
I'm not sure if the repeated appending/erase-remove idiom also has an effect.

Optimized argmin: an effective way to find an item minimizing a function

Let us say I've got a collection of items and a score function on them:
struct Item { /* some data */ };
std::vector<Item> items;
double score(Item);
I'd like to find the item from that collection whose score is the lowest. An easy way to write this is:
const auto argmin = std::min_element(begin(items), end(items), [](Item a, Item b) {
return score(a) < score(b);
});
But if score is a heavy-to-compute function, the fact that std::min_element actually calls it multiple times on some items may be worrying. And this is expected because the compiler cannot guess score is a pure function.
How could I find argmin but with score being called only once per item? Memoization is one possibility, anything else?
My objective is to write a code snippet which is easy to read, in a dream world as obvious as calling std::min_element on the collection is.
As I commented above, if the vector is not too big, you can use std::transform to store all scores first, then apply std::min_element.
However, if you want to take benefit of "lazy evaluation", and still want to use C++'s STL, there are some tricks to work it out.
The point is std::accumulate can be regarded as a general reduce or fold operation (like foldl in haskell). With C++17's syntax sugar for std::tuple, we can write something like:
auto [min_ind, _, min_value] = std::accumulate(items.begin(), items.end(),
std::make_tuple(-1LU, 0LU, std::numeric_limits<double>::max()),
[] (std::tuple<std::size_t, std::size_t, double> accu, const Item &s) {
// up to this point, the index of min, the current index, and the last minimal value
auto [min_ind, cur_ind, prev_min] = accu;
double r = score(s);
if ( r < prev_min ) {
return std::make_tuple(cur_ind, cur_ind + 1, r);
} else {
return std::make_tuple(min_ind, cur_ind + 1, prev_min);
}
});
Here's a function that does what you want--even going beyond the intuitive "call score exactly once per element" by realizing that there's nothing smaller than negative infinity!
const Item* smallest(const std::vector<Item>& items)
{
double min_score = items.empty() ? NAN : INFINITY;
const Item* min_item = items.empty() ? nullptr : &*begin(items);
for (const auto& item : items) {
double item_score = score(item);
if (item_score < min_score) {
min_score = item_score;
min_item = &item;
if (item_score == -INFINITY) {
break;
}
}
}
return min_item;
}
As suggested by user #liliscent, one could:
generate a collection of precalculated scores,
find the minimum score from it,
and infer the position of the minimizing item from the position of the minimum score.
This is my reading of their suggestion:
template<class InputIt, class Scoring>
auto argmin(InputIt first, InputIt last, Scoring scoring)
{
using score_type = typename std::result_of_t<Scoring(typename std::iterator_traits<InputIt>::value_type)>;
std::vector<score_type> scores(std::distance(first, last));
std::transform(first, last, begin(scores), scoring);
const auto scoremin = std::min_element(begin(scores), end(scores));
return first + std::distance(begin(scores), scoremin);
}

Combining arrays/lists in a specific fashion

I'm trying to find a sensible algorithm to combine multiple lists/vectors/arrays as defined below.
Each element contains a float declaring the start of its range of validity and a constant that is used over this range. Where ranges from different lists overlap their constants need to be added to produce one global list.
I've done an attempt at an illustration below to try and give a good idea of what I mean:
First List:
[0.5, 2) -> a1, [2, 3.2) -> a2, [3.2, 4) -> a3
Second List:
[1, 2) -> b1, [2, 3) -> b2, [3, 4.5) -> b3
Desired Output:
[0.5, 1) -> a1, [1, 2) -> a1+b1, [2, 3) -> a2+b2, [3, 3.2) -> a2+b3, [3.2, 4) -> a3+b3, [4, 4.5) -> b3
I can't think of a sensible way of going about this in the case of n lists; Just 2 is quite easy to brute force.
Any hints or ideas would be welcome. Each list is represented as a C++ std::vector (so feel free to use standard algorithms) and are sorted by start of range value.
Cheers!
Edit: Thanks for the advice; I've come up with a naive implementation, not sure why I couldn't get there on my own first. To my mind the obvious improvement would be to store an iterator into each vector, since they're already sorted, and not have to re-traverse each vector for each point. Given that most vectors will contain fewer than 100 elements, but there may be many vectors, this may or may not be worthwhile. I'd have to profile to see.
Any thoughts on this?
#include <algorithm>
#include <vector>
#include <iostream>
struct DataType
{
double intervalStart;
int data;
// More data here, the data is not just a single int, but that
// works for our demonstration
};
int main(void)
{
// The final "data" of each vector is meaningless as it refers to
// the coming range which won't be used as this is only for
// bounded ranges
std::vector<std::vector<DataType> > input = {{{0.5, 1}, {2.0, 3}, {3.2, 3}, {4.0, 4}},
{{1.0, 5}, {2.0, 6}, {3.0, 7}, {4.5, 8}},
{{-34.7895, 15}, {-6.0, -2}, {1.867, 5}, {340, 7}}};
// Setup output vector
std::vector<DataType> output;
std::size_t inputSize = 0;
for (const auto& internalVec : input)
inputSize += internalVec.size();
output.reserve(inputSize);
// Fill output vector
for (const auto& internalVec : input)
std::copy(internalVec.begin(), internalVec.end(), std::back_inserter(output));
// Sort output vector by intervalStartPoints
std::sort(output.begin(), output.end(),
[](const DataType& data1, const DataType& data2)
{
return data1.intervalStart < data2.intervalStart;
});
// Remove DataTypes with same intervalStart - each interval can only start once
output.erase(std::unique(output.begin(), output.end(),
[](const DataType& dt1, const DataType& dt2)
{
return dt1.intervalStart == dt2.intervalStart;
}), output.end());
// Output now contains all the right intersections, just not with the right data
// Lambda to find the associated data value associated with an
// intervsalStart value in a vector
auto FindDataValue = [&](const std::vector<DataType>& v, double startValue)
{
auto iter = std::find_if(v.begin(), v.end(), [startValue](const DataType& data)
{
return data.intervalStart > startValue;
});
if (iter == v.begin() || iter == v.end())
{
return 0;
}
return (iter-1)->data;
};
// For each interval in the output traverse the input and sum the
// data constants
for (auto& val : output)
{
int sectionData = 0;
for (const auto& iv : input)
sectionData += FindDataValue(iv, val.intervalStart);
val.data = sectionData;
}
for (const auto& i : output)
std::cout << "loc: " << i.intervalStart << " data: " << i.data << std::endl;
return 0;
}
Edit2: #Stas's code is a very good way to approach this problem. I've just tested it on all the edge cases I could think of.
Here's my merge_intervals implementation in case anyone is interested. The only slight change I've had to make to the snippets Stas provided is:
for (auto& v : input)
v.back().data = 0;
Before combining the vectors as suggested. Thanks!
template<class It1, class It2, class OutputIt>
OutputIt merge_intervals(It1 first1, It1 last1,
It2 first2, It2 last2,
OutputIt destBegin)
{
const auto begin1 = first1;
const auto begin2 = first2;
auto CombineData = [](const DataType& d1, const DataType& d2)
{
return DataType{d1.intervalStart, (d1.data+d2.data)};
};
for (; first1 != last1; ++destBegin)
{
if (first2 == last2)
{
return std::copy(first1, last1, destBegin);
}
if (first1->intervalStart == first2->intervalStart)
{
*destBegin = CombineData(*first1, *first2);
++first1; ++first2;
}
else if (first1->intervalStart < first2->intervalStart)
{
if (first2 > begin2)
*destBegin = CombineData(*first1, *(first2-1));
else
*destBegin = *first1;
++first1;
}
else
{
if (first1 > begin1)
*destBegin = CombineData(*first2, *(first1-1));
else
*destBegin = *first2;
++first2;
}
}
return std::copy(first2, last2, destBegin);
}
Unfortunately, your algorithm is inherently slow. It doesn't make sense to profile it or apply C++-specific tweaks; that won't help. It will never finish on fairly small inputs, like merging 1000 lists of 10000 elements each.
Let's try to evaluate the time complexity of your algorithm. For the sake of simplicity, let's merge only lists of the same length.
L - length of a list
N - number of lists to be merged
T = L * N - length of a whole concatenated list
Complexity of your algorithm steps:
create output vector - O(T)
sort output vector - O(T*log(T))
filter output vector - O(T)
fix data in output vector - O(T*T)
The last step determines the whole algorithm's complexity: O(T*T) = O(L^2*N^2). That is not acceptable for practical use: to merge 1000 lists of 10000 elements each, the algorithm would need about 10^14 steps.
Actually, the task is pretty complex, so do not try to solve it in one step. Divide and conquer!
Write an algorithm that merges two lists into one
Use it to merge a list of lists
Merging two lists into one
This is relatively easy to implement (but be careful with corner cases). The algorithm should have linear time complexity: O(2*L). Take a look at how std::merge is implemented. You just need to write your custom variant of std::merge, let's call it merge_intervals.
Applying a merge algorithm to a list of lists
This is a little bit tricky, but again, divide and conquer! The idea is to do a recursive merge: split the list of lists into two halves and merge them.
template<class It, class Combine>
auto merge_n(It first, It last, Combine comb)
-> typename std::remove_reference<decltype(*first)>::type
{
if (first == last)
throw std::invalid_argument("Empty range");
auto count = std::distance(first, last);
if (count == 1)
return *first;
auto it = first;
std::advance(it, count / 2);
auto left = merge_n(first, it, comb);
auto right = merge_n(it, last, comb);
return comb(left, right);
}
Usage:
auto combine = [](const std::vector<DataType>& a, const std::vector<DataType>& b)
{
std::vector<DataType> result;
merge_intervals(a.begin(), a.end(), b.begin(), b.end(),
std::back_inserter(result));
return result;
};
auto output = merge_n(input.begin(), input.end(), combine);
The nice property of such a recursive approach is its time complexity: it is O(L*N*log(N)) for the whole algorithm. So, to merge 1000 lists of 10000 elements each, the algorithm needs about 10000 * 1000 * 9.966 = 99,660,000 steps. That is 1,000,000 times faster than the original algorithm.
Moreover, such an algorithm is inherently parallelizable. It is not a big deal to write a parallel version of merge_n and run it on a thread pool.
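To illustrate that remark (not part of the original answer), a naive parallel variant could look roughly like this; parallel_merge_n is an invented name, and a real version would want a size cutoff below which it falls back to the serial merge_n instead of spawning more threads:
#include <future>
#include <iterator>
#include <stdexcept>
#include <type_traits>

template<class It, class Combine>
auto parallel_merge_n(It first, It last, Combine comb)
    -> typename std::remove_reference<decltype(*first)>::type
{
    auto count = std::distance(first, last);
    if (count == 0)
        throw std::invalid_argument("Empty range");
    if (count == 1)
        return *first;
    auto mid = first;
    std::advance(mid, count / 2);
    // Merge the left half on another thread while this thread handles the right half.
    auto left = std::async(std::launch::async,
                           [=] { return parallel_merge_n(first, mid, comb); });
    auto right = parallel_merge_n(mid, last, comb);
    return comb(left.get(), right);
}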
I know I'm a bit late to the party, but when I started writing this you didn't have a suitable answer yet, and my solution should have relatively good time complexity, so here you go:
I think the most straightforward way to approach this is to see each of your sorted lists as a stream of events: At a given time, the value (of that stream) changes to a new value:
template<typename T>
struct Point {
using value_type = T;
float time;
T value;
};
You want to superimpose those streams into a single stream (i.e. having their values summed up at any given point). For that you take the earliest event from all streams, and apply its effect on the result stream. Therefore, you need to first "undo" the effect that the previous value from that stream made on the result stream, and then add the new value to the current value of the result stream.
To be able to do that, you need to remember for each stream the last value, the next value (and when the stream is empty):
std::vector<std::tuple<Value, StreamIterator, StreamIterator>> streams;
The first element of the tuple is the last effect of that stream onto the result stream, the second is an iterator pointing to the streams next event, and the last is the end iterator of that stream:
transform(from, to, inserter(streams, begin(streams)),
[] (auto & stream) {
return make_tuple(static_cast<Value>(0), begin(stream), end(stream));
});
To be able to always get the earliest event of all the streams, it helps to keep the (information about the) streams in a (min) heap, where the top element is the stream with the next (earliest) event. That's the purpose of the following comparator:
auto heap_compare = [] (auto const & lhs, auto const & rhs) {
bool less = (*get<1>(lhs)).time < (*get<1>(rhs)).time;
return (not less);
};
Then, as long as there are still some events (i.e. some stream that is not empty), first (re)build the heap, take the top element and apply its next event to the result stream, and then remove that element from the stream. Finally, if the stream is now empty, remove it.
// The current value of the result stream.
Value current = 0;
while (streams.size() > 0) {
// Reorder the stream information to get the one with the earliest next
// value into top ...
make_heap(begin(streams), end(streams), heap_compare);
// .. and select it.
auto & earliest = streams[0];
// New value is the current one, minus the previous effect of the selected
// stream plus the new value from the selected stream
current = current - get<0>(earliest) + (*get<1>(earliest)).value;
// Store the new time point with the new value and the time of the used
// time point from the selected stream
*out++ = Point<Value>{(*get<1>(earliest)).time, current};
// Update the effect of the selected stream
get<0>(earliest) = (*get<1>(earliest)).value;
// Advance selected stream to its next time point
++(get<1>(earliest));
// Remove stream if empty
if (get<1>(earliest) == get<2>(earliest)) {
swap(streams[0], streams[streams.size() - 1u]);
streams.pop_back();
}
}
This will return a stream where there might be multiple points with the same time, but a different value. This occurs when there are multiple "events" at the same time. If you only want the last value, i.e. the value after all these events happened, then one needs to combine them:
merge_point_lists(begin(input), end(input), inserter(merged, begin(merged)));
// returns points with the same time, but with different values. remove these
// duplicates, by first making them REALLY equal, i.e. setting their values
// to the last value ...
for (auto write = begin(merged), read = begin(merged), stop = end(merged);
write != stop;) {
for (++read; (read != stop) and (read->time == write->time); ++read) {
write->value = read->value;
}
for (auto const cached = (write++)->value; write != read; ++write) {
write->value = cached;
}
}
// ... and then removing them.
merged.erase(
unique(begin(merged), end(merged),
[](auto const & lhs, auto const & rhs) {
return (lhs.time == rhs.time);}),
end(merged));
Concerning the time complexity: this iterates over all "events", so it depends on the number of events e. The very first make_heap call has to build a complete new heap; this has worst-case complexity 3*s, where s is the number of streams the function has to merge. On subsequent calls, make_heap only has to correct the very first element, which has worst-case complexity log(s'). I write s' because the number of streams (that still need to be considered) decreases to zero. This
gives
3*s + (e - 1) * log(s')
as the complexity. Assuming the worst case, where s' decreases slowly (this happens when the events are evenly distributed across the streams, i.e. all streams have the same number of events), this becomes
3*s + (e - 1 - s) * log(s) + sum(log(i) for i = 1 to s)
Do you really need a data structure as the result? I don't think so. Actually, you're defining several functions that can be added. The examples you give are encoded using a 'start, value(, implicit end)' tuple. The basic building block is a function that looks up its value at a certain point:
// Assumes each breakpoint looks like: struct edge { float x; double value; };
double valueAt(const vector<edge> &starts, float point) {
auto it = std::adjacent_find(begin(starts), end(starts),
[&](edge e1, edge e2) {
return e1.x <= point && point < e2.x;
});
return it->value;
}
The function value for a point is the sum of the function values of all the series.
If you really need a list in the end, you can join and sort all edge.x values for all series, and create the list from that.
Unless performance is an issue :)
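A rough sketch of the join-and-sort suggestion, purely illustrative: it assumes edge has x and value members (matching the valueAt snippet above) and that valueAt is made safe for points outside a series' range, which the snippet above does not yet handle.
#include <algorithm>
#include <vector>

std::vector<edge> combineSeries(const std::vector<std::vector<edge>>& allSeries)
{
    // Collect every breakpoint from every series...
    std::vector<float> xs;
    for (const auto& series : allSeries)
        for (const auto& e : series)
            xs.push_back(e.x);
    std::sort(xs.begin(), xs.end());
    xs.erase(std::unique(xs.begin(), xs.end()), xs.end());

    // ...then sample the summed function at each breakpoint.
    std::vector<edge> combined;
    combined.reserve(xs.size());
    for (float x : xs) {
        double sum = 0;
        for (const auto& series : allSeries)
            sum += valueAt(series, x);
        combined.push_back({x, sum});
    }
    return combined;
}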
If you can combine two of these structures, you can combine many.
First, encapsulate your std::vector into a class. Implement what you know as operator+= (and define operator+ in terms of this if you want). With that in place, you can combine as many as you like, just by repeated addition. You could even use std::accumulate to combine a collection of them.
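A minimal sketch of that suggestion, assuming each series is a sorted vector of {start, value} steps; StepFunction and its members are illustrative names, and the open-ended tail handling discussed in the question's edit is ignored here.
#include <numeric>
#include <utility>
#include <vector>

struct Step { double start; int value; };

class StepFunction {
public:
    explicit StepFunction(std::vector<Step> steps = {}) : steps_(std::move(steps)) {}

    // Merge two sorted step lists, summing the values active at each breakpoint.
    StepFunction& operator+=(const StepFunction& rhs) {
        const auto& L = steps_;
        const auto& R = rhs.steps_;
        std::vector<Step> merged;
        std::size_t i = 0, j = 0;
        int a = 0, b = 0; // current value contributed by each side (0 before it starts)
        while (i < L.size() || j < R.size()) {
            double x = (j == R.size() || (i < L.size() && L[i].start <= R[j].start))
                           ? L[i].start
                           : R[j].start;
            if (i < L.size() && L[i].start == x) a = L[i++].value;
            if (j < R.size() && R[j].start == x) b = R[j++].value;
            merged.push_back({x, a + b});
        }
        steps_ = std::move(merged);
        return *this;
    }

    StepFunction operator+(StepFunction rhs) const { rhs += *this; return rhs; }

    const std::vector<Step>& steps() const { return steps_; }

private:
    std::vector<Step> steps_;
};

// Combining a whole collection, as suggested:
//   std::vector<StepFunction> series = ...;
//   StepFunction total = std::accumulate(series.begin(), series.end(), StepFunction{});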