I'm trying to find a sensible algorithm to combine multiple lists/vectors/arrays as defined below.
Each element contains a float declaring the start of its range of validity and a constant that is used over this range. Where ranges from different lists overlap, their constants need to be added to produce one global list.
I've attempted an illustration below to give a good idea of what I mean:
First List:
a1 a2 a3
Second List:
b1 b2 b3
Desired Output:
a1 a1+b1 a2+b2 ^ a3+b3 b3
I can't think of a sensible way of going about this in the case of n lists; just two is quite easy to brute force.
Any hints or ideas would be welcome. Each list is represented as a C++ std::vector (so feel free to use standard algorithms) and is sorted by start-of-range value.
Edit: Thanks for the advice, I've come up with a naive implementation; not sure why I couldn't get here on my own first. To my mind the obvious improvement would be to store an iterator for each vector, since they're already sorted, so that each vector doesn't have to be re-traversed for each point. Given that most vectors will contain fewer than 100 elements, but there may be many vectors, this may or may not be worthwhile. I'd have to profile to see.
Any thoughts on this?
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

struct DataType
{
    double intervalStart;
    int data;
    // More data here, the data is not just a single int, but that
    // works for our demonstration
};

int main(void)
{
    // The final "data" of each vector is meaningless as it refers to
    // the coming range which won't be used as this is only for
    // bounded ranges
    std::vector<std::vector<DataType> > input = {{{0.5, 1}, {2.0, 3}, {3.2, 3}, {4.0, 4}},
                                                 {{1.0, 5}, {2.0, 6}, {3.0, 7}, {4.5, 8}},
                                                 {{-34.7895, 15}, {-6.0, -2}, {1.867, 5}, {340, 7}}};

    // Setup output vector
    std::vector<DataType> output;
    std::size_t inputSize = 0;
    for (const auto& internalVec : input)
        inputSize += internalVec.size();
    output.reserve(inputSize);

    // Fill output vector
    for (const auto& internalVec : input)
        std::copy(internalVec.begin(), internalVec.end(), std::back_inserter(output));

    // Sort output vector by intervalStartPoints
    std::sort(output.begin(), output.end(),
              [](const DataType& data1, const DataType& data2)
              {
                  return data1.intervalStart < data2.intervalStart;
              });

    // Remove DataTypes with same intervalStart - each interval can only start once
    output.erase(std::unique(output.begin(), output.end(),
                             [](const DataType& dt1, const DataType& dt2)
                             {
                                 return dt1.intervalStart == dt2.intervalStart;
                             }), output.end());

    // Output now contains all the right intersections, just not with the right data
    // Lambda to find the data value associated with an
    // intervalStart value in a vector
    auto FindDataValue = [](const std::vector<DataType>& v, double startValue)
    {
        auto iter = std::find_if(v.begin(), v.end(), [startValue](const DataType& data)
                                 {
                                     return data.intervalStart > startValue;
                                 });
        if (iter == v.begin() || iter == v.end())
            return 0;
        return (iter-1)->data;
    };

    // For each interval in the output traverse the input and sum the
    // data constants
    for (auto& val : output)
    {
        int sectionData = 0;
        for (const auto& iv : input)
            sectionData += FindDataValue(iv, val.intervalStart);
        val.data = sectionData;
    }

    for (const auto& i : output)
        std::cout << "loc: " << i.intervalStart << " data: " << i.data << std::endl;

    return 0;
}
Edit2: @Stas's code is a very good way to approach this problem. I've just tested it on all the edge cases I could think of.
Here's my merge_intervals implementation in case anyone is interested. The only slight change I've had to make to the snippets Stas provided is:
for (auto& v : input)
v.back().data = 0;
Before combining the vectors as suggested. Thanks!
template<class It1, class It2, class OutputIt>
OutputIt merge_intervals(It1 first1, It1 last1,
                         It2 first2, It2 last2,
                         OutputIt destBegin)
{
    const auto begin1 = first1;
    const auto begin2 = first2;
    // Sum the constants; the interval start is taken from the first argument
    auto CombineData = [](const DataType& d1, const DataType& d2) {
        return DataType{d1.intervalStart, (d1.data + d2.data)};
    };
    for (; first1 != last1; ++destBegin) {
        if (first2 == last2) {
            return std::copy(first1, last1, destBegin);
        }
        if (first1->intervalStart == first2->intervalStart) {
            *destBegin = CombineData(*first1, *first2);
            ++first1; ++first2;
        } else if (first1->intervalStart < first2->intervalStart) {
            // Add the constant of the range currently active in the second list
            if (first2 > begin2)
                *destBegin = CombineData(*first1, *(first2-1));
            else
                *destBegin = *first1;
            ++first1;
        } else {
            // Add the constant of the range currently active in the first list
            if (first1 > begin1)
                *destBegin = CombineData(*first2, *(first1-1));
            else
                *destBegin = *first2;
            ++first2;
        }
    }
    return std::copy(first2, last2, destBegin);
}

Unfortunately, your algorithm is inherently slow. Profiling or applying C++-specific tweaks won't help. It will not finish in any reasonable time even on fairly small inputs, such as merging 1000 lists of 10000 elements each.
Let's evaluate the time complexity of your algorithm. For the sake of simplicity, let's merge only lists of the same length.
L - length of a list
N - number of lists to be merged
T = L * N - length of a whole concatenated list
Complexity of your algorithm steps:
create output vector - O(T)
sort output vector - O(T*log(T))
filter output vector - O(T)
fix data in output vector - O(T*T)
As you can see, the last step dominates the complexity of the whole algorithm: O(T*T) = O(L^2*N^2). That is not acceptable for practical applications: to merge 1000 lists of 10000 elements each, the algorithm would need on the order of 10^14 steps.
Actually, the task is pretty complex, so do not try to solve it in one step. Divide and conquer!
Write an algorithm that merges two lists into one
Use it to merge a list of lists
Merging two lists into one
This is relatively easy to implement (but be careful with corner cases). The algorithm should have linear time complexity: O(2*L). Take a look at how std::merge is implemented. You just need to write your custom variant of std::merge, let's call it merge_intervals.
Applying a merge algorithm to a list of lists
This is a little bit tricky, but again, divide and conquer! The idea is to do a recursive merge: split the list of lists into two halves, merge each half recursively, and then combine the two results.
template<class It, class Combine>
auto merge_n(It first, It last, Combine comb)
    -> typename std::remove_reference<decltype(*first)>::type
{
    if (first == last)
        throw std::invalid_argument("Empty range");

    auto count = std::distance(first, last);
    if (count == 1)
        return *first;

    auto it = first;
    std::advance(it, count / 2);

    auto left = merge_n(first, it, comb);
    auto right = merge_n(it, last, comb);

    return comb(left, right);
}

auto combine = [](const std::vector<DataType>& a, const std::vector<DataType>& b)
{
    std::vector<DataType> result;
    merge_intervals(a.begin(), a.end(), b.begin(), b.end(),
                    std::back_inserter(result));
    return result;
};

auto output = merge_n(input.begin(), input.end(), combine);
The nice property of such a recursive approach is its time complexity: it is O(L*N*log(N)) for the whole algorithm. So, to merge 1000 lists of 10000 elements each, the algorithm should run about 10000 * 1000 * log2(1000) = 10000 * 1000 * 9.966 = 99,660,000 cycles. That is about 1,000,000 times faster than the original algorithm.
Moreover, such an algorithm is inherently parallelizable. It is not a big deal to write a parallel version of merge_n and run it on a thread pool.

I know I'm a bit late to the party, but when I started writing this you didn't have a suitable answer yet, and my solution should have a relatively good time complexity, so here you go:
I think the most straightforward way to approach this is to see each of your sorted lists as a stream of events: At a given time, the value (of that stream) changes to a new value:
template<typename T>
struct Point {
  using value_type = T;
  float time;
  T value;
};
You want to superimpose those streams into a single stream (i.e. having their values summed up at any given point). For that you take the earliest event from all streams, and apply its effect on the result stream. Therefore, you need to first "undo" the effect that the previous value from that stream made on the result stream, and then add the new value to the current value of the result stream.
To be able to do that, you need to remember for each stream the last value, the next value, and where the stream ends (to detect when it is empty):
std::vector<std::tuple<Value, StreamIterator, StreamIterator>> streams;
The first element of the tuple is the last effect of that stream on the result stream, the second is an iterator pointing to the stream's next event, and the last is the end iterator of that stream:
transform(from, to, inserter(streams, begin(streams)),
          [] (auto & stream) {
            return make_tuple(static_cast<Value>(0), begin(stream), end(stream));
          });
To be able to always get the earliest event of all the streams, it helps to keep the (information about the) streams in a (min) heap, where the top element is the stream with the next (earliest) event. That's the purpose of the following comparator:
auto heap_compare = [] (auto const & lhs, auto const & rhs) {
  bool less = (*get<1>(lhs)).time < (*get<1>(rhs)).time;
  return (not less);
};
Then, as long as there are still some events (i.e. some stream that is not empty), first (re)build the heap, take the top element and apply its next event to the result stream, and then remove that element from the stream. Finally, if the stream is now empty, remove it.
// The current value of the result stream.
Value current = 0;
while (streams.size() > 0) {
  // Reorder the stream information to get the one with the earliest next
  // value into top ...
  make_heap(begin(streams), end(streams), heap_compare);
  // .. and select it.
  auto & earliest = streams[0];
  // New value is the current one, minus the previous effect of the selected
  // stream plus the new value from the selected stream
  current = current - get<0>(earliest) + (*get<1>(earliest)).value;
  // Store the new time point with the new value and the time of the used
  // time point from the selected stream
  *out++ = Point<Value>{(*get<1>(earliest)).time, current};
  // Update the effect of the selected stream
  get<0>(earliest) = (*get<1>(earliest)).value;
  // Advance selected stream to its next time point
  ++get<1>(earliest);
  // Remove stream if empty
  if (get<1>(earliest) == get<2>(earliest)) {
    swap(streams[0], streams[streams.size() - 1u]);
    streams.pop_back();
  }
}
This will return a stream where there might be multiple points with the same time, but a different value. This occurs when there are multiple "events" at the same time. If you only want the last value, i.e. the value after all these events happened, then one needs to combine them:
merge_point_lists(begin(input), end(input), inserter(merged, begin(merged)));
// returns points with the same time, but with different values. remove these
// duplicates, by first making them REALLY equal, i.e. setting their values
// to the last value ...
for (auto write = begin(merged), read = begin(merged), stop = end(merged);
     write != stop;) {
  for (++read; (read != stop) and (read->time == write->time); ++read) {
    write->value = read->value;
  }
  for (auto const cached = (write++)->value; write != read; ++write) {
    write->value = cached;
  }
}
// ... and then removing them.
merged.erase(
  unique(begin(merged), end(merged),
         [](auto const & lhs, auto const & rhs) {
           return (lhs.time == rhs.time);}),
  end(merged));
Concerning the time complexity: This is iterating over all "events", so it depends on the number of events e. The very first make_heap call has to build a complete new heap, which has a worst-case complexity of 3 * s, where s is the number of streams the function has to merge. On subsequent calls, make_heap only has to correct the very first element, which has a worst-case complexity of log(s'). I write s' because the number of streams (that need to be considered) will decrease to zero. This gives
3s + (e - 1) * log(s')
as complexity. Assuming the worst case, where s' decreases slowly (this happens when the events are evenly distributed across the streams, i.e. all streams have the same number of events):
3s + (e - 1 - s) * log(s) + (sum of log(i) for i = 1 to s)

Do you really need a data structure as a result? I don't think so. Actually you're defining several functions that can be added. The examples you give are encoded using a 'start, value(, implicit end)' tuple. The basic building block is a function that looks up its value at a certain point:
double valueAt(const vector<edge> &starts, float point) {
    auto it = std::adjacent_find(begin(starts), end(starts),
        [&](edge e1, edge e2) {
            return e1.x <= point && point < e2.x;
        });
    return it->second;
}
The function value for a point is the sum of the function values for all code-series.
If you really need a list in the end, you can join and sort all edge.x values for all series, and create the list from that.
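A minimal sketch of that last idea, assuming each series is a vector of (start, value) edges sorted by start. The names Edge, value_at and combine are made up for illustration, and std::upper_bound is swapped in for the adjacent_find lookup so that points before the first edge and after the last edge are handled (the last value is treated as extending to infinity, so set it to 0 if the final range is meant to be unused):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical edge type: `value` holds from `x` up to the next edge's x.
struct Edge {
    double x;
    int value;
};

// Value of one series at `point`; 0 before the first edge.
// Assumes `series` is sorted by x.
int value_at(const std::vector<Edge>& series, double point) {
    auto it = std::upper_bound(series.begin(), series.end(), point,
                               [](double p, const Edge& e) { return p < e.x; });
    return it == series.begin() ? 0 : std::prev(it)->value;
}

// Joins all start points, sorts and deduplicates them, then evaluates
// the sum of all series at each start point.
std::vector<Edge> combine(const std::vector<std::vector<Edge>>& all) {
    std::vector<double> xs;
    for (const auto& s : all)
        for (const auto& e : s) xs.push_back(e.x);
    std::sort(xs.begin(), xs.end());
    xs.erase(std::unique(xs.begin(), xs.end()), xs.end());

    std::vector<Edge> out;
    out.reserve(xs.size());
    for (double x : xs) {
        int sum = 0;
        for (const auto& s : all) sum += value_at(s, x);
        out.push_back({x, sum});
    }
    return out;
}
```

Each lookup is a binary search, so building the list costs roughly one binary search per (start point, series) pair. That is simple rather than optimal.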
Unless performance is an issue :)

If you can combine two of these structures, you can combine many.
First, encapsulate your std::vector into a class. Implement what you know as operator+= (and define operator+ in terms of this if you want). With that in place, you can combine as many as you like, just by repeated addition. You could even use std::accumulate to combine a collection of them.
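As a rough sketch of what that could look like (Step, StepFunction and sum_all are hypothetical names, and values before a function's first step are assumed to be 0):

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical element type: `value` applies from `start` until the next step.
struct Step {
    double start;
    int value;
};

class StepFunction {
public:
    StepFunction() = default;
    explicit StepFunction(std::vector<Step> steps) : steps_(std::move(steps)) {}

    // Pointwise sum of two step functions; both step lists must be sorted by start.
    StepFunction& operator+=(const StepFunction& rhs) {
        std::vector<Step> out;
        out.reserve(steps_.size() + rhs.steps_.size());
        std::size_t i = 0, j = 0;
        int a = 0, b = 0;  // current value of each operand
        while (i < steps_.size() || j < rhs.steps_.size()) {
            const bool takeA = j == rhs.steps_.size() ||
                (i < steps_.size() && steps_[i].start <= rhs.steps_[j].start);
            const bool takeB = i == steps_.size() ||
                (j < rhs.steps_.size() && rhs.steps_[j].start <= steps_[i].start);
            const double x = takeA ? steps_[i].start : rhs.steps_[j].start;
            if (takeA) a = steps_[i++].value;
            if (takeB) b = rhs.steps_[j++].value;
            out.push_back({x, a + b});
        }
        steps_ = std::move(out);
        return *this;
    }

    const std::vector<Step>& steps() const { return steps_; }

private:
    std::vector<Step> steps_;
};

StepFunction operator+(StepFunction lhs, const StepFunction& rhs) {
    lhs += rhs;
    return lhs;
}

// Combines a whole collection by repeated addition.
StepFunction sum_all(const std::vector<StepFunction>& fs) {
    return std::accumulate(fs.begin(), fs.end(), StepFunction{});
}
```

Note that repeated addition re-walks the growing accumulated result on every step, so for a large number of lists the recursive halving described in the other answer should scale better.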


Inserting multiple values into a vector at specific positions

Say I have a vector of integers like this std::vector<int> _data;
I know that if I want to remove multiple items from _data, then I can simply call
_data.erase( std::remove_if( _data.begin(), _data.end(), [condition] ), _data.end() );
This is much faster than erasing multiple elements one at a time, as less movement of data is required within the vector. I'm wondering if there's something similar for insertions.
For example, if I have the following pairs
auto pair1 = std::make_pair( _data.begin() + 5, 5 );
auto pair2 = std::make_pair( _data.begin() + 12, 12 );
Can I insert both of these in one iteration using some existing std function? I know I can do something like:
_data.insert( pair2.first, pair2.second );
_data.insert( pair1.first, pair1.second );
But this is (very) slow for large vectors (talking 100,000+ elements).
EDIT: Basically, I have a custom set (and map) which use a vector as the underlying containers. I know I can just use std::set or std::map, but the number of traversals I do far outweighs the insertion/removals. Switching from a set and map to this custom set/map already cut 20% of run-time off. Currently though, insertions take approximately 10% of the remaining run time, so reducing that is important.
The order is also required, unfortunately. As much as possible, I use the unordered_ versions, but in some places the order does matter.
One way is to create another vector with capacity equal to the original size plus the number of the elements being inserted and then do an insert loop with no reallocations, O(N) complexity:
#include <cstddef>
#include <initializer_list>
#include <iostream>
#include <utility>
#include <vector>

template<class T>
std::vector<T> insert_elements(std::vector<T> const& v, std::initializer_list<std::pair<std::size_t, T>> new_elements) {
    std::vector<T> u;
    u.reserve(v.size() + new_elements.size());
    auto src = v.begin();
    size_t copied = 0;
    for(auto const& element : new_elements) {
        // Copy the original elements that precede this insertion index
        auto to_copy = element.first - copied;
        auto src_end = src + to_copy;
        u.insert(u.end(), src, src_end);
        src = src_end;
        copied += to_copy;
        // Then append the new element itself
        u.push_back(element.second);
    }
    // Copy whatever is left of the original vector
    u.insert(u.end(), src, v.end());
    return u;
}

int main() {
    std::vector<int> v{1, 3, 5};
    for(auto e : insert_elements(v, {{1,2}, {2,4}}))
        std::cout << e << ' ';
    std::cout << '\n';
}
Output:
1 2 3 4 5
Ok, we need some assumptions. Let old_end be a reverse iterator to the last element of your vector. Assume that your _data has been resized to exactly fit both its current content and what you want to insert. Assume that inp is a container of std::pair containing your data to be inserted that is ordered reversely (so first the element that is to be inserted at the hindmost position and so on). Then we can do:
std::merge(old_end, _data.rend(), inp.begin(), inp.end(), _data.rend(), [i = inp.size() - 1](const T& t, const std::pair<Iter, T>& p) mutable {
    if( std::distance(_data.begin(), p.first) == i ) {
        return false;
    }
    return true;
});
But I don't think that is any clearer than using a good old for loop. The problem with the STL algorithms is that the predicates work on values and not on iterators, which is a bit annoying for this problem.
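For comparison, here is roughly what the "good old for" version might look like. It is only a sketch under my own assumptions, not the code from this answer: insert_many is a hypothetical name, the insertion points are given as indices into the original vector, sorted ascending, and the vector is filled from the back so every element is moved at most once.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Inserts each (index, value) pair so that `value` ends up immediately before
// the element that was at `index` in the original vector. `inserts` must be
// sorted by index (ascending) with every index <= data.size().
// Cost is O(data.size() + inserts.size()).
template <class T>
void insert_many(std::vector<T>& data,
                 const std::vector<std::pair<std::size_t, T>>& inserts) {
    std::size_t src = data.size();              // one past the next old element to move
    data.resize(data.size() + inserts.size());  // requires T to be default constructible
    std::size_t dst = data.size();              // one past the next slot to fill
    for (std::size_t k = inserts.size(); k-- > 0;) {
        // Shift the old elements located at or after this insertion point.
        while (src > inserts[k].first) data[--dst] = std::move(data[--src]);
        data[--dst] = inserts[k].second;
    }
    // Elements before the first insertion point are already in place.
}
```

Called as insert_many(v, {{1, 2}, {2, 4}}) on a vector holding 1 3 5, this should leave 1 2 3 4 5 in v.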
Here's my take:
template<class Key, class Value>
class LinearSet
{
public:
   using Node = std::pair<Key, Value>;
   template<class F>
   void insert_at_multiple(F&& f)
   {
      std::queue<Node> queue;
      std::size_t index = 0;
      for (auto it = _kvps.begin(); it != _kvps.end(); ++it) {
         // The container size is left untouched here, no iterator invalidation.
         if (std::optional<Node> toInsert = f(index)) {
            queue.push(std::move(*it));
            *it = std::move(*toInsert);
         } else {
            ++index;
            // Replace current node with queued one.
            if (!queue.empty()) {
               queue.push(std::move(*it));
               *it = std::move(queue.front());
               queue.pop();
            }
         }
      }
      // We now have as many displaced items in the queue as were inserted,
      // add them to the end.
      while (!queue.empty()) {
         _kvps.emplace_back(std::move(queue.front()));
         queue.pop();
      }
   }
private:
   std::vector<Node> _kvps;
};
This is a linear time algorithm that doesn't need to know the number of inserted elements a priori. For each index, it asks for an element to insert there. If it gets one, it pushes the corresponding existing vector element to a queue and replaces it with the new one. Otherwise, it extracts the current item to the back of the queue and puts the item at the front of the queue into the current position (noop if no elements were inserted yet). Note that the vector size is left untouched during all this. Only at the end do we push back all items still in the queue.
Note that the indices we use for determining inserted item locations here are all pre-insertion. I find this a point of potential confusion, and it is a limitation: you can't add an element at the very end with this algorithm. That could be remedied by calling f during the second loop too.
Here's a version that allows inserting arbitrarily many elements at the end (and everywhere else). It passes post-insertion indices to the functor!
template<class F>
void insert_at_multiple(F&& f)
std::queue<Node> queue;
std::size_t index = 0;
for (auto it = _kvps.begin(); it != _kvps.end(); ++it)
if (std::optional<Node> toInsert = f(index))
if (!queue.empty())
*it = std::move(queue.front());
// We now have as many displaced items in the queue as were inserted,
// add them to the end.
while (!queue.empty())
if (std::optional<Node> toInsert = f(index))
Again, this leaves potential for confusion over what it means to insert at indices 0 and 1 (do you end up with an original element in between the two? In the first snippet you would, in the second you wouldn't). Can you insert at the same index multiple times? With pre-insertion indices that makes sense, with post-insertion indices it doesn't. You could also write this in terms of passing the current *it (i.e. key value pair) to the functor, but that alone seems not too useful...
This is an attempt I made, which inserts in reverse order. I did get rid of the iterators/indices for this.
template<class T>
void insert( std::vector<T> &vector, const std::vector<T> &values ) {
  size_t last_index = vector.size() - 1;
  vector.resize( vector.size() + values.size() ); // relies on T being default constructible
  size_t move_position = vector.size() - 1;
  size_t last_value_index = values.size() - 1;
  size_t values_size = values.size();
  bool isLastIndex = false;
  // Fill the vector from the back, taking the larger of the two candidates
  while ( !isLastIndex && values_size ) {
    if ( values[last_value_index] > vector[last_index] ) {
      vector[move_position] = std::move( values[last_value_index--] );
      --values_size;
    } else {
      isLastIndex = last_index == 0;
      vector[move_position] = std::move( vector[last_index--] );
    }
    --move_position;
  }
  // All old elements were moved; the remaining values go to the front
  if ( isLastIndex && values_size ) {
    while ( values_size ) {
      vector[move_position--] = std::move( values[last_value_index--] );
      --values_size;
    }
  }
}
Tried with ICC, Clang, and GCC on Godbolt, and vector's insert was faster (for 5 numbers inserted). On my machine, MSVC, same result but less severe. I also compared with Maxim's version from his answer. I realize using Godbolt isn't a good method for comparison, but I don't have access to the 3 other compilers on my current machine.
Results from my machine:
My insert: 659us
Maxim insert: 712us
Vector insert: 315us
Godbolt's ICC
My insert: 470us
Maxim insert: 139us
Vector insert: 127us
Godbolt's GCC
My insert: 815us
Maxim insert: 97us
Vector insert: 97us
Godbolt's Clang:
My insert: 477us
Maxim insert: 188us
Vector insert: 96us

Optimized argmin: an effective way to find an item minimizing a function

Let us say I've got a collection of items and a score function on them:
struct Item { /* some data */ };
std::vector<Item> items;
double score(Item);
I'd like to find the item from that collection whose score is the lowest. An easy way to write this is:
const auto argmin = std::min_element(begin(items), end(items), [](Item a, Item b) {
    return score(a) < score(b);
});
But if score is a heavy-to-compute function, the fact that std::min_element actually calls it multiple times on some items may be worrying. And this is expected because the compiler cannot guess score is a pure function.
How could I find argmin but with score being called only once per item? Memoization is one possibility, anything else?
My objective is to write a code snippet which is easy to read, in a dream world as obvious as calling std::min_element on the collection is.
As I commented above, if the vector is not too big, you can use std::transform to store all scores first, then apply std::min_element.
However, if you want to take benefit of "lazy evaluation", and still want to use C++'s STL, there are some tricks to work it out.
The point is that std::accumulate can be regarded as a general reduce or fold operation (like foldl in Haskell). With C++17's structured bindings for std::tuple, we can write something like:
auto [min_ind, _, min_value] = std::accumulate(items.begin(), items.end(),
    std::make_tuple(-1LU, 0LU, std::numeric_limits<double>::max()),
    [] (std::tuple<std::size_t, std::size_t, double> accu, const Item &s) {
        // up to this point, the index of min, the current index, and the last minimal value
        auto [min_ind, cur_ind, prev_min] = accu;
        double r = score(s);
        if ( r < prev_min ) {
            return std::make_tuple(cur_ind, cur_ind + 1, r);
        } else {
            return std::make_tuple(min_ind, cur_ind + 1, prev_min);
        }
    });
Here's a function that does what you want--even going beyond the intuitive "call score exactly once per element" by realizing that there's nothing smaller than negative infinity!
const Item* smallest(const std::vector<Item>& items)
{
    double min_score = items.empty() ? NAN : INFINITY;
    const Item* min_item = items.empty() ? nullptr : &*begin(items);
    for (const auto& item : items) {
        double item_score = score(item);
        if (item_score < min_score) {
            min_score = item_score;
            min_item = &item;
            if (item_score == -INFINITY) {
                // nothing can be smaller, stop early
                return min_item;
            }
        }
    }
    return min_item;
}
As suggested by user @liliscent, one could:
generate a collection of precalculated scores,
find the minimum score from it,
and infer the position of the minimizing item from the position of the minimum score.
This is my reading of their suggestion:
template<class InputIt, class Scoring>
auto argmin(InputIt first, InputIt last, Scoring scoring)
{
    using score_type = typename std::result_of_t<Scoring(typename std::iterator_traits<InputIt>::value_type)>;
    std::vector<score_type> scores(std::distance(first, last));
    std::transform(first, last, begin(scores), scoring);
    const auto scoremin = std::min_element(begin(scores), end(scores));
    return std::next(first, std::distance(begin(scores), scoremin));
}

What is the fastest way to return a range of numbers from a sorted array of numbers?

In C++, what is the fastest way (in terms of run time on multi-core processors, and from an algorithm design viewpoint) to search a sorted array (or slice, or whatever data structure is faster for this purpose) for numbers within a range, e.g. between 100 and 1000, and return at most 10 of them? For example, in Go-like pseudocode:
var listofnums := []uint64
var numcounter := 1
// splice of [1,2,3,4,5,31,32 .. 932536543] this list has 1 billion numeric items.
// the listofnums are already sorted each time an item is added but we do not know the lower_bound or upper_bound of the item list.
// I know I can use binary search to find listofnums[i] where it is smallest at [i] too... I'm asking for suggestions.
for i:=uint(0); i < len(listofnums); i++ {
if listofnums[i] > 100 && listofnums[i] < 1000 {
if listofnums[i]> 1000 || numcounter == 10 {
Is this the fastest way? I saw bitmap structures in C++ but I'm not sure if they can be applied here.
I've come across this question, which is perfectly fine for veteran programmers to ask, but I have no idea why it was downvoted:
What is the fastest search method for array?
Can someone please not remove this question but let me rephrase it? Thanks in advance. I hope to find the optimal way to return a range of numbers from a large array of numeric items.
If I understand your problem correctly, you need to find two positions in your array: the first element that is greater than or equal to 100, and the first element that is greater than 1000 (i.e. one past the last element that is less than or equal to 1000).
The functions std::lower_bound and std::upper_bound do binary searches designed to find such a range.
For arrays, in C++ we usually use a std::vector and denote the beginning and end of ranges using a pair of iterators.
So something like this may be what you need:
std::pair<std::vector<int>::iterator, std::vector<int>::iterator>
find_range(std::vector<int>& v, int min, int max)
{
    auto begin = std::lower_bound(std::begin(v), std::end(v), min);
    // start searching after the previously found value
    auto end = std::upper_bound(begin, std::end(v), max);
    return {begin, end};
}
You can iterate over that range like this:
auto range = find_range(v, 100, 1000);
for(auto i = range.first; i != range.second; ++i)
std::cout << *i << '\n';
You can create a new vector from the range (slow) like this:
std::vector<int> selection{range.first, range.second};
My first attempt.
O(log N) time complexity
creates an array slice, no copying of data
second binary search minimises the search space on the basis of the first
possible improvements:
if n is small, the second binary search would be a pessimisation. Better to simply count forward up to n times.
#include <vector>
#include <cstdint>
#include <algorithm>
#include <iterator>
#include <iostream>

template <class Iter> struct range
{
    range(Iter first, std::size_t size) : begin_(first), end_(first + size) {}
    auto begin() const { return begin_; }
    auto end() const { return end_; }
    Iter begin_, end_;
};
template<class Iter> range(Iter, std::size_t) -> range<Iter>;

auto find_first_n_between(std::vector<std::int64_t>& vec,
                          std::size_t n,
                          std::int64_t from, std::int64_t to)
{
    auto lower = std::lower_bound(begin(vec), end(vec), from);
    auto upper = std::upper_bound(lower, end(vec), to);
    auto size = std::min(n, std::size_t(std::distance(lower, upper)));
    return range(lower, size);
}

int main()
{
    std::vector<std::int64_t> vec { 1,2,3,4,5,6,7,8,15,17,18,19,20 };
    auto slice = find_first_n_between(vec, 5, 6, 15);
    std::copy(std::begin(slice), std::end(slice), std::ostream_iterator<std::int64_t>(std::cout, ", "));
}
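The possible improvement mentioned in the list above (counting forward instead of doing a second binary search when n is small) could look roughly like this. It is an untuned sketch: first_n_between is a hypothetical name, and it returns a plain iterator pair rather than the range helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Iter = std::vector<std::int64_t>::const_iterator;

// Returns [first, last) covering at most n values v with from <= v <= to.
// Assumes vec is sorted in ascending order.
std::pair<Iter, Iter> first_n_between(const std::vector<std::int64_t>& vec,
                                      std::size_t n,
                                      std::int64_t from, std::int64_t to) {
    Iter first = std::lower_bound(vec.begin(), vec.end(), from);
    Iter last = first;
    // Linear scan of at most n steps instead of a second binary search.
    for (std::size_t taken = 0;
         taken < n && last != vec.end() && *last <= to; ++taken)
        ++last;
    return {first, last};
}
```

Whether this actually beats std::upper_bound depends on n and on the data, so it would need measuring.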

How to merge sorted vectors into a single vector in C++

I have 10,000 vector<pair<unsigned,unsigned>> and I want to merge them into a single vector such that it is lexicographically sorted and does not contain duplicates. To do so, I wrote the following code. However, to my surprise, it is taking a lot of time. Can someone please suggest how I can reduce its running time?
using obj = pair<unsigned, unsigned>;
vector< vector<obj> > vecOfVec; // 10,000 vector<obj>, each sorted with size()=10M
vector<obj> result;
for(auto it=vecOfVec.begin(), l=vecOfVec.end(); it!=l; ++it)
{
    // append vectors
    result.insert(result.end(), it->begin(), it->end());
    // sort result
    std::sort(result.begin(), result.end());
    // remove duplicates from result
    result.erase(std::unique(result.begin(), result.end()), result.end());
}
I think you should use the fact that the vectors in vecOfVec are sorted.
So: detect the minimum value among the fronts of the individual vectors, push_back() it into result, and remove from the front of every vector all values equal to that minimum (which avoids duplicates in result).
If you can destroy the vecOfVec variable, something like this (caution: code not tested, just to give an idea):
while ( vecOfVec.size() )
 {
   // detect the minimal front value
   auto itc = vecOfVec.cbegin();
   auto lc = vecOfVec.cend();
   auto valMin = itc->front();
   while ( ++itc != lc )
      valMin = std::min(valMin, itc->front());
   // push_back() the minimal front value in result
   result.push_back(valMin);
   for ( auto it = vecOfVec.begin() ; it != vecOfVec.end() ; )
    {
      // remove all the front values equal to valMin (this removes the
      // duplicates from result)
      while ( (false == it->empty()) && (valMin == it->front()) )
         it->erase(it->begin());
      // when a vector is empty, it is removed
      it = ( it->empty() ? vecOfVec.erase(it) : ++it );
    }
 }
If you can, I suggest you switch vecOfVec from a vector< vector<obj> > to something that permits efficient removal from the front of the individual containers (stacks?) and efficient removal of whole containers (a list?).
If there are a lot of duplicates, you should use a set rather than a vector for your result, as a set is the most natural way to store something without duplicates:
set< pair<unsigned,unsigned> > resultSet;
for (auto it=vecOfVec.begin(); it!=vecOfVec.end(); ++it)
resultSet.insert(it->begin(), it->end());
If you need to turn it into a vector, you can write
vector< pair<unsigned,unsigned> > resultVec(resultSet.begin(), resultSet.end());
Note that since your code runs over 100 billion elements (10,000 vectors of 10M pairs each, roughly 800 GB of data), it would still take a lot of time, no matter what. At least hours, if not days.
Other ideas are:
recursively merge vectors (10000 -> 5000 -> 2500 -> ... -> 1)
to merge 10000 vectors, store 10000 iterators in a heap structure
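A rough sketch of the first idea (recursive pairwise merging), assuming every input vector is already sorted. merge_range is a hypothetical name; std::set_union does the two-way merge and drops values that appear in both halves.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using Obj = std::pair<unsigned, unsigned>;

// Merges the sorted vectors vecs[lo, hi) into one sorted vector without
// duplicates by halving the range recursively.
std::vector<Obj> merge_range(const std::vector<std::vector<Obj>>& vecs,
                             std::size_t lo, std::size_t hi) {
    if (lo == hi) return {};
    if (hi - lo == 1) {
        // Leaf: copy a single input and drop its internal duplicates.
        std::vector<Obj> one = vecs[lo];
        one.erase(std::unique(one.begin(), one.end()), one.end());
        return one;
    }
    const std::size_t mid = lo + (hi - lo) / 2;
    const std::vector<Obj> left = merge_range(vecs, lo, mid);
    const std::vector<Obj> right = merge_range(vecs, mid, hi);

    std::vector<Obj> out;
    out.reserve(left.size() + right.size());
    // Both halves are sorted and duplicate-free, so the union is too.
    std::set_union(left.begin(), left.end(), right.begin(), right.end(),
                   std::back_inserter(out));
    return out;
}
```

Calling merge_range(vecOfVec, 0, vecOfVec.size()) touches each element about log2(10000), i.e. roughly 13 or 14, times instead of re-sorting the whole result for every input vector. The intermediate copies still need a lot of memory at this data size.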
One problem with your code is the excessive use of std::sort. Unfortunately, the quicksort algorithm (which usually is the workhorse used by std::sort) is not particularly faster when it encounters an already sorted array.
Moreover, you're not exploiting the fact that your initial vectors are already sorted. This can be exploited by using a heap of their next values, so that you do not need to call sort again. This may be coded as follows (code tested using obj=int), but perhaps it can be made more concise.
// represents the next unused entry in one vector<obj>
template<typename obj>
struct feed
{
  typename std::vector<obj>::const_iterator current, end;
  feed(std::vector<obj> const&v)
  : current(v.begin()), end(v.end()) {}
  friend bool operator> (feed const&l, feed const&r)
  { return *(l.current) > *(r.current); }
};

// - returns the smallest element
// - set corresponding feeder to next and re-establish the heap
template<typename obj>
obj get_next(std::vector<feed<obj>>&heap)
{
  auto&f = heap[0];
  auto x = *(f.current++);
  if(f.current == f.end) {
    // feeder exhausted: remove it from the heap
    std::pop_heap(heap.begin(), heap.end(), std::greater<feed<obj>>{});
    heap.pop_back();
  } else {
    // feeder advanced: move it out of the root and sift it back in
    std::pop_heap(heap.begin(), heap.end(), std::greater<feed<obj>>{});
    std::push_heap(heap.begin(), heap.end(), std::greater<feed<obj>>{});
  }
  return x;
}

template<typename obj>
std::vector<obj> merge(std::vector<std::vector<obj>>const&vecOfvec)
{
  // create min heap of feed<obj> and count total number of objects
  std::vector<feed<obj>> input;
  size_t num_total = 0;
  for(auto const&v:vecOfvec)
    if(v.size()) {
      input.emplace_back(v);
      num_total += v.size();
    }
  std::make_heap(input.begin(), input.end(), std::greater<feed<obj>>{});
  // append values in ascending order, avoiding duplicates
  std::vector<obj> result;
  result.reserve(num_total);
  while(!input.empty()) {
    auto x = get_next(input);
    result.push_back(x);
    while(!input.empty() &&
          !(*(input[0].current) > x)) // remove duplicates
      get_next(input);
  }
  return result;
}

A* and N-Puzzle optimization

I am writing a solver for the N-Puzzle (see
Right now I am using a unordered_map to store hash values of the puzzle board,
and manhattan distance as the heuristic for the algorithm, which is a plain DFS.
so I have
auto pred = [](Node * lhs, Node * rhs){ return lhs->manhattanCost_ < rhs->manhattanCost_; };
std::multiset<Node *, decltype(pred)> frontier(pred);
std::vector<Node *> explored; // holds nodes we have already explored
std::tr1::unordered_set<unsigned> frontierHashTable;
std::tr1::unordered_set<unsigned> exploredHashTable;
This works great for n = 2 and 3.
However, it's really hit and miss for n=4 and above (the STL is unable to allocate memory for a new node).
I also suspect that I am getting hash collisions in the unordered_set
unsigned makeHash(const Node & pNode)
{
    unsigned int b = 378551;
    unsigned int a = 63689;
    unsigned int hash = 0;
    for(std::size_t i = 0; i < pNode.data_.size(); i++)
    {
        hash = hash * a + pNode.data_[i];
        a = a * b;
    }
    return hash;
}
16! = 2 × 10^13 (possible arrangements)
2^32 = 4 x 10^9 (possible hash values in a 32 bit hash)
My question is how can I optimize my code to solve for n=4 and n=5?
I know from here
that n=4 is possible in less than a second on average.
The algorithm itself is here:
bool NPuzzle::aStarSearch()
{
    auto pred = [](Node * lhs, Node * rhs){ return lhs->manhattanCost_ < rhs->manhattanCost_; };
    std::multiset<Node *, decltype(pred)> frontier(pred);
    std::vector<Node *> explored; // holds nodes we have already explored
    std::tr1::unordered_set<unsigned> frontierHashTable;
    std::tr1::unordered_set<unsigned> exploredHashTable;
    // if we are in the solved position in the first place, return true
    if(initial_ == target_)
    {
        current_ = initial_;
        return true;
    }
    frontier.insert(new Node(initial_)); // we are going to delete everything from the frontier later..
    for(;;)
    {
        if(frontier.empty())
        {
            std::cout << "depth first search " << "cant solve!" << std::endl;
            return false;
        }
        // remove a node from the frontier, and place it into the explored set
        Node * pLeaf = *frontier.begin();
        frontier.erase(frontier.begin());
        explored.push_back(pLeaf);
        // do the same for the hash table
        unsigned hashValue = makeHash(*pLeaf);
        frontierHashTable.erase(hashValue);
        exploredHashTable.insert(hashValue);
        std::vector<Node *> children = pLeaf->genChildren();
        for( auto it = children.begin(); it != children.end(); ++it)
        {
            unsigned childHash = makeHash(**it);
            if(inFrontierOrExplored(frontierHashTable, exploredHashTable, childHash))
            {
                delete *it;
            }
            else if(**it == target_)
            {
                current_ = **it;
                // delete everything else in children
                for( auto it2 = ++it; it2 != children.end(); ++it2)
                    delete * it2;
                // delete everything in the frontier
                for( auto it = frontier.begin(); it != frontier.end(); ++it)
                    delete *it;
                // delete everything in explored
                for( auto it = explored.begin(); it != explored.end(); ++it)
                    delete *it;
                return true;
            }
            else
            {
                frontier.insert(*it);
                frontierHashTable.insert(childHash);
            }
        }
    }
}
Since this is homework I will suggest some strategies you might try.
First, try using valgrind or a similar tool to check for memory leaks. You may have some memory leaks if you don't delete everything you new.
Second, calculate a bound on the number of nodes that should be explored. Keep track of the number of nodes you do explore. If you pass the bound, you might not be detecting cycles properly.
Third, try the algorithm with depth first search instead of A*. Its memory requirements should be linear in the depth of the tree and it should just be a matter of changing the sort ordering (pred). If DFS works, your A* search may be exploring too many nodes or your memory structures might be too inefficient. If DFS doesn't work, again it might be a problem with cycles.
Fourth, try more compact memory structures. For example, std::multiset does what you want but std::priority_queue with a std::deque may take up less memory. There are other changes you could try and see if they improve things.
First, I would recommend Cantor expansion, which you can use as the hashing method. It's 1-to-1, i.e. the 16! possible arrangements would be hashed into 0 ~ 16! - 1.
Then I would implement the map myself; as you may know, the std containers are not always efficient enough for heavy computation. std::map is actually a binary search tree; I would recommend a Size Balanced Tree, or you can use an AVL tree.
And just for the record, directly using bool hash[] with a big prime may also give good results.
Then the most important thing: the A* function. As in the first of your links, you may try a variety of A* functions and find the best one.
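To make the Cantor expansion suggestion concrete, here is a small sketch (cantor_rank is a hypothetical name, and the board is assumed to be stored as a permutation of 0..n-1 with the blank as one of the values). It computes the lexicographic rank of the permutation, which is unique per arrangement:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Lexicographic rank of a permutation of 0..n-1 (Cantor expansion).
// Distinct permutations get distinct ranks in [0, n! - 1].
// O(n^2) time; the result fits in 64 bits only for n <= 20.
std::uint64_t cantor_rank(const std::vector<int>& perm) {
    const std::size_t n = perm.size();
    std::uint64_t rank = 0;
    std::uint64_t fact = 1;  // equals (n - 1 - i)! at position i
    for (std::size_t i = n; i-- > 0;) {
        // Count the later entries that are smaller than perm[i].
        std::uint64_t smaller = 0;
        for (std::size_t j = i + 1; j < n; ++j)
            if (perm[j] < perm[i]) ++smaller;
        rank += smaller * fact;
        fact *= (n - i);
    }
    return rank;
}
```

For the 4x4 board (n = 16) the rank fits comfortably in a 64-bit integer and can be used as a collision-free key in a hash set. For the 5x5 board, 25! no longer fits in 64 bits, so this exact scheme would need a wider integer type.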
You are only using the heuristic function to order the multiset. You should use min(g(n) + h(n)), i.e. min(path length + heuristic), to order your frontier.
The problem here is that you are picking the node with the least heuristic value, which may not be the correct "next child" to pick.
I believe this is what is causing your calculation to explode.
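As an illustration of ordering by path length plus heuristic, here is a sketch with a simplified node type. SearchNode, pathCost, manhattanCost and ByTotalCost are hypothetical names, not the Node class from the question, and std::priority_queue is used in place of the multiset:

```cpp
#include <queue>
#include <vector>

// Hypothetical node: pathCost is g(n), the number of moves made so far;
// manhattanCost is h(n), the heuristic estimate of the distance to the goal.
struct SearchNode {
    int pathCost;
    int manhattanCost;
    int f() const { return pathCost + manhattanCost; }
};

// Puts the node with the smallest g(n) + h(n) on top of the queue.
struct ByTotalCost {
    bool operator()(const SearchNode& a, const SearchNode& b) const {
        // std::priority_queue is a max-heap, so the comparison is inverted.
        return a.f() > b.f();
    }
};

using Frontier =
    std::priority_queue<SearchNode, std::vector<SearchNode>, ByTotalCost>;
```

For this to work, each child created in genChildren would have to store its parent's path cost plus one.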