Total merging search optimization - c++

Assume there is a vector VA of size N, where each element is itself a vector of elements of type T. There is an operation that takes two values of type T and produces a new value of type T: bool merge(T a, T b, T &ret);. If a and b can be merged, it stores the result in ret and returns true; otherwise, it returns false. The merge operation is reflective and transitive.
A solution is found if either:
∃ x0, x1, ..., xN-1. merge(VA[0][x0], VA[1][x1], merge(VA[2][x2], ..., merge(VA[N-2][xN-2],VA[N-1][xN-1], ret)...));
any elements from N-1 (not N) sub-vectors can be merged (pick any N-1 with exactly one exception).
For example:
VA is of size 3. Element a can be merged with Element b with the result c. Element c can be merged with Element d with the result e.
VA[0] = {a}
VA[1] = {b, q}
VA[2] = {d, r}
All solutions in the above example are: {a,b}, {a,d}, {b,d}, {a,b,d}.
The task is to find all solutions in the given vector VA.
My C++ code is:
void findAll(unsigned int step, unsigned int size, const T pUnifier, int hole_id) {
    if (step == size) printOneResult(pUnifier);
    else {
        // first, skip this sub-vector entirely (it becomes the "hole")
        _path[step] = -1;
        findAll(step + 1, size, pUnifier, step);
        // then try to merge every element of this sub-vector
        std::vector<T> vec = VA[step];
        for (std::vector<T>::const_iterator it = vec.begin(); it < vec.end(); it++) {
            T nextUnifier; // "T nextUnifier();" would declare a function instead
            if (merge(*it, pUnifier, nextUnifier)) {
                _path[lit_id] = it->getID();
                findAll(step + 1, size, nextUnifier, hole_id);
            }
        }
    }
}
The code contains recursive calls; however, it is not tail recursive. It is running slowly in practice. In reality, the size of VA is possibly hundreds and each sub-vector size is of hundreds, too. I'm wondering whether it can be optimized.
Thank you very much.

If I'm understanding your code correctly, you're performing a (recursive) brute-force search. This is not efficient, since you're given some information about your search space.
I think a good candidate here would be the A* algorithm. You could use the current greatest-chain size as the heuristic, or perhaps even the sum of the squares of the chain sizes.

To improve your code, since you are using vectors, you should index them with the [] operator and an int counter instead of plain iterators, which are much slower.
You can improve it even more by minimising the function calls in either of your loops, for example by storing the values you will use beforehand.
Since you didn't explain what T_VEC really is, I couldn't write the complete iterator-free version, but this should already be a big improvement in speed.


Is Big-O Notation also calculated from the functions used?

I'm learning about Big-O Notation and algorithms to improve my interview skills, but I don't quite understand how to get the time complexity.
Suppose I want to sum all the elements of the following list.
std::vector<int> myList = {1,2,3,4,5} ;
Case 1:
int sum = 0;
for (int it: myList)
sum += it;
Case 2:
int sum = std::accumulate(std::begin(myList), std::end(myList), 0);
Case 1 is O(N), and case 2 is apparently O(1), but I'm sure those functions do some kind of iteration, so the question is whether Big-O notation is calculated only from the written code of that block or also from the functions it uses.
If you talk about big-O, you have to talk in respect of some unit of data being processed. Both your case 1 and case 2 are O(N) where N is the number of items in the container: the unit is an int.
You tend to want the unit - and N to be the count of - the thing that's likely to grow/vary most in your program. For example, if you're talking about processing names in phonebooks, then the number of names should be N; even though the length of individual names is also somewhat variable, there's no expected pattern of increasing average name length as your program handles larger phonebooks.
Similarly, if your program had to handle an arbitrary number of containers that tended to be roughly the same length, then your unit might be a container, and then you could think of your code - case 1 and case 2 - as being big-O O(1) with respect to the number of containers, because whether there are 0, 1, 10 or a million other containers lying around somewhere in your program, you're only processing the one - myList. But, any individual accumulate call is O(N) with respect to any individual container's ints.
I think this example should give you an idea.
#include <vector>

int sum(std::vector<int> const& list)
{
    int result = 0;
    for (int const& elem : list)
        result += elem;
    return result;
}

int main()
{
    std::vector<int> test = {1,2,3,4,5,6};
    // O(n)
    int sum1 = 0;
    for (int const& elem : test)
        sum1 += elem;
    // O(???)
    int sum2 = sum(test);
}
For an evaluation of the time complexity, it makes more sense to count the operations that take constant time. Hence sum is not a particularly good candidate unless
the sums are always done on the same number of elements, or
the distribution of the sum lengths is known and independent of the circumstances where the calls are made (to avoid any bias).
Such evaluations are rather unusual.
case 2 is apparently O(1)
Says who? The reference documentation for std::accumulate says:
Linear in the distance between first and last.
Which is the same O(N) as your case 1 code.
(I also checked another reference, but in this case it doesn't say anything about the complexity.)
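One way to see the linear cost for yourself is to count how often the addition step runs. The following is only a sketch: sum_and_count is a hypothetical helper (not part of the standard library) that passes std::accumulate a counting lambda.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: sums `values` with std::accumulate and reports,
// through `additions`, how many times the addition step was executed.
int sum_and_count(const std::vector<int>& values, std::size_t& additions) {
    additions = 0;
    return std::accumulate(values.begin(), values.end(), 0,
                           [&additions](int acc, int v) {
                               ++additions;  // runs once per element
                               return acc + v;
                           });
}
```

For a vector of 5 elements the counter ends at 5, and for 1000 elements it ends at 1000, so the work grows with N even though the call site is a single line.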

Why erase+remove is more efficient than remove

I'm coding C++ to solve this problem from Leetcode:
Given an array nums and a value val, remove all instances of that
value in-place and return the new length.
Do not allocate extra space for another array, you must do this by
modifying the input array in-place with O(1) extra memory.
The order of elements can be changed. It doesn't matter what you leave
beyond the new length.
Example 1:
Given nums = [3,2,2,3], val = 3,
Your function should return length = 2, with the first two elements of
nums being 2.
It doesn't matter what you leave beyond the returned length.
I have two solutions:
Solution A:
int removeElement(vector<int>& nums, int val) {
    nums.erase(remove(begin(nums), end(nums), val), end(nums));
    return nums.size();
}
Solution B:
int removeElement(vector<int>& nums, int val) {
    auto it = std::remove(nums.begin(), nums.end(), val);
    return it - nums.begin();
}
In my opinion, Solution B should be faster than Solution A. However, the result is the opposite:
Solution A spent 0 ms, whereas Solution B spent 4 ms.
I don't know why remove + erase is faster than remove.
For a vector of a trivially destructible type (int is one such type), erase(it, end()) is usually just a decrement of a size member (or pointer member, depending on the implementation strategy), which takes almost no time. 4 milliseconds is a very small difference; it can easily be caused by the state of the machine, and I wouldn't expect such a small difference to be reproducible.
If you want to really remove the elements from the vector, go with the first version. If you really want to do what std::remove does (you probably don't), go with the second version. Performance is not the problem here.
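To make the difference between the two versions concrete, here is a small sketch. The names remove_only and remove_and_erase are made up for illustration; they mirror Solution B and Solution A respectively.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Solution B style: std::remove only shifts the kept elements to the front.
// The vector keeps its size; the returned count is the "new length".
std::size_t remove_only(std::vector<int>& nums, int val) {
    auto new_end = std::remove(nums.begin(), nums.end(), val);
    return static_cast<std::size_t>(new_end - nums.begin());
}

// Solution A style: erase additionally drops the leftover tail,
// so size() itself shrinks to the new length.
std::size_t remove_and_erase(std::vector<int>& nums, int val) {
    nums.erase(std::remove(nums.begin(), nums.end(), val), nums.end());
    return nums.size();
}
```

With nums = {3,2,2,3} and val = 3, both return 2, but only the second leaves the vector with size() == 2; after the first, size() is still 4 and the tail holds unspecified values.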

Most efficient way to find index of matching values in two sorted arrays using C++

I currently have a solution but I feel it's not as efficient as it could be to this problem, so I want to see if there is a faster method to this.
I have two arrays (std::vectors, for example). Both arrays contain only unique integer values that are sorted but sparse in value, i.e. 1,4,12,13... What I want to ask is whether there is a fast way to find the INDEX into one of the arrays where the values are the same. For example, array1 has values 1,4,12,13 and array2 has values 2,12,14,16. The first matching value index is 1 in array2. The index into the array is what is important, as I have other arrays that contain data that will use this index that "matches".
I am not confined to using arrays; maps are possible too. I am only comparing the two arrays once. They will not be reused after the first matching pass. There can be a small to large number of values (300,000+) in either array, but they DO NOT always have the same number of values (that would make things much easier).
The worst case is a naive linear search, O(N^2). Using a map would give me better O(log N) lookups, but I would still have to convert an array into a map of value, index pairs.
What I currently have, which avoids any container type conversions, is this. Loop over the smaller of the two arrays. Compare the current element of the small array (array1) with the current element of the large array (array2). If the array1 element value is larger than the array2 element value, increment the index for array2 until it is no longer larger than the array1 element value (while loop). Then, if the array1 element value is smaller than the array2 element, go to the next loop iteration and begin again. Otherwise they must be equal and I have my index into either array of the matching value.
So in this loop, I am at best O(N) if all values have matches and at worst O(2N) if none match. So I am wondering if there is something faster out there? It's hard to know for sure how often the two arrays will match, but I would say I lean toward most of the arrays mostly having matches.
I hope I explained the problem well enough and I appreciate any feedback or tips on improving this.
Code example:
std::vector<int> array1 = {4,6,12,34};
std::vector<int> array2 = {1,3,6,34,40};
for(unsigned int i=0, z=0; i < array1.size(); i++)
{
    int value1 = array1[i];
    // check the bound first so array2[z] is never read past the end
    while(z < array2.size() && value1 > array2[z])
        z++;
    if (z >= array2.size())
        break; // reached end of array2
    if (value1 < array2[z])
        continue;
    // we have a match, i and z indices have same value
}
Result will be matching indexes for array1 = [1,3] and for array2= [2,3]
I wrote an implementation of this function using an algorithm that performs better with sparse distributions, than the trivial linear merge.
For distributions that are similar†, it has O(n) complexity, but for ranges where the distributions are greatly different, it should perform below linear, approaching O(log n) in optimal cases. However, I wasn't able to prove that the worst case is any better than O(n log n). On the other hand, I haven't been able to find that worst case either.
I templated it so that any type of ranges can be used, such as sub-ranges or raw arrays. Technically it works with non-random access iterators as well, but the complexity is much greater, so it's not recommended. I think it should be possible to modify the algorithm to fall back to linear search in that case, but I haven't bothered.
† By similar distribution, I mean that the pair of arrays have many crossings. By crossing, I mean a point where you would switch from one array to another if you were to merge the two arrays together in sorted order.
#include <algorithm>
#include <iterator>
#include <utility>

// helper structure for the search
template<class Range, class Out>
struct search_data {
    // is there any clearer way to get an iterator that might be either
    // a Range::const_iterator or const T*?
    using iterator = decltype(std::cbegin(std::declval<Range&>()));
    iterator curr;
    const iterator begin, end;
    Out out;
};

template<class Range, class Out>
auto init_search_data(const Range& range, Out out) {
    return search_data<Range, Out>{
        std::begin(range), // curr
        std::begin(range), // begin
        std::end(range),   // end
        out,
    };
}

template<class Range, class Out1, class Out2>
void match_indices(const Range& in1, const Range& in2, Out1 out1, Out2 out2) {
    auto search_data1 = init_search_data(in1, out1);
    auto search_data2 = init_search_data(in2, out2);

    // initial order is arbitrary
    auto lesser = &search_data1;
    auto greater = &search_data2;

    // if either range is exhausted, we are finished
    while(lesser->curr != lesser->end
            && greater->curr != greater->end) {
        // difference of first values in each range
        auto delta = *greater->curr - *lesser->curr;
        if(!delta) { // matching value was found
            // store both results and increment the iterators
            *lesser->out++ = std::distance(lesser->begin, lesser->curr++);
            *greater->out++ = std::distance(greater->begin, greater->curr++);
            continue; // then start a new iteration
        }
        if(delta < 0) { // set the order of ranges by their first value
            std::swap(lesser, greater);
            delta = -delta; // delta is always positive after this
        }
        // next crossing cannot be farther than the delta
        // this assumption has following pre-requisites:
        // range is sorted, values are integers, values in the range are unique
        auto range_left = std::distance(lesser->curr, lesser->end);
        auto upper_limit =
            std::min(range_left, static_cast<decltype(range_left)>(delta));
        // exponential search for a sub range where the value at upper bound
        // is greater than target, and value at lower bound is lesser
        auto target = *greater->curr;
        auto lower = lesser->curr;
        auto upper = std::next(lower, upper_limit);
        for(decltype(upper_limit) i = 1; i < upper_limit; i *= 2) {
            // probe relative to the start so the guess stays inside the range
            auto guess = std::next(lesser->curr, i);
            if(*guess >= target) {
                upper = guess;
                break;
            }
            lower = guess;
        }
        // skip all values in lesser,
        // that are less than the least value in greater
        lesser->curr = std::lower_bound(lower, upper, target);
    }
}

#include <iostream>
#include <vector>

int main() {
    std::vector<int> array1 = {4,6,12,34};
    std::vector<int> array2 = {1,3,6,34};
    std::vector<std::size_t> indices1;
    std::vector<std::size_t> indices2;
    match_indices(array1, array2,
                  std::back_inserter(indices1),
                  std::back_inserter(indices2));
    std::cout << "indices in array1: ";
    for(std::vector<int>::size_type i : indices1)
        std::cout << i << ' ';
    std::cout << "\nindices in array2: ";
    for(std::vector<int>::size_type i : indices2)
        std::cout << i << ' ';
    std::cout << std::endl;
}
Since the arrays are already sorted you can just use something very much like the merge step of mergesort. This just looks at the head element of each array, and discards the lower element (the next element becomes the head). Stop when you find a match (or when either array becomes exhausted, indicating no match).
This is O(n) and the fastest you can do for arbitrary distributions. With certain clustered distributions a "skip ahead" approach could be used rather than always looking at the next element. This could result in better than O(n) running times for certain distributions. For example, given the arrays 1,2,3,4,5 and 10,11,12,13,14 an algorithm could determine there were no matches to be found in as few as one comparison (5 < 10).
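A minimal sketch of that merge-style scan might look like the following. The function name matching_indices is made up for illustration, it assumes both inputs are sorted ascending with no duplicates, and unlike the description above it keeps going after the first match so that every pair of matching indices is collected.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns (index in a, index in b) for every value present in both inputs.
// Assumes both vectors are sorted ascending and contain no duplicates.
std::vector<std::pair<std::size_t, std::size_t>>
matching_indices(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;                         // discard the smaller head of a
        } else if (b[j] < a[i]) {
            ++j;                         // discard the smaller head of b
        } else {
            matches.emplace_back(i, j);  // equal heads: record both indices
            ++i;
            ++j;
        }
    }
    return matches;
}
```

Each loop iteration advances at least one index, so the scan does at most a.size() + b.size() steps.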
What is the range of the stored numbers?
I mean, you say that the numbers are integers, sorted, and sparse (i.e. non-sequential), and that there may be more than 300,000 of them, but what is their actual range?
The reason that I ask is that, if there is a reasonably small upper limit, u, (say, u=500,000), the fastest and most expedient solution might be to just use the values as indices. Yes, you might be wasting memory, but is 4*u really a lot of memory? This depends on your application and your target platform (i.e. if this is for a memory-constrained embedded system, it's less likely to be a good idea than if you have a laptop with 32GiB RAM).
Of course, if the values are more-or-less evenly spread over 0 to 2^31-1, this crude idea isn't attractive, but maybe there are properties of the input values that you can exploit other than simply the range. You might be able to hand-write a fairly simple hash function.
Another thing worth considering is whether you actually need to be able to retrieve the index quickly, or whether it helps just to be able to tell quickly if the value exists in the other array. Whether or not a value exists requires only one bit, so you could have a bitmap over the range of the input values using 32x less memory (i.e. mask off the 5 LSBs and use that as a bit position, then shift the remaining 27 bits 5 places right and use that as an array index).
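As a rough sketch of that bitmap idea (ValueBitmap is a hypothetical name, and it assumes all values are non-negative and below a known limit such as the u=500,000 mentioned earlier):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bitmap over the value range [0, limit): one bit per possible value.
// Callers must only pass values smaller than `limit`.
class ValueBitmap {
public:
    explicit ValueBitmap(std::uint32_t limit)
        : words_((static_cast<std::size_t>(limit) + 31) / 32, 0u) {}

    // The upper 27 bits select the 32-bit word, the 5 LSBs select the bit in it.
    void set(std::uint32_t value) {
        words_[value >> 5] |= (std::uint32_t{1} << (value & 31u));
    }

    bool test(std::uint32_t value) const {
        return ((words_[value >> 5] >> (value & 31u)) & 1u) != 0;
    }

private:
    std::vector<std::uint32_t> words_;
};
```

You would call set for every value of one array and then test each value of the other; this only answers "is it present?", so the matching index still has to be found some other way.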
Finally, a hybrid approach might be worth considering, where you decide how much memory you're prepared to use (say you decide 256KiB, which corresponds to 64Ki 4-byte integers) then use that as a lookup-table into much smaller sub-problems. Say you have 300,000 values whose LSBs are pretty evenly distributed. Then you could use 16 LSBs as indices into a lookup-table of lists that are (on average) only 4 or 5 elements long, which you can then search by other means. A couple of years ago, I worked on some simulation software that had ~200,000,000 cells, each with a cell id; some utility functionality used a binary search to identify cells by id. We were able to speed it up significantly and non-intrusively with this strategy. Not a perfect solution, but a great improvement. (If the LSBs are not evenly distributed, maybe that's a property that you can exploit or maybe you can choose a range of bits that are, or do a bit of hashing.)
I guess the upshot is “consider some kind of hashing”, even the “identity hash” or simple masking/modulo with a little “your solution doesn't have to be perfectly general” on the side and some “your solution doesn't have to be perfectly space efficient” sauce on top.

Which is more efficient in this scenario: std::vector<bool> or std::unordered_map<int>?

I know a classic programming interview question is "Given an array of N-1 integers which are numbers 1 through N with one of them missing, find the missing number." I'm thinking that
int missing_number ( int * arr, int n )
{
    // assumes n is N, so arr holds the n-1 values from 1..n with one missing
    std::vector<bool> booVec(n, false);
    int * offArrEnd = arr + n - 1;
    while (arr != offArrEnd) booVec[*arr++ - 1] = true; // value v maps to slot v-1
    return std::find(booVec.begin(), booVec.end(), false)
        - booVec.begin() + 1;
}
would be a good solution since instantiating a vector<bool> element to all false will take a short amount of time, and so will modifying its elements via booVec[*arr++]. I know I could save 1 operation by changing it to
int missing_number ( int * arr, int n )
{
    // assumes n is N, so arr holds the n-1 values from 1..n with one missing
    std::vector<bool> booVec(n, false);
    int * offArrEnd = arr + n - 1;
    while (arr != offArrEnd) booVec[*arr++ - 1] = true; // value v maps to slot v-1
    std::vector<bool>::iterator booBegin = booVec.begin();
    return std::find(booBegin, booVec.end(), false)
        - booBegin + 1;
}
But I'm wondering whether using a similar procedure with unordered_map might be faster overall? I presume it would take longer to instantiate every member of an unordered_map, but it might be faster to modify its elements.
vector in this case where n is bounded should be able to beat unordered_map. The underlying data structure for unordered_map is essentially a vector, where a hash is taken, and the modulus of the hash is taken to choose the index to start at in the vector. (The vector stores the hash table "buckets") As a result, a plain vector is already a perfect hash table and you have a perfect hash -- N from the array! Therefore, the extra mechanism provided by unordered_map is going to be overhead you're not using.
(And that's assuming you don't happen to fall into the case where unordered_map can have O(n) lookup complexity due to hash collisions)
That said, vector<char> may beat vector<bool> due to the bitfield behavior of vector<bool>.
The technique you used above is the basis of Pigeonhole-Sort, with an additional guarantee of no duplicates making it even more efficient.
Thus, the algorithm is O(n) (tight bound).
A std::unordered_set has O(1) expected and O(n) worst case complexity for each of the N-1 insertions though, for a total of O(n) expected and O(n*n) worst case.
Even though the complexity in the expected (and best) case is equal, std::unordered_set is a far more complex container and thus loses the race in any case.
std::vector<bool> does not contain any bool, but is a specialization using proxies to save space (Widely regarded as a design-failure)!
Thus, using a different instantiation of vector, with char or even int, will consume more memory, but might be faster thanks to simpler code (no bit-twiddling).
Anyway, the efficiency of both implementations is dwarfed by simply adding up the elements and subtracting the sum from what it would be for an uninterrupted sequence, as Nikola Dimitroff comments.
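int missing_number ( int * arr, int n )
{
    // assumes n is N, so arr holds the n-1 values from 1..n with one missing
    unsigned long long r = (unsigned long long)n * (n+1) / 2;
    for (int i = 0; i < n - 1; ++i)
        r -= arr[i];
    return (int)r;
}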

Get number of elements greater than a number

I am trying to solve the following problem: Numbers are being inserted into a container. Each time a number is inserted I need to know how many elements are in the container that are greater than or equal to the current number being inserted. I believe both operations can be done in logarithmic complexity.
My question:
Are there standard containers in a C++ library that can solve the problem?
I know that std::multiset can insert elements in logarithmic time, but how can you query it? Or should I implement a data structure (e.g. a binary search tree) to solve it?
Great question. I do not think there is anything in STL which would suit your needs (provided you MUST have logarithmic times). I think the best solution then, as aschepler says in comments, is to implement a RB tree. You may have a look at STL source code, particularly on stl_tree.h to see whether you could use bits of it.
Better still, look at the question "Rank Tree in C++", which contains a link to an implementation.
You should use a multiset for logarithmic complexity, yes. But computing the distance is the problem, as set/map iterators are Bidirectional, not RandomAccess, std::distance has an O(n) complexity on them:
multiset<int> my_set;
auto it = my_set.lower_bound(3);
size_t count_inserted = distance(it, my_set.end()); // this is definitely O(n)
Your complexity-issue is complicated. Here is a full analysis:
If you want O(log(n)) complexity for each insertion, you need a sorted structure such as a set. If you want the structure to not reallocate or move items when adding a new item, the insertion point distance computation will be O(n). If you know the insertion size in advance, you do not need logarithmic insertion time in a sorted container: you can insert all the items and then sort, which is O(n.log(n)), the same as n * O(log(n)) insertions in a set.
The only alternative is to use a dedicated container like a weighted RB-tree. Depending on your problem this may be the solution, or something really overkill.
Use multiset and distance: you are O(n.log(n)) on insertion (yes, n insertions * log(n) insertion time for each one of them) and O(n.n) on distance computation, but computing distances is very fast.
If you know the inserted data size (n) in advance: use a vector, fill it, sort it, return your distances. You are O(n.log(n)), and it is easy to code.
If you do not know n in advance, your n is likely huge, and each item is memory-heavy so you cannot afford O(n.log(n)) reallocation: then you have time to re-encode or re-use some non-standard code, and since you really have to meet these complexity expectations, use a dedicated container. Also consider using a database, as you will probably have issues maintaining this in memory.
Here's a quick way using Policy-Based Data Structures in C++:
There is something called an Ordered Set, which lets you insert/remove elements in O(log N) time (and offers pretty much all the other functions that std::set has). It also gives 2 more features: find the Kth element, and find the rank of element X. The problem is that this doesn't allow duplicates :(
No worries though! We will map duplicates with a separate index/priority, and define a new structure (call it Ordered Multiset)! I've attached my implementation below for reference.
Finally, every time you want to find the number of elements greater than, say, x, call the function upper_bound (number of elements less than or equal to x) and subtract this number from the size of your Ordered Multiset!
Note: PBDS use a lot of memory, so that is a constraint; I'd otherwise suggest using a Binary Search Tree or a Fenwick Tree.
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;

struct ordered_multiset { // multiset supporting duplicate values
    int len = 0;
    const long long ADD = 1000010;       // must exceed the copies stored of any one value
    const long long MAXVAL = 1000000010; // offset that keeps every key non-negative
    unordered_map<long long, int> mp;    // shifted value -> number of copies stored
    // 64-bit keys: (x + MAXVAL) * ADD does not fit in a 32-bit int
    tree<long long, null_type, less<long long>, rb_tree_tag, tree_order_statistics_node_update> T;

    ordered_multiset() { len = 0; T.clear(), mp.clear(); }

    inline void insert(long long x) {
        len++, x += MAXVAL;
        int c = mp[x]++;
        T.insert((x * ADD) + c);
    }
    inline void erase(long long x) {
        x += MAXVAL;
        int c = mp[x];
        if (c) {
            c--, mp[x]--, len--;
            T.erase((x * ADD) + c);
        }
    }
    inline int kth(int k) {               // 1-based index, returns the
        if (k < 1 || k > len) return -1;  // k'th element in the tree,
        auto it = T.find_by_order(--k);   // -1 if none exists
        return ((*it) / ADD) - MAXVAL;
    }
    inline int lower_bound(long long x) { // count of values < x in the tree
        x += MAXVAL;
        return T.order_of_key(x * ADD);
    }
    inline int upper_bound(long long x) { // count of values <= x in the tree
        x += MAXVAL;
        int c = mp[x];
        return T.order_of_key((x * ADD) + c);
    }
    inline int size() { return len; }     // number of elements in the tree
};

int main() {
    int n; cin >> n;
    ordered_multiset s;
    for (int i = 0; i < n; i++) {
        int x; cin >> x;
        int ctr = s.size() - s.upper_bound(x); // stored elements greater than x
        cout << ctr << " ";
        s.insert(x); // insert after the query
    }
}
Input (n = 5) : 10 1 3 3 2
Output : 0 1 1 1 3
Time Complexity : O(log n) per query/insert
References : mochow13's GitHub
Sounds like a case for count_if - although I admit this doesn't solve it at logarithmic complexity, that would require a sorted type.
vector<int> v = { 1, 2, 3, 4, 5 };
int some_value = 3;
int count = count_if(v.begin(), v.end(), [some_value](int n) { return n > some_value; } );
If the whole range of numbers is sufficiently small (on the order of a few million), this problem can be solved relatively easily using a Fenwick tree.
Although Fenwick trees are not part of the STL, they are both very easy to implement and time efficient. The time complexity is O(log N) for both updates and queries and the constant factors are low.
You mention in a comment on another question, that you needed this for a contest. Fenwick trees are very popular tools in competitive programming and are often useful.
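For reference, a minimal sketch of such a Fenwick tree could look like this. FenwickCounter and its member names are made up for illustration, and it assumes every inserted value is an integer in the range 1..max_value.

```cpp
#include <cstddef>
#include <vector>

// Minimal Fenwick tree (binary indexed tree) over the values 1..max_value.
// Assumes every inserted or queried value lies in that range.
class FenwickCounter {
public:
    explicit FenwickCounter(std::size_t max_value)
        : tree_(max_value + 1, 0), total_(0) {}

    // Record one occurrence of `value` in O(log max_value).
    void insert(std::size_t value) {
        for (std::size_t i = value; i < tree_.size(); i += i & (~i + 1)) {
            ++tree_[i];  // i & (~i + 1) isolates the lowest set bit of i
        }
        ++total_;
    }

    // Number of stored elements that are >= value, in O(log max_value).
    long long count_greater_equal(std::size_t value) const {
        return total_ - prefix(value - 1);
    }

private:
    // Number of stored elements in 1..value.
    long long prefix(std::size_t value) const {
        long long sum = 0;
        for (std::size_t i = value; i > 0; i -= i & (~i + 1)) {
            sum += tree_[i];
        }
        return sum;
    }

    std::vector<long long> tree_;
    long long total_;
};
```

For each incoming number you would call insert and then count_greater_equal (or the other way round if the new number should not count itself). Memory is proportional to max_value, which is why this only works when the range of values is small enough.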