lower_bound for more than one value in C++

Suppose I have a sorted vector of numbers from 0 to 1. I want to know the indices where the values first exceed multiples of 0.1 (i.e. the deciles; in the future maybe also the percentiles).
A simple solution I have in mind is using std::lower_bound:
std::vector<float> v;
/// something which fills the vector here
std::sort(v.begin(), v.end());
std::vector<float>::iterator i = v.begin();
for (float k = 0.1; k < 0.99; k += 0.1) {
    i = std::lower_bound(v.begin(), v.end(), k);
    std::cout << "reached " << k << " at position " << (i - v.begin()) << std::endl;
    std::cout << " going from " << *(i - 1) << " to " << *i << std::endl;
    // for simplicity of the example, I don't check if i is the first item of the vector
}
Since the vector can be long, I was wondering if this can be made faster. A first optimisation is to not search the part of the vector below the previous decile:
i = std::lower_bound (i, v.end(), k);
But, assuming lower_bound performs a binary search, this still scans the entire upper part of the vector for each decile over and over again and doesn't use the intermediate results from the previous binary search.
So ideally I would like to use a search function to which I can pass multiple search items, somehow like:
float searchvalues[9];
for (int k = 1; k <= 9; ++k) {
    searchvalues[k - 1] = ((float)k) / 10.;
}
int deciles[9] = FANCY_SEARCH(v.begin(),v.end(),searchvalues,9);
Is there anything like this already available in the standard library, Boost, or other libraries?

To be in O(log n), you may use the following:
void fill_range(
std::array<boost::optional<std::pair<std::size_t, std::size_t>>, 10u>& ranges,
const std::vector<float>& v,
std::size_t b,
std::size_t e)
{
if (b == e) {
return;
}
int decile_b = v[b] / 0.1f;
int decile_e = v[e - 1] / 0.1f;
if (decile_b == decile_e) {
auto& range = ranges[decile_b];
if (range) {
range->first = std::min(range->first, b);
range->second = std::max(range->second, e);
} else {
range = std::make_pair(b, e);
}
} else {
std::size_t mid = (b + e + 1) / 2;
fill_range(ranges, v, b, mid);
fill_range(ranges, v, mid, e);
}
}
std::array<boost::optional<std::pair<std::size_t, std::size_t>>, 10u>
decile_ranges(const std::vector<float>& v)
{
// assume sorted `v` with value x: 0 <= x < 1
std::array<boost::optional<std::pair<std::size_t, std::size_t>>, 10u> res;
fill_range(res, v, 0, v.size());
return res;
}
but a linear search seems simpler
auto last = v.begin();
for (int i = 0; i != 10; ++i) {
    // start from `last`, so the whole loop walks the vector only once
    const auto it = std::find_if(last, v.end(),
                                 [i](float f) { return f >= (i + 1) * 0.1f; });
    // ith decile ranges from `last` to `it`;
    last = it;
}

There isn't anything in Boost or the C++ Standard Library. There are two choices for an algorithm, bearing in mind that both vectors are sorted:
1. O(N): trundle through the sorted vector, considering the elements of your quantile vector as you go.
2. O(Log N * Log M): Start with the middle quantile. Call lower_bound. The result of this becomes the higher iterator in a subsequent lower_bound call on the set of quantiles below that pivot and the lower iterator in a subsequent lower_bound call on the set of quantiles above that pivot. Repeat the process for both halves.
For percentiles, my feeling is that (1) will be the faster choice, and it is considerably simpler to implement.
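The second option above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names multi_lower_bound and quantile_positions are made up for this example, both the data and the search keys are assumed to be sorted ascending, and no claim is made about how it compares in speed to the linear walk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a standard or Boost function): stores in out[i] the
// lower_bound position of keys[i] for every i in [lo, hi), searching only
// inside [first, last).
template <typename It, typename T>
void multi_lower_bound(It first, It last,
                       const std::vector<T>& keys, std::size_t lo, std::size_t hi,
                       std::vector<It>& out) {
    if (lo >= hi) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    const It pos = std::lower_bound(first, last, keys[mid]);
    out[mid] = pos;
    // Keys below the pivot can only land at or before pos ...
    multi_lower_bound(first, pos, keys, lo, mid, out);
    // ... and keys above the pivot can only land at or after pos.
    multi_lower_bound(pos, last, keys, mid + 1, hi, out);
}

// Hypothetical wrapper: returns the index in v of the first element that is
// not less than each key. Assumes v and keys are both sorted ascending.
std::vector<std::size_t> quantile_positions(const std::vector<float>& v,
                                            const std::vector<float>& keys) {
    assert(std::is_sorted(v.begin(), v.end()));
    assert(std::is_sorted(keys.begin(), keys.end()));
    std::vector<std::vector<float>::const_iterator> its(keys.size(), v.end());
    multi_lower_bound(v.begin(), v.end(), keys, 0, keys.size(), its);
    std::vector<std::size_t> idx;
    idx.reserve(its.size());
    for (auto it : its) idx.push_back(static_cast<std::size_t>(it - v.begin()));
    return idx;
}
```

Each recursive call narrows the search window for the remaining keys, which is the reuse of intermediate results that the question asks about.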

Related

How to get combinations from set of elements?

I need to generate all combinations without repetitions from an array. I have read a bit about it, and the suggestion is to use recursion.
I have an array
arr = [["A"], ["B"], ["C"], ["D"], ["E"], ["F"]]
I read that I can solve this problem using recursion, like
function combinations(arr, n, k)
//do something
//then
return combinations(arr, n, k)
In my case [A, B, C, D] is equivalent to [A, B, D, C].
I found this example in C++
http://www.martinbroadhurst.com/combinations.html
But I couldn't reproduce it.
Any suggestion how can I solve this?
PS: I'm using Python, but I'm more interested in the algorithm than the language.
For any combinatorics problem, the best way to program it is to figure out the recurrence relation for the counting argument. In the case of combinations, the recurrence relation is simply C(n, k) = C(n - 1, k - 1) + C(n - 1, k).
But what does this mean exactly? Notice that C(n - 1, k - 1) means that we have taken the first element of the array, and need k - 1 more elements from the other n - 1 elements. Similarly, C(n - 1, k) means that we won't choose the first element of our array as one of the k elements. But remember that if k is 0, then C(n, k) = 1, else if n is 0 then C(n, k) = 0. In our problem, k == 0 would return a set containing the empty set, else if n == 0, we would return the empty set. With this in mind, the code structure would look like this:
def combinations(arr, k):
    if k == 0:
        return [[]]
    elif len(arr) == 0:
        return []
    result = []
    chosen = combinations(arr[1:], k - 1)  # we choose the first element of arr as one of the k elements we need
    notChosen = combinations(arr[1:], k)  # first element not chosen in set of k elements
    for combination in chosen:
        result.append([arr[0]] + combination)
    for combination in notChosen:
        result.append(combination)
    return result
Now, this function can be optimized by performing memoization (but that can be left as an exercise to you, the reader). As an additional exercise, can you sketch out what the permutation function would look like, starting from its counting relation?
Hint:
P(n, k) = C(n, k)k! = [C(n - 1, k - 1) + C(n - 1, k)]k! = P(n - 1, k - 1)k + P(n - 1, k)
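The two counting relations can be checked numerically with a small sketch. It is written in C++ rather than Python, the function names are made up for this illustration, and it deliberately uses the plain (unmemoized) recursion, so it is only suitable for small n.

```cpp
#include <cassert>
#include <cstdint>

// C(n, k) = C(n - 1, k - 1) + C(n - 1, k), with C(n, 0) = 1 and C(0, k) = 0 for k > 0.
std::uint64_t count_combinations(unsigned n, unsigned k) {
    if (k == 0) return 1;
    if (n == 0) return 0;
    return count_combinations(n - 1, k - 1) + count_combinations(n - 1, k);
}

// P(n, k) = P(n - 1, k - 1) * k + P(n - 1, k), with the same base cases.
std::uint64_t count_permutations(unsigned n, unsigned k) {
    if (k == 0) return 1;
    if (n == 0) return 0;
    return count_permutations(n - 1, k - 1) * k + count_permutations(n - 1, k);
}

// Checks the identity from the hint, P(n, k) = C(n, k) * k!, for one pair (n, k).
void check_hint(unsigned n, unsigned k) {
    std::uint64_t factorial = 1;
    for (unsigned i = 2; i <= k; ++i) factorial *= i;
    assert(count_permutations(n, k) == count_combinations(n, k) * factorial);
}
```

For example, count_combinations(5, 3) gives 10 and count_permutations(5, 3) gives 60, which matches 10 * 3!.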
[Heck... by the time I posted the answer, the C++ tag went away]
[Edited with more examples, including using char]
Comments in the code:
#include <cstddef>
#include <iostream>
#include <vector>
// Function that recursively does the actual job
template <typename T, typename Function> void doCombinations(
size_t num, const std::vector<T>& values,
size_t start, std::vector<T>& combinationSoFar,
Function action
) {
if(0==num) { // the entire combination is complete
action(combinationSoFar);
}
else {
// walk through with the current position to the right,
// taking care to let enough walking room for the rest of the elements
for(size_t i=start; i<values.size()+1-num; i++) {
// push the current value there
combinationSoFar.push_back(values[i]);
// recursive call with one less element to enter combination
// and one position to the right for the next element to consider
doCombinations(num-1, values, i+1, combinationSoFar, action);
// pop the current value, we are going to move it to the right anyway
combinationSoFar.pop_back();
}
}
}
// function for the user to call. Prepares everything needed for the
// doCombinations
template <typename T, typename Function>
void for_each_combination(
size_t numInCombination,
const std::vector<T>& values,
Function action
) {
std::vector<T> combination;
doCombinations(numInCombination, values, 0, combination, action);
}
// dummy do-something with the vector
template <typename T> void cout_vector(const std::vector<T>& v) {
std::cout << '[';
for(size_t i=0; i<v.size(); i++) {
if(i) {
std::cout << ",";
}
std::cout << v[i];
}
std::cout << ']' << std::endl;
}
// Assumes the T type supports both addition and ostream <<
template <typename T> void adder(const std::vector<T>& vals) {
T sum=static_cast<T>(0);
for(T v : vals) {
sum+=v;
}
std::cout << "Sum: " << sum << " for ";
cout_vector(vals);
}
int main() {
std::cout << "Char combinations" << std::endl;
std::vector<char> char_vals{'A', 'B', 'C', 'D', 'E'};
for_each_combination(3, char_vals, cout_vector<char>);
std::cout << "\nInt combinations" << std::endl;
std::vector<int> int_vals{0, 1, 2, 3, 4};
for_each_combination(3, int_vals, cout_vector<int>);
std::cout <<"\nFloat combination adder" << std::endl;
std::vector<float> float_vals{0.0, 1.1, 2.2, 3.3, 4.4};
for_each_combination(3, float_vals, adder<float>);
return 0;
}
Output:
Char combinations
[A,B,C]
[A,B,D]
[A,B,E]
[A,C,D]
[A,C,E]
[A,D,E]
[B,C,D]
[B,C,E]
[B,D,E]
[C,D,E]
Int combinations
[0,1,2]
[0,1,3]
[0,1,4]
[0,2,3]
[0,2,4]
[0,3,4]
[1,2,3]
[1,2,4]
[1,3,4]
[2,3,4]
Float combination adder
Sum: 3.3 for [0,1.1,2.2]
Sum: 4.4 for [0,1.1,3.3]
Sum: 5.5 for [0,1.1,4.4]
Sum: 5.5 for [0,2.2,3.3]
Sum: 6.6 for [0,2.2,4.4]
Sum: 7.7 for [0,3.3,4.4]
Sum: 6.6 for [1.1,2.2,3.3]
Sum: 7.7 for [1.1,2.2,4.4]
Sum: 8.8 for [1.1,3.3,4.4]
Sum: 9.9 for [2.2,3.3,4.4]

How to get the equilibrium index of an array in O(n)?

I took a test in C++ that asked for a function returning one of the indices that split the input vector into 2 parts with the same sum of elements. E.g. for vec = {1, 2, 3, 5, 4, -1, 1, 1, 2, -1}, it may return 3, because 1+2+3 = 6 = 4-1+1+1+2-1. So I wrote a function that returns the correct answer:
int func(const std::vector< int >& vecIn)
{
    for (std::size_t p = 0; p < vecIn.size(); p++)
    {
        if (std::accumulate(vecIn.begin(), vecIn.begin() + p, 0) ==
            std::accumulate(vecIn.begin() + p + 1, vecIn.end(), 0))
            return p;
    }
    return -1;
}
My problem was when the input was a very long vector containing just 1 (or -1), the return of the function was slow. So I have thought of starting the search for the wanted index from middle, and then go left and right. But the best approach I suppose is the one where the index is in the merge-sort algorithm order, that means: n/2, n/4, 3n/4, n/8, 3n/8, 5n/8, 7n/8... where n is the size of the vector. Is there a way to write this order in a formula, so I can apply it in my function?
Thanks
EDIT
After some comments I have to mention that I took the test a few days ago, so I forgot to mention the no-solution case: the function should return -1. I have also updated the question title.
Specifically for this problem, I would use the following algorithm:
Compute the total sum of the vector. This gives you two running sums: sum(empty) = 0 for the empty left part, and sum(full) = total for the full right part.
For each element in order, move it from full to empty, which means subtracting its value from sum(full) and adding it to sum(empty). When the two sums are equal, you have found your index.
This gives an O(n) algorithm instead of O(n^2).
You can solve the problem much faster without calling std::accumulate at each step:
int func(const std::vector< int >& vecIn)
{
    int s1 = 0;
    int s2 = std::accumulate(vecIn.begin(), vecIn.end(), 0);
    for (std::size_t p = 0; p < vecIn.size(); p++)
    {
        s2 -= vecIn[p];   // s2 now excludes element p, as in the question
        if (s1 == s2)
            return p;
        s1 += vecIn[p];
    }
    return -1;            // no such index
}
This is O(n). At each comparison, s1 contains the sum of the first p elements, and s2 the sum of the elements after index p. You can update both of them with an addition and a subtraction when moving to the next element.
Since std::accumulate needs to iterate over the range you give it, your algorithm was O(n^2), which is why it was so slow for many elements.
To answer the actual question: Your sequence n/2, n/4, 3n/4, n/8, 3n/8 can be rewritten as
1*n/2
1*n/4 3*n/4
1*n/8 3*n/8 5*n/8 7*n/8
...
that is to say, the denominator runs from i=2 up in powers of 2, and the numerator runs from j=1 to i-1 in steps of 2. However, this is not what you need for your actual problem, because the example you give has n=10. Clearly you don't want n/4 there: your indices have to be integers.
The best solution here is to recurse. Given a range [b,e], pick a middle value m = (b+e)/2 and set the new ranges to [b, m-1] and [m+1, e]. Of course, special-case ranges of length 1 or 2.
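A minimal sketch of that bisection idea, with two deliberate deviations that should be read as assumptions of this example: it uses a queue instead of recursion, so that the midpoints come out level by level (roughly the n/2, n/4, 3n/4, ... order asked for), and it works on half-open ranges [b, e) so that lengths 1 and 2 need no special casing. The name bisection_order is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: returns every index of [0, n) exactly once, visiting
// the middle of the whole range first, then the middles of both halves, etc.
std::vector<std::size_t> bisection_order(std::size_t n) {
    std::vector<std::size_t> order;
    order.reserve(n);
    std::queue<std::pair<std::size_t, std::size_t>> pending;  // half-open ranges [b, e)
    if (n > 0) pending.emplace(std::size_t{0}, n);
    while (!pending.empty()) {
        const std::size_t b = pending.front().first;
        const std::size_t e = pending.front().second;
        pending.pop();
        const std::size_t mid = b + (e - b) / 2;
        order.push_back(mid);
        if (b < mid) pending.emplace(b, mid);          // left part, without mid
        if (mid + 1 < e) pending.emplace(mid + 1, e);  // right part, without mid
    }
    assert(order.size() == n);  // every index is produced exactly once
    return order;
}
```

For n = 7 this yields 3, 1, 5, 0, 2, 4, 6. Note that for the equilibrium problem itself the linear running-sum approach is still the better fit; this only answers the ordering question.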
Considering MSalters' comments, I'm afraid another solution would be better. If you want to use less memory, maybe the selected answer is good enough, but to find all of the possibly multiple solutions you could use the following code:
static const int arr[] = {5,-10,10,-10,10,1,1,1,1,1};
std::vector<int> vec (arr, arr + sizeof(arr) / sizeof(arr[0]) );
// compute cumulative sum
std::vector<int> cumulative_sum( vec.size() );
cumulative_sum[0] = vec[0];
for ( size_t i = 1; i < vec.size(); i++ )
{ cumulative_sum[i] = cumulative_sum[i-1] + vec[i]; }
const int complete_sum = cumulative_sum.back();
// find multiple solutions, if there are any
const int complete_sum_half = complete_sum / 2; // suggesting this is valid...
std::vector<int>::iterator it = cumulative_sum.begin();
std::vector<int> mid_indices;
do {
it = std::find( it, cumulative_sum.end(), complete_sum_half );
if ( it != cumulative_sum.end() )
{ mid_indices.push_back( it - cumulative_sum.begin() ); ++it; }
} while( it != cumulative_sum.end() );
for ( size_t i = 0; i < mid_indices.size(); i++ )
{ std::cout << mid_indices[i] << std::endl; }
std::cout << "Split behind these indices to obtain two equal halves." << std::endl;
This way, you get all the possible solutions. If there is no way to split the vector into two equal halves, mid_indices will be left empty.
Again, you have to sum up each value only once.
My proposal is this:
static const int arr[] = {1,2,3,5,4,-1,1,1,2,-1};
std::vector<int> vec (arr, arr + sizeof(arr) / sizeof(arr[0]) );
int idx1(0), idx2(vec.size()-1);
int sum1(0), sum2(0);
int idxMid = -1;
do {
// fast access without using the index each time.
const int& val1 = vec[idx1];
const int& val2 = vec[idx2];
// Precompute the next (possible) sum values.
const int nSum1 = sum1 + val1;
const int nSum2 = sum2 + val2;
// move the index considering the balance between the
// left and right sum.
if ( sum1 - nSum2 < sum2 - nSum1 )
{ sum1 = nSum1; idx1++; }
else
{ sum2 = nSum2; idx2--; }
if ( idx1 >= idx2 ){ idxMid = idx2; }
} while( idxMid < 0 && idx2 >= 0 && idx1 < vec.size() );
std::cout << idxMid << std::endl;
It adds every value only once, no matter how many values there are, so its complexity is only O(n) and not O(n^2).
The code simply runs from the left and the right simultaneously and advances the index on whichever side currently has the lower sum.
You want nth term of the series you mentioned. Then it would be:
numerator: (n - 2^((int)(log2 n)) ) *2 + 1
denominator: 2^((int)(log2 n) + 1)
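As a quick check of the formula, here is a small sketch that evaluates it with integer arithmetic only. The function name nth_term is made up for this example, and terms are counted from n = 1.

```cpp
#include <cassert>
#include <utility>

// Returns {numerator, denominator} of the n-th term (n >= 1) of the series
// 1/2, 1/4, 3/4, 1/8, 3/8, 5/8, 7/8, ...
std::pair<unsigned long, unsigned long> nth_term(unsigned long n) {
    assert(n >= 1);
    unsigned long pow2 = 1;  // becomes 2^floor(log2 n)
    while (pow2 <= n / 2) pow2 *= 2;
    return {(n - pow2) * 2 + 1, pow2 * 2};
}
```

For n = 1, 2, 3, 4 this gives 1/2, 1/4, 3/4, 1/8, and n = 7 gives 7/8. Multiplying such a fraction by the vector size and truncating would give an index, but several terms can then map to the same index.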
I came across the same question in Codility tests. There is a similar-looking answer above (it didn't pass some of the unit tests), but the code segment below passed the tests.
#include <vector>
#include <numeric>
#include <iostream>
using namespace std;
// Returns -1 if equilibrium point is not found
// use long long to support bigger ranges
int FindEquilibriumPoint(vector<long> &values) {
long long lower = 0;
long long upper = std::accumulate(values.begin(), values.end(), 0LL); // 0LL so the sum is accumulated as long long
for (std::size_t i = 0; i < values.size(); i++) {
upper -= values[i];
if (lower == upper) {
return i;
}
lower += values[i];
}
return -1;
}
int main() {
vector<long> v = {-1, 3, -4, 5, 1, -6, 2, 1};
cout << "Equilibrium Point:" << FindEquilibriumPoint(v) << endl;
return 0;
}
Output
Equilibrium Point:1
Here is the algorithm in JavaScript:
function equi(arr){
var N = arr.length;
if (N == 0){ return -1};
var suma = 0;
for (var i=0; i<N; i++){
suma += arr[i];
}
var suma_iz = 0;
for(i=0; i<N; i++){
var suma_de = suma - suma_iz - arr[i];
if (suma_iz == suma_de){
return i};
suma_iz += arr[i];
}
return -1;
}
As you can see, this code runs in O(n).

Iterate through different subset of size k

I have an array of n integers (not necessarily distinct!) and I would like to iterate over all subsets of size k. However I'd like to exclude all duplicate subsets.
e.g.
array = {1,2,2,3,3,3,3}, n = 7, k = 2
then the subsets I want to iterate over (each once) are:
{1,2},{1,3},{2,2},{2,3},{3,3}
What is an efficient algorithm for doing this?
Is a recursive approach the most efficient/elegant?
In case you have a language-specific answer, I'm using C++.
The same (or almost the same) algorithm which is used to generate combinations of a set of unique values in lexicographical order can be used to generate combinations of a multiset in lexicographical order. Doing it this way avoids the necessity to deduplicate, which is horribly expensive, and also avoids the necessity of maintaining all the generated combinations. It does require that the original list of values be sorted.
The following simple implementation finds the next k-combination of a multiset of n values in average (and worst-case) time O(n). It expects two ranges: the first range is a sorted k-combination, and the second range is the sorted multiset. (If either range is unsorted or the values in first range do not constitute a sub(multi)set of the second range, then the behaviour is undefined; no sanity checks are made.)
Only the end iterator from the second range is actually used, but I thought that made the calling convention a bit odd.
template<typename BidiIter, typename CBidiIter,
typename Compare = std::less<typename BidiIter::value_type>>
bool next_comb(BidiIter first, BidiIter last,
CBidiIter /* first_value */, CBidiIter last_value,
Compare comp=Compare()) {
/* 1. Find the rightmost value which could be advanced, if any */
auto p = last;
while (p != first && !comp(*(p - 1), *--last_value)) --p;
if (p == first) return false;
/* 2. Find the smallest value which is greater than the selected value */
for (--p; comp(*p, *(last_value - 1)); --last_value) { }
/* 3. Overwrite the suffix of the subset with the lexicographically smallest
* sequence starting with the new value */
while (p != last) *p++ = *last_value++;
return true;
}
It should be clear that steps 1 and 2 combined make at most O(n) comparisons, because each of the n values is used in at most one comparison. Step 3 copies at most O(k) values, and we know that k≤n.
This could be improved to O(k) in the case where no values are repeated, by maintaining the current combination as a container of iterators into the value list rather than actual values. This would also avoid copying values, at the cost of extra dereferences. If in addition we cache the function which associates each value iterator with an iterator to the first instance of next largest value, we could eliminate Step 2 and reduce the algorithm to O(k) even for repeated values. That might be worthwhile if there are a large number of repeats and comparisons are expensive.
Here's a simple use example:
std::vector<int> values = {1,2,2,3,3,3,3};
/* Since that's sorted, the first subset is just the first k values */
const int k = 2;
std::vector<int> subset{values.cbegin(), values.cbegin() + k};
/* Print each combination */
do {
for (auto const& v : subset) std::cout << v << ' ';
std::cout << '\n';
} while (next_comb(subset.begin(), subset.end(),
values.cbegin(), values.cend()));
I like bit-twiddling for this problem. Sure, it limits you to only 32 elements in your vector, but it's still cool.
First, given a bit mask, determine the next bitmask permutation:
uint32_t next(uint32_t v) {
uint32_t t = v | (v - 1);
return (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
}
Next, given a vector and a bitmask, give a new vector based on that mask:
std::vector<int> filter(const std::vector<int>& v, uint32_t mask) {
std::vector<int> res;
while (mask) {
res.push_back(v[__builtin_ctz(mask)]);
mask &= mask - 1;
}
return res;
}
And with that, we just need a loop:
std::set<std::vector<int>> get_subsets(const std::vector<int>& arr, uint32_t k) {
std::set<std::vector<int>> s;
uint32_t max = (uint32_t{1} << arr.size());
for (uint32_t v = (uint32_t{1} << k) - 1; v < max; v = next(v)) {
s.insert(filter(arr, v));
}
return s;
}
int main()
{
auto s = get_subsets({1, 2, 2, 3, 3, 3, 3}, 2);
std::cout << s.size() << std::endl; // prints 5
}
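To see what the mask-stepping function in this answer produces, here is a small self-contained sketch. It makes a few assumptions of its own: __builtin_ctz (a GCC/Clang builtin) is replaced by a simple portable loop, the expression -~t is written as the equivalent t + 1, and the names trailing_zeros, next_mask and masks_with_k_bits are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable stand-in for __builtin_ctz: number of trailing zero bits (v != 0).
unsigned trailing_zeros(std::uint32_t v) {
    assert(v != 0);
    unsigned n = 0;
    while ((v & 1u) == 0) { v >>= 1; ++n; }
    return n;
}

// Next larger value with the same number of set bits as v.
std::uint32_t next_mask(std::uint32_t v) {
    const std::uint32_t t = v | (v - 1);  // fill the zeros below the lowest set bit
    return (t + 1) | (((~t & (t + 1)) - 1) >> (trailing_zeros(v) + 1));
}

// All n-bit masks with exactly k bits set, in increasing order.
// Assumes 1 <= k <= n and n < 32 so that the shifts stay in range.
std::vector<std::uint32_t> masks_with_k_bits(unsigned n, unsigned k) {
    assert(k >= 1 && k <= n && n < 32);
    std::vector<std::uint32_t> out;
    const std::uint32_t limit = std::uint32_t{1} << n;
    for (std::uint32_t v = (std::uint32_t{1} << k) - 1; v < limit; v = next_mask(v)) {
        out.push_back(v);
    }
    return out;
}
```

With n = 4 and k = 2 the masks are 3, 5, 6, 9, 10, 12 (binary 0011, 0101, 0110, 1001, 1010, 1100), i.e. each pair of positions appears exactly once.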
The basic idea of this solution is a function like next_permutation but which generates the next ascending sequence of "digits". Here called ascend_ordered.
template< class It >
auto ascend_ordered( const int n_digits, const It begin, const It end )
-> bool
{
using R_it = reverse_iterator< It >;
const R_it r_begin = R_it( end );
const R_it r_end = R_it( begin );
int max_digit = n_digits - 1;
for( R_it it = r_begin ; it != r_end; ++it )
{
if( *it < max_digit )
{
++*it;
const int n_further_items = it - r_begin;
for( It it2 = end - n_further_items; it2 != end; ++it2 )
{
*it2 = *(it2 - 1) + 1;
}
return true;
}
--max_digit;
}
return false;
}
Main program for the case at hand:
auto main() -> int
{
vector<int> a = {1,2,2,3,3,3,3};
assert( is_sorted( begin( a ), end( a ) ) );
const int k = 2;
const int n = a.size();
vector<int> indices( k );
iota( indices.begin(), indices.end(), 0 ); // Fill with 0, 1, 2 ...
set<vector<int>> encountered;
for( ;; )
{
vector<int> current;
for( int const i : indices ) { current.push_back( a[i] ); }
if( encountered.count( current ) == 0 )
{
cout << "Indices " << indices << " -> values " << current << endl;
encountered.insert( current );
}
if( not ascend_ordered( n, begin( indices ), end( indices ) ) )
{
break;
}
}
}
Supporting includes and i/o:
#include <algorithm>
using std::is_sorted;
#include <assert.h>
#include <iterator>
using std::reverse_iterator;
#include <iostream>
using std::ostream; using std::cout; using std::endl;
#include <numeric>
using std::iota;
#include <set>
using std::set;
#include <utility>
using std::begin; using std::end;
#include <vector>
using std::vector;
template< class Container, class Enable_if = typename Container::value_type >
auto operator<<( ostream& stream, const Container& c )
-> ostream&
{
stream << "{";
int n_items_outputted = 0;
for( const int x : c )
{
if( n_items_outputted >= 1 ) { stream << ", "; }
stream << x;
++n_items_outputted;
}
stream << "}";
return stream;
}
Unlike the previous answer, this is not as efficient and doesn't do anything as fancy as a lot of the bit twiddling. However it does not limit the size of your array or the size of the subset.
This solution uses std::next_permutation to generate the combinations, and takes advantage of std::set's uniqueness property.
#include <algorithm>
#include <vector>
#include <set>
#include <iostream>
#include <iterator>
using namespace std;
std::set<std::vector<int>> getSubsets(const std::vector<int>& vect, size_t numToChoose)
{
std::set<std::vector<int>> returnVal;
// return the whole thing if we want to
// choose everything
if (numToChoose >= vect.size())
{
returnVal.insert(vect);
return returnVal;
}
// set up bool vector for combination processing
std::vector<bool> bVect(vect.size() - numToChoose, false);
// stick the true values at the end of the vector
bVect.resize(bVect.size() + numToChoose, true);
// select where the ones are set in the bool vector and populate
// the combination vector
do
{
std::vector<int> combination;
for (size_t i = 0; i < bVect.size() && combination.size() <= numToChoose; ++i)
{
if (bVect[i])
combination.push_back(vect[i]);
}
// sort the combinations
std::sort(combination.begin(), combination.end());
// insert this new combination in the set
returnVal.insert(combination);
} while (next_permutation(bVect.begin(), bVect.end()));
return returnVal;
}
int main()
{
std::vector<int> myVect = {1,2,2,3,3,3,3};
// number to select
size_t numToSelect = 3;
// get the subsets
std::set<std::vector<int>> subSets = getSubsets(myVect, numToSelect);
// output the results
for_each(subSets.begin(), subSets.end(), [] (const vector<int>& v)
{ cout << "subset "; copy(v.begin(), v.end(), ostream_iterator<int>(cout, " ")); cout << "\n"; });
}
Live example: http://coliru.stacked-crooked.com/a/beb800809d78db1a
Basically we set up a bool vector and populate a vector with the values that correspond with the position of the true items in the bool vector. Then we sort and insert this into a set. The std::next_permutation shuffles the true values in the bool array around and we just repeat.
Admittedly, not as sophisticated and more than likely slower than the previous answer, but it should do the job.

Find First Missing Element in a vector

This question has been asked before but I cannot find it for C++.
If I have a vector and a starting number, does <algorithm> provide a way to find the next highest missing number?
I can obviously write this in a nested loop, I just can't shake the feeling that I'm reinventing the wheel.
For example, given: vector<int> foo{13,8,3,6,10,1,7,0};
The starting number 0 should find 2.
The starting number 6 should find 9.
The starting number -2 should find -1.
EDIT:
Thus far all the solutions require sorting. This may in fact be required, but a temporary sorted vector would have to be created to accommodate this, as foo must remain unchanged.
At least as far as I know, there's no standard algorithm that directly implements exactly what you're asking for.
If you wanted to do it with something like O(N log N) complexity, you could start by sorting the input. Then use std::upper_bound to find the position just past the (last instance of the) number you've asked for (if present). From there, you'd scan for a difference greater than 1 between consecutive numbers in the collection; the missing number is one more than the smaller of those two.
One way to do this in real code would be something like this:
#include <iostream>
#include <algorithm>
#include <vector>
#include <numeric>
#include <iterator>
int find_missing(std::vector<int> x, int number) {
std::sort(x.begin(), x.end());
auto pos = std::upper_bound(x.begin(), x.end(), number);
if (pos == x.end() || *pos - number > 1) // nothing larger than number, or a gap right after it
return number + 1;
else {
std::vector<int> diffs;
std::adjacent_difference(pos, x.end(), std::back_inserter(diffs));
auto pos2 = std::find_if(diffs.begin() + 1, diffs.end(), [](int x) { return x > 1; });
return *(pos + (pos2 - diffs.begin() - 1)) + 1;
}
}
int main() {
std::vector<int> x{ 13, 8, 3, 6, 10, 1,7, 0};
std::cout << find_missing(x, 0) << "\n";
std::cout << find_missing(x, 6) << "\n";
}
This is somewhat less than what you'd normally think of as optimal to provide the external appearance of a vector that can/does remain un-sorted (and unmodified in any way). I've done that by creating a copy of the vector, and sorting the copy inside the find_missing function. Thus, the original vector remains unmodified. The disadvantage is obvious: if the vector is large, copying it can/will be expensive. Furthermore, this ends up sorting the vector for every query instead of sorting once, then carrying out as many queries as desired on it.
So I thought I'd post an answer. I don't know of anything in <algorithm> that accomplishes this directly, but in combination with vector<bool> you can do this in O(2N), i.e. linear time.
template <typename T>
T find_missing(const vector<T>& v, T elem){
    vector<bool> range(v.size());
    elem++;
    // mark every value in [elem, elem + v.size()) that occurs in v
    for_each(v.begin(), v.end(), [&](const T& i){
        if (i >= elem && i - elem < range.size()) range[i - elem] = true;
    });
    // the first unmarked slot is the first missing value
    auto result = distance(range.begin(), find(range.begin(), range.end(), false));
    return result + elem;
}
First you need to sort the vector. Use std::sort for that.
std::lower_bound finds the first element that is greater than or equal to a given value (the range has to be sorted, or at least partitioned with respect to that value).
From there you iterate while you have consecutive elements.
Dealing with duplicates: One way is the way I went: consider consecutive and equal elements when iterating. Another approach is to add a prerequisite that the vector / range contains unique elements. I chose the former because it avoids erasing elements.
Here is how you eliminate duplicates from a sorted vector:
v.erase(std::unique(v.begin(), v.end()), v.end());
My implementation:
// finds the first missing element in the vector v
// prerequisite: v must be sorted
auto firstMissing(std::vector<int> const &v, int elem) -> int {
auto low = std::lower_bound(std::begin(v), std::end(v), elem);
if (low == std::end(v) || *low != elem) {
return elem;
}
while (low + 1 != std::end(v) &&
(*low == *(low + 1) || *low + 1 == *(low + 1))) {
++low;
}
return *low + 1;
}
And a generalized version:
// finds the first missing element in the range [first, last)
// prerequisite: the range must be sorted
template <class It, class T = decltype(*std::declval<It>())>
auto firstMissing(It first, It last, T elem) -> T {
auto low = std::lower_bound(first, last, elem);
if (low == last || *low != elem) {
return elem;
}
while (std::next(low) != last &&
(*low == *std::next(low) || *low + 1 == *std::next(low))) {
std::advance(low, 1);
}
return *low + 1;
}
Test case:
int main() {
auto v = std::vector<int>{13, 8, 3, 6, 10, 1, 7, 7, 7, 0};
std::sort(v.begin(), v.end());
for (auto n : {-2, 0, 5, 6, 20}) {
cout << n << ": " << firstMissing(v, n) << endl;
}
return 0;
}
Result:
-2: -2
0: 2
5: 5
6: 9
20: 20
A note about sorting: From the OP's comments he was searching for a solution that wouldn't modify the vector.
You have to sort the vector for an efficient solution. If modifying the vector is not an option you could create a copy and work on it.
If you are hell-bent on not sorting, there is a brute force solution (very very inefficient - O(n^2)):
auto max = std::max_element(std::begin(v), std::end(v));
if (elem > *max) {
return elem;
}
auto i = elem;
while (std::find(std::begin(v), std::end(v), i) != std::end(v)) {
++i;
}
return i;
First solution:
Sort the vector. Find the starting number and see what number is next.
This will take O(NlogN) where N is the size of vector.
Second solution:
If the range of numbers is small e.g. (0,M) you can create boolean vector of size M. For each number of initial vector make the boolean of that index true. Later you can see next missing number by checking the boolean vector. This will take O(N) time and O(M) auxiliary memory.
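The second solution could be sketched like this. It is only an illustration: the name next_missing is made up, all values are assumed to lie in [0, M), and "missing" is taken to mean the smallest integer strictly greater than the starting number that is not in the vector, as in the question's examples. The input vector is not modified.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: smallest integer greater than `start` that does not
// occur in v. Assumes every value of v lies in [0, M).
int next_missing(const std::vector<int>& v, int start, int M) {
    assert(M >= 0);
    std::vector<bool> present(static_cast<std::size_t>(M), false);
    for (int x : v) {
        assert(0 <= x && x < M);
        present[static_cast<std::size_t>(x)] = true;
    }
    int candidate = start + 1;
    // Anything outside [0, M) cannot be in v, so the scan stops there as well.
    while (candidate >= 0 && candidate < M &&
           present[static_cast<std::size_t>(candidate)]) {
        ++candidate;
    }
    return candidate;
}
```

With foo{13,8,3,6,10,1,7,0} and M = 14, starting numbers 0, 6 and -2 give 2, 9 and -1. If many queries are made against the same vector, the table could be built once and reused.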

What is the fastest way to find the longest 'consecutive numbers' streak in a vector?

I have a sorted std::vector<int> and I would like to find the longest 'streak of consecutive numbers' in this vector and then return both the length of it and the smallest number in the streak.
To visualize it for you :
suppose we have :
1 3 4 5 6 8 9
I would like it to return: maxStreakLength = 4 and streakBase = 3
There might be occasion where there will be 2 streaks and we have to choose which one is longer.
What is the best (fastest) way to do this ? I have tried to implement this but I have problems with coping with more than one streak in the vector. Should I use temporary vectors and then compare their lengths?
No, you can do this in one pass through the vector, storing only the start point and length of the longest streak found so far. You also need far fewer than 'N' comparisons. *
Hint: if you already have, say, a 4-long match ending at the 5th position (=6), which position do you have to check next?
[*] left as exercise to the reader to work out what's the likely O( ) complexity ;-)
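One way to read the hint is as a skip-ahead scan: once a streak of length L is known, a longer streak starting at position i has to cover position i + L, so that position can be checked first and the window walked backwards from there. The sketch below is one possible interpretation, not necessarily what the answer's author had in mind; the name longest_streak and the returned {start index, length} pair are assumptions of this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns {start index, length} of the longest run of
// consecutive integers in a sorted vector ({0, 0} for an empty vector).
std::pair<std::size_t, std::size_t> longest_streak(const std::vector<int>& v) {
    assert(std::is_sorted(v.begin(), v.end()));
    const std::size_t n = v.size();
    if (n == 0) return {std::size_t{0}, std::size_t{0}};
    std::size_t best_start = 0, best_len = 1;
    std::size_t i = 0;
    while (i + best_len < n) {
        // A streak starting at i that beats best_len must cover [i, i + best_len].
        // Walk the window backwards until a break is found.
        std::size_t j = i + best_len;
        while (j > i && v[j] == v[j - 1] + 1) --j;
        if (j > i) {
            // Break between j - 1 and j: no better streak can start before j.
            i = j;
            continue;
        }
        // The whole window is consecutive: extend it to the right.
        std::size_t end = i + best_len;
        while (end + 1 < n && v[end + 1] == v[end] + 1) ++end;
        best_start = i;
        best_len = end - i + 1;
        i = end + 1;
    }
    return {best_start, best_len};
}
```

For 1 3 4 5 6 8 9 it returns start index 1 (the value 3) and length 4. When long streaks are found early, large parts of the vector are skipped, but the worst case still looks at every element.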
It would be interesting to see if the fact that the array is sorted can be exploited somehow to improve the algorithm. The first thing that comes to mind is this: if you know that all numbers in the input array are unique, then for a range of elements [i, j] in the array, you can immediately tell whether elements in that range are consecutive or not, without actually looking through the range. If this relation holds
array[j] - array[i] == j - i
then you can immediately say that elements in that range are consecutive. This criterion, obviously, uses the fact that the array is sorted and that the numbers don't repeat.
Now, we just need to develop an algorithm which will take advantage of that criterion. Here's one possible recursive approach:
Input of recursive step is the range of elements [i, j]. Initially it is [0, n-1] - the whole array.
Apply the above criterion to range [i, j]. If the range turns out to be consecutive, there's no need to subdivide it further. Send the range to output (see below for further details).
Otherwise (if the range is not consecutive), divide it into two equal parts [i, m] and [m+1, j].
Recursively invoke the algorithm on the lower part ([i, m]) and then on the upper part ([m+1, j]).
The above algorithm will perform binary partition of the array and recursive descent of the partition tree using the left-first approach. This means that this algorithm will find adjacent subranges with consecutive elements in left-to-right order. All you need to do is to join the adjacent subranges together. When you receive a subrange [i, j] that was "sent to output" at step 2, you have to concatenate it with previously received subranges, if they are indeed consecutive. Or you have to start a new range, if they are not consecutive. All the while you have to keep track of the "longest consecutive range" found so far.
That's it.
The benefit of this algorithm is that it detects subranges of consecutive elements "early", without looking inside these subranges. Obviously, its worst-case performance (if there are no consecutive subranges at all) is still O(n). In the best case, when the entire input array is consecutive, this algorithm will detect it instantly. (I'm still working on a meaningful O estimation for this algorithm.)
The usability of this algorithm is, again, undermined by the uniqueness requirement. I don't know whether it is something that is "given" in your case.
Anyway, here's a possible C++ implementation
typedef std::vector<int> vint;
typedef std::pair<vint::size_type, vint::size_type> range;
class longest_sequence
{
public:
const range& operator ()(const vint &v)
{
current = max = range(0, 0);
process_subrange(v, 0, v.size() - 1);
check_record();
return max;
}
private:
range current, max;
void process_subrange(const vint &v, vint::size_type i, vint::size_type j);
void check_record();
};
void longest_sequence::process_subrange(const vint &v,
vint::size_type i, vint::size_type j)
{
assert(i <= j && v[i] <= v[j]);
assert(i == 0 || i == current.second + 1);
if (v[j] - v[i] == j - i)
{ // Consecutive subrange found
assert(v[current.second] <= v[i]);
if (i == 0 || v[i] == v[current.second] + 1)
// Append to the current range
current.second = j;
else
{ // Range finished
// Check against the record
check_record();
// Start a new range
current = range(i, j);
}
}
else
{ // Subdivision and recursive calls
assert(i < j);
vint::size_type m = (i + j) / 2;
process_subrange(v, i, m);
process_subrange(v, m + 1, j);
}
}
void longest_sequence::check_record()
{
assert(current.second >= current.first);
if (current.second - current.first > max.second - max.first)
// We have a new record
max = current;
}
int main()
{
int a[] = { 1, 3, 4, 5, 6, 8, 9 };
std::vector<int> v(a, a + sizeof a / sizeof *a);
range r = longest_sequence()(v);
return 0;
}
I believe that this should do it?
size_t beginStreak = 0;
size_t streakLen = 1;
size_t longest = 0;
size_t longestStart = 0;
for (size_t i=1; i < vec.size(); i++) {
if (vec[i] == vec[i-1] + 1) {
streakLen++;
}
else {
if (streakLen > longest) {
longest = streakLen;
longestStart = beginStreak;
}
beginStreak = i;
streakLen = 1;
}
}
if (streakLen > longest) {
longest = streakLen;
longestStart = beginStreak;
}
You can't solve this problem in less than O(N) time. Imagine your list is the first N-1 even numbers, plus a single odd number (chosen from among the first N-1 odd numbers). Then there is a single streak of length 3 somewhere in the list, but worst case you need to scan the entire list to find it. Even on average you'll need to examine at least half of the list to find it.
Similar to Rodrigo's solutions but solving your example as well:
#include <vector>
#include <cstdio>
#define len(x) sizeof(x) / sizeof(x[0])
using namespace std;
int nums[] = {1,3,4,5,6,8,9};
int streakBase = nums[0];
int maxStreakLength = 1;
void updateStreak(int currentStreakLength, int currentStreakBase) {
if (currentStreakLength > maxStreakLength) {
maxStreakLength = currentStreakLength;
streakBase = currentStreakBase;
}
}
int main(void) {
vector<int> v;
for(size_t i=0; i < len(nums); ++i)
v.push_back(nums[i]);
int lastBase = v[0], currentStreakBase = v[0], currentStreakLength = 1;
for(size_t i=1; i < v.size(); ++i) {
if (v[i] == lastBase + 1) {
currentStreakLength++;
lastBase = v[i];
} else {
updateStreak(currentStreakLength, currentStreakBase);
currentStreakBase = v[i];
lastBase = v[i];
currentStreakLength = 1;
}
}
updateStreak(currentStreakLength, currentStreakBase);
printf("maxStreakLength = %d and streakBase = %d\n", maxStreakLength, streakBase);
return 0;
}