QuickSort is slower than std::sort - c++

I have quick_sort code (C++) that looks like this:
template< typename BidirectionalIterator, typename Compare >
BidirectionalIterator quick_sort_partition( BidirectionalIterator left, BidirectionalIterator right, Compare cmp ) {
    BidirectionalIterator q = left - 1;
    std::mt19937 gen(time(0));
    std::uniform_int_distribution<int> uid(0, right - left - 1);
    int pivot_1 = uid(gen);
    BidirectionalIterator randomNum = pivot_1 + left;
    std::iter_swap( randomNum, right );
    bool index = 0;
    for (BidirectionalIterator i = left; i < right; i++){
        if (*i < *right){
            ++q;
            std::iter_swap( q, i );
        }
        if (*i == *right){
            index = 1 - index;
            if(index){
                ++q;
                std::iter_swap( q, i );
            }
        }
    }
    ++q;
    std::iter_swap( q, right );
    return q;
}
template< typename BidirectionalIterator, typename Compare >
void quick_sort( BidirectionalIterator first, BidirectionalIterator last, Compare cmp ) {
    if (first < last){
        BidirectionalIterator q = quick_sort_partition(first, last, cmp);
        quick_sort(first, q - 1, cmp);
        quick_sort(q + 1, last, cmp);
    }
}
but it is more than six times slower than std::sort on big tests.
Any ideas why?
How can I optimize my code?

Your QuickSort implementation is pretty vanilla. It does use random pivot selection, which ensures that there are no "killer" inputs that cause performance degradation, so that's better than the absolute basic QuickSort.
There are a number of optimizations that might be used, among them:
It is typical for "Quick Sort" to in fact be implemented as a hybrid sort that falls back to (say) Insertion Sort for partitions smaller than some fixed threshold. Once you get to small partitions, the overhead of Quick Sort tends to overcome its asymptotic complexity advantages.
Maximum recursion depth can be minimized and function call overhead can be reduced by switching to a hybrid recursive / iterative approach, wherein upon each partitioning, the smaller sub-array is sorted recursively, but the code just loops to sort the larger one.
When partitioning, you can reduce the number of swaps performed by finding pairs of elements for which a swap puts both in the correct sub-partition, and swapping those, instead of alternating between swapping into one sub-partition and swapping into the other.
It would probably help to come up with a way to reuse the same random number source throughout the sort, instead of instantiating a new one upon every partitioning.
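A hedged sketch pulling these ideas together (all names here, the cutoff of 32, and the use of std::partition are my own illustrative choices, not code from the question): the engine is seeded once and passed by reference, small partitions fall back to insertion sort, and each level recurses only into the smaller side while looping on the larger one, so the recursion depth stays O(log n).

```cpp
#include <algorithm>
#include <iterator>
#include <random>

template <typename RandomIt, typename Compare>
RandomIt random_pivot_partition(RandomIt first, RandomIt last, Compare cmp,
                                std::mt19937& gen) {
    std::uniform_int_distribution<std::ptrdiff_t> pick(0, (last - first) - 1);
    std::iter_swap(first + pick(gen), last - 1);   // random pivot to the back
    RandomIt pivot = last - 1;
    RandomIt q = std::partition(first, pivot,
                                [&](const auto& x) { return cmp(x, *pivot); });
    std::iter_swap(q, pivot);                      // pivot into its final slot
    return q;
}

template <typename RandomIt, typename Compare>
void quick_sort_tuned(RandomIt first, RandomIt last, Compare cmp,
                      std::mt19937& gen) {
    while (last - first > 32) {                    // cutoff: tune by measurement
        RandomIt q = random_pivot_partition(first, last, cmp, gen);
        if (q - first < last - (q + 1)) {          // recurse into the smaller side,
            quick_sort_tuned(first, q, cmp, gen);  // then loop on the larger one
            first = q + 1;
        } else {
            quick_sort_tuned(q + 1, last, cmp, gen);
            last = q;
        }
    }
    // Insertion sort finishes the small partition.
    for (RandomIt i = first; i != last; ++i)
        std::rotate(std::upper_bound(first, i, *i, cmp), i, std::next(i));
}

template <typename RandomIt, typename Compare>
void quick_sort_tuned(RandomIt first, RandomIt last, Compare cmp) {
    std::mt19937 gen(std::random_device{}());      // seeded once per sort
    quick_sort_tuned(first, last, cmp, gen);
}
```

This is a sketch to benchmark against, not a drop-in replacement; in particular, the cutoff value is something you would want to measure on your own data.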

Related

stable_partition and getting O(nlogn) swaps

On en.cppreference.com, we see that std::stable_partition performs O(n) swaps if we are allowed to use extra memory. This I can see: every time we find an element in the range where our predicate is false, we swap it into another buffer, and in the end we can just copy this additional buffer to the end of our successful portion. [I also assume that, in this case, stable_partition can be implemented with only forward iterators.]
What I don't get is that the link says stable_partition performs at most O(n log n) swaps if we are not allowed to use additional memory. Here is my attempt.
#include <utility>
namespace cho {
template <typename BidirIt, typename Pred>
BidirIt stable_partition(BidirIt begin, BidirIt end, Pred p) {
    BidirIt next_p = begin;
    for(BidirIt it = begin; it != end; ++it) {
        if(it == begin && p(*it)) {
            next_p++;
            continue;
        }
        if(p(*it)) {
            std::swap(*next_p++, *it);
            // ripple back the swapped element to keep stability
            BidirIt cur = it;
            do {
                BidirIt prev = --cur; cur++;
                std::swap(*cur--, *prev--);
            } while(cur != next_p);
        }
    }
    return next_p;
}
template <typename ForwardIt, typename Pred>
ForwardIt partition(ForwardIt begin, ForwardIt end, Pred p) {
    ForwardIt next_p = begin;
    for(ForwardIt it = begin; it != end; ++it) {
        if(p(*it)) {
            std::swap(*next_p++, *it);
        }
    }
    return next_p;
}
}
In this case, I ripple back after the swap. So, if the distance between two successive true cases is k, I will perform k swaps. I think the worst case for my algorithm occurs when the range is reverse-partitioned. If there are p items where the predicate is false and n-p items where it is true, I will get O((n - p) * p) swaps. I thought about this and I could not see how I can get a worst case of O(n log n).
I checked the implementations in LLVM but could not really see how O(n log n) swaps are achieved.
PS: My implementation might be wrong. I tested it with a couple of inputs, but that's it.
Think recursively.
If both left half and right half are stable-partitioned, as in
0...01...10...01...1
^         ^         ^
b         m         e
the only remaining operation is to rotate the range between the two break points, bringing m to where the left break was. That takes O(n) swaps. Now think recursively, and stable-partition both halves the same way. There will be O(log n) levels of recursion, totaling O(n log n) swaps. In broad strokes,
iter stable_partition(begin, end) {
    if (end - begin == 0)
        return begin;
    if (end - begin == 1)
        return p(*begin) ? end : begin;
    iter mid = begin + (end - begin) / 2;
    iter left_break = stable_partition(begin, mid);
    iter right_break = stable_partition(mid, end);
    return rotate(left_break, mid, right_break);
}
Of course, you have to think carefully what rotate should return.
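A minimal C++ rendering of this sketch (my own naming, assuming random access iterators for simplicity): std::rotate returns an iterator to the new position of the old first element, which is exactly the combined partition point the recursion needs.

```cpp
#include <algorithm>

// Recursive in-place stable partition: partition each half, then rotate the
// "true" block of the right half next to the "true" block of the left half.
template <typename RandomIt, typename Pred>
RandomIt stable_partition_inplace(RandomIt begin, RandomIt end, Pred p) {
    if (begin == end)
        return begin;
    if (end - begin == 1)
        return p(*begin) ? end : begin;
    RandomIt mid = begin + (end - begin) / 2;
    RandomIt left_break = stable_partition_inplace(begin, mid, p);
    RandomIt right_break = stable_partition_inplace(mid, end, p);
    // [left_break, mid) holds "false" values, [mid, right_break) holds "true"
    // values; rotating joins the two "true" blocks and the two "false" blocks
    // without disturbing relative order.
    return std::rotate(left_break, mid, right_break);
}
```

Each level of recursion does O(n) work in the rotates, and there are O(log n) levels, matching the O(n log n) swap bound.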
I don't know C++, so I won't be able to write it for you, but it seems quite trivial to do if you have a stable sort implementation available. This implementation would also need to sort in place, since you have the requirement not to use any additional memory. Provided such a sort exists, just sort the elements according to the following order relationship:
R(x, y) = 0 if p(x) == p(y)
R(x, y) = -1 if p(x) && !p(y)
R(x, y) = 1 if !p(x) && p(y)
Out of interest, which sorting algorithms would be suitable for this? It turns out there don't seem to be many that tick all the boxes; see here.
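In C++ terms, the ordering trick can be sketched like this (my own code; note that std::stable_sort itself typically allocates a temporary buffer, so this only meets the no-extra-memory constraint if the underlying sort is genuinely in place):

```cpp
#include <algorithm>

// A comparator saying "x < y exactly when p(x) && !p(y)" is a strict weak
// ordering with two equivalence classes; a stable sort with it is therefore
// a stable partition.
template <typename RandomIt, typename Pred>
RandomIt stable_partition_by_sort(RandomIt first, RandomIt last, Pred p) {
    std::stable_sort(first, last,
        [&](const auto& x, const auto& y) { return p(x) && !p(y); });
    // The range is now partitioned, so the break point can be located.
    return std::partition_point(first, last, p);
}
```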

Fastest way to find out if two sets overlap?

Obviously doing std::set_intersection() is a waste of time.
Isn't there a function in the algorithm header for doing exactly this?
std::find_first_of() is doing a linear search as far as I understand.
This is a solution only for std::set (or multiset). A solution for map would require only a bit more work.
I try it three ways.
First, if one set is far larger than the other, I simply look up each element of the smaller one in the larger (and vice versa for the opposite case).
The constant 100 is theoretically wrong: for optimal big-O performance it should be k n lg m > m for some k, not 100 n > m. But the constant factor is large, and 100 > lg m in practice, so one really should experiment.
If that isn't the case, we walk through each collection looking for collisions much like set_intersection. Instead of just ++, we use .lower_bound to try to skip through each list faster.
Note that if your list consists of interleaved elements (like {1,3,7} and {0,2,4,6,8}) this will be slower than just ++ by a logarithmic factor.
If the two sets "cross" each other less often, this can skip over large amounts of each set's contents.
Replace the lower_bound portion with a mere ++ if you want to compare the two behaviors.
template<class Lhs, class Rhs>
bool sorted_has_overlap( Lhs const& lhs, Rhs const& rhs ) {
    if (lhs.empty() || rhs.empty()) return false;
    if (lhs.size() * 100 < rhs.size()) {
        for (auto&& x : lhs)
            if (rhs.find(x) != rhs.end())
                return true;
        return false;
    }
    if (rhs.size() * 100 < lhs.size()) {
        for (auto&& x : rhs)
            if (lhs.find(x) != lhs.end())
                return true;
        return false;
    }
    using std::begin; using std::end;
    auto lit = begin(lhs);
    auto lend = end(lhs);
    auto rit = begin(rhs);
    auto rend = end(rhs);
    while( lit != lend && rit != rend ) {
        if (*lit < *rit) {
            lit = lhs.lower_bound(*rit);
            continue;
        }
        if (*rit < *lit) {
            rit = rhs.lower_bound(*lit);
            continue;
        }
        return true;
    }
    return false;
}
A sorted array could use the third algorithm with std::lower_bound to advance quickly through the "other" container. This has the advantage of partial searches (which you cannot do fast in a set). It will also behave poorly on "interleaved" elements (by a log n factor) compared to a naive ++.
The first two approaches can also be done fast with sorted arrays, replacing method calls with calls to algorithms in std. Such a transformation is basically mechanical.
An asymptotically optimal version on a sorted array would use a binary search biased towards finding lower bounds at the start of the list -- search at 1, 2, 4, 8, etc instead of at half, quarters, etc. Note that this has the same lg(n) worst case, but is O(1) if the searched for element is first instead of O(lg(n)). As that case (where the search advances less) means less global progress is made, optimizing the sub-algorithm for that case gives you a better global worst-case speed.
To see why: on "fast alternation" it wouldn't perform any worse than ++ -- the case where the next element is a sign swap takes O(1) operations -- and it replaces O(k) with O(lg k) when the gap is larger.
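The biased search described above is often called exponential or galloping search; a sketch (my own code) on a sorted random-access range:

```cpp
#include <algorithm>
#include <cstddef>

// Probe at offsets 1, 2, 4, 8, ... from the front, then binary-search the
// last doubled window. O(1) when the answer is near the front, O(lg k) when
// it is k positions away.
template <typename RandomIt, typename T>
RandomIt galloping_lower_bound(RandomIt first, RandomIt last, const T& value) {
    std::ptrdiff_t n = last - first;
    std::ptrdiff_t bound = 1;
    while (bound < n && first[bound] < value)
        bound *= 2;
    // The answer now lies in (bound/2, min(bound, n)]; std::lower_bound
    // returns the window's end when every element in it is < value, which
    // is exactly the right answer in that case too.
    return std::lower_bound(first + bound / 2, first + std::min(bound, n), value);
}
```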
However, by this point we are far, far down an optimization hole: profile, and determine if it is worth it before proceeding this way.
Another approach on sorted arrays is to presume that std::lower_bound is written optimally (on random access iterators). Use an output iterator that throws an exception if written to. Return true iff you catch that exception, false otherwise.
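That trick can be sketched as follows (my own naming): std::set_intersection only assigns through the output iterator when it finds a common element, so an output "iterator" whose assignment throws turns the first match into an early exit.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

struct overlap_found {};

// Minimal output iterator whose assignment throws on the first write.
struct throwing_inserter {
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    throwing_inserter& operator*() { return *this; }
    throwing_inserter& operator++() { return *this; }
    throwing_inserter& operator++(int) { return *this; }
    template <typename T>
    throwing_inserter& operator=(const T&) { throw overlap_found{}; }
};

template <typename It1, typename It2>
bool sorted_ranges_overlap(It1 first1, It1 last1, It2 first2, It2 last2) {
    try {
        std::set_intersection(first1, last1, first2, last2, throwing_inserter{});
    } catch (const overlap_found&) {
        return true;
    }
    return false;
}
```

Whether this is actually faster than writing the loop by hand depends on how well the library's set_intersection is tuned; it is mostly a way to reuse that tuning.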
(the optimizations above -- pick one and bin search the other, and exponential advance searching -- may be legal for a std::set_intersection.)
I think the use of 3 algorithms is important. Set-intersection testing where one side is much smaller than the other is probably common: the extreme case of one element on one side and many on the other is very well known (as a search).
A naive 'double linear' search gives you up to linear performance in that common case. By detecting the asymmetry between sides, you can switch over to 'linear in small, log in large' at an opportune point and get much better performance in those cases: O(n+m) vs O(m lg n) -- if m < O(n/lg n), the second beats the first. If m is a constant, we get O(n) vs O(lg n), which includes the edge case of 'find whether a single element is in a large collection'.
You can use the following template function if the inputs are sorted:
template<class InputIt1, class InputIt2>
bool intersect(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2)
{
    while (first1 != last1 && first2 != last2) {
        if (*first1 < *first2) {
            ++first1;
            continue;
        }
        if (*first2 < *first1) {
            ++first2;
            continue;
        }
        return true;
    }
    return false;
}
You can use it like this:
#include <iostream>

int main() {
    int a[] = {1, 2, 3};
    int b[] = {3, 4};
    int c[] = {4};
    std::cout << intersect(a, a + 3, b, b + 2) << std::endl;
    std::cout << intersect(b, b + 2, c, c + 1) << std::endl;
    std::cout << intersect(a, a + 3, c, c + 1) << std::endl;
}
Result:
1
1
0
This function has complexity O(n + m), where n and m are the input sizes. But if one input is very small compared to the other (n << m, for example), it's better to check with binary search whether each of the n elements belongs to the other input. This gives O(n * log(m)) time.
#include <algorithm>

/**
 * When input1 is much smaller than input2.
 */
template<class InputIt1, class InputIt2>
bool intersect(InputIt1 first1, InputIt1 last1, InputIt2 first2, InputIt2 last2) {
    while (first1 != last1)
        if (std::binary_search(first2, last2, *first1++))
            return true;
    return false;
}
Sometimes you can encode sets of numbers in a single memory word. For example, you could encode the set {0,2,3,6,7} in the memory word ...00000011001101. The rule is: the bit at position i (reading from right to left) is set if and only if the number i is in the set.
Now if you have two sets, encoded in the memory words a and b, you can perform the intersection using the bitwise operator &.
int a = ...;
int b = ...;
int intersection = a & b;
int uni = a | b; // bonus ("union" itself is a reserved word in C++)
The good thing about this style is that the intersection (or union, or complement) is performed in one CPU instruction per word.
You could use more than one memory word, if you need to handle numbers that are greater than the number of bits of a memory word. Normally, I use an array of memory words.
If you want handle negative numbers, just use two arrays, one for negative numbers, and one for positive numbers.
The bad thing about this method is that it works only with integers.
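The same idea is packaged by std::bitset, which transparently spans as many machine words as the universe needs (the universe size of 128 below is an arbitrary illustrative choice):

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t kUniverse = 128;   // values representable: 0 .. 127

bool bitset_overlap(const std::bitset<kUniverse>& a,
                    const std::bitset<kUniverse>& b) {
    return (a & b).any();   // word-wise AND, then test whether any bit is set
}
```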
I think you can use a binary search:
#include <set>
#include <iostream>
#include <algorithm>

bool overlap(const std::set<int>& s1, const std::set<int>& s2)
{
    for( const auto& i : s1) {
        if(std::binary_search(s2.begin(), s2.end(), i))
            return true;
    }
    return false;
}

int main()
{
    std::set<int> s1 {1, 2, 3};
    std::set<int> s2 {3, 4, 5, 6};
    std::cout << overlap(s1, s2) << '\n';
}

Find largest and second largest element in a range

How do I find the above without removing the largest element and searching again? Is there a more efficient way to do this? It does not matter if the these elements are duplicates.
for (e: all elements) {
    if (e > largest) {
        second = largest;
        largest = e;
    } else if (e > second) {
        second = e;
    }
}
You could either initialize largest and second to an appropriate lower bound, or to the first two items in the list (check which one is bigger, and don't forget to check if the list has at least two items)
using partial_sort ?
std::partial_sort(aTest.begin(), aTest.begin() + 2, aTest.end(), Functor);
An Example:
std::vector<int> aTest;
aTest.push_back(3);
aTest.push_back(2);
aTest.push_back(4);
aTest.push_back(1);
std::partial_sort(aTest.begin(), aTest.begin()+2,aTest.end(), std::greater<int>());
int Max = aTest[0];
int SecMax = aTest[1];
nth_element(begin, begin+n, end, Compare) places the element that would be nth (where "first" is "0th") if the range [begin, end) were sorted at position begin+n, and makes sure that everything in [begin, begin+n) would appear before that nth element in the sorted list. So the code you want is:
nth_element(container.begin(),
            container.begin() + 1,
            container.end(),
            appropriateCompare);
This will work well in your case, since you're only looking for the two largest. Assuming your appropriateCompare sorts things from largest to smallest, the second largest element will be at position 1 and the largest will be at position 0.
Let's assume you mean to find the two largest unique values in the list.
If the list is already sorted, then just look at the second last element (or rather, iterate from the end looking for the second last value).
If the list is unsorted, then don't bother to sort it. Sorting is at best O(n lg n). Simple linear iteration is O(n), so just loop over the elements keeping track:
v::value_type second_best = 0, best = 0;
for(v::const_iterator i = v.begin(); i != v.end(); ++i)
    if(*i > best) {
        second_best = best;
        best = *i;
    } else if(*i > second_best) {
        second_best = *i;
    }
There are of course other criteria, and these could all be put into the test inside the loop. However, should you mean that two elements that both have the same largest value should be found, you have to consider what happens should three or more elements all have this largest value, or if two or more elements have the second largest.
The optimal algorithm shouldn't need more than 1.5 * N - 2 comparisons. (Once we've decided that it's O(N), what's the coefficient in front of N? 2 * N comparisons is suboptimal.)
So, first determine the "winner" and the "loser" in each pair - that's 0.5 * N comparisons.
Then determine the largest element by comparing winners - that's another 0.5 * N - 1 comparisons.
Then determine the second-largest element by comparing the loser of the pair where the largest element came from against the winners of all other pairs - another 0.5 * N - 1 comparisons.
Total comparisons = 1.5 N - 2.
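A sketch of that counting scheme (my own code, not from the answer; the leftover element of an odd-length input is treated as a pair winner with a sentinel "loser"):

```cpp
#include <limits>
#include <utility>
#include <vector>
#include <cstddef>

// Pairwise tournament: the largest must be a pair winner, and the second
// largest is either the loser from the champion's pair or another winner.
// Assumes v.size() >= 2; duplicates are allowed.
std::pair<int, int> two_largest(const std::vector<int>& v) {
    std::vector<int> winners, losers;           // losers[i] lost to winners[i]
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2) {          // 0.5 * N comparisons
        if (v[i] < v[i + 1]) { winners.push_back(v[i + 1]); losers.push_back(v[i]); }
        else                 { winners.push_back(v[i]);     losers.push_back(v[i + 1]); }
    }
    if (i < v.size()) {                         // odd count: leftover has no loser
        winners.push_back(v[i]);
        losers.push_back(std::numeric_limits<int>::min());
    }
    std::size_t best = 0;                       // 0.5 * N - 1 comparisons
    for (std::size_t j = 1; j < winners.size(); ++j)
        if (winners[best] < winners[j]) best = j;
    int second = losers[best];                  // 0.5 * N - 1 comparisons
    for (std::size_t j = 0; j < winners.size(); ++j)
        if (j != best && second < winners[j]) second = winners[j];
    return { winners[best], second };
}
```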
The answer depends if you just want the values, or also iterators pointing at the values.
Minor modification of @Will's answer.
v::value_type second_best = 0, best = 0;
for(v::const_iterator i = v.begin(); i != v.end(); ++i)
{
    if(*i > best)
    {
        second_best = best;
        best = *i;
    }
    else if (*i > second_best)
    {
        second_best = *i;
    }
}
Create a sublist from n..m and sort it descending. Then grab the first two elements. Delete these elements from the original list.
You can scan the list in one pass and save the 1st and 2nd values, that has a O(n) efficiency while sorting is O(n log n).
EDIT:
I think that partial sort is O(n log k)
Untested but fun:
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <list>

template <typename T, int n>
class top_n_functor
{
public:
    void operator()(const T& x) {
        auto f = std::lower_bound(values_.begin(), values_.end(), x);
        if(values_.size() < static_cast<std::size_t>(n)) {
            values_.insert(f, x);
            return;
        }
        if(values_.begin() == f)
            return;
        auto removed = values_.begin();
        values_.splice(removed, values_, std::next(removed), f);
        *removed = x;
    }
    std::list<T> values() {
        return values_;
    }
private:
    std::list<T> values_;   // kept sorted ascending, so back() is the top
};

int main()
{
    int A[] = {1, 4, 2, 8, 5, 7};
    const int N = sizeof(A) / sizeof(int);
    auto vals = std::for_each(A, A + N, top_n_functor<int, 2>()).values();
    std::cout << "The top is " << vals.back()
              << " with second place being " << vals.front() << std::endl;
}
If the largest is the first element, search for the second largest in [largest+1, end). Otherwise search in [begin, largest) and [largest+1, end) and take the maximum of the two. Of course, this takes up to 2n comparisons, so it's not optimal.
If you have random-access iterators, you could do as quick sort does and use the ever-elegant recursion:
template< typename T >
std::pair<T,T> find_two_largest(const std::pair<T,T>& lhs, const std::pair<T,T>& rhs)
{
    // implementation finding the two largest of the four values left as an exercise :)
}

template< typename RAIter >
std::pair< typename std::iterator_traits<RAIter>::value_type
         , typename std::iterator_traits<RAIter>::value_type >
find_two_largest(RAIter begin, RAIter end)
{
    const std::ptrdiff_t diff = end - begin;
    if( diff < 2 )
        return std::make_pair(*begin, *begin);
    if( diff < 3 )
        return std::make_pair(*begin, *(begin + 1));
    const RAIter middle = begin + diff / 2;
    typedef std::pair< typename std::iterator_traits<RAIter>::value_type
                     , typename std::iterator_traits<RAIter>::value_type >
        result_t;
    const result_t left = find_two_largest(begin, middle);
    const result_t right = find_two_largest(middle, end);
    return find_two_largest(left, right);
}
This has O(n) and shouldn't make more comparisons than NomeN's implementation.
Top-k is usually a bit better than n log(k):
template <class t, class ordering>
class TopK {
public:
    typedef std::multiset<t, ordering, special_allocator> BEST_t;
    BEST_t best;
    const size_t K;
    TopK(const size_t k)
        : K(k) {
    }
    const BEST_t& insert(const t& item){
        if (best.size() < K) {
            best.insert(item);
            return best;
        }
        // K items in the multiset now.
        // And here is why it's better: if the distribution is random, then this
        // comparison (and the size check above) is usually the only work done.
        if (ordering()(*best.begin(), item)) {  // item better than the worst
            best.erase(best.begin());           // drop the worst
            best.insert(item);                  // log(K-1) average, as only K-1 items remain in best
        }
        return best;
    }
    template <class it>
    const BEST_t& insert(it i, const it last){
        for (; i != last; ++i) {
            insert(*i);
        }
        return best;
    }
};
Of course the special_allocator can in essence be just an array of k multiset value_types and a free list of those nodes (which typically has nothing on it, as the other k are in use in the multiset until it's time to put a new one in, and then we erase and immediately reuse the node). Good to have this, or else the memory allocation/freeing in std::multiset and the cache misses kill you. It's a (very) tiny bit of work to give it static state without violating STL allocator rules.
Not as good as a specialized algorithm for exactly 2, but for fixed k << n I would GUESS (2n + delta*n) comparisons, where delta is small; my copy of Knuth (TAOCP vol. 3, Sorting and Searching) is packed away, and an estimate on delta is a bit more work than I want to do.
The average worst case is, I would guess, n(log(k-1) + 2) when the input is in opposite order and all elements are distinct.
The best case is 2n + k log(k), for when the k best come first.
I think you could implement a custom array class and overload the indexed get/set methods. Then, on every set call, compare the new value with two fields holding the result. While this makes the setter slower, it benefits from caching or even registers, and getting the result becomes a no-op. This is faster if you populate the array only once per search for the maximums, but slower if the array is modified frequently.
If the array is used in vectorized loops, it gets harder to implement, as you would have to use AVX/SSE-optimized max operations inside the setter.

Is it possible to write a function like next_permutation but that only permutes r values, instead of n?

std::next_permutation (and std::prev_permutation) permute all values in the range [first, last) given for a total of n! permutations (assuming that all elements are unique).
Is it possible to write a function like this:
template<class Iter>
bool next_permutation(Iter first, Iter last, Iter choice_last);
That permutes the elements in the range [first, last) but only chooses elements in the range [first, choice_last). I.e., we have maybe 20 elements and want to iterate through all permutations of 10 choices of them: 20P10 options vs 20P20.
Iter is a random access iterator for my purposes, but if it can be implemented as a bidirectional iterator, then great!
The less amount of external memory needed the better, but for my purposes it doesn't matter.
On each iteration, the chosen elements should end up as the first elements of the sequence.
Is such a function possible to implement? Does anyone know of any existing implementations?
Here is essentially what I am doing to hack around this. Suggestions on how to improve this are also welcome.
Start with a vector V of N elements of which I want to visit each permutation of R elements chosen from it (R <= N).
Build a vector I of length R with values { 0, 1, 2, ... R - 1 } to serve as an index to the elements of V
On each iteration, build a vector C of length R with values { V[I[0]], V[I[1]], ... V[I[R - 1]] }
Do something with the values in C.
Apply a function to permute the elements of I and iterate again if it was able to.
That function looks like this:
bool NextPermutationIndices(std::vector<int> &I, int N)
{
    const int R = I.size();
    for (int i = R - 1; ; --i) {
        if (I[i] < N - R + i) {
            ++I[i];
            return true;
        }
        if (i == 0)
            return false;
        if (I[i] > I[i-1] + 1) {
            ++I[i-1];
            for (int j = i; j < R; ++j)
                I[j] = I[j-1] + 1;
            return true;
        }
    }
}
That function is very complicated due to all the possible off-by-one errors, and everything using it is more complicated than is probably necessary.
EDIT:
It turns out that it was significantly easier than I had even imagined. From here, I was able to find exact implementations of many of the exact algorithms I needed (combinations, permutations, etc.).
template<class BidirectionalIterator>
bool next_partial_permutation(BidirectionalIterator first,
                              BidirectionalIterator middle,
                              BidirectionalIterator last)
{
    std::reverse(middle, last);
    return std::next_permutation(first, last);
}
Plus there is a combination algorithm there that works in a similar way. The implementation of that is much more complicated, though.
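For illustration, the function above can drive a simple counting loop (the function is repeated here, under my own helper names, so the snippet stands alone); the range must start sorted, and each iteration exposes one r-arrangement in the first r positions:

```cpp
#include <algorithm>

// From the answer above: reverse the tail, then take the next full permutation.
template <class BidirectionalIterator>
bool next_partial_permutation(BidirectionalIterator first,
                              BidirectionalIterator middle,
                              BidirectionalIterator last) {
    std::reverse(middle, last);
    return std::next_permutation(first, last);
}

// Counts the r-arrangements visited; for a sorted sequence of n distinct
// elements this should be nPr = n! / (n-r)!.
template <class Container>
int count_partial_permutations(Container v, int r) {
    int count = 0;
    do {
        ++count;            // v[0..r) holds the current arrangement here
    } while (next_partial_permutation(v.begin(), v.begin() + r, v.end()));
    return count;
}
```

With {1, 2, 3, 4} and r = 2 this visits 12 = 4P2 arrangements.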
To iterate over nPk permutations, I've used the for_each_permutation() algorithm presented in this old CUJ article before. It uses a nice algorithm from Knuth which rotates the elements in situ, leaving them in the original order at the end. Therefore, it meets your no external memory requirement. It also works for BidirectionalIterators. It doesn't meet your requirement of looking like next_permutation(). However, I think this is a win - I don't like stateful APIs.
Source code for a Java combination generator is at http://www.merriampark.com/comb.htm. Strip out the Java idioms, and it's almost exactly what you're looking for, implemented as a generator to keep a lid on your memory usage.
This problem is from the mathematical field known as Combinatorics, which is part of Discrete mathematics. Discrete math is crucial to practitioners of computer science, as it includes nearly all of the math we use daily (like logic, algorithms, counting, relations, graph theory, etc.). I highly recommend Discrete and Combinatorial Mathematics: An applied introduction or
Discrete Mathematics and Its Applications, if you can afford it.
(Note: this question is related to "Algorithm for Grouping," but not quite a duplicate since this question asks to solve it in the general case.)
An algorithmic simplification would be to split this into two separate steps.
Generate a list of all possible selections of R elements out of the original data.
For each of those selections, create all possible permutations of the selected elements.
By interleaving those operations, you can avoid allocating the intermediate lists.
Selection can be implemented on a bidirectional iterator by skipping over non-selected items. Generate all selections, e.g. by permuting a sequence of R ones and (N-R) zeroes. This will need O(N) additional memory, but enables you to permute the original sequence in place.
For what it's worth, here is an implementation that sort of works.
It requires that the elements after choice start in sorted order. It only works if there are no duplicate elements in the sequence (if there are, it misses some permutations and doesn't end on the correct permutation). It may also be missing some edge cases, as I didn't test it thoroughly since I have no plans to actually use it.
One benefit of this approach over that answer's is that that one doesn't visit permutations in lexicographical order, which may (but probably won't) matter. It is also kind of a pain to sometimes have to use boost::bind to create a functor to pass to for_each.
template<class Iter>
bool next_choice_permutation(Iter first, Iter choice, Iter last)
{
    if (first == choice)
        return false;
    Iter i = choice;
    --i;
    if (*i < *choice) {
        std::rotate(i, choice, last);
        return true;
    }
    while (i != first) {
        Iter j = i;
        ++j;
        std::rotate(i, j, last);
        --i;
        --j;
        for (; j != last; ++j) {
            if (*i < *j)
                break;
        }
        if (j != last) {
            std::iter_swap(i, j);
            return true;
        }
    }
    std::rotate(first, std::next(first), last);
    return false;
}

Finding gaps in sequence of numbers

I have a std::vector containing a handful of numbers, which are not in any particular order, and may or may not have gaps between the numbers - for example, I may have { 1,2,3, 6 } or { 2,8,4,6 } or { 1, 9, 5, 2 }, etc.
I'd like a simple way to look at this vector and say 'give me the lowest number >= 1 which does not appear in the vector'. So,
for the three examples above, the answers would be 4, 1 and 3 respectively.
It's not performance critical, and the list is short so there aren't any issues about copying the list and sorting it, for example.
I am not really stuck for a way to do this, but my STL skills are seriously atrophied and I can feel that I'm about to do something inelegant - I would be interested to see what other people came up with.
The standard algorithm you are looking for is std::adjacent_find.
Here is a solution that also uses a lambda to make the predicate clean:
int first_gap( std::vector<int> vec )
{
    // Handle the special case of an empty vector. Return 1.
    if( vec.empty() )
        return 1;
    // Sort the vector
    std::sort( vec.begin(), vec.end() );
    // Handle a gap before the smallest value. Return 1.
    if( vec.front() > 1 )
        return 1;
    // Find the first adjacent pair that differ by more than 1.
    auto i = std::adjacent_find( vec.begin(), vec.end(), [](int l, int r){ return l + 1 < r; } );
    // Handle the special case of no gaps. Return the last value + 1.
    if ( i == vec.end() )
        --i;
    return 1 + *i;
}
The checked answer uses < for comparison. != is much simpler:
int find_gap(std::vector<int> vec) {
    std::sort(vec.begin(), vec.end());
    int next = 1;
    for (std::vector<int>::iterator it = vec.begin(); it != vec.end(); ++it) {
        if (*it != next) return next;
        ++next;
    }
    return next;
}
find_gap(1,2,4,5) = 3
find_gap(2) = 1
find_gap(1,2,3) = 4
I'm not passing a reference to the vector since a) he said time doesn't matter and b) so I don't change the order of the original vector.
Sorting the list and then doing a linear search seems the simplest solution. Depending on the expected composition of the lists, you could use a less general-purpose sorting algorithm; and if you implement the sort yourself, you could keep track of data during the sort that could speed up (or eliminate entirely) the search step. I do not think there is any particularly elegant solution to this problem.
You could allocate a bit vector (of the same length as the input vector), initialize it to zero, then mark all indices that occur (note that numbers larger than the length can be ignored). Then, return the first unmarked index (or the length if all indices are marked, which only happens if all indices occur exactly once in the input vector).
This should be asymptotically faster than sort and search. It will use more memory than sorting if you are allowed to destroy the original, but less memory than sorting if you must preserve the original.
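A sketch of that mark-and-scan idea (my own naming): with n inputs the answer is always in [1, n + 1], so values outside that interval can be ignored.

```cpp
#include <cstddef>
#include <vector>

int first_missing_positive(const std::vector<int>& v) {
    const std::size_t n = v.size();
    std::vector<bool> seen(n + 2, false);        // index i marks value i
    for (int x : v)
        if (x >= 1 && static_cast<std::size_t>(x) <= n)
            seen[x] = true;
    for (std::size_t i = 1; i <= n; ++i)
        if (!seen[i])
            return static_cast<int>(i);
    return static_cast<int>(n) + 1;              // 1..n all present
}
```

This is O(n) time and O(n) extra bits, versus O(n log n) time for sort-and-scan.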
Actually, if you do a bubble sort (you know... the one that they teach you first and then tell you to never use again...), you will be able to spot the first gap early in the sorting process, so you can stop there. That should give you the fastest overall time.
Sort-n-search:
std::sort(vec.begin(), vec.end());
int lowest = 1;
if (!vec.empty() && vec[0] == 1)  /* otherwise 1 itself is the gap */
{
    lowest = vec.back() + 1;      /* 1, 2, ..., N case */
    for (size_t ii = 1; ii < vec.size(); ++ii)
    {
        if (vec[ii - 1] + 1 < vec[ii])
        {
            lowest = vec[ii - 1] + 1;
            break;
        }
    }
}
Iterators could be used with just as clear intent, as showcased in @Joe Mucchiello's (ed: better) answer.
OK, here's my 2 cents. Assume you've got a vector of length N.
If N<=2 you can check directly
First, use min_element to get the smallest element, remember it as emin
Call nth_element to get the element at N/2, call it ehalf
If ehalf != emin+N/2 there's a gap to the left, apply this method recursively there by calling nth_element on the whole array but asking for element N/4. Otherwise, recurse on the right asking for element 3*N/4.
This should be slightly better than sorting completely up front.
you could go with something like....
struct InSequence
{
    int _current;
    bool insequence;
    InSequence() : _current(1), insequence(true) {}
    bool operator()(int x) {
        insequence = insequence ? (x == _current) : false;
        _current++;
        return insequence;
    }
};

int first_not_in_sequence(std::vector<int>& v)
{
    std::sort(v.begin(), v.end());
    return 1 + std::count_if(v.begin(), v.end(), InSequence());
}
A possible implementation of Thomas Kammeyer's answer
I found Thomas' approach really smart and useful. Since some of us dream in code, and since the actual implementation is a bit tricky, I wanted to provide some ready-to-use code.
The solution presented here is as generic as possible:
No assumption is made on the type of container or range except their iterators must meet the requirements of ValueSwappable and RandomAccessIterator (due to partial sorting with nth_element)
Any number type can be used - the required traits are outlined below
Another improvement I think is that a no-gap condition can be checked early: since we have to scan for the minimum anyway we can also scan for the maximum at the same time and then determine whether the number range even contains a gap worth finding.
Last but not least the same recursive approach can be adapted for sorted ranges! If you encode in a template value parameter whether the range is already sorted, you can simply skip the partial sorting plus make determining minimum/maximum elements a no-op.
#include <type_traits>
#include <iterator>
#include <tuple>
#include <utility>
#include <algorithm>
#include <stdexcept>
#include <cstddef>

// number type must be:
// * arithmetic
// * subtractable (a - b)
// * divisible by 2 (a / 2)
// * incrementable (++a)
// * less-than-comparable (a < b)
// * default-constructible (A{})
// * copy-constructible
// * value-constructible (A(n))
// * unsigned, or the number range must only contain values > 0

template<typename T>
T increment(T v)
{
    return ++v;
}

template<typename Iterator>
typename std::iterator_traits<Iterator>::value_type
lowest_gap_value_unsorted_recursive(Iterator first, Iterator last, std::size_t N,
                                    typename std::iterator_traits<Iterator>::value_type minValue);

/** Find lowest gap value in an iterator range with specified size */
template<typename Iterator>
typename std::iterator_traits<Iterator>::value_type
lowest_gap_value_unsorted(Iterator first, Iterator last, std::size_t N)
{
    typedef typename std::iterator_traits<Iterator>::value_type Number;
    if (bool empty = last == first)
        return increment(Number{});
    Iterator minElem, maxElem;
    std::tie(minElem, maxElem) = std::minmax_element(first, last);
    if (bool contains0 = !(Number{} < *minElem))
        throw std::logic_error("Number range must not contain 0");
    if (bool missing1st = increment(Number{}) < *minElem)
        return increment(Number{});
    if (bool containsNoGap = !(Number(N) < increment(*maxElem - *minElem)))
        return increment(*maxElem);
    return lowest_gap_value_unsorted_recursive(first, last, N, *minElem);
}

/** Find lowest gap value in an iterator range */
template<typename Iterator>
typename std::iterator_traits<Iterator>::value_type
lowest_gap_value_unsorted(Iterator first, Iterator last)
{
    return lowest_gap_value_unsorted(first, last, std::distance(first, last));
}

/** Find lowest gap value in a range */
template<typename Range>
typename std::remove_reference_t<Range>::value_type
lowest_gap_value_unsorted(Range&& r)
{
    static_assert(!std::is_lvalue_reference_v<Range> && !std::is_const_v<Range>,
                  "lowest_gap_value_unsorted requires a modifiable copy of the passed range");
    return lowest_gap_value_unsorted(std::begin(r), std::end(r), std::size(r));
}

/** Find lowest gap value in a range with specified size */
template<typename Range>
typename std::remove_reference_t<Range>::value_type
lowest_gap_value_unsorted(Range&& r, std::size_t N)
{
    static_assert(!std::is_lvalue_reference_v<Range> && !std::is_const_v<Range>,
                  "lowest_gap_value_unsorted requires a modifiable copy of the passed range");
    return lowest_gap_value_unsorted(std::begin(r), std::end(r), N);
}

template<typename Iterator>
typename std::iterator_traits<Iterator>::value_type
lowest_gap_value_unsorted_recursive(Iterator first, Iterator last, std::size_t N,
                                    typename std::iterator_traits<Iterator>::value_type minValue)
{
    typedef typename std::iterator_traits<Iterator>::value_type Number;
    if (N == 1)
        return ++minValue;
    if (N == 2)
    {
        // determine the greater of the 2 remaining elements
        Number maxValue = !(minValue < *first) ? *std::next(first) : *first;
        if (bool gap = ++minValue < maxValue)
            return minValue;
        else
            return ++maxValue;
    }
    Iterator medianElem = std::next(first, N / 2);
    // sort partially
    std::nth_element(first, medianElem, last);
    if (bool gapInLowerHalf = (Number(N) / 2 < *medianElem - minValue))
        return lowest_gap_value_unsorted_recursive(first, medianElem, N / 2, minValue);
    else
        return lowest_gap_value_unsorted_recursive(medianElem, last, N / 2 + N % 2, *medianElem);
}