Find duplicate in unsorted array with best time complexity - C++

I know there are similar questions, but none this specific.
Input: an n-element array of unsorted elements with values from 1 to (n-1).
One of the values is duplicated (e.g. n=5, tab[n] = {3,4,2,4,1}).
Task: find the duplicate with the best complexity.
I wrote this algorithm:
int tab[] = { 1,6,7,8,9,4,2,2,3,5 };
int arrSize = sizeof(tab) / sizeof(tab[0]);
for (int i = 0; i < arrSize; i++) {
    tab[tab[i] % arrSize] = tab[tab[i] % arrSize] + arrSize;
}
for (int i = 0; i < arrSize; i++) {
    if (tab[i] >= arrSize * 2) {
        std::cout << i;
        break;
    }
}
but I don't think it has the best possible complexity.
Do you know a better method/algorithm? I can use any C++ library, but I don't have any ideas.
Is it possible to get better complexity than O(n)?

In terms of big-O notation, you cannot beat O(n) (same as your solution here). But you can get better constants and a simpler algorithm, by using the fact that the sum of the elements 1,...,n-1 is well known: it is n*(n-1)/2.
int sum = 0;
for (int x : tab) {
    sum += x;
}
int duplicate = sum - n * (n - 1) / 2;   // n = number of elements in tab
The constants here will be significantly better, as each array index is accessed exactly once, which is much more cache friendly and efficient on modern architectures.
(Note, this solution ignores integer overflow, but it is easy to account for it by using twice as many bits in sum as the array's elements have.)
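For example, a minimal overflow-safe sketch of this approach (the function name is mine; it assumes tab holds the values 1..n-1 plus one duplicate):
#include <cstdint>
#include <vector>

int find_duplicate_by_sum(const std::vector<int>& tab) {
    const std::int64_t n = static_cast<std::int64_t>(tab.size());
    std::int64_t sum = 0;                    // 64-bit accumulator avoids overflow
    for (int x : tab) sum += x;
    return static_cast<int>(sum - n * (n - 1) / 2);
}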

Adding the classic answer because it was requested. It is based on the idea that if you xor a number with itself you get 0. So if you xor all numbers from 1 to n - 1 and all numbers in the array you will end up with the duplicate.
int duplicate = arr[0];
for (int i = 1; i < n; i++) {   // n = number of elements in arr
    duplicate = duplicate ^ arr[i] ^ i;
}
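The same idea as a self-contained C++ sketch (function name is mine):
#include <vector>

// XOR of every array element with 1..n-1: values that appear once cancel out,
// leaving only the duplicate.
int find_duplicate_by_xor(const std::vector<int>& arr) {
    int duplicate = 0;
    for (int x : arr) duplicate ^= x;                        // xor all elements
    for (int v = 1; v < static_cast<int>(arr.size()); ++v)
        duplicate ^= v;                                      // xor 1..n-1
    return duplicate;
}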

Don't focus too much on asymptotic complexity. In practice the fastest algorithm is not necessarily the one with the lowest asymptotic complexity. That is because constants are not taken into account: O(huge_constant * N) == O(N) == O(tiny_constant * N).
You cannot inspect N values in less than O(N) time. However, you do not need a full pass through the array; you can stop once you have found the duplicate:
#include <iostream>
#include <vector>
int main() {
std::vector<int> vals{1,2,4,6,5,3,2};
std::vector<bool> present(vals.size());
for (const auto& e : vals) {
if (present[e]) {
std::cout << "duplicate is " << e << "\n";
break;
}
present[e] = true;
}
}
In the "lucky case" the duplicate is at index 2. In the worst case the whole vector has to be scanned. On average it is again O(N) time complexity. Further it uses O(N) additional memory while yours is using no additional memory. Again: Complexity alone cannot tell you which algorithm is faster (especially not for a fixed input size).
No matter how hard you try, you won't beat O(N), because no matter in what order you traverse the elements (and remember already found elements), the best and worst case are always the same: Either the duplicate is in the first two elements you inspect or it's the last, and on average it will be O(N).

Related

Complexity of function with array having even and odds numbers separate

So I have an array which has even and odd numbers in it.
I have to sort it with the odd numbers first and then the even numbers.
Here is my approach:
int key,val;
int odd = 0;
int index = 0;
for(int i=0;i<max;i++)
{
if(arr[i]%2!=0)
{
int temp = arr[index];
arr[index] = arr[i];
arr[i] = temp;
index++;
odd++;
}
}
First I separate the even and odd numbers, then I sort each part.
For the sorting I have this code:
for (int i=1; i<max;i++)
{
key=arr[i];
if(i<odd)
{
val = 0;
}
if(i>=odd)
{
val = odd;
}
for(int j=i; j>val && key < arr[j-1]; j--)
{
arr[j] = arr[j-1];
arr[j-1] = key;
}
}
The problem I am facing is that I can't work out the complexity of the above sorting code.
Insertion sort is applied to the odd numbers first.
When they are done, I skip that part and start sorting the even numbers.
Here is how the sorting proceeds if I have an already sorted array, e.g.: 3 5 7 9 2 6 10 12
(complexity table image)
How does all this work?
In the first for loop I traverse the array and put all the odd numbers before the even numbers.
But that does not sort them.
The next for loop contains the insertion sort. Using the if statement I basically sort only the odd numbers first. Then, when i == odd, the nested for loop no longer goes through the odd numbers; it only covers the even numbers and sorts them.
I'm assuming you know the complexity of your partitioning (let's say A) and sorting algorithms (let's call this one B).
You first partition your n-element array, then sort m elements, and finally sort n - m elements. So the total complexity would be:
A(n) + B(m) + B(n - m)
Depending on what A and B actually are you should probably be able to simplify that further.
Edit: Btw, unless the goal of your code is to try and implement partitioning/sorting algorithms, I believe this is much clearer:
#include <algorithm>
#include <iterator>
template <class T>
void partition_and_sort (T & values) {
    auto isOdd = [](auto const & e) { return e % 2 != 0; };
    auto middle = std::partition(std::begin(values), std::end(values), isOdd);
    std::sort(std::begin(values), middle);
    std::sort(middle, std::end(values));
}
Complexity in this case is O(n) + 2 * O(n * log(n)) = O(n * log(n)).
Edit 2: I wrongly assumed std::partition keeps the relative order of elements. That's not the case. Fixed the code example.
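A quick usage sketch, assuming the template above is in scope (the sample values are mine):
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{4, 7, 2, 9, 1, 8, 3};
    partition_and_sort(v);                    // odds sorted first, then evens
    for (int x : v) std::cout << x << ' ';    // prints: 1 3 7 9 2 4 8
    std::cout << '\n';
}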

O(NLogN) shows better performance than O(N) if unordered_set is used

// Time: sorting is O(N log N), plus a binary search (log N) for each of N items = 2 N log N = O(N log N).
// Space: O(1).
bool TwoSum::TwoSumSortAndBinarySearch(int* arr, int size, int sum)
{
sort(arr, arr + size);
for (int i = 0; i < size; i++)
{
if (binary_search(arr + i + 1, arr + size, sum - arr[i]))
return true;
}
return false;
}
//Time: O(N) as time complexity of Add and Search in hashset/unordered_set is O(1).
//Space: O(N)
bool TwoSum::TwoSumHashSet(int* arr, int size, int sum)
{
unordered_set<int> hash;
for (int i = 0; i < size; i++)
{
if (hash.find(sum - arr[i]) != hash.end())
return true;
hash.insert(arr[i]);
}
return false;
}
int* TwoSum::Testcase(int size)
{
int* in = new int[size];
for (int i = 0; i < size; i++)
{
in[i] = rand() % (size + 1);//random number b/w 0 to N.
}
return in;
}
int main()
{
int size = 5000000;
int* in = TwoSum::Testcase(size);
auto start = std::chrono::system_clock::now();//clock start
bool output = TwoSum::TwoSumHashSet(in, size, INT_MAX);
auto end = std::chrono::system_clock::now();//clock end
std::chrono::duration<double> elapsed_seconds = end - start;
cout << "elapsed time: " << elapsed_seconds.count() << "s\n";
}
I measured the performance of the above two methods, where I would like to find the TwoSum problem.
In the First approach, I am sorting the array then using binary search.
Time: O(NLogN).
space - O(1).
In the second approach, unordered_set is used whose complexity is constant on average, worst case linear in the size of the container.
//Time: O(N) as time complexity of Add and Search in hashset/unordered_set is O(1).
//Space: O(N)
Here are the times from three runs of the two methods (in seconds):

TwoSumSortAndBinarySearch    TwoSumHashSet
8.05                         15.15
7.76                         14.47
7.74                         14.28
So, it is clear that TwoSumSortAndBinarySearch definitely performs better than TwoSumHashSet.
Which approach is preferable and suggested in a real scenario, and why?
This is because computational complexity doesn’t take into account the behavior of multi-level memory system present in every modern computer. And it is precisely because you measure that behavior via proxy using time (!!), that your measurement is not “like” theoretical computational complexity. Computational complexity predicts execution times only in very well controlled situations, when the code is optimal for the platform. If you want to measure complexity, you can’t measure time. Measure operation counts. It will agree with theory then.
In my limited experience, it is rather rare that computational complexity theory would predict runtimes on reasonably sized data sets, when the behavior is neither exponential nor cubic (or higher terms). Cache access patterns and utilization of architectural parallelism are major predictors of performance, before computational complexity comes into play.
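As an illustration, a minimal sketch of counting operations instead of wall-clock time (the counter, names, and sample data are mine, not from the original code):
#include <cstddef>
#include <iostream>
#include <unordered_set>

// Count hash-set probes instead of measuring seconds; the count tracks the
// theoretical O(N) prediction much more closely than the runtime does.
static std::size_t g_probes = 0;

bool twoSumHashSetCounted(const int* arr, int size, int sum) {
    std::unordered_set<int> seen;
    for (int i = 0; i < size; ++i) {
        ++g_probes;                                  // one lookup
        if (seen.find(sum - arr[i]) != seen.end()) return true;
        ++g_probes;                                  // one insert
        seen.insert(arr[i]);
    }
    return false;
}

int main() {
    int data[] = {3, 9, 1, 7, 5};
    std::cout << twoSumHashSetCounted(data, 5, 12)
              << " after " << g_probes << " probes\n";
}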

Sets and Vectors. Are sets fast in C++?

Please read the question here - http://www.spoj.com/problems/MRECAMAN/
The question is to compute Recamán's sequence, where a(0) = 0 and a(i) = a(i-1) - i if that value is greater than 0 and has not appeared in the sequence before; otherwise a(i) = a(i-1) + i.
Now, when I use a vector to store the sequence and use the find function, the program times out. But when I use an array plus a set to check whether an element exists, it gets accepted (very fast). Is using a set faster?
Here are the codes:
Vector implementation
vector <int> sequence;
sequence.push_back(0);
for (int i = 1; i <= 500000; i++)
{
a = sequence[i - 1] - i;
b = sequence[i - 1] + i;
if (a > 0 && find(sequence.begin(), sequence.end(), a) == sequence.end())
sequence.push_back(a);
else
sequence.push_back(b);
}
Set Implementation
int a[500001];
set <int> exists;
a[0] = 0;
for (int i = 1; i <= MAXN; ++i)
{
if (a[i - 1] - i > 0 && exists.find(a[i - 1] - i) == exists.end()) a[i] = a[i - 1] - i;
else a[i] = a[i - 1] + i;
exists.insert(a[i]);
}
Lookup in an std::vector:
find(sequence.begin(), sequence.end(), a)==sequence.end()
is an O(n) operation (n being the number of elements in the vector).
Lookup in an std::set (which is a balanced binary search tree):
exists.find(a[i-1] - i) == exists.end()
is an O(log n) operation.
So yes, lookup in a set is (asymptotically) faster than a linear lookup in vector.
If you can keep the vector sorted, lookup is faster in most cases than in a set, because it is much more cache friendly.
There is only one valid answer to most "Is XY faster than UV in C++" questions:
Use a profiler.
While most algorithms (including container insertions, searches etc.) have a guaranteed complexity, these complexities can only tell you about the approximate behavior for large amounts of data. The performance for any given smaller set of data can not be easily compared, and the optimizations that a compiler can apply can not be reasonably guessed by humans. So use a profiler and see what is faster. If it matters at all. To see if performance matters in that special part of your program, use a profiler.
However, in your case it might be a safe bet that searching a set of ~250k elements is faster than searching an unsorted vector of that size. Furthermore, if you use the vector only for storing the inserted values and keep sequence[i-1] in a separate variable, you can keep the vector sorted and use an algorithm for sorted ranges like binary_search, which can be way faster than the set.
A sample implementation with a sorted vector:
const static size_t NMAX = 500000;
vector<int> values = {0};
values.reserve(NMAX);
int lastInserted = 0;
for (int i = 1; i <= NMAX; ++i) {
    auto a = lastInserted - i;
    auto b = lastInserted + i;
    auto iter = lower_bound(begin(values), end(values), a);
    // a is always less than the last inserted value, so iter can't be end(values)
    if (a > 0 && a < *iter) {
        lastInserted = a;
    }
    else {
        // b > a => lower_bound(b) >= lower_bound(a)
        iter = lower_bound(iter, end(values), b);
        lastInserted = b;
    }
    values.insert(iter, lastInserted);
}
I hope I did not introduce any bugs...
For the task at hand, set is faster than vector because it keeps its contents sorted and does a binary search to find a specified item, giving logarithmic complexity instead of linear complexity. When the set is small, that difference is also small, but when the set gets large the difference grows considerably. I think you can improve things a bit more than just that though.
First, I'd avoid the clumsy lookup to see if an item is already present by just attempting to insert an item, then see if that succeeded:
if (b>0 && exists.insert(b).second)
a[i] = b;
else {
a[i] = c;
exists.insert(c);
}
This avoids looking up the same item twice, once to see if it was already present, and again to insert the item. It only does a second lookup when the first one was already present, so we're going to insert some other value.
Second, and even more importantly, you can use std::unordered_set to improve the complexity from logarithmic to (expected) constant. Since unordered_set uses (mostly) the same interface as std::set, this substitution is easy to make (including the optimization above).
Here's some code to compare the three methods:
#include <iostream>
#include <string>
#include <set>
#include <unordered_set>
#include <vector>
#include <numeric>
#include <chrono>
static const int MAXN = 500000;
unsigned original() {
static int a[MAXN+1];
std::set <int> exists;
a[0] = 0;
for (int i = 1; i <= MAXN; ++i)
{
if (a[i - 1] - i > 0 && exists.find(a[i - 1] - i) == exists.end()) a[i] = a[i - 1] - i;
else a[i] = a[i - 1] + i;
exists.insert(a[i]);
}
return std::accumulate(std::begin(a), std::end(a), 0U);
}
template <class container>
unsigned reduced_lookup() {
container exists;
std::vector<int> a(MAXN + 1);
a[0] = 0;
for (int i = 1; i <= MAXN; ++i) {
int b = a[i - 1] - i;
int c = a[i - 1] + i;
if (b>0 && exists.insert(b).second)
a[i] = b;
else {
a[i] = c;
exists.insert(c);
}
}
return std::accumulate(std::begin(a), std::end(a), 0U);
}
template <class F>
void timer(F f) {
auto start = std::chrono::high_resolution_clock::now();
std::cout << f() <<"\t";
auto stop = std::chrono::high_resolution_clock::now();
std::cout << "Time: " << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count() << " ms\n";
}
int main() {
timer(original);
timer(reduced_lookup<std::set<int>>);
timer(reduced_lookup<std::unordered_set<int>>);
}
Note how std::set and std::unordered_set provide similar enough interfaces that I've written the code as a single template that can use either type of container, then for timing just instantiated that for both set and unordered_set.
Anyway, here's some results from g++ (version 4.8.1, compiled with -O3):
212972756 Time: 137 ms
212972756 Time: 101 ms
212972756 Time: 63 ms
Changing the lookup strategy improves speed by about 30%¹, and using unordered_set with the improved lookup strategy better than doubles the speed compared to the original--not bad, especially when the result actually looks cleaner, at least to me. You might not agree that it's cleaner looking, but I think we can at least agree that I didn't write code that was a lot longer or more complex to get the speed improvement.
¹ Simplistic analysis indicates that it should be around 25%. Specifically, if we assume there are even odds of a given number being in the set already, then this eliminates half the lookups about half the time, or about 1/4th of the lookups.
The set is a huge speedup because it's faster to look up. (Btw, exists.count(a) == 0 is prettier than using find.)
That doesn't have anything to do with vector vs array though. Adding the set to the vector version should work just as fine.
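A sketch of that combination (function name is mine; the vector stores the sequence, a set answers the membership test):
#include <set>
#include <vector>

// Same sequence as the vector version, but membership checks go through a set
// instead of a linear std::find over the vector.
std::vector<int> recaman(int maxn) {
    std::vector<int> sequence{0};
    std::set<int> exists{0};
    for (int i = 1; i <= maxn; ++i) {
        int a = sequence[i - 1] - i;
        int b = sequence[i - 1] + i;
        int next = (a > 0 && exists.count(a) == 0) ? a : b;
        sequence.push_back(next);
        exists.insert(next);
    }
    return sequence;
}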
It is a classic space-time tradeoff. When you use only the vector, your program uses minimal memory, but you have to search the existing numbers at every step, which is slow. When you use an additional index data structure (like the set in your case) you dramatically speed up your code, but it now takes at least twice as much memory. More about the tradeoff here.

Optimized way to find M largest elements in an NxN array using C++

I need a blazing fast way to find the 2D positions and values of the M largest elements in an NxN array.
Right now I'm doing this:
struct SourcePoint {
    Point point;
    float value;
};

SourcePoint* maxValues = new SourcePoint[M];

for (int j = 0; j < rows; j++) {
    for (int i = 0; i < cols; i++) {
        float sample = arr[i][j];
        if (sample > maxValues[0].value) {
            int q = 1;
            while (q < M && sample > maxValues[q].value) {
                maxValues[q - 1] = maxValues[q]; // shuffle the values back
                q++;
            }
            maxValues[q - 1].value = sample;
            maxValues[q - 1].point = Point(i, j);
        }
    }
}
A Point struct is just two ints - x and y.
This code basically does an insertion sort of the values coming in. maxValues[0] always contains the SourcePoint with the lowest value that still keeps it within the top M values encountered so far. This gives us a quick and easy bailout: if sample <= maxValues[0].value, we don't do anything. The issue I'm having is the shuffling every time a new better value is found. It works its way down maxValues until it finds its spot, shuffling all the elements in maxValues to make room for itself.
I'm getting to the point where I'm ready to look into SIMD solutions, or cache optimisations, since it looks like there's a fair bit of cache thrashing happening. Cutting the cost of this operation down will dramatically affect the performance of my overall algorithm since this is called many many times and accounts for 60-80% of my overall cost.
I've tried using a std::vector and make_heap, but I think the overhead for creating the heap outweighed the savings of the heap operations. This is likely because M and N generally aren't large. M is typically 10-20 and N 10-30 (NxN 100 - 900). The issue is this operation is called repeatedly, and it can't be precomputed.
I just had a thought to pre-load the first M elements of maxValues which may provide some small savings. In the current algorithm, the first M elements are guaranteed to shuffle themselves all the way down just to initially fill maxValues.
Any help from optimization gurus would be much appreciated :)
A few ideas you can try. In some quick tests with N=100 and M=15 I was able to get it around 25% faster in VC++ 2010 but test it yourself to see whether any of them help in your case. Some of these changes may have no or even a negative effect depending on the actual usage/data and compiler optimizations.
Don't allocate a new maxValues array each time unless you need to. Using a stack variable instead of dynamic allocation gets me +5%.
Changing g_Source[i][j] to g_Source[j][i] gains you a very little bit (not as much as I'd thought there would be).
Using the structure SourcePoint1 listed at the bottom gets me another few percent.
The biggest gain of around +15% was to replace the local variable sample with g_Source[j][i]. The compiler is likely smart enough to optimize out the multiple reads to the array which it can't do if you use a local variable.
Trying a simple binary search netted me a small loss of a few percent. For larger M/Ns you'd likely see a benefit.
If possible try to keep the source data in arr[][] sorted, even if only partially. Ideally you'd want to generate maxValues[] at the same time the source data is created.
Look at how the data is created/stored/organized may give you patterns or information to reduce the amount of time to generate your maxValues[] array. For example, in the best case you could come up with a formula that gives you the top M coordinates without needing to iterate and sort.
Code for above:
struct SourcePoint1 {
int x;
int y;
float value;
int test; //Play with manual/compiler padding if needed
};
If you want to go into micro-optimizations at this point, a simple first step should be to get rid of the Points and just stuff both dimensions into a single int. That reduces the amount of data you need to shift around, and gets SourcePoint down to a power-of-two size, which simplifies indexing into it.
Also, are you sure that keeping the list sorted is better than simply recomputing which element is the new lowest after each time you shift the old lowest out?
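A sketch of that alternative (the struct and names are mine): keep the top M unsorted, overwrite the current minimum, then rescan for the new minimum only when a replacement actually happens.
#include <limits>
#include <utility>

// Hypothetical helper: tracks the M largest samples without keeping them
// sorted; `lowest` is the index of the current smallest kept value.
struct TopM {
    static const int M = 15;                   // assumed size for illustration
    float value[M];
    std::pair<int, int> pos[M];                // (i, j) coordinates

    int lowest = 0;

    TopM() {
        for (int k = 0; k < M; ++k)
            value[k] = std::numeric_limits<float>::lowest();
    }

    void offer(float sample, int i, int j) {
        if (sample <= value[lowest]) return;   // cheap rejection, same as before
        value[lowest] = sample;                // overwrite the old minimum
        pos[lowest] = std::make_pair(i, j);
        lowest = 0;                            // rescan for the new minimum: O(M), no shifting
        for (int k = 1; k < M; ++k)
            if (value[k] < value[lowest]) lowest = k;
    }
};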
(Updated 22:37 UTC 2011-08-20)
I propose a binary min-heap of fixed size holding the M largest elements (but still in min-heap order!). It probably won't be faster in practice, as I think the OP's insertion sort probably has decent real-world performance (at least when the recommendations of the other posters in this thread are taken into account).
Look-up in the case of failure should be constant time: If the current element is less than the minimum element of the heap (containing the max M elements) we can reject it outright.
If it turns out that we have an element bigger than the current minimum of the heap (the Mth biggest element) we extract (discard) the previous min and insert the new element.
If the elements are needed in sorted order the heap can be sorted afterwards.
First attempt at a minimal C++ implementation:
template<unsigned size, typename T>
class m_heap {
private:
    T nodes[size];
    static unsigned parent(unsigned i) { return (i - 1) / 2; }
    static unsigned left(unsigned i)   { return i * 2 + 1; }   // 0-based heap indexing
    static unsigned right(unsigned i)  { return i * 2 + 2; }
    void bubble_down(unsigned i) {
        for (;;) {
            unsigned j = i;
            if (left(i) < size && nodes[left(i)] < nodes[j])
                j = left(i);
            if (right(i) < size && nodes[right(i)] < nodes[j])
                j = right(i);
            if (i != j) {
                std::swap(nodes[i], nodes[j]);
                i = j;
            } else {
                break;
            }
        }
    }
    void bubble_up(unsigned i) {
        while (i > 0 && nodes[i] < nodes[parent(i)]) {
            std::swap(nodes[parent(i)], nodes[i]);
            i = parent(i);
        }
    }
public:
    m_heap() {
        for (unsigned i = 0; i < size; i++) {
            // lowest(), not min(): for floating-point types min() is the
            // smallest positive value, not the most negative one
            nodes[i] = std::numeric_limits<T>::lowest();
        }
    }
    void add(const T& x) {
        if (x < nodes[0]) {
            // smaller than the minimum of the kept M elements: reject outright
            return;
        }
        nodes[0] = x;       // replace the current minimum...
        bubble_down(0);     // ...and restore the min-heap property
    }
    T* get() { return nodes; }   // raw access to the kept elements (used by the test below)
};
Small test/usage case:
#include <iostream>
#include <limits>
#include <algorithm>
#include <vector>
#include <stdlib.h>
#include <assert.h>
#include <math.h>
using namespace std;
// INCLUDE TEMPLATED CLASS FROM ABOVE
typedef vector<float> vf;
bool compare(float a, float b) { return a > b; }
int main()
{
int N = 2000;
vf v;
for (int i = 0; i < N; i++) v.push_back( rand()*1e6 / RAND_MAX);
static const int M = 50;
m_heap<M, float> h;
for (int i = 0; i < N; i++) h.add( v[i] );
sort(v.begin(), v.end(), compare);
vf heap(h.get(), h.get() + M); // assume public in m_heap: T* get() { return nodes; }
sort(heap.begin(), heap.end(), compare);
cout << "Real\tFake" << endl;
for (int i = 0; i < M; i++) {
cout << v[i] << "\t" << heap[i] << endl;
if (fabs(v[i] - heap[i]) > 1e-5) abort();
}
}
You're looking for a priority queue:
template < class T, class Container = vector<T>,
class Compare = less<typename Container::value_type> >
class priority_queue;
You'll need to figure out the best underlying container to use, and probably define a Compare function to deal with your Point type.
If you want to optimize it, you could run a queue on each row of your matrix in its own worker thread, then run an algorithm to pick the largest item of the queue fronts until you have your M elements.
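Setting the threading aside, a sketch of the core top-M bookkeeping with std::priority_queue used as a min-heap (sample data and M are mine):
#include <functional>
#include <iostream>
#include <queue>
#include <vector>

int main() {
    const std::size_t M = 4;
    const float samples[] = {3.f, 9.f, 1.f, 7.f, 5.f, 8.f, 2.f};

    // Min-heap: the smallest of the kept values sits on top, so a new sample
    // only has to beat q.top() to enter the top-M set.
    std::priority_queue<float, std::vector<float>, std::greater<float>> q;
    for (float s : samples) {
        if (q.size() < M) q.push(s);
        else if (s > q.top()) { q.pop(); q.push(s); }
    }
    while (!q.empty()) { std::cout << q.top() << ' '; q.pop(); }  // prints: 5 7 8 9
    std::cout << '\n';
}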
A quick optimization would be to add a sentinel value to your maxValues array. If you have maxValues[M].value equal to std::numeric_limits<float>::max() then you can eliminate the q < M test in your while loop condition.
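A self-contained sketch of that sentinel tweak (the struct here is a simplified stand-in for the question's SourcePoint, and the function names are mine):
#include <limits>

struct SourcePoint { int x, y; float value; };

// maxValues has M + 1 slots; maxValues[M].value is the sentinel, so the shuffle
// loop below stops there without a q < M bounds test.
void init(SourcePoint* maxValues, int M) {
    for (int q = 0; q < M; ++q)
        maxValues[q].value = std::numeric_limits<float>::lowest();
    maxValues[M].value = std::numeric_limits<float>::max();    // sentinel slot
}

void offer(SourcePoint* maxValues, float sample, int i, int j) {
    if (sample <= maxValues[0].value) return;
    int q = 1;
    while (sample > maxValues[q].value) {      // sentinel guarantees termination
        maxValues[q - 1] = maxValues[q];
        q++;
    }
    maxValues[q - 1].value = sample;
    maxValues[q - 1].x = i;
    maxValues[q - 1].y = j;
}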
One idea would be to use the std::partial_sort algorithm on a plain one-dimensional sequence of references into your NxN array. You could probably also cache this sequence of references for subsequent calls. I don't know how well it performs, but it's worth a try - if it works good enough, you don't have as much "magic". In particular, you don't resort to micro optimizations.
Consider this showcase:
#include <algorithm>
#include <cstring>   // for memset
#include <iostream>
#include <vector>
#include <stddef.h>
static const int M = 15;
static const int N = 20;
// Represents a reference to a sample of some two-dimensional array
class Sample
{
public:
Sample( float *arr, size_t row, size_t col )
: m_arr( arr ),
m_row( row ),
m_col( col )
{
}
inline operator float() const {
return m_arr[m_row * N + m_col];
}
bool operator<( const Sample &rhs ) const {
return (float)rhs < (float)*this;
}
int row() const {
return m_row;
}
int col() const {
return m_col;
}
private:
float *m_arr;
size_t m_row;
size_t m_col;
};
int main()
{
// Setup a demo array
float arr[N][N];
memset( arr, 0, sizeof( arr ) );
// Put in some sample values
arr[2][1] = 5.0;
arr[9][11] = 2.0;
arr[5][4] = 4.0;
arr[15][7] = 3.0;
arr[12][19] = 1.0;
// Setup the sequence of references into this array; you could keep
// a copy of this sequence around to reuse it later, I think.
std::vector<Sample> samples;
samples.reserve( N * N );
for ( size_t row = 0; row < N; ++row ) {
for ( size_t col = 0; col < N; ++col ) {
samples.push_back( Sample( (float *)arr, row, col ) );
}
}
// Let partial_sort find the M largest entry
std::partial_sort( samples.begin(), samples.begin() + M, samples.end() );
// Print out the row/column of the M largest entries.
for ( std::vector<Sample>::size_type i = 0; i < M; ++i ) {
std::cout << "#" << (i + 1) << " is " << (float)samples[i] << " at " << samples[i].row() << "/" << samples[i].col() << std::endl;
}
}
First of all, you are marching through the array in the wrong order!
You always, always, always want to scan through memory linearly. That means the last index of your array needs to be changing fastest. So instead of this:
for (int j = 0; j < rows; j++) {
for (int i = 0; i < cols; i++) {
float sample = arr[i][j];
Try this:
for (int i = 0; i < cols; i++) {
for (int j = 0; j < rows; j++) {
float sample = arr[i][j];
I predict this will make a bigger difference than any other single change.
Next, I would use a heap instead of a sorted array. The standard <algorithm> header already has push_heap and pop_heap functions to use a vector as a heap. (This will probably not help all that much, though, unless M is fairly large. For small M and a randomized array, you do not wind up doing all that many insertions on average... Something like O(log N) I believe.)
Next after that is to use SSE2. But that is peanuts compared to marching through memory in the right order.
You should be able to get nearly linear speedup with parallel processing.
With P CPUs, you can process a band of rows/P rows (and all columns) on each CPU, finding the top M entries in each band. And then do a selection sort to find the overall top M.
You could probably do that with SIMD as well (but here you'd divide up the task by interleaving columns instead of banding the rows). Don't try to make SIMD do your insertion sort faster, make it do more insertion sorts at once, which you combine at the end using a single very fast step.
Naturally you could do both multi-threading and SIMD, but on a problem which is only 30x30, that's not likely to be worthwhile.
I tried replacing float by double, and interestingly that gave me a speed improvement of about 20% (using VC++ 2008). That's a bit counterintuitive, but it seems modern processors or compilers are optimized for double value processing.
Use a linked list to store the best yet M values. You'll still have to iterate over it to find the right spot, but the insertion is O(1). It would probably even be better than binary search and insertion O(N)+O(1) vs O(lg(n))+O(N).
Interchange the fors, so you're not accessing every Nth element in memory and thrashing the cache.
Later edit: throwing in another idea that might work for uniformly distributed values (a rough sketch follows after the steps).
Find the min and max in 3/2*O(N^2) comparisons.
Create anywhere from N to N^2 uniformly distributed buckets, preferably closer to N^2 than N.
For every element in the NxN matrix, place it in bucket[(int)((value - min) / range * (bucketCount - 1))], where range = max - min.
Finally, create a set starting from the highest bucket down to the lowest, adding whole buckets to it while |current set| + |next bucket| <= M.
If you get M elements you're done.
More likely you'll get fewer than M elements, let's say P.
Apply your algorithm to the remaining bucket and get the biggest M - P elements out of it.
If the elements are uniform and you use N^2 buckets, its complexity is about 3.5*(N^2), vs your current solution which is about O(N^2)*ln(M).
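A rough sketch of that bucketing idea (the function, the bucket count, and working with plain values instead of coordinates are my own simplifications):
#include <algorithm>
#include <cfloat>
#include <vector>

// Returns roughly the M largest values of an N x N matrix (flattened row-major)
// by bucketing; assumes reasonably uniform data and sorts only one bucket.
std::vector<float> top_m_by_buckets(const std::vector<float>& flat, int N, int M) {
    float lo = FLT_MAX, hi = -FLT_MAX;
    for (float v : flat) { lo = std::min(lo, v); hi = std::max(hi, v); }
    float range = std::max(hi - lo, 1e-30f);           // avoid division by zero

    const int bucketCount = N * N;                      // closer to N^2 than N
    std::vector<std::vector<float>> buckets(bucketCount);
    for (float v : flat) {
        int b = static_cast<int>((v - lo) / range * (bucketCount - 1));
        buckets[b].push_back(v);
    }

    std::vector<float> result;
    for (int b = bucketCount - 1; b >= 0 && (int)result.size() < M; --b) {
        std::vector<float>& bk = buckets[b];
        if ((int)result.size() + (int)bk.size() <= M) {
            result.insert(result.end(), bk.begin(), bk.end());   // take the whole bucket
        } else {
            // partial bucket: sort it and take only the biggest M - P values
            std::sort(bk.begin(), bk.end());
            int need = M - static_cast<int>(result.size());
            result.insert(result.end(), bk.end() - need, bk.end());
        }
    }
    return result;
}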

c++ quick sort running time

I have a question about the quicksort algorithm. I implemented quicksort and experimented with it.
The elements in the initial unsorted array are random numbers chosen from a certain range.
I find that the range of the random numbers affects the running time. For example, sorting 1,000,000 random numbers chosen from the range 1-2,000 takes 40 seconds, while it takes 9 seconds if the 1,000,000 numbers are chosen from the range 1-10,000.
But I do not know how to explain it. In class, we talked about how the pivot value can affect the depth of the recursion tree.
For my implementation, the last value of the array is chosen as the pivot value. I do not use a randomized scheme to select the pivot.
#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime>
using namespace std;

int partition( vector<int> &vec, int p, int r) {
int x = vec[r];
int i = (p-1);
int j = p;
while(1) {
if (vec[j] <= x){
i = (i+1);
int temp = vec[j];
vec[j] = vec[i];
vec[i] = temp;
}
j=j+1;
if (j==r)
break;
}
int temp = vec[i+1];
vec[i+1] = vec[r];
vec[r] = temp;
return i+1;
}
void quicksort ( vector<int> &vec, int p, int r) {
if (p<r){
int q = partition(vec, p, r);
quicksort(vec, p, q-1);
quicksort(vec, q+1, r);
}
}
void random_generator(int num, int * array) {
srand((unsigned)time(0));
int random_integer;
for(int index=0; index< num; index++){
random_integer = (rand()%10000)+1;
*(array+index) = random_integer;
}
}
int main() {
int array_size = 1000000;
// heap storage: a million-int local array is a non-standard VLA and can overflow the stack
vector<int> input_array(array_size);
random_generator(array_size, input_array.data());
vector<int> vec(input_array.begin(), input_array.end());
clock_t t1, t2;
t1 = clock();
quicksort(vec, 0, (array_size - 1)); // call quick sort
int length = vec.size();
t2 = clock();
float diff = ((float)t2 - (float)t1);
cout << diff << endl;
cout << diff/CLOCKS_PER_SEC <<endl;
}
Most likely it's not performing well because quicksort doesn't handle lots of duplicates very well and may still result in swapping them (order of key-equal elements isn't guaranteed to be preserved). You'll notice that the number of duplicates per number is 100 for 10000 or 500 for 2000, while the time factor is also approximately a factor of 5.
Have you averaged the runtimes over at least 5-10 runs for each range, to give it a fair shot of getting good pivots?
As a comparison have you checked to see how std::sort and std::stable_sort also perform on the same data sets?
Finally for this distribution of data (unless this is a quicksort exercise) I think counting sort would be much better - 40K memory to store the counts and it runs in O(n).
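For reference, a counting-sort sketch for this kind of value range (the 1-10,000 bound is taken from the question; the function name is mine):
#include <vector>

// Counting sort for values known to lie in [1, maxValue]; O(n + maxValue) time,
// and for maxValue = 10,000 the count table is only about 40 KB.
void counting_sort(std::vector<int>& v, int maxValue) {
    std::vector<int> count(maxValue + 1, 0);
    for (int x : v) ++count[x];                 // tally each value
    std::size_t out = 0;
    for (int value = 1; value <= maxValue; ++value)
        for (int c = 0; c < count[value]; ++c)
            v[out++] = value;                   // write the values back in order
}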
It probably has to do with how well sorted the input is. Quicksort is O(n logn) if the input is reasonably random. If it's in reverse order, performance can degrade to O(n^2). You're probably getting closer to the O(n^2) behavior with the smaller data range.
Late answer - the effect of duplicates depends on the partition scheme. The example code in the question is a variation of Lomuto partition scheme, which takes more time as the number of duplicates increases, due to the partitioning getting worse. In the case of all equal elements, Lomuto only reduces the size by 1 element with each level of recursion.
If instead Hoare partition scheme was used (with middle value as pivot), it generally takes less time as the number of duplicates increases. Hoare will needlessly swap values equal to the pivot, due to duplicates, but the partitioning will approach the ideal case of splitting an array in nearly equally sized parts. The swap overhead is somewhat masked by memory cache. Link to Wiki example of Hoare partition scheme:
https://en.wikipedia.org/wiki/Quicksort#Hoare_partition_scheme
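For illustration, a sketch of a Hoare-style partition with a middle pivot, written in the style of the question's code (not the questioner's original):
#include <utility>
#include <vector>

// Hoare partition with the middle element as pivot: scans from both ends and
// swaps out-of-place pairs; equal keys get spread over both halves, which keeps
// the split close to even even with many duplicates.
int hoare_partition(std::vector<int>& vec, int p, int r) {
    int pivot = vec[p + (r - p) / 2];
    int i = p - 1, j = r + 1;
    while (true) {
        do { ++i; } while (vec[i] < pivot);
        do { --j; } while (vec[j] > pivot);
        if (i >= j) return j;
        std::swap(vec[i], vec[j]);
    }
}

void quicksort_hoare(std::vector<int>& vec, int p, int r) {
    if (p < r) {
        int q = hoare_partition(vec, p, r);
        quicksort_hoare(vec, p, q);        // note: q stays on the left side
        quicksort_hoare(vec, q + 1, r);
    }
}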