Optimize counting sort? - c++

Given that the input will be N numbers from 0 to N (with duplicates), how can I optimize the code below for both small and big arrays?
void countingsort(int* input, int array_size)
{
    int max_element = array_size; // because no number will be > N
    int* CountArr = new int[max_element + 1]();
    for (int i = 0; i < array_size; i++)
        CountArr[input[i]]++;
    for (int j = 0, outputindex = 0; j <= max_element; j++)
        while (CountArr[j]--)
            input[outputindex++] = j;
    delete[] CountArr;
}
Having a stable sort is not a requirement.
edit: In case it's not clear, I am talking about optimizing the algorithm.

IMHO there's nothing wrong here. I highly recommend this approach when max_element is small and the numbers being sorted are non-sparse (i.e. consecutive, with no gaps) and greater than or equal to zero.
A small tweak: I'd replace new/delete and just declare a fixed-size array on the stack, e.g. 256 entries for max_element.
int CountArr[256] = { }; // Declare and initialize with zeroes
As you bend these rules (sparse data, negative numbers), you'll start to struggle with this approach. You will need to find a good hashing function to remap the numbers onto your compact array, and the more complex the hashing becomes, the more the benefit of this over well-established sorting algorithms diminishes. The simplest remapping, an offset by the known minimum, is sketched below.
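Here is a minimal sketch of that simplest remapping, assuming the value range [lo, hi] is known and reasonably dense; the function name and signature are mine, not from the question:

#include <cstddef>
#include <vector>

void countingSortShifted(std::vector<int>& input, int lo, int hi)
{
    // One counter per possible value; x maps to index x - lo, so negative
    // values are fine as long as lo is the true minimum.
    std::vector<int> counts(static_cast<std::size_t>(hi - lo + 1), 0);
    for (int x : input)
        ++counts[static_cast<std::size_t>(x - lo)];
    std::size_t out = 0;
    for (int v = lo; v <= hi; ++v)
        for (int c = 0; c < counts[static_cast<std::size_t>(v - lo)]; ++c)
            input[out++] = v;
}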

In terms of complexity this cannot be beaten. It's O(N) and beats standard O(N log N) sorting by exploiting the extra knowledge that 0 <= x <= N. You cannot go below O(N), because you need at least one sweep through the input array.

Related

Efficient algorithm to produce closest triplet from 3 arrays?

I need to implement an algorithm in C++ that, when given three arrays of unequal sizes, produces triplets a,b,c (one element contributed by each array) such that max(a,b,c) - min(a,b,c) is minimized. The algorithm should produce a list of these triplets, in order of size of max(a,b,c)-min(a,b,c). The arrays are sorted.
I've implemented the following algorithm (note that I now use arrays of type double), however it runs excruciatingly slow (even when compiled using GCC with -O3 optimization, and other combinations of optimizations). The dataset (and, therefore, each array) has potentially tens of millions of elements. Is there a faster/more efficient method? A significant speed increase is necessary to accomplish the required task in a reasonable time frame.
void findClosest(vector<double> vec1, vector<double> vec2, vector<double> vec3){
    //calculate size of each array
    int len1 = vec1.size();
    int len2 = vec2.size();
    int len3 = vec3.size();
    int i = 0; int j = 0; int k = 0; int res_i, res_j, res_k;
    int diff = INT_MAX;
    int iter = 0; int iter_bound = min(min(len1,len2),len3);
    while(iter < iter_bound){
        while(i < len1 && j < len2 && k < len3){
            int minimum = min(min(vec1[i], vec2[j]), vec3[k]);
            int maximum = max(max(vec1[i], vec2[j]), vec3[k]);
            //if new difference less than previous difference, update difference,
            //store resultants
            if(fabs(maximum - minimum) < diff){ diff = maximum - minimum; res_i = i; res_j = j; res_k = k; }
            //increment minimum value
            if(vec1[i] == minimum) ++i;
            else if(vec2[j] == minimum) ++j;
            else ++k;
        }
        //"remove" triplet
        vec1.erase(vec1.begin() + res_i);
        vec2.erase(vec2.begin() + res_j);
        vec3.erase(vec3.begin() + res_k);
        --len1; --len2; --len3;
        ++iter_bound;
    }
}
OK, you're going to need to be clever in a few ways to make this run well.
The first thing that you need is a priority queue, which is usually implemented with a heap. With that, the algorithm in pseudocode is:
Make a priority queue of possible triples, ordered by max - min, then by how
close the median is to their average.
Make a pass through all 3 arrays, putting reasonable triples for every element
into the priority queue.
While the priority queue is not empty:
    Pull a triple out
    If all three elements of the triple are unused:
        Add the triple to the output
        Mark the triple used
    else:
        If you can construct reasonable triples from the unused elements:
            Add them to the queue
Now for this to work, you need to efficiently find elements that are currently unused. Doing that is easy at first: just keep an array of bools where you mark off the indexes of the used values. But once a lot of elements have been taken, that search gets long.
The trick is to keep a vector of bools for individual elements, a second for whether both elements of a pair have been used, a third for whether all four elements of a quadruple have been used, and so on. When you use an element, mark its individual bool, then go up the hierarchy, marking the next level whenever the element you're paired with is already marked, and stopping otherwise. This additional data structure has total size about 2n, requires marking only about 2 bools per element used on average, and lets you find the next unused index in either direction in at most O(log(n)) steps. A minimal sketch of the structure follows.
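Here is a minimal sketch of that hierarchy, under my own naming (UsedHierarchy, markUsed, nextUnusedRight are illustrative, not from the original post); only the rightward search is shown, the leftward one is symmetric:

#include <cstddef>
#include <vector>

class UsedHierarchy {
public:
    // levels[0] has one flag per element, levels[1] one per pair,
    // levels[2] one per group of four, and so on.
    explicit UsedHierarchy(std::size_t n) {
        std::size_t size = n;
        do {
            levels.emplace_back(size, false);
            size = (size + 1) / 2;
        } while (levels.back().size() > 1);
    }

    // Mark element i used; bubble the flag upward while whole blocks fill.
    void markUsed(std::size_t i) {
        levels[0][i] = true;
        for (std::size_t lvl = 0; lvl + 1 < levels.size(); ++lvl) {
            std::size_t sib = i ^ 1;
            bool sibUsed = sib >= levels[lvl].size() || levels[lvl][sib];
            if (!sibUsed) break;               // this block is not yet full
            i /= 2;
            levels[lvl + 1][i] = true;
        }
    }

    // Smallest unused index >= i, or levels[0].size() if there is none.
    std::size_t nextUnusedRight(std::size_t i) const {
        std::size_t lvl = 0, pos = i;
        // Climb while the current block is fully used, stepping rightward.
        while (pos < levels[lvl].size() && levels[lvl][pos]) {
            if (pos % 2 == 0) ++pos;           // try the sibling block
            else { pos = pos / 2 + 1; ++lvl; } // jump past the shared parent
        }
        if (pos >= levels[lvl].size()) return levels[0].size();
        // Descend to the leftmost unused element inside this block.
        while (lvl > 0) {
            --lvl;
            pos *= 2;
            if (levels[lvl][pos]) ++pos;       // left half full, take the right
        }
        return pos;
    }

private:
    std::vector<std::vector<bool>> levels;
};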
The resulting algorithm will be O(n log(n)).

Sort array of n elements which has k sorted sections

What is the best way to sort a section-wise sorted array, i.e. an array made up of k sorted sections laid out back to back?
The problem is performing a quicksort using the Message Passing Interface. The solution performs quicksort on array sections obtained via MPI_Scatter(), then joins the sorted pieces using MPI_Gather().
The problem is that the array as a whole is unsorted, but sections of it are.
Merging the sub-sections similarly to this solution seems like the best way of sorting the array, but considering that the sub-arrays already live within a single array, other sorting algorithms may prove better.
The inputs to a sort function would be the array, its length and the number of equally sized sorted sub-sections.
A signature would look something like int* sort(int* array, int length, int sections);
The sections parameter can have any value between 1 and 25. The length parameter is greater than 0, a multiple of sections, and smaller than 2^32.
This is what I am currently using:
#include <climits> // for INT_MAX

int* merge(int* input, int length, int sections)
{
    int* sub_sections_indices = new int[sections];
    int* result = new int[length];
    int section_size = length / sections;
    for (int i = 0; i < sections; i++) // initialisation
    {
        sub_sections_indices[i] = 0;
    }
    int min, min_index, current_index;
    for (int i = 0; i < length; i++) // merging
    {
        min_index = 0;
        min = INT_MAX;
        for (int j = 0; j < sections; j++)
        {
            if (sub_sections_indices[j] < section_size)
            {
                current_index = j * section_size + sub_sections_indices[j];
                if (input[current_index] < min)
                {
                    min = input[current_index];
                    min_index = j;
                }
            }
        }
        sub_sections_indices[min_index]++;
        result[i] = min;
    }
    return result;
}
Optimizing for performance
I think this answer, which maintains a min-heap of the smallest remaining item of each sub-array, is the best way to handle arbitrary input. However, for small values of k, say somewhere between 10 and 100, it might be faster to implement the more naive solutions given in the question you linked to: while the min-heap costs only O(log k) per element extracted, it may carry more constant overhead for small k than the simple linear scan of the naive solutions.
All these solutions create a copy of the input, and they maintain O(k) state.
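For reference, here is a minimal sketch of the min-heap merge for the signature used in the question (the code and the names heapMerge/Entry are mine, not the linked answer's; it uses C++17 structured bindings):

#include <functional>
#include <queue>
#include <tuple>
#include <vector>

int* heapMerge(const int* input, int length, int sections)
{
    int section_size = length / sections;
    // Each heap entry is (value, section, offset within section); the smallest
    // value is always on top because std::greater turns the heap into a min-heap.
    using Entry = std::tuple<int, int, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (int s = 0; s < sections; ++s)
        if (section_size > 0)
            heap.emplace(input[s * section_size], s, 0);

    int* result = new int[length];
    for (int i = 0; i < length; ++i) {
        auto [value, s, offset] = heap.top();
        heap.pop();
        result[i] = value;
        if (offset + 1 < section_size)   // push the next element of that section
            heap.emplace(input[s * section_size + offset + 1], s, offset + 1);
    }
    return result;   // caller owns the allocation, matching the question's API
}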
Optimizing for space
The only way to save space that I see is to sort in place. That is a problem for the algorithms mentioned above: an in-place algorithm has to swap elements, and any swap is likely to destroy the property that each sub-array is sorted, unless the larger of the swapped pair is re-sorted into the sub-array it is being swapped into, which results in an O(n²) algorithm. So if you really do need to conserve memory, I think a regular in-place sorting algorithm would have to be used, which defeats your purpose.

What is fastest way to find a prime number in range?

I have this code to find prime numbers:
void writePrimesToFile(int begin, int end, ofstream& file)
{
    bool isPrime = 0;
    for (int i = begin; i < end; i = i + 2)
    {
        isPrime = 1;
        for (int j = 2; j < i; j++)
            if (i % j == 0)
            {
                isPrime = 0;
                break;
            }
        if (isPrime)
            file << i << " \n";
    }
}
Is there a faster way to do it?
I tried googling a faster way, but it's all math and I don't understand how to turn it into code.
Is there a faster way to do it?
Yes. There are faster primality test algorithms.
What is fastest way to find a prime number in range?
No one knows. If someone did know, that person would be guarding a massively important secret. No one has been able to prove that any of the known techniques is the fastest possible way to test primality.
You might have asked: What is the fastest known way to find a prime number in range.
The answer to that would be: it depends. The complexity of some algorithms grows asymptotically more slowly than that of others, but that is irrelevant if the input numbers are small. There are probabilistic methods that are very fast for some numbers, but which have problematic cases where they are slower than deterministic methods.
Your input numbers are small, because they are of type int and therefore have quite limited range. With small numbers, a simple algorithm may be faster than a more complex one. To find out which algorithm is fastest for your use case, you must benchmark them.
I recommend starting with the Sieve of Eratosthenes, since it is asymptotically faster than your naïve approach but also easy to implement (pseudocode courtesy of Wikipedia):
Input: an integer n > 1

Let A be an array of Boolean values, indexed by integers 2 to n,
initially all set to true.

for i = 2, 3, 4, ..., not exceeding √n:
    if A[i] is true:
        for j = i², i²+i, i²+2i, i²+3i, ..., not exceeding n:
            A[j] := false

Output: all i such that A[i] is true.
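A straightforward C++ translation of that pseudocode, adapted to the question's writePrimesToFile() signature (the adaptation is mine; it sieves the whole range [0, end) and then writes only the part from begin onward):

#include <algorithm>
#include <fstream>
#include <vector>

void writePrimesToFile(int begin, int end, std::ofstream& file)
{
    if (end < 2) return;
    std::vector<bool> isPrime(end, true);   // isPrime[i] stays true until proven composite
    isPrime[0] = isPrime[1] = false;
    for (long long i = 2; i * i < end; ++i)
        if (isPrime[i])
            for (long long j = i * i; j < end; j += i)
                isPrime[j] = false;         // cross out every multiple of i
    for (int i = std::max(begin, 2); i < end; ++i)
        if (isPrime[i])
            file << i << " \n";             // keep the original output format
}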

How do I find the frequency of a given number in a range of an array?

The problem is:
You are given an array of size N. You are also given q, the number of queries; each query gives l (the lower bound of the range), u (the upper bound) and num, and asks for the frequency of num within l~u.
I've implemented my code in C++ as follows:
#include <iostream>
#include <map>
using namespace std;

map<int,int> m;

void mapnumbers(int arr[], int l, int u)
{
    for (int i = l; i < u; i++)
    {
        int num = arr[i];
        m[num]++;
    }
}

int main()
{
    int n; //Size of array
    cin >> n;
    int arr[n];
    for (int i = 0; i < n; i++)
        cin >> arr[i];
    int q; //Number of queries
    cin >> q;
    while (q--)
    {
        int l, u, num; //l=lower range, u=upper range, num=the number whose frequency we count
        cin >> l >> u >> num;
        mapnumbers(arr, l, u);
        cout << m[num] << endl;
    }
    return 0;
}
But my code has a problem: it never empties the map m between queries. That's why, if I query the same number two or three times, it adds the new frequency count to the previously stored one.
How do I solve this?
Will it perform poorly for a large number of queries, such as 10^5?
What is an efficient solution for this problem?
You can solve the task using SQRT-decomposition of queries (Mo's algorithm). The complexity will be O(m*sqrt(n)). First of all, sort all queries by the following criteria: L/sqrt(N) should be increasing, where L is the left bound of a query and N is the size of the array; for equal L/sqrt(N), R (the right bound) should be increasing too. Then calculate the answer for the first query, and after that just move the bounds of the current query to the bounds of the next query one element at a time. For example, if your first query after sorting is [2, 7] and the second is [1, 10], move the left bound from 2 to 1 and increase the frequency of a[1]; then move the right bound from 7 to 10, increasing the frequencies of a[8], a[9] and a[10]. Increase and decrease the frequencies using your map. This is a fairly involved technique, but it lets you solve your task with good complexity. You can read more about SQRT-decomposition of queries here: LINK. A rough sketch follows.
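Here is a rough sketch of that idea, with my own names (Query, answerQueries) and a frequency array in place of the map; it assumes the values fit in [0, 100000] and that each query carries its original position in the id field:

#include <algorithm>
#include <cmath>
#include <vector>

struct Query { int l, r, num, id; };            // inclusive range [l, r]

std::vector<int> answerQueries(const std::vector<int>& a,
                               std::vector<Query> queries)
{
    const int block = std::max(1, (int)std::sqrt((double)a.size()));
    std::sort(queries.begin(), queries.end(),
              [block](const Query& x, const Query& y) {
                  if (x.l / block != y.l / block) return x.l / block < y.l / block;
                  return x.r < y.r;
              });

    std::vector<int> freq(100001, 0);           // assumption: values <= 100000
    std::vector<int> answers(queries.size());
    int curL = 0, curR = -1;                    // current window is empty
    for (const Query& q : queries) {
        while (curR < q.r) ++freq[a[++curR]];   // grow the right edge
        while (curL > q.l) ++freq[a[--curL]];   // grow the left edge
        while (curR > q.r) --freq[a[curR--]];   // shrink the right edge
        while (curL < q.l) --freq[a[curL++]];   // shrink the left edge
        answers[q.id] = freq[q.num];            // answer in original query order
    }
    return answers;
}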
To clear the map, you need to call map::clear():
void mapnumbers(int arr[], int l, int u)
{
    m.clear();
    // ...
A better approach to the clearing problem is to make m a local variable of the while (q--) loop, or even of the mapnumbers function.
However, it is strange that you need a map at all. You traverse the whole range anyway, and you know the number you need to count, so why not do:
int mapnumbers(int arr[], int l, int u, int num)
{
    int result = 0;
    for (int i = l; i < u; i++)
    {
        if (arr[i] == num)
            result++;
    }
    return result;
}
This will be faster, even asymptotically faster, as map operations are O(log N): your original solution ran in O(N log N) per query, while this simple loop runs in O(N).
However, for a really big array and many queries (I guess the problem comes from some competitive programming site, doesn't it?), this still will not be enough. I suspect there is some data structure and algorithm that allows O(log N) per query, though I cannot think of one right now.
UPD: I have just realized that the array does not change in your problem. This makes things much simpler, allowing a simple O(log N) per query solution. You just need to sort all the numbers in the input array, remembering their original positions too (and making sure the sort is stable, so that the original positions appear in increasing order); you do this only once. After that, every query can be answered with just two binary searches.
Many algorithms are available for this kind of problem; it is a fairly standard data structure problem. You can use a segment tree or square root decomposition; check GeeksforGeeks for the algorithms. The reason I am telling you to learn an algorithm is that problems like this have such large constraints that your verdict will be TLE if you use your current method, so you are better off using one of these algorithms.
Many answers here are much too complicated. Here is an easy way to find a range frequency: you can use binary search to get the answer in O(log n) per query.
For that, use an array of vectors to store the indices at which each value occurs in the array, and then use lower_bound and upper_bound from the C++ STL.
Here is C++ Code:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

#define MAX 1000010
std::vector<int> v[MAX];

int main(){
    int n, a;
    cin >> n;
    for (int i = 0; i < n; ++i)
    {
        cin >> a;
        v[a].push_back(i);
    }
    int low = 0, high = 0;
    int q; //Number of queries
    cin >> q;
    while (q--)
    {
        int l, u, num; //l=lower range, u=upper range, num=the number whose frequency we count
        cin >> l >> u >> num;
        low = lower_bound(v[num].begin(), v[num].end(), l) - v[num].begin();
        high = upper_bound(v[num].begin(), v[num].end(), u) - v[num].begin();
        cout << (high - low) << endl;
    }
    return 0;
}
Overall Time Complexity: O(Q*log n)

Non-standard sorting algorithm for random unique integers

I have an array of at least 2000 random unique integers, each in range 0 < n < 65000.
I have to sort it and then get the index of a random value in the array. Each of these operations has to be as fast as possible. For searching, binary search seems to serve well.
For sorting I used the standard quicksort algorithm (qsort), but I was told that with the given information the standard sorting algorithms will not be the most efficient. So the question is simple: what would be the most efficient way to sort the array, given this information? I'm totally puzzled by this.
I don't know why the person who told you that would be so perversely cryptic, but indeed qsort is not the most efficient way to sort integers (or generally anything) in C++. Use std::sort instead.
It's possible that you can improve on your implementation's std::sort for the stated special case (2000 distinct random integers in the range 0-65k), but you're unlikely to do a lot better and it almost certainly won't be worth the effort. The things I can think of that might help:
use a quicksort, but with a different pivot selection or a different threshold for switching to insertion sort from what your implementation of sort uses. This is basically tinkering.
use a parallel sort of some kind. 2000 elements is so small that I suspect the time to create additional threads will immediately kill any hope of a performance improvement. But if you're doing a lot of sorts then you can average the cost of creating the threads across all of them, and only worry about the overhead of thread synchronization rather than thread creation.
That said, if you generate and sort the array, then look up just one value in it, and then generate a new array, you would be wasting effort by sorting the whole array each time. You can just run across the array counting the number of values smaller than your target value: this count is the index it would have. Use std::count_if or a short loop.
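The last suggestion really is a one-liner with std::count_if; here is a small illustrative helper (the name rankOf is mine):

#include <algorithm>
#include <cstddef>
#include <vector>

// The rank (would-be sorted index) of target is the number of elements
// smaller than it; no sort is needed if you only look up one value.
std::ptrdiff_t rankOf(const std::vector<int>& values, int target)
{
    return std::count_if(values.begin(), values.end(),
                         [target](int v) { return v < target; });
}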
Each of these operations have to be as fast as possible.
That is not a legitimate software engineering criterion. Almost anything can be made a minuscule bit faster with enough months or years of engineering effort -- nothing complex is ever "as fast as possible", and even if it was you wouldn't be able to prove that it cannot be faster, and even if you could there would be new hardware out there somewhere or soon to be invented for which the fastest solution is different and better. Unless you intend to spend your whole life on this task and ultimately fail, get a more realistic goal ;-)
For sorting uniformly distributed random integers, radix sort is typically the fastest algorithm; it can be faster than quicksort by a factor of 2 or more. However, it may be hard to find an optimized implementation of it, whereas quicksort is far more ubiquitous. Both radix sort and quicksort may have very bad worst-case performance, like O(N^2), so if worst-case performance is important you have to look elsewhere; maybe you pick introsort, which is what std::sort in C++ typically uses. A small radix sort sketch for this value range follows.
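Here is a small LSD radix sort sketch (my own code, not the answer author's) for values that fit in 16 bits, as in the question (0 < n < 65000): two passes over 8-bit digits, each pass a stable counting sort on one byte:

#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

void radixSort16(std::vector<std::uint16_t>& a)
{
    std::vector<std::uint16_t> buffer(a.size());
    for (int shift = 0; shift < 16; shift += 8) {
        std::array<std::size_t, 257> count{};           // counts per byte value
        for (std::uint16_t x : a)
            ++count[((x >> shift) & 0xFF) + 1];
        for (std::size_t i = 1; i < count.size(); ++i)
            count[i] += count[i - 1];                    // prefix sums -> start offsets
        for (std::uint16_t x : a)
            buffer[count[(x >> shift) & 0xFF]++] = x;    // stable scatter by digit
        a.swap(buffer);
    }
}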
For array lookup, a hash table is by far the fastest method. If you don't want yet another data structure, you can always pick binary search. If you have uniformly distributed numbers, interpolation search is probably the most effective method (best average performance).
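For reference, a minimal interpolation-search sketch (my own code, not from the answer): instead of probing the middle as binary search does, it probes where the key "should" sit if the values were spread uniformly between a[lo] and a[hi]. It returns the index of key, or -1 if the key is absent; the array must be sorted:

#include <vector>

int interpolationSearch(const std::vector<int>& a, int key)
{
    int lo = 0, hi = (int)a.size() - 1;
    while (lo <= hi && key >= a[lo] && key <= a[hi]) {
        if (a[hi] == a[lo])                  // all remaining values are equal
            return a[lo] == key ? lo : -1;
        // Estimate the position proportionally to where key sits in [a[lo], a[hi]].
        long long num = (long long)(key - a[lo]) * (hi - lo);
        int pos = lo + (int)(num / (a[hi] - a[lo]));
        if (a[pos] == key) return pos;
        if (a[pos] < key)  lo = pos + 1;
        else               hi = pos - 1;
    }
    return -1;
}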
Quicksort's complexity is O(n*log(n)), where n = 2000 in your case; log2(2000) ≈ 10.97.
You can sort in O(n) using one of these algorithms:
Counting sort
Radix sort
Bucket sort
I've compared std::sort() to counting sort for N = 100000000:
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>   // rand()
#include <time.h>
#include <string.h>
using namespace std;

void countSort(int t[], int o[], int c[], int n, int k)
{
    // Count the number of occurrences of each value in t[] and place it into c[].
    for (int i = 0; i < n; i++)
        c[t[i]]++;
    // Turn the counts into the number of elements less than or equal to each value i.
    for (int i = 1; i <= k; i++)
        c[i] += c[i - 1];
    // Place each element of t[] into its correct sorted position in the output o[].
    for (int i = n - 1; i >= 0; i--)
    {
        o[c[t[i]] - 1] = t[i];
        --c[t[i]];
    }
}

void init(int t[], int n, int max)
{
    for (int i = 0; i < n; i++)
        t[i] = rand() % max;
}

double getSeconds(clock_t start)
{
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}

void print(int t[], int n)
{
    for (int i = 0; i < n; i++)
        cout << t[i] << " ";
    cout << endl;
}

int main()
{
    const int N = 100000000;
    const int MAX = 65000;
    int *t = new int[N];
    init(t, N, MAX);
    //print(t, N);

    clock_t start = clock();
    sort(t, t + N);
    cout << "std::sort " << getSeconds(start) << endl;
    //print(t, N);

    init(t, N, MAX);
    //print(t, N);
    // o[] holds the sorted output.
    int *o = new int[N];
    // c[] holds the counters.
    int *c = new int[MAX + 1];
    // Set the counters to zero.
    memset(c, 0, (MAX + 1) * sizeof(*c));

    start = clock();
    countSort(t, o, c, N, MAX);
    cout << "countSort " << getSeconds(start) << endl;
    //print(o, N);

    delete[] t;
    delete[] o;
    delete[] c;
    return 0;
}
Results (in seconds):
std::sort 28.6
countSort 10.97
For N = 2000 both algorithms report 0 time (the run is below the timer's resolution).
Standard sorting algorithms, like standard nearly anything, are very good general-purpose solutions. If you know nothing about your data, if it truly consists of "random unique integers", then you might as well go with one of the standard implementations.
On the other hand, most programming problems appear in a context that tells something about data, and the additional info usually leads to more efficient problem-specific solutions.
For example, does your data appear all at once or in chunks? If it comes piecemeal you may speed things up by interleaving incremental sorting, such as dual-pivot quicksort, with data acquisition.
Since the domain of your numbers is so small, you can create an array of 65000 entries, set the entry at the index of each number you see to one, and then collect the indices of all entries that are set to one as your sorted array. That is roughly 67000 iterations in total (assuming initialization of the array is free).
Since the lists contain 2000 entries, O(n*log(n)) will probably be faster. I can think of no other O(n) algorithm for this, so I suppose you are better off with a general purpose algorithm.
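For completeness, a sketch of the flag-array idea described two answers up (my own code); it only works because the values are unique, so one bool per possible value suffices:

#include <vector>

std::vector<int> flagSort(const std::vector<int>& input, int maxValue = 65000)
{
    std::vector<bool> present(maxValue + 1, false);
    for (int v : input)
        present[v] = true;                 // mark each value we saw
    std::vector<int> sorted;
    sorted.reserve(input.size());
    for (int v = 0; v <= maxValue; ++v)
        if (present[v])
            sorted.push_back(v);           // collect marked values in order
    return sorted;
}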