Bucket sort or merge sort? - c++

I am doing a C++ assignment where I have to sort data (n=400) which is student scores from 0-100. I am torn between bucket sort, which sorts items into buckets, and merge sort, which divides and conquers. Which one should I use and why?

The answer depends on your data. However, merge sort will run in O(n log n) while bucket sort will run in O(n + b) where b is the number of buckets you have. If scores run from zero to 100 inclusive, then b is 101. So the question is whether O(n log n) runs faster than O(n + 101), which is an easy question to answer theoretically, since O(n + 101) = O(n) and clearly O(n) is faster than O(n log n). Even if we did the (admittedly silly) exercise of plugging in n = 400, we would get 501 for bucket sort and, with log2(400) = 9 (rounded up), 3600 for merge sort. But that is silly, because big-O notation doesn't work that way. Theoretically, we would just conclude that O(n) is better than O(n log n).
But that is the theoretical answer. In practice, the overhead hidden behind the big-O counts, and then it might not be as simple.
That being said, the overhead in bucket sort is usually smaller than for merge sort. You need to allocate an array for some counts and an array to put the output in, and after that you need to run through the input twice, first for counting and then for sorting. A simple bucket sort could look like this:
#include <iostream>
#include <string>
#include <vector>

// Some fake data
struct student
{
    int score;
    std::string name;
};

student scores[] = {
    {45, "jack"},
    {12, "jill"},
    {99, "john"},
    {89, "james"}};

// Bucket (counting) sort on the score field; writes the sorted order to out.
void bucket_sort(int n, const student in[], student out[])
{
    int buckets[101] = {0}; // range 0-100 with 100 included

    // Count how many students fall into each bucket.
    for (int i = 0; i < n; i++)
    {
        buckets[in[i].score]++;
    }

    // Turn the counts into starting offsets for each bucket.
    int acc = 0;
    for (int i = 0; i < 101; i++)
    {
        int b = buckets[i];
        buckets[i] = acc;
        acc += b;
    }

    // Bucket the scores: place each student at its bucket's next free slot.
    for (int i = 0; i < n; i++)
    {
        out[buckets[in[i].score]++] = in[i];
    }
}

void print_students(int n, const student students[])
{
    for (int i = 0; i < n; i++)
    {
        std::cout << students[i].score << ' ' << students[i].name << std::endl;
    }
    std::cout << std::endl;
}

int main()
{
    int no_students = sizeof scores / sizeof scores[0];
    print_students(no_students, scores);

    std::vector<student> sorted(no_students);
    bucket_sort(no_students, scores, sorted.data());
    print_students(no_students, sorted.data());

    return 0;
}
(excuse my C++, it's been more than 10 years since I used the language, so the code might look a bit more C like than it should).
The best way to work out what is faster in practice is, of course, to measure it. Compare std::sort with something like the above, and you should get your answer.
If it weren't for an assignment, though, I wouldn't recommend experimenting. The built-in std::sort can easily handle 400 elements faster than you need, and there is no need to implement new sorting algorithms for something like that. For an exercise, though, it can be fun to do some measuring and experimenting.
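For instance, a rough timing harness along these lines would give a ballpark comparison (a sketch only: it assumes the student struct and bucket_sort from the listing above, replaces the main above, and the random test data and microsecond timing are just illustrative):
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

// Assumes the `student` struct and `bucket_sort` defined above.
int main()
{
    const int n = 400;
    std::vector<student> data(n), out(n);
    for (int i = 0; i < n; i++)
        data[i] = {std::rand() % 101, "student"}; // illustrative random scores

    auto a = data; // copy so both sorts see the same input
    auto t0 = std::chrono::steady_clock::now();
    std::sort(a.begin(), a.end(),
              [](const student &x, const student &y) { return x.score < y.score; });
    auto t1 = std::chrono::steady_clock::now();

    bucket_sort(n, data.data(), out.data());
    auto t2 = std::chrono::steady_clock::now();

    // For a serious measurement, run each sort many times and average.
    using us = std::chrono::microseconds;
    std::cout << "std::sort:   " << std::chrono::duration_cast<us>(t1 - t0).count() << " us\n"
              << "bucket_sort: " << std::chrono::duration_cast<us>(t2 - t1).count() << " us\n";
    return 0;
}
With only 400 elements a single run sits close to the timer's resolution, so in a real experiment you would repeat each sort many times and average the results.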

Update
Read Thomas Mailund's answer first. He provided a more relevant answer to this specific question. Since the scores are likely to be integers, histogram sort (a variant of bucket sort) should be faster than merge sort!
Bucket sort performs poorly when the data set is not well distributed, since most of the items will fall into a few popular buckets. In your case, it's reasonable to assume that most of the student scores will be clustered around the median with only a few outliers. Therefore I would argue that merge sort performs better in this context, since it is not affected by the distribution of the data set.
Additional Consideration
There could be an argument that bucket sort is better if we can adjust the bucket ranges according to the expected distribution of the data set. Sure, if we hit the jackpot and predict the distribution really well, it can significantly speed up the sorting process. However, the downside is that the sorting performance can plummet when our prediction goes wrong, i.e. we get an unexpected data set. For example, the test being too easy or too difficult might lead to such an "unexpected data set" in the context of this question. In other words, bucket sort has the better best-case time complexity, whereas merge sort has the better worst-case time complexity. Which metric to use for comparing algorithms depends on the needs of each application. In practice, the worst-case time complexity is usually found to be more useful, and I think the same can be said for this specific question. It's also a plus that we don't pay the additional cost of calculating/adjusting the bucket ranges if we go for merge sort.

The question is not precise enough: I have to sort data (n=400) which is student scores from 0-100.
If the grades are integers, bucket sort with one bucket per grade, also called histogram sort or counting sort, will do the job in linear time, as illustrated in Thomas Mailund's answer.
If the grades are decimal, bucket sort will just add complexity, and given the sample size, merge sort will do just fine in O(n log n) time with a classic implementation.
If the goal of the exercise is for you to implement a sorting algorithm, the above applies; otherwise you should just use std::sort in C++ or qsort in C with an appropriate comparison function.
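For completeness, here is a minimal sketch of that last option using std::sort with a comparison function (the Record type and the data are purely illustrative):
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Record { int score; std::string name; };   // illustrative type

int main()
{
    std::vector<Record> v = {{45, "jack"}, {12, "jill"}, {99, "john"}};
    // std::sort with a comparison function (here a lambda comparing scores).
    std::sort(v.begin(), v.end(),
              [](const Record &a, const Record &b) { return a.score < b.score; });
    for (const Record &r : v)
        std::cout << r.score << ' ' << r.name << '\n';
    return 0;
}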

Related

Improve searching through unsorted list

My code spends 40% of its time searching through unsorted vectors. More specifically, the searching function my_search repeatedly receives a single unsorted vector of length N, where N can take any values between 10 and 100,000. The weights associated with each element have relatively little variance (e.g. [ 0.8, 0.81, 0.85, 0.78, 0.8, 0.7, 0.84, 0.82, ...]).
The algorithm my_search starts by summing the weights of all the objects and then samples an average of N elements (as many as the length of the vector) with replacement. The algorithm is quite similar to
int sum_of_weight = 0;
for(int i=0; i<num_choices; i++) {
    sum_of_weight += choice_weight[i];
}
int rnd = random(sum_of_weight);
for(int i=0; i<num_choices; i++) {
    if(rnd < choice_weight[i])
        return i;
    rnd -= choice_weight[i];
}
from this post.
I could sort the vector before searching, but that takes time on the order of O(N log N) (depending on the sort algorithm used), and I doubt (but might be wrong, as I haven't tried) that I would gain much time, especially as the weights have little variance.
Another solution would be to store information about how much weight there is before a series of points. For example, while summing the vector, every N/10 elements I could record how much weight has been summed so far. Then I could first compare rnd to these 10 breakpoints and search in only a tenth of the total length of the vector.
Would this be a good solution?
Is there a name for the process I described?
How can I estimate what is the right number of breakpoints to store as a function of N?
Is there a better solution?
log(N) Solution
// (Signature assumed from the question's description of my_search; random() is
// the same helper used in the question's snippet.)
int my_search(const std::vector<double> &choice_weight, int num_choices)
{
    std::vector<double> sums;              // prefix sums of the weights
    double sum_of_weight = 0;
    for(int i=0; i<num_choices; i++) {
        sum_of_weight += choice_weight[i];
        sums.push_back(sum_of_weight);
    }
    // Binary search for the first prefix sum above the random draw.
    std::vector<double>::iterator high =
        std::upper_bound(sums.begin(), sums.end(), random(sum_of_weight));
    return std::distance(sums.begin(), high);
}
Essentially the same idea you have for a better way to solve it, but rather than store only a 10th of the elements, store all of them and use binary search to find the index of the one closest to your value.
Analysis
Even though this solution is O(logN), you really have to ask yourself if it's worth it. Is it worth it to have to create an extra vector, thus accumulating extra clock cycles to store things in the vector, the time it takes for vectors to resize, the time it takes to call a function to perform binary search, etc?
As I was writing the above, I realised you can use a deque instead and that will almost get rid of the performance hit from having to resize and copy contents of vectors without affecting the O(1) lookup of vectors.
So I guess the question remains, is it worth it to copy over the elements into another container and then only do an O(logN) search?
Conclusion
TBH, I don't think you've gained much from this optimization. In fact I think you gained an overhead of O(logN).

How do I find the frequency of a given number into a range into an array?

The problem is:
You are given an array of size N. You are also given q, the number of queries; in each query you are given l (lower bound), u (upper bound) and num, the number whose frequency you have to count in the range l to u.
I've implemented my code in C++ as follows:
#include <iostream>
#include <map>
using namespace std;

map<int, int> m;

void mapnumbers(int arr[], int l, int u)
{
    for(int i=l; i<u; i++)
    {
        int num=arr[i];
        m[num]++;
    }
}

int main()
{
    int n; //Size of array
    cin>>n;
    int arr[n];
    for(int i=0; i<n; i++)
        cin>>arr[i];
    int q; //Number of queries
    cin>>q;
    while(q--)
    {
        int l,u,num; //l=lower range, u=upper range, num=the number of which we will count frequency
        cin>>l>>u>>num;
        mapnumbers(arr,l,u);
        cout<<m[num]<<endl;
    }
    return 0;
}
But my code has a problem: it doesn't empty the map m between queries. That's why, if I query for the same number twice or thrice, it adds the frequency count to the previously stored one.
How do I solve this?
Will it perform poorly for a large number of queries, such as 10^5?
What is an efficient solution for this problem?
You can solve the task using SQRT-decomposition of queries. The complexity will be
O(m*sqrt(n)), where m is the number of queries and n the size of the array. First of all, sort the queries so that L/sqrt(n) is increasing, where L is the left bound of the query; for equal L/sqrt(n), R (the right bound) should be increasing too. Then calculate the answer for the first query, and after that just move the bounds of the current query to the bounds of the next query one element at a time. For example, if your first query after sorting is [2,7] and the second is [1,10], move the left bound to 1 and increase the frequency of a[1], then move the right bound from 7 to 10 and increase the frequencies of a[8], a[9] and a[10]. Increase and decrease frequencies using your map. This is a fairly involved technique, but it allows you to solve your task with good complexity. You can read more about SQRT-decomposition of queries here: LINK
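For illustration, a compact sketch of that technique applied to this problem might look as follows (this is not the original poster's code: it assumes 0-based, inclusive ranges [l, u] and non-negative array values, and it uses a vector of counts in place of the map):
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

struct Query { int l, u, num, id; };

int main()
{
    int n;
    std::cin >> n;
    std::vector<int> a(n);
    for (int &x : a) std::cin >> x;

    int q;
    std::cin >> q;
    std::vector<Query> qs(q);
    for (int i = 0; i < q; i++) {
        std::cin >> qs[i].l >> qs[i].u >> qs[i].num;
        qs[i].id = i;
    }

    // Sort queries by block of the left bound, then by the right bound.
    int block = std::max(1, (int)std::sqrt((double)n));
    std::sort(qs.begin(), qs.end(), [block](const Query &x, const Query &y) {
        if (x.l / block != y.l / block) return x.l / block < y.l / block;
        return x.u < y.u;
    });

    std::vector<int> freq(*std::max_element(a.begin(), a.end()) + 1, 0);
    std::vector<int> answer(q);
    int curL = 0, curR = -1;                       // current window [curL, curR] starts empty
    for (const Query &query : qs) {
        while (curR < query.u) freq[a[++curR]]++;  // grow to the right
        while (curL > query.l) freq[a[--curL]]++;  // grow to the left
        while (curR > query.u) freq[a[curR--]]--;  // shrink from the right
        while (curL < query.l) freq[a[curL++]]--;  // shrink from the left
        answer[query.id] = query.num < (int)freq.size() ? freq[query.num] : 0;
    }
    for (int i = 0; i < q; i++) std::cout << answer[i] << '\n';
    return 0;
}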
To clear the map, you need to call map::clear():
void mapnumbers(int arr[], int l, int u)
{
    m.clear();
    // ... rest of the function as before
A better approach to the clearing problem is to make m a local variable of the while (q--) loop, or even of the mapnumbers function.
However, it is not clear why you need a map at all. You traverse the whole range anyway, and you know the number you need to count, so why not do
int mapnumbers(int arr[], int l, int u, int num)
{
    int result = 0;
    for(int i=l; i<u; i++)
    {
        if (arr[i] == num)
            result++;
    }
    return result;
}
This will be faster, even asymptotically faster, as map operations are O(log N), so your original solution ran for O(N log N) per query, while this simple iteration runs for O(N).
However, for a really big array and many queries (I guess the problem comes from some competitive programming site, doesn't it?), this still will not be enough. I guess there should be some data structure and algorithm that allows for O(log N) queries, though I cannot think of any right now.
UPD: I have just realized that the array does not change in your problem. This makes it much simpler, allowing for a simple O(log N) per query solution. You just need to sort all the numbers in the input array, remembering their original positions too (and making sure the sort is stable, so that the original positions are in increasing order); you can do this only once. After this, every query can be solved with just two binary searches.
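As an illustration of that approach (my sketch, not the original poster's code): sorting (value, original position) pairs lexicographically already keeps equal values in increasing position order, and each query then becomes two lower_bound calls over the half-open range [l, u) used in the question's loop:
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

int main()
{
    int n;
    std::cin >> n;
    // (value, original position) pairs; lexicographic sorting keeps the
    // positions of equal values in increasing order, so no stable sort is needed.
    std::vector<std::pair<int, int>> byValue(n);
    for (int i = 0; i < n; i++) {
        std::cin >> byValue[i].first;
        byValue[i].second = i;
    }
    std::sort(byValue.begin(), byValue.end());

    int q;
    std::cin >> q;
    while (q--) {
        int l, u, num;
        std::cin >> l >> u >> num;
        // First entry with value num and position >= l ...
        auto lo = std::lower_bound(byValue.begin(), byValue.end(), std::make_pair(num, l));
        // ... and first entry with value num and position >= u.
        auto hi = std::lower_bound(byValue.begin(), byValue.end(), std::make_pair(num, u));
        std::cout << (hi - lo) << '\n';   // occurrences of num in [l, u)
    }
    return 0;
}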
Many algorithms are available for this kind of problem. This looks like a straightforward data structure problem. You can use a segment tree or square root decomposition; check GeeksforGeeks for the algorithms. The reason I am telling you to learn the algorithms is that this kind of problem has such large constraints that your verdict will be TLE if you use your method, so you are better off using these algorithms.
Many answers here are way more complicated than necessary. I am going to show you an easy way to find range frequency: you can use binary search to get the answer in O(log n) per query.
For that, use an array of vectors to store the index positions of every number present in the array, and then use lower_bound and upper_bound provided by the C++ STL.
Here is C++ Code:
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

#define MAX 1000010
vector<int> v[MAX];

int main(){
    int n, a;
    cin>>n;
    for (int i = 0; i < n; ++i)
    {
        cin>>a;
        v[a].push_back(i); // remember every position where value a occurs
    }
    int low = 0, high = 0;
    int q; //Number of queries
    cin>>q;
    while(q--)
    {
        int l,u,num; //l=lower range, u=upper range, num=the number of which we will count frequency
        cin>>l>>u>>num;
        low = lower_bound(v[num].begin(), v[num].end(), l) - v[num].begin();
        high = upper_bound(v[num].begin(), v[num].end(), u) - v[num].begin();
        cout<<(high - low)<<endl;
    }
    return 0;
}
Overall Time Complexity: O(Q*log n)

find all unique triplet in given array with sum zero with in minimum execution time [duplicate]

This question already has an answer here:
Finding three elements that sum to K
(1 answer)
Closed 7 years ago.
I've got all unique triplets with the code below, but I want to reduce its time complexity. It consists of three for loops. So my question is: is it possible to do this with fewer loops so that the time complexity decreases?
Thanks in advance. Let me know.
#include <cstdlib>
#include <iostream>
using namespace std;

void Triplet(int[], int, int);

void Triplet(int array[], int n, int sum)
{
    // Fix the first element and find other two
    for (int i = 0; i < n-2; i++)
    {
        // Fix the second element and find one
        for (int j = i+1; j < n-1; j++)
        {
            // Fix the third element
            for (int k = j+1; k < n; k++)
                if (array[i] + array[j] + array[k] == sum)
                    cout << "Result :\t" << array[i] << " + " << array[j] << " + " << array[k] << " = " << sum << endl;
        }
    }
}

int main()
{
    int A[] = {-10,-20,30,-5,25,15,-2,12};
    int sum = 0;
    int arr_size = sizeof(A)/sizeof(A[0]);
    cout<<"********************O(N^3) Time Complexity*****************************"<<endl;
    Triplet(A,arr_size,sum);
    return 0;
}
I'm not a whiz at algorithms, but one way to improve your program is to do a binary search in your third loop for the value that gives you your sum in conjunction with the two previous values. This, however, requires your data to be sorted beforehand (which obviously has some overhead depending on your sorting algorithm; std::sort has an average time complexity of O(n log n)).
You can always make use of parallel programming and run your program on multiple threads, but this can get very messy.
Aside from those suggestions, it is hard to think of a better way.
You can get a slightly better complexity of O(n^2 * log n) easily enough if you first sort the list and then do a binary search for the third value. The sort takes O(n log n), and the triplet search takes O(n^2) to enumerate all the possible pairs times O(log n) for the binary search of the third value, for a total of O(n log n + n^2 log n), or simply O(n^2 * log n).
There might be some other fancy things with binary search you can do to reduce that, but I can't easily see (at 4:00 am) anything better than that.
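A sketch of that idea, assuming distinct values as in the example array (the function name is mine):
#include <algorithm>
#include <iostream>
#include <vector>

// Sort + binary search: fix the first two elements, binary-search for the third
// in the tail of the sorted array. Assumes distinct input values.
void triplets(std::vector<int> a, int sum)
{
    std::sort(a.begin(), a.end());                      // O(n log n)
    int n = a.size();
    for (int i = 0; i < n - 2; i++)
        for (int j = i + 1; j < n - 1; j++)
        {
            int need = sum - a[i] - a[j];
            // Only search to the right of j so each triplet is reported once.
            if (std::binary_search(a.begin() + j + 1, a.end(), need))
                std::cout << a[i] << " + " << a[j] << " + " << need
                          << " = " << sum << '\n';
        }
}

int main()
{
    triplets({-10, -20, 30, -5, 25, 15, -2, 12}, 0);
    return 0;
}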
When a triplet sums to zero, the third number is completely determined by the first two. Thus, you're only free to choose two of the numbers in each triplet. With n possible numbers this yields at most n² triplets.
I suspect, but I'm not sure, that that's the best complexity you can do. It's not clear to me whether the number of sum-to-zero triplets, for a random sequence of signed integers, will necessarily be on the order of n². If it's less (not likely, but if) then it might be possible to do better.
Anyway, a simple way to do this with complexity on the order of n² is to first scan through the numbers, storing them in a data structure with constant-time lookup (the C++ standard library provides such). Then scan through the array as your posted code does, except vary only the first and second number of the triplet. For the third number, look it up in the constant-time lookup structure already established: if it's there then you have a potential new triplet, otherwise not.
For each zero-sum triple thus found, put it also in a constant time look-up structure.
This ensures the uniqueness criterion at no extra complexity.
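A sketch of that approach (my illustration: std::unordered_set gives the constant-time value lookup, and for simplicity a std::set of sorted triples handles the uniqueness check; distinct input values are assumed):
#include <algorithm>
#include <iostream>
#include <set>
#include <unordered_set>
#include <vector>

void triplets(const std::vector<int> &a, int sum)
{
    std::unordered_set<int> values(a.begin(), a.end());  // constant-time value lookup
    std::set<std::vector<int>> seen;                      // triples already reported
    int n = a.size();
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
        {
            int need = sum - a[i] - a[j];
            // need != a[i] / a[j] keeps us from reusing an element (values are distinct).
            if (need != a[i] && need != a[j] && values.count(need))
            {
                std::vector<int> t = {a[i], a[j], need};
                std::sort(t.begin(), t.end());            // normalize the order
                if (seen.insert(t).second)                 // report each triplet once
                    std::cout << t[0] << " + " << t[1] << " + " << t[2]
                              << " = " << sum << '\n';
            }
        }
}

int main()
{
    triplets({-10, -20, 30, -5, 25, 15, -2, 12}, 0);
    return 0;
}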
In the worst case, there are C(n, 3) triplets with sum zero in an array of size n. C(n, 3) is in Θ(n³), it takes Θ(n³) time just to print the triplets. In general, you cannot get better than cubic complexity.

Non-standard sorting algorithm for random unique integers

I have an array of at least 2000 random unique integers, each in range 0 < n < 65000.
I have to sort it and then get the index of a random value in the array. Each of these operations has to be as fast as possible. For searching, binary search seems to serve well.
For sorting I used the standard quick sorting algorithm (qsort), but I was told that with the given information the standard sorting algorithms will not be the most efficient. So the question is simple - what would be the most efficient way to sort the array, with the given information? Totally puzzled by this.
I don't know why the person who told you that would be so perversely cryptic, but indeed qsort is not the most efficient way to sort integers (or generally anything) in C++. Use std::sort instead.
It's possible that you can improve on your implementation's std::sort for the stated special case (2000 distinct random integers in the range 0-65k), but you're unlikely to do a lot better and it almost certainly won't be worth the effort. The things I can think of that might help:
use a quicksort, but with a different pivot selection or a different threshold for switching to insertion sort from what your implementation of sort uses. This is basically tinkering.
use a parallel sort of some kind. 2000 elements is so small that I suspect the time to create additional threads will immediately kill any hope of a performance improvement. But if you're doing a lot of sorts then you can average the cost of creating the threads across all of them, and only worry about the overhead of thread synchronization rather than thread creation.
That said, if you generate and sort the array, then look up just one value in it, and then generate a new array, you would be wasting effort by sorting the whole array each time. You can just run across the array counting the number of values smaller than your target value: this count is the index it would have. Use std::count_if or a short loop.
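For example (a sketch; the function name is illustrative):
#include <algorithm>
#include <vector>

// The index the target would have in the sorted array, found without sorting:
// just count how many elements are smaller (values are unique, so no ties).
int index_if_sorted(const std::vector<int> &v, int target)
{
    return static_cast<int>(std::count_if(v.begin(), v.end(),
                            [target](int x) { return x < target; }));
}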
Each of these operations has to be as fast as possible.
That is not a legitimate software engineering criterion. Almost anything can be made a minuscule bit faster with enough months or years of engineering effort -- nothing complex is ever "as fast as possible", and even if it was you wouldn't be able to prove that it cannot be faster, and even if you could there would be new hardware out there somewhere or soon to be invented for which the fastest solution is different and better. Unless you intend to spend your whole life on this task and ultimately fail, get a more realistic goal ;-)
For sorting uniformly distributed random integers Radix Sort is typically the fastest algorithm, it can be faster than quicksort by a factor of 2 or more. However, it may be hard to find an optimized implementation of that, quick sort is much more ubiquitous. Both Radix Sort and Quick Sort may have very bad worst case performance, like O(N^2), so if worst case performance is important you have to look elsewhere, maybe you pick introsort, which is similar to std::sort in C++.
For array lookup, a hash table is by far the fastest method. If you don't want yet another data structure, you can always pick binary search. If you have uniformly distributed numbers, interpolation search is probably the most effective method (best average performance).
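For what it's worth, a least-significant-digit radix sort for this value range (0-65000 fits in 16 bits) can be sketched in a few lines; this is my illustration rather than a tuned implementation:
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort for 16-bit values: two stable counting passes over 8-bit digits.
void radix_sort16(std::vector<uint16_t> &a)
{
    std::vector<uint16_t> tmp(a.size());
    for (int shift = 0; shift < 16; shift += 8)
    {
        std::size_t count[257] = {0};
        for (uint16_t x : a)                      // histogram of the current byte
            count[((x >> shift) & 0xFF) + 1]++;
        for (int d = 0; d < 256; d++)             // prefix sums -> bucket start offsets
            count[d + 1] += count[d];
        for (uint16_t x : a)                      // stable scatter into tmp
            tmp[count[(x >> shift) & 0xFF]++] = x;
        a.swap(tmp);
    }
}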
Quicksort's complexity is O(n*log(n)), where n = 2000 in your case. log(2000) = 10.965784.
You can sort in O(n) using one of these algorithms:
Counting sort
Radix sort
Bucket sort
I've compared std::sort() to counting sort for N = 100000000:
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <time.h>
#include <string.h>
using namespace std;

void countSort(int t[], int o[], int c[], int n, int k)
{
    // Count the number of each number in t[] and place that value into c[].
    for (int i = 0; i < n; i++)
        c[t[i]]++;
    // Place the number of elements less than or equal to each value i into c[].
    for (int i = 1; i <= k; i++)
        c[i] += c[i - 1];
    // Place each element of t[] into its correct sorted position in the output o[].
    for (int i = n - 1; i >= 0; i--)
    {
        o[c[t[i]] - 1] = t[i];
        --c[t[i]];
    }
}

void init(int t[], int n, int max)
{
    for (int i = 0; i < n; i++)
        t[i] = rand() % max;
}

double getSeconds(clock_t start)
{
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}

void print(int t[], int n)
{
    for (int i = 0; i < n; i++)
        cout << t[i] << " ";
    cout << endl;
}

int main()
{
    const int N = 100000000;
    const int MAX = 65000;
    int *t = new int[N];
    init(t, N, MAX);
    //print(t, N);

    clock_t start = clock();
    sort(t, t + N);
    cout << "std::sort " << getSeconds(start) << endl;
    //print(t, N);

    init(t, N, MAX);
    //print(t, N);

    // o[] holds the sorted output.
    int *o = new int[N];
    // c[] holds counters.
    int *c = new int[MAX + 1];
    // Set counters to zero.
    memset(c, 0, (MAX + 1) * sizeof(*c));

    start = clock();
    countSort(t, o, c, N, MAX);
    cout << "countSort " << getSeconds(start) << endl;
    //print(o, N);

    delete[] t;
    delete[] o;
    delete[] c;
    return 0;
}
Results (in seconds):
std::sort 28.6
countSort 10.97
For N = 2000 both algorithms report a time of 0 (below the timer's resolution).
Standard sorting algorithms, like standard nearly anything, are very good general-purpose solutions. If you know nothing about your data, if it truly consists of "random unique integers", then you might as well go with one of the standard implementations.
On the other hand, most programming problems appear in a context that tells you something about the data, and the additional information usually leads to more efficient problem-specific solutions.
For example, does your data arrive all at once or in chunks? If it comes piecemeal you may speed things up by interleaving incremental sorting, such as dual-pivot quicksort, with data acquisition.
Since the domain of your numbers is so small, you can create an array of 65000 entries, set the entry at the index of each number you see to one, and then collect all indices that are set to one as your sorted array. This takes exactly 67000 iterations in total (assuming initialization of the array is free).
Since the list contains 2000 entries, O(n*log(n)) will probably be faster. I can think of no other O(n) algorithm for this, so I suppose you are better off with a general-purpose algorithm.
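A sketch of the flag-array idea above (my illustration; it relies on the values being unique and all below 65000):
#include <bitset>
#include <vector>

// Works only because the values are unique and all below 65000.
std::vector<int> flag_sort(const std::vector<int> &input)
{
    std::bitset<65000> seen;             // one flag per possible value
    for (int x : input)
        seen[x] = true;

    std::vector<int> sorted;
    sorted.reserve(input.size());
    for (int v = 0; v < 65000; v++)      // one sweep over the whole domain
        if (seen[v])
            sorted.push_back(v);
    return sorted;
}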

Big O calculation

int maxValue = m[0][0];
for (int i = 0; i < N; i++)
{
    for (int j = 0; j < N; j++)
    {
        if ( m[i][j] > maxValue )
        {
            maxValue = m[i][j];
        }
    }
}
cout << maxValue << endl;

int sum = 0;
for (int i = 0; i < N; i++)
{
    for (int j = 0; j < N; j++)
    {
        sum = sum + m[i][j];
    }
}
cout << sum << endl;
For the above code I got O(N^2) as the execution time growth.
The way I got it was by taking:
MAX [O(1), O(N^2), O(1), O(1), O(N^2), O(1)]
where both O(N^2) terms are for the nested for loops. Is this calculation correct?
If I change the code to:
int maxValue = m[0][0];
int sum = 0;
for (int i = 0; i < N; i++)
{
    for (int j = 0; j < N; j++)
    {
        if ( m[i][j] > maxValue )
        {
            maxValue = m[i][j];
        }
        sum += m[i][j];
    }
}
cout << maxValue << endl;
cout << sum << endl;
Would Big O still be O(N^2)?
So does that mean Big O is just an indication of how the running time grows with the input data size, and not of how the algorithm is written?
This feels a bit like a homework question to me, but...
Big-Oh is about the algorithm, and specifically how the number of steps performed (or the amount of memory used) by the algorithm grows as the size of the input data grows.
In your case, you are taking N to be the size of the input, and it's confusing because you have a two-dimensional array, NxN. So really, since your algorithm only makes one or two passes over this data, you could call it O(n), where in this case n is the size of your two-dimensional input.
But to answer the heart of your question, your first code makes two passes over the data, and your second code does the same work in a single pass. However, the idea of Big-Oh is that it should give you the order of growth, which means independent of exactly how fast a particular computer runs. So, it might be that my computer is twice as fast as yours, so I can run your first code in about the same time as you run the second code. So we want to ignore those kinds of differences and say that both algorithms make a fixed number of passes over the data, so for the purposes of "order of growth", one pass, two passes, three passes, it doesn't matter. It's all about the same as one pass.
It's probably easier to think about this without the NxN input. Just think about a single list of N numbers, and say you want to do something to it, like find the max value, or sort the list. If you have 100 items in your list, you can find the max in 100 steps, and if you have 1000 items, you can do it in 1000 steps. So the order of growth is linear with the size of the input: O(n).
On the other hand, if you want to sort it, you might write an algorithm that makes roughly a full pass over the data each time it finds the next item to be inserted, and it has to do that roughly once for each element in the list. That's n passes over your list of length n, so that's O(n^2). If you have 100 items in your list, that's roughly 10^4 steps, and if you have 1000 items, that's roughly 10^6 steps.
The idea is that those numbers grow really fast in comparison to the size of your input, so even if I have a much faster computer (e.g., a model 10 years better than yours), I might be able to beat you on the max problem even with a list 2 or 10 or even 100 or 1000 times as long. But for the sorting problem with an O(n^2) algorithm, I won't be able to beat you when I take on a list that's 100 or 1000 times as long, even with a computer 10 or 20 years better than yours. That's the idea of Big-Oh: to factor out those "relatively unimportant" speed differences and see what amount of work, in a more general/theoretical sense, a given algorithm does on a given input size.
Of course, in real life, it may make a huge difference to you that one computer is 100 times faster than another. If you are trying to solve a particular problem with a fixed maximum input size, and your code is running at 1/10 the speed that your boss is demanding, and you get a new computer that runs 10 times faster, your problem is solved without needing to write a better algorithm. But the point is that if you ever wanted to handle larger (much larger) data sets, you couldn't just wait for a faster computer.
Big O notation is an upper bound on the amount of time taken to execute the algorithm as a function of the input size. So two algorithms can have slightly different maximum running times but the same big O notation.
What you need to understand is that a running time that is linear in the input size has big O notation O(n), and a quadratic running time always has big O notation O(n^2).
So if your running time is just n, that is one linear pass, the big O notation stays O(n); and if your running time is 6n+c, that is 6 linear passes plus a constant time c, it is still O(n).
Now, in the above case the second code is more optimized, since it makes fewer passes over the memory locations, and hence it will execute faster. But both versions of the code still have the asymptotic running time O(n^2).
Yes, it's O(N^2) in both cases. Of course O() time complexity depends on how you have written your algorithm, but both the versions above are O(N^2). However, note that actually N^2 is the size of your input data (it's an N x N matrix), so this would be better characterized as a linear time algorithm O(n) where n is the size of the input, i.e. n = N x N.