Top K smallest selection algorithm - O(n + k log n) vs O(n log k) for k << n - C++

I'm asking this in regards to the Top K algorithm. I'd think that O(n + k log n) should be faster, because plugging in, for instance, k = 300 and n = 100000000, we can see that n + k log n is smaller.
However, when I benchmark it in C++, it shows me that O(n log k) is more than 2x faster. Here's the complete benchmarking program:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <ctime>
#include <cstdlib>
using namespace std;
int RandomNumber() { return rand(); }
// Build a min-heap over all n elements, then pop the smallest element k times.
vector<int> find_topk(int arr[], int k, int n)
{
    make_heap(arr, arr + n, greater<int>());
    vector<int> result(k);
    for (int i = 0; i < k; ++i)
    {
        result[i] = arr[0];
        pop_heap(arr, arr + n - i, greater<int>());
    }
    return result;
}
// Keep a max-heap of the k smallest elements seen so far; whenever a smaller
// element comes along, replace the heap's top with it.
vector<int> find_topk2(int arr[], int k, int n)
{
    make_heap(arr, arr + k, less<int>());
    for (int i = k; i < n; ++i)
    {
        if (arr[i] < arr[0])
        {
            pop_heap(arr, arr + k, less<int>());
            arr[k - 1] = arr[i];
            push_heap(arr, arr + k, less<int>());
        }
    }
    vector<int> result(arr, arr + k);
    return result;
}
int main()
{
    const int n = 220000000;
    const int k = 300;
    srand(time(0));
    int* arr = new int[n];
    generate(arr, arr + n, RandomNumber);
    // replace with find_topk or find_topk2
    vector<int> result = find_topk2(arr, k, n);
    copy(result.begin(), result.end(), ostream_iterator<int>(cout, "\n"));
    delete[] arr;
    return 0;
}
find_topk's approach is to build a heap of all n elements in O(n), and then remove the top element of the heap k times, each removal costing O(log n).
find_topk2's approach is to build a heap of size k in O(k) such that the max element is at the top, and then for each of the remaining elements from k to n, check whether it is smaller than the top element; if so, pop the top element and push the new one, which means at most n operations of O(log k) each.
Both approaches are written quite similarly, so I don't believe any implementation detail (like creating temporaries, etc.) causes a difference besides the algorithm and the dataset (which is random).
I profiled the benchmark and could see that find_topk calls the comparison operator many more times than find_topk2. But I'm more interested in the reasoning about the theoretical complexity, so two questions:
1. Disregarding the implementation or benchmark, was I wrong to expect that O(n + k log n) should be better than O(n log k)? If I'm wrong, please explain why and how to reason about it so that I can see that O(n log k) is actually better.
2. If I'm not wrong about question 1, then why does my benchmark show otherwise?

Big O in several variables is complex, since you need assumptions on how your variables scale with one another, so that you can unambiguously take the limit to infinity.
If, e.g., k ~ n^(1/2), then O(n log k) becomes O(n log n) and O(n + k log n) becomes O(n + n^(1/2) log n) = O(n), which is better.
If k ~ log n, then O(n log k) = O(n log log n) and O(n + k log n) = O(n), which is better. Note that log log 2^1024 = 10, so the constants hidden in the O(n) may be greater than log log n for any realistic n.
If k = constant, then O(n log k) = O(n) and O(n + k log n) = O(n), which is the same.
But the constants play a big role: for instance, building a heap may involve reading the array 3 times, whereas building a priority queue of length k as you go only requires one pass through the array, and a small constant times log k for the lookup (a sketch of this one-pass approach is shown below).
Which is "better" is therefore unclear, although my quick analysis tended to show that O(n + k log n) performs better under mild assumptions on k.
For instance, if k is a very small constant (say k = 3), then I'm ready to bet that the make_heap approach performs worse than the priority queue one on real world data.
Use asymptotic analysis wisely, and above all, profile your code before drawing conclusions.
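To illustrate the one-pass priority-queue idea, here is a minimal sketch (the name find_topk3 and its interface are assumptions for illustration, not from the question); it returns the k smallest elements of the array:
#include <queue>
#include <vector>
using namespace std;
vector<int> find_topk3(const int arr[], int k, int n)
{
    // Max-heap holding the k smallest elements seen so far; each of the n
    // elements is examined once, and only elements that beat the current
    // worst of the k cost an O(log k) heap update.
    priority_queue<int> q;
    for (int i = 0; i < n; ++i)
    {
        if ((int)q.size() < k)
            q.push(arr[i]);
        else if (arr[i] < q.top())
        {
            q.pop();
            q.push(arr[i]);
        }
    }
    vector<int> result;
    result.reserve(q.size());
    while (!q.empty())
    {
        result.push_back(q.top());   // emitted largest-first
        q.pop();
    }
    return result;
}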

You are comparing two worst-case upper bounds. For the first approach, the worst case is pretty much equal to the average case. For the second approach, if the input is random, then once more than a handful of items have passed through the heap, the chance that a new value is thrown away at once (because it is not going to replace any of the current top K) is pretty high, so the worst-case estimate is pessimistic.
If you are comparing wall clock time rather than comparison counts, you may find that heap-based algorithms with large heaps tend not to win many races because they have horrible storage locality, and constant factors on modern microprocessors are heavily influenced by which level of memory you end up working in. Finding that your data is out in real memory chips (or worse, on disk) and not in some level of cache will slow you down a lot, which is a shame because I really like heapsort.

Keep in mind that you can now use std::nth_element instead of having to use a heap and do things yourself. Since the default comparator is std::less<>, you can say something like this:
std::nth_element(myList.begin(), myList.begin() + k, myList.end());
Now, the first k positions of myList (indices 0 to k-1) will hold the smallest k elements, in unspecified order.
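For completeness, a minimal self-contained sketch of this call (the container contents and the value of k are made up for illustration):
#include <algorithm>
#include <iostream>
#include <vector>
int main()
{
    std::vector<int> myList = {9, 4, 7, 1, 8, 2, 6, 3, 5};
    const int k = 3;
    // Partition so that the k smallest elements occupy indices 0..k-1
    // (in unspecified order); average cost is O(n).
    std::nth_element(myList.begin(), myList.begin() + k, myList.end());
    for (int i = 0; i < k; ++i)
        std::cout << myList[i] << "\n";   // prints 1, 2 and 3 in some order
    return 0;
}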

Related

Why is my quick-sort slower than merge-sort?

This is my implementation. When I give it 100 arrays of size 1000000, it sorts for 300 seconds.
My other algorithm, merge sort, does it in 40 seconds.
I wonder if there are some things that could slow my algorithm down.
template <typename TYP> void quick_sort(TYP *tab, int poczatek, int koniec) {
    // poczatek = start index, koniec = end index (inclusive)
    int i = poczatek;
    int j = poczatek;
    int srodek = (poczatek + koniec) / 2;   // srodek = middle index
    TYP piwot = tab[srodek];                // piwot = pivot value
    swap(tab[srodek], tab[koniec]);         // move the pivot out of the way
    // Lomuto-style partition: everything smaller than the pivot goes to the front.
    for (i = poczatek; i < koniec; i++) {
        if (tab[i] < piwot) {
            swap(tab[i], tab[j]);
            j++;
        }
    }
    swap(tab[koniec], tab[j]);              // put the pivot in its final position
    if (poczatek < j - 1)
        quick_sort(tab, poczatek, j - 1);
    if (j + 1 < koniec)
        quick_sort(tab, j + 1, koniec);
}
Quick-sort has an average run-time of O(n log(n)) but a worst-case complexity of O(n^2) if the pivot is poorly chosen. Depending on your input array, the pivot you choose can be really bad. To prevent this, you can implement an introsort. Moreover, you can use a better method to choose the pivot: the median-of-three rule, for example (a sketch is shown below).
Moreover, quick-sort is slow for small arrays. You can significantly improve its performance by using insertion sort for arrays smaller than, say, 15 elements. The last recursive calls will be faster, resulting in an overall faster execution.
Finally, your quick-sort uses the Lomuto partition scheme, which is probably not the most efficient. You can try to use Hoare's partition scheme.
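A minimal sketch of the median-of-three rule, written against the quick_sort from the question and assuming the same swap/using-directives are available (the helper name median_of_three is made up here):
template <typename TYP>
int median_of_three(TYP *tab, int poczatek, int koniec) {
    int srodek = poczatek + (koniec - poczatek) / 2;
    // Order the three samples so that tab[poczatek] <= tab[srodek] <= tab[koniec];
    // the middle one is then the median of the three.
    if (tab[srodek] < tab[poczatek]) swap(tab[srodek], tab[poczatek]);
    if (tab[koniec] < tab[poczatek]) swap(tab[koniec], tab[poczatek]);
    if (tab[koniec] < tab[srodek])   swap(tab[koniec], tab[srodek]);
    return srodek;   // index of the median sample
}
// In quick_sort, replace `int srodek = (poczatek + koniec) / 2;` with
// `int srodek = median_of_three(tab, poczatek, koniec);` and keep the rest
// of the partitioning unchanged.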

What is the time complexity of this code? Is it O(log n) or O(log log n)?

int n = 8; // In the video n = 8
int p = 0;
for (int i = 1; i < n; i *= 2) { // In the video i = 1
    p++;
}
for (int j = 1; j < p; j *= 2) { // In the video j = 1
    //code;
}
This is code from the Abdul Bari YouTube channel (link of the video); they said the time complexity of this is O(log log n), but I think it is O(log n). What is the correct answer?
Fix the initial value. 0 multiplied by 2 will never end the loop.
The last loop is O(log log N) because p == log(N). However, the first loop is O(log N), hence in total it is also O(log N).
On the other hand, once you put some code in place of //code then the first loop can be negligible compared to the second and we have:
O(log N + X * log log N), where the log N term comes from the first loop and the X * log log N term from the second,
and when X is just big enough, one can consider the total to be O(log log N). However, strictly speaking that is wrong, because complexity is about asymptotic behavior: no matter how big X is, as N goes to infinity, log N will eventually be bigger than X * log log N.
PS: I assumed that //code does not depend on N, i.e. it has constant complexity. The above consideration changes if this is not the case.
PPS: In general, complexity is important when designing algorithms. When using an algorithm it is rather irrelevant; in that case you care about the actual runtime for your specific value of N. Complexity can be misleading and even lead to wrong expectations for a specific use case with a given N.
You are correct, the time complexity of the complete code is O(log(n)).
But Abdul Bari Sir is also correct, because:
In the video, Abdul Sir is trying to find the time complexity of the second for loop, not the time complexity of the whole code. Take a look at the video again and listen carefully to what he is saying at this point: https://youtu.be/9SgLBjXqwd4?t=568
Once again, what he has derived is the time complexity of the second loop and not the time complexity of the complete code. Please listen to what he says at 9 minutes and 28 seconds in the video.
If this clears up your confusion, please mark this as correct.
The time complexity of
int n;
int p = 0;
for (int i = 1; i < n; i *= 2) { // start at 1, not at 0
    p++;
}
is O(log(n)), because you do p++ about log2(n) times. The logarithm's base does not matter in big-O notation, because it only changes the count by a constant factor.
for (int j = 1; j < p; j *= 2) {
    //code;
}
has O(log(log(n))), because you only loop up to p = log(n), multiplying j each time, so you have O(log(p)), which is O(log(log(n))).
However, both together are still O(log(n)), because O(log(n) + log(log(n))) = O(log(n)).
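As a quick sanity check (this program is an illustration, not part of either answer), counting the iterations directly shows that the first loop runs about log2(n) times and the second about log2(log2(n)) times:
#include <initializer_list>
#include <iostream>
int main() {
    for (long long n : {8LL, 1024LL, 1000000LL, 1000000000LL}) {
        long long p = 0;
        for (long long i = 1; i < n; i *= 2)   // first loop: ~log2(n) iterations
            ++p;
        long long q = 0;
        for (long long j = 1; j < p; j *= 2)   // second loop: ~log2(p) iterations
            ++q;
        std::cout << "n = " << n << ": first loop ran " << p
                  << " times, second loop ran " << q << " times\n";
    }
    return 0;
}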

Is the Time Complexity of this code O(N)?

Is the time complexity of this code O(N)?
(In this code I want to find the Kth largest element of an array.)
class Solution {
public:
    int findKthLargest(vector<int>& nums, int k) {
        make_heap(nums.begin(), nums.end());
        for (int i = 0; i < k - 1; i++) {
            pop_heap(nums.begin(), nums.end());
            nums.pop_back();
        }
        return nums.front();
    }
};
Depends.
Because make_heap is already O(n) and each loop iteration is O(log n), the total time complexity of your algorithm is O(n + k log n). With a small k or a "good" set of data, the result is roughly O(n), up to the constant hidden behind the O() mark, but with a large k (near or surpassing n/2) or random data, it's O(n log n).
I'd also like to point out that this code modifies the original array (the argument is passed by reference), which often isn't good practice; a non-mutating alternative is sketched below.
log is base 2 in this post
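A minimal sketch of a non-mutating alternative, assuming the same findKthLargest signature apart from taking the vector by const reference; it uses std::nth_element on a local copy instead of the heap approach from the question:
#include <algorithm>
#include <functional>
#include <vector>
// Find the k-th largest element without modifying the caller's data.
// The copy costs O(n) extra memory; nth_element runs in O(n) on average.
int findKthLargest(const std::vector<int>& nums, int k) {
    std::vector<int> copy = nums;
    std::nth_element(copy.begin(), copy.begin() + (k - 1), copy.end(),
                     std::greater<int>());   // k-th largest ends up at index k-1
    return copy[k - 1];
}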

Is my heap sort algorithm time complexity analysis correct?

Here is the algorithm:
void heapSort(int * arr, int startIndex, int endIndex)
{
    minHeap<int> h(endIndex + 1);
    for (int i = 0; i < endIndex + 1; i++)
        h.insert(arr[i]);
    for (int i = 0; i < endIndex + 1; i++)
        arr[i] = h.deleteAndReturnMin();
}
The methods insert() and deleteAndReturnMin() are both O(log n). endIndex + 1 can be referred to as n, the number of elements. So given that information, am I correct in saying that the first and the second loop are both O(n log n), and thus the time complexity of the whole algorithm is O(n log n)? And to be more precise, would the total time complexity be O(2(n log n)) (not including the initializations)? I'm learning about big-O notation and time complexity, so I just want to make sure I'm understanding it correctly.
Your analysis is correct.
Given that the two methods you have provided are logarithmic time, your entire runtime is logarithmic time iterated over n elements, O(n log n) in total. You should also realize that big-O notation ignores constant factors, so the factor of 2 is meaningless.
Note that there is a bug in your code. The inputs seem to suggest that the array starts from startIndex, but startIndex is completely ignored in the implementation.
You can fix this by changing the size of the heap to endIndex + 1 - startIndex and looping from int i = startIndex, as sketched below.
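A minimal sketch of that fix, assuming the minHeap<int> interface used in the question (a constructor taking a capacity, insert(), and deleteAndReturnMin()):
void heapSort(int * arr, int startIndex, int endIndex)
{
    minHeap<int> h(endIndex + 1 - startIndex);      // heap sized to the sub-range
    for (int i = startIndex; i <= endIndex; i++)    // start at startIndex
        h.insert(arr[i]);
    for (int i = startIndex; i <= endIndex; i++)
        arr[i] = h.deleteAndReturnMin();
}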

Implementation of suffix array in C++

#include<iostream>
#include<string>
#include<utility>
#include<algorithm>
using namespace std;
struct xx
{
    string x;
    short int d;
    int lcp;
};
bool compare(const xx a, const xx b)
{
    return a.x < b.x;
}
// Length of the longest common prefix of two strings.
int findlcp(string a, string b)
{
    int i = 0, j = 0, k = 0;
    while (i < a.length() && j < b.length())
    {
        if (a[i] == b[j])
        {
            k++;
            i++;
            j++;
        }
        else
        {
            break;
        }
    }
    return k;
}
int main()
{
    string a = "banana";
    xx b[100];
    a = a + '$';
    int len = a.length();
    for (int i = 0; i < len; i++)
    {
        b[i].x = a.substr(i);
        b[i].d = i + 1;
    }
    sort(b, b + len, compare);
    for (int i = 0; i < len; i++)
        cout << b[i].x << " " << b[i].d << endl;
    b[0].lcp = 0;
    b[1].lcp = 0;
    for (int i = 2; i < len; i++)
    {
        b[i].lcp = findlcp(b[i].x, b[i - 1].x);
    }
    for (int i = 0; i < len; i++)
        cout << b[i].d << " " << b[i].lcp << endl;
}
This is an implementation of a suffix array. My question is: in the Wikipedia article, the construction is given as O(n) in the worst case.
So in my construction:
I am sorting all the suffixes of the string using STL sort. This is at least O(n log n) in the worst case, so here I am violating the O(n) construction.
The second issue is the construction of the longest common prefix array, which is given as O(n), but I think my implementation is O(n^2).
So for the first one, i.e. for the sorting:
If I use count sort I may decrease it to O(n). Is it correct to use count sort here? Is my understanding correct? Let me know if my understanding is wrong.
And is there any way to find the LCP in O(n) time?
First, regarding your two statements:
1) I am sorting all the suffixes of the string using STL sort. This is at least O(n log n) in the worst case, so here I am violating the O(n) construction.
The complexity of std::sort here is worse than O(n log n). The reason is that O(n log n) assumes that there are O(n log n) individual comparisons, and that each comparison is performed in O(1) time. The latter assumption is wrong, because you are sorting strings, not atomic items (like characters or integers).
Since the length of the string items, being substrings of the main string, is O(n), it would be safe to say that the worst-case complexity of your sorting algorithm is O(n^2 log n).
2) The second issue is the construction of the longest common prefix array, which is given as O(n), but I think my implementation is O(n^2).
Yes, your construction of the LCP array is O(n^2), because you are running your lcp function n == len times, and your lcp function requires O(min(len(x), len(y))) time for a pair of strings x, y.
Next, regarding your questions:
If I use count sort I may decrease it to O(n). Is it correct to use count sort here? Is my understanding correct? Let me know if my understanding is wrong.
Unfortunately, your understanding is incorrect. Counting sort is only linear if you can, in O(1) time, get access to an atomic key for each item you want to sort. Again, the items are strings O(n) characters in length, so this won't work.
And is there any way to find LCP in O(n) time?
Yes. Recent algorithms for suffix array computation, including the DC algorithm (aka Skew algorithm), provide for methods to calculate the LCP array along with the suffix array, and do so in O(n) time.
The reference for the DC algorithm is: Juha Kärkkäinen, Peter Sanders, "Simple Linear Work Suffix Array Construction", Automata, Languages and Programming, Lecture Notes in Computer Science, Volume 2719, 2003, pp. 943-955 (DOI 10.1007/3-540-45061-0_73). (But this is not the only algorithm that allows you to do this in linear time.)
You may also want to take a look at the open-source implementations mentioned in this SO post: What's the current state-of-the-art suffix array construction algorithm?. Many of the algorithms used there enable linear-time LCP array construction in addition to the suffix-array construction (but not all of the implementations there may actually include an implementation of that; I am not sure).
If you are ok with examples in Java, you may also want to look at the code for jSuffixArrays. It includes, among other algorithms, an implementation of the DC algorithm along with LCP array construction in linear time.
jogojapan has comprehensively answered your question. Just to mention an optimized C++ implementation, you might want to take a look here.
Posting the code here in case GitHub goes down.
const int N = 1000 * 100 + 5; // max string length
namespace Suffix {
    int sa[N], rank[N], lcp[N], gap, S;
    bool cmp(int x, int y) {
        if (rank[x] != rank[y])
            return rank[x] < rank[y];
        x += gap, y += gap;
        return (x < S && y < S) ? rank[x] < rank[y] : x > y;
    }
    // Build the suffix array of s by doubling the compared prefix length (gap);
    // overall O(n log^2 n).
    void Sa_build(const string &s) {
        S = s.size();
        int tmp[N] = {0};
        for (int i = 0; i < S; ++i)
            rank[i] = s[i],
            sa[i] = i;
        for (gap = 1;; gap <<= 1) {
            sort(sa, sa + S, cmp);
            for (int i = 1; i < S; ++i)
                tmp[i] = tmp[i - 1] + cmp(sa[i - 1], sa[i]);
            for (int i = 0; i < S; ++i)
                rank[sa[i]] = tmp[i];
            if (tmp[S - 1] == S - 1)
                break;
        }
    }
    // Kasai-style LCP construction in O(n); the string is passed in here
    // because it is not stored by Sa_build.
    void Lcp_build(const string &s) {
        for (int i = 0, k = 0; i < S; ++i, --k)
            if (rank[i] != S - 1) {
                k = max(k, 0);
                while (s[i + k] == s[sa[rank[i] + 1] + k])
                    ++k;
                lcp[rank[i]] = k;
            } else
                k = 0;
    }
}
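For context, a minimal usage sketch (this main is an assumption for illustration; it needs the Suffix namespace above in the same translation unit, together with <iostream>, <string>, <algorithm> and using namespace std):
int main() {
    string s = "banana";
    Suffix::Sa_build(s);      // fills Suffix::sa and Suffix::rank
    Suffix::Lcp_build(s);     // fills Suffix::lcp
    for (int i = 0; i < (int)s.size(); ++i)
        cout << Suffix::sa[i] << " " << Suffix::lcp[i] << "\n";
    return 0;
}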