Is the Time Complexity of this code O(N)? - c++

Is the time complexity of this code O(N)?
(In this code I want to find the Kth largest element of an array.)
class Solution {
public:
    int findKthLargest(vector<int>& nums, int k) {
        make_heap(nums.begin(), nums.end());
        for (int i = 0; i < k - 1; i++) {
            pop_heap(nums.begin(), nums.end());
            nums.pop_back();
        }
        return nums.front();
    }
};

Depends.
Because make_heap is already O(n) and each pop_heap call is O(log n), the total time complexity of your algorithm is O(n + k log n). With a small k the k log n term is dominated by the linear term, so the result is roughly O(n) (plus whatever constants the O() notation hides), but with a large k (near or surpassing n/2) it becomes O(n log n).
Also, I'd like to point out that this code modifies the original array (it is passed by reference), which often isn't good practice.
log is base 2 throughout this post.
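For what it's worth, a minimal sketch of a non-mutating variant (my illustration, not the original code) just takes the vector by value, so the caller's array is left untouched; the extra copy is O(n) and does not change the O(n + k log n) bound:
#include <algorithm>
#include <vector>
using namespace std;

// Sketch: same heap-based idea as above, but the vector is taken by value,
// so only the local copy is modified.
int findKthLargestCopy(vector<int> nums, int k) {
    make_heap(nums.begin(), nums.end());     // O(n)
    for (int i = 0; i < k - 1; ++i) {        // k - 1 iterations
        pop_heap(nums.begin(), nums.end());  // O(log n) each
        nums.pop_back();
    }
    return nums.front();                     // the kth largest element
}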

Related

Is my heap sort algorithm time complexity analysis correct?

Here is the algorithm:
void heapSort(int * arr, int startIndex, int endIndex)
{
    minHeap<int> h(endIndex + 1);
    for (int i = 0; i < endIndex + 1; i++)
        h.insert(arr[i]);
    for (int i = 0; i < endIndex + 1; i++)
        arr[i] = h.deleteAndReturnMin();
}
The methods insert() and deleteAndReturnMin() are both O(log n). endIndex + 1 can be referred to as n, the number of elements. So given that information, am I correct in saying that the first and second loops are both O(n log n), and thus the time complexity of the whole algorithm is O(n log n)? And to be more precise, would the total time complexity be O(2(n log n)) (not including the initializations)? I'm learning about big O notation and time complexity, so I just want to make sure I'm understanding it correctly.
Your analysis is correct.
Given that the two methods you have provided run in logarithmic time, your entire runtime is logarithmic time iterated over n elements, O(n log n) total. You should also realize that Big-O notation ignores constant factors, so the factor of 2 is meaningless.
Note that there is a bug in your code. The inputs seem to suggest that the array starts from startIndex, but startIndex is completely ignored in the implementation.
You can fix this by changing the size of the heap to endIndex + 1 - startIndex and looping from int i = startIndex.
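For illustration, the fixed version might look like this (a sketch that assumes the same minHeap interface as in the question):
void heapSort(int * arr, int startIndex, int endIndex)
{
    // Size the heap to the subrange [startIndex, endIndex] only.
    minHeap<int> h(endIndex + 1 - startIndex);
    for (int i = startIndex; i <= endIndex; i++)
        h.insert(arr[i]);
    for (int i = startIndex; i <= endIndex; i++)
        arr[i] = h.deleteAndReturnMin();
}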

Implementation of suffix array in c++

#include <iostream>
#include <string>
#include <utility>
#include <algorithm>
using namespace std;

struct xx
{
    string x;
    short int d;
    int lcp;
};

bool compare(const xx a, const xx b)
{
    return a.x < b.x;
}

int findlcp(string a, string b)
{
    int i = 0, j = 0, k = 0;
    while (i < a.length() && j < b.length())
    {
        if (a[i] == b[j])
        {
            k++;
            i++;
            j++;
        }
        else
        {
            break;
        }
    }
    return k;
}

int main()
{
    string a = "banana";
    xx b[100];
    a = a + '$';
    int len = a.length();
    for (int i = 0; i < len; i++)
    {
        b[i].x = a.substr(i);
        b[i].d = i + 1;
    }
    sort(b, b + len, compare);
    for (int i = 0; i < len; i++)
        cout << b[i].x << " " << b[i].d << endl;
    b[0].lcp = 0;
    b[1].lcp = 0;
    for (int i = 2; i < len; i++)
    {
        b[i].lcp = findlcp(b[i].x, b[i - 1].x);
    }
    for (int i = 0; i < len; i++)
        cout << b[i].d << " " << b[i].lcp << endl;
}
This is an implementation of a suffix array. My question: in the Wikipedia article, construction is given as O(n) in the worst case.
So, in my construction:
1) I am sorting all the suffixes of the string using STL sort. This is at least O(n log n) in the worst case, so here I am violating the O(n) construction.
2) Constructing the longest common prefix array is also given as O(n), but I think my implementation is O(n^2).
So, for the first one (the sorting): if I use counting sort I might decrease it to O(n). Is that correct? Is my understanding correct? Let me know if my understanding is wrong.
And is there any way to find the LCP array in O(n) time?
First, regarding your two statements:
1) I am sorting all the suffixes of the string using STL sort. This is at least O(n log n) in the worst case, so here I am violating the O(n) construction.
The complexity of std::sort here is worse than O(n log n). The reason is that O(n log n) assumes that there are O(n log n) individual comparisons, and that each comparison is performed in O(1) time. The latter assumption is wrong, because you are sorting strings, not atomic items (like characters or integers).
Since the length of the string items, being substrings of the main string, is O(n), it would be safe to say that the worst-case complexity of your sorting algorithm is O(n² log n).
2) Constructing the longest common prefix array is also given as O(n), but I think my implementation is O(n^2).
Yes, your construction of the LCP array is O(n²), because you are running your lcp function n == len times, and your lcp function requires O(min(len(x), len(y))) time for a pair of strings x, y.
Next, regarding your questions:
If I use counting sort I might decrease it to O(n). Is that correct? Is my understanding correct? Let me know if my understanding is wrong.
Unfortunately, your understanding is incorrect. Counting sort is only linear if you can, in O(1) time, get access to an atomic key for each item you want to sort. Again, the items are strings O(n) characters in length, so this won't work.
And is there any way to find LCP in O(n) time?
Yes. Recent algorithms for suffix array computation, including the DC algorithm (aka Skew algorithm), provide for methods to calculate the LCP array along with the suffix array, and do so in O(n) time.
The reference for the DC algorithm is Juha Kärkkäinen, Peter Sanders: "Simple Linear Work Suffix Array Construction", Automata, Languages and Programming, Lecture Notes in Computer Science, Volume 2719, 2003, pp. 943-955 (DOI 10.1007/3-540-45061-0_73). (But this is not the only algorithm that allows you to do this in linear time.)
You may also want to take a look at the open-source implementations mentioned in this SO post: What's the current state-of-the-art suffix array construction algorithm?. Many of the algorithms used there enable linear-time LCP array construction in addition to the suffix-array construction (but not all of the implementations there may actually include an implementation of that; I am not sure).
If you are ok with examples in Java, you may also want to look at the code for jSuffixArrays. It includes, among other algorithms, an implementation of the DC algorithm along with LCP array construction in linear time.
jogojapan has comprehensively answered your question. Just to mention an optimized C++ implementation, you might want to take a look here.
Posting the code here in case GitHub goes down.
const int N = 1000 * 100 + 5; // max string length
namespace Suffix {
    int sa[N], rank[N], lcp[N], gap, S;

    bool cmp(int x, int y) {
        if (rank[x] != rank[y])
            return rank[x] < rank[y];
        x += gap, y += gap;
        return (x < S && y < S) ? rank[x] < rank[y] : x > y;
    }

    void Sa_build(const string &s) {
        S = s.size();
        int tmp[N] = {0};
        for (int i = 0; i < S; ++i)
            rank[i] = s[i],
            sa[i] = i;
        for (gap = 1;; gap <<= 1) {
            sort(sa, sa + S, cmp);
            for (int i = 1; i < S; ++i)
                tmp[i] = tmp[i - 1] + cmp(sa[i - 1], sa[i]);
            for (int i = 0; i < S; ++i)
                rank[sa[i]] = tmp[i];
            if (tmp[S - 1] == S - 1)
                break;
        }
    }

    // Builds lcp[] from sa[] and rank[]; the original text is passed in so
    // that its characters can be compared.
    void Lcp_build(const string &s) {
        for (int i = 0, k = 0; i < S; ++i, --k)
            if (rank[i] != S - 1) {
                k = max(k, 0);
                while (s[i + k] == s[sa[rank[i] + 1] + k])
                    ++k;
                lcp[rank[i]] = k;
            }
            else
                k = 0;
    }
}
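A minimal usage sketch (my addition, not part of the linked repository), assuming the snippet above with the Lcp_build variant that takes the text as a parameter, plus the usual headers:
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;

// ... the Suffix namespace from above goes here ...

int main() {
    string text = "banana";
    Suffix::Sa_build(text);   // suffix array in Suffix::sa[0..S-1]
    Suffix::Lcp_build(text);  // lcp[i] = LCP of suffixes sa[i] and sa[i+1]
    for (int i = 0; i < Suffix::S; ++i)
        cout << Suffix::sa[i] << " " << Suffix::lcp[i] << "\n";
    return 0;
}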

Efficient way to count number of swaps to insertion sort an array of integers in increasing order

Given an array of values of length n, is there a way to count the number of swaps that would be performed by insertion sort to sort that array in time better than O(n²)?
For example:
arr[] = {2, 1, 3, 1, 2}; // Answer is 4.
Algorithm:
for i <- 2 to N
    j <- i
    while j > 1 and a[j] < a[j - 1]
        swap a[j] and a[j - 1] // I want to count these swaps
        j <- j - 1
If you want to count the number of swaps needed in insertion sort, then you want to find the following number: for each element, how many previous elements in the array are larger than it? The sum of these values is the total number of swaps performed.
To find this number, you can use an order statistic tree, a balanced binary search tree that can efficiently tell you how many elements in the tree are smaller (or, equivalently, larger) than some given element. Specifically, an order statistic tree supports O(log n) insertion, deletion, lookup, and counting of how many elements in the tree are less than some value. You can then count how many swaps will be performed as follows:
Initialize a new, empty order statistic tree.
Set count = 0.
For each array element, in order:
    Add the element to the order statistic tree.
    Add to count the number of elements already in the tree that are greater than the value just added.
Return count.
This does O(n) iterations of a loop that takes O(log n) time, so the total work done is O(n log n), which is faster than the brute-force approach.
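As a concrete illustration (my sketch, not the answerer's code), the same counting idea can be realized with a Fenwick tree (binary indexed tree) over coordinate-compressed value ranks, which plays the role of the order statistic tree here:
#include <algorithm>
#include <vector>
using namespace std;

// Counts insertion-sort swaps (= inversions) in O(n log n) using a Fenwick
// tree over the coordinate-compressed ranks of the values.
long long countSwaps(const vector<int>& a) {
    int n = a.size();
    vector<int> sorted_a(a);                       // coordinate compression
    sort(sorted_a.begin(), sorted_a.end());
    sorted_a.erase(unique(sorted_a.begin(), sorted_a.end()), sorted_a.end());
    int m = sorted_a.size();

    vector<int> bit(m + 1, 0);                     // Fenwick tree, 1-indexed
    auto update = [&](int i) {                     // add 1 at position i
        for (; i <= m; i += i & -i) bit[i]++;
    };
    auto query = [&](int i) {                      // sum of positions 1..i
        int s = 0;
        for (; i > 0; i -= i & -i) s += bit[i];
        return s;
    };

    long long count = 0;
    for (int i = 0; i < n; ++i) {
        int r = lower_bound(sorted_a.begin(), sorted_a.end(), a[i])
                - sorted_a.begin() + 1;            // rank in 1..m
        count += i - query(r);                     // earlier elements > a[i]
        update(r);
    }
    return count;                                  // e.g. {2, 1, 3, 1, 2} -> 4
}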
If you want to count the number of swaps in selection sort, then you can use the fact that selection sort will only perform a swap on the kth pass if, after processing the first k - 1 elements of the list, the element in position k is not the kth smallest element. If you can check this efficiently, then we have the following basic sketch of an algorithm:
Set total = 0
For k = 1 to n:
    If the element at index k isn't the kth smallest element:
        Swap it with the kth smallest element.
        Increment total.
Return total
So how do we implement this efficiently? We need to efficiently be able to check whether the element at a given index is the correct element, and also need to efficiently find the position of the element that really does belong at a given index otherwise. To do this, begin by creating a balanced binary search tree that maps each element to its position in the original array. This takes time O(n log n). Now that you have the balanced tree, we can augment the structure by assigning to each element in the tree the position in the sorted sequence that this element belongs. One way to do this is with an order statistic tree, and another would be to iterate over the tree with an inorder traversal, annotating each value in the tree with its position.
Using this structure, we can check in O(log n) time whether or not an element is in the right position by looking the element up in the tree (time O(log n)), then looking at the position in the sorted sequence at which it should be and at which position it's currently located (remember that we set this up when creating the tree). If it disagrees with our expected position, then it's in the wrong place, and otherwise it's in the right place. Also, we can efficiently simulate a swap of two elements by looking up those two elements in the tree (O(log n) time total) and then swapping their positions in O(1).
As a result, we can implement the above algorithm in time O(n log n) - O(n log n) time to build the tree, then n iterations of doing O(log n) work to determine whether or not to swap.
Hope this helps!
The number of interchanges of consecutive elements necessary to arrange them in their natural order is equal to the number of inversions in the given permutation.
So the solution to this problem is to find the number of inversions in the given array of numbers.
This can be solved in O(n log n) using merge sort.
In the merge step, if you copy an element from the right array, increment a global counter (that counts inversions) by the number of items remaining in the left array. This is done because the element from the right array that just got copied forms an inversion with every element still present in the left array.
I'm not sure, but I suspect finding the minimum number is a difficult problem. Unless there's a shortcut, you'll just be searching for optimal sorting networks, which you should be able to find good resources on with your favorite search engine (or Wikipedia).
If you only care about the big-O complexity, the answer is O(n log n), and you can probably get more concrete bounds (some actual constants in there) if you look at the analysis of some efficient in-place sorting algorithms like heapsort or smoothsort.
package insertoinSortAnalysis;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Solution {
    private int[] originalArray;

    public static void main(String[] args) {
        Scanner sc;
        try {
            sc = new Scanner(System.in);
            int TestCases = sc.nextInt();
            for (int i = 0; i < TestCases; i++) {
                int sizeofarray = sc.nextInt();
                Solution s = new Solution();
                s.originalArray = new int[sizeofarray];
                for (int j = 0; j < sizeofarray; j++)
                    s.originalArray[j] = sc.nextInt();
                s.devide(s.originalArray, 0, sizeofarray - 1);
                System.out.println(s.count);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public int[] devide(int[] originalArray, int low, int high) {
        if (low < high) {
            int mid = (low + high) / 2;
            int[] result1 = devide(originalArray, low, mid);
            int[] result2 = devide(originalArray, mid + 1, high);
            return merge(result1, result2);
        }
        int[] result = { originalArray[low] };
        return result;
    }

    private long count = 0;

    private int[] merge(int[] array1, int[] array2) {
        int lowIndex1 = 0;
        int lowIndex2 = 0;
        int highIndex1 = array1.length - 1;
        int highIndex2 = array2.length - 1;
        int result[] = new int[array1.length + array2.length];
        int i = 0;
        while (lowIndex2 <= highIndex2 && lowIndex1 <= highIndex1) {
            int element = array1[lowIndex1];
            while (lowIndex2 <= highIndex2 && element > array2[lowIndex2]) {
                result[i++] = array2[lowIndex2++];
                count += ((highIndex1 - lowIndex1) + 1);
            }
            result[i++] = element;
            lowIndex1++;
        }
        while (lowIndex2 <= highIndex2 && lowIndex1 > highIndex1) {
            result[i++] = array2[lowIndex2++];
        }
        while (lowIndex1 <= highIndex1 && lowIndex2 > highIndex2) {
            result[i++] = array1[lowIndex1++];
        }
        return result;
    }
}
Each swap in the insertion sort moves two adjacent elements - one up by one, one down by one - and 'corrects' a single crossing by doing so. So:
Annotate each item, X, with its initial array index, Xi.
Sort the items using a stable sort (you can use quicksort if you treat the 'initial position' annotation as a minor key).
Return half the sum of the absolute differences between each element's annotated initial position and its final position (i.e. just loop through the annotations summing abs(Xi - i)).
Just like most of the other answers, this is O(n) space and O(n*log n) time. If an in-place merge could be modified to count the crossings, that'd be better. I'm not sure it can though.
#include <stdio.h>
#include <string.h>
#include <iostream>
#include <algorithm>
using namespace std;

int a[200001];
int te[200001];

unsigned long long merge(int arr[], int temp[], int left, int mid, int right)
{
    int i = left;
    int j = mid;
    int k = left;
    unsigned long long int icount = 0;
    while ((i <= mid - 1) && (j <= right))
    {
        if (arr[i] <= arr[j])
            temp[k++] = arr[i++];
        else
        {
            temp[k++] = arr[j++];
            icount += (mid - i);
        }
    }
    while (i <= mid - 1)
        temp[k++] = arr[i++];
    while (j <= right)
        temp[k++] = arr[j++];
    for (int i = left; i <= right; i++)
        arr[i] = temp[i];
    return icount;
}

unsigned long long int mergesort(int arr[], int temp[], int left, int right)
{
    unsigned long long int i = 0;
    if (right > left) {
        int mid = (left + right) / 2;
        i = mergesort(arr, temp, left, mid);
        i += mergesort(arr, temp, mid + 1, right);
        i += merge(arr, temp, left, mid + 1, right);
    }
    return i;
}

int main()
{
    int t, n;
    scanf("%d", &t);
    while (t--) {
        scanf("%d", &n);
        for (int i = 0; i < n; i++) {
            scanf("%d", &a[i]);
        }
        printf("%llu\n", mergesort(a, te, 0, n - 1));
    }
    return 0;
}

complexity analysis of algorithm

Here is code which fills a two-dimensional array with randomly generated numbers in the range [1, 19] without duplication. My question is: how do I determine its complexity?
For example, I see that its running time is at least O(n^2) because of its inner and outer loops, but what about the goto statement?
Here is my code:
#include <iostream>
#include <set>
#include <cstdlib>
using namespace std;

int main()
{
    int min = 1;
    int max = 19;
    int a[3][3];
    set<int> b;
    for (int i = 0; i < 3; i++)
    {
        for (int j = 0; j < 3; j++)
        {
        loop:
            int m = min + rand() % (max - min + 1); // +1 so that max itself can be drawn
            if (b.find(m) == b.end())
            {
                a[i][j] = m;
                b.insert(m);
            }
            else
                goto loop;
        }
    }
    for (int i = 0; i < 3; i++)
    {
        for (int j = 0; j < 3; j++)
            cout << a[i][j] << " ";
        cout << endl;
    }
    return 0;
}
I would say that the complexity of the algorithm is c*O(n^2), where c is some constant, because if it finds a duplicated element inside the loops it repeats the generation of random numbers, which takes some constant time. Am I right?
As the likelihood of drawing an unused number decreases, the number of goto loops increases.
For a uniform random number generator, the behavior is linear with respect to the number of numbers drawn. It definitely doesn't just add a constant to your complexity.
If n is the number of elements in a, then on average it scales with O(n²) (or, if n is the number of rows of the square matrix a, O(n⁴)).
A much simpler implementation would be to use a Fisher-Yates shuffle.
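For instance, here is a sketch of how that might look for this 3x3 case (my illustration, not the answerer's code); std::shuffle performs a Fisher-Yates shuffle, so the whole fill is linear in the size of the candidate pool:
#include <algorithm>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>
using namespace std;

int main()
{
    vector<int> pool(19);
    iota(pool.begin(), pool.end(), 1);       // candidate values 1..19

    mt19937 gen(random_device{}());
    shuffle(pool.begin(), pool.end(), gen);  // Fisher-Yates shuffle, O(n)

    int a[3][3];
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            a[i][j] = pool[i * 3 + j];       // first 9 shuffled values, all distinct

    for (int i = 0; i < 3; i++)
    {
        for (int j = 0; j < 3; j++)
            cout << a[i][j] << " ";
        cout << endl;
    }
    return 0;
}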
It's O(infinity). The O notation gives an upper bound. Because of your use of rand() in a loop, there's no guarantee that you will make progress. Therefore, no upper bound exists.
[edit]
Ok, people also want other complexities than the conventional, worst-case complexity.
The worst-case complexity is obtained by assuming that the RNG generates the same value over and over; this means that the goto retry loop never terminates once that value has been placed. Therefore there's no finite upper bound on the run time, O(infinity).
The best-case complexity is obtained by assuming that the RNG generates sequential numbers. That means the cost of each iteration is O(log N) (set::find), and there are O(N)*O(N) iterations, so the upper bound is O(N² log N).
The average-case complexity is harder. Assuming that max = k*N*N for some k > 1, the RNG will successfully pick an "unused" number in O(1) expected time. Even after N*N numbers have been chosen, there are still (k-1)*N*N unused numbers, so the chance p of picking an unused number is p >= ((k-1)*N*N) / (k*N*N), i.e. p >= (k-1)/k. That means we can expect to pick an unused number in k/(k-1) attempts, which is independent of N and therefore O(1). set::find still dominates the cost of each iteration, at O(log N). We still have the same number of iterations, so we get the same upper bound of O(N² log N).
The goto loops until the generated random number is one that has not been used yet.
If the distribution of random numbers is uniform, "retry ... until" is linear on average with respect to the amplitude of the range.
But this linearity multiplies the complexity of set::find (log(n)); set::insert only happens once per cell.
The two outer for loops have constant bounds (so their timing doesn't depend on the data), hence they just multiply the time but don't increase the complexity.
"Complexity" is not about how much absolute time (or space) your program takes. It is about how much the time (or space) increases when you increase the size of your program's input data.
(BTW O for time and O for space may be different.)
Time Complexity
Assuming n is the number of elements in the matrix, you have to ask yourself what happens when you add a single element to your matrix (i.e. when n becomes n+1):
You need to iterate over the new element, which is O(1). We are talking about one iteration here, so the double loop does not matter.
You have another iteration for printing, which is also O(1), assuming cout<< is O(1).
You have to find the element, which is O(log(n)); std::set is typically implemented as a red-black tree.
You have to retry the find (via goto) potentially several times. Depending on rnd, min, max and the width of int, the number of retries may be O(1) (i.e. it does not increase with the number of elements) or it may be worse than that.
You have to insert the element, which is O(log(n)).
Assuming the "best" rnd, you are looking at the following increase for one element...
(O(1) + O(1)) * (O(log(n)) * O(1) + O(log(n))) = O(1) * O(log(n)) = O(log(n))
...so for n elements, your complexity is:
(O(n) + O(n)) * (O(log(n)) * O(1) + O(log(n))) = O(n) * O(log(n)) = O(n * log(n))
Assuming "bad" rnd of O(n), you are looking at...
(O(n) + O(n)) * (O(log(n)) * O(n) + O(log(n))) = O(n) * O(n * log(n)) = O(n^2 * log(n))
Space Complexity
Your matrix is O(n) and std::set is O(n) so you are O(n) here overall.

Top K smallest selection algorithm - O (n + k log n) vs O (n log k) for k << N

I'm asking this in regard to the Top K algorithm. I'd think that O(n + k log n) should be faster; for instance, if you plug in k = 300 and n = 100000000, we can see that n + k log n is smaller.
However when I do a benchmark with C++, it's showing me that O (n log k) is more than 2x faster. Here's the complete benchmarking program:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <ctime>
#include <cstdlib>
using namespace std;

int RandomNumber() { return rand(); }

vector<int> find_topk(int arr[], int k, int n)
{
    make_heap(arr, arr + n, greater<int>());
    vector<int> result(k);
    for (int i = 0; i < k; ++i)
    {
        result[i] = arr[0];
        pop_heap(arr, arr + n - i, greater<int>());
    }
    return result;
}

vector<int> find_topk2(int arr[], int k, int n)
{
    make_heap(arr, arr + k, less<int>());
    for (int i = k; i < n; ++i)
    {
        if (arr[i] < arr[0])
        {
            pop_heap(arr, arr + k, less<int>());
            arr[k - 1] = arr[i];
            push_heap(arr, arr + k, less<int>());
        }
    }
    vector<int> result(arr, arr + k);
    return result;
}

int main()
{
    const int n = 220000000;
    const int k = 300;
    srand(time(0));
    int* arr = new int[n];
    generate(arr, arr + n, RandomNumber);
    // replace with topk or topk2
    vector<int> result = find_topk2(arr, k, n);
    copy(result.begin(), result.end(), ostream_iterator<int>(cout, "\n"));
    return 0;
}
find_topk's approach is to build a complete heap of size n in O(n), and then remove the top element of the heap k times, each removal costing O(log n).
find_topk2's approach is to build a heap of size k (O(k)) such that the max element is at the top, and then from k to n compare each element to see if it is smaller than the top element; if so, pop the top element and push the new one, which means at most n operations of O(log k) each.
Both approaches are written quite similarly, so I don't believe any implementation detail (like creating temporaries, etc.) causes a difference besides the algorithm and the dataset (which is random).
I could actually profile the results of the benchmark and could see that find_topk called the comparison operator many more times than find_topk2. But I'm more interested in the reasoning about the theoretical complexity, so two questions:
Disregarding the implementation or benchmark, was I wrong in expecting that O(n + k log n) should be better than O(n log k)? If I'm wrong, please explain why and how to reason about it so that I can see that O(n log k) is actually better.
If I'm not wrong about no. 1, then why is my benchmark showing otherwise?
Big O in several variables is complex, since you need assumptions on how your variables scale with one another, so that you can unambiguously take the limit to infinity.
If eg. k ~ n^(1/2), then O(n log k) becomes O(n log n) and O(n + k log n) becomes O(n + n^(1/2) log n) = O(n), which is better.
If k ~ log n, then O(n log k) = O(n log log n) and O(n + k log n) = O(n), which is better. Note that log log 2^1024 = 10, so the constants hidden in the O(n) may be greater than log log n for any realistic n.
If k = constant, then O(n log k) = O(n) and O(n + k log n) = O(n), which is the same.
But the constants play a big role: for instance, building a heap may involve reading the array 3 times, whereas building a priority queue of length k as you go only requires one pass through the array, and a small constant times log k for the lookup.
Which is "better" is therefore unclear, although my quick analysis tended to show that O(n + k log n) performs better under mild assumptions on k.
For instance, if k is a very small constant (say k = 3), then I'm ready to bet that the make_heap approach performs worse than the priority queue one on real world data.
Use asymptotic analysis wisely, and above all, profile your code before drawing conclusions.
You are comparing two worst case upper bounds. For the first approach, the worst case is pretty much equal to the average case. For the second case, if the input is random, by the time you have passed more than a handful of items into the heap, the chance of throwing away the new value at once because it is not going to replace any of the top K is pretty high, so the worst case estimate for this is pessimistic.
If you are comparing wall-clock time as opposed to comparison counts, you may find that heap-based algorithms with large heaps tend not to win many races because they have horrible storage locality, and constant factors on modern microprocessors are heavily influenced by which level of memory you end up working in: finding that your data is out in real memory chips (or worse, on disk), and not in some level of cache, will slow you down a lot. Which is a shame, because I really like heapsort.
Keep in mind that you can now use std::nth_element instead of having to use a heap and do things yourself. Since the default comparator operator is std::less<>(), you can say something like this:
std::nth_element(myList.begin(), myList.begin() + k, myList.end());
Now, the elements of myList in positions 0 to k-1 will be the k smallest elements, in no particular order.
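To make that concrete, here is a small self-contained sketch (my example; the container name myList and the value of k are arbitrary):
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<int> myList = {9, 4, 7, 1, 8, 3, 6, 2, 5};
    const size_t k = 3;

    // Average O(n): afterwards positions 0..k-1 hold the k smallest elements
    // (in no particular order), and myList[k] is the (k+1)-th smallest.
    nth_element(myList.begin(), myList.begin() + k, myList.end());

    for (size_t i = 0; i < k; ++i)
        cout << myList[i] << " ";   // prints 1, 2, 3 in some order
    cout << "\n";
    return 0;
}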