Is my heap sort algorithm time complexity analysis correct? - C++

Here is the algorithm:
void heapSort(int * arr, int startIndex, int endIndex)
{
    minHeap<int> h(endIndex + 1);
    for (int i = 0; i < endIndex + 1; i++)
        h.insert(arr[i]);
    for (int i = 0; i < endIndex + 1; i++)
        arr[i] = h.deleteAndReturnMin();
}
The methods insert() and deleteAndReturnMin() are both O(log n). endIndex + 1 can be referred to as n, the number of elements. So given that information, am I correct in saying the first and the second loop are both O(n log n), and thus the time complexity for the whole algorithm is O(n log n)? And to be more precise, would the total time complexity be O(2(n log n)) (not including the initializations)? I'm learning about big O notation and time complexity, so I just want to make sure I'm understanding it correctly.

Your analysis is correct.
Given that the two methods you have provided are logarithmic time, your entire runtime is a logarithmic-time operation iterated over n elements, O(n log n) total. You should also realize that Big-O notation ignores constant factors, so the factor of 2 in O(2(n log n)) is meaningless: it simplifies to O(n log n).
Note that there is a bug in your code. The inputs seem to suggest that the array starts from startIndex, but startIndex is completely ignored in the implementation.
You can fix this by sizing the heap as endIndex + 1 - startIndex and starting both loops at int i = startIndex, as in the sketch below.
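A minimal sketch of the corrected version, assuming minHeap<int> has exactly the interface used in the question (a capacity constructor, insert(), and deleteAndReturnMin()):
// Sketch only: assumes the question's minHeap<int> interface.
void heapSort(int * arr, int startIndex, int endIndex)
{
    int n = endIndex + 1 - startIndex;            // number of elements to sort
    minHeap<int> h(n);
    for (int i = startIndex; i <= endIndex; i++)  // n inserts, O(log n) each
        h.insert(arr[i]);
    for (int i = startIndex; i <= endIndex; i++)  // n deletes, O(log n) each
        arr[i] = h.deleteAndReturnMin();
}
The analysis is unchanged: two O(n log n) passes over n elements is still O(n log n).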

Related

What is the time complexity of this code? Is it O(logn) or O(loglogn)?

int n = 8; // In the video n = 8
int p = 0;
for (int i = 1; i < n; i *= 2) { // In the video i = 1
    p++;
}
for (int j = 1; j < p; j *= 2) { // In the video j = 1
    //code;
}
This code is from the Abdul Bari YouTube channel (link of the video). They said the time complexity of this is O(log log n), but I think it is O(log n). What is the correct answer?
Fix the initial value. 0 multiplied by 2 will never end the loop.
The last loop is O(log log N) because p == log(n). However, the first loop is O(log N), hence in total it is also O(log N).
On the other hand, once you put some code in place of //code then the first loop can be negligible compared to the second and we have:
O( log N + X * log log N )
   ^ first loop
           ^ second loop
and when X is just big enough, one could consider it O(log log N) in total. Strictly speaking that is wrong, though, because complexity is about asymptotic behavior: no matter how big X is, as N goes to infinity, log N will at some point become bigger than X * log log N.
PS: I assumed that //code does not depend on N, ie it has constant complexity. The above consideration changes if this is not the case.
PPS: In general complexity is important when designing algorithms. When using an algorithm it is rather irrelevant. In that case you rather care about actual runtime for your specific value of N. Complexity can be misleading and even lead to wrong expectations for a specific use case with given N.
You are correct, the time complexity of the complete code is O(log(n)).
But Abdul Bari Sir is also correct, because:
In the video, he is deriving the time complexity of the second for loop only, not the time complexity of the whole code. Take a look at the video again and listen carefully to what he says at this point: https://youtu.be/9SgLBjXqwd4?t=568
Once again, what he has derived is the time complexity of the second loop, not of the complete code. Please listen to what he says at 9 minutes and 28 seconds in the video.
If this clears your confusion, please mark this as correct.
The time complexity of
int n;     // assume n is the input size, set elsewhere
int p = 0;
for (int i = 1; i < n; i *= 2) { // start at 1, not at 0
    p++;
}
is O(log(n)), because p++ executes log2(n) times. The logarithm's base does not matter in big O notation, because changing the base only scales the count by a constant.
for (int j = 1; j < p; j *= 2) {
    //code;
}
has O(log(log(n))), because you only loop up to p = log(n) by repeated doubling, which gives O(log(p)) = O(log(log(n))).
However, both together are still O(log(n)), because O(log(n) + log(log(n))) = O(log(n)).
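To make the two growth rates concrete, here is a small sketch (my own illustration, not from the video) that counts the iterations of both loops as n grows:
#include <iostream>

int main() {
    // For each n, p counts the first loop (~log2(n) iterations) and
    // q counts the second loop (~log2(log2(n)) iterations).
    for (long long n = 8; n <= (1LL << 31); n *= 16) {
        long long p = 0;
        for (long long i = 1; i < n; i *= 2)
            p++;
        long long q = 0;
        for (long long j = 1; j < p; j *= 2)
            q++;
        std::cout << "n=" << n << "  first loop=" << p
                  << "  second loop=" << q << "\n";
    }
    return 0;
}
Doubling n adds one iteration to the first loop, while the second loop's count barely moves; that gap is exactly the O(log n) vs O(log log n) difference.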

Is the Time Complexity of this code O(N)?

Is the Time Complexity of this code O(N) ?
(In this code I want to find the Kth largest element of the array.)
class Solution {
public:
    int findKthLargest(vector<int>& nums, int k) {
        make_heap(nums.begin(), nums.end());
        for (int i = 0; i < k - 1; i++) {
            pop_heap(nums.begin(), nums.end());
            nums.pop_back();
        }
        return nums.front();
    }
};
Depends.
Because make_heap is already O(n) and each loop iteration is O(log n), the total time complexity of your algorithm is O(n + k log n). With a small k, the result is roughly O(n), since the k log n term disappears into the constant behind the O() mark; but with a large k (near or surpassing n/2), it's O(n log n).
I'd also like to point out that this code modifies the original array (which is passed by reference); that often isn't good practice.
All logarithms in this post are base 2.
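If mutating the input matters, a minimal variant (a sketch, keeping the same LeetCode-style signature) is to take the vector by value, so the heap operations work on a copy:
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

class Solution {
public:
    // Taking nums by value copies it, so the caller's vector is untouched.
    int findKthLargest(vector<int> nums, int k) {
        make_heap(nums.begin(), nums.end());     // O(n)
        for (int i = 0; i < k - 1; i++) {        // k - 1 iterations...
            pop_heap(nums.begin(), nums.end());  // ...each O(log n)
            nums.pop_back();
        }
        return nums.front();                     // the kth largest element
    }
};

int main() {
    vector<int> v = {3, 2, 1, 5, 6, 4};
    cout << Solution().findKthLargest(v, 2) << "\n";  // prints 5; v is unchanged
    return 0;
}
The copy costs O(n) extra time and space, which does not change the O(n + k log n) bound.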

Time complexity with log in loop

What is the complexity of a loop that goes like this?
for (int i = 0; i < n; i++)
{
    for (int j = 0; j < log(i); j++)
    {
        // do something
    }
}
By my reckoning, the inner loop will run log(1) + log(2) + log(3) + ... + log(n) times in total, so how do I calculate its complexity?
So, you have a sum log(1) + log(2) + log(3) + ... + log(n) = log(n!). By using Stirling's approximation and the fact that ln(x) = log(x) / log(e) one can get
log(n!) = log(e) * ln(n!) = log(e) * (n ln(n) - n + O(ln(n)))
which gives the same complexity O(n ln(n)) as in the other answer (with slightly better understanding of the constants involved).
Without doing this in a formal manner, such complexity sums can be "guessed" using integrals. The integrand is the complexity of do_something, which is assumed to be O(1), and the inner loop contributes about log(i) iterations of it; since the integral of log(x) dx from 1 to N is N log(N) - N + 1, the overall complexity is O(N log N). So it is between linear and quadratic.
Note: this assumes that "do something" is O(1) (in terms of N, it could be of a very high constant of course).
Let's start with log(1) + log(2) + log(3) + ... + log(n). Roughly half of the terms of this sum are greater than or equal to log(n/2) = log(n) - log(2). Hence a lower bound for the sum is (n/2) * (log(n) - log(2)) = Omega(n log(n)). For an upper bound, simply multiply n by the largest term, which is log(n); hence O(n log(n)).
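As a quick numeric check (my own sketch, not from the answers), one can compare the exact sum log(1) + ... + log(n) = ln(n!) against n ln(n) using std::lgamma, which computes ln(n!) as lgamma(n + 1):
#include <cmath>
#include <cstdio>

int main() {
    // lgamma(n + 1) = ln(n!), the exact value of ln(1) + ln(2) + ... + ln(n).
    for (double n = 10; n <= 1e6; n *= 100) {
        double exact = std::lgamma(n + 1.0);  // ln(n!)
        double bound = n * std::log(n);       // n ln(n)
        std::printf("n=%.0f  ln(n!)=%.1f  n*ln(n)=%.1f  ratio=%.3f\n",
                    n, exact, bound, exact / bound);
    }
    return 0;
}
The ratio approaches 1 as n grows, matching the Theta(n log(n)) result above.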

Algorithm analysis: Am I analyzing these algorithms correctly? How to approach problems like these [closed]

1)
x = 25;
for (int i = 0; i < myArray.length; i++)
{
    if (myArray[i] == x)
        System.out.println("found!");
}
I think this one is O(n).
2)
for (int r = 0; r < 10000; r++)
    for (int c = 0; c < 10000; c++)
        if (c % r == 0)
            System.out.println("blah!");
I think this one is O(1), because for any input n, it will run 10000 * 10000 times. Not sure if this is right.
3)
a = 0;
for (int i = 0; i < k; i++)
{
    for (int j = 0; j < i; j++)
        a++;
}
I think this one is O(i * k). I don't really know how to approach problems like this, where the inner loop is affected by variables incremented in the outer loop. Some key insights here would be much appreciated. The outer loop runs k times, and the inner loop runs 1 + 2 + 3 + ... + k times in total. So that sum should be (k/2) * (k+1), which would be on the order of k^2. So would it actually be O(k^3)? That seems too large. Again, I don't know how to approach this.
4)
int key = 0; // key may be any value
int first = 0;
int last = intArray.length - 1;
int mid = 0;
boolean found = false;
while (!found && first <= last)
{
    mid = (first + last) / 2;
    if (key == intArray[mid])
        found = true;
    if (key < intArray[mid])
        last = mid - 1;
    if (key > intArray[mid])
        first = mid + 1;
}
This one, I think, is O(log n). But I came to this conclusion because I believe it is a binary search, and I know from reading that its runtime is O(log n). I think it's because you divide the input size by 2 on each iteration of the loop. But I don't know if this is the correct reasoning, or how to approach similar algorithms that I haven't seen and deduce, in a more verifiable or formal way, that they run in logarithmic time.
5)
int currentMinIndex = 0;
for (int front = 0; front < intArray.length; front++)
{
    currentMinIndex = front;
    for (int i = front; i < intArray.length; i++)
    {
        if (intArray[i] < intArray[currentMinIndex])
        {
            currentMinIndex = i;
        }
    }
    int tmp = intArray[front];
    intArray[front] = intArray[currentMinIndex];
    intArray[currentMinIndex] = tmp;
}
I am confused about this one. The outer loop runs n times, and the inner loop runs n + (n-1) + (n-2) + ... + 1 times in total? So is that O(n^3)?
More or less, yes.
1 is correct - it seems you are searching for a specific element in what I assume is an un-sorted collection. If so, the worst case is that the element is at the very end of the list, hence O(n).
2 is correct, though a bit strange. It is O(1) because the bounds (10000) are constants, not variables: there is no input n for the running time to grow with.
3 I believe is O(k^2): the iteration count is k*(k+1)/2, and dropping the constant factor and the lower-order term leaves O(k^2).
4 looks a lot like a binary search algorithm for a sorted collection. O(logn) is correct. It is log because at each iteration you are essentially halving the # of possible choices in which the element you are looking for could be in.
5 looks like a selection sort (find the minimum of the remaining elements, swap it to the front), O(n^2), for similar reasons to 3.
O() doesn't mean anything in itself: you need to specify whether you are counting the worst-case O or the average-case O. Some sorting algorithms, for example, are O(n log n) on average but O(n^2) in the worst case.
Basically you need to count the overall number of iterations of the innermost loop, and take the biggest component of the result without any constant (for example, if you have k*(k+1)/2 = (1/2)k^2 + (1/2)k, the biggest component is (1/2)k^2, therefore you are O(k^2)).
For example, your item 4) is in O(log(n)) because, if you work on an array of size n, then you will run one iteration on this array, and the next one will be on an array of size n/2, then n/4, ..., until this size reaches 1. So it is log(n) iterations.
Your question is mostly about the definition of O().
When someone say this algorithm is O(log(n)), you have to read:
When the input parameter n becomes very big, the number of operations performed by the algorithm grows at most in log(n)
Now, this means two things:
You have to have at least one input parameter n. There is no point in talking about O() without one (as in your case 2).
You need to define the operations that you are counting. These can be additions, comparison between two elements, number of allocated bytes, number of function calls, but you have to decide. Usually you take the operation that's most costly to you, or the one that will become costly if done too many times.
So keeping this in mind, back to your problems:
1) n is myArray.length, and the operation you're counting is ==. In that case the answer is exactly n, which is O(n).
2) You can't specify an n, so O() notation does not apply.
3) The n can only be k, and the operation you count is ++. You perform exactly k*(k+1)/2 of them, which is O(k^2), as you say.
4) This time n is the length of your array again, and the operation you count is ==. In this case the number of operations depends on the data; usually we talk about the "worst-case scenario", meaning that of all the possible inputs, we look at the one that takes the most time. At best, the algorithm takes one comparison. For the worst case, let's take an example: if the array is [1,2,3,4,5,6,7,8,9] and you are looking for 4, then intArray[mid] becomes successively 5, 2, 3 and then 4, so you would have done the comparison 4 times. In general, for an array of size about 2^k, the maximum number of comparisons is about k (you can check). So n = 2^k => k = ln(n)/ln(2), and extending this result to sizes that are not exact powers of two gives complexity O(ln(n)).
In any case, I think you are confused because you don't exactly know what O(n) means. I hope this is a start.
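One way to make the O(log n) claim for 4) verifiable rather than remembered (a sketch of my own, in C++) is to instrument the loop and count its iterations in the worst case, i.e. a key larger than every element:
#include <cstdio>

// Counts the loop iterations of the binary search from 4) on a sorted
// array of size n when key > every element, which forces the longest path.
int worstCaseIterations(int n) {
    int first = 0, last = n - 1, count = 0;
    while (first <= last) {
        int mid = (first + last) / 2;
        count++;
        first = mid + 1;   // the branch taken when key > intArray[mid]
    }
    return count;
}

int main() {
    // The count grows as floor(log2(n)) + 1: doubling n adds one iteration.
    for (int n = 1; n <= (1 << 20); n *= 4)
        std::printf("n = %8d  iterations = %d\n", n, worstCaseIterations(n));
    return 0;
}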

Top K smallest selection algorithm - O (n + k log n) vs O (n log k) for k << N

I'm asking this in regard to the Top K algorithm. I'd think that O(n + k log n) should be faster, because, for instance, if you try plugging in k = 300 and n = 100000000, we can see that n + k log n is smaller.
However, when I benchmark in C++, O(n log k) comes out more than 2x faster. Here's the complete benchmarking program:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <ctime>
#include <cstdlib>

using namespace std;

int RandomNumber() { return rand(); }

vector<int> find_topk(int arr[], int k, int n)
{
    make_heap(arr, arr + n, greater<int>());
    vector<int> result(k);
    for (int i = 0; i < k; ++i)
    {
        result[i] = arr[0];
        pop_heap(arr, arr + n - i, greater<int>());
    }
    return result;
}

vector<int> find_topk2(int arr[], int k, int n)
{
    make_heap(arr, arr + k, less<int>());
    for (int i = k; i < n; ++i)
    {
        if (arr[i] < arr[0])
        {
            pop_heap(arr, arr + k, less<int>());
            arr[k - 1] = arr[i];
            push_heap(arr, arr + k, less<int>());
        }
    }
    vector<int> result(arr, arr + k);
    return result;
}

int main()
{
    const int n = 220000000;
    const int k = 300;
    srand(time(0));
    int* arr = new int[n];
    generate(arr, arr + n, RandomNumber);
    // replace with find_topk or find_topk2
    vector<int> result = find_topk2(arr, k, n);
    copy(result.begin(), result.end(), ostream_iterator<int>(cout, "\n"));
    delete[] arr;   // free the benchmark array
    return 0;
}
find_topk's approach is to build a complete heap of size n in O(n), and then remove the top element of the heap k times, at O(log n) each.
find_topk2's approach is to build a heap of size k (O(k)) with the max element on top, and then, for each element from k to n, check whether it is smaller than the top element; if so, pop the top element and push the new one, which costs O(log k) up to n times.
Both approaches are written quite similarly, so I don't believe any implementation detail (like creating temporaries, etc.) causes a difference besides the algorithm and the dataset (which is random).
I could actually profile the results of the benchmark and could see that find_topk actually called the comparison operator many more times than find_topk2. But I'm interested more in the reasoning of the theoretical complexity.. so two questions.
Disregarding the implementation or benchmark, was I wrong to expect that O(n + k log n) should be better than O(n log k)? If I'm wrong, please explain why, and how to reason about it so that I can see that O(n log k) is actually better.
If I'm not wrong to expect No. 1, then why does my benchmark show otherwise?
Big O in several variables is complex, since you need assumptions on how your variables scale with one another, so you can take unambiguously the limit to infinity.
If, e.g., k ~ n^(1/2), then O(n log k) becomes O(n log n) and O(n + k log n) becomes O(n + n^(1/2) log n) = O(n), which is better.
If k ~ log n, then O(n log k) = O(n log log n) and O(n + k log n) = O(n), which is better. Note that log log 2^1024 = 10, so the constants hidden in the O(n) may be greater than log log n for any realistic n.
If k = constant, then O(n log k) = O(n) and O(n + k log n) = O(n), which is the same.
But the constants play a big role: for instance, building a heap may involve reading the array 3 times, whereas building a priority queue of length k as you go only requires one pass through the array, plus a small constant times log k per update.
Which is "better" is therefore unclear, although my quick analysis tended to show that O(n + k log n) performs better under mild assumptions on k.
For instance, if k is a very small constant (say k = 3), then I'm ready to bet that the make_heap approach performs worse than the priority queue one on real world data.
Use asymptotic analysis wisely, and above all, profile your code before drawing conclusions.
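For intuition (a sketch of my own), plugging the benchmark's sizes into both formulas shows how lopsided the raw numbers are:
#include <cmath>
#include <cstdio>

int main() {
    // The benchmark's parameters; logs taken base 2.
    double n = 220000000.0, k = 300.0;
    double a = n + k * std::log2(n);   // n + k log n: ~2.2e8 + 300 * 27.7
    double b = n * std::log2(k);       // n log k:     ~2.2e8 * 8.2
    std::printf("n + k*log2(n) = %.3e\n", a);
    std::printf("n * log2(k)   = %.3e\n", b);
    return 0;
}
The formula favors O(n + k log n) by nearly an order of magnitude here, which is exactly why the answers point at constant factors, average-case behavior, and memory locality to explain the benchmark.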
You are comparing two worst-case upper bounds. For the first approach, the worst case is pretty much equal to the average case. For the second, if the input is random, then once you have passed more than a handful of items into the heap, the chance of discarding a new value immediately (because it will not replace any of the top K) is very high, so the worst-case estimate for it is pessimistic.
If you are comparing wall-clock time rather than comparison counts, you may also find that heap-based algorithms with large heaps tend not to win many races, because they have horrible storage locality. Constant factors on modern microprocessors are heavily influenced by which level of memory you end up working in: finding that your data is out in real memory chips (or worse, on disk) rather than in some level of cache will slow you down a lot. Which is a shame, because I really like heapsort.
Keep in mind that you can now use std::nth_element instead of having to build a heap and do things yourself. Since the default comparator is std::less<>, you can say something like this:
std::nth_element(myList.begin(), myList.begin() + k, myList.end());
Now positions 0 through k - 1 of myList will hold the k smallest elements (in no particular order), in O(n) time on average.
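A self-contained sketch (variable names are my own) of using std::nth_element this way:
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> myList = {9, 1, 8, 2, 7, 3, 6, 4, 5};
    const int k = 3;

    // Partition so positions 0..k-1 hold the k smallest elements
    // (in no particular order); O(n) on average.
    std::nth_element(myList.begin(), myList.begin() + k, myList.end());

    for (int i = 0; i < k; ++i)
        std::cout << myList[i] << "\n";   // prints 1, 2, 3 in some order
    return 0;
}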