What is the time complexity of below program? - c++

Below is a program that finds the length of the longest substring without repeating characters, given a string str.
int test(string str) {
int left = 0, right = 0, ans = 0;
unordered_set<char> set;
while(left < str.size() and right < str.size()) {
if(set.find(str[right]) == set.end()) set.insert(str[right]);
else {
while(str[left] != str[right]){
set.erase(str[left]);
left++;
}
left++;
}
right++;
ans = (ans > set.size() ? ans : set.size());
}
return ans;
};
What is the time complexity of above solution? Is it O(n^2) or O(n) where n is the length of string?
Please note that I have gone through multiple questions on the internet and also read about Big O, but I am still confused. To me it looks like O(n^2) because of the two while loops, but I want to confirm with the experts here.

It's O(n) on average.
What you see here is a sliding window technique (with variable window size, also called "two pointers technique").
Yes there are two loops, but if you look, any iteration of any of the two loops will always increase one of the pointers (either left or right).
In the first loop, either you call the second loop or you don't, but you will increase right at each iteration. The second loop always increases left.
Both left and right can have n different values (because both loops would stop when either right >= n or left == right).
So the first loop will have n executions (all the values of right from 0 to n-1) and the second loop can have at most n executions (all the possible values of left), which is a worst case of 2n = O(n) executions.
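The 2n bound can be checked empirically. Below is a minimal sketch, not the asker's exact code: it keeps the same sliding window but adds a counter (the names WindowStats, longest_unique_window and pointer_moves are made up for this illustration) that is bumped every time left or right advances.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Result of one run: the answer plus the instrumentation counter.
struct WindowStats {
    std::size_t longest;        // length of the longest substring without repeats
    std::size_t pointer_moves;  // total number of increments of left and right
};

// Same sliding window as in the question; the only addition is the counter.
WindowStats longest_unique_window(const std::string& str) {
    std::size_t left = 0, right = 0, longest = 0, moves = 0;
    std::unordered_set<char> seen;
    while (right < str.size()) {
        if (seen.find(str[right]) == seen.end()) {
            seen.insert(str[right]);
        } else {
            // Shrink the window until the earlier copy of str[right] is dropped.
            while (str[left] != str[right]) {
                seen.erase(str[left]);
                ++left;
                ++moves;
            }
            ++left;
            ++moves;
        }
        ++right;
        ++moves;
        if (seen.size() > longest) longest = seen.size();
    }
    return {longest, moves};
}
```

Since left never passes right and right stops at n, pointer_moves should never exceed 2n, whatever the input.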
Worst case complexity
For the sake of completeness, please note that I wrote O(n) on average. The reason is that set.find has a complexity of O(1) on average but O(n) in the worst case. The same goes for set.erase. This is because unordered_set is implemented with a hash table, and in the very unlikely case of all your items being in the same bucket, it needs to iterate over all the items.
So even though we have O(n) iterations of the loop, some iterations could be O(n). It means that in some very unlikely cases, the execution could go up to O(n^2). You shouldn't really worry about it, as the probability of this happening is close to 0, and even though I don't know exactly how char is hashed in C++, I would bet that we will never end up with all characters in the same bucket.
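If the hash-table worst case is a concern, one way around it is to replace the unordered_set with a fixed table of flags indexed by character value. This is only a sketch of that alternative, not the code from the question: the function name longest_unique_window_table is hypothetical, and the table size of 256 assumes 8-bit chars.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Sliding window with a lookup table instead of a hash set.
// Assumption: char is 8 bits wide, so 256 flags cover every possible value.
std::size_t longest_unique_window_table(const std::string& str) {
    std::array<bool, 256> seen{};  // all flags start as false
    std::size_t left = 0, longest = 0;
    for (std::size_t right = 0; right < str.size(); ++right) {
        const auto c = static_cast<unsigned char>(str[right]);
        // Drop characters from the left until c is no longer in the window.
        while (seen[c]) {
            seen[static_cast<unsigned char>(str[left])] = false;
            ++left;
        }
        seen[c] = true;
        const std::size_t window = right - left + 1;
        if (window > longest) longest = window;
    }
    return longest;
}
```

Every table access is a plain array index, so each pointer move costs O(1) in the worst case and the whole function is O(n) without the "on average" caveat.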

Related

How to calculate time complexity?

I'm really having trouble calculating big O. I get the basics but when it gets to nested for loops and all that, my mind just blanks out. I was asked to write down the complexity of the following algorithm which I have no clue how to do. The input string contains only A,B,C and D
string solution(string &S) {
int length = S.length();
int i = 0;
while(i < length - 1)
{
if ( (S[i] == 'A' && S[i+1] == 'B') || (S[i] == 'B' && S[i+1] == 'A'))
{
S = S.erase(i,2);
i = 0;
length = S.length();
}
if ( (S[i] == 'C' && S[i+1] == 'D') || (S[i] == 'D' && S[i+1] == 'C'))
{
S = S.erase(i,2);
i = 0;
length = S.length();
}
i++;
}
return S;
}
What would the big O of this algorithm be?
It is O(n^2).
Consider an input like this:
DDDDDDDDDDDDDDDDDDDABABABABABABABABABABABAB
The first n/2 characters are D.
The last n/2 characters are AB pairs.
For each AB pair (there are n/4 of them), the work is O(n), because:
i is reset, so the scan starts again from the beginning: O(n)
erase shifts all of the following elements to fill the gap: O(n)
Total:
O(n) * (O(n) + O(n)) = O(n^2)
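The quadratic growth can be observed by counting loop iterations on that kind of input. The sketch below copies the loop from the question and adds a step counter; count_scan_steps and worst_case_input are hypothetical helper names, and the input builder assumes n is at least 4 so the string always keeps at least two leading D characters.

```cpp
#include <cstddef>
#include <string>

// Runs the question's loop as written and counts how many times the
// while body executes. The counter is the only addition.
std::size_t count_scan_steps(std::string S) {
    std::size_t steps = 0;
    int length = static_cast<int>(S.length());
    int i = 0;
    while (i < length - 1) {
        ++steps;
        if ((S[i] == 'A' && S[i + 1] == 'B') || (S[i] == 'B' && S[i + 1] == 'A')) {
            S.erase(i, 2);
            i = 0;
            length = static_cast<int>(S.length());
        }
        if ((S[i] == 'C' && S[i + 1] == 'D') || (S[i] == 'D' && S[i + 1] == 'C')) {
            S.erase(i, 2);
            i = 0;
            length = static_cast<int>(S.length());
        }
        i++;
    }
    return steps;
}

// Builds the shape described above: n/2 'D' characters, then n/4 "AB" pairs.
std::string worst_case_input(std::size_t n) {
    std::string s(n / 2, 'D');
    for (std::size_t k = 0; k < n / 4; ++k) s += "AB";
    return s;
}
```

Doubling n should roughly quadruple the step count, which is the signature of O(n^2). Note that this counts only the scanning; the shifting done inside erase adds further linear work per pair.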
It's easy to get hung up about the precise detail of how efficient an algorithm is. Fundamentally though, all you're concerned about is whether the operation is:
Constant time
Proportional to the number of elements
Proportional to the square of the number of elements
etc...
Look at this for guidance on how to estimate the Big-O for a compound operation:
https://hackernoon.com/big-o-for-beginners-622a64760e2
Big-O gives an upper bound on how the cost of a method grows, and it is most often quoted for the worst case, with particular regard to effects that would be observed with very large n. On the face of it you would consider how many times you repeat an operation, but you also need to consider whether any embodied methods (e.g. string erase, string length) have complexity that's "constant time", "proportional to the number of elements", "proportional to the number of elements squared" and so on.
So if your outer loop performs n scans but also invokes methods which also perform n scans on up to every item then you end up with O(n^2).
The main concern is the highest exponent; you could have a very time-consuming linear-complexity operation alongside a very fast one that is, say, proportional to the fourth power of n. In such a case, it's considered to be O(n^4) (as opposed to O(20000n + n^4)) because as n tends to infinity, all of the lower-exponent terms become insignificant. See here: https://en.wikipedia.org/wiki/Big_O_notation#Properties
So in your case, you have the following loops:
Repetition of the scan (setting i=0) whose frequency is proportional to number of matches (worst case n for argument's sake - even if it's a fraction, when n becomes infinite it remains significant). Although this is not supposedly the outer loop, it does fundamentally govern how many times the other scans are performed.
String scan whose frequency is proportional to length (n), PLUS Embodied loop in the string erase - n in the worst case. Note these operations are performed in isolation, together governed by the frequency of the aforementioned repetition. As stated elsewhere, O(n)+O(n) reduces to O(n) because we only care about exponent.
So in this case the complexity is O(n^2)
A separate consideration when assessing the performance of any algorithm regards how cache friendly it is; algorithms using hashmaps, linked lists etc are considered prima-facie to be more efficient, but in some cases a O(n^2) algorithm that operates within a cache line and doesn't invoke page faults nor cache flushes can execute a lot faster than a supposedly more efficient algorithm that has memory scattered all over the place.
I guess this would be O(n) because there is one loop that's going through the string.
The longer the string, the more time it takes, so I would say O(n).
In big O notation, you give the answer for the worst case. Here the worst case will be that the string does not satisfy any if statements. Then time complexity here will be O(n) because there is only one loop.

Is this Insertion Sort implementation worst case O(n)?

I know that Insertion Sort is supposed to be worst case O(n^2), but I'm wondering why the following implementation isn't O(n).
void main()
{
//insertion sort runs from i = 1 to i = n, thus is worst case O(n)
for (
int i = 1,
placeholder = 0,
A[] = { 10,9,8,7,6,5,4,3,2,1 },
j = i;
i <= 10;
j-- > 0 && A[j - 1] > A[j]
? placeholder = A[j], A[j] = A[j - 1], A[j - 1] = placeholder
: j = ++i
)
{
for (
int x = 0;
x < 10; x++
)
cout << A[x] << ' ';
cout << endl;
}
system("pause");
}
There is only one for loop involved here and it runs from 1 to n. It seems to me that this would be the definition of O(n). What exactly am I missing here?
Sloppy terminology has led many people to false conclusions. This appears to be an example.
There is only one for loop involved here and it runs from 1 to n.
Yes, there is only one loop, but what is this "it" to which you refer? I really do mean for you to think about it. Should "it" refer to the loop? That would match a fairly common, yet sloppy, use of terminology, but a loop does not evaluate to a value. So a loop cannot actually run from one value to another. The sloppiness can be overlooked in simpler contexts, but not in yours.
Normally, the "it" would really refer to the loop control variable. With a simple loop, like for (int i = 0; i < 10; ++i), there is a one-to-one correspondence between iterations of the loop and values assigned to the control variable (which is i in my example). So there is an equivalence present, allowing one to refer to the loop when one really means the control variable. Saying that a loop runs from x to y really means that the control variable runs from x to y, and that there is one iteration of the loop per value assigned to the control variable. This correspondence fails in your code.
In your loop, the thing that runs from 1 to n is i. However, i is not incremented with each iteration of the loop, so "it runs from 1 to n" is not an accurate assessment of your loop. When i is 1, there are up to 2 iterations. That's not a one-to-one correspondence between iterations and values of i. As i increases, the divergence from one-to-one grows. Each value of i potentially corresponds to i+1 iterations, as j counts down from i to 0. The total number of iterations in the worst case scenario for n entries is the sum of the potential number of iterations for each value of i: 2 + 3 + ... + (n+1) = (n² + 3n)/2. That's O(n²).
Moral of the story: writing compact, cryptic code does not magically change the complexity of the algorithm being implemented. Cryptic code can make the complexity harder to pin down, but the main thing you've accomplished is making your code harder to read.
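To see the iteration count concretely, here is a sketch of an insertion sort driven by a single loop, with a counter on that loop. It is a cleaned-up variant written for this illustration (single_loop_insertion_sort is a hypothetical name), not the asker's exact for statement, so the index range differs slightly: for a reversed array of n elements it performs 2 + 3 + ... + n iterations, which is still quadratic.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Insertion sort with one loop. Each pass through the loop either swaps one
// pair (and moves j left) or advances i to the next element.
// Returns the number of loop iterations performed.
std::size_t single_loop_insertion_sort(std::vector<int>& a) {
    std::size_t iterations = 0;
    std::size_t i = 1, j = 1;
    while (i < a.size()) {
        ++iterations;
        if (j > 0 && a[j - 1] > a[j]) {
            std::swap(a[j - 1], a[j]);  // keep sinking the element at j
            --j;
        } else {
            ++i;                        // element i is in place, move on
            j = i;
        }
    }
    return iterations;
}
```

On the ten-element reversed array from the question this performs 54 iterations, against 9 for an already sorted array of the same size: one loop, but far more than n iterations.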
That's a very odd way to write code, but you have 2 for loops in the definition. Nested loops are not always necessary to get O(n^2); you can get it with recursion as well.
In simple terms, O(n^2) means that the number of operations performed grows in proportion to n^2 when the input size is n.
The code given is not correct C++, and not even close to pseudocode.
The correct code should be like this:
int main()
{
int i,j,key;
int A[]={10,9,8,7,6,5,4,3,2,1};
//cout<<"Array before sorting:"<<endl;
//for(i=0;i<10;i++)
//cout<<A[i]<<"\t";
//cout<<endl;
for(i=1;i<10;i++)
{
key=A[i];
for(j=i-1;j>=0 && A[j]>key;j--)
{
A[j+1]=A[j];
}
A[j+1]=key;
}
//cout<<"Array after sorting:"<<endl;
//for(i=0;i<10;i++)
//cout<<A[i]<<"\t";
//cout<<endl;
}
See, insertion sort has two loops. The outer loop maintains the key variable, and the inner loop compares the elements before the key with the key. Therefore the worst-case time complexity is O(n^2) and not O(n): the basic insertion sort algorithm contains two loops, both of which iterate up to n times in the worst case, i.e. when the array is sorted in reverse order.

How to convert a simple computer algorithm into a mathematical function in order to determine the big o notation?

In my University we are learning Big O Notation. However, one question that I have in light of big o notation is, how do you convert a simple computer algorithm, say for example, a linear searching algorithm, into a mathematical function, say for example 2n^2 + 1?
Here is a simple and non-robust linear searching algorithm that I have written in c++11. Note: I have disregarded all header files (iostream) and function parameters just for simplicity. I will just be using basic operators, loops, and data types in order to show the algorithm.
int array[5] = {1,2,3,4,5};
// Variable to hold the value we are searching for
int searchValue;
// Ask the user to enter a search value
cout << "Enter a search value: ";
cin >> searchValue;
// Traverse the array and remember whether the search value was seen
bool found = false;
for (int i = 0; i < 5; i++)
{
    if (searchValue == array[i])
    {
        found = true;
        break;
    }
}
// Report the result once, after the loop has finished
if (found)
    cout << "Search Value Found!" << endl;
else
    cout << "Sorry... Search Value not found" << endl;
In conclusion, how do you translate an algorithm into a mathematical function so that we can analyze how efficient an algorithm really is using big o notation? Thanks world.
First, be aware that it's not always possible to analyze the time complexity of an algorithm; for some algorithms the complexity is not known, so we have to rely on experimental data.
All of the methods involve counting the number of operations performed. So first, we have to define the cost of basic operations such as assignment, memory allocation and control structures (if, else, for, ...). Here are the values I will use (working with different models can give different values):
Assignment takes constant time (e.g. int i = 0;)
Basic operations take constant time (+ - * /)
Memory allocation is proportional to the memory allocated: allocating an array of n elements takes linear time.
Conditions take constant time (if, else, else if)
Loops take time proportional to the number of times the code is run.
Basic analysis
The basic analysis of a piece of code is: count the number of operations for each line. Sum those costs. Done.
int i = 1;
i = i*2;
System.out.println(i);
For this, there is one operation on line 1, one on line 2 and one on line 3. Those operations are constant: This is O(1).
for(int i = 0; i < N; i++) {
System.out.println(i);
}
For a loop, count the number of operations inside the loop and multiply by the number of times the loop is run. There is one operation on the inside which takes constant time. This is run n times -> Complexity is n * 1 -> O(n).
for (int i = 0; i < N; i++) {
for (int j = i; j < N; j++) {
System.out.println(i+j);
}
}
This one is trickier because the second loop starts its iteration based on i. Line 3 does 2 operations (addition + print) which take constant time, so it takes constant time. Now, how many times line 3 is run depends on the value of i. Enumerate the cases:
When i = 0, j goes from 0 to N-1 so line 3 is run N times.
When i = 1, j goes from 1 to N-1 so line 3 is run N-1 times.
...
Now, summing all this we have to evaluate N + N-1 + N-2 + ... + 2 + 1. The result of the sum is N*(N+1)/2 which is quadratic, so complexity is O(n^2).
And that's how it works for many cases: count the number of operations, sum all of them, get the result.
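The sum for the nested loop can be cross-checked by letting the machine do the counting. This is a small sketch in C++ (the snippets above are Java-style; count_inner_runs is a name invented here) that replaces the print statement with a counter.

```cpp
#include <cstddef>

// Counts how many times the innermost statement of the triangular
// nested loop runs for a given N.
std::size_t count_inner_runs(std::size_t N) {
    std::size_t runs = 0;
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = i; j < N; ++j) {
            ++runs;  // stands in for the constant-time body
        }
    }
    return runs;
}
```

For N = 10 this returns 55 and for N = 100 it returns 5050, matching N*(N+1)/2.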
Amortized time
An important notion in complexity theory is amortized time. Let's take this example: running operation() n times:
for (int i = 0; i < N; i++) {
operation();
}
If one says that operation takes amortized constant time, it means that running n operations took linear time, even though one particular operation may have taken linear time.
Imagine you have an empty array with room for 1000 elements. Now, insert 1000 elements into it. Easy as pie, every insertion took constant time. And now, insert another element. For that, you have to create a new (bigger) array, copy the data from the old array into the new one, and insert element 1001. The first 1000 insertions took constant time, the last one took linear time. In this case, we say that all insertions took amortized constant time because the cost of that last insertion was amortized by the others.
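A rough way to convince yourself of the amortized bound is to simulate the growth and count the copies. The sketch below assumes the array doubles its capacity whenever it is full, which is a common policy but an assumption here (the text only says the new array is bigger); simulate_appends and GrowthCost are hypothetical names.

```cpp
#include <cstddef>

// Total element copies caused by regrowth, and the final capacity.
struct GrowthCost {
    std::size_t copies;
    std::size_t capacity;
};

// Simulates n appends to an array that starts with capacity 1 and
// doubles whenever it is full. Only the bookkeeping is simulated.
GrowthCost simulate_appends(std::size_t n) {
    std::size_t size = 0, capacity = 1, copies = 0;
    for (std::size_t k = 0; k < n; ++k) {
        if (size == capacity) {
            copies += size;  // every existing element moves to the new array
            capacity *= 2;
        }
        ++size;
    }
    return {copies, capacity};
}
```

For 1000 appends the regrowths copy 1 + 2 + 4 + ... + 512 = 1023 elements in total, fewer than 2 per append, even though the single most expensive append copied 512 elements.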
Make assumptions
In some other cases, getting the number of operations requires making hypotheses. A perfect example of this is insertion sort, because it is simple and its running time depends on how the data is ordered.
First, we have to make some more assumptions. Sorting involves two elementary operations, that is comparing two elements and swapping two elements. Here I will consider both of them to take constant time. Here is the algorithm where we want to sort array a:
for (int i = 0; i < a.length; i++) {
int j = i;
while (j > 0 && a[j] < a[j-1]) {
swap(a, j, j - 1);
j--;
}
}
First loop is easy. No matter what happens inside, it will run n times. So the running time of the algorithm is at least linear. Now, to evaluate the second loop we have to make assumptions about how the array is ordered. Usually, we try to define the best-case, worst-case and average case running time.
Best-case: We never enter the while loop. Is this possible? Yes. If a is a sorted array, then a[j] >= a[j-1] no matter what j is. Thus, we never enter the second loop. So the operations done in this case are the assignment on line 2 and the evaluation of the condition on line 3. Both take constant time. Because of the first loop, those operations are run n times. So in the best case, insertion sort is linear.
Worst-case: We leave the while loop only when we reach the beginning of the array. That is, we swap every element all the way to index 0, for every element in the array. It corresponds to an array sorted in reverse order. In this case, we end up with the first element being swapped 0 times, element 2 being swapped once, element 3 being swapped twice, and so on up to element n being swapped n-1 times. We already know the result of this sum: worst-case insertion sort is quadratic.
Average case: For the average case, we assume the items are randomly distributed inside the array. If you're interested in the maths, it involves probabilities and you can find the proof in many places. Result is quadratic.
Conclusion
Those were basics about analyzing the time complexity of an algorithm. The cases were easy, but there are some algorithms which aren't as nice. For example, you can look at the complexity of the pairing heap data structure which is much more complex.

find duplicate number in an array

I am debugging the problem below and am posting the solution I am working on. This solution, or something similar, is posted on a couple of forums, but I think it has a bug when num[0] = 0, or in general when num[x] = x. Am I correct? Please feel free to correct me if I am wrong.
Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.
Note:
You must not modify the array (assume the array is read only).
You must use only constant, O(1) extra space.
Your runtime complexity should be less than O(n^2).
There is only one duplicate number in the array, but it could be repeated more than once.
int findDuplicate3(vector<int>& nums)
{
if (nums.size() > 1)
{
int slow = nums[0];
int fast = nums[nums[0]];
while (slow != fast)
{
slow = nums[slow];
fast = nums[nums[fast]];
}
fast = 0;
while (fast != slow)
{
fast = nums[fast];
slow = nums[slow];
}
return slow;
}
return -1;
}
Below is my code which uses Floyd's cycle-finding algorithm:
#include <iostream>
#include <vector>
using namespace std;
int findDup(vector<int>&arr){
int len = arr.size();
if(len>1){
int slow = arr[0];
int fast = arr[arr[0]];
while(slow!=fast){
slow = arr[slow];
fast = arr[arr[fast]];
}
fast = 0;
while(slow!=fast){
slow = arr[slow];
fast = arr[fast];
}
return slow;
}
return -1;
}
int main() {
vector<int>v = {1,2,2,3,4};
cout<<findDup(v)<<endl;
return 0;
}
This works because zeroes aren't allowed, so the first element of the array isn't part of a cycle, and so the first element of the first cycle we find is referred to both outside and inside the cycle. If zeroes were allowed, this would fail if arr[0] were on a cycle. E.g., [0,1,1].
The sum of integers from 1 to N = (N * (N + 1)) / 2. You can use this to find the duplicate -- sum the integers in the array, then subtract the above formula from the sum. That's the duplicate.
Update: The above solution is based on the (possibly invalid) assumption that the input array consists of the values from 1 to N plus a single duplicate.
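For reference, the sum approach is only a few lines. This sketch (duplicate_by_sum is a name chosen here) relies on the same assumption the update mentions: the array holds each value from 1 to N exactly once, plus one extra copy of a single value.

```cpp
#include <numeric>
#include <vector>

// Sum-based duplicate finder.
// Assumption: nums holds 1..N once each plus exactly one extra copy of one
// value, so nums.size() == N + 1. The result is meaningless otherwise.
long long duplicate_by_sum(const std::vector<int>& nums) {
    const long long n = static_cast<long long>(nums.size()) - 1;
    const long long total = std::accumulate(nums.begin(), nums.end(), 0LL);
    return total - n * (n + 1) / 2;
}
```

For {1, 3, 4, 2, 2} this returns 2. For {2, 2, 2, 2, 2}, which the problem statement allows because the duplicate may be repeated more than once, it returns 0, so this approach does not cover the general problem.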
Start with two pointers to the first element: fast and slow.
Define a 'move' as advancing fast by 2 steps (positions) and slow by 1.
After each move, check if slow & fast point to the same node.
If there is a loop, at some point they will. This is because after they are both in the loop, fast is moving twice as quickly as slow and will eventually 'run into' it.
Say they meet after k moves. This is NOT NECESSARILY the repeated element, since it might not be the first element of the loop reached from outside the loop.
Call this element X.
Notice that fast has stepped 2k times, and slow has stepped k times.
Move fast back to zero.
Repeatedly advance fast and slow by ONE STEP EACH, comparing after each step.
Notice that after another k steps, slow will have moved a total of 2k steps and fast a total of k steps from the start, so they will again both be pointing to X.
Notice that if the prior step is on the loop for both of them, they were both pointing to X-1. If the prior step was only on the loop for slow, then they were pointing to different elements.
Ditto for X-2, X-3, ...
So in going forward, the first time they are pointing to the same element is the first element of the cycle reached from outside the cycle, which is the repeated element you're looking for.
Since you cannot use any additional space, using another hash table would be ruled out.
Now, coming to the approach of hashing on the existing array, it can be achieved if we are allowed to modify the array in place.
Algo:
1) Start with the first element.
2) Hash the first element and apply a transformation to the value of the hash. Let's say this transformation is making the value negative.
3) Proceed to the next element. Hash the element and, before applying the transformation, check whether a transformation has already been applied.
4) If yes, then the element is a duplicate.
Code:
for(i = 0; i < size; i++)
{
if(arr[abs(arr[i])] > 0)
arr[abs(arr[i])] = -arr[abs(arr[i])];
else
cout<< abs(arr[i]) <<endl;
}
This transformation is required because, if we are to use a hashing approach, there has to be a collision when hashing the same key.
I can't think of a way in which hashing can be used without any additional space and without modifying the array.

why is Insertion sort best case big O complexity O(n)?

Following is my insertion sort code:
void InsertionSort(vector<int> & ioList)
{
int n = ioList.size();
for (int i = 1 ; i < n ; ++i)
{
for (int j = 0 ; j <= i ; ++j)
{
//Shift elements if needed(insert at correct loc)
if (ioList[j] > ioList[i])
{
int temp = ioList[j];
ioList[j] = ioList[i];
ioList[i] = temp;
}
}
}
}
The average complexity of the algorithm is O(n^2).
From my understanding of big O notation, this is because we run two loops in this case (the outer one n-1 times and the inner one 1,2,...n-1 = n(n-1)/2 times), and thus the resulting asymptotic complexity of the algorithm is O(n^2).
Now I have read that best case is the case when the input array is already sorted.
And the big O complexity of the algorithm is O(n) in such a case. But I fail to understand how this is possible as in both cases (average and best case) we have to run the loops the same number of times and have to compare the elements. The only thing that is avoided is the shifting of elements.
So does complexity calculation also involve a component of this swapping operation?
Yes, this is because your implementation is incorrect. The inner loop should count backward from i-1 down to 0, and it should terminate as soon as it finds an element ioList[j] that is already smaller than ioList[i].
It is because of that termination criterion that the algorithm performs in O(n) time in the best case:
If the input list is already sorted, the inner loop will terminate immediately for any i, i.e. the number of computational steps performed ends up being proportional to the number of times the outer loop is performed, i.e. O(n).
Your implementation of "insertion sort" is poor.
In your inner loop, you should not scan all the way up to i-1 swapping each element greater than ioList[i]. Instead, you should scan backwards from i-1 until you find the correct place to insert the new element (that is, until you find an element less than or equal to the new element), and insert it there. If the input is already sorted, then the correct insertion point is always found immediately, and so the inner loop does not execute i-1 times, it only executes once.
Your sort is also worse than insertion sort on average, since you always do i+1 operations for each iteration of the outer loop -- some of those ops are just a comparison, and some are a comparison followed by a swap. An insertion sort only needs to do on average half that, since for random/average input, the correct insertion point is half way through the initial sorted segment. It's also possible to avoid swaps, so that each operation is a comparison plus a copy.
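A version along the lines both answers describe might look like the sketch below. It scans backward from i-1, stops at the first element that is not greater than the one being inserted, and uses copies instead of swaps. The comparison counter and the name insertion_sort_comparisons are additions for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort with a backward scan and early termination.
// Returns the number of element comparisons performed.
std::size_t insertion_sort_comparisons(std::vector<int>& ioList) {
    std::size_t comparisons = 0;
    for (std::size_t i = 1; i < ioList.size(); ++i) {
        const int key = ioList[i];
        std::size_t j = i;
        while (j > 0) {
            ++comparisons;
            if (ioList[j - 1] <= key) break;  // insertion point found
            ioList[j] = ioList[j - 1];        // shift the larger element right
            --j;
        }
        ioList[j] = key;
    }
    return comparisons;
}
```

On an already sorted list of n elements the inner loop stops after one comparison each time, giving n-1 comparisons in total. On a reversed list it performs 1 + 2 + ... + (n-1) = n(n-1)/2 comparisons.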