understanding algorithmic complexity


I'm looking at some online algorithm solutions for coding interviews, and I don't understand why this algorithm is claimed to be O(n^3).
Caveat: I understand that big-Oh notation is abused in industry, and when I refer to O(n), I'm using that notation to mean the upper bound of an algorithm's runtime, as is common outside of academia in most places.
Finding the longest palindromic substring. A simple solution might be:
#include <cstddef>
#include <string>
bool isPalindrome(std::string s) {
    if (s.length() <= 1) {
        return true;
    }
    if (s[0] == s[s.length() - 1]) {
        return isPalindrome(s.substr(1, s.length() - 2));
    } else {
        return false;
    }
}
std::string longestPalindrome(std::string s) {
    std::string max_pal = "";
    for (std::size_t i = 0; i < s.length(); ++i) {
        for (std::size_t len = 1; len <= s.length() - i; ++len) {
            std::string sub = s.substr(i, len);
            if (isPalindrome(sub)) {
                if (max_pal.size() < sub.size()) max_pal = sub;
            }
        }
    }
    return max_pal;
}
Isn't this algorithm O(n^2)? Very simply, it takes O(n^2) time to generate all substrings and O(n) time to determine whether each one is a palindrome, where n is the number of characters in the initial string.

> Isn't this algorithm O(n^2)? Very simply, it takes O(n^2) time to generate all substrings, and O(n) time to determine if it's a palindrome.
What you are describing is exactly O(n^3): for each substring you do an operation that costs O(n), so the total number of operations is O(n^2 * C*n), which is O(n^3).
However, the code as written is actually O(n^4), because isPalindrome() itself is O(n^2):
each recursive call creates a new substring, so one check of a length-n string creates O(n) substrings of sizes n-2, n-4, ..., down to 1 or 0, which takes O(n^2) time in total.
Doing this O(n^2) times in longestPalindrome() totals O(n^4) in the worst case (for example a string of identical characters, where every check recurses all the way down).
(This assumes substr() is O(n). Its complexity is not explicitly specified, but that is usually the case.)
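To make the O(n^2) cost of a single check concrete, here is a C++ sketch: an instrumented copy of the question's isPalindrome(). The name isPalindromeCounting and the `copied` counter are hypothetical additions for illustration. On a string of identical characters every level recurses, and the substr() copies add up to (n-2) + (n-4) + ..., which is roughly n^2/4 characters.

```cpp
#include <cstddef>
#include <string>

// Same recursion as the question's isPalindrome(), but it also adds up
// how many characters the substr() calls copy along the way.
bool isPalindromeCounting(const std::string& s, std::size_t& copied) {
    if (s.length() <= 1) return true;
    if (s[0] != s[s.length() - 1]) return false;
    std::string inner = s.substr(1, s.length() - 2);  // O(length) copy
    copied += inner.length();
    return isPalindromeCounting(inner, copied);
}
```

For 10 identical characters the copies are 8 + 6 + 4 + 2 + 0 = 20 characters. For 100 they are 2450, about four times as many for twice the length, which is quadratic growth.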

You are almost right:
it takes O(n^2) operations to generate the strings and O(n) operations to check each one.
Thus you need O(n^2) (the number of strings) times O(n) (the cost of one check).
Since n^2 * n = n^3, the total run time is in O(n^3).

isPalindrome() is O(n^2) (each substr() call inside it turns out to be O(n) itself), and it is executed inside a double loop that runs O(n^2) times. That gives us O(n^4).

Actually this would even be O(N^4), because of the barbarity of the implementation.
isPalindrome is implemented in such a way that every recursive invocation allocates a new string, which is essentially the source string with its first and last chars removed. So every such call is already O(n).
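One way to get back to the O(n^3) the question expected is to stop copying. The sketch below checks a range of the original string in place, so each check is O(n) and the O(n^2) candidate ranges give O(n^3) overall. The helper name isPalindromeRange is hypothetical and does not come from the question. This is still the same brute force, not a faster algorithm.

```cpp
#include <cstddef>
#include <string>

// Checks s[lo..hi] (inclusive) without allocating any substrings,
// so one check costs O(hi - lo) instead of O((hi - lo)^2).
bool isPalindromeRange(const std::string& s, std::size_t lo, std::size_t hi) {
    while (lo < hi) {
        if (s[lo] != s[hi]) return false;
        ++lo;
        --hi;
    }
    return true;
}

// Same brute force as the question: O(n^2) candidate ranges, O(n) per
// check, O(n^3) overall. The result is copied out only once, at the end.
std::string longestPalindrome(const std::string& s) {
    std::size_t bestStart = 0, bestLen = 0;
    for (std::size_t i = 0; i < s.length(); ++i) {
        for (std::size_t len = 1; len <= s.length() - i; ++len) {
            if (isPalindromeRange(s, i, i + len - 1) && len > bestLen) {
                bestStart = i;
                bestLen = len;
            }
        }
    }
    return s.substr(bestStart, bestLen);
}
```

Like the original, it keeps the first longest palindrome it finds. For "babad" it returns "bab".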

Related

What's the time and space complexity of this function to check valid parentheses?

I am unable to work out the time complexity and space complexity of this code that checks for valid parentheses. Can anyone help, please?
Code:
#include <string>
using std::string;
bool isValid(string s)
{
    int s_size = s.length();
    for (int i = 0; i < s_size - 1; i++)
    {
        if ((s[i] == '(' && s[i+1] == ')') || (s[i] == '[' && s[i+1] == ']') || (s[i] == '{' && s[i+1] == '}'))
        {
            s.erase(i, 2);
            i = -1;  // the loop's i++ brings the scan back to index 0
            s_size -= 2;
        }
    }
    if (s.empty())
        return true;
    else
        return false;
}
I think the space complexity is O(1) and the time complexity is O(n). Please correct me if I am wrong.

For just the code snippet you posted above, the time complexity is O(n^2).
As rightly pointed out by n. 'pronouns' m., the outer for loop for (int i = 0; i < s_size-1;i++) can run O(n^2) times in total. This is because i resets to the start every time a match is found (up to n/2 times), and each pass can scan O(n) characters. The s.erase() function also takes O(n) time, but it runs only once per match, so at most n/2 times, which adds another O(n^2). In total that makes for a quadratic, O(n^2), time complexity.
You can read up on the std::string::erase() function in its documentation.
You are right about the space complexity: it's O(1).
To solve the valid parentheses problem efficiently, i.e. in O(n) time and O(n) space, consider using a stack.
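A minimal stack-based sketch of the approach suggested above, in C++ like the question. It assumes the input should contain only the six bracket characters and rejects anything else, which matches what the erase-based version returns for other characters. Each character is pushed or popped at most once, so it runs in O(n) time with O(n) extra space for the stack.

```cpp
#include <stack>
#include <string>

bool isValid(const std::string& s) {
    std::stack<char> open;  // openers still waiting for their closer
    for (char c : s) {
        if (c == '(' || c == '[' || c == '{') {
            open.push(c);
            continue;
        }
        char expected;  // the opener this closer must match
        if (c == ')') expected = '(';
        else if (c == ']') expected = '[';
        else if (c == '}') expected = '{';
        else return false;  // not a bracket at all
        // A closer must match the most recent unmatched opener.
        if (open.empty() || open.top() != expected) return false;
        open.pop();
    }
    return open.empty();  // every opener was closed
}
```

For example, isValid("([{}])") is true, while isValid("([)]") is false because ')' arrives while '[' is on top of the stack.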

How to calculate time complexity?

I'm really having trouble calculating big O. I get the basics, but when it gets to nested for loops and all that, my mind just blanks out. I was asked to write down the complexity of the following algorithm, and I have no clue how to do it. The input string contains only A, B, C and D.
#include <string>
using std::string;
string solution(string &S) {
    int length = S.length();
    int i = 0;
    while (i < length - 1)
    {
        if ((S[i] == 'A' && S[i+1] == 'B') || (S[i] == 'B' && S[i+1] == 'A'))
        {
            S.erase(i, 2);
            i = 0;  // restart the scan from the beginning
            length = S.length();
            continue;  // skip i++ so index 0 is checked again
        }
        if ((S[i] == 'C' && S[i+1] == 'D') || (S[i] == 'D' && S[i+1] == 'C'))
        {
            S.erase(i, 2);
            i = 0;
            length = S.length();
            continue;
        }
        i++;
    }
    return S;
}
What would the big O of this algorithm be?

It is O(n^2).
DDDDDDDDDDDDDDDDDDDABABABABABABABABABABABAB
In this input, the first n/2 characters are D and the last n/2 characters are AB pairs.
For each AB (there are n/4 of them), the code does O(n) work twice:
resetting i means the scan starts again from the beginning and walks past all the D's;
erase() shifts all the following characters left to fill the gap it creates.
Total:
O(n)*(O(n) + O(n)) = O(n^2)
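The argument above can be checked by counting loop iterations on that kind of input. This is a C++ sketch under some assumptions. The names countLoopIterations and makeAdversarialInput are hypothetical. The reduction is restructured so that each iteration removes at most one AB/BA/CD/DC pair and then restarts at index 0. With n/2 D's followed by n/4 AB pairs (n = 4t), every removal costs n/2 + 1 iterations, so the total is 2t^2 + 3t - 1. That is quadratic, and it does not even include the shifting done by erase().

```cpp
#include <cstddef>
#include <string>

// Counts how many times the scan loop body runs while pairs are removed.
std::size_t countLoopIterations(std::string s) {
    std::size_t iterations = 0;
    std::size_t i = 0;
    while (i + 1 < s.length()) {
        ++iterations;
        const char a = s[i], b = s[i + 1];
        if ((a == 'A' && b == 'B') || (a == 'B' && b == 'A') ||
            (a == 'C' && b == 'D') || (a == 'D' && b == 'C')) {
            s.erase(i, 2);  // O(n): shifts every later character left by two
            i = 0;          // restart the scan from the beginning
        } else {
            ++i;
        }
    }
    return iterations;
}

// Builds 2t 'D's followed by t "AB" pairs, i.e. an input of length n = 4t.
std::string makeAdversarialInput(std::size_t t) {
    std::string s(2 * t, 'D');
    for (std::size_t j = 0; j < t; ++j) s += "AB";
    return s;
}
```

Doubling the input length from 400 to 800 raises the count from 20299 to 80599, roughly four times as much.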

It's easy to get hung up about the precise detail of how efficient an algorithm is. Fundamentally though, all you're concerned about is whether the operation is:
Constant time
Proportional to the number of elements
Proportional to the square of the number of elements
etc...
Look at this for guidance on how to estimate the Big-O for a compound operation:
https://hackernoon.com/big-o-for-beginners-622a64760e2
The big-O essentially defines the worst-case complexity of a method, with particular regard to effects that would be observed with very large n. On the face of it you would consider how many times you repeat an operation, but you also need to consider whether any methods called inside it (e.g. string erase, string length) have complexity that's "constant time", "proportional to the number of elements", "proportional to the square of the number of elements" and so on.
So if your outer loop performs n scans but also invokes methods that themselves scan up to every item, you end up with O(n^2).
The main concern is the highest exponent. You could have a very time-consuming linear-complexity operation and also a very fast, say, power-of-4 element. In such a case, it's considered to be O(n^4) (as opposed to O(20000n + n^4)) because as n tends to infinity, all of the lower-exponent terms become insignificant. See here: https://en.wikipedia.org/wiki/Big_O_notation#Properties
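As a worked instance of that property, the example can be bounded directly. This uses the standard definition: f(n) is in O(g(n)) if f(n) <= c*g(n) for some constant c > 0 and all n >= n0.

```latex
20000n + n^4 \;\le\; 20000n^4 + n^4 \;=\; 20001\,n^4 \qquad \text{for all } n \ge 1,
\quad\text{so}\quad 20000n + n^4 \in O(n^4) \text{ with } c = 20001,\ n_0 = 1.
```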
So in your case, you have the following loops:
Repetition of the scan (setting i=0), whose frequency is proportional to the number of matches (worst case n for argument's sake; even if it's a fraction of n, it remains significant as n becomes infinite). Although this is not literally the outer loop, it fundamentally governs how many times the other scans are performed.
String scan whose frequency is proportional to the length (n), PLUS the loop embedded in string erase (n in the worst case). Note that these operations are performed one after the other, and together they are governed by the frequency of the aforementioned repetition. As stated elsewhere, O(n)+O(n) reduces to O(n) because we only care about the exponent.
So in this case the complexity is O(n^2)
A separate consideration when assessing the performance of any algorithm is how cache-friendly it is. Algorithms using hashmaps, linked lists, etc. are considered prima facie to be more efficient, but in some cases an O(n^2) algorithm that operates within a cache line and causes neither page faults nor cache flushes can execute a lot faster than a supposedly more efficient algorithm that has its memory scattered all over the place.

I guess this would be O(n) because there is one loop that's going through the string.
The longer the string, the more time it takes, so I would say O(n).

In big O notation, you give the answer for the worst case. Here the worst case will be that the string does not satisfy any if statements. Then time complexity here will be O(n) because there is only one loop.

I don't understand the shell sort complexity with shell gap 8,4,2,1 [duplicate]

First, here's my Shell sort code (using Java):
public char[] shellSort(char[] chars) {
    int n = chars.length;
    int increment = n / 2;
    while (increment > 0) {
        int last = increment;
        while (last < n) {
            int current = last - increment;
            while (current >= 0) {
                if (chars[current] > chars[current + increment]) {
                    // swap
                    char tmp = chars[current];
                    chars[current] = chars[current + increment];
                    chars[current + increment] = tmp;
                    current -= increment;
                }
                else { break; }
            }
            last++;
        }
        increment /= 2;
    }
    return chars;
}
Is this a correct implementation of Shell sort (forgetting for now about the most efficient gap sequence - e.g., 1,3,7,21...)? I ask because I've heard that the best-case time complexity for Shell Sort is O(n). (See http://en.wikipedia.org/wiki/Sorting_algorithm). I can't see this level of efficiency being realized by my code. If I added heuristics to it, then yeah, but as it stands, no.
That being said, my main question now: I'm having difficulty calculating the Big O time complexity of my Shell sort implementation. I identified the outermost loop as O(log n), the middle loop as O(n), and the innermost loop also as O(n). But I realize the inner two loops would not actually be O(n); they would be much less than this. What should they be? Obviously this algorithm runs much more efficiently than O((log n) n^2).
Any guidance is much appreciated as I'm very lost! :P

The worst-case of your implementation is Θ(n^2) and the best-case is O(nlogn) which is reasonable for shell-sort.
The best case ∊ O(nlogn):
The best case is when the array is already sorted. That would mean the inner if statement is never true, making the inner while loop a constant-time operation. Using the bounds you've used for the other loops gives O(nlogn). The best case of O(n) is reached by using a constant number of increments.
The worst case ∊ O(n^2):
Given your upper bound for each loop you get O((log n)n^2) for the worst-case. But add another variable for the gap size g. The number of compare/exchanges needed in the inner while is now <= n/g. The number of compare/exchanges of the middle while is <= n^2/g. Add the upper-bound of the number of compare/exchanges for each gap together: n^2 + n^2/2 + n^2/4 + ... <= 2n^2 ∊ O(n^2). This matches the known worst-case complexity for the gaps you've used.
The worst case ∊ Ω(n^2):
Consider the array where all the even positioned elements are greater than the median. The odd and even elements are not compared until we reach the last increment of 1. The number of compare/exchanges needed for the last iteration is Ω(n^2).
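The best-case argument can be checked with a C++ port of the question's Java code that counts comparisons. The name shellSortCountingComparisons and the counter are hypothetical. Because the index is unsigned, the Java test `current >= 0` becomes a check before subtracting the gap. On already-sorted input each position does exactly one comparison per gap, so the total is the sum of (n - gap) over the gaps n/2, n/4, ..., 1, which is about n log2 n.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Shell sort with gaps n/2, n/4, ..., 1 (same structure as the Java version),
// returning the number of element comparisons it performed.
std::size_t shellSortCountingComparisons(std::vector<char>& a) {
    std::size_t comparisons = 0;
    const std::size_t n = a.size();
    for (std::size_t gap = n / 2; gap > 0; gap /= 2) {
        for (std::size_t last = gap; last < n; ++last) {
            std::size_t current = last - gap;
            while (true) {  // gapped insertion step
                ++comparisons;
                if (a[current] > a[current + gap]) {
                    std::swap(a[current], a[current + gap]);
                    if (current < gap) break;  // Java's current -= gap would go negative
                    current -= gap;
                } else {
                    break;
                }
            }
        }
    }
    return comparisons;
}
```

For a sorted array of 8 elements this gives 4 + 6 + 7 = 17 comparisons. For 16 elements it gives 8 + 12 + 14 + 15 = 49.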

Insertion Sort
If we analyse this insertion sort:
static void sort(int[] ary) {
    int i, j, insertVal;
    int aryLen = ary.length;
    for (i = 1; i < aryLen; i++) {
        insertVal = ary[i];
        j = i;
        /*
         * while loop exits as soon as it finds a left-hand element that is not greater than insertVal
         */
        while (j >= 1 && ary[j - 1] > insertVal) {
            ary[j] = ary[j - 1];
            j--;
        }
        ary[j] = insertVal;
    }
}
Hence, in the average case, the while loop exits about halfway,
i.e. 1/2 + 2/2 + 3/2 + 4/2 + ... + (n-1)/2 = n(n-1)/4 = Theta(n^2)
So here we get roughly (n^2)/4, even though the constant factor makes no difference to the Theta bound.
Shell sort is nothing but insertion sort using gaps like n/2, n/4, n/8, ..., 2, 1.
That means it takes advantage of the best case of insertion sort (the while loop exiting early), which starts happening very quickly as soon as we find a small element to the left of the element being inserted, and this adds up over the total execution time.
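The counting argument above can be checked with a small C++ port of the insertion sort that counts how often the inner while loop shifts an element. The function name is hypothetical. On reverse-sorted input every element shifts all the way left, giving the worst case 1 + 2 + ... + (n-1) = n(n-1)/2 shifts. The average case described above stops about halfway, giving roughly half that. Both are Theta(n^2).

```cpp
#include <cstddef>
#include <vector>

// Insertion sort (same logic as the Java version) returning the number of shifts.
std::size_t insertionSortCountingShifts(std::vector<int>& ary) {
    std::size_t shifts = 0;
    for (std::size_t i = 1; i < ary.size(); ++i) {
        const int insertVal = ary[i];
        std::size_t j = i;
        // Stops as soon as the element on the left is not greater than insertVal.
        while (j >= 1 && ary[j - 1] > insertVal) {
            ary[j] = ary[j - 1];
            --j;
            ++shifts;
        }
        ary[j] = insertVal;
    }
    return shifts;
}
```

Sorting {5, 4, 3, 2, 1} takes 10 = 5*4/2 shifts. Sorting the result again takes 0, which is the best case.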
n/2 + n/4 + n/8 + n/16 + .... + n/n = n(1/2 + 1/4 + 1/8 + 1/16 + ... + 1/n) = nlogn (Harmonic Series)
Hence its time complexity is something close to n(log n)^2.

Algorithm analysis: Am I analyzing these algorithms correctly? How to approach problems like these [closed]

Closed 10 years ago.
1)
int x = 25;
for (int i = 0; i < myArray.length; i++)
{
    if (myArray[i] == x)
        System.out.println("found!");
}
I think this one is O(n).
2)
for (int r = 0; r < 10000; r++)
    for (int c = 0; c < 10000; c++)
        if (c % r == 0)
            System.out.println("blah!");
I think this one is O(1), because for any input n, it will run 10000 * 10000 times. Not sure if this is right.
3)
int a = 0;
for (int i = 0; i < k; i++)
{
    for (int j = 0; j < i; j++)
        a++;
}
I think this one is O(i * k). I don't really know how to approach problems like this where the inner loop is affected by variables being incremented in the outer loop. Some key insights here would be much appreciated. The outer loop runs k times, and the inner loop runs 1 + 2 + 3 + ... + k times. So that sum should be (k/2) * (k+1), which would be order of k^2. So would it actually be O(k^3)? That seems too large. Again, don't know how to approach this.
4)
int key = 0; // key may be any value
int first = 0;
int last = intArray.length - 1;
int mid = 0;
boolean found = false;
while ((!found) && (first <= last))
{
    mid = (first + last) / 2;
    if (key == intArray[mid])
        found = true;
    if (key < intArray[mid])
        last = mid - 1;
    if (key > intArray[mid])
        first = mid + 1;
}
This one, I think is O(log n). But, I came to this conclusion because I believe it is a binary search and I know from reading that the runtime is O(log n). I think it's because you divide the input size by 2 for each iteration of the loop. But, I don't know if this is the correct reasoning or how to approach similar algorithms that I haven't seen and be able to deduce that they run in logarithmic time in a more verifiable or formal way.
5)
int currentMinIndex = 0;
for (int front = 0; front < intArray.length; front++)
{
    currentMinIndex = front;
    for (int i = front; i < intArray.length; i++)
    {
        if (intArray[i] < intArray[currentMinIndex])
        {
            currentMinIndex = i;
        }
    }
    int tmp = intArray[front];
    intArray[front] = intArray[currentMinIndex];
    intArray[currentMinIndex] = tmp;
}
I am confused about this one. The outer loop runs n times, and the inner for loop runs
n + (n-1) + (n-2) + ... + 1 times? So is that O(n^3)?

More or less, yes.
1 is correct: it seems you are searching for a specific element in what I assume is an unsorted collection. If so, the worst case is that the element is at the very end of the list, hence O(n).
2 is correct, though a bit strange. It is O(1) because the loop bounds are constants rather than variables, so there is no input size for the running time to depend on.
3 I believe is still considered O(n^2), with n = k here. The count works out to a constant factor times k^2 plus lower-order terms; drop those and you get O(k^2).
4 looks a lot like a binary search algorithm for a sorted collection. O(logn) is correct. It is log because at each iteration you are essentially halving the number of possible places where the element you are looking for could be.
5 is a selection sort, O(n^2), for similar reasons to 3.
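The counts behind items 3 and 5 can be checked with a small C++ sketch. The function names are hypothetical, and the loops mirror the question's Java. In item 3 the inner loop runs i times for i = 0, 1, ..., k-1, so a++ runs 0 + 1 + ... + (k-1) = k(k-1)/2 times. In item 5 the inner loop runs n, n-1, ..., 1 times, for n(n+1)/2 comparisons. Both are quadratic, not cubic: the outer loop's repetitions are already part of the sum, so they must not be multiplied in a second time.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Item 3: returns how many times a++ executes.
long long countNestedIncrements(int k) {
    long long a = 0;
    for (int i = 0; i < k; i++)
        for (int j = 0; j < i; j++)
            a++;
    return a;
}

// Item 5 (selection sort): returns how many comparisons the inner loop makes.
long long selectionSortCountingComparisons(std::vector<int>& v) {
    long long comparisons = 0;
    const std::size_t n = v.size();
    for (std::size_t front = 0; front < n; front++) {
        std::size_t currentMinIndex = front;
        for (std::size_t i = front; i < n; i++) {
            ++comparisons;
            if (v[i] < v[currentMinIndex]) currentMinIndex = i;
        }
        std::swap(v[front], v[currentMinIndex]);
    }
    return comparisons;
}
```

For k = 10 the first function returns 45 = 10*9/2. A 3-element array takes 3 + 2 + 1 = 6 comparisons.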

O() doesn't mean anything by itself: you need to specify whether you are counting the "worst-case" O or the average-case O. Some sorting algorithms (quicksort, for example) are O(n log n) on average but O(n^2) in the worst case.
Basically you need to count the overall number of iterations of the most inner loop, and take the biggest component of the result without any constant (for example if you have k*(k+1)/2 = 1/2 k^2 + 1/2 k, the biggest component is 1/2 k^2 therefore you are O(k^2)).
For example, your item 4) is in O(log(n)) because, if you work on an array of size n, then you will run one iteration on this array, and the next one will be on an array of size n/2, then n/4, ..., until this size reaches 1. So it is log(n) iterations.

Your question is mostly about the definition of O().
When someone say this algorithm is O(log(n)), you have to read:
When the input parameter n becomes very big, the number of operations performed by the algorithm grows at most in log(n)
Now, this means two things:
You have to have at least one input parameter n. There is no point in talking about O() without one (as in your case 2).
You need to define the operations that you are counting. These can be additions, comparison between two elements, number of allocated bytes, number of function calls, but you have to decide. Usually you take the operation that's most costly to you, or the one that will become costly if done too many times.
So keeping this in mind, back to your problems:
1) n is myArray.length, and the operation you're counting is '=='. In that case the answer is exactly n, which is O(n).
2) You can't specify an n.
3) The n can only be k, and the operation you count is ++. You have exactly k*(k-1)/2 (the inner loop runs 0 + 1 + ... + (k-1) times), which is O(k^2) as you say.
4) This time n is the length of your array again, and the operation you count is ==. In this case, the number of operations depends on the data. Usually we talk about the 'worst-case scenario', meaning that of all the possible outcomes, we look at the one that takes the most time. At best, the algorithm takes one comparison. For the worst case, let's take an example. If the array is [1,2,3,4,5,6,7,8,9] and you are looking for 4, intArray[mid] will successively be 5, 2, 3 and then 4, so you do the comparison 4 times. In fact, for an array whose size is 2^k - 1, the maximum number of comparisons is k (you can check). So n = 2^k - 1 => k = ln(n+1)/ln(2). You can extend this result to the case when n is not of the form 2^k - 1, and you will get complexity = O(ln(n)).
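The "you can check" step in 4) can be done mechanically. This C++ sketch (hypothetical function names) runs the same loop as question 4 and counts iterations. It then takes the maximum over every key present in the sorted array 1..n. The midpoint is computed as first + (last - first) / 2, which equals (first + last) / 2 here but cannot overflow.

```cpp
#include <vector>

// Same loop as question 4; returns how many iterations were needed.
int binarySearchIterations(const std::vector<int>& a, int key) {
    int first = 0;
    int last = static_cast<int>(a.size()) - 1;
    int iterations = 0;
    while (first <= last) {
        ++iterations;
        int mid = first + (last - first) / 2;
        if (key == a[mid]) break;
        if (key < a[mid]) last = mid - 1;  // key can only be in the left half
        else first = mid + 1;              // key can only be in the right half
    }
    return iterations;
}

// Largest iteration count over all keys present in the sorted array 1..n.
int worstCaseIterations(int n) {
    std::vector<int> a(n);
    for (int i = 0; i < n; ++i) a[i] = i + 1;
    int worst = 0;
    for (int key = 1; key <= n; ++key) {
        int it = binarySearchIterations(a, key);
        if (it > worst) worst = it;
    }
    return worst;
}
```

For n = 7, 15 and 1023 (all of the form 2^k - 1) the worst case is 3, 4 and 10 iterations. For the 9-element example, searching for 4 takes 4 iterations.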
In any case, I think you are confused because you don't exactly know what O(n) means. I hope this is a start.

Rabin-Karp Algorithm

I am interested in implementing the Rabin-Karp algorithm to search for sub strings as stated on wiki: http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm. Not for homework, but for self-interest. I have implemented the Rabin-Karp algorithm (shown below) and it works. However, the performance is really, really bad!!! I understand that my hash function is basic. However, it seems that a simple call to strstr() will always outperform my function rabin_karp(). I can understand why - the hash function is doing more work than a simple char-by-char compare each loop. What am I missing here? Should the Rabin-Karp algorithm be faster than a call to strstr()? When is the Rabin-Karp algorithm best used? Hence my self-interest. Have I even implemented the algorithm right?
#include <stddef.h>
#include <string.h>
size_t hash(char* str, size_t i)
{
    size_t h = 0;
    size_t magic_exp = 1;
    while (i-- != 0)
    {
        magic_exp *= 101;
        h += magic_exp + *str;
        ++str;
    }
    return h;
}
char* rabin_karp(char* s, char* find)
{
    char* p = NULL;
    if (s != NULL && find != NULL)
    {
        size_t n = strlen(s);
        size_t m = strlen(find);
        if (n >= m)  /* >= so that a pattern as long as the text can still match */
        {
            size_t hfind = hash(find, m);
            char* end = s + (n - m + 1);
            for (char* i = s; i < end; ++i)
            {
                size_t hs = hash(i, m);  /* recomputed from scratch: O(m) per position */
                if (hs == hfind)
                {
                    if (strncmp(i, find, m) == 0)
                    {
                        p = i;
                        break;
                    }
                }
            }
        }
    }
    return p;
}

You haven't implemented the hash correctly. The key to Rabin-Karp is to incrementally update the hash as the potential match moves along the string to be searched. As you've determined, if you recalculate the entire hash for each potential match position, things will be really slow.
For every case except for the first comparison, your hash function should take an existing hash, one new character, and one old character, and return an updated hash.

Rabin-Karp is a rolling hash algorithm: the idea is to be able to move the substring one position in either direction (left or right) and recompute the hash with a constant number of operations. As you have implemented it, the search has complexity O(N * L), where N is the length of the big string and L is the length of the string you are searching for. This is the complexity of the most naive approach, and in my opinion it is in fact a slight pessimization of it.
To improve your algorithm, precompute the powers of the base used in magic_exp and use them to 'roll' your hash. Basically, just as with polynomials, you need to subtract the highest-degree term, multiply by the base, and add the hash of the symbol on the right (for moving the hash to the right).
Hope this helps.
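Here is a minimal C++ sketch of the rolling update described above. It assumes a polynomial hash with base 101 (the multiplier from the question) computed on std::uint64_t, so everything is reduced modulo 2^64 by unsigned wrap-around rather than modulo a prime. The name rabinKarp and the std::string interface are illustrative, not the question's C signature. A hash match is always confirmed with a direct comparison, because different strings can share a hash.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Returns the index of the first occurrence of `find` in `s`, or std::string::npos.
std::size_t rabinKarp(const std::string& s, const std::string& find) {
    const std::size_t n = s.size(), m = find.size();
    if (m == 0) return 0;
    if (m > n) return std::string::npos;

    const std::uint64_t base = 101;
    std::uint64_t highPow = 1;  // base^(m-1): weight of the character leaving the window
    for (std::size_t k = 1; k < m; ++k) highPow *= base;

    // Hash of the pattern and of the first window, both computed once in O(m).
    std::uint64_t hFind = 0, hWin = 0;
    for (std::size_t k = 0; k < m; ++k) {
        hFind = hFind * base + static_cast<unsigned char>(find[k]);
        hWin = hWin * base + static_cast<unsigned char>(s[k]);
    }

    for (std::size_t i = 0; ; ++i) {
        // Equal hashes are only a hint; confirm with a direct comparison.
        if (hWin == hFind && s.compare(i, m, find) == 0) return i;
        if (i + m >= n) break;  // the window starting at i was the last one
        // Roll in O(1): drop s[i], shift everything up one power, add s[i + m].
        hWin = (hWin - static_cast<unsigned char>(s[i]) * highPow) * base
               + static_cast<unsigned char>(s[i + m]);
    }
    return std::string::npos;
}
```

Each shift of the window costs O(1) hash work instead of O(L). Apart from verifying hash matches, the search is therefore O(N + L), although a hash with many collisions can still push it back toward O(N * L).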

strstr is typically implemented with a linear-time algorithm such as KMP, so the complexity of the two algorithms is approximately the same. From then on, the constant factor is the important one. Especially when you have a bad hash function with a lot of collisions, the linear-time string search will be a lot faster.
EDIT: One more thing. It is very important for the Rabin-Karp algorithm to have all the hash codes of the prefixes precalculated. Right now you are not implementing proper Rabin-Karp, because the calls to your hash function are linear, not constant, in complexity. (Which, by the way, means that Wikipedia is not a very good source to learn Rabin-Karp from.)
