Why can the KMP failure function be computed in O(n) time? - c++

Wikipedia claims that the failure function table can be computed in O(n) time.
Let's look at its "canonical" implementation (in C++):
vector<int> prefix_function(string s) {
    int n = (int) s.length();
    vector<int> pi(n);
    for (int i = 1; i < n; ++i) {
        int j = pi[i-1];
        while (j > 0 && s[i] != s[j])
            j = pi[j-1];
        if (s[i] == s[j]) ++j;
        pi[i] = j;
    }
    return pi;
}
Why does it run in O(n) time, even though there is an inner while-loop? I'm not really strong at the analysis of algorithms, so could somebody explain it?

This line: if (s[i] == s[j]) ++j; is executed at most O(n) times, since it runs at most once per iteration of the outer loop.
It is the only thing that increases the value of j. Note that j starts each iteration at pi[i-1], i.e. at whatever value it ended with on the previous iteration.
Now this line: j = pi[j-1]; decreases j by at least one, since pi[j-1] <= j-1 < j. Because j was increased at most O(n) times over the whole run (we count increases and decreases across all iterations of the outer loop), and j never drops below 0, it cannot be decreased more than O(n) times.
So that line is also executed at most O(n) times.
Thus the whole time complexity is O(n).
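To see the amortized bound concretely, here is a minimal instrumented sketch (my addition, not part of the original answer) that counts every increment and every decrement of j and checks that both counts stay at most n:

#include <cassert>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

vector<int> prefix_function_counted(const string& s) {
    int n = (int) s.length();
    long long incs = 0, decs = 0;       // times j is increased / decreased
    vector<int> pi(n);
    for (int i = 1; i < n; ++i) {
        int j = pi[i-1];
        while (j > 0 && s[i] != s[j]) {
            j = pi[j-1];                // each pass strictly decreases j
            ++decs;
        }
        if (s[i] == s[j]) { ++j; ++incs; }
        pi[i] = j;
    }
    // j is increased at most once per outer iteration, and can only be
    // decreased as often as it has been increased.
    assert(incs <= n && decs <= incs);
    cout << "n=" << n << ": " << incs << " increments, "
         << decs << " decrements\n";
    return pi;
}

int main() {
    prefix_function_counted("aabaaab");
    prefix_function_counted(string(1000, 'a') + "b" + string(1000, 'a'));
}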

There are already two answers here that are correct, but I often think a fully laid out proof can make things clearer. You said you wanted an answer for a 9-year-old, but I don't think that's feasible (it's easy to be fooled into thinking the claim is true without having any real intuition for why it's true). Maybe working through this answer will help.
First off, the outer loop runs n - 1 times, clearly, because i is not modified within the loop. The only code within the loop that could run more than once is the block
while (j > 0 && s[i] != s[j])
{
    j = pi[j-1];
}
So how many times can that run? Well notice that every time that condition is
satisfied we decrease the value of j which, at this point, is at most
pi[i-1]. If it hits 0 then the while loop is done. To see why this is important,
we first prove a lemma (you're a very smart 9-year-old):
pi[i] <= i
This is done by induction. pi[0] <= 0, since it's set to 0 in the initialization of pi and never touched again. Then inductively, let 0 < k < n and assume the claim holds for all 0 <= a < k. Consider the value of pi[k]. It's set precisely once, in the line pi[i] = j. How big can j be? It's initialized to pi[k-1] <= k-1 by induction. In the while block it may then be updated to pi[j-1] <= j-1 < pi[k-1]. By another mini-induction you can see that j never increases past pi[k-1]. Hence after the while loop we still have j <= k-1. Finally, it might be incremented once, so we have j <= k, and so pi[k] = j <= k (which is what we needed to finish our induction).
Now returning to the original point, we ask: "how many times can we decrease the value of j?" With our lemma we can now see that every iteration of the while loop monotonically decreases the value of j. In particular we have:
pi[j-1] <= j-1 < j
So how many times can this run? At most pi[i-1] times. The astute reader might think "you've proven nothing! We have pi[i-1] <= i-1, but the while loop sits inside the for loop, so it's still O(n^2)!". The slightly more astute reader notices this extra fact:
However many times we run j = pi[j-1], we lower the final value of pi[i], which shortens the while loop's starting point in the next outer iteration!
For example, let's say j = pi[i-1] = 10, but after ~6 iterations of the while loop we have j = 3, and say it gets incremented by 1 in the s[i] == s[j] line, so j = 4 = pi[i]. Then at the next iteration of the outer loop we start with j = 4... so we can execute the while loop at most 4 times.
The final piece of the puzzle is that ++j runs at most once per iteration of the outer loop, so pi can grow by at most 1 from one entry to the next. That means we can never see something like this in our pi vector:
0 1 2 3 4 5 1 6 1 7 1 8 1 9 1
The repeated jumps from 1 back up to 6, 7, 8, 9 are impossible, since each would require several increments within a single outer iteration. If jumps like those could happen, every drop could be followed by a long climb, and the while loop could run far more than O(n) times in total.
To make this actually formal, you might establish the invariants described above and then use induction
to show that the total number of times the while loop has run, summed with pi[i], is at most i.
From that, it follows that the total number of times the while loop is run is O(n) which means that the entire outer loop has complexity:
O(n) // from the rest of the outer loop excluding the while loop
+ O(n) // from the while loop
=> O(n)
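As a sanity check on that invariant (my addition, not part of the original answer), here is a sketch that tracks the cumulative number of while-loop iterations and asserts at every index i that this count plus pi[i] never exceeds i:

#include <cassert>
#include <string>
#include <vector>
using namespace std;

// Verifies: (total while-loop iterations so far) + pi[i] <= i.
void check_invariant(const string& s) {
    int n = (int) s.length();
    vector<int> pi(n);
    long long whiles = 0;       // cumulative while-loop iterations
    for (int i = 1; i < n; ++i) {
        int j = pi[i-1];
        while (j > 0 && s[i] != s[j]) {
            j = pi[j-1];
            ++whiles;
        }
        if (s[i] == s[j]) ++j;
        pi[i] = j;
        assert(whiles + pi[i] <= i);    // the invariant from the proof
    }
}

int main() {
    check_invariant("abacabadabacaba");
    check_invariant("aaaaabaaaaabaaa");
}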

Let's start with the fact that the outer loop executes n times, where n is the length of the pattern. Each iteration of the inner loop decreases the value of j by at least 1, since pi[j-1] < j. The inner loop terminates at the latest when j == 0, so it can decrease j at most as often as j has been increased by ++j in the outer loop. Since ++j executes at most once per outer iteration, i.e. at most n times in total, the overall number of inner while-loop iterations is bounded by n. The preprocessing algorithm therefore requires O(n) steps.
If you care, consider this simpler implementation of the preprocessing stage:
/* ff stands for 'failure function': */
void kmp_table(const char *needle, int *ff, size_t nff)
{
int pos = 2, cnd = 0;
if (nff > 1){
ff[0] = -1;
ff[1] = 0;
} else {
ff[0] = -1;
}
while (pos < nff) {
if (needle[pos - 1] == needle[cnd]) {
ff[pos++] = ++cnd;
} else if (cnd > 0) {
cnd = ff[cnd]; /* This is O(1) for the reasons above. */
} else {
ff[pos++] = 0;
}
}
}
from which it is painfully obvious that computing the failure function is O(n), where n is the length of the pattern sought.
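As a quick smoke test (my addition), calling kmp_table on the classic example pattern "ABCDABD" produces the table -1 0 0 0 0 1 2:

#include <stdio.h>
#include <string.h>

/* assumes kmp_table from above is in scope */
int main(void)
{
    const char *needle = "ABCDABD";
    size_t n = strlen(needle);
    int ff[8];
    kmp_table(needle, ff, n);
    for (size_t i = 0; i < n; i++)
        printf("%d ", ff[i]);   /* prints: -1 0 0 0 0 1 2 */
    printf("\n");
    return 0;
}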

Related

Is this Insertion Sort implementation worst case O(n)?

I know that Insertion Sort is supposed to be worst case O(n^2), but I'm wondering why the following implementation isn't O(n).
#include <cstdlib>
#include <iostream>
using namespace std;

int main()
{
    //insertion sort runs from i = 1 to i = n, thus is worst case O(n)
    for (
        int i = 1,
        placeholder = 0,
        A[] = { 10,9,8,7,6,5,4,3,2,1 },
        j = i;
        i <= 10;
        j-- > 0 && A[j - 1] > A[j]
            ? placeholder = A[j], A[j] = A[j - 1], A[j - 1] = placeholder
            : j = ++i
    )
    {
        for (
            int x = 0;
            x < 10; x++
        )
            cout << A[x] << ' ';
        cout << endl;
    }
    system("pause");
}
There is only one for loop involved here and it runs from 1 to n. It seems to me that this would be the definition of O(n). What exactly am I missing here?
Sloppy terminology has led many people to false conclusions. This appears to be an example.
There is only one for loop involved here and it runs from 1 to n.
Yes, there is only one loop, but what is this "it" to which you refer? I really do mean for you to think about it. Should "it" refer to the loop? That would match a fairly common, yet sloppy, use of terminology, but a loop does not evaluate to a value. So a loop cannot actually run from one value to another. The sloppiness can be overlooked in simpler contexts, but not in yours.
Normally, the "it" would really refer to the loop control variable. With a simple loop, like for (int i = 0; i < 10; ++i), there is a one-to-one correspondence between iterations of the loop and values assigned to the control variable (which is i in my example). So there is an equivalence present, allowing one to refer to the loop when one really means the control variable. Saying that a loop runs from x to y really means that the control variable runs from x to y, and that there is one iteration of the loop per value assigned to the control variable. This correspondence fails in your code.
In your loop, the thing that runs from 1 to n is i. However, i is not incremented with each iteration of the loop, so "it runs from 1 to n" is not an accurate assessment of your loop. When i is 1, there are up to 2 iterations. That's not a one-to-one correspondence between iterations and values of i. As i increases, the divergence from one-to-one grows. Each value of i potentially corresponds to i+1 iterations, as j counts down from i to 0. The total number of iterations in the worst case scenario for n entries is the sum of the potential number of iterations for each value of i: 2 + 3 + ⋯ + (n+1) = (n² + 3n)/2. That's O(n²).
Moral of the story: writing compact, cryptic code does not magically change the complexity of the algorithm being implemented. Cryptic code can make the complexity harder to pin down, but the main thing you've accomplished is making your code harder to read.
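To check that count empirically (my addition, not part of the original answer), here is a sketch that reproduces the loop's control flow with an iteration counter; for the reversed 10-element array it reports 65 iterations, which matches (n² + 3n)/2 for n = 10:

#include <iostream>
using namespace std;

int main()
{
    int iterations = 0;
    // Same control flow as the question's loop, minus the printing.
    for (
        int i = 1,
        placeholder = 0,
        A[] = { 10,9,8,7,6,5,4,3,2,1 },
        j = i;
        i <= 10;
        j-- > 0 && A[j - 1] > A[j]
            ? placeholder = A[j], A[j] = A[j - 1], A[j - 1] = placeholder
            : j = ++i
    )
    {
        ++iterations;   // one count per execution of the loop body
    }
    cout << iterations << '\n';   // prints 65 = (10*10 + 3*10)/2
}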
That's a very odd way to write code. But you do have two for loops in the definition, and it is not always necessary to have nested loops to get O(n^2); you can get there with recursion as well.
In simple terms, O(n^2) means the number of operations performed grows quadratically with the input size n.
The code given is not conventional C++ and is not even close to readable pseudocode. A more standard version looks like this:
#include <iostream>
using namespace std;

int main()
{
    int i, j, key;
    int A[] = { 10,9,8,7,6,5,4,3,2,1 };
    for (i = 1; i < 10; i++)
    {
        key = A[i];                         // element to insert
        for (j = i - 1; j >= 0 && A[j] > key; j--)
        {
            A[j+1] = A[j];                  // shift larger elements right
        }
        A[j+1] = key;
    }
    cout << "Array after sorting:" << endl;
    for (i = 0; i < 10; i++)
        cout << A[i] << "\t";
    cout << endl;
}
See, insertion sort has two loops. The outer loop advances the key element, and the inner loop compares the elements before the key with the key and shifts them. The worst-case time complexity is therefore O(n^2), not O(n), since both loops iterate up to n times in the worst case, i.e. when the array is reversed.
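As a quick check of that worst case (my addition), counting the inner loop's shifts on the reversed array gives exactly n(n-1)/2:

#include <iostream>
using namespace std;

int main()
{
    int A[] = { 10,9,8,7,6,5,4,3,2,1 };
    const int n = 10;
    long long shifts = 0;       // executions of the inner-loop body
    for (int i = 1; i < n; i++)
    {
        int key = A[i];
        int j;
        for (j = i - 1; j >= 0 && A[j] > key; j--)
        {
            A[j+1] = A[j];
            ++shifts;
        }
        A[j+1] = key;
    }
    cout << shifts << '\n';     // prints 45 = 10*9/2
}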

Big O - Nested For Loop Breakdown Loop by Loop

I understand how to get a general picture of the big O of a nested loop, but what would be the operations for each loop in a nested for loop?
If we have:
for (int i = 0; i < n; i++)
{
    for (int j = i+1; j < 1000; j++)
    {
        // do something of constant time
    }
}
How exactly would we get T(n)? The outer for loop would be n operations, the inner would be 1000(n-1), and the inside would just be c. Is that right?
So T(n) = cn(1000(n-1)). Is that right?
You want to collapse the loops and do a double summation. When i = 0, the inner loop runs 1000 - 1 times; when i = 1, it runs 1000 - 2 times, and so on. This is the sum, from i = 0 to n - 1, of the series 999 - i. Separating the terms gives 999n - n(n - 1)/2.
This is a pretty strange formula, because once n hits 1,000, the inner loop immediately short-circuits and does nothing. In this case, then, the asymptotic time complexity is actually O(n), because for high values of n, the code will just skip the inner loop in constant time.
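A small sketch (my addition) that counts the inner-loop iterations directly makes both regimes visible: growth dominated by the summation below n = 1000, then nothing but constant-time outer iterations above it:

#include <iostream>
using namespace std;

// Counts how many times the constant-time body runs for a given n.
long long count_iterations(int n)
{
    long long count = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < 1000; j++)
            ++count;
    return count;
}

int main()
{
    for (int n : {10, 100, 1000, 2000, 4000})
        cout << "n=" << n << ": body runs " << count_iterations(n) << " times\n";
    // Past n = 1000 the count stays at 499500; the inner loop never fires,
    // so each additional outer iteration adds only constant work.
}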

Counting the basic operations of a given program

I am looking at the following: Operations Counting Example
Which is supposed to present the operations count of the following pseudocode:
Algorithm prefixAverages(A)
    Input: array A of n numbers
    Output: array B of n numbers such that B[i] is the average
            of elements A[0], A[1], … , A[i]
    for i = 0 to n - 1 do
        b = 0
        for j = 0 to i do
            b = b + A[j]
            j++;
        B[i] = b / (i + 1)
    return B
But I don't see how the counts on the inner for loop are reached. It says that for the case i=0, j=0, the inner for loop runs twice? It strikes me that it should only run once, to see that 0 < 0 fails. Can anyone provide insight into where the given operations count comes from, or provide their own count?
This is under the assumption that primitive operations are:
Assignment
Array access
Mathematical operators (+, -, /, *)
Comparison
Increment/Decrement (math in disguise)
Return statements
Let me know if anything is unclear or you need more information
When the article you are following says "for var <- 0 to var2", it means "for (var = 0; var <= var2; var++)". So yes, when i = 0, the inner loop's condition is evaluated twice: once when j = 0, and again when j = 1, at which point the loop exits.
Edit: when I calculate the complexity of a program, the only thing that interests me is the big-O complexity. In this case, the i loop runs n times, and for each i the j loop runs i + 1 times, so the inner body runs (1 + 2 + 3 + ... + n) times in total, which is n(n+1)/2 times, and that is O(n²) complexity.
In the first line, you have an assignment (i = something) and a comparison (i <= n-1) ("2 operations") for each value of i, and since the comparison is also evaluated for the final value i = n, those 2 operations happen for n+1 values of i (from 0 to n), so this line does 2(n+1) operations.
The second line is fairly obvious: it executes once per entry into the loop, i.e. n times (for i = 0 up to i = n-1).
In the second loop, it does two things, an assignment and a comparison (just as in the first loop), and it does them i+2 times each (for example, when i = 0 it enters the loop body once, but it still has to do the j = 1 assignment and the 1 <= 0 comparison, so that's 2 evaluations of each in total). So this contributes 2(i+2) operations, for every i from 0 to n-1. Totaling it up: sum from i=0 to n-1 of 2(i+2) = 2((sum of i from 0 to n-1) + (sum of 2 from i=0 to i=n-1)) = 2(n(n-1)/2 + 2n) = n(n-1) + 4n = n² - n + 4n = n² + 3n.
I'll continue this later; I hope my answer so far is helpful.
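To make the count checkable (my addition, with a hypothetical ops counter), here is a sketch that implements prefixAverages and tallies the inner loop's assignments and comparisons per outer iteration, which reproduces the 2(i+2) figure:

#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<double> A = {1, 2, 3, 4, 5};
    int n = (int) A.size();
    vector<double> B(n);

    for (int i = 0; i < n; i++) {
        long long ops = 0;      // inner-loop assignments + comparisons
        double b = 0;
        int j = 0;              // 1 assignment...
        ops += 2;               // ...plus the first j <= i comparison
        while (j <= i) {
            b = b + A[j];
            j++;                // 1 increment (an assignment in disguise)...
            ops += 2;           // ...plus the re-checked j <= i comparison
        }
        B[i] = b / (i + 1);
        cout << "i=" << i << ": control ops = " << ops
             << ", expected 2*(i+2) = " << 2 * (i + 2) << '\n';
    }
}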

Nested loops and prime numbers

Hello, I have spent many hours now trying to figure out how this example given by my tutorial works, and there are a few things I don't understand. Yes, I have searched the web for help, but there is not much on this specific example, and I really want to understand it.
The first thing I don't understand: 'i' and 'j' both start at 2, and both for loops have i++ and j++; won't that make 'i' and 'j' equal all the time?
Also, in the second for loop, if 'j' has to be less than e.g. 4/4 = 1, how can that be, when it has been initialized to 2?
int i, j;
for (i = 2; i < 100; i++)
{
    for (j = 2; j <= (i/j); j++)
    {
        if (!(i%j))
            break; // if factor found, not prime
        if (j > (i/j))
            cout << i << " is prime\n";
    }
}
both the for loops have i++ and j++, won't that make 'i' and 'j' equal all the time?
Nope! i++ increments the outer loop variable, and j++ increments the inner loop variable. For each round of the outer loop, the inner loop can be iterated (and thus incremented) several times. So for every round of the outer loop, j goes through values from 2 up to i/j in the inner loop.
I recommend trying this code in a debugger, or simulating it with pen and paper, to understand what's happening.
The for loop on j will execute its full range for each and every value of i, so no, they will not always be equal.
And yes, when the value of i is low, the loop on j will not even get started, but as i takes on progressively higher values, the loop on j will run a little longer for each value of i.
Just for example, think of the case i == 83 (a prime, so the break never fires): then j will take on values in the range [2..9].
The code is searching for all the prime numbers between 2 and 99, which is why i and j are initialized to 2.
With that understood: the first for loop tests whether each number between 2 and 99 is prime, using the second for loop, which searches for divisors of i.
If the second for loop doesn't find any divisors, then i is prime.
The two loop variables don't always have the same value because the loops are nested! For each value of i, the inner loop runs through its whole range of j values before i advances: when i = 2, j = 2, then j = 3 (i is still 2), then j = 4 (i is still 2), and so on until the inner loop ends; only then does the outer loop increment to i = 3 and j restart at 2. Hope I've been clear :) Ask if you have doubts!
It doesn't look like the existing code will ever actually declare i to be prime, because of the upper limit on j. The cout statement that declares i prime triggers when j > (i/j), but the loop only runs while j <= (i/j), so that test can never be true inside the loop, even if i is prime.
Try adjusting the inner loop to be:
for (j = 2; j <= ceilf(float(i)/float(j)) + 1; j++)
or something along those lines.
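A simpler repair (my suggestion, not from the answers above) is to move the primality conclusion out of the inner loop entirely and track it with a flag:

#include <iostream>
using namespace std;

int main()
{
    for (int i = 2; i < 100; i++)
    {
        bool is_prime = true;
        // j*j <= i is an equivalent, clearer way to write j <= i/j
        for (int j = 2; j * j <= i; j++)
        {
            if (i % j == 0)     // factor found, not prime
            {
                is_prime = false;
                break;
            }
        }
        if (is_prime)
            cout << i << " is prime\n";
    }
}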

Algorithm analysis: Am I analyzing these algorithms correctly? How to approach problems like these [closed]

1)
x = 25;
for (int i = 0; i < myArray.length; i++)
{
    if (myArray[i] == x)
        System.out.println("found!");
}
I think this one is O(n).
2)
for (int r = 0; r < 10000; r++)
    for (int c = 0; c < 10000; c++)
        if (c % r == 0)   // note: c % r throws when r == 0; r presumably should start at 1
            System.out.println("blah!");
I think this one is O(1), because for any input n, it will run 10000 * 10000 times. Not sure if this is right.
3)
a = 0;
for (int i = 0; i < k; i++)
{
    for (int j = 0; j < i; j++)
        a++;
}
I think this one is O(i * k). I don't really know how to approach problems like this where the inner loop is affected by variables being incremented in the outer loop. Some key insights here would be much appreciated. The outer loop runs k times, and the inner loop runs 1 + 2 + 3 + ... + k times. So that sum should be (k/2) * (k+1), which would be order of k^2. So would it actually be O(k^3)? That seems too large. Again, don't know how to approach this.
4)
int key = 0; //key may be any value
int first = 0;
int last = intArray.length - 1;
int mid = 0;
boolean found = false;
while (!found && first <= last)
{
    mid = (first + last) / 2;
    if (key == intArray[mid])
        found = true;
    if (key < intArray[mid])
        last = mid - 1;
    if (key > intArray[mid])
        first = mid + 1;
}
This one, I think is O(log n). But, I came to this conclusion because I believe it is a binary search and I know from reading that the runtime is O(log n). I think it's because you divide the input size by 2 for each iteration of the loop. But, I don't know if this is the correct reasoning or how to approach similar algorithms that I haven't seen and be able to deduce that they run in logarithmic time in a more verifiable or formal way.
5)
int currentMinIndex = 0;
for (int front = 0; front < intArray.length; front++)
{
    currentMinIndex = front;
    for (int i = front; i < intArray.length; i++)
    {
        if (intArray[i] < intArray[currentMinIndex])
        {
            currentMinIndex = i;
        }
    }
    int tmp = intArray[front];
    intArray[front] = intArray[currentMinIndex];
    intArray[currentMinIndex] = tmp;
}
I am confused about this one. The outer loop runs n times. And the inner for loop runs
n + (n-1) + (n-2) + ... (n - k) + 1 times? So is that O(n^3) ??
More or less, yes.
1 is correct - it seems you are searching for a specific element in what I assume is an unsorted collection. If so, the worst case is that the element is at the very end of the list, hence O(n).
2 is correct, though a bit strange. It is O(1) because the loop bounds are constants, not variables: there is no input whose size can grow.
3 is O(k^2), not O(k^3). The inner loop's iterations sum to roughly k^2/2; drop the constant factor of 1/2 and you get O(k^2). That k^2 already accounts for both loops, so there is no extra factor of k to multiply in.
4 looks a lot like a binary search algorithm on a sorted collection. O(log n) is correct. It is logarithmic because at each iteration you are essentially halving the number of positions at which the element you are looking for could be.
5 is a selection sort (find the minimum of the remaining elements, swap it to the front), which is O(n^2), for similar reasons to 3.
O() doesn't mean anything by itself: you need to specify whether you are counting the worst-case O or the average-case O. Some sorting algorithms, for example, are O(n log n) on average but O(n^2) in the worst case.
Basically, you count the overall number of iterations of the innermost loop and take the biggest component of the result without any constant (for example, if you have k(k+1)/2 = (1/2)k^2 + (1/2)k, the biggest component is (1/2)k^2, therefore you are O(k^2)).
For example, your item 4) is O(log(n)) because, if you work on an array of size n, then the first iteration works on size n, the next one on size n/2, then n/4, ..., until the size reaches 1. So there are log(n) iterations.
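For problem 3 specifically, a direct count (my addition) confirms the quadratic count:

#include <iostream>
using namespace std;

int main()
{
    const int k = 100;
    long long a = 0;                // counts executions of a++
    for (int i = 0; i < k; i++)
        for (int j = 0; j < i; j++)
            a++;
    cout << a << '\n';              // prints 4950 = k*(k-1)/2 for k = 100
}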
Your question is mostly about the definition of O().
When someone says this algorithm is O(log(n)), you have to read:
When the input parameter n becomes very big, the number of operations performed by the algorithm grows at most like log(n).
Now, this means two things:
You have to have at least one input parameter n. There is no point in talking about O() without one (as in your case 2).
You need to define the operations that you are counting. These can be additions, comparisons between two elements, number of allocated bytes, number of function calls, but you have to decide. Usually you take the operation that's most costly to you, or the one that will become costly if done too many times.
So, keeping this in mind, back to your problems:
1) n is myArray.length, and the number of operations you're counting is ==. In that case the answer is exactly n, which is O(n).
2) You can't specify an n.
3) The n can only be k, and the number of operations you count is ++. You have exactly k(k-1)/2 of them, which is O(k^2), as you say.
4) This time n is the length of your array again, and the operation you count is ==. In this case, the number of operations depends on the data; usually we talk about the 'worst case scenario', meaning that of all the possible outcomes, we look at the one that takes the most time. At best, the algorithm takes one comparison. For the worst case, let's take an example. If the array is [1,2,3,4,5,6,7,8,9] and you are looking for 4, intArray[mid] will become, successively, 5, 3 and then 4, so you would have done the comparison 3 times. In fact, for an array whose size is about 2^k, the maximum number of iterations is about k (you can check), so k ≈ log2(n) = ln(n)/ln(2). You can extend this result to sizes that are not exact powers of two, and you get complexity = O(log(n)).
In any case, I think you are confused because you don't exactly know what O(n) means. I hope this is a start.
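If you want a more verifiable way to convince yourself of the log bound for case 4 (my addition), here is a sketch that counts the loop's iterations when the key is absent, i.e. the worst case, and compares them against log2(n):

#include <cmath>
#include <iostream>
#include <vector>
using namespace std;

// Counts binary-search loop iterations; an absent key forces the worst case.
int iterations(const vector<int>& a, int key)
{
    int first = 0, last = (int) a.size() - 1, count = 0;
    while (first <= last) {
        ++count;
        int mid = (first + last) / 2;
        if (key == a[mid]) break;
        if (key < a[mid]) last = mid - 1;
        else first = mid + 1;
    }
    return count;
}

int main()
{
    for (int n : {7, 15, 100, 1000, 1000000}) {
        vector<int> a(n);
        for (int i = 0; i < n; i++) a[i] = 2 * i;   // sorted, all even
        // An odd key can never be found, so the search runs to exhaustion.
        cout << "n=" << n << ": iterations=" << iterations(a, 1)
             << ", ceil(log2(n))=" << (int) ceil(log2(n)) << '\n';
    }
}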