Time complexity of KMP algorithm - c++

I am trying to implement strstr using the KMP algorithm. This is the algorithm given on Wikipedia. The time complexity of the KMP algorithm is given as O(n), where n is the size of the larger string.
vector<int> KMP(string S, string K)
{
vector<int> T(K.size() + 1, -1);
vector<int> matches;
if(K.size() == 0)
{
matches.push_back(0);
return matches;
}
for(int i = 1; i <= K.size(); i++)
{
int pos = T[i - 1];
while(pos != -1 && K[pos] != K[i - 1]) pos = T[pos];
T[i] = pos + 1;
}
int sp = 0;
int kp = 0;
while(sp < S.size())
{
while(kp != -1 && (kp == K.size() || K[kp] != S[sp])) kp = T[kp];
kp++;
sp++;
if(kp == K.size()) matches.push_back(sp - K.size());
}
return matches;
}
I do not understand why this code runs in O(n). Can anybody explain?

I presume your worry is that the inner loop in both cases may be executed up to m times per outer loop iteration (m being the length of K), which would give a worst case of O(n*m). In fact, this cannot happen.
Looking at the first loop, notice inductively that, since T[] is initialized to -1, we have T[0] < 0, and T[1] < 1, and so on. In fact T[i] < i for every i, which makes sense because T[i] is a pointer back down the pattern K we are searching for.
Now go to the second nested loop and see that kp is only incremented once per outer loop iteration and the inner while loop can only decrease it. Since kp is bounded below by -1 and starts off at 0, over the whole life of the method, we can execute only one more inner loop iteration, in total, than outer loop iteration, because otherwise kp would end up much smaller than -1. So the second nested loop can cost only a total of O(n).
The first nested loop looks trickier until you notice that pos is read from T[i - 1] at the start of the outer loop and then written as T[i] = pos + 1 at the end, so pos is equivalent to kp in the nested loop we have just analysed and the same argument shows that the cost is at most O(K.size()).
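To check this argument empirically, here is a minimal sketch of the same algorithm with two counters added. The names kmpCounted and KmpCounts are made up for this illustration and are not part of the original code. If the reasoning above holds, inner should never exceed outer once the search has finished.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative counters (not in the original code).
struct KmpCounts {
    std::size_t outer = 0;  // iterations of the scanning loop over S
    std::size_t inner = 0;  // fallback steps "kp = T[kp]"
};

// Same algorithm as in the question, instrumented with counters.
std::vector<int> kmpCounted(const std::string& S, const std::string& K,
                            KmpCounts& counts) {
    const int m = static_cast<int>(K.size());
    const int n = static_cast<int>(S.size());
    std::vector<int> matches;
    if (m == 0) {
        matches.push_back(0);
        return matches;
    }

    // Failure table: T[i] < i for every i, with T[0] == -1.
    std::vector<int> T(m + 1, -1);
    for (int i = 1; i <= m; ++i) {
        int pos = T[i - 1];
        while (pos != -1 && K[pos] != K[i - 1]) pos = T[pos];
        T[i] = pos + 1;
    }

    int sp = 0;
    int kp = 0;
    while (sp < n) {
        ++counts.outer;
        // Each step here strictly decreases kp, and kp never drops below -1.
        while (kp != -1 && (kp == m || K[kp] != S[sp])) {
            kp = T[kp];
            ++counts.inner;
        }
        ++kp;  // the only place kp grows: once per outer iteration
        ++sp;
        if (kp == m) matches.push_back(sp - m);
    }
    return matches;
}
```

For example, searching for "abab" in "abababab" reports matches at 0, 2 and 4 with outer == 8 and inner == 2, so the total work is linear in the length of S.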

Related

How do I incorporate this do-while loop into the overall Big O analysis of my algorithm?

I'm doing problems on leetcode and was able to solve this one, but I'm not exactly sure what the Big O notation for my solution is. Here is the problem:
Given an array of integers 'nums' sorted in non-decreasing order, find the starting and ending position of a given target value.
If target is not found in the array, return [-1, -1].
You must write an algorithm with O(log n) runtime complexity.
Example 1:
Input: nums = [5,7,7,8,8,10], target = 8
Output: [3,4]
Example 2:
Input: nums = [5,7,7,8,8,10], target = 6
Output: [-1,-1]
Example 3:
Input: nums = [], target = 0
Output: [-1,-1]
My code:
class Solution {
public:
vector<int> searchRange(vector<int>& nums, int target) {
int l = 0,m,h = nums.size()-1;
vector<int> ans;
ans.push_back(-1);
ans.push_back(-1);
while (l <= h) {
m = (l+h)/2;
if (nums[m] == target) {
l = m-1;
h = m+1;
ans.at(0)=m;
ans.at(1)=m;
do {
if (l >= 0 and nums[l] == target) {
ans.at(0)=l;
l--;
}
else {
l = -99;
}
if (h <= nums.size()-1 and nums[h] == target) {
ans.at(1)=h;
h++;
}
else {
h = nums.size();
}
} while (l >= 0 or h < nums.size());
return ans;
}
else if (nums[m] < target) {
l = m+1;
}
else {
h = m-1;
}
}
return ans;
}
};
My Thoughts:
I used a binary search to locate the first instance of the target value, so I know it's at least O(logN), but what confuses me is the inner do-while loop within the outer while loop. In class I was told the Big O notation of an algorithm will be, for instance, O(N^2) if there is a for loop nested within another for loop, because for every iteration of the outer loop the inner loop executes N times, assuming N is used as the value in the terminating condition for both loops. However, in this case the inner do-while loop will execute during at most one outer loop iteration, and only if the target value is in 'nums' at all. Using the same logic from class, this leaves me unsure how the inner do-while loop affects the Big O: if it's O(N*N) when a nested for loop runs N times for every outer loop iteration, then what would it be for my solution, where the inner do-while loop either runs during a single outer loop iteration or not at all? O(logN * 1) = O(logN) seems to be a viable answer until I consider that the worst-case runtime for the inner do-while loop would be O(N) if 'nums' consisted of N elements that were all the target value. I'd imagine this would make the Big O notation O(N * logN * 1) = O(N * logN), which would make my solution invalid, but I'm not very confident in that answer. Any help is greatly appreciated, thanks!
Your code's complexity is O(log(N)) + O(N) = O(N), as you can see by rearranging your code as shown below. It's not the code structure that determines the time complexity; it's how the program counter moves.
vector<int> searchRange(vector<int>& nums, int target) {
int l = 0,m,h = nums.size()-1;
vector<int> ans;
ans.push_back(-1);
ans.push_back(-1);
bool found = false;
//takes log(n) time
while (l <= h) {
m = (l+h)/2;
if (nums[m] == target) {
found = true;
break;
}
else if (nums[m] < target) {
l = m+1;
}
else {
h = m-1;
}
}
if(found){
// takes O(n) time.
l = m-1;
h = m+1;
ans.at(0)=m;
ans.at(1)=m;
do {
if (l >= 0 and nums[l] == target) {
ans.at(0)=l;
l--;
}
else {
l = -99;
}
if (h <= nums.size()-1 and nums[h] == target) {
ans.at(1)=h;
h++;
}
else {
h = nums.size();
}
} while (l >= 0 or h < nums.size());
}
return ans;
}
The O(N) worst case occurs when every entry equals target, so the do-while loop has to traverse the full array.
You can get the answer in O(log(n)) with std::equal_range (https://en.cppreference.com/w/cpp/algorithm/equal_range); on a vector it is equivalent to a lower_bound plus an upper_bound call, both of which are O(log(n)).
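A minimal sketch of that suggestion, assuming nums is sorted in non-decreasing order as the problem states. The function name searchRangeLog is only for illustration; the LeetCode wrapper class is omitted.

```cpp
#include <algorithm>
#include <vector>

// Returns {first index, last index} of target in sorted nums, or {-1, -1}.
std::vector<int> searchRangeLog(const std::vector<int>& nums, int target) {
    // equal_range returns [first element >= target, first element > target).
    auto range = std::equal_range(nums.begin(), nums.end(), target);
    if (range.first == range.second) {
        return {-1, -1};  // empty range: target does not occur
    }
    const int first = static_cast<int>(range.first - nums.begin());
    const int last = static_cast<int>(range.second - nums.begin()) - 1;
    return {first, last};
}
```

Both bounds are found by binary search, so no linear scan over a run of equal elements is needed, even when every element equals target.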

Find the number of triples (i, j, k) in array such that A[i] + A[j] = 2 * A[k]

How do I find the number of triples i, j, k in an array such that a[i] + a[j] = 2 * a[k]? The complexity should be O(n * logn) or O(n) since n <= 10^5.
Edit 2(important): abs(a[i]) <= 10^3.
Edit:
i, j, k must all be distinct.
Here is my code, but it's too slow; its complexity is O(n^2 logn).
#include <bits/stdc++.h>
using namespace std;
int binarna(vector<int> a, int k){
int n = a.size();
int low = 0, high = n - 1;
bool b = false;
int mid;
while(low <= high){
mid = (low + high) / 2;
if (a[mid] == k){
b = true;
break;
}
if (k < a[mid])
high = mid - 1;
else
low = mid + 1;
}
if (b)
return mid;
else
return -1;
}
int main()
{
int n;
cin >> n;
vector<int> a(n);
for (auto& i : a)
cin >> i;
sort(a.begin(), a.end());
int sol = 0;
for (int i = 0; i < n - 1; ++i){
for (int j = i + 1; j < n; ++j){
if ((a[i] + a[j]) % 2)
continue;
int k = (a[i] + a[j]) / 2;
if (binarna(a, k) != -1)
++sol;
}
}
cout << sol << '\n';
}
The complexity probably can't be better than O(N²), because when the elements form a single arithmetic progression, all pairs (i, j) with j-i even have a suitable element in the middle, so the count itself is O(N²)*.
An O(N²) solution is as follows:
sort the array increasingly;
for every i,
set k=i and for every j>i,
increment k until 2 A[k] >= A[i] + A[j]
increment the count if equality is achieved
For a given i, j and k both increase monotonically up to N, so the number of operations for that i is O(N-i). Summed over all i, this justifies the global O(N²) behavior, which is optimal.
*There is a little subtlety here as you might contradict the argument by claiming: "we can identify that the array forms an arithmetic sequence in time O(N), and from this compute the count in a single go".
But if instead of a single arithmetic sequence, we have two of them of length N/2, the quadratic behavior of the count remains even if they are intertwined. And there are at least N ways to intertwine two arithmetic sequences.
If the range of elements is much smaller than their number, it is advantageous to compress the data by means of a histogram.
The triple detection algorithm simplifies a little because, with the histogram indexed by value, k is systematically (i+j)/2. Every triple now counts for Hi.Hk.Hj instead of 1. The complexity is O(M²), where M is the size of the histogram.
Let D be the total number of distinct values in the array. If abs(a[i]) <= 10^3, then you can't have more than 2*10^3 + 1 distinct values in the array. It means that if you are a bit smart, the complexity of your algorithm becomes O(N*log(N) + D^2*log(D)), which is far better than O(N^2*log(N)), and if you use the smart algorithm suggested by Yves, you get O(N*log(N) + D^2).
Obviously the O(N*log(N)) term comes from sorting and you can't avoid it, but that's OK even for N = 10^5. So how do you reduce N to D in the main part of the algorithm? It is not hard. What you need is to replace the array of int values with an array of tuples (value, count) (let's call it B). It is easy to get such an array by scanning the original array after it has been sorted. The size of this new array is D (instead of N). Now you apply your algorithm or Yves's improved algorithm to this array, but each time you find a triplet (i,j,k) such that
2*B[k].value == B[i].value + B[j].value
you increment your total counter by
totalCount += B[k].count * B[i].count * B[j].count
Why does this work? Consider the original sorted array. When you find a triplet (i,j,k) such that
2*A[k] == A[i] + A[j]
you actually find 3 ranges for i, j and k such that in each range the values are equal, so you can pick any element from the corresponding range. Simple combinatorics then gives the formula above.
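Here is a rough sketch of the histogram idea, under assumptions the question leaves open: I count index triples with i < j and k different from both, and I assume every value satisfies abs(a[i]) <= maxAbs. The name countTriples and the maxAbs parameter are made up for this example. Instead of sorting into (value, count) pairs, it indexes a count table directly by value, which is possible because the value range is small.

```cpp
#include <vector>

// Counts triples (i, j, k), i < j, k != i, k != j, with a[i] + a[j] == 2 * a[k].
// Assumption: every value v in a satisfies -maxAbs <= v <= maxAbs.
long long countTriples(const std::vector<int>& a, int maxAbs = 1000) {
    const int M = 2 * maxAbs + 1;      // number of possible values
    std::vector<long long> H(M, 0);    // H[v + maxAbs] = occurrences of v
    for (int v : a) ++H[v + maxAbs];

    long long total = 0;
    for (int x = 0; x < M; ++x) {
        if (H[x] == 0) continue;
        // All three elements share one value: choose the pair {i, j},
        // then any of the remaining H[x] - 2 indices as k.
        total += H[x] * (H[x] - 1) / 2 * (H[x] - 2);
        // Two different values of equal parity; the midpoint lies strictly
        // between them, so k is automatically distinct from i and j.
        for (int z = x + 2; z < M; z += 2) {
            total += H[x] * H[(x + z) / 2] * H[z];
        }
    }
    return total;
}
```

The running time is O(N + M^2) with M = 2 * maxAbs + 1, so about a million inner steps for maxAbs = 1000, regardless of N. If the intended counting rule differs (for example, ordered pairs), the two formulas would need adjusting.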

Time complexity of loop with multiple inner loops

for (int i = 0; i < n; ++i) { //n
    for (int j = 0; j < i; ++j) { //n
        cout << i * j << endl;
        cout << "j = " << j;
    }
    for (int k = 0; k < n * 3; ++k) //n?
        cout << "k = " << k;
}
In this loop I see that the first for loop is O(n), the second loop is also O(n) but the 3rd for loop is confusing for me. K being less than something expanding would this also be O(n) for this loop? If so, what does two loops within another loop's time complexity come out to be in this context?
I am assuming O(n^2) due to the two n's in the middle not being multiplied in any way. Is this correct? Also if I'm correct and the second loop is O(n), what would the time complexity be if it was O(logn)?
(Not homework, simply for understanding purposes)
A good rule of thumb for big-O notation is the following:
When in doubt, work inside-out!
Here, let's start by analyzing the two inner loops and then work outward to get the overall time complexity. The two inner loops are shown here:
for (int j = 0; j < i; ++j) {
    cout << i * j << endl;
    cout << "j = " << j;
}
for (int k = 0; k < n * 3; ++k)
    cout << "k = " << k;
The first loop runs O(i) times and does O(1) work per iteration, so it does O(i) total work. That second loop runs O(n) times (it runs 3n times, and since big-O notation munches up constants, that's O(n) total times) and does O(1) work per iteration, so it does O(n) total work. This means that your overall loop can be rewritten as
for (int i = 0; i < n; ++i) {
do O(i) work;
do O(n) work;
}
If you do O(i) work and then do O(n) work, the total work done is O(i + n), so we can rewrite this even further as
for (int i = 0; i < n; ++i) {
do O(i + n) work;
}
If we look at the loop bounds here, we can see that i ranges from 0 up to n-1, so i is never greater than n. As a result, the O(i + n) term is equivalent to an O(n) term, since i + n = O(n). This makes our overall loop
for (int i = 0; i < n; ++i) {
do O(n) work;
}
From here, it should be a bit clearer that the overall runtime is O(n^2), since we do O(n) iterations, each of which does O(n) total work.
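If you'd like to see the numbers rather than trust the algebra, here is a small sketch that replaces the printing with counters. The names LoopCounts and countIterations are just for this illustration.

```cpp
// How often each inner loop body runs (illustrative helper type).
struct LoopCounts {
    long long jBody = 0;  // body of the "j < i" loop
    long long kBody = 0;  // body of the "k < n * 3" loop
};

// Same loop structure as in the question, counting instead of printing.
LoopCounts countIterations(int n) {
    LoopCounts c;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < i; ++j) ++c.jBody;      // i iterations
        for (int k = 0; k < n * 3; ++k) ++c.kBody;  // 3n iterations
    }
    return c;
}
```

For n = 10 this gives jBody == 45, which is n(n-1)/2, and kBody == 300, which is 3n^2. Both grow quadratically, which matches the O(n^2) result.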
You asked in a comment in another answer about what would happen if the second of the nested loops only ran O(log n) times instead of O(n) times. That's a great exercise, so let's see what happens if we try that out!
Imagine the code looked like this:
for (int i = 0; i < n; ++i) {
    for (int j = 0; j < i; ++j) {
        cout << i * j << endl;
        cout << "j = " << j;
    }
    for (int k = 1; k < n; k *= 2) // k must start at 1; starting at 0 it would never grow
        cout << "k = " << k;
}
Here, the second loop runs only O(log n) times because k grows geometrically. Let's again apply the idea of working from the inside out. The inside now consists of these two loops:
for (int j = 0; j < i; ++j) {
    cout << i * j << endl;
    cout << "j = " << j;
}
for (int k = 1; k < n; k *= 2) // k must start at 1; starting at 0 it would never grow
    cout << "k = " << k;
Here, that first loop runs in time O(i) (as before) and the new loop runs in time O(log n), so the total work done per iteration is O(i + log n). If we rewrite our original loops using this, we get something like this:
for (int i = 0; i < n; ++i) {
do O(i + log n) work;
}
This one is a bit trickier to analyze, because i changes from one iteration of the loop to the next. In this case, it often helps to approach the analysis not by multiplying the work done per iteration by the number of iterations, but rather by just adding up the work done across the loop iterations. If we do this here, we'll see that the work done is proportional to
(0 + log n) + (1 + log n) + (2 + log n) + ... + (n-1 + log n).
If we regroup these terms, we get
(0 + 1 + 2 + ... + n - 1) + (log n + log n + ... + log n) (n times)
That simplifies to
(0 + 1 + 2 + ... + n - 1) + n log n
That first part of the summation is Gauss's famous sum 0 + 1 + 2 + ... + n - 1, which happens to be equal to n(n-1) / 2. (It's good to know this!) This means we can rewrite the total work done as
n(n - 1) / 2 + n log n
= O(n^2) + O(n log n)
= O(n^2)
with that last step following because O(n log n) is dominated by the O(n^2) term.
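As a sanity check on that sum, here is a sketch that counts the loop body executions directly. It assumes the logarithmic loop starts k at 1 (a doubling loop starting at 0 would never advance), and the function name countWithLogLoop is only for this example.

```cpp
// Total number of inner-loop body executions for the "log n" variant.
// Assumption: the doubling loop starts at k = 1, and n is small enough
// that k *= 2 does not overflow an int.
long long countWithLogLoop(int n) {
    long long work = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < i; ++j) ++work;       // i steps
        for (int k = 1; k < n; k *= 2) ++work;    // about log2(n) steps
    }
    return work;
}
```

For n = 8 the first loop contributes 0 + 1 + ... + 7 = 28 and the second contributes 8 * 3 = 24 (k takes the values 1, 2, 4), for a total of 52. As n grows, the n(n-1)/2 part quickly dwarfs the n log n part.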
Hopefully this shows you both where the result comes from and how to come up with it. Work from the inside out, working out how much work each loop does and replacing it with a simpler "do O(X) work" statement to make things easier to follow. When you have some amount of work that changes as a loop counter changes, sometimes it's easiest to approach the problem by bounding the value and showing that it never leaves some range, and other times it's easiest to solve the problem by explicitly working out how much work is done from one loop iteration to the next.
When you have multiple loops in sequence, the time complexity of all of them together is that of the most expensive one. Since both of the inner loops are O(n), the most expensive is also O(n).
So since you have O(n) code inside an O(n) loop, the total complexity of everything is O(n^2).
O n squared; calculate the area of a triangle.
We get 1+2+3+4+5+...+n, which is the nth triangular number. If you graph it, it is basically a triangle of height and width n.
A triangle with base n and height n has area 1/2 n^2. O doesn't care about constants like 1/2.

determining time complexity

I was reading a book which says that the outer loop's time complexity is O(n-m), whereas for the inner loop the book gives this explanation:
" The inner while loop goes around at most m times, and potentially
far less when the pattern match fails. This, plus two other
statements, lies within the outer for loop. The outer loop goes around
at most n−m times, since no complete alignment is possible once we get
too far to the right of the text. The time complexity of nested loops
multiplies, so this gives a worst-case running time of O((n − m)(m +
2)). "
I don't understand why the time complexity of the inner loop is O(m+2) instead of O(m). Please help.
int findmatch(char *p, char *t)
{
int i,j; /* counters */
int m, n; /* string lengths */
m = strlen(p);
n = strlen(t);
for (i=0; i<=(n-m); i=i+1) {
j=0;
while ((j<m) && (t[i+j]==p[j]))
j = j+1;
if (j == m) return(i);
}
return(-1);
}
The while loop:
while ((j<m) && (t[i+j]==p[j]))
j = j+1;
is O(m); then you have the +2 from the two other statements in the body of the for loop:
j=0; // + 1
// loop
if (j == m) return(i); // + 1
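Note that O(m + 2) and O(m) are the same class, so the book's bound is just O((n - m) * m) written with the constant left in. To see the multiplication happen, here is a sketch of the same search with a comparison counter. It uses std::string instead of char* for convenience, and the name findMatchCounted is made up for this example.

```cpp
#include <string>

// Naive search as in the book's findmatch, but also reporting how many
// character comparisons were made (illustrative variant).
int findMatchCounted(const std::string& p, const std::string& t,
                     long long& comparisons) {
    const int m = static_cast<int>(p.size());
    const int n = static_cast<int>(t.size());
    comparisons = 0;
    for (int i = 0; i <= n - m; ++i) {
        int j = 0;
        while (j < m) {
            ++comparisons;                 // one t[i+j] vs p[j] test
            if (t[i + j] != p[j]) break;
            ++j;
        }
        if (j == m) return i;              // full match at alignment i
    }
    return -1;
}
```

Searching for "aaab" in a text of ten 'a' characters is a bad case: each of the n - m + 1 = 7 alignments needs all m = 4 comparisons before failing, for 28 comparisons in total and a result of -1.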

Find the running time in Big O notation

1) for (i = 1; i < n; i++) { > n
2) SmallPos = i; > n-1
3) Smallest = Array[SmallPos]; > n-1
4) for (j = i+1; j <= n; j++) > n*(n+1 -i-1)??
5) if (Array[j] < Smallest) { > n*(n+1 -i-1 +1) ??
6) SmallPos = j; > n*(n+1 -i-1 +1) ??
7) Smallest = Array[SmallPos] > n*(n+1 -i-1 +1) ??
}
8) Array[SmallPos] = Array[i]; > n-1
9) Array[i] = Smallest; > n-1
}
I know the Big O is n^2 (my bad, it's not n^3).
I am not sure about lines 4-7; would anyone care to help out?
I'm not sure how to get the count for the second loop, since j = i + 1, so as i changes so does j.
Also, for line 4 the answer is supposed to be n(n+1)/2 - 1, and I want to know why, as I can never get that.
I am not really solving for the Big O itself; I am trying to do the steps that lead to it, since constants and lower-order terms are excluded in Big O notation.
I would say this is O(n^2) (although as Fred points out above, O(n^2) is a subset of O(n^3), so it's not wrong to say that it's O(n^3)).
Note that it's generally not necessary to compute the number of executions of every single line; as Big-O notation discards low-order terms, it's sufficient to focus only on the most-executed section (which will typically be inside the innermost loop).
So in your case, none of the loops are affected by the values in Array, so we can safely ignore all that. The innermost loop runs (n-1) + (n-2) + (n-3) + ... times; this is an arithmetic series, and so has a term in n^2.
Is this an algorithm given to you, or one you wrote?
I think your loop indexes are wrong.
for (i = 1; i < n; i++) {
should be either
for (i = 0; i < n; i++) {
or
for (i = 1; i <= n; i++) {
depending on whether your array indexes start at 0 or 1 (it's 0 in C and Java).
Assuming we correct it to:
for (i = 0; i < n; i++) {
SmallPos = i;
Smallest = Array[SmallPos];
for (j = i+1; j < n; j++)
if (Array[j] < Smallest) {
SmallPos = j;
Smallest = Array[SmallPos];
}
Array[SmallPos] = Array[i];
Array[i] = Smallest;
}
Then I think the number of comparisons is n^2/2 - n/2 = O(n^2).
Here's how...
The most costly operation in the innermost loop (my lecturer called this the "basic operation") is the key comparison at line 5. It is done once per inner-loop iteration.
So now, you create a summation:
Sum(i=0 to n-1) of Sum(j=i+1 to n-1) of 1.
Now expand the innermost (rightmost) Sum to get:
Sum(i=0 to n-1) of (n-1)-(i+1)+1
and then:
Sum(i=0 to n-1) of n-i-1
and then:
[Sum(i=0 to n-1) of n] - [Sum(i=0 to n-1) of i] - [Sum (i=0 to n-1) of 1]
and then:
n[Sum(i=0 to n-1) of 1] - [(n-1)(n)/2] - [(n-1)-0+1]
and then:
n[(n-1)-0+1] - [(n^2-n)/2] - [n]
and then:
n^2 - [(n^2/2) - n/2] - n
equals:
1/2n^2 - 1/2n
is in:
O(n^2)
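The closed form can be checked by counting the comparisons directly. This is a sketch of the corrected (0-based) loop written against std::vector, with the function name selectionSortCounted chosen just for this example. It tracks only the position of the smallest element and swaps at the end, which sorts the same way as the two assignments in the original.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Selection sort that returns the number of key comparisons performed.
long long selectionSortCounted(std::vector<int>& a) {
    long long comparisons = 0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t smallPos = i;
        for (std::size_t j = i + 1; j < n; ++j) {
            ++comparisons;                          // the "basic operation"
            if (a[j] < a[smallPos]) smallPos = j;
        }
        std::swap(a[i], a[smallPos]);               // put the minimum in place
    }
    return comparisons;
}
```

For n = 5 this returns 10, which is (1/2)(25) - (1/2)(5). The count is the same for every input of that length, since the loop bounds never depend on the array contents.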
If you're asking why it's not O(n^3)...
Consider the worst case. if (Array[j] < Smallest) will be true the most times if Array is reverse sorted.
Then you have an inner loop that looks like this:
Array[j] < Smallest;
SmallPos = j;
Smallest = Array[SmallPos];
Now we've got a constant three operations for every iteration of the inner for (j...) loop.
And O(3) = O(1).
So really, it's i and j that determine how much work we do. Nothing in the inner if statement changes anything.
As a rule of thumb, you only need to count the while and for loops.
As to why for (j = i+1; j <= n; j++) gives a count of the form n(n+1)/2: it's an arithmetic series.
You're doing n-1 passes of the for (j...) loop when i==0, n-2 passes when i==1, then n-3, and so on down to 0.
So the summation is
n-1 + n-2 + n-3 + ... 3 + 2 + 1
now, you sum pairs from outside in, re-writing it as:
n-1+1 + n-2+2 + n-3+3 + ...
equals:
n + n + n + ...
and there are (n-1)/2 of these pairs, so you have:
n*(n-1)/2, which is roughly n*(n/2)
Two for() loops: the outer loop runs from 1 to n, and the inner loop runs from i+1 to n. This makes it O(n^2).
If you 'draw this out', it'll be triangular, rather than rectangular, so O(n^2), while true, is hiding the fact that the constant factor term is smaller than if the inner loop also iterated from 1 to n.
It is O(n^2).
For each of the n iterations of the outer loop you have up to n iterations in the inner loop.