Calculating algorithm complexity by counting steps - c++

I'm trying to calculate the big O of the function below by counting steps. I think the comments show how each step is counted, following how it was done in the examples, but I'm not sure how to calculate the total.
int function (int n){
    int count = 0;                           // 1 step
    for (int i = 0; i <= n; i++)             // 1 + 1 + n * (2 steps)
        for (int j = 0; j < n; j++)          // 1 + 1 + n * (2 steps)
            for (int k = 0; k < n; k++)      // 1 + 1 + n * (2 steps)
                for (int m = 0; m <= n; m++) // 1 + 1 + n * (2 steps)
                    count++;                 // 1 step
    return count;                            // 1 step
}
I want to say this function is O(n^2), but I don't understand how that would be calculated.
Examples I've been looking at
int func1 (int n){
    int sum = 0;                     // 1 step
    for (int i = 0; i <= n; i++)     // 1 + 1 + n * (2 steps)
        sum += i;                    // 1 step
    return sum;                      // 1 step
} // total steps: 4 + 3n
and
int func2 (int n){
    int sum = 0;                     // 1 step
    for (int i = 0; i <= n; i++)     // 1 + 1 + n * (2 steps)
        for (int j = 0; j <= n; j++) // 1 + 1 + n * (2 steps)
            sum++;                   // 1 step
    for (int k = 0; k <= n; k++)     // 1 + 1 + n * (2 steps)
        sum--;                       // 1 step
    return sum;                      // 1 step
}
// total steps: 3n^2 + 7n + 6

What you've proposed here are quite simple examples.
In my opinion, you just need to understand how the complexity of a loop works in order to understand your examples.
In short (very briefly), a loop has to be treated as follows in an asymptotic-complexity analysis:
loop (condition):
    // loop body
end loop
The condition of the loop tells you how many times the loop executes relative to the size of the input.
The complexity of the body (you can treat the body as a sub-function and compute its complexity the same way) has to be multiplied by the number of iterations of the loop.
The reason is quite intuitive: whatever is in the body is executed repeatedly until the condition fails, i.e., exactly as many times as the loop iterates.
Here is a simple example:
// Array linear assignment
std::vector<int> array(SIZE_ARRAY);
for (int i = 0; i < SIZE_ARRAY; ++i) {
    array[i] = i;
}
Let's analyse that simple loop:
First of all, we need to choose the input variable our complexity function will be computed against. This case is trivial: the variable is the size of the array, because we want to know how our program behaves as the size of the input array grows.
The loop is repeated SIZE_ARRAY times, so the body is executed SIZE_ARRAY times (note: that value is a variable, not a constant).
Now consider the loop body. The instruction array[i] = i does not depend on how big the array is. It takes some number of CPU cycles, but that number is always the same; it is constant.
Summarizing: we repeat SIZE_ARRAY times an instruction that takes a constant number of CPU clocks (let's say that value is k).
So, mathematically, the number of CPU clocks executed by this simple program is SIZE_ARRAY * k.
Big O notation describes the limiting behaviour: the behaviour a function assumes as the independent variable goes to infinity.
We can write:
O(SIZE_ARRAY * k) = O(SIZE_ARRAY)
That's because k is a constant, and by the definition of Big O notation constants do not grow as the input goes to infinity.
If we call the input size N instead of SIZE_ARRAY, we can say that our function is O(N) in time complexity.
The last ("more complicated") example:
for (int i = 0; i < SIZE_ARRAY; ++i) {
    for (int j = 0; j < SIZE_ARRAY; ++j) {
        array[j] += j;
    }
}
As before, our problem size is SIZE_ARRAY.
Briefly:
The first loop executes SIZE_ARRAY times, that is O(SIZE_ARRAY).
The second loop executes SIZE_ARRAY times.
The body of the second loop is an instruction that takes a constant number of CPU cycles; let's say that number is k.
We take the number of times the first loop executes and multiply it by the complexity of its body:
O(SIZE_ARRAY) * [first_loop_body_complexity]
But the body of the first loop is:
for (int j = 0; j < SIZE_ARRAY; ++j) {
    array[j] += j;
}
which is a single loop like the previous example, and we've just computed its complexity: it is O(SIZE_ARRAY). So we can see that:
[first_loop_body_complexity] = O(SIZE_ARRAY)
Finally, our entire complexity is:
O(SIZE_ARRAY) * O(SIZE_ARRAY) = O(SIZE_ARRAY * SIZE_ARRAY)
That is
O(SIZE_ARRAY^2)
Using N instead of SIZE_ARRAY.
O(N^2)

Disclaimer: this is not a mathematical explanation. It's a dumbed-down version which I think can help someone who is being introduced to the world of complexities and is as clueless as I was when I first met this concept. Also, I won't give you the answers; I'll just try to help you get there.
Moral of the story: don't count steps. Complexity is not about how many instructions (I will use this word instead of "steps") are executed. That in itself is (almost) completely irrelevant. In layman's terms, (time) complexity is about how the execution time grows as the input grows. That's how I finally understood complexity.
Let's take it step by step through some of the most commonly encountered complexities:
constant complexity: O(1)
this represents an algorithm whose execution time does not depend on the input. The execution time doesn't grow when the input grows.
For instance:
auto foo_o1(int n) {
    instr 1;
    instr 2;
    instr 3;
    if (n > 20) {
        instr 4;
        instr 5;
        instr 6;
    }
    instr 7;
    instr 8;
}
The execution time of this function doesn't depend on the value of n. Notice how I can say that even though some instructions may or may not get executed depending on the value of n. Mathematically this is because O(constant) == O(1). Intuitively it's because the growth of the number of instructions is not proportional to n. In the same vein, it's irrelevant whether the function has 10 instructions or 1k instructions. It's still O(1): constant complexity.
Linear complexity: O(n)
this represents an algorithm whose execution time is proportional to the input. When given a small input it takes a certain amount of time. When the input increases, the execution time grows proportionally:
auto foo1_on(int n)
{
    for (i = 0; i < n; ++i)
        instr;
}
This function is O(n). This means that when the input doubles, the execution time grows by a roughly constant factor. This is true for any input: e.g. when you double the input from 10 to 20 and when you double it from 1000 to 2000, the execution time grows by more or less the same factor.
In line with the idea of ignoring whatever doesn't contribute much compared with the "fastest" growth, all the next functions still have O(n) complexity. Mathematically, O complexities are upper bounds. This leads to O(c1*n + c0) = O(n):
auto foo2_on(int n)
{
    for (i = 0; i < n / 2; ++i)
        instr;
}
here: O(n / 2) = O(n)
auto foo3_on(int n)
{
    for (i = 0; i < n; ++i)
        instr 1;
    for (i = 0; i < n; ++i)
        instr 2;
}
here O(n) + O(n) = O(2*n) = O(n)
polynomial order 2 complexity: O(n^2)
This tells you that as you grow the input, the execution time grows by a bigger and bigger factor. For instance, the following is valid behavior for an O(n^2) algorithm:
Read: When you double the input from .. to .. you could get an increase of execution time of .. times
from 100 to 200 : 1.5 times
from 200 to 400 : 1.8 times
from 400 to 800 : 2.2 times
from 800 to 1600 : 6 times
from 1600 to 3200 : 500 times
Try this: write an O(n^2) algorithm and keep doubling the input. At first you will see small increases in computation time. At some point it just blows up and you have to wait a few minutes, where the previous steps took mere seconds.
This is easy to understand once you look at an n^2 graph.
auto foo_on2(int n)
{
    for (i = 0; i < n; ++i)
        for (j = 0; j < n; ++j)
            instr;
}
How is this function O(n^2)? Simple: the first loop executes n times (I don't care if it's n times plus 3, or 4*n). Then, for each step of the first loop, the second loop executes n times. There are n iterations of the i loop; for each i iteration there are n j iterations. So in total we have n * n = n^2 j iterations. Thus O(n^2).
There are other interesting complexities like logarithmic, exponential etc etc. Once you understand the concept behind the math, it gets very interesting. For instance a logarithmic complexity O(log(n)) has an execution time that grows slower and slower as the input grows. You can clearly see that when you look over a log graph.
There are a lot of resources on the net about complexities. Search. Read. Don't understand! Search again. Read. Take paper and pen. Understand! Repeat.

To keep it simple:
O(N) is an upper bound: it means growth at most proportional to N. Therefore, within one code snippet, we ignore everything else and focus on the code that takes the highest number of steps (the highest power) to solve the problem / finish the execution.
Following your examples:
int function (int n){
    int count = 0;                           // ignore
    for (int i = 0; i <= n; i++)             // This looks interesting
        for (int j = 0; j < n; j++)          // This looks interesting
            for (int k = 0; k < n; k++)      // This looks interesting
                for (int m = 0; m <= n; m++) // This looks interesting
                    count++;                 // This is what we are looking for.
    return count;                            // ignore
}
For that statement to finish we will need to "wait" or "cover" or "step" (n + 1) * n * n * (n + 1) times => O(n^4).
Second example:
int func1 (int n){
    int sum = 0;                 // ignore
    for (int i = 0; i <= n; i++) // This looks interesting
        sum += i;                // This is what we are looking for.
    return sum;                  // ignore
}
For that to finish it needs n + 1 steps => O(n).
Third example:
int func2 (int n){
    int sum = 0;                     // ignore
    for (int i = 0; i <= n; i++)     // This looks interesting
        for (int j = 0; j <= n; j++) // This looks interesting
            sum++;                   // This is what we are looking for.
    for (int k = 0; k <= n; k++)     // ignore
        sum--;                       // ignore
    return sum;                      // ignore
}
For that to finish we will need (n + 1) * (n + 1) steps => O(n^2).

In these simple cases you can determine time complexity by finding the instruction that is executed most often and then find out how this number depends on n.
In example 1, count++ is executed n^4 times => O(n^4)
In example 2, sum += i; is executed n times => O(n)
In example 3, sum++ is executed n^2 times => O(n^2)
Well, strictly speaking that is not exact, because some of your loops are executed n+1 times, but that doesn't matter at all. In example 1, the instruction is actually executed (n+1)^2 * n^2 times, which is the same as n^4 + 2n^3 + n^2. For time complexity, only the largest power counts.

Related

Big O notation calculation for nested loop

for ( int i = 1; i < n*n*n; i *= n ) {
    for ( int j = 0; j < n; j += 2 ) {
        for ( int k = 1; k < n; k *= 3 ) {
            cout << k*n;
        }
    }
}
I am facing an issue with this exercise, where I need to find the big O notation of this code. I got O(n^5), where the first loop is n^3, the 2nd loop n, and the 3rd loop n, and I am not sure if I am correct or not. Can someone help me please?
Your analysis is not correct.
The outer loop multiplies i by n each iteration, starting from 1 until it reaches n^3.
Therefore there will be 3 iterations, which is O(1).
The middle loop increments j by 2 each iteration, starting from 0 until it reaches n.
Therefore there will be n/2 iterations, which is O(n).
The inner loop multiplies k by 3, from 1 until it reaches n.
Therefore there will be log3(n) iterations, which is O(log(n)).
If this step is not clear, see here: Big O confusion: log2(N) vs log3(N).
The overall time complexity of the code is a multiplication of all the 3 expressions above (since these are nested loops).
Therefore it is: O(1) * O(n) * O(log(n)), i.e.: O(n*log(n)).
The first loop starts at i=1 and multiplies by n every time until it reaches n^3, so it runs 3 times, which is O(1).
The second loop starts at j=0 and adds 2 every time until it reaches n, so it runs n/2 times, which is O(n).
The third loop starts at k=1 and multiplies by 3 until it reaches n, which is O(log(n)).
Overall: O(n*log(n))
You are not correct. In the first loop, for ( int i = 1; i < n*n*n; i *= n ), pay attention to the third statement (i *= n), and in the third loop, for ( int k = 1; k < n; k *= 3 ), also pay attention to the third statement (k *= 3). Your calculation for the second loop is correct.

What is time complexity for this function?

Consider this function:
void func()
{
    int n;
    std::cin >> n;
    int var = 0;
    for (int i = n; i > 0; i--)
        for (int j = 1; j < n; j *= 2)
            for (int k = 0; k < j; k++)
                var++;
}
I think that the time complexity is O(n^2 * log n)
but when n is 2^m, I have a hard time thinking what complexity it is.
How can I analyze the complexity of this function?
n isn't a constant, it's dynamic, so that's the variable in your analysis. If it was constant, your complexity would be O(1) regardless of its value because all constants are discarded in complexity analysis.
Similarly, "n is 2^m" is sort of nonsensical because m isn't a variable in the code, so I'm not sure how to analyze that. Complexity analysis is done relative to the size of the input; you don't have to introduce any more variables.
Let's break down the loops, then multiply them together:
for (int i = n; i > 0; i--)          // O(n)
    for (int j = 1; j < n; j *= 2)   // O(log(n))
        for (int k = 0; k < j; k++)  // O(n / log(n))
Total time complexity: O(n * log(n) * (n / log(n))) => O(n^2).
The first two loops are trivial (if the second one isn't obvious, it's logarithmic because of repeated multiplication by 2, the sequence is 1, 2, 4, 8, 16...).
The third loop is tougher to analyze because it runs on j, not n. We can simplify matters by disregarding the outermost loop completely, analyzing the inner loops, then multiplying whatever complexity we get for the two inner loops by the outermost loop's O(n).
The trick is to add up the work of the k loop across all iterations of the j loop: j takes the values 1, 2, 4, ..., up to n, and for each of those values k runs from 0 to j. The total number of k iterations is therefore the geometric sum 1 + 2 + 4 + ... + 2^floor(log2(n)), which is less than 2n. So the two inner loops together do O(n) work; the logarithmic number of j iterations is already absorbed into that sum, which is why the O(n / log(n)) per-iteration average in the comment above, multiplied by O(log(n)) iterations, comes out to O(n).
Here's some empirical evidence of the inner loop complexity:
import math
from matplotlib import pyplot

def f(N):
    count = 0
    j = 1
    while j < N:
        j *= 2
        count += j
    return count

def linear(N):
    return N

def linearithmic(N):
    return N * math.log2(N) if N else 0

def plot_fn(fn, n_start, n_end, n_step):
    x = []
    y = []
    for N in range(n_start, n_end, n_step):
        x.append(N)
        y.append(fn(N))
    pyplot.plot(x, y, "-", label=fn.__name__)

def main():
    max_n = 10 ** 10
    step = 10 ** 5
    for fn in [f, linear, linearithmic]:
        plot_fn(fn, 0, max_n, step)
    pyplot.legend()
    pyplot.show()

if __name__ == "__main__":
    main()
The plot this produces shows that the innermost two loops (the blue line) grow linearly, not linearithmically, confirming the overall quadratic complexity once the outermost linear loop is reintroduced.

Determining the time complexity for code segments [duplicate]

This question already has answers here:
How can I find the time complexity of an algorithm?
What is the time complexity for each of the following code segments?
1. int i, j, y=0, s=0;
   for ( j = 1; j <= n; j ++)
   {
       y = y + j;
   }
   for ( i = 1; i <= y; i ++)
   {
       s++;
   }
My answer is O(2n) since it iterates through each loop n times and there are two loops
2. function (n) {
       while (n > 1) {
           n = n/3;
       }
   }
My answer for this one is O(n^(1/3)) since n becomes a third of its size every time
3. function (n) {
       int i, j, k;
       for ( i = n/2; i <= n; i ++ ) {          // n/2?
           for ( j = 1; j <= n; j = 2*j ) {     // log n
               for ( k = 1; k <= n; k = 2*k ) { // log n
                   cout << "COSC 2437.201, 301" << endl;
               }
           }
       }
   }
I said the answer to this one was O(log2n * log2n * n/2), but I'm pretty confused about the first for loop. The loop only has to iterate half of n times, so it would be n/2, correct? Thanks for your help, everyone.
Question 1
The first loop is O(n), as it runs n times. However, the second loop executes y times, not n - so the total runtime is not "2n"
At the end of the first loop, the value of y is 1 + 2 + ... + n = n(n+1)/2.
Therefore the second loop dominates: it runs O(n^2) times, which is thus also the overall complexity.
Question 3
This answer is correct (but again, drop the factors of 2 in O-notation).
However, you must be careful about naively multiplying the complexities of the loops together, because the bounds of the inner loops may depend on the current values of the outer ones.
Question 2
This is not O(n^(1/3))! Your reasoning is incorrect.
If you look closely at this loop, it is in fact similar to the reverse of the inner loops in question 3:
In Q3 the value of k starts at 1 and is multiplied by 2 each time until it reaches n
In Q2 the value of n is divided by 3 each time until it reaches 1.
Therefore they both take O(log n) steps.
(As a side note, an O(n^(1/3)) loop would look something like this:)
for (int i = 1; i*i*i <= n; i++)
    /* ... */

Time complexity for an algorithm is ok?

I want to design an algorithm with O(n * (log2 n)^2) time complexity. I wrote this:
for(int i=1; i<=n; i++){
    j=i;
    while(j != 1)
        j=j/2;
    j=i;
    while(j != 1)
        j=j/2;
}
Does it have O(n * (log2 n)^2) time complexity? If not, where am I going wrong, and how can I fix it so that its time complexity is O(n * (log2 n)^2)?
Slight digression:
As the guys said in the comments, the algorithm is indeed O(n log n). This is coincidentally identical to the result obtained by multiplying the complexity of the inner loop by that of the outer loop, i.e. O(log i) * O(n).
This may lead you to believe that we can simply add another copy of the inner loop to obtain the (log n)^2 part:
for (int i = 1; i < n; i++) {
    int k = i;
    while (k >= 1)
        k /= 2;
    int j = i;
    while (j >= 1)
        j /= 2;
}
But let's look at how the original complexity is derived. Each inner loop runs about log2(i) times, so the total work is
2 * (log2(1) + log2(2) + ... + log2(n)) = 2 * log2(n!) ≈ 2 * n * log2(n)
(using Stirling's approximation), i.e. O(n log n).
Therefore the proposed modification would simply give
3 * log2(n!) ≈ 3 * n * log2(n) = O(n log n)
which is not what we want.
An example I can think of from a recent personal project is semi-naive KD-tree construction. The pseudocode is given below:
def kd_recursive_cons (list_points):
    if length(list_points) < predefined_threshold:
        return new leaf(list_points)
    A <- random axis (X, Y, Z)
    sort list_points by their A-coordinate
    mid <- find middle element in list_points
    list_L, list_R <- split list_points at mid
    node_L <- kd_recursive_cons(list_L)
    node_R <- kd_recursive_cons(list_R)
    return new node (node_L, node_R)
end
The time complexity function is therefore given by the recurrence:
T(n) = 2 * T(n/2) + C * n * log(n) + D * n
where the n log n part comes from sorting. We can obviously ignore the D*n linear part, and also the constant C. Expanding the recurrence, level t of the recursion does about n * log(n / 2^t) total work, and summing over the log(n) levels gives:
T(n) = O(n * (log n)^2)
which is what we wanted.
Now to write a simpler piece of code with the same time complexity, we can make use of the summation we obtained in the above derivation:
n * [ log(n) + log(n/2) + log(n/4) + ... + log(1) ]
Noting that the parameter passed to the log function is divided by two in every term, we can thus write the code:
for (int i = 1; i < n; i++) {
    for (int k = n; k >= 1; k /= 2) {
        int j = k;
        while (j >= 1)
            j /= 2;
    }
}
This looks like the "naive" but incorrect solution mentioned at the beginning; the difference is that the nested loops there had different bounds (j did not depend on k, and k depended on i instead of directly on n).
EDIT: some numerical tests to confirm that the complexity is as intended:
Test function code:
int T(int n) {
    int o = 0;
    for (int i = 1; i < n; i++)
        for (int j = n; j >= 1; j /= 2)
            for (int k = j; k >= 1; k /= 2, o++);
    return o;
}
Numerical results:
n T(n)
-------------------
2 3
4 18
8 70
16 225
32 651
64 1764
128 4572
256 11475
512 28105
1024 67518
2048 159666
4096 372645
8192 860055
16384 1965960
32768 4456312
65536 10026855
131072 22413141
262144 49807170
524288 110100270
Then I plotted sqrt(T(n) / n) against n. If the complexity is correct this should give a log(n) graph, or a straight line if plotted with a log-scale horizontal axis.
And this is indeed what we get: a straight line on the log-scale axis, confirming the log(n) factor.

Time complexity of loop with multiple inner loops

for (int i = 0; i < n; ++i) {        // n
    for (int j = 0; j < i; ++j) {    // n
        cout << i * j << endl;
        cout << "j = " << j;
    }
    for (int k = 0; k < n * 3; ++k)  // n?
        cout << "k = " << k;
}
In this loop I see that the first for loop is O(n), and the second loop is also O(n), but the 3rd for loop is confusing for me. Since k runs up to n * 3, which grows with n, would this loop also be O(n)? If so, what does the time complexity of two loops inside another loop come out to be in this context?
I am assuming O(n^2), since the two inner loops are not nested within each other. Is this correct? Also, if I'm correct and the second loop were O(log n) instead, what would the time complexity be?
(Not homework, simply for understanding purposes)
A good rule of thumb for big-O notation is the following:
When in doubt, work inside-out!
Here, let's start by analyzing the two inner loops and then work outward to get the overall time complexity. The two inner loops are shown here:
for (int j = 0; j < i; ++j) {
    cout << i * j << endl;
    cout << "j = " << j;
}
for (int k = 0; k < n * 3; ++k)
    cout << "k = " << k;
The first loop runs O(i) times and does O(1) work per iteration, so it does O(i) total work. That second loop runs O(n) times (it runs 3n times, and since big-O notation munches up constants, that's O(n) total times) and does O(1) work per iteration, so it does O(n) total work. This means that your overall loop can be rewritten as
for (int i = 0; i < n; ++i) {
    do O(i) work;
    do O(n) work;
}
If you do O(i) work and then do O(n) work, the total work done is O(i + n), so we can rewrite this even further as
for (int i = 0; i < n; ++i) {
    do O(i + n) work;
}
If we look at the loop bounds here, we can see that i ranges from 0 up to n-1, so i is never greater than n. As a result, the O(i + n) term is equivalent to an O(n) term, since i + n = O(n). This makes our overall loop
for (int i = 0; i < n; ++i) {
    do O(n) work;
}
From here, it should be a bit clearer that the overall runtime is O(n^2): we do O(n) iterations, each of which does O(n) total work.
You asked in a comment in another answer about what would happen if the second of the nested loops only ran O(log n) times instead of O(n) times. That's a great exercise, so let's see what happens if we try that out!
Imagine the code looked like this:
for (int i = 0; i < n; ++i) {
    for (int j = 0; j < i; ++j) {
        cout << i * j << endl;
        cout << "j = " << j;
    }
    for (int k = 1; k < n; k *= 2)
        cout << "k = " << k;
}
(Note that k starts at 1; starting at 0 would loop forever, since 0 * 2 == 0.)
Here, the second loop runs only O(log n) times because k grows geometrically. Let's again apply the idea of working from the inside out. The inside now consists of these two loops:
for (int j = 0; j < i; ++j) {
    cout << i * j << endl;
    cout << "j = " << j;
}
for (int k = 1; k < n; k *= 2)
    cout << "k = " << k;
Here, that first loop runs in time O(i) (as before) and the new loop runs in time O(log n), so the total work done per iteration is O(i + log n). If we rewrite our original loops using this, we get something like this:
for (int i = 0; i < n; ++i) {
    do O(i + log n) work;
}
This one is a bit trickier to analyze, because i changes from one iteration of the loop to the next. In this case, it often helps to approach the analysis not by multiplying the work done per iteration by the number of iterations, but rather by just adding up the work done across the loop iterations. If we do this here, we'll see that the work done is proportional to
(0 + log n) + (1 + log n) + (2 + log n) + ... + (n-1 + log n).
If we regroup these terms, we get
(0 + 1 + 2 + ... + n - 1) + (log n + log n + ... + log n) (n times)
That simplifies to
(0 + 1 + 2 + ... + n - 1) + n log n
That first part of the summation is Gauss's famous sum 0 + 1 + 2 + ... + n - 1, which happens to be equal to n(n-1) / 2. (It's good to know this!) This means we can rewrite the total work done as
n(n - 1) / 2 + n log n
= O(n^2) + O(n log n)
= O(n^2)
with that last step following because O(n log n) is dominated by the O(n^2) term.
Hopefully this shows you both where the result comes from and how to come up with it. Work from the inside out, working out how much work each loop does and replacing it with a simpler "do O(X) work" statement to make things easier to follow. When you have some amount of work that changes as a loop counter changes, sometimes it's easiest to approach the problem by bounding the value and showing that it never leaves some range, and other times it's easiest to solve the problem by explicitly working out how much work is done from one loop iteration to the next.
When you have multiple loops in sequence, the time complexity of all of them is the worst complexity of any of them. Since both of the inner loops are O(n), the worst is also O(n).
So since you have O(n) code inside an O(n) loop, the total complexity of everything is O(n^2).
O(n^2): think of calculating the area of a triangle.
We get 1+2+3+4+5+...+n, which is the nth triangular number. If you graph it, it is basically a triangle of height and width n.
A triangle with base n and height n has area 1/2 n^2. O doesn't care about constants like 1/2.