How would you find a T(n) run-time (not the big O run time) for a function that has two inputs? Do you just consider the a input your 'n'?
int h(int a, int b) {
    if (a > 0) {
        return h(a-1, a+b);
    }
    else {
        return 0;
    }
}
In this case we only need to consider a, since the running time of this algorithm doesn't depend on b.
In other words, we can pass in 20000 or -2 for b without affecting the run time in the slightest (ignoring the actual time of computing a+b), so we don't need to include b in our calculations.
In a more general case, if the running time did depend on both a and b, we would simply account for this in our time complexity function. In other words, it would be T(a, b), not just T(a).
Since this function recurses only on a, and a decreases by 1 at each step, it has linear complexity. So the answer would be expressed as T(a).
Given that the function value is zero for every (a,b) pair - the recursion always ends in the else branch - the compiler may be smart enough to reduce the whole body to "return 0", leaving out all the if/else and recursion machinery, resulting in O(1) complexity and a corresponding run time.
I'm learning about Big-O Notation and algorithms to improve my interview skills, but I don't quite understand how to get the time complexity.
Suppose I want to sum all the elements of the following list.
std::vector<int> myList = {1, 2, 3, 4, 5};
Case 1:
int sum = 0;
for (int it : myList)
{
    sum += it;
}
Case 2:
int sum = std::accumulate(std::begin(myList), std::end(myList), 0);
Case 1 is O(N), and case 2 is apparently O(1), but I'm sure those functions do some kind of iteration, so the question is whether Big-O notation is calculated only from the written code of that block, or also from the functions it calls.
If you talk about big-O, you have to talk in respect of some unit of data being processed. Both your case 1 and case 2 are O(N) where N is the number of items in the container: the unit is an int.
You tend to want the unit - and N to be the count of - the thing that's likely to grow/vary most in your program. For example, if you're talking about processing names in phonebooks, then the number of names should be N; even though the length of individual names is also somewhat variable, there's no expected pattern of increasing average name length as your program handles larger phonebooks.
Similarly, if your program had to handle an arbitrary number of containers that tended to be roughly the same length, then your unit might be a container, and then you could think of your code - case 1 and case 2 - as being big-O O(1) with respect to the number of containers, because whether there are 0, 1, 10 or a million other containers lying around somewhere in your program, you're only processing the one - myList. But, any individual accumulate call is O(N) with respect to any individual container's ints.
I think this example should give you an idea.
int sum(std::vector<int> const& list)
{
    int result = 0;
    for (int const& elem : list)
    {
        result += elem;
    }
    return result;
}
int main()
{
    std::vector<int> test = {1, 2, 3, 4, 5, 6};
    // O(n)
    int sum1 = 0;
    for (int const& elem : test)
    {
        sum1 += elem;
    }
    // O(???)
    int sum2 = sum(test);
}
For an evaluation of the time complexity, it makes more sense to count operations that take constant time. Hence a call to sum is not a particularly good candidate for such a unit unless
the sums are always done on the same number of elements, or
the distribution of the sum lengths is known and independent of the circumstances where the calls are made (to avoid any bias).
Such evaluations are rather unusual.
case 2 is apparently O(1)
Says who? cplusplus.com says about accumulate:
Complexity
Linear in the distance between first and last.
Which is the same O(N) as your case 1 code.
(I also checked cppreference.com but in this case it doesn't say something about the complexity.)
Here is a recursive function that traverses a map of strings (multimap<string, string> graph). It checks each itr->second (s_tmp); if s_tmp is equal to the desired string (Exp), it prints itr->first and executes the function again on itr->first.
string findOriginalExp(string Exp){
    cout<<"*****findOriginalExp Function*****"<<endl;
    string str;
    if(graph.empty()){
        str = "map is empty";
    }else{
        for(auto itr = graph.begin(); itr != graph.end(); itr++){
            string s_tmp = itr->second;
            string f_tmp = itr->first;
            string nll = "null";
            //s_tmp.compare(Exp) == 0
            if(s_tmp == Exp){
                if(f_tmp.compare(nll) == 0){
                    cout<< Exp <<" :is original experience.";
                    return Exp;
                }else{
                    return findOriginalExp(itr->first);
                }
            }else{
                str = "No element is equal to Exp.";
            }
        }
    }
    return str;
}
There are no rules for stopping and it seems to be completely random. How is the time complexity of this function calculated?
I am not going to analyse your function, but instead try to answer in a more general way. It seems like you are looking for a simple expression such as O(n) or O(n^2) for the complexity of your function. However, complexity is not always that simple to estimate.
In your case it strongly depends on what are the contents of graph and what the user passes as parameter.
As an analogy consider this function:
int foo(int x){
    if (x == 0) return x;
    if (x == 42) return foo(42);
    if (x > 0) return foo(x-1);
    return foo(x/2);
}
In the worst case it never returns to the caller. If we ignore inputs with x >= 42, then the worst case complexity is O(x). That alone isn't very useful information for the user. What I really need to know as a user is:
Don't ever call it with x >= 42.
O(1) if x == 0
O(x) if x > 0
O(log|x|) if x < 0
Now try to make similar considerations for your function. The easy case is when Exp is not in graph; in that case there is no recursion. I am almost sure that for the "right" input your function can be made to never return. Find out what those cases are and document them. In between, you have cases that return after a finite number of steps. If you have no clue how to get at them analytically, you can always set up a benchmark and measure. Measuring the runtime for input sizes 10, 50, 100, 1000, ... should be sufficient to distinguish between linear, quadratic and logarithmic dependence.
PS: Just a tip: Don't forget what the code is actually supposed to do and what time complexity is needed to solve that problem (often it is easier to discuss that in an abstract way rather than diving too deep into code). In the silly example above the whole function can be replaced by its equivalent int foo(int){ return 0; } which obviously has constant complexity and does not need to be any more complex than that.
This function takes a directed graph and a vertex in that graph, and chases edges going into it backwards to find a vertex with no edge pointing into it. The operation of finding the vertex "behind" any given vertex takes O(n) string comparisons, where n is the number of k/v pairs in the graph (this is the for loop). It does this m times, where m is the length of the path it must follow (which it does through the recursion). Therefore, it has time complexity O(m * n) string comparisons, with n the number of k/v pairs and m the length of the path.
Note that there's generally no such thing as "the" time complexity for just some function you see written in code. You have to define what variables you want to describe the time in terms of, and also the operations with which you want to measure the time. E.g. if we want to write this purely in terms of n the number of k/v pairs, you run into a problem, because if the graph contains a suitably placed cycle, the function doesn't terminate! If you further constrain the graph to be acyclic, then the maximum length of any path is constrained by m < n, and then you can also get that this function does O(n^2) string comparisons for an acyclic graph with n edges.
You should approximate the control flow of the recursive calls by using a recurrence relation. It's been about 30 years since I took college classes in discrete math, but generally you write out something like pseudocode, just enough to see how many calls there are. In some cases just counting how many calls appear in the longest case on the right-hand side is useful, but you generally need to plug one expansion back in and from that derive a polynomial or power relationship.
I am a bit confused on the topic of calculating complexity.
I know about Big O and also how to calculate the complexity of loops (nested also).
Suppose I have a program with 3 loops running from 1 to n
for (int i = 0; i < n; i++)
{
    cout << i;
}
Now if I ran my CPP code having 3 for loops, will it take 3*n time?
Will the CPP compiler run all the 3 loops at the same time or will do it one after another?
I am very confused on this topic. Please help!
Now if I ran my CPP code having 3 for loops, will it take 3*n time?
Yes, assuming that the time of each loop iteration is the same, but in Big O notation O(3*n) == O(n), so the complexity is still linear.
Will the CPP compiler run all the 3 loops at the same time or will do it one after another?
Implicit concurrency requires a compiler to be 100% sure that parallelizing code will not change the outcome. It can be (and it is, see comments) done for simple operations, but cout << i is unlikely to be parallelized. It can be optimized in different ways however, e.g. if n is known at compile time, compiler could generate the whole string in one go and change the loop into cout << "123456...";.
Also, time complexity and concurrency are rather unrelated topics. Code executed on 20 threads will have the same complexity as code executed on one thread, it will just be faster (or not).
Now if I ran my CPP code having 3 for loops, will it take 3*n time?
Run a thousand loops and it would still be O(n), since constant factors are neglected when calculating the upper-bound time complexity of a function. So O(n*m) will always be O(n) if m doesn't depend on the input size.
Also, the compiler won't run them at the same time, but sequentially one after the other (unless multi-threaded, of course). But even then, 3, 10 or 1000 loops one after another will still be O(n) as per the definition, as long as the number of loops is not dependent on the input size.
How to calculate complexity if a code contains multiple n complexity loops?
To understand Big-O notation and asymptotic complexity, it can be useful to resort at least to semi-formal notation.
Consider the problem of finding an upper bound on the asymptotic time complexity of a function f(n) based on the growth of n.
To help us, let's loosely define a function or algorithm f being in O(g(n)) (to be picky, O(g(n)) is a set of functions, hence f ∈ O(...), rather than the commonly misused f(n) ∈ O(...)) as follows:
If a function f is in O(g(n)), then c · g(n) is an upper bound on f(n), for some non-negative constant c such that f(n) ≤ c · g(n) holds, for sufficiently large n (i.e., n ≥ n0 for some constant n0).
Hence, to show that f ∈ O(g(n)), we need to find a set of (non-negative) constants (c, n0) that fulfils
f(n) ≤ c · g(n), for all n ≥ n0, (+)
Let's consider your actual problem
void foo(int n) {
    for (int i = 0; i < n; ++i) { std::cout << i << "\n"; }
    for (int i = 0; i < n; ++i) { std::cout << i << "\n"; }
    for (int i = 0; i < n; ++i) { std::cout << i << "\n"; }
}
and for analyzing the asymptotic behaviour of foo based on growth on n, consider std::cout << i << "\n"; as our basic operation. Thus, based on this definition, foo contains 3 * n basic operations, and we may consider foo mathematically as
f(n) = 3 * n.
Now, we need to find a g(n) and some set of constants c and n0 such that (+) holds. For this particular analysis this is nearly trivial; insert f(n) as above in (+) and let g(n) = n:
3 * n ≤ c · g(n), for all n ≥ n0, [let g(n) = n]
3 * n ≤ c · n, for all n ≥ n0, [choose c = 3]
3 * n ≤ 3 · n, for all n ≥ n0.
The latter holds for any valid n, and we may arbitrarily choose n0 = 0. Thus, as per our definition above of a function f being in O(g(n)), we have showed that f is in O(n).
It is apparent that even if we repeat the loop in foo any number of times, as long as that number is constant (and not dependent on n itself), we can always find constants c and n0 that fulfil (+) for g(n) = n, thus showing that the function f describing the number of basic operations in foo based on n is upper-bounded by linear growth.
Now if I ran my CPP code having 3 for loops, will it take 3*n time?
However, it is essential to understand that Big-O notation describes the upper bound on the asymptotic behaviour of a mathematically described algorithm, or e.g. of a programmatically implemented function that, based on the definition of a basic operation, can be described as the former. It does not, however, give an accurate description of what runtime you may expect from different variations of how to implement a function. Cache locality, parallelism/vectorization, compiler optimizations and hardware intrinsics, and inaccuracy in describing the basic operation are just a few of the many factors that make asymptotic complexity distinct from actual runtime. The linked list data structure is a good example of one where asymptotic analysis is not likely to give a good view of runtime performance (as loss of cache locality, the actual size of the lists and so on will likely have a larger effect).
For the actual runtime of your algorithms, in case you are hitting a bottleneck, actually measuring on target hardware with a product-representative compiler and optimization flags is key.
What would be the efficiency of the following program? It is a for loop which runs a fixed number of times.
for (int i = 0; i < 10; i++)
{
    // do something here, no more loops though.
}
So, what should be the efficiency: O(1) or O(n)?
That entirely depends on what is in the for loop. Also, computational complexity is normally measured in terms of the size n of the input, and I can't see anything in your example that models or represents or encodes directly or indirectly the size of the input. There is just the constant 10.
Besides, although sometimes the analysis of computational complexity may give unexpected, surprising results, the correct term is not "Big Oh", but rather Big-O.
You can only talk about the complexity with respect to some specific input to the calculation. If you are looping ten times because there are ten "somethings" that you need to do work for, then your complexity is O(N) with respect to those somethings. If you just need to loop 10 times regardless of the number of somethings - and the processing time inside the loop doesn't change with the number of somethings - then your complexity with respect to them is O(1). If there's no "something" for which the order is greater than 1, then it's fair to describe the loop as O(1).
bit of further rambling discussion...
O(N) indicates the time taken for the work to complete can be reasonably approximated by some constant amount of time plus some function of N - the number of somethings in the input - for huge values of N:
O(N) indicates the time is c + xN, where c is a fixed overhead and x is the per-something processing time,
O(log2N) indicates time is c + x(log2N),
O(N2) indicates time is c + x(N2),
O(N!) indicates time is c + x(N!)
O(NN) indicates time is c + x(NN)
etc..
Again, in your example there's no mention of the number of inputs, and the loop iterations are fixed. I can see how it's tempting to say it's O(1) even if there are 10 input "somethings", but consider: if you have a function capable of processing an arbitrary number of inputs, then decide you'll only use it in your application with exactly 10 inputs and hard-code that, you clearly haven't changed the performance characteristics of the function - you've just locked in a single point on the time-for-N-inputs curve - and any big-O complexity that was valid before the hardcoding must still be valid afterwards. It's less meaningful and useful though, as N = 10 is a small amount, and unless you've got a horrific big-O complexity like O(N^N), the constants c and x take on a lot more importance in describing the overall performance than they would for huge values of N (where changes in the big-O notation generally have much more impact on performance than changing c or even x - which is of course the whole point of having big-O analysis).
Sure, O(1), because nothing here depends on n.
EDIT:
Suppose the loop body contains some complex action with complexity O(P(n)) in Big-O terms.
If we have a constant number C of iterations, the complexity of the loop will be O(C * P(n)) = O(P(n)).
Otherwise, let the number of iterations be Q(n), depending on n. That makes the complexity of the loop O(Q(n) * P(n)).
I'm just trying to say that when the number of iterations is constant, it does not change the complexity of the whole loop.
n in Big O notation denotes the input size. We can't tell what is the complexity, because we don't know what is happening inside the for loop. For example, maybe there are recursive calls, depending on the input size? In this example overall is O(n):
void f(int n) // input size = n
{
    for (int i = 0; i < 10; i++)
    {
        // do something here, no more loops though.
        g(n); // O(n)
    }
}
void g(int n)
{
    if (n > 0)
    {
        g(n - 1);
    }
}
What would the big O notation of the function foo be?
int foo(char *s1, char *s2)
{
    int c = 0, s, p, found;
    for (s = 0; s1[s] != '\0'; s++)
    {
        for (p = 0, found = 0; s2[p] != '\0'; p++)
        {
            if (s2[p] == s1[s])
            {
                found = 1;
                break;
            }
        }
        if (!found) c++;
    }
    return c;
}
What is the efficiency of the function foo?
a) O(n!)
b) O(n^2)
c) O(n log2 n)
d) O(n)
I would have said O(MN)...?
It is O(n²) where n = max(length(s1),length(s2)) (which can be determined in less than quadratic time - see below). Let's take a look at a textbook definition:
f(n) ∈ O(g(n)) if a positive real number c and positive integer N exist such that f(n) <= c g(n) for all n >= N
By this definition we see that n represents a number - in this case that number is the length of the string passed in. However, there is an apparent discrepancy, since this definition provides only for a single variable function f(n) and here we clearly pass in 2 strings with independent lengths. So we search for a multivariable definition for Big O. However, as demonstrated by Howell in "On Asymptotic Notation with Multiple Variables":
"it is impossible to define big-O notation for multi-variable functions in a way that implies all of these [commonly-assumed] properties."
There is actually a formal definition for Big O with multiple variables however this requires extra constraints beyond single variable Big O be met, and is beyond the scope of most (if not all) algorithms courses. For typical algorithm analysis we can effectively reduce our function to a single variable by bounding all variables to a limiting variable n. In this case the variables (specifically, length(s1) and length(s2)) are clearly independent, but it is possible to bound them:
Method 1
Let x1 = length(s1)
Let x2 = length(s2)
The worst case scenario for this function occurs when there are no matches, therefore we perform x1 * x2 iterations.
Because multiplication is commutative, the worst case scenario foo(s1,s2) == the worst case scenario of foo(s2,s1). We can therefore assume, without loss of generality, that x1 >= x2. (This is because, if x1 < x2 we could get the same result by passing the arguments in the reverse order).
Method 2 (in case you don't like the first method)
For the worst case scenario (in which s1 and s2 contain no common characters), we can determine length(s1) and length(s2) prior to iterating through the loops (in .NET and Java, determining the length of a string is O(1) - but in this case it is O(n)), assigning the greater to x1 and the lesser to x2. Here it is clear that x1 >= x2.
For this scenario, we will see that the extra calculations to determine x1 and x2 make this O(n² + 2n) We use the following simplification rule which can be found here to simplify to O(n²):
If f(x) is a sum of several terms, the one with the largest growth rate is kept, and all others omitted.
Conclusion
for n = x1 (our limiting variable), such that x1 >= x2, the worst case scenario is x1 = x2.
Therefore: f(x1) ∈ O(n²)
Extra Hint
For all homework problems posted to SO related to Big O notation, if the answer is not one of:
O(1)
O(log log n)
O(log n)
O(n^c), 0<c<1
O(n)
O(n log n) = O(log n!)
O(n^2)
O(n^c)
O(c^n)
O(n!)
Then the question is probably better off being posted to https://math.stackexchange.com/
In big-O notation, we always have to define what the occuring variables mean. O(n) doesn't mean anything unless we define what n is. Often, we can omit this information because it is clear from context. For example if we say that some sorting algorithm is O(n log(n)), n always denotes the number of items to sort, so we don't have to always state this.
Another important thing about big-O notation is that it only gives an upper limit: every algorithm in O(n) is also in O(n^2). The notation is often used as meaning "the algorithm has the exact asymptotic complexity given by the expression (up to a constant factor)", but its actual definition is "the complexity of the algorithm is bounded by the given expression (up to a constant factor)".
In the example you gave, you took m and n to be the respective lengths of the two strings. With this definition, the algorithm is indeed O(m n). If we define n to be the length of the longer of the two strings though, we can also write this as O(n^2) -- this is also an upper limit for the complexity of the algorithm. And with the same definition of n, the algorithm is also O(n!), but not O(n) or O(n log(n)).
O(n^2)
The relevant part of the function, in terms of complexity, is the nested loops. The maximum number of iterations is the length of s1 times the length of s2, both of which are linear factors, so the worst-case computing time is O(n^2), i.e. the square of a linear factor. As Ethan said, O(mn) and O(n^2) are effectively the same thing.
Think of it this way:
There are two inputs. If the function simply returned, then its performance would be unrelated to the arguments. This would be O(1).
If the function looped over one string, then the performance is linearly related to the length of that string. Therefore O(N).
But the function has a loop within a loop. The performance is related to the length of s1 and the length of s2. Multiply those lengths together and you get the number of loop iterations. It's not linear any more; it follows a curve. This is O(N^2).