Do multiple linear operations impact a function's overall worst-case complexity? - c++

This is perhaps somewhat of a simple question, but I'm going to ask anyway folks:
I've written the below function:
#include <vector>

std::vector<int> V = {1, 2, 3, 4, 5};

void myFunction()
{
    for (auto i = V.size(); i--; ) { /* Do stuff */ }
    for (auto e = V.size(); e--; ) { /* Do stuff */ }
}
This needs to have a worst-case time complexity of O(n) and a worst-case space complexity of O(1).
Does having two linear operations (for-loops) change the time complexity to something other than O(n)?

No, it does not. O(N) covers anything of the form aN + b plus terms that grow more slowly than N.
Even something like T(N) = 5N + 100 + log(N) is considered O(N).
By "terms that grow more slowly than N", I mean any function R(N) that satisfies:
lim R(N)/N = 0 as N → ∞ // L'Hôpital's rule is handy for evaluating these kinds of limits
So any function of the form
T(N) = aN + b + R(N)
is O(N).
Side note: complexity is not the same as performance. Although N + N is still O(N), that does not mean it is as fast as N; two passes still do twice the work of one. Performance, in its most basic form, is about the number of cycles you need to do something, not about the theoretical complexity.
That said, the two should at least be related once N becomes very large (approaching infinity).
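To make that concrete, here is a minimal sketch (the onePass/twoPasses helpers are mine, purely illustrative) that counts the basic operations each version performs: the two-loop version does roughly twice the work, but both counts grow linearly in N, so both are O(N).

#include <cstddef>
#include <iostream>
#include <vector>

// One pass over the data: roughly N units of work.
std::size_t onePass(const std::vector<int>& v)
{
    std::size_t ops = 0;
    for (std::size_t i = 0; i < v.size(); ++i) ++ops;   // N operations
    return ops;
}

// Two passes over the data: roughly 2N units of work, still O(N).
std::size_t twoPasses(const std::vector<int>& v)
{
    std::size_t ops = 0;
    for (std::size_t i = 0; i < v.size(); ++i) ++ops;   // N operations
    for (std::size_t i = 0; i < v.size(); ++i) ++ops;   // N more operations
    return ops;                                          // 2N in total
}

int main()
{
    std::vector<int> v(1000);
    std::cout << onePass(v) << " vs " << twoPasses(v) << '\n';   // 1000 vs 2000
}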

Related

Problem figuring out time and space complexity?

The question was to check whether two strings are rotations of each other or not. Here is the function I wrote for it:
bool areRotations(string s1, string s2)
{
    int n1 = s1.length(), n2 = s2.length();
    if (n1 != n2) return false;
    s1 += s1;
    if (s1.find(s2) != string::npos)
        return true;
    else
        return false;
}
I just check whether s2 is present in s1+s1; if it is, then s1 and s2 must be rotations of each other.
I am not able to figure out the time and space complexity of my code. What I can understand is that it should be O(n) time complexity, because to concatenate s1 to s1 we have to create a copy of s1, and to find s2 in s1+s1 we have to traverse it, hence the time complexity is O(n).
For space also, it should be O(n), because we are making a copy of s1. Is this correct?
I am not able to figure out the time and space complexity of my code. [...] Is this correct?
std::string::length runs in constant time (since C++11). The comparison and the concatenation run in linear time. But the overall algorithm could still run in non-linear time.
Indeed, the C++ standard does not require any specific algorithm or guarantee any complexity for std::string::find. Consequently, it is not possible to give an answer independent of the STL implementation you use.
If the implementation is naive or uses the famous Boyer-Moore algorithm, the worst-case time complexity is likely to be O(n^2) in your case (where n is the size of the input string). This could happen with inputs like s1="aaaaaaaaaca" and s2="aaaaaaaaaac". Although std::search provides stronger guarantees, it does not offer any search algorithm guaranteed to run in linear time. To ensure linear-time complexity, you can use the KMP search algorithm (or better variants like the two-way string-matching algorithm).
Thus, with the KMP algorithm, the complexity in time and space of your solution would be O(n). This is optimal as the input strings need to be read and stored somewhere (at least in your implementation).
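If you want that linear-time guarantee regardless of what the library does, here is a minimal sketch (the helper names kmpContains and areRotationsLinear are mine, not from the question) that plugs a hand-rolled KMP search into the same s2-in-s1+s1 idea:

#include <cstddef>
#include <string>
#include <vector>

// Standard KMP failure table: fail[i] is the length of the longest proper
// prefix of p[0..i] that is also a suffix. Built in O(m).
static std::vector<int> kmpFailure(const std::string& p)
{
    std::vector<int> fail(p.size(), 0);
    for (std::size_t i = 1, k = 0; i < p.size(); ++i) {
        while (k > 0 && p[i] != p[k]) k = fail[k - 1];
        if (p[i] == p[k]) ++k;
        fail[i] = static_cast<int>(k);
    }
    return fail;
}

// Returns true if pattern occurs in text. Runs in O(n + m).
static bool kmpContains(const std::string& text, const std::string& pattern)
{
    if (pattern.empty()) return true;
    const std::vector<int> fail = kmpFailure(pattern);
    std::size_t k = 0;
    for (char c : text) {
        while (k > 0 && c != pattern[k]) k = fail[k - 1];
        if (c == pattern[k]) ++k;
        if (k == pattern.size()) return true;
    }
    return false;
}

bool areRotationsLinear(const std::string& s1, const std::string& s2)
{
    if (s1.size() != s2.size()) return false;
    return kmpContains(s1 + s1, s2);   // O(n) concatenation + O(n) KMP search
}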

Space Complexity of an initialised pointer data structure

I have got a question.
In terms of theoretical computer science, when we analyse an algorithm, if an algorithm initialises a new data structure, then we consider that data structure as part of space complexity.
Now I am not too sure about this part then.
Let's say I have an array of ints and I would like to map them using a map of int pointers, such as:
std::map<int*, int*> mymap;
for (int i = 1; i < arraySize; i++) {
    mymap[&arr[i-1]] = &arr[i];
}
If this algorithm were not using pointers, then we could clearly state that it initialises a map of size n, hence the space complexity is O(n). But in this case, where we are using pointers, what would be the space complexity of this algorithm?
The space complexity of a single pointer is the same as that of any other primitive - i.e. O(1).
std::map<K,V> is implemented as a tree of N nodes. Its space complexity is O(N*space-complexity-of-one-node), so the total space complexity in your case is O(N).
Note that the big-O notation factors out the constant multiplier: although the big-O space complexity of an std::map<Ptr1,Ptr2> and std::vector<Ptr1> is the same, the multiplier for the map is higher, because tree construction imposes its overhead for storing tree nodes and connections among them.
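To illustrate, here is a minimal self-contained version of the snippet above (the array contents and size are assumptions, just for the example): each entry stores two pointers plus the constant per-node tree overhead, and there are N-1 entries, so the whole map takes O(N) space.

#include <iostream>
#include <map>

int main()
{
    const int arraySize = 5;                     // assumed size, for illustration only
    int arr[arraySize] = {10, 20, 30, 40, 50};

    // Each node holds two pointers (O(1) space) plus constant tree overhead;
    // with arraySize - 1 entries the map as a whole occupies O(N) space.
    std::map<int*, int*> mymap;
    for (int i = 1; i < arraySize; i++) {
        mymap[&arr[i-1]] = &arr[i];
    }

    std::cout << mymap.size() << " entries\n";   // prints 4
}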

order of an operation when upper bound is fixed

I recently had an interview and was asked to find the number of set bits in a supplied integer. I came up with something like this:
#include <iostream>
using namespace std;

int givemCountOnes(unsigned int X) {
    int count = 0;
    while (X != 0) {
        if (X & 1)
            count++;
        X = X >> 1;
    }
    return count;
}

int main() {
    cout << givemCountOnes(4);
    return 0;
}
I know there are better approaches but that is not the question here.
Question is, What is the complexity of this program?
Since it goes for number of bits in the input, people say this is O(n) where n is the number of bits in input.
However, I feel that since the upper bound is sizeof(unsigned int), say 64 bits, I should say the order is O(1).
Am I wrong?
The complexity is O(N). The complexity rises linearly with the size of the type used (unsigned int).
The upper bound does not matter as it can be extended any time in the future. It also does not matter because there is always an upper bound (memory size, number of atoms in the universe) and then everything could be considered O(1).
I will just add a better solution to the above problem.
Use the following step in the loop:
x = x & (x-1);
This removes the rightmost set bit, one bit per iteration.
So your loop runs at most as long as there is a set bit, and terminates when the number reaches 0.
Hence the complexity improves from O(number of bits in the int) to O(number of set bits).
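Here is a minimal sketch of that variant (the function name countOnes is mine): each x &= (x - 1) clears the lowest set bit, so the loop runs once per set bit rather than once per bit position.

#include <iostream>

// Counts set bits by clearing the lowest set bit on each iteration.
int countOnes(unsigned int x) {
    int count = 0;
    while (x != 0) {
        x &= (x - 1);   // clear the lowest set bit
        count++;
    }
    return count;
}

int main() {
    std::cout << countOnes(22);   // 22 is 10110 in binary, so this prints 3
    return 0;
}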
The O notation is used to tell what the difference is between different values of n. In this case, n would be the number of bits, as (in your case) the number of bits will change the (relative) time it takes to perform the calculation. So O(n) is correct - a one bit integer will take 1 unit of time, a 32-bit integer will take 32 units of time, and a 64-bit integer will take 64 units of time.
Actually, your algorithm does not depend on the total number of bits in the number, but on the position of the highest set bit - but that's a different matter. However, since we typically talk about O as the "worst case", it's still O(n), where n is the number of bits in the integer.
And I can't really think of any method that is fundamentally better than that in terms of O. I can think of methods that reduce the number of iterations in the loop (e.g. using a 256-entry table and dealing with 8 bits at a time), but it's still "bigger data -> longer time", since O(n), O(n/2) and O(n/8) are all the same (the overall time in the last case is simply 1/8 of that in the first).
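For completeness, here is a minimal sketch of that 256-entry-table idea (makeTable and countOnesByTable are my names for it): the per-byte popcounts are precomputed, then the input is consumed 8 bits at a time - still O(n) in the number of bits, just with an eighth of the loop iterations.

#include <array>
#include <cstdint>
#include <iostream>

// Precompute the number of set bits for every possible byte value.
static std::array<std::uint8_t, 256> makeTable() {
    std::array<std::uint8_t, 256> table{};
    for (int i = 0; i < 256; ++i)
        table[i] = table[i >> 1] + (i & 1);
    return table;
}

int countOnesByTable(unsigned int x) {
    static const std::array<std::uint8_t, 256> table = makeTable();
    int count = 0;
    while (x != 0) {
        count += table[x & 0xFFu];   // handle one byte per iteration
        x >>= 8;
    }
    return count;
}

int main() {
    std::cout << countOnesByTable(0xF0F0u);   // prints 8
    return 0;
}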
Big O notation describes the number of algorithm steps in the worst-case scenario, which in this case is when the highest bit is set. So there will be n iterations/steps when you pass an n-bit number as input.
Imagine a similar algorithm which counts the 1's in a list. Its complexity is O(n), where n is the list length. By your reasoning, if you always pass fixed-size lists as input, the algorithm's complexity would become O(1), which is incorrect.
However, if you fix the bit length in the algorithm, i.e. something like for (int i = 0; i < 64; ++i) ..., then it will have O(1) complexity, since it does an O(1) operation 64 times and you can ignore the constant. In general, O(c*n) is O(n) and O(c) is O(1), where c is a constant.
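A minimal sketch of that fixed-iteration variant (countOnesFixed is a hypothetical name), assuming a 64-bit unsigned type: the loop always runs exactly 64 times regardless of the value, so for this fixed-width type the whole function is O(1).

#include <cstdint>
#include <iostream>

int countOnesFixed(std::uint64_t x) {
    int count = 0;
    for (int i = 0; i < 64; ++i) {               // always exactly 64 iterations
        count += static_cast<int>((x >> i) & 1u);
    }
    return count;
}

int main() {
    std::cout << countOnesFixed(255);            // prints 8
    return 0;
}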
Hope all these examples helped. BTW, there is an O(1) solution for this; I'll post it when I remember :)
There's one thing that should be made clear: the complexity of the operations on your integer. It is easy to miss in this example, because you work on int, the natural word size of your machine, so each operation appears to cost just 1.
But O-notation is about large amounts of data and large tasks. Say you have an n-bit integer, where n is about 4096 or so. In that case addition, subtraction and shifts are at least O(n), so your algorithm applied to such an integer would have O(n²) complexity (n operations, each of O(n) complexity).
A direct counting algorithm that does not shift the whole number (assuming a single-bit test is O(1)) gives O(n log(n)) complexity (it involves up to n additions on a log(n)-sized counter).
But for fixed-length data (which is what C's int is), big-O analysis is simply meaningless, because it is based on input data of variable length - indeed, data of virtually any length, up to infinity.

What is the Big Oh Efficiency of Program with Finite For Loop?

What would be the efficiency of the following program? It is a for loop which runs a fixed number of times.
for (int i = 0; i < 10; i++)
{
    //do something here, no more loops though.
}
So, what should the efficiency be: O(1) or O(n)?
That entirely depends on what is in the for loop. Also, computational complexity is normally measured in terms of the size n of the input, and I can't see anything in your example that models or represents or encodes directly or indirectly the size of the input. There is just the constant 10.
Besides, although sometimes the analysis of computational complexity may give unexpected, surprising results, the correct term is not "Big Oh", but rather Big-O.
You can only talk about the complexity with respect to some specific input to the calculation. If you are looping ten times because there are ten "somethings" that you need to do work for, then your complexity is O(N) with respect to those somethings. If you just need to loop 10 times regardless of the number of somethings - and the processing time inside the loop doesn't change with the number of somethings - then your complexity with respect to them is O(1). If there's no "something" for which the order is greater than 1, then it's fair to describe the loop as O(1).
bit of further rambling discussion...
O(N) indicates the time taken for the work to complete can be reasonably approximated by some constant amount of time plus some function of N - the number of somethings in the input - for huge values of N:
O(N) indicates the time is c + xN, where c is a fixed overhead and x is the per-something processing time,
O(log2 N) indicates time is c + x(log2 N),
O(N^2) indicates time is c + x(N^2),
O(N!) indicates time is c + x(N!),
O(N^N) indicates time is c + x(N^N),
etc.
Again, in your example there's no mention of the number of inputs, and the loop iteration count is fixed. I can see how it's tempting to say it's O(1) even if there are 10 input "somethings", but consider: if you have a function capable of processing an arbitrary number of inputs, and then decide you'll only use it in your application with exactly 10 inputs and hard-code that, you clearly haven't changed the performance characteristics of the function - you've just locked in a single point on the time-for-N-inputs curve - and any big-O complexity that was valid before the hardcoding must still be valid afterwards. It's less meaningful and useful though, as N = 10 is a small amount, and unless you've got a horrific big-O complexity like O(N^N), the constants c and x take on a lot more importance in describing the overall performance than they would for huge values of N (where changes in the big-O complexity generally have much more impact on performance than changing c or even x - which is of course the whole point of having big-O analysis).
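As a minimal sketch of that point (sumAll and sumOfTen are hypothetical names, not from the question): hard-coding the call site to 10 inputs doesn't change the function's O(N) characteristic, it just fixes which point on the curve you exercise.

#include <iostream>
#include <numeric>
#include <vector>

// O(N) in the number of inputs it receives.
long long sumAll(const std::vector<long long>& inputs)
{
    return std::accumulate(inputs.begin(), inputs.end(), 0LL);
}

// The application only ever calls sumAll with exactly 10 inputs.
// That pins one point on the time-for-N-inputs curve; sumAll itself
// is still O(N) with respect to its argument.
long long sumOfTen(const std::vector<long long>& tenInputs)
{
    return sumAll(tenInputs);   // assumed: tenInputs.size() == 10
}

int main()
{
    std::vector<long long> ten(10, 1);   // exactly 10 inputs
    std::cout << sumOfTen(ten) << '\n';  // prints 10
}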
Sure, O(1), because nothing here depends on n.
EDIT:
Let the loop body contain some complex action with complexity O(P(n)) in Big O terms.
If we have a constant number C of iterations, the complexity of the loop will be O(C * P(n)) = O(P(n)).
Otherwise, let the number of iterations be Q(n), which depends on n. That makes the complexity of the loop O(Q(n) * P(n)).
I'm just trying to say that when the number of iterations is constant, it does not change the complexity of the whole loop.
n in Big O notation denotes the input size. We can't tell what the complexity is, because we don't know what is happening inside the for loop. For example, maybe there are recursive calls depending on the input size? In this example the overall complexity is O(n):
void f(int n) // input size = n
{
    for (int i = 0; i < 10; i++)
    {
        //do something here, no more loops though.
        g(n); // O(n)
    }
}

void g(int n)
{
    if (n > 0)
    {
        g(n - 1);
    }
}

Big O Notation for string matching algo

What would the big O notation of the function foo be?
int foo(char *s1, char *s2)
{
    int c = 0, s, p, found;
    for (s = 0; s1[s] != '\0'; s++)
    {
        for (p = 0, found = 0; s2[p] != '\0'; p++)
        {
            if (s2[p] == s1[s])
            {
                found = 1;
                break;
            }
        }
        if (!found) c++;
    }
    return c;
}
What is the efficiency of the function foo?
a) O(n!)
b) O(n^2)
c) O(n log2 n)
d) O(n)
I would have said O(MN)...?
It is O(n²) where n = max(length(s1),length(s2)) (which can be determined in less than quadratic time - see below). Let's take a look at a textbook definition:
f(n) ∈ O(g(n)) if a positive real number c and positive integer N exist such that f(n) <= c g(n) for all n >= N
By this definition we see that n represents a number - in this case that number is the length of the string passed in. However, there is an apparent discrepancy, since this definition provides only for a single variable function f(n) and here we clearly pass in 2 strings with independent lengths. So we search for a multivariable definition for Big O. However, as demonstrated by Howell in "On Asymptotic Notation with Multiple Variables":
"it is impossible to define big-O notation for multi-variable functions in a way that implies all of these [commonly-assumed] properties."
There is actually a formal definition of Big O with multiple variables; however, it requires extra constraints beyond those of single-variable Big O, and it is beyond the scope of most (if not all) algorithms courses. For typical algorithm analysis we can effectively reduce our function to a single variable by bounding all variables by a limiting variable n. In this case the variables (specifically, length(s1) and length(s2)) are clearly independent, but it is possible to bound them:
Method 1
Let x1 = length(s1)
Let x2 = length(s2)
The worst case scenario for this function occurs when there are no matches, therefore we perform x1 * x2 iterations.
Because multiplication is commutative, the worst case scenario foo(s1,s2) == the worst case scenario of foo(s2,s1). We can therefore assume, without loss of generality, that x1 >= x2. (This is because, if x1 < x2 we could get the same result by passing the arguments in the reverse order).
Method 2 (in case you don't like the first method)
For the worst case scenario (in which s1 and s2 contain no common characters), we can determine length(s1) and length(s2) prior to iterating through the loops (in .NET and Java, determining the length of a string is O(1) - but in this case it is O(n)), assigning the greater to x1 and the lesser to x2. Here it is clear that x1 >= x2.
For this scenario, we will see that the extra calculations to determine x1 and x2 make this O(n² + 2n). We use the following simplification rule to reduce that to O(n²):
If f(x) is a sum of several terms, the one with the largest growth rate is kept, and all others omitted.
Conclusion
For n = x1 (our limiting variable), with x1 >= x2, the worst-case scenario is x1 = x2.
Therefore: f(x1) ∈ O(n²)
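To see that worst case concretely, here is a minimal sketch (the instrumented copy fooComparisons is mine, not part of the question) that feeds two equal-length strings with no characters in common into the same nested-loop logic and counts the inner-loop comparisons, which come out to exactly x1 * x2:

#include <cstdio>
#include <string>

// Instrumented copy of foo's loops: counts the inner-loop character
// comparisons. With no common characters, that count is length(s1) * length(s2).
long long fooComparisons(const char *s1, const char *s2)
{
    long long comparisons = 0;
    for (int s = 0; s1[s] != '\0'; s++) {
        for (int p = 0; s2[p] != '\0'; p++) {
            comparisons++;
            if (s2[p] == s1[s]) break;
        }
    }
    return comparisons;
}

int main()
{
    std::string a(1000, 'a');   // 1000 characters, all 'a'
    std::string b(1000, 'b');   // 1000 characters, all 'b'
    std::printf("%lld\n", fooComparisons(a.c_str(), b.c_str()));   // prints 1000000
}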
Extra Hint
For all homework problems posted to SO related to Big O notation, if the answer is not one of:
O(1)
O(log log n)
O(log n)
O(n^c), 0<c<1
O(n)
O(n log n) = O(log n!)
O(n^2)
O(n^c)
O(c^n)
O(n!)
Then the question is probably better off being posted to https://math.stackexchange.com/
In big-O notation, we always have to define what the occurring variables mean. O(n) doesn't mean anything unless we define what n is. Often, we can omit this information because it is clear from context. For example, if we say that some sorting algorithm is O(n log(n)), n always denotes the number of items to sort, so we don't have to state this every time.
Another important thing about big-O notation is that it only gives an upper limit - every algorithm in O(n) is also in O(n^2). The notation is often used as meaning "the algorithm has the exact asymptotic complexity given by the expression (up to a constant factor)", but its actual definition is "the complexity of the algorithm is bounded by the given expression (up to a constant factor)".
In the example you gave, you took m and n to be the respective lengths of the two strings. With this definition, the algorithm is indeed O(m n). If we define n to be the length of the longer of the two strings though, we can also write this as O(n^2) -- this is also an upper limit for the complexity of the algorithm. And with the same definition of n, the algorithm is also O(n!), but not O(n) or O(n log(n)).
O(n^2)
The relevant part of the function, in terms of complexity, is the nested loops. The maximum number of iterations is the length of s1 times the length of s2, both of which are linear factors, so the worst-case computing time is O(n^2), i.e. the square of a linear factor. As Ethan said, O(mn) and O(n^2) are effectively the same thing.
Think of it this way:
There are two inputs. If the function simply returned, then its performance would be unrelated to the arguments. This would be O(1).
If the function looped over one string, then the performance is linearly related to the length of that string. Therefore O(N).
But the function has a loop within a loop. The performance is related to the length of s1 and the length of s2. Multiply those lengths together and you get the number of loop iterations. It's not linear any more; it follows a curve. This is O(N^2).