Problem figuring out time and space complexity? - c++

The question was to check whether two strings are rotations of each other or not. So, here is the function I wrote for it:
bool areRotations(string s1, string s2)
{
    int n1 = s1.length(), n2 = s2.length();
    if (n1 != n2) return false;      // rotations must have the same length
    s1 += s1;                        // s1 + s1 contains every rotation of s1
    if (s1.find(s2) != string::npos)
        return true;
    else
        return false;
}
I just check whether s2 is present in s1 + s1; if it is, then s1 and s2 must be rotations of each other.
I am not able to figure out the time and space complexity of my code. My understanding is that it should be O(n) time: to concatenate s1 to itself we have to make a copy of s1, and to find s2 in the doubled string we have to traverse it, so the time complexity is O(n).
The space should also be O(n), because we are making a copy of s1. Is this correct?

I am not able to figure out the time and space complexity of my code. [...] Is this correct?
std::string::length runs in constant time (since C++11). The comparison and the concatenation run in linear time. But the overall algorithm could run in non-linear time.
Indeed, the C++ standard does not actually require any specific algorithm or guarantee any complexity for std::string::find. Consequently, it is not possible to give an answer independent of the STL implementation you use.
If the implementation is naive, or uses the famous Boyer-Moore algorithm, the worst-case time complexity is likely to be O(n^2) in your case (where n is the size of the input string). This can happen with inputs like s1="aaaaaaaaaca" and s2="aaaaaaaaaac". Although std::search provides stronger guarantees, it does not provide any search algorithm running in linear time. To ensure a linear-time complexity, you can use the KMP search algorithm (or better variants such as the two-way string-matching algorithm).
Thus, with the KMP algorithm, the complexity in time and space of your solution would be O(n). This is optimal as the input strings need to be read and stored somewhere (at least in your implementation).
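For reference, here is a minimal sketch of the rotation check with a hand-rolled KMP search, so the whole routine is O(n) regardless of how the library implements find (illustrative code, not the original function; the name areRotationsKMP is mine):
#include <string>
#include <vector>
using namespace std;

bool areRotationsKMP(const string& s1, const string& s2)
{
    if (s1.length() != s2.length()) return false;
    if (s1.empty()) return true;          // two empty strings are trivially rotations

    string text = s1 + s1;                // every rotation of s1 appears in s1 + s1
    const string& pat = s2;

    // Build the KMP failure table: fail[i] is the length of the longest proper
    // prefix of pat that is also a suffix of pat[0..i].
    vector<size_t> fail(pat.size(), 0);
    for (size_t i = 1, k = 0; i < pat.size(); ++i) {
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];
        if (pat[i] == pat[k]) ++k;
        fail[i] = k;
    }

    // Scan the text; the pattern index k never falls back more than it has
    // advanced, so the scan is O(|text| + |pat|) overall.
    for (size_t i = 0, k = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pat[k]) k = fail[k - 1];
        if (text[i] == pat[k]) ++k;
        if (k == pat.size()) return true; // found s2 inside s1 + s1
    }
    return false;
}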

Related

How do I calculate the time complexity of the following function?

Here is a recursive function. It traverses a map of strings (multimap<string, string> graph) and checks each itr->second (s_tmp); if s_tmp is equal to the desired string (Exp), it prints itr->first and the function is executed again for that itr->first.
string findOriginalExp(string Exp){
    cout << "*****findOriginalExp Function*****" << endl;
    string str;
    if (graph.empty()) {
        str = "map is empty";
    } else {
        for (auto itr = graph.begin(); itr != graph.end(); itr++) {
            string s_tmp = itr->second;
            string f_tmp = itr->first;
            string nll = "null";
            //s_tmp.compare(Exp) == 0
            if (s_tmp == Exp) {
                if (f_tmp.compare(nll) == 0) {
                    cout << Exp << " :is original experience.";
                    return Exp;
                } else {
                    return findOriginalExp(itr->first);
                }
            } else {
                str = "No element is equal to Exp.";
            }
        }
    }
    return str;
}
There is no obvious stopping rule and the recursion depth seems to depend entirely on the data. How is the time complexity of this function calculated?
I am not going to analyse your function but instead try to answer in a more general way. It seems like you are looking for a simple expression such as O(n) or O(n^2) for the complexity of your function. However, complexity is not always that simple to estimate.
In your case it strongly depends on the contents of graph and on what the user passes as the parameter.
As an analogy consider this function:
int foo(int x){
    if (x == 0) return x;
    if (x == 42) return foo(42);
    if (x > 0) return foo(x-1);
    return foo(x/2);
}
In the worst case it never returns to the caller. If we ignore x >= 42, the worst-case complexity is O(n). That alone isn't very useful information for the user. What I really need to know as a user is:
Don't ever call it with x >= 42.
O(1) if x == 0
O(x) if x > 0
O(log|x|) if x < 0
Now try to make similar considerations for your function. The easy case is when Exp is not in graph; in that case there is no recursion. I am almost sure that for the "right" input your function can be made to never return. Find out what those cases are and document them. In between you have cases that return after a finite number of steps. If you have no clue how to get your hands on them analytically, you can always set up a benchmark and measure. Measuring the runtime for input sizes 10, 50, 100, 1000, ... should be sufficient to distinguish between linear, quadratic and logarithmic dependence.
PS: Just a tip: Don't forget what the code is actually supposed to do and what time complexity is needed to solve that problem (often it is easier to discuss that in an abstract way rather than diving too deep into code). In the silly example above the whole function can be replaced by its equivalent int foo(int){ return 0; } which obviously has constant complexity and does not need to be any more complex than that.
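As a rough illustration of the measuring approach, a tiny timing helper (purely illustrative; the name is invented here) could look like this:
#include <chrono>

// Runs the given callable once and returns the elapsed wall-clock time in
// microseconds; call it with increasing input sizes and compare how it grows.
template <class F>
long long measureMicroseconds(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}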
This function takes a directed graph and a vertex in that graph and chases the edges going into it backwards to find a vertex with no edge pointing into it. The operation of finding the vertex "behind" any given vertex takes O(n) string comparisons, where n is the number of k/v pairs in the graph (this is the for loop). It does this m times, where m is the length of the path it must follow (which it does through the recursion). Therefore, it performs O(m * n) string comparisons, where n is the number of k/v pairs and m is the length of the path.
Note that there's generally no such thing as "the" time complexity for just some function you see written in code. You have to define which variables you want to describe the time in terms of, and also the operations with which you want to measure the time. E.g. if you want to express this purely in terms of n, the number of k/v pairs, you run into a problem, because if the graph contains a suitably placed cycle, the function doesn't terminate! If you further constrain the graph to be acyclic, then the maximum length of any path is constrained by m < n, and you get that this function does O(n^2) string comparisons for an acyclic graph with n edges.
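As a hypothetical illustration of that O(m * n) bound, consider a graph that forms a single chain; the helper below is invented for this example and is not part of the question:
#include <map>
#include <string>
using namespace std;

// Hypothetical worst case for the function above: a single chain
//   null -> v1 -> v2 -> ... -> vN
// Asking for the original of "vN" recurses N times, and each call scans all
// N entries of the multimap, so roughly N * N string comparisons in total.
multimap<string, string> buildChainGraph(int n) {
    multimap<string, string> g;
    g.insert({"null", "v1"});
    for (int i = 1; i < n; ++i)
        g.insert({"v" + to_string(i), "v" + to_string(i + 1)});
    return g;
}
With such a graph, m (the path length) equals n (the number of entries), which is where the quadratic behaviour comes from.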
You should approximate the control flow of the recursive calls by writing a recurrence relation. It's been about 30 years since I took college classes in discrete math, but generally you write it out like pseudocode, just enough to see how many recursive calls there are. In some cases just counting the calls on the longest branch of the right-hand side is useful, but you generally need to plug one expansion back in and from that derive a polynomial or power relationship.

Confusion about time complexity with hash maps

On LeetCode I find it is common to "ignore" the worst-case time complexity involving hash maps. I thought that in software interviews it was standard to assume the worst case, as they often do. Below is my solution to a simple problem: find the first non-repeating char in a string. I understand that hash maps have O(1) lookup on average, but when iterating over the string and looking up the hash map, why is the time complexity O(N) rather than O(N^2)?
#include <string>
#include <unordered_map>
using namespace std;

class Solution {
public:
    unordered_map<char, int> m;
    int firstUniqChar(string s) {
        for (char c : s) {
            m[c]++;
        }
        for (int i = 0; i < s.length(); i++) {
            if (m[s[i]] == 1) {
                return i;
            }
        }
        return -1;
    }
};
It is O(N) on average because a hash map lookup is O(1) on average and you do O(N) of them.
On average means averaged over all possible inputs. That means there might exist an input that breaks a particular hash function and achieves O(N) or much worse on every lookup.
The worst case is heavily implementation specific - e.g. with hashing into buckets it depends on how elements are stored in each bucket. If they are kept in a simple list, a lookup is O(#collisions in the bucket); a binary tree brings that down to O(log #collisions). There might also be a difference between searching for keys that are present and keys that are missing.
Also, there is a big assumption that hashed containers can grow with the number of elements stored, i.e. that the occupancy of the buckets is kept low.
It does not hurt to mention these worst cases in interviews; it demonstrates that you know hash maps can have limits.
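For instance, the standard unordered containers rehash to keep the average bucket occupancy bounded; a small illustrative sketch (the numbers here are arbitrary):
#include <unordered_map>

// Illustration of the growth assumption: the container rehashes so that the
// average number of elements per bucket stays below max_load_factor().
void configureMap(std::unordered_map<char, int>& m) {
    m.max_load_factor(1.0f);   // keep at most ~1 element per bucket on average
    m.reserve(256);            // pre-allocate buckets to avoid rehashing later
}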
The time complexity of the given problem is O(N). You can provide a perfect hash function for it, that is, one where no collision ever happens; here static_cast<size_t>(256 + c) is such a perfect hash. And if you look at the fastest solutions to this problem on LeetCode, you will see that people use plain arrays.
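A minimal sketch of that plain-array idea (illustrative code, not taken from the question) could look like this:
#include <string>
using namespace std;

// A 256-entry count table acts as a perfect hash for char values, so every
// lookup is O(1) with no collisions and the whole function is O(N).
int firstUniqCharArray(const string& s) {
    int counts[256] = {0};
    for (unsigned char c : s)
        counts[c]++;
    for (size_t i = 0; i < s.length(); ++i)
        if (counts[static_cast<unsigned char>(s[i])] == 1)
            return static_cast<int>(i);
    return -1;
}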

Recursive base conversion time complexity analysis

Given an integer p and a destination base b, return a string representation of p in base b. The string should have the least significant digit at the end.
^ This is the problem I'm giving myself.
The naive recursive algorithm (in C++) I came up with is as follows:
string convertIntToBaseRecursive(int number, int base) {
    // Base case
    if (!number) return "";
    // Adding the least significant digit to "the rest", computed recursively.
    // Could reverse these operations if we wanted the string backwards.
    return convertIntToBaseRecursive(number / base, base) + to_string(number % base);
}
While the algorithm is incredibly simple, I want to make sure I understand the complexity breakdown. My thoughts are below and I would like to know if they are correct, or wrong, and if they are wrong then knowing where I'm off track would be nice!
Claim:
n = log_b(p) is the length of the returned string
Time complexity: O(n^2)
Space complexity: O(n)
Reasoning:
In order to keep the least significant digit at the end of the string when it is the value we calculate before anything else, we'd either have to:
Compute the string recursively, as we are
Keep "shifting" the array every time we calculate a digit, so we can add the most recent digit to the front of the string, not the end
Write the string backwards and reverse it before we return (most efficient)
We're doing the first method in the above C++ algorithm, and the + operator creates a new string at each stack frame. The initial frame creates and returns a string of length n, the frame below it a string of length n-1, then n-2, n-3, and so on. Following this trend (without going into a proof of why 1 + 2 + 3 + ... + n = O(n^2)), it is clear the time complexity is O(n^2) = O(log_b(p)^2). We also only need to store O(n) things in memory at any time. Just before the original stack frame resolves (when the algorithm completes) we hold the result as a full string of length n, but until then each frame holds only a single character (O(1)) plus the recursive stack frames (O(n) of them). We store n single characters across the stack until we complete, therefore the space complexity is O(n).
Of course the more efficient version of this solution would be
string convertIntToBaseIterative(int number, int base) {
    string retString = "";
    while (number) {
        retString += to_string(number % base);
        number /= base;
    }
    // Reverse so the least significant digit ends up at the end of the string
    reverse(retString.begin(), retString.end());
    return retString;
}
I believe this solution, where n = log_b(p), has:
Time complexity: O(n)
Space complexity: O(n)
Are these analyses correct, or am I off somewhere?
Note:
Given the chat room conversation with #user1952500 I had a couple edits to make to his answer based on what we talked about. The following is an edited version of his answer reflecting the latest of what we talked about and what I learned:
Edited answer:
Since the return value has to contain the output, you cannot get a better space complexity than O(n).
Suppose the output string is composed of the following digits in order: a_1, a_2, a_3, ..., a_n. In the recursive approach (bullet #1), we create the string as "a_1" + "a_2" + ... + "a_n", which with recursion yields O(n^2) time complexity. In bullet #2, the iterative approach constantly pushes characters to the front of the string, like (((...(a_1) + a_2) + a_3) + ... + a_n), which shifts the entire string on every character addition, also yielding O(n^2) time complexity. With your written iterative approach (bullet #3) the time complexity can be optimized depending on the version of C++ (see the notes below).
The string type is not very useful for operations that involve repeated concatenation. In older versions of C++ you could achieve O(n) time complexity by preallocating a string of size n. In C++11, this answer indicates that certain append operations can be optimized to be amortized O(1) per character. Assuming this is true, the written-out iterative version will have O(n) time complexity without any extra work.
Note: To get O(n) time complexity with the recursive version of this algorithm, we could take advantage of the amortized O(1) character appends and use a single string passed by reference. This would require the recursive version's function signature to be rewritten as follows:
void convertToBaseRecursive(int number, int base, string& str)
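A minimal sketch of that by-reference variant (illustrative; it assumes the amortized O(1) append discussed above, and the wrapper name is invented here) might look like:
#include <string>
using namespace std;

// Recursive helper: appends the digits of number (most significant first)
// to str, relying on amortized O(1) appends for an overall O(n) running time.
void convertToBaseRecursive(int number, int base, string& str) {
    if (!number) return;                      // base case: nothing left to emit
    convertToBaseRecursive(number / base, base, str);
    str += to_string(number % base);          // least significant digit lands last
}

string convertIntToBase(int number, int base) {
    string result;
    convertToBaseRecursive(number, base, result);
    return result.empty() ? "0" : result;     // choice: print "0" for number == 0
}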
Since the return value has to contain the output, you cannot get a better space complexity than O(n).
Suppose the output string is composed of the following digits in order: a_1, a_2, a_3, ..., a_n. In the recursive approach, we create the string as "a_1" + "a_2" + ... + "a_n". In the iterative approach, we do (((...(a_1) + a_2) + a_3) + ... + a_n). Hence the time complexity in both cases should be the same, at O(n^2) (in C++03; see the note below for C++11).
As you can see, both approaches are heavily influenced by implementation details. The string type is not very useful for operations that involve repeated concatenation. If you have a preallocated array of size n, you could get the complexity down to O(n).
Note 1: There is some detail about the append operation. In C++03, the append operation had no specified complexity and could lead to copy-on-write (if the string could not be extended in place and a reallocation was needed). In C++11, CoW and rope-style implementations are disallowed and append should take amortized O(1) time per character. Hence in the C++11 case, we should be able to get O(n) time complexity for both implementations.
Note 2: To get O(n) time complexity with a user-defined string implementation (one that stores its length), the string needs to be passed by reference into the function. This changes the function signature to:
void convertToBaseRecursive(int number, int base, MyString& str)
This implementation will let the string be shared and updated in-place provided the string uses an array that is pre-allocated.

Multiple linear operations impact on overall function worse case complexity?

This is perhaps somewhat of a simple question, but I'm going to ask anyway folks:
I've written the below function:
#include <vector>

std::vector<int> V = {1, 2, 3, 4, 5};

void myFunction()
{
    for (int i = V.size(); i--; ) { /* Do stuff */ }
    for (int e = V.size(); e--; ) { /* Do stuff */ }
}
This needs to have worst-case time complexity O(n) and worst-case space complexity O(1).
Does having two linear operations (for loops) change the time complexity to something other than O(n)?
No, it does not. O(N) means something like aN + b plus something that grows more slowly than N.
Even something like T(N) = 5N + 100 + log(N) is considered O(N).
By "something that grows more slowly than N", I mean any function R(N) that satisfies:
lim R(N)/N = 0 as N --> Infinity   // L'Hospital's rule helps with these kinds of limits
So an O(N) running time can be written as:
T(N) = aN + b + R(N)
Side note: complexity is not the same as performance. Although (N + N) is still O(N), that does not mean it is not slower than (N). Performance, in its most basic form, is about the number of cycles you need to do something, not about the theoretical complexity. However, the two are related, at least as N grows very large (towards infinity).

Curious question : What algorithm does STL set_intersect implement?

I spent a considerable amount of time coding up Baeza-Yates' fast set intersection algorithm for one of my apps. While I did marginally outdo the STL set_intersect, the fact that I required the resultant set to be sorted removed any advantage I had gained from implementing my own algorithm once I sorted the output. Given that the STL set_intersect performs this well, can anyone point me to the algorithm it actually implements? Or does it implement the same Baeza-Yates algorithm, only in a much more efficient manner?
Baeza-Yates: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.7899&rep=rep1&type=pdf
The STL doesn't require any particular algorithm; it just sets constraints on the algorithmic complexity of certain operations. Since it's all template based, you can easily view the source of your particular implementation to see how it works.
At least in the implementations I've looked at, the implementation is fairly simplistic -- something along this general order:
template <class inIt, class outIt>
outIt set_intersection(inIt start1, inIt end1, inIt start2, inIt end2, outIt out) {
    while (start1 != end1 && start2 != end2) {
        if (*start1 < *start2)
            ++start1;
        else if (*start2 < *start1)
            ++start2;
        else { // equal elements.
            *out++ = *start1;
            ++start1;
            ++start2;
        }
    }
    return out;
}
Of course, I'm just typing this off the top of my head -- it probably won't even compile, and certainly isn't pedantically correct (e.g., should probably use a comparator function instead of using operator< directly, and should have another template parameter to allow start1/end1 to be a different type from start2/end2).
From an algorithmic viewpoint, however, I'd guess most real implementations are pretty much as above.
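A sketch of that more general shape (separate iterator types and an explicit comparator; still illustrative, not any library's actual source) might be:
// Same merge-style walk, but with two independent input iterator types and a
// caller-supplied comparator instead of operator<.
template <class InIt1, class InIt2, class OutIt, class Compare>
OutIt set_intersection_with(InIt1 first1, InIt1 last1,
                            InIt2 first2, InIt2 last2,
                            OutIt out, Compare comp) {
    while (first1 != last1 && first2 != last2) {
        if (comp(*first1, *first2))
            ++first1;
        else if (comp(*first2, *first1))
            ++first2;
        else {                                // neither is less: equivalent elements
            *out++ = *first1;
            ++first1;
            ++first2;
        }
    }
    return out;
}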
Interesting. So, the number of comparisons in your algorithm scales linearly with the number of elements in both sets. The Baeza-Yates algorithm goes something like this (note that it assumes both input sets are sorted):
1) Find the median of set A (A is the smaller set here)
2) Search for the median of A in B.
If found, add to the result
else, the insertion rank of the median in B is known.
3) Split set A about its median into two parts, and set B about its insertion rank into two parts, and repeat the procedure recursively on both parts.
This step works because all elements less than the median in A would intersect only with those elements before the insertion rank of A's median in B.
Since you can use a binary search to locate A's median in B, the number of comparisons in this algorithm is clearly lower than in the one you mentioned. In fact, in the best case the number of comparisons is O(log(m) * log(n)), where m and n are the sizes of the sets, and in the worst case it is O(m + n). How on earth did I mess up the implementation this badly? :(
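For concreteness, here is a rough sketch of that recursive scheme (it assumes both inputs are sorted sets without duplicates, always splits A, and omits the swap-to-the-smaller-range refinement; the names are mine, not from any particular implementation):
#include <algorithm>
#include <vector>

// Recursive Baeza-Yates style intersection of two sorted ranges A[aLo, aHi)
// and B[bLo, bHi); matching elements are appended to out in sorted order.
void baezaYatesIntersect(const std::vector<int>& A, size_t aLo, size_t aHi,
                         const std::vector<int>& B, size_t bLo, size_t bHi,
                         std::vector<int>& out)
{
    if (aLo >= aHi || bLo >= bHi) return;

    size_t aMid = aLo + (aHi - aLo) / 2;      // median position of the A range
    int median = A[aMid];

    // Binary search for A's median in the B range: O(log |B|) comparisons.
    auto it = std::lower_bound(B.begin() + bLo, B.begin() + bHi, median);
    size_t bRank = static_cast<size_t>(it - B.begin());  // insertion rank in B

    // Elements below the median can only match elements before its rank in B.
    baezaYatesIntersect(A, aLo, aMid, B, bLo, bRank, out);
    if (bRank < bHi && B[bRank] == median) {  // the median itself is in B
        out.push_back(median);
        ++bRank;
    }
    // Elements above the median can only match elements at or after bRank.
    baezaYatesIntersect(A, aMid + 1, aHi, B, bRank, bHi, out);
}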