Recursive base conversion time complexity analysis - c++

Given an integer p and a destination base b, return a string representation of p in base b. The string should have the least significant digit at the end.
^ This is the problem I'm giving myself.
The naive recursive algorithm (in C++) I came up with is as follows:
string convertIntToBaseRecursive(int number, int base) {
    // Base case
    if (!number) return "";
    // Add the least significant digit to "the rest", which is computed recursively.
    // We could swap these operands if we wanted the string backwards.
    return convertIntToBaseRecursive(number / base, base) + to_string(number % base);
}
While the algorithm is incredibly simple, I want to make sure I understand the complexity breakdown. My thoughts are below and I would like to know if they are correct, or wrong, and if they are wrong then knowing where I'm off track would be nice!
Claim:
n = logb(p) is the length of the returned string
Time complexity: O(n^2)
Space complexity: O(n)
Reasoning:
In order to keep the least significant digit at the end of the string when it is the value we calculate before anything else, we'd either have to:
Compute the string recursively as we are
Keep "shifting" the array every time we calculate a bit so we can add the most recent bit to the front of the string, not the end
Write the string backwards, and reverse it before we return (most efficient)
We're doing the first method in the C++ algorithm above, and the + operator creates a new string at each stack frame. The initial frame creates and returns a string of length n, the next frame a string of length n-1, then n-2, n-3, and so on. Following this trend (without going into a proof of why 1 + 2 + 3 + ... + n = O(n^2)), it is clear the time complexity is O(n^2) = O(logb^2(p)). We also only need to store O(n) things in memory at any time. When the original stack frame resolves (just before the algorithm completes) the memory is a single raw string of length n; before it resolves, each frame holds a single character (O(1)) plus the recursive stack frames (O(n)). We do this at each level, storing n single characters in total until we complete. Therefore the space complexity is O(n).
Of course the more efficient version of this solution would be
string convertIntToBaseIterative(int number, int base) {
    string retString = "";
    while (number) {
        retString += to_string(number % base);
        number /= base;
    }
    // Only needed if the least significant digit must stay at the end
    reverse(retString.begin(), retString.end());
    return retString;
}
I believe this solution, where n = logb(p), has:
Time complexity: O(n)
Space complexity: O(n)
Are these analyses correct, or am I off somewhere?

Note:
Following the chat room conversation with #user1952500, I had a couple of edits to make to his answer based on what we talked about. The following is an edited version of his answer reflecting the latest of what we discussed and what I learned:
Edited answer:
Since the return value has to contain the output, you cannot get a better space complexity than O(n).
Suppose the output string is composed of the following digits in order: a_1, a_2, a_3, ..., a_n. In the recursive approach (bullet #1), we create the string as "a_1" + "a_2" + ... + "a_n", which with recursion yields O(n^2) time complexity. In the shifting approach (bullet #2), we constantly push characters to the front of the string, like a_1 + (a_2 + (a_3 + ... + a_n)), which shifts the entire string on every character addition and also yields O(n^2) time complexity. For your written iterative approach (bullet #3), the time complexity can be optimized depending on the version of C++ (see the notes below).
The string type is not very useful for operations that involve repeated concatenation. In older versions of C++ you could achieve O(n) time complexity by preallocating a string of size n. In C++11, this answer indicates that appending a single character can be optimized to amortized O(1). Assuming this is true, the written-out iterative version has O(n) time complexity without any extra work.
Note: To get O(n) time complexity with the recursive version of this algorithm, we could take advantage of the amortized O(1) character appends and use a single string passed by reference. This would require the recursive version's function signature to be rewritten as follows:
void convertToBaseRecursive(int number, int base, string& str)
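A minimal sketch of what that rewritten recursive version might look like (the body below is an assumption based on the discussion above, and it assumes base <= 10 so that each digit is a single character):

void convertToBaseRecursive(int number, int base, string& str) {
    if (!number) return;                              // base case: no digits left to emit
    convertToBaseRecursive(number / base, base, str); // emit the higher-order digits first
    str += to_string(number % base);                  // amortized O(1) append per digit
}

Called as: string s; convertToBaseRecursive(p, b, s); each of the n digits is appended exactly once, for O(n) total time.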

Since the return value has to contain the output, you cannot get a better space complexity than O(n).
Suppose the output string is composed of the following digits in order: a_1, a_2, a_3, ..., a_n. In the recursive approach, we create a string as follows: "a_1" + "a_2" + ... + "a_n". In the iterative approach, we do: (((...(a_1) + a_2) + a_3) + ... + a_n). Hence the time complexity in both cases should be the same, at O(n^2) (in C++03; see note below for C++11).
As you see, both approaches have been heavily influenced by implementation details. The string type is not very useful for operations that involve repeated concatenation. If you have a preallocated array of size n, you could get the complexity down to O(n).
Note 1: There is some detail about the append operation. In C++03, the append operation had no specified complexity and could trigger a full copy (for example with copy-on-write implementations, or if the string could not be extended in place and a reallocation was needed). In C++11, CoW and rope-style implementations are disallowed and append should take amortized O(1) time per character. Hence in the C++11 case, we should be able to get O(n) time complexity for both implementations.
Note 2: To get O(n) time complexity with a user-defined string implementation (one that stores its length), the string needs to be passed by reference into the function. This changes the function signature to:
void convertToBaseRecursive(int number, int base, MyString& str)
This implementation lets the string be shared and updated in place, provided the string uses a pre-allocated array.

Related

How do I calculate the time complexity of the following function?

Here is a recursive function that traverses a multimap of strings (multimap<string, string> graph). It checks whether itr->second (s_tmp) equals the desired string (Exp); if so, it prints itr->first and calls itself again with itr->first.
string findOriginalExp(string Exp) {
    cout << "*****findOriginalExp Function*****" << endl;
    string str;
    if (graph.empty()) {
        str = "map is empty";
    } else {
        for (auto itr = graph.begin(); itr != graph.end(); itr++) {
            string s_tmp = itr->second;
            string f_tmp = itr->first;
            string nll = "null";
            // s_tmp.compare(Exp) == 0
            if (s_tmp == Exp) {
                if (f_tmp.compare(nll) == 0) {
                    cout << Exp << " :is original experience.";
                    return Exp;
                } else {
                    return findOriginalExp(itr->first);
                }
            } else {
                str = "No element is equal to Exp.";
            }
        }
    }
    return str;
}
There are no rules for stopping and it seems to be completely random. How is the time complexity of this function calculated?
I am not going to analyse your function but will instead try to answer in a more general way. It seems like you are looking for a simple expression such as O(n) or O(n^2) for the complexity of your function. However, complexity is not always that simple to estimate.
In your case it strongly depends on the contents of graph and on what the user passes as the parameter.
As an analogy consider this function:
int foo(int x) {
    if (x == 0) return x;
    if (x == 42) return foo(42);
    if (x > 0) return foo(x - 1);
    return foo(x / 2);
}
In the worst case it never returns to the caller. If we ignore x >= 42, then the worst-case complexity is O(n), where n is the magnitude of x. That alone isn't very useful information for the user. What I really need to know as a user is:
Don't ever call it with x >= 42.
O(1) if x==0
O(x) if x>0
O(ln(x)) if x < 0
Now try to make similar considerations for your function. The easy case is when Exp is not in graph; in that case there is no recursion. I am almost sure that for the "right" input your function can be made to never return. Find out what those cases are and document them. In between, you have cases that return after a finite number of steps. If you have no clue at all how to get your hands on them analytically, you can always set up a benchmark and measure. Measuring the runtime for input sizes 10, 50, 100, 1000, ... should be sufficient to distinguish between linear, quadratic and logarithmic dependence.
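For example, a rough timing harness along these lines could be used (a sketch only; the helper name timeOneCall and the printed format are illustrative, and it relies on findOriginalExp and the global graph from the question):

#include <chrono>
#include <iostream>
#include <string>

// Time a single call for a given input and print the elapsed microseconds.
// Repeat with graphs of size 10, 50, 100, 1000, ... and compare how the numbers grow.
void timeOneCall(const std::string& exp) {
    auto start = std::chrono::steady_clock::now();
    findOriginalExp(exp);  // the function under test; reads the global `graph`
    auto stop = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    std::cout << "n=" << graph.size() << " took " << us << " us\n";
}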
PS: Just a tip: Don't forget what the code is actually supposed to do and what time complexity is needed to solve that problem (often it is easier to discuss that in an abstract way rather than diving too deep into code). In the silly example above the whole function can be replaced by its equivalent int foo(int){ return 0; } which obviously has constant complexity and does not need to be any more complex than that.
This function takes a directed graph and a vertex in that graph, and chases edges going into it backwards to find a vertex with no edge pointing into it. The operation of finding the vertex "behind" any given vertex takes O(n) string comparisons, where n is the number of k/v pairs in the graph (this is the for loop). It does this m times, where m is the length of the path it must follow (which it does through the recursion). Therefore, it has time complexity O(m * n) string comparisons, where n is the number of k/v pairs and m is the length of the path.
Note that there's generally no such thing as "the" time complexity for just some function you see written in code. You have to define which variables you want to describe the time in terms of, and also the operations with which you want to measure the time. E.g. if we want to write this purely in terms of n, the number of k/v pairs, you run into a problem, because if the graph contains a suitably placed cycle, the function doesn't terminate! If you further constrain the graph to be acyclic, then the maximum length of any path is constrained by m < n, and then you can also say that this function does O(n^2) string comparisons for an acyclic graph with n edges.
You should approximate the control flow of the recursive calls by using a recurrence relation. It's been about 30 years since I took college classes in discrete math, but generally you write something like pseudocode, just enough to see how many calls there are. In some cases just counting how many calls appear on the right-hand side is useful, but you generally need to plug one expansion back in and from that derive a polynomial or power relationship.
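For instance (following the analysis above), if each call scans all n entries of graph once and makes at most one recursive call along a path of length m, the recurrence is roughly T(m) = c*n + T(m-1) with T(0) = c, which expands to T(m) = O(m * n), matching the O(m * n) bound given earlier.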

Problem figuring out time and space complexity?

The question was to check whether two strings are rotations of each other or not. So, here is the function I wrote for it:
bool areRotations(string s1, string s2)
{
    int n1 = s1.length(), n2 = s2.length();
    if (n1 != n2) return false;
    s1 += s1;
    if (s1.find(s2) != string::npos)
        return true;
    else
        return false;
}
I just checked whether s2 is present in s1 + s1; if it is, then s1 and s2 must be rotations of each other.
I am not able to figure out the time and space complexity of my code. What I can understand is that it should be O(n) time complexity, because to concatenate s1 to s1 we have to create a copy of s1, and to find s2 in s1 we have to traverse it, hence O(n) time.
For space it should also be O(n), because we are making a copy of s1. Is this correct?
I am not able to figure out the time and space complexity of my code. [...] Is this correct?
std::string::length runs in constant time (since C++11). The comparison and the concatenation run in linear time. But the overall algorithm could run in non-linear time.
Indeed, the C++ standard does not actually require any specific algorithm or guarantee a complexity for std::string::find. Consequently, it is not possible to give an answer independent of the STL implementation you use.
If the implementation is naive or uses the famous Boyer-Moore algorithm, the worst-case time complexity is likely to be O(n^2) in your case (where n is the size of the input string). This could happen with inputs like s1="aaaaaaaaaca" and s2="aaaaaaaaaac". Although std::search provides stronger guarantees, it does not offer any search algorithm running in linear time. To ensure a linear-time complexity, you can use the KMP search algorithm (or better variants such as the two-way string-matching algorithm).
Thus, with the KMP algorithm, the time and space complexity of your solution would be O(n). This is optimal, as the input strings need to be read and stored somewhere (at least in your implementation).
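For illustration, here is a hedged sketch of the rotation check backed by a hand-written KMP search (kmpContains is a made-up helper name), which keeps the whole check at O(n) time and O(n) space:

#include <string>
#include <vector>
using namespace std;

// Returns true if `needle` occurs in `haystack`, in O(|haystack| + |needle|) time.
bool kmpContains(const string& haystack, const string& needle) {
    if (needle.empty()) return true;
    // Failure table: length of the longest proper prefix of needle[0..i] that is also a suffix.
    vector<size_t> fail(needle.size(), 0);
    for (size_t i = 1, k = 0; i < needle.size(); ++i) {
        while (k > 0 && needle[i] != needle[k]) k = fail[k - 1];
        if (needle[i] == needle[k]) ++k;
        fail[i] = k;
    }
    // Scan the haystack, reusing the table so we never back up in the haystack.
    for (size_t i = 0, k = 0; i < haystack.size(); ++i) {
        while (k > 0 && haystack[i] != needle[k]) k = fail[k - 1];
        if (haystack[i] == needle[k]) ++k;
        if (k == needle.size()) return true;
    }
    return false;
}

bool areRotations(const string& s1, const string& s2) {
    return s1.length() == s2.length() && kmpContains(s1 + s1, s2);
}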

Can Pass by Value Affect the Asymptotic Time Complexity of A Recursive Algorithm?

In the following program, a helper function is called recursively in order to build a binary tree from the preorder and inorder traversals, which are represented by arrays. The runtime is fast and beats 100% of all submissions on LeetCode.
TreeNode* buildTree(vector<int>& preorder, vector<int>& inorder) {
    unordered_map<int,int> m;
    for (int i = 0; i < inorder.size(); i++) {
        m[inorder[i]] = i;
    }
    return helper(preorder, inorder, 0, preorder.size() - 1, 0, inorder.size() - 1, m);
}

TreeNode* helper(vector<int>& preorder, vector<int>& inorder, int pStart, int pEnd, int inStart, int inEnd, unordered_map<int,int>& m) {
    if (pStart > pEnd || inStart > inEnd) return NULL;
    TreeNode* root = new TreeNode(preorder[pStart]);
    int pivLoc = m[root->val];
    int numsLeft = pivLoc - inStart;
    root->left = helper(preorder, inorder, pStart + 1, pStart + numsLeft, inStart, pivLoc - 1, m);
    root->right = helper(preorder, inorder, pStart + numsLeft + 1, pEnd, pivLoc + 1, inEnd, m);
    return root;
}
However, if I change the helper function so that the last parameter (the unordered_map) is passed by value, I get a time-limit-exceeded error. I am trying to understand why. The map itself is never reassigned, nor are its values. Since the map is passed by value, the copy constructor is called each time the function is called. Is that going to increase the function's runtime by a constant factor, or will it actually change the asymptotic complexity? I believe the copy constructor causes a large increase, but only by a constant factor, since a copy is a constant-time operation in relation to the input.
Yes.
If the size (or number of elements) of a parameter that gets copied is a function of N (and not a constant,) then it will have an effect on the asymptotic time of your implementation (even if it's not recursive.) For example, if you copy an array of size O(N) even only once, then you should consider that in your asymptotic analysis (it might not have an effect if your order is already O(N) or higher, but you have to count it in nonetheless.)
In a recursive implementation, obviously you'll have something like O(f(N)) function calls (O(log(N)) for searches, O(N) for sorts, etc.) and the cost of copy will affect or even dominate your time. Obviously, the cost of passing a parameter of size M to a function that is called N times is O(N * M). If the size changes with each invocation, you can still calculate the sum (using standard techniques.)
Even if the size of the parameter in question is constant and small (but not negligible,) if the function is called O(f(N)) times, then you have to add f(N) to your asymptotic time analysis.
The cost of copy itself depends on many things, but for a container of N elements (unless it has some reference-counting/COW optimization or the like) I daresay that the cost of copy is O(N). For containers that keep their elements in one (or a few) contiguous block(s) of memory, the constant factor on the copy operation will mostly depend on the cost of copy for individual elements, as the overhead from the container and memory management is small. For linked-list style containers (which include std::map and std::set) unless you have custom memory allocators and very specific strategies, the cost of memory allocation and traversal will be significant (depends very much on the total number of elements and heap pressure and your OS/standard library implementation, etc.)
Depending on the type of your elements in the containers, in addition to the cost of copy, you might have to consider the cost of destruction as well.
Update: After seeing more of your code (still not a working example though, but probably enough) I can give you a more detailed analysis. (Assuming that the size of your input is N,)
The function buildTree has two main parts: the loop and the recursive call to helper. The "loop" part has a complexity of O(N * log(N)): the loop repeats N times, each time inserting into the map, which is logarithmic in the size of the map, hence O(N * log(N)).
To work out the cost of calling helper, we need to know how many times it is called, how expensive its body is, and how much its input shrinks in each recursive call. The helper function is called 2 * N + 1 times in total (twice per input element, and once in buildTree), which is O(N), and its input never changes size (the subrange it works on shrinks, but no part of its body depends on the input size except the termination condition).
Anyway, the interesting operations inside helper's body are the new (usually considered O(1), which is a little simplistic but acceptable here), the lookup in the map (which is O(log(N))), and the calls to helper. The cost of those calls is O(N) each if we copy any of the vector or map parameters (again, assuming memory allocation and copying of each element are O(1)), and O(1) if we don't.
So, the total time is time of the loop plus time of the call to helper, and time of the call is the number of calls times the time per call. The time of the loop is O(N * log(N)) and the number of calls is O(N).
The time for each invocation of helper is the time of allocating a new node (O(1)) plus looking up the value in our map (O(log(N))) plus twice the time of invoking helper again.
If we pass the parameters by value (i.e. any of inorder, preorder, or m is passed by value) then the time of each invocation of helper will be O(N), and if we pass all parameters by reference, then that time will be O(1). So, putting it all together, if we pass our large parameters by value, we get:
O(N * log(N)) + O(N) * [O(1) + O(log(N)) + O(N)]
= O(N * log(N)) + O(N) * O(N)
= O(N * log(N)) + O(N^2)
= O(N^2)
and if we only pass by reference we'll have:
O(N * log(N)) + O(N) * [O(1) + O(log(N)) + O(1)]
= O(N * log(N)) + O(N) * O(log(N))
= O(N * log(N)) + O(N * log(N))
= O(N * log(N))
and that's it.
(As a side note, if a parameter to a function is not going to be changed and is only passed by reference to avoid a copy, then it should be passed as a constant reference, i.e. const &.)
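For completeness, a hedged sketch of what the helper might look like with its large parameters taken by const reference (based on the posted code; note the lookup switches from operator[] to at(), since operator[] may insert and therefore needs a non-const map):

TreeNode* helper(const vector<int>& preorder, const vector<int>& inorder,
                 int pStart, int pEnd, int inStart, int inEnd,
                 const unordered_map<int,int>& m) {
    if (pStart > pEnd || inStart > inEnd) return NULL;
    TreeNode* root = new TreeNode(preorder[pStart]);
    int pivLoc = m.at(root->val);   // at() works on a const map, operator[] does not
    int numsLeft = pivLoc - inStart;
    root->left  = helper(preorder, inorder, pStart + 1, pStart + numsLeft, inStart, pivLoc - 1, m);
    root->right = helper(preorder, inorder, pStart + numsLeft + 1, pEnd, pivLoc + 1, inEnd, m);
    return root;
}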

Big O Notation for string matching algo

What would the big O notation of the function foo be?
int foo(char *s1, char *s2)
{
    int c = 0, s, p, found;
    for (s = 0; s1[s] != '\0'; s++)
    {
        for (p = 0, found = 0; s2[p] != '\0'; p++)
        {
            if (s2[p] == s1[s])
            {
                found = 1;
                break;
            }
        }
        if (!found) c++;
    }
    return c;
}
What is the efficiency of the function foo?
a) O(n!)
b) O(n^2)
c) O(n lg(base2) n )
d) O(n)
I would have said O(MN)...?
It is O(n²) where n = max(length(s1),length(s2)) (which can be determined in less than quadratic time - see below). Let's take a look at a textbook definition:
f(n) ∈ O(g(n)) if a positive real number c and positive integer N exist such that f(n) <= c g(n) for all n >= N
By this definition we see that n represents a number - in this case that number is the length of the string passed in. However, there is an apparent discrepancy, since this definition provides only for a single variable function f(n) and here we clearly pass in 2 strings with independent lengths. So we search for a multivariable definition for Big O. However, as demonstrated by Howell in "On Asymptotic Notation with Multiple Variables":
"it is impossible to define big-O notation for multi-variable functions in a way that implies all of these [commonly-assumed] properties."
There is actually a formal definition for Big O with multiple variables; however, it requires extra constraints beyond single-variable Big O to be met, and is beyond the scope of most (if not all) algorithms courses. For typical algorithm analysis we can effectively reduce our function to a single variable by bounding all variables to a limiting variable n. In this case the variables (specifically, length(s1) and length(s2)) are clearly independent, but it is possible to bound them:
Method 1
Let x1 = length(s1)
Let x2 = length(s2)
The worst case scenario for this function occurs when there are no matches, therefore we perform x1 * x2 iterations.
Because multiplication is commutative, the worst case scenario foo(s1,s2) == the worst case scenario of foo(s2,s1). We can therefore assume, without loss of generality, that x1 >= x2. (This is because, if x1 < x2 we could get the same result by passing the arguments in the reverse order).
Method 2 (in case you don't like the first method)
For the worst case scenario (in which s1 and s2 contain no common characters), we can determine length(s1) and length(s2) prior to iterating through the loops (in .NET and Java, determining the length of a string is O(1) - but in this case it is O(n)), assigning the greater to x1 and the lesser to x2. Here it is clear that x1 >= x2.
For this scenario, we will see that the extra calculations to determine x1 and x2 make this O(n² + 2n). We use the following simplification rule, which can be found here, to simplify to O(n²):
If f(x) is a sum of several terms, the one with the largest growth rate is kept, and all others omitted.
Conclusion
for n = x1 (our limiting variable), such that x1 >= x2, the worst case scenario is x1 = x2.
Therefore: f(x1) ∈ O(n²)
Extra Hint
For all homework problems posted to SO related to Big O notation, if the answer is not one of:
O(1)
O(log log n)
O(log n)
O(n^c), 0<c<1
O(n)
O(n log n) = O(log n!)
O(n^2)
O(n^c)
O(c^n)
O(n!)
Then the question is probably better off being posted to https://math.stackexchange.com/
In big-O notation, we always have to define what the occurring variables mean. O(n) doesn't mean anything unless we define what n is. Often we can omit this information because it is clear from context. For example, if we say that some sorting algorithm is O(n log(n)), n always denotes the number of items to sort, so we don't have to state this every time.
Another important thing about big-O notation is that it only gives an upper limit: every algorithm in O(n) is also in O(n^2). The notation is often used as meaning "the algorithm has the exact asymptotic complexity given by the expression (up to a constant factor)", but its actual definition is "the complexity of the algorithm is bounded by the given expression (up to a constant factor)".
In the example you gave, you took m and n to be the respective lengths of the two strings. With this definition, the algorithm is indeed O(m n). If we define n to be the length of the longer of the two strings though, we can also write this as O(n^2) -- this is also an upper limit for the complexity of the algorithm. And with the same definition of n, the algorithm is also O(n!), but not O(n) or O(n log(n)).
O(n^2)
The relevant part of the function, in terms of complexity, is the nested loops. The maximum number of iterations is the length of s1 times the length of s2, both of which are linear factors, so the worst-case computing time is O(n^2), i.e. the square of a linear factor. As Ethan said, O(mn) and O(n^2) are effectively the same thing.
Think of it this way:
There are two inputs. If the function simply returned, then its performance would be unrelated to the arguments. This would be O(1).
If the function looped over one string, then the performance is linearly related to the length of that string. Therefore O(N).
But the function has a loop within a loop. The performance is related to the length of s1 and the length of s2. Multiply those lengths together and you get the number of loop iterations. It's not linear any more; it follows a curve. This is O(N^2).
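As a concrete check of that reasoning: if s1 and s2 each have length n and share no characters, the inner loop runs all n of its iterations for every one of the n outer iterations, giving n * n = n² character comparisons, which is exactly the quadratic worst case described above.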

How does one remove duplicate elements in place in an array in O(n) in C or C++?

Is there any method to remove the duplicate elements in an array in place in C/C++ in O(n)?
Suppose elements are a[5]={1,2,2,3,4}
then resulting array should contain {1,2,3,4}
The solution can be achieved using two for loops but that would be O(n^2) I believe.
If, and only if, the source array is sorted, this can be done in linear time:
std::unique(a, a + 5); //Returns a pointer to the new logical end of a.
Otherwise you'll have to sort first, which is (99.999% of the time) n lg n.
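As a small illustrative sketch (not part of the original answer), the usual sort-then-unique idiom on a std::vector looks like this:

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> a = {1, 2, 2, 3, 4};
    std::sort(a.begin(), a.end());                      // O(n log n); skip if already sorted
    a.erase(std::unique(a.begin(), a.end()), a.end());  // O(n); a is now {1, 2, 3, 4}
}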
Best case is O(n log n). Perform a heap sort on the original array: O(n log n) in time, O(1)/in-place in space. Then run through the array sequentially with 2 indices (source & dest) to collapse out repetitions. This has the side effect of not preserving the original order, but since "remove duplicates" doesn't specify which duplicates to remove (first? second? last?), I'm hoping that you don't care that the order is lost.
If you do want to preserve the original order, there's no way to do things in-place. But it's trivial if you make an array of pointers to elements in the original array, do all your work on the pointers, and use them to collapse the original array at the end.
Anyone claiming it can be done in O(n) time and in-place is simply wrong, modulo some arguments about what O(n) and in-place mean. One obvious pseudo-solution, if your elements are 32-bit integers, is to use a 4-gigabit bit-array (512 megabytes in size) initialized to all zeros, flipping a bit on when you see that number and skipping over it if the bit was already on. Of course then you're taking advantage of the fact that n is bounded by a constant, so technically everything is O(1) but with a horrible constant factor. However, I do mention this approach since, if n is bounded by a small constant - for instance if you have 16-bit integers - it's a very practical solution.
Yes. Because access (insertion or lookup) on a hashtable is O(1), you can remove duplicates in O(N).
Pseudocode:
hashtable h = {}
numdups = 0
for (i = 0; i < input.length; i++) {
    if (!h.contains(input[i])) {
        input[i - numdups] = input[i]
        h.add(input[i])
    } else {
        numdups = numdups + 1
    }
}
This is O(N).
Some commenters have pointed out that whether a hashtable is O(1) depends on a number of things. But in the real world, with a good hash, you can expect constant-time performance. And it is possible to engineer a hash that is O(1) to satisfy the theoreticians.
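For reference, a hedged C++ sketch of the same idea using std::unordered_set (the function name dedupInPlace is made up): average-case O(n) time, with O(n) extra space for the set, compacting the unique elements to the front of the array in place:

#include <unordered_set>
#include <vector>

// Returns the new logical length; elements past that index are stale.
std::size_t dedupInPlace(std::vector<int>& input) {
    std::unordered_set<int> seen;
    std::size_t dest = 0;
    for (std::size_t src = 0; src < input.size(); ++src) {
        if (seen.insert(input[src]).second) {   // .second is true if the value was new
            input[dest++] = input[src];
        }
    }
    return dest;
}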
I'm going to suggest a variation on Borealid's answer, but I'll point out up front that it's cheating. Basically, it only works assuming some severe constraints on the values in the array - e.g. that all keys are 32-bit integers.
Instead of a hash table, the idea is to use a bitvector. This is an O(1) memory requirement which should in theory keep Rahul happy (but won't). With 32-bit integers, the bitvector will require 512MB (i.e. 2**32 bits) - assuming 8-bit bytes, as some pedant may point out.
As Borealid should point out, this is a hashtable - just using a trivial hash function. This does guarantee that there won't be any collisions. The only way there could be a collision is by having the same value in the input array twice - but since the whole point is to ignore the second and later occurrences, this doesn't matter.
Pseudocode for completeness...
src = dest = input.begin();
while (src != input.end())
{
    if (!bitvector[*src])
    {
        bitvector[*src] = true;
        *dest = *src; dest++;
    }
    src++;
}
// at this point, dest gives the new end of the array
Just to be really silly (but theoretically correct), I'll also point out that the space requirement is still O(1) even if the array holds 64-bit integers. The constant term is a bit big, I agree, and you may have issues with 64-bit CPUs that can't actually use the full 64 bits of an address, but...
Take your example. If the array elements are bounded integers, you can create a lookup bit array.
If you find an integer such as 3, turn the 3rd bit on.
If you find an integer such as 5, turn the 5th bit on.
If the array contains elements other than integers, or the elements are not bounded, using a hashtable would be a good choice, since hashtable lookup costs constant time.
The canonical implementation of the unique() algorithm looks like something similar to the following:
template<typename Fwd>
Fwd unique(Fwd first, Fwd last)
{
if( first == last ) return first;
Fwd result = first;
while( ++first != last ) {
if( !(*result == *first) )
*(++result) = *first;
}
return ++result;
}
This algorithm takes a range of sorted elements. If the range is not sorted, sort it before invoking the algorithm. The algorithm will run in-place, and return an iterator pointing to one-past-the-last-element of the unique'd sequence.
If you can't sort the elements, then you've cornered yourself and have no choice but to use an algorithm with runtime performance worse than O(n) for the task.
This algorithm runs in O(n) runtime. That's big-oh of n, worst case in all cases, not amortized time. It uses O(1) space.
The example you have given is a sorted array. It is possible only in that case (given your constant space constraint).