What is the time complexity of the following function? - c++

int func(int n){
if(n==1)
return 0;
else
return sqrt(n);
}
Where sqrt(n) is a C math.h library function.
O(1)
O(lg n)
O(lg lg n)
O(n)
I think that the running time entirely depends on the sqrt(n). However, I don't know how this function is actually implemented.
P.S. The general approach towards finding the square root of a number that I know of is using Newton's method. If I am not wrong, the time complexity using Newton's method turns out to be O(lg n). So should the answer be O(lg n)?
P.P.S. Got this question in a recent test that I appeared for.

I am going to give a bit more general case answer, without assuming constant size of int.
The answer is Theta(logn).
We know newton-raphson is Theta(logn) - that excludes Theta(n) (assuming sqrt() is as efficient as we can).
However, a general number n requries log_2(n) bits to encode - and you require to read all of it in order to get an accurate sqrt() function. This excludes Theta(1) and Theta(log(log(n)).
From the above, we know that the complexity of the function is Theta(log(n)).
As a side note, since O(log(n)) is a subset of O(n) - it is also a valid answer, though not tight one. For more information about big Theta and big O and their differences, you might want to have a look on this thread.

This depends on the implementation of sqrt and also on what kind of time complexity you are interested.
I would say you can consider it to be "constant", so O(1), in that sense: If you put in a random int, it will in average take the same amount of time. (Reason: numbers with many digits are much more common).
But have a look here. Another possible answer is O(M(n)), where M(n) is the complexity of a multiplication and n is the number of digits in your integer.
This looking like a text-book question and a is perhaps meant to be a trap. The teacher perhaps wants to check if you can distinguish between computing sqrt for a list of numbers (which would be O(n)), and a single number (which would be O(1)).
Be aware that the "correct" answer often also depends on the context in which it is asked.

Let n=2^m
Given T(n)=T(sqrt(n))+1
T(2^m)=T(2^m-1)+1
Let T(2^m)=S(m)
then,
S(m)=2S(m/2)+1
using master theorem-
S(m)=theta(m)
=theta(log(n))
Hence, time complexity is theta(log(n)).

Related

Does std::string::find() use KMP or a suffix tree? [duplicate]

Why the c++'s implemented string::find() doesn't use the KMP algorithm (and doesn't run in O(N + M)) and runs in O(N * M)? Is that corrected in C++0x?
If the complexity of current find is not O(N * M), what is that?
so what algorithm is implemented in gcc? is that KMP? if not, why?
I've tested that and the running time shows that it runs in O(N * M)
Why the c++'s implemented string::substr() doesn't use the KMP algorithm (and doesn't run in O(N + M)) and runs in O(N * M)?
I assume you mean find(), rather than substr() which doesn't need to search and should run in linear time (and only because it has to copy the result into a new string).
The C++ standard doesn't specify implementation details, and only specifies complexity requirements in some cases. The only complexity requirements on std::string operations are that size(), max_size(), operator[], swap(), c_str() and data() are all constant time. The complexity of anything else depends on the choices made by whoever implemented the library you're using.
The most likely reason for choosing a simple search over something like KMP is to avoid needing extra storage. Unless the string to be found is very long, and the string to search contains a lot of partial matches, the time taken to allocate and free that would likely be much more than the cost of the extra complexity.
Is that corrected in c++0x?
No, C++11 doesn't add any complexity requirements to std::string, and certainly doesn't add any mandatory implementation details.
If the complexity of current substr is not O(N * M), what is that?
That's the worst-case complexity, when the string to search contains a lot of long partial matches. If the characters have a reasonably uniform distribution, then the average complexity would be closer to O(N). So by choosing an algorithm with better worst-case complexity, you may well make more typical cases much slower.
FYI,
The string::find in both gcc/libstdc++ and llvm/libcxx were very slow. I improved both of them quite significantly (by ~20x in some cases). You might want to check the new implementation:
GCC: PR66414 optimize std::string::find
https://github.com/gcc-mirror/gcc/commit/fc7ebc4b8d9ad7e2891b7f72152e8a2b7543cd65
LLVM: https://reviews.llvm.org/D27068
The new algorithm is simpler and uses hand optimized assembly functions of memchr, and memcmp.
Where do you get the impression from that std::string::substr() doesn't use a linear algorithm? In fact, I can't even imagine how to implement in a way which has the complexity you quoted. Also, there isn't much of an algorithm involved: is it possible that you are think this function does something else than it does? std::string::substr() just creates a new string starting at its first argument and using either the number of characters specified by the second parameter or the characters up to the end of the string.
You may be referring to std::string::find() which doesn't have any complexity requirements or std::search() which is indeed allowed to do O(n * m) comparisons. However, this is a giving implementers the freedom to choose between an algorithm which has the best theoretical complexity vs. one which doesn't doesn't need additional memory. Since allocation of arbitrary amounts of memory is generally undesirable unless specifically requested, this seems a reasonable thing to do.
The C++ standard does not dictate the performance characteristics of substr (or many other parts, including the find you're most likely referring to with an M*N complexity).
It mostly dictates functional aspects of the language (with some exceptions like the non-legacy sort functions, for example).
Implementations are even free to implement qsort as a bubble sort (but only if they want to be ridiculed and possibly go out of business).
For example, there are only seven (very small) sub-points in section 21.4.7.2 basic_string::find of C++11, and none of them specify performance parameters.
Let's look into the CLRS book. On the page 989 of third edition we have the following exercise:
Suppose that pattern P and text T are randomly chosen strings of
length m and n, respectively, from the d-ary alphabet
Ʃd = {0; 1; ...; d}, where d >= 2. Show that the
expected number of character-to-character comparisons made by the
implicit loop in line 4 of the naive algorithm is
over all executions of this loop. (Assume that the naive algorithm
stops comparing characters for a given shift once it finds a mismatch
or matches the entire pattern.) Thus, for randomly chosen strings, the
naive algorithm is quite efficient.
NAIVE-STRING-MATCHER(T,P)
1 n = T:length
2 m = P:length
3 for s = 0 to n - m
4 if P[1..m] == T[s+1..s+m]
5 print “Pattern occurs with shift” s
Proof:
For a single shift we are expected to perform 1 + 1/d + ... + 1/d^{m-1} comparisons. Now use summation formula and multiply by number of valid shifts, which is n - m + 1. □
Where do you get your information about the C++ library? If you do mean string::search and it really doesn't use the KMP algorithm then I suggest that it is because that algorithm isn't generally faster than a simple linear search due to having to build a partial match table before the search can proceed.
If you are going to be searching for the same pattern in multiple texts. The BoyerMoore algorithm is a good choice because the pattern tables need only be computed once , but are used multiple times when searching multiple texts. If you are only going to search for a pattern once in 1 text though, the overhead of computing the tables along with the overhead of allocating memory slows you down too much and std::string.find(....) will beat you since it does not allocate any memory and has no overhead.
Boost has multiple string searching algorithms. I found that BM was an order of magnitude slower in the case of a single pattern search in 1 text than std::string.find().
For my cases BoyerMoore was rarely faster than std::string.find() even when searching multiple texts with the same pattern.
Here is the link to BoyerMoore
BoyerMoore

What is complexity of this pointer function?

This is code, function receiving a pointer *ptr(that is pointing to character array outside).Loop is simply calculating length. What will be the complexity function of this function. I have calculated.
Is this correct complexity function calculation?
inside while, should i consider 3 operations, *, ptr+c , !=
Astaric * is dereferencing, ptr+c is calculating address, and != for condition.
When we say O(N), we really mean O(N)+c, where c is a constant. This is because with very small n, the characteristics of the complexity may not show themselves, as nonessentials (noise) may dominate more, so this is represented by c. But as N grows, the constant becomes insignificant, as the complexity as relative to N becomes the dominant time. Typically, a constant merely shifts the graph up or down by a small amount, but doesn't change its shape. Thus, we omit it, and just say O(N), which is what we're really interested in knowing.
As for coefficients, same thing happens. If a complexity is proportional to N, it is a linear relationship. The graph of a linear operation, multiplied by X, will have a different slope but its shape is still linear. Thus, the coefficient is not really providing additional information in terms of complexity category.
Similarly, if you have something that is O(N)+O(N^2), the N^2 dominates, and we tend to ignore the smaller terms O(N), and just call it O(N^2) algorithm. It's not an exact science, it's a categorization to understand the nature of the complexity. Big-oh notation is the worst case complexity.
I think you might be interested in the coefficients and constants only when evaluating the relative metrics of specific algorithms that have the same categorical complexity. (Some O(N*lg(n)) sorts are faster than others, for example.) But usually it is measured a different way.
It depends strongly on the definition of "complexity" that you're using (certainly this is not one of the typical asymptotic notations) but, if I properly remember these assignments from this level of education, yes you have counted correctly.
The resulting worst-case asymptotic algorithmic complexity in big-Oh notation would be O(N) (derived by eliding the co-efficients and lower terms (sort of)).

Is it possible for 1 to be O(n)

Im learning O-notation, and I thought that 1 is O(1) because since 1 is considered a constant its Big-O would be 1. However, I'm reading that it can be O(n) as well. How is this possible? Would it be because is n = 1 then it would be the same?
Yes a function that is O(1) is also O(n) -- and O(n2), and O(en), and so on. (And mathematically, you can think of 1 as a function that always has the same value.)
If you look at the formal definition of Big-O notation, you'll see (roughly stated) that a function f(x) is O(g(x)) if g(x) exceeds f(x) by at least some constant factor as x goes to infinity. Quoting the linked article:
Let f and g be two functions defined on some subset of the real
numbers. One writes
*f(x)=O(g(x)) as x --> ∞
if and only if there is a positive
constant M such that for all sufficiently large values of x, the
absolute value of f(x) is at most M multiplied by the absolute value
of g(x). That is, f(x) = O(g(x)) if and only if there exists a
positive real number M and a real number x0 such that
*|f(x)| ≤ M |g(x)| for all x ≥ *x0.
However, we rarely say that an O(1) function or algorithm is O(n), since saying it's O(n) is misleading and doesn't convey as much information. For example, we say that the Quicksort algorithm is O(n log n), and Bubblesort is O(n2). It's strictly true that Quicksort is also O(n2), but there's no point in saying so -- largely because a lot of people aren't familiar with the exact mathematical meaning of Big-O notation.
There's another notation called Big Theta (Θ) which applies tighter bounds.
Actually, Big O Notation shows how the program complexity (it may be time, memory etc.) depends on the problem size.
O(1) means that the program complexity is independent of problem size. e.g. Accessing an array element. No matter which index you select, the time of accessing will be independent of the index.
O(n) means that the program complexity linearly depends on problem size. e.g. If you are linear searching an element in the array, you need to traverse most of the elements of the array. In the worst case, if element is not present in the array, you will be traversing the complete array.
If we increase the size of the array, the complexity say, time complexity will be different i.e. it will take more time to execute if we are traversing 100 elements than the time taken if we are traversing only 10 elements.
I hope this helps you a bit.
The Big O Notation is a mathematical way to explain the behaviour of a function near to a point or to infinity. This is a tool that is used in computer science to analyse the complexity of an algorithm. The complexity of an algorithm helps you analyse if your algorithm suits your situation and process your logic in a reasonable time.
But that does not answer your question. The question that if n is equal to 1 doesn't make sense in Big O notation. Like the name said, it's a notation and not a way to calculate in mathematics. Big O notation means to evaluate the behaviour of this algorithm near to infinite to tell which part of the algorithm is the most significant. For example, if an algorithm has a behaviour that can be represented by the function 2x^2 + 3x the Big O notation says to take each part of this function and evaluate it near to infinite and take the function that is the most significant. So, by evaluating 2x^2 and 3x we will see that 2x^2 will be a bigger infinite that 3x. And the difference between x^2 and 3x are infinite too. So, if we eliminate the coefficients (that are not the variables part of function) we will have two complexities: O(x^2) and O(x), so O(n^2) and O(n). But we know that the most significant is the O(n^2).
It's the same thing if inside a part of code, you have two complexities O(1) and O(n) the O(n) will be your algorithm complexity.
But if a O(n) complexity process only one element the behaviour will be equivalent to O(1). But it doesn’t means that your algorithm has O(1) complexity.

Introsort (quicksort + heapsort) implementation and complexity

I've read that C++ uses introsort (introspective sort) for its built-in std::sort where it starts off with quicksort and switches to heapsort when you hit the depth limit.
I've also read that the depth limit is supposed to be 2*log(2,N).
Is this value purely experimental? Or is there some mathematical theory behind it?
If you have an interval (range or array), the number of times you'll have to split the interval in half before you end up with an empty (or one element) interval is log(2,N), that's just a mathematical fact, you can work it out easily, if you want. If all goes perfectly well with quicksort, it should recurse log(2,N) times, for the same reason (and at each recursion level, it has to process all values of the interval, which leads to a O(N*log(2,N)) complexity for the overall algorithm). The problem is that quicksort could require many more recursions (if it keeps getting "unlucky" with picking pivot values, which means that it doesn't split the interval in half, but in an imbalanced way instead). At worse, quicksort could end up recursing N times, which is definitely not acceptable for a production-quality implementation.
Switching to heap-sort at 2*log(2,N) is just a good heuristic in general, to detect a much too deep number of recursions.
Technically, you could base this on the empirical performance of heap-sort versus quick-sort, to figure out what limit is the best. But such tests are highly dependent on the application (what are you sorting? how are you comparing elements? how cheap are the element swaps? etc..). So, most one-size-fits-all implementation, like std::sort, would just pick a reasonable limit like 2*log(2,N).
What #Mikael Persson said regarding why the depth limit is 2*log(2,N) is partly correct. It is not just a good heuristic, or a reasonable limit.
In fact, as you have probably guessed (depicted from your second question), there is an important mathematical reason for this: in tilde notation (search for tilde notation), quicksort makes on average ~2*log(2,N) comparisons. In big-oh notation, this is equivalent to O(N*log(2,N)).
That is why introsort switches to heapsort (which has asymptotic O(N*log(2,N)) complexity) when the depth of the recursion becomes more than 2*log(2,N). You can think of it as something which is not usual to happen and most probably means that something went wrong with the pivot picking and quicksort alone would lead to O(N^2) complexity.
You can find a short mathematical proof of the average number of compares quicksort does here (slide 21).

Linear algorithm to find minimum subset sum over a threshold

I have a collection of N positive integers, each bounded by a (relatively small) constant C. I want to find a subset of these numbers with the smallest sum greater than (or equal to) a value K.
The numbers involved aren't terribly large (<100), but I need good performance even in the worst case. I thought maybe I could adapt Pisinger's dynamic programming algorithm to the task; it runs in O(NC) time, and I happen to meet the requirements of bounded, positive numbers.
[Edit]: The numbers are not sorted and there may be duplicates.
However, I don't understand the algorithm well enough to do this myself. In fact, I'm not even certain if the assumptions it is based on still hold...
-Is it possible to adapt this algorithm to my needs?
-Or is there another linear algorithm I could use that is similarly efficient?
-Could anyone provide pseudocode or a detailed explanation?
Thanks.
Link to the Subset-Sum code I was investigating:
Fast solution to Subset sum algorithm by Pisinger
(Apologies if this is poorly worded/formatted/etc. I'm still new to StackOverflow...)
Pisinger's algorithm gives you the largest sum less than or equal to the capacity of the knapsack. To solve your problem, use Pisinger to figure out what not to put in the subset. Formally, let the items be w_1, ..., w_n and the minimum be K. Give w_1, ..., w_n and w_1 + ... + w_n - K to Pisinger, then take every item that Pisinger does not.
Well one solution is to:
T = {0}
for x in V
for t in T
T.insert(x+t)
for i in K to max(T)
if (T.contains(i))
return i
fail
This gives you the size of the subset, but you can adapt to output the members.
The maximum size of T is O(N) (because of C bound), so the running time is O(N^2) and the space is O(N). You can use a bit array of length NC as the backing store of T.