I recently had an interview and was asked to find the number of set bits in a supplied integer. I had something like this:
#include <iostream>
using namespace std;

int givemCountOnes(unsigned int X) {
    int count = 0;
    while (X != 0) {
        if (X & 1)
            count++;
        X = X >> 1;
    }
    return count;
}

int main() {
    cout << givemCountOnes(4);
    return 0;
}
I know there are better approaches but that is not the question here.
The question is: what is the complexity of this program?
Since it loops over the bits of the input, people say this is O(n), where n is the number of bits in the input.
However, I feel that since the upper bound is sizeof(unsigned int), i.e. say 64 bits, I should say the order is O(1).
Am I wrong?
The complexity is O(N). The complexity rises linearly with the size of the type used (unsigned int).
The upper bound does not matter as it can be extended any time in the future. It also does not matter because there is always an upper bound (memory size, number of atoms in the universe) and then everything could be considered O(1).
I will just add a better solution to the above problem.
Use the following step in the loop:
x = x & (x-1);
This clears the rightmost set bit, one bit per iteration.
So the loop runs only while there is a set bit left, and terminates when the number reaches 0.
Hence the complexity improves from O(number of bits in the int) to O(number of set bits).
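A minimal sketch of that variant (the function name is my own, not from the question):

#include <iostream>

// Counts set bits by clearing the lowest set bit on each pass
// (often attributed to Brian Kernighan); loops once per set bit.
int countOnesSparse(unsigned int x) {
    int count = 0;
    while (x != 0) {
        x &= (x - 1);   // clears the rightmost 1 bit
        ++count;
    }
    return count;
}

int main() {
    std::cout << countOnesSparse(13) << '\n';  // 13 = 1101, prints 3
    return 0;
}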
Big-O notation describes how the running time changes between different values of n. In this case n is the number of bits, because (in your case) the number of bits changes the (relative) time the calculation takes. So O(n) is correct: a one-bit integer takes 1 unit of time, a 32-bit integer takes 32 units of time, and a 64-bit integer takes 64 units of time.
Actually, your algorithm does not depend on the total number of bits in the type, but on the position of the highest set bit in the value, but that's a different matter. However, since we typically treat O as the worst case, it's still O(n), where n is the number of bits in the integer.
And I can't really think of any method that is sufficiently better than that in terms of O. I can think of methods that improve the number of iterations in the loop (e.g. using a 256-entry table and dealing with 8 bits at a time, as sketched below), but it's still "bigger data -> longer time". O(n), O(n/2) and O(n/8) are all the same; it's just that the overall time in the last case is 1/8 of the first.
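For illustration, a rough sketch of that table-driven approach (the table, its initialization, and the function name are my own, not from any particular library):

#include <cstdint>
#include <iostream>

// 256-entry table: popcount of every possible byte value.
static unsigned char table[256];

void initTable() {
    for (int i = 0; i < 256; ++i)
        table[i] = (i & 1) + table[i / 2];
}

// Processes the input 8 bits at a time, so a 32-bit value needs
// at most 4 table lookups instead of 32 single-bit steps.
int countOnesByTable(std::uint32_t x) {
    int count = 0;
    while (x != 0) {
        count += table[x & 0xFF];
        x >>= 8;
    }
    return count;
}

int main() {
    initTable();
    std::cout << countOnesByTable(0xF0F0F0F0u) << '\n';  // prints 16
    return 0;
}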
Big-O notation describes the number of algorithm steps in the worst-case scenario, which here is when there is a 1 in the last bit. So there will be n iterations/steps when you pass an n-bit number as input.
Imagine a similar algorithm which counts the 1's in a list. Its complexity is O(n), where n is the list length. By your reasoning, if you always pass fixed-size lists as input, the algorithm's complexity would become O(1), which is incorrect.
However, if you fix the bit length in the algorithm, i.e. something like for (int i = 0; i < 64; ++i) ..., then it does have O(1) complexity, since it does an O(1) operation 64 times and you can ignore the constant. In general, O(c*n) is O(n) and O(c) is O(1), where c is a constant.
Hope all these examples helped. BTW, there is an O(1) solution for this; I'll post it when I remember :)
There's one thing that should be made clear: the complexity of the operations on your integer. It isn't visible in this example because you work on an int, the natural word size on your machine, so each operation appears to cost just 1.
But O-notation is about large amounts of data and large tasks. Say you have an n-bit integer, where n is around 4096 or so. In that case addition, subtraction and shift each cost at least O(n), so your algorithm applied to such an integer would be O(n²) (n operations, each of O(n) complexity).
A direct counting algorithm that does not shift the whole number (assuming a single-bit test is O(1)) gives O(n log(n)) complexity (it involves up to n additions on a log(n)-sized counter).
But for fixed-length data (which is C's int), big-O analysis is simply meaningless, because it is based on input data of variable length, indeed of arbitrarily large length.
Related
What would be the efficiency of the following program? It is a for loop which runs a fixed number of times.
for (int i = 0; i < 10; i++)
{
    // do something here, no more loops though.
}
So, what should the efficiency be: O(1) or O(n)?
That entirely depends on what is in the for loop. Also, computational complexity is normally measured in terms of the size n of the input, and I can't see anything in your example that models or represents or encodes directly or indirectly the size of the input. There is just the constant 10.
Besides, although sometimes the analysis of computational complexity may give unexpected, surprising results, the correct term is not "Big Oh", but rather Big-O.
You can only talk about the complexity with respect to some specific input to the calculation. If you are looping ten times because there are ten "somethings" that you need to do work for, then your complexity is O(N) with respect to those somethings. If you just need to loop 10 times regardless of the number of somethings - and the processing time inside the loop doesn't change with the number of somethings - then your complexity with respect to them is O(1). If there's no "something" for which the order is greater than 1, then it's fair to describe the loop as O(1).
bit of further rambling discussion...
O(N) indicates the time taken for the work to complete can be reasonably approximated by some constant amount of time plus some function of N - the number of somethings in the input - for huge values of N:
O(N) indicates the time is c + xN, where c is a fixed overhead and x is the per-something processing time,
O(log₂N) indicates the time is c + x·log₂(N),
O(N²) indicates the time is c + x·N²,
O(N!) indicates the time is c + x·N!,
O(N^N) indicates the time is c + x·N^N,
etc.
Again, in your example there's no mention of the number of inputs, and the loop iteration count is fixed. I can see how it's tempting to say it's O(1) even if there are 10 input "somethings", but consider: if you have a function capable of processing an arbitrary number of inputs, then decide you'll only use it in your application with exactly 10 inputs and hard-code that, you clearly haven't changed the performance characteristics of the function - you've just locked in a single point on the time-for-N-inputs curve - and any big-O complexity that was valid before the hardcoding must still be valid afterwards. It's less meaningful and useful, though, as N of 10 is a small amount, and unless you've got a horrific big-O complexity like O(N^N), the constants c and x take on a lot more importance in describing the overall performance than they would for huge values of N (where changes in the big-O notation generally have much more impact on performance than changing c or even x - which is of course the whole point of having big-O analysis).
Sure, O(1), because nothing here depends on n.
EDIT:
Let the loop body contain some complex action with complexity O(P(n)) in big-O terms.
If we have a constant number C of iterations, the complexity of the loop will be O(C * P(n)) = O(P(n)).
Otherwise, let the number of iterations be Q(n), which depends on n. That makes the complexity of the loop O(Q(n) * P(n)).
I'm just trying to say that when the number of iterations is constant, it does not change the complexity of the whole loop.
n in big-O notation denotes the input size. We can't tell what the complexity is, because we don't know what is happening inside the for loop. For example, maybe there are recursive calls that depend on the input size? In this example the overall complexity is O(n):
void f(int n) // input size = n
{
    for (int i = 0; i < 10; i++)
    {
        // do something here, no more loops though.
        g(n); // O(n)
    }
}

void g(int n)
{
    if (n > 0)
    {
        g(n - 1);
    }
}
What would the big O notation of the function foo be?
int foo(char *s1, char *s2)
{
    int c = 0, s, p, found;
    for (s = 0; s1[s] != '\0'; s++)
    {
        for (p = 0, found = 0; s2[p] != '\0'; p++)
        {
            if (s2[p] == s1[s])
            {
                found = 1;
                break;
            }
        }
        if (!found) c++;
    }
    return c;
}
What is the efficiency of the function foo?
a) O(n!)
b) O(n^2)
c) O(n lg(base2) n )
d) O(n)
I would have said O(MN)...?
It is O(n²) where n = max(length(s1),length(s2)) (which can be determined in less than quadratic time - see below). Let's take a look at a textbook definition:
f(n) ∈ O(g(n)) if a positive real number c and positive integer N exist such that f(n) <= c g(n) for all n >= N
By this definition we see that n represents a number - in this case that number is the length of the string passed in. However, there is an apparent discrepancy, since this definition provides only for a single variable function f(n) and here we clearly pass in 2 strings with independent lengths. So we search for a multivariable definition for Big O. However, as demonstrated by Howell in "On Asymptotic Notation with Multiple Variables":
"it is impossible to define big-O notation for multi-variable functions in a way that implies all of these [commonly-assumed] properties."
There is actually a formal definition of big-O with multiple variables; however, it requires extra constraints beyond single-variable big-O and is beyond the scope of most (if not all) algorithms courses. For typical algorithm analysis we can effectively reduce our function to a single variable by bounding all variables by a limiting variable n. In this case the variables (specifically, length(s1) and length(s2)) are clearly independent, but it is possible to bound them:
Method 1
Let x1 = length(s1)
Let x2 = length(s2)
The worst case scenario for this function occurs when there are no matches, therefore we perform x1 * x2 iterations.
Because multiplication is commutative, the worst-case scenario of foo(s1, s2) equals the worst-case scenario of foo(s2, s1). We can therefore assume, without loss of generality, that x1 >= x2. (If x1 < x2, we could get the same result by passing the arguments in the reverse order.)
Method 2 (in case you don't like the first method)
For the worst case scenario (in which s1 and s2 contain no common characters), we can determine length(s1) and length(s2) prior to iterating through the loops (in .NET and Java, determining the length of a string is O(1) - but in this case it is O(n)), assigning the greater to x1 and the lesser to x2. Here it is clear that x1 >= x2.
For this scenario, we will see that the extra calculations to determine x1 and x2 make this O(n² + 2n). We use the following simplification rule, which can be found here, to simplify it to O(n²):
If f(x) is a sum of several terms, the one with the largest growth rate is kept, and all others omitted.
Conclusion
For n = x1 (our limiting variable), with x1 >= x2, the worst-case scenario is x1 = x2.
Therefore: f(x1) ∈ O(n²)
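Spelled out, the bound from Method 1 is just:

\text{iterations} = x_1 \cdot x_2 \;\le\; x_1 \cdot x_1 \;=\; n^2,
\qquad n = x_1 = \max(\text{length}(s_1), \text{length}(s_2))

so the iteration count satisfies the textbook definition quoted above with g(n) = n² and c = 1.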
Extra Hint
For all homework problems posted to SO related to Big O notation, if the answer is not one of:
O(1)
O(log log n)
O(log n)
O(n^c), 0<c<1
O(n)
O(n log n) = O(log n!)
O(n^2)
O(n^c)
O(c^n)
O(n!)
Then the question is probably better off being posted to https://math.stackexchange.com/
In big-O notation, we always have to define what the occurring variables mean. O(n) doesn't mean anything unless we define what n is. Often we can omit this information because it is clear from context. For example, if we say that some sorting algorithm is O(n log(n)), n always denotes the number of items to sort, so we don't have to state this every time.
Another important thing about big-O notation is that it only gives an upper limit -- every algorithm in O(n) is also in O(n^2). The notation is often used as meaning "the algorithm has the exact asymptotic complexity given by the expression (up to a constant factor)", but its actual definition is "the complexity of the algorithm is bounded by the given expression (up to a constant factor)".
In the example you gave, you took m and n to be the respective lengths of the two strings. With this definition, the algorithm is indeed O(m n). If we define n to be the length of the longer of the two strings though, we can also write this as O(n^2) -- this is also an upper limit for the complexity of the algorithm. And with the same definition of n, the algorithm is also O(n!), but not O(n) or O(n log(n)).
O(n^2)
The relevant part of the function, in terms of complexity, is the nested loops. The maximum number of iterations is the length of s1 times the length of s2, both of which are linear factors, so the worst-case computing time is O(n^2), i.e. the square of a linear factor. As Ethan said, O(mn) and O(n^2) are effectively the same thing.
Think of it this way:
There are two inputs. If the function simply returned, then its performance would be unrelated to the arguments. This would be O(1).
If the function looped over one string, then the performance is linearly related to the length of that string. Therefore O(N).
But the function has a loop within a loop. The performance is related to the length of s1 and the length of s2. Multiply those lengths together and you get the number of loop iterations. It's not linear any more; it follows a curve. This is O(N^2).
What is the complexity of the below program? I think it must be O(n), since there is a for loop that runs n times.
It is a program to reverse the bits in a given integer.
unsigned int reverseBits(unsigned int num)
{
    unsigned int NO_OF_BITS = sizeof(num) * 8;
    unsigned int reverse_num = 0;
    int i;
    for (i = 0; i < NO_OF_BITS; i++)
    {
        if (num & (1 << i))
            reverse_num |= 1 << ((NO_OF_BITS - 1) - i);
    }
    return reverse_num;
}
What is the complexity of the above program and how? Someone said that the actual complexity is O(log n), but I can't see why.
Considering your above program, the complexity is O(1) because 8 * sizeof(unsigned int) is a constant. Your program will always run in constant time.
However, if n is bound to NO_OF_BITS and you make that number a parameter of the algorithm (which is not the case here), then the complexity will be O(n).
Note that with n bits the maximal value possible for num is 2^n, so if you instead want to express the complexity as a function of the maximal value N allowed for num, the complexity is O(log₂(N)), i.e. O(log N).
O-notation describes how the time or space requirements for an algorithm depend on the size of the input (denoted n), in the limit as n becomes very large. The input size is the number of bits required to represent the input, not the range of values that those bits can represent.
(Formally, describing an algorithm with running time t(n) as O(f(n)) means that there is some size N and some constant C for which t(n) <= C*f(n) for all n > N).
This algorithm does a fixed amount of work for each input bit, so the time complexity is O(n). It uses a working space, reverse_num, of the same size as the input (plus some asymptotically smaller variables), so the space complexity is also O(n).
This particular implementation imposes a limit on the input size, and therefore a fixed upper bound on the time and space requirements. This does not mean that the algorithm is O(1), as some answers say. O-notation describes the algorithm, not any particular implementation, and is meaningless if you place an upper bound on the input size.
If n == num, the complexity is constant, O(1), as the loop always runs a fixed number of times. The space complexity is also O(1), as it does not depend on the input.
If n is the input number, then NO_OF_BITS is O(log n) (think about it: to represent a binary number n, you need about log2(n) bits).
EDIT: Let me clarify, in the light of other responses and comments.
First, let n be the input number (num). It's important to clarify this because if we consider n to be NO_OF_BITS instead, we get a different answer!
The algorithm is conceptually O(log n). We need to reverse the bits of n. There are O(log n) bits needed to represent the number n, and reversing the bits involves a constant amount of work for each bit; hence the complexity is O(log n).
Now, in reality, built-in types in C cannot represent integers of arbitrary size. In particular, this implementation uses unsigned int to represent the input, and this type is limited to a fixed number of bits (32 on most systems). Moreover, rather than just going through as many bits as necessary (from the lowest-order bit to the highest-order bit that is 1), this implementation chooses to go through all 32 bits. Since 32 is a constant, this implementation technically runs in O(1) time.
Nonetheless, the algorithm is conceptually O(log n), in the sense that if the input were 2^5, 5 iterations would be sufficient; if the input were 2^10, 10 iterations would be sufficient; and if there were no limit on the range of numbers an unsigned int could represent and the input were 2^1000, then 1000 iterations would be necessary.
Under no circumstances is this algorithm O(n) (unless we define n to be NO_OF_BITS, in which case it is).
You need to be clear what n is. If n is num, then of course your code is O(log n), as the number of bits needed, NO_OF_BITS, is on the order of log₂(n).
Also, as you are dealing with fixed-size values, the whole thing is O(1). Of course, if you are viewing this as a more general concept and are likely to extend it, then feel free to think of it as O(log n) in the more general context where you intend to extend it beyond fixed-size numbers.
I searched around and could not find the performance time specifications for bitset::count(). Does anybody know what it is (O(n) or better) and where to find it?
EDIT By STL I refer only to the Standard Template Library.
I read this file (C:\cygwin\lib\gcc\i686-pc-cygwin\3.4.4\include\c++\bitset) on my computer.
See these
/// Returns the number of bits which are set.
size_t
count() const { return this->_M_do_count(); }

size_t
_M_do_count() const
{
    size_t __result = 0;
    for (size_t __i = 0; __i < _Nw; __i++)
        __result += __builtin_popcountl(_M_w[__i]);
    return __result;
}
BTW, this is where _Nw is specified:
template<size_t _Nw>
struct _Base_bitset
Thus it's O(n) in the gcc implementation. We conclude that the specification doesn't require it to be better than O(n), and nobody in their right mind will implement it in a way worse than that. We can then safely assume it's at worst O(n); possibly better, but you can never count on that.
I can't be sure what you really mean by "STL" here, due to a prevailing misuse of the term in the C++ community.
The C++ Standard (2003) makes no mandate for the performance of std::bitset::count() (or, in fact, any members of std::bitset as far as I can see).
I can't find any reference suggesting a mandate for the performance of STL's bitset::count() either.
I think any sane implementation will provide this in constant (or at worst linear) time, though. However, this is merely a feeling. Check yours to find out what you'll actually get.
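If you want to poke at your own implementation, here is a trivial usage sketch (the timing harness is up to you):

#include <bitset>
#include <iostream>

int main() {
    std::bitset<64> b(0xF0F0F0F0F0F0F0F0ULL);
    // count() returns the number of set bits; how fast it is
    // depends entirely on your standard library's implementation.
    std::cout << b.count() << '\n';  // prints 32
    return 0;
}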
"SGI's reference implementation runs
in linear time with respect to the
number of bytes needed to store the
bits. It does this by creating a
static array of 256 integers. The
value stored at ith index in the array
is the number of bits set in the value
i."
http://www.cplusplus.com/forum/general/12486/
I'm not sure you're going to find a specification for that, since the STL doesn't typically require a certain level of performance. I've seen hints that it's "fast", around 1 cycle per bit in the set's size. You can of course read your particular implementation's code to find out what to expect.
The algorithm we follow is to count all the bits that are set to 1.
Now, if we want to count the bits of a number n that way, we loop through floor(log₂(n)) + 1 digits.
For example: for the number 13, the bit pattern is 1101.
log₂(13) ≈ 3.7, which rounds down to 3.
Number of bits = 3 + 1 = 4.
So for any decimal number n we loop floor(log₂(n)) + 1 times.
Another approach would be the following:
int count_set_bits_fast(int n) {
    int count = 0;
    while (n > 0) {
        n = (n & (n - 1));  // clears the lowest set bit
        count++;
    }
    return count;
}
If you analyse the line n = (n & (n - 1)); you will find that it essentially clears the lowest set bit on each iteration.
The order would therefore be the number of set bits.
For example: 13 = 1101
1101 & 1100 = 1100
1100 & 1011 = 1000
1000 & 0111 = 0
So it is O(number of set bits), with O(log(n) + 1) in the worst case.
Is there any method to remove the duplicate elements in an array in place in C/C++ in O(n)?
Suppose the elements are a[5] = {1, 2, 2, 3, 4};
then the resulting array should contain {1, 2, 3, 4}.
The solution can be achieved using two for loops but that would be O(n^2) I believe.
If, and only if, the source array is sorted, this can be done in linear time:
std::unique(a, a + 5); //Returns a pointer to the new logical end of a.
Otherwise you'll have to sort first, which is (99.999% of the time) O(n lg n).
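For a container, the usual erase-unique idiom looks roughly like this (a sketch using std::vector rather than the raw array above):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> a = {1, 2, 2, 3, 4};
    std::sort(a.begin(), a.end());             // no-op here, already sorted
    a.erase(std::unique(a.begin(), a.end()),   // unique() runs in linear time...
            a.end());                          // ...but the sort is O(n log n)
    for (int x : a) std::cout << x << ' ';     // prints: 1 2 3 4
    std::cout << '\n';
    return 0;
}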
Best case is O(n log n). Perform a heap sort on the original array: O(n log n) in time, O(1)/in-place in space. Then run through the array sequentially with 2 indices (source & dest) to collapse out repetitions. This has the side effect of not preserving the original order, but since "remove duplicates" doesn't specify which duplicates to remove (first? second? last?), I'm hoping that you don't care that the order is lost.
If you do want to preserve the original order, there's no way to do things in-place. But it's trivial if you make an array of pointers to elements in the original array, do all your work on the pointers, and use them to collapse the original array at the end.
Anyone claiming it can be done in O(n) time and in-place is simply wrong, modulo some arguments about what O(n) and in-place mean. One obvious pseudo-solution, if your elements are 32-bit integers, is to use a 4-gigabit bit-array (512 megabytes in size) initialized to all zeros, flipping a bit on when you see that number and skipping over it if the bit was already on. Of course then you're taking advantage of the fact that n is bounded by a constant, so technically everything is O(1) but with a horrible constant factor. However, I do mention this approach since, if n is bounded by a small constant - for instance if you have 16-bit integers - it's a very practical solution.
Yes. Because access (insertion or lookup) on a hashtable is O(1), you can remove duplicates in O(N).
Pseudocode:
hashtable h = {}
numdups = 0
for (i = 0; i < input.length; i++) {
    if (!h.contains(input[i])) {
        input[i - numdups] = input[i]
        h.add(input[i])
    } else {
        numdups = numdups + 1
    }
}
This is O(N).
Some commenters have pointed out that whether a hashtable is O(1) depends on a number of things. But in the real world, with a good hash, you can expect constant-time performance. And it is possible to engineer a hash that is O(1) to satisfy the theoreticians.
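In C++ the same idea can be written with std::unordered_set (a sketch; the function name is mine, and the O(1) lookup is average-case, not a hard guarantee):

#include <iostream>
#include <unordered_set>
#include <vector>

// Removes duplicates in place, preserving first occurrences.
// Average-case O(n) time, O(n) extra space for the set.
std::size_t dedupInPlace(std::vector<int>& a) {
    std::unordered_set<int> seen;
    std::size_t dest = 0;
    for (std::size_t src = 0; src < a.size(); ++src) {
        if (seen.insert(a[src]).second)   // .second is true if not seen before
            a[dest++] = a[src];
    }
    a.resize(dest);
    return dest;
}

int main() {
    std::vector<int> a = {1, 2, 2, 3, 4};
    dedupInPlace(a);
    for (int x : a) std::cout << x << ' ';   // prints: 1 2 3 4
    std::cout << '\n';
    return 0;
}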
I'm going to suggest a variation on Borealid's answer, but I'll point out up front that it's cheating. Basically, it only works assuming some severe constraints on the values in the array - e.g. that all keys are 32-bit integers.
Instead of a hash table, the idea is to use a bitvector. This is an O(1) memory requirement which should in theory keep Rahul happy (but won't). With 32-bit integers, the bitvector will require 512 MB (i.e. 2^32 bits) - assuming 8-bit bytes, as some pedant may point out.
As Borealid might point out, this is a hashtable - just using a trivial hash function. This does guarantee that there won't be any collisions. The only way there could be a collision is by having the same value in the input array twice - but since the whole point is to ignore the second and later occurrences, this doesn't matter.
Pseudocode for completeness...
src = dest = input.begin();
while (src != input.end())
{
    if (!bitvector[*src])
    {
        bitvector[*src] = true;
        *dest = *src; dest++;
    }
    src++;
}
// at this point, dest gives the new end of the array
Just to be really silly (but theoretically correct), I'll also point out that the space requirement is still O(1) even if the array holds 64-bit integers. The constant term is a bit big, I agree, and you may have issues with 64-bit CPUs that can't actually use the full 64 bits of an address, but...
Take your example. If the array elements are bounded integers, you can create a lookup bitarray.
If you find an integer such as 3, turn the 3rd bit on.
If you find an integer such as 5, turn the 5th bit on.
If the array contains elements other than integers, or the elements are not bounded, a hashtable would be a good choice, since hashtable lookup costs are constant.
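A sketch of that bounded-integer idea (MAX_VALUE is an assumed bound on the element values, not something given in the question):

#include <iostream>
#include <vector>

int main() {
    const int MAX_VALUE = 100;          // assumed bound on element values
    std::vector<bool> seen(MAX_VALUE, false);

    int a[5] = {1, 2, 2, 3, 4};
    int dest = 0;
    for (int src = 0; src < 5; ++src) {
        if (!seen[a[src]]) {            // "turn the bit on" the first time we see the value
            seen[a[src]] = true;
            a[dest++] = a[src];
        }
    }
    for (int i = 0; i < dest; ++i) std::cout << a[i] << ' ';  // prints: 1 2 3 4
    std::cout << '\n';
    return 0;
}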
The canonical implementation of the unique() algorithm looks like something similar to the following:
template<typename Fwd>
Fwd unique(Fwd first, Fwd last)
{
    if (first == last) return first;
    Fwd result = first;
    while (++first != last) {
        if (!(*result == *first))
            *(++result) = *first;
    }
    return ++result;
}
This algorithm takes a range of sorted elements. If the range is not sorted, sort it before invoking the algorithm. The algorithm will run in-place, and return an iterator pointing to one-past-the-last-element of the unique'd sequence.
If you can't sort the elements, then you've cornered yourself and have no choice but to use an algorithm with runtime performance worse than O(n) for the task.
This algorithm runs in O(n) runtime. That's big-oh of n, worst case in all cases, not amortized time. It uses O(1) space.
The example you have given is a sorted array. It is possible only in that case (given your constant space constraint).