I am wondering if in a situation like the following (an if/else statement under a for loop) the complexity would be O(n) or O(n^2):
for character in string:
if character==something:
do something
else:
do something else.
Thank you!
It will be
O(n) if
'do something' and 'do something else' are O(1)
O(n^2) if
'do something' and 'do something else' are O(n)
Basically the complexity of the for loop will depend on the complexity of it components and the no. of loops.
Put simply, O(n) basically means the algorithm will take as much time to execute as there are elements in n. O(1) means the algorithm always takes a constant time, regardless of how many elements are in the input. O(n^2) means the algorithm takes number of items squared time (i.e. slows down more and more the bigger the input is).
In your case you're doing the same kind of thing for every item in the input once. if..else is just one normal statement you do to each item once. It does neither increase nor decrease the runtime/complexity. Your algorithm is O(n).
It depends what you are doing in the else statement, but I believe it is O(n) because worst case you go through the string n times.
its just 0(n) nothing else because in the evaluation if statement you are just evaluating the truthy of the condition which is constant so constant times any order is that order .for example O(n)*constant is O(n) .so the answer would be O(n).
Related
Is it possible to search a character in a string in less than O(n). Most of the algorithms I have come across takes O(n) time. string::find() takes O(N*M). Is there any algorithm in which you are not required to traverse the whole string.
Well first I must say there isn't algorithm less than linear O(n) time complexity for your problem.
Of course you can try randomized algorithm like Las Vegas way of thinking. Pick a random spot from your array if lucky you found your char if not try again and store the wrong index.
Worst case is like linear search O(n) but purely lucky if you get it in few first tries it's way less. This kind of algorithm maybe is what you are looking for. On average it will find char maybe faster but remember if it still takes k tries when for sure k < n it is still linear time. O(k)
If you know more from your input maybe you can think of way to solve it differently.
Hope this helps
I'm wondering whether the big-o complexity of the following code should be o(1) or o(n).
for(int i=0;i<n && n<100;i++){sum++;}
Here's what i think:
since n is limited to be lower than 100, worst case will be O(99) + O(98) + ... + O(1) = 99 * O(1) = O(1)
However, by intuition, the code is somehow O(n) due to the loop.
Would really appreciate it if someone can advise me on this.
Thank you!
Intuitively it is O(1) because as n increases the runtime does not increase after a certain point. However, this is an edge case, as were n bounded by a much higher number, say the maximum value of an int, it would seem to be no different than if n was not bounded at all. However, when considering runtime using complexity theory we usually ignore things like that maximum size of an int.
Another way to think of this is that the number of iterations grows linearly with n for n in (0,100) and is constant otherwise. When considering n can be any value, however, the algorithm is definitely O(1)
This is all assuming each iteration of the loop takes constant time.
For more information look up Liouville's theorem.
I've read that C++ uses introsort (introspective sort) for its built-in std::sort where it starts off with quicksort and switches to heapsort when you hit the depth limit.
I've also read that the depth limit is supposed to be 2*log(2,N).
Is this value purely experimental? Or is there some mathematical theory behind it?
If you have an interval (range or array), the number of times you'll have to split the interval in half before you end up with an empty (or one element) interval is log(2,N), that's just a mathematical fact, you can work it out easily, if you want. If all goes perfectly well with quicksort, it should recurse log(2,N) times, for the same reason (and at each recursion level, it has to process all values of the interval, which leads to a O(N*log(2,N)) complexity for the overall algorithm). The problem is that quicksort could require many more recursions (if it keeps getting "unlucky" with picking pivot values, which means that it doesn't split the interval in half, but in an imbalanced way instead). At worse, quicksort could end up recursing N times, which is definitely not acceptable for a production-quality implementation.
Switching to heap-sort at 2*log(2,N) is just a good heuristic in general, to detect a much too deep number of recursions.
Technically, you could base this on the empirical performance of heap-sort versus quick-sort, to figure out what limit is the best. But such tests are highly dependent on the application (what are you sorting? how are you comparing elements? how cheap are the element swaps? etc..). So, most one-size-fits-all implementation, like std::sort, would just pick a reasonable limit like 2*log(2,N).
What #Mikael Persson said regarding why the depth limit is 2*log(2,N) is partly correct. It is not just a good heuristic, or a reasonable limit.
In fact, as you have probably guessed (depicted from your second question), there is an important mathematical reason for this: in tilde notation (search for tilde notation), quicksort makes on average ~2*log(2,N) comparisons. In big-oh notation, this is equivalent to O(N*log(2,N)).
That is why introsort switches to heapsort (which has asymptotic O(N*log(2,N)) complexity) when the depth of the recursion becomes more than 2*log(2,N). You can think of it as something which is not usual to happen and most probably means that something went wrong with the pivot picking and quicksort alone would lead to O(N^2) complexity.
You can find a short mathematical proof of the average number of compares quicksort does here (slide 21).
Let's say we have an array of 1.000.000 elements and we go through all of them to check something simple, for example if the first character is "A". From my (very little) understanding, the complexity will be O(n) and it will take some X amount of time. If I add another IF (not else if) to check, let's say, if the last character is "G", how will it change complexity? Will it double the complexity and time? Like O(2n) and 2X?
I would like to avoid taking into consideration the number of calculations different commands have to make. For example, I understand that Len() requires more calculations to give us the result than a simple char comparison does, but let's say that the commands used in the IFs will have (almost) the same amount of complexity.
O(2n) = O(n). Generalizing, O(kn) = O(n), with k being a constant. Sure, with two IFs it might take twice the time, but execution time will still be a linear function of input size.
Edit: Here and Here are explanations, with examples, of the big-O notation which is not too mathematic-oriented
Asymptotic complexity (which is what big-O uses) is not dependent on constant factors, more specifically, you can add / remove any constant factor to / from the function and it will remain equivalent (i.e. O(2n) = O(n)).
Assuming an if-statement takes a constant amount of time, it will only add a constant factor to the complexity.
A "constant amount of time" means:
The time taken for that if-statement for a given element is not dependent on how many other elements there are in the array
So basically if it doesn't call a function which looks through the other elements in the array in some way or something similar to this
Any non-function-calling if-statement is probably fine (unless it contains a statement that goes through the array, which some language allows)
Thus 2 (constant-time) if-statements called for each each element will be O(2n), but this is equal to O(n) (well, it might not really be 2n, more on that in the additional note).
See Wikipedia for more details and a more formal definition.
Note: Apart from not being dependent on constant factors, it is also not dependent on asymptotically smaller terms (terms which remain smaller regardless of how big n gets), e.g. O(n) = O(n + sqrt(n)). And big-O is just an upper bound, so saying it is O(n9999) would also be correct (though saying that in a test / exam will probably get you 0 marks).
Additional note: The problem when not ignoring constant factors is - what classifies as a unit of work? There is no standard definition here. One way is to use the operation that takes the longest, but determining this may not always be straight-forward, nor would it always be particularly accurate, nor would you be able to generically compare complexities of different algorithms.
Some key points about time complexity:
Theta notation - Exact bound, hence if a piece of code which we are analyzing contains conditional if/else and either part has some more code which grows based on input size then exact bound can't be obtained since either of branch might be taken and Theta notation is not advisable for such cases. On the other hand, if both of the branches resolve to constant time code, then Theta notation can be applicable in such case.
Big O notation - Upper bound, so if a code has conditionals where either of the conditional branches might grow with input size n, then we assume max or upper bound to calculate the time consumption by the code, hence we use Big O for such conditionals assuming we take the path that has max time consumption. So, the path which has lower time can be assumed as O(1) in amortized analysis(including the fact that we assume this path has no no recursions that may grow with the input size) and calculate time complexity Big O for the lengthiest path.
Big Omega notation - Lower bound, This is the minimum guaranteed time that a piece of code can take irrespective of the input. Useful for cases where the time taken by code doesn't grow based on input size n, but it consumes a significant amount of time k. In these cases, we can use the lower bound analysis.
Note: All of these notations doesn't depend upon the input being best/avg/worst and all of these can be applied to any piece of code.
So as discussed above, Big O doesn't care about the constant factors such as k and only sees how time increases with respect to growth in n, in which case here it is O(kn) = O(n) linear.
PS: This post was about the relation of big O and conditionals evaluation criteria for amortized analysis.
It's related to a question I posted myself today.
In your example it depends on whether you can jump from the first to the last element and if you can't then it also depends on the average length of each entry.
If as you went down through the array you had to read each full entry in order to evaluate your two if statements then your order would be O(1,000,000xN) where N is the average length of each entry. IF N is variable then it will affect the order. An example would be standard multiplication where we perform Log(N) additions of an entry which is Log(N) in lenght and so the order is O(Log^2(N)) or if you prefer O((Log(N))^2).
On the other hand if you can just check the first and last character then N = 2 and is constant so can be ignored.
This is an IMPORTANT point you have to be careful though because how can you decide if your multipler can be ignored. For example say we were doing Log(N) additions of a Log(N/100) number. Now just because Log(N/100) is the smaller term doesn't mean we can ignore it. The multiplying factor cannot be ignored if it is variable.
The name says it all really. I suspect that insertion sort is best, since it's the best sort for mostly-sorted data in general. However, since I know more about the data there is a chance there are other sorts woth looking at. So the other relevant pieces of information are:
1) this is time data, which means I presumable could create an effective hash for ordering of data.
2) The data won't all exist at one time. instead I'll be reading in records which may contain a single vector, or dozen or hundreds of vectors. I want to output all time within a 5 second window. So it's possible that a sort that does the sorting as I insert the data would be a better option.
3) memory is not a big issue, but CPU speed is as this may be a bottleneck of the system.
Given these conditions can anyone suggest an algorithm that may be worth considering in addition to insertion sort? Also, How does one defined 'mostly sorted' to decide what is a good sort option? What I mean by that is how do I look at my data and decided 'this isn't as sorted as I thought it as, maybe insertion sort is no longer the best option'? Any link to an article which considered process complexity which better defines the complexity relative to the degree data is sorted would be appreciated.
Thanks
Edit:
thank you everyone for your information. I will be going with an easy insertion or merge sort (whichever I have already pre-written) for now. However, I'll be trying some of the other methods once were closer to the optimization phase (since they take more effort to implement). I appreciate the help
You could adopt option (2) you suggested - sort the data while you insert elements.
Use a skip list, sorted according to time, ascending to maintain your data.
Once a new entree arrives - check if it is larger then the last
element (easy and quick) if it is - simply append it (easy to do in a skip list). The
skip list will need to add 2 nodes on average for these cases, and will be O(1) on
average for these cases.
If the element is not larger then the last element - add it to the
skip list as a standard insert op, which will be O(logn).
This approach will yield you O(n+klogn) algorithm, where k is the number of elements inserted out of order.
I would throw in merge sort if you implement the natural version you get a best case of O(N) with a typical and worst case of O(N log N) if you have any problems. Insertion you get a worst case of O(N^2) and a best case of O(N).
You can sort a list of size n with k elements out of place in O(n + k lg k) time.
See: http://www.quora.com/How-can-I-quickly-sort-an-array-of-elements-that-is-already-sorted-except-for-a-small-number-of-elements-say-up-to-1-4-of-the-total-whose-positions-are-known/answer/Mark-Gordon-6?share=1
The basic idea is this:
Iterate over the elements of the array, building an increasing subsequence (if the current element is greater than or equal to the last element of the subsequence, append it to the end of the subsequence. Otherwise, discard both the current element and the last element of the subsequence). This takes O(n) time.
You will have discarded no more than 2k elements since k elements are out of place.
Sort the 2k elements that were discarded using an O(k lg k) sorting algorithm like merge sort or heapsort.
You now have two sorted lists. Merge the lists in O(n) time like you would in the merge step of merge sort.
Overall time complexity = O(n + k lg k)
Overall space complexity = O(n)
(this can be modified to run in O(1) space if you can merge in O(1) space, but it's by no means trivial)
Without fully understanding the problem, Timsort may fit the bill as you're alleging that your data is mostly sorted already.
There are many adaptive sorting algorithms out there that are specifically designed to sort mostly-sorted data. Ignoring the fact that you're storing dates, you might want to look at smoothsort or Cartesian tree sort as algorithms that can sort data that is reasonable sorted in worst-case O(n log n) time and best-case O(n) time. Smoothsort also has the advantage of requiring only O(1) space, like insertion sort.
Using the fact that everything is a date and therefore can be converted into an integer, you might want to look at binary quicksort (MSD radix sort) using a median-of-three pivot selection. This algorithm has best-case O(n log n) performance, but has a very low constant factor that makes it pretty competitive. Its worst case is O(n log U), where U is the number of bits in each date (probably 64), which isn't too bad.
Hope this helps!
If your OS or C library provides a mergesort function, it is very likely that it already handles the case where the data given is partially ordered (in any direction) running in O(N) time.
Otherwise, you can just copy the mergesort available from your favorite BSD operating system.