How many numbers in an array are smaller than a given number? - c++

The naive approach is O(n). Is there one that is O(log n) or even O(1)?
What if the array is sorted? What about using a binary search tree?
What if my array has size n = 2^(h + 1) − 1? // h = height of a complete binary tree

Unsorted
If the array is not sorted, then you can do no better than O(n). Proof: suppose you didn't look at every single element of the array, then an adversary could just make one of the elements that you didn't look at larger or smaller than the given number to make your count incorrect. So, better than O(n) is not possible.
Sorted
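If the array is sorted, then you can determine the result in O(log n) time by using a binary search to locate the first element that is greater than or equal to the given number. The index of that element is the answer, because every element before it is smaller than the given number. (Subtracting that index from the size of the array gives the count of elements that are greater than or equal to the number instead.)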
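As a minimal sketch of the sorted case, assuming an ascending-sorted std::vector<int> (the function name count_smaller is made up for illustration), std::lower_bound performs the binary search:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Number of elements strictly smaller than `value` in an ascending-sorted vector.
// std::lower_bound returns an iterator to the first element that is >= value,
// so its distance from begin() is exactly the count of smaller elements.
std::size_t count_smaller(const std::vector<int>& sorted, int value)
{
    auto first_not_less = std::lower_bound(sorted.begin(), sorted.end(), value);
    return static_cast<std::size_t>(first_not_less - sorted.begin());
}
```

For the vector {1, 3, 3, 5, 8}, count_smaller(v, 3) is 1 and count_smaller(v, 4) is 3.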

With unsorted, you can't do better than O(n). Final.
With sorted, you can do it in worst case O(log(n)) with binary search. You can improve upon this if the values are spread fairly evenly or are (mostly) linear, by aiming at the expected position as if the layout were purely linear.
For example, take a sorted array a[n] with a[0]=x, a[n-1]=y, and your threshold v.
Instead of bisecting the array at n/2, test the element a[(n-1)*(v-x)/(y-x)].
With regular layout (a[i] = const1*i+const2) you get the result in O(1), one hit +- rounding error, so at worst 2. With "white noise" random layout (all values equally probable), you still get it much faster than O(log(n)): the expected cost of this interpolation search on uniformly distributed values is O(log log n), although the worst case on adversarial data degrades to O(n).
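The interpolation idea can be sketched roughly as follows. This is only an illustration, assuming an ascending-sorted std::vector<double> of finite values; the function name count_smaller_interpolated is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Count elements strictly smaller than v in an ascending-sorted vector,
// probing at the linearly interpolated position instead of the midpoint.
// Assumes all values are finite. Worst case is O(n) on skewed data.
std::size_t count_smaller_interpolated(const std::vector<double>& a, double v)
{
    // Invariant: everything before lo is < v, everything from hi on is >= v.
    std::size_t lo = 0, hi = a.size();
    while (lo < hi) {
        const double x = a[lo], y = a[hi - 1];
        if (v <= x) return lo;   // a[lo] and everything after it is >= v
        if (v > y)  return hi;   // everything in [lo, hi) is < v
        // Here x < v <= y, so y > x and the fraction lies in (0, 1].
        const std::size_t probe = lo + static_cast<std::size_t>(
            (v - x) / (y - x) * static_cast<double>(hi - 1 - lo));
        if (a[probe] < v) lo = probe + 1;
        else              hi = probe;
    }
    return lo;
}
```

On perfectly linear data the first probe lands on or next to the answer, so the loop ends after one or two steps.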

Related

What is the Time Complexity of std::multiset::count in C++?

I am confused about the number of operations that takes place when calling count(x) for some element x in a multiset of size n.
Am I correct that the number of operations is log(n) + #_of_matches_of_x, meaning logarithmic in the number of elements in the multiset, plus the number of matches of the target element x among all elements in the multiset?
Thanks for your time!
As the reference link has mentioned, the complexity of count is:
Logarithmic in the size of the container plus linear in the number of
the elements found.
The reason is that std::multiset is typically implemented as a balanced binary search tree (usually a red-black tree) in which equal keys sit next to each other in iteration order. So, when calling std::multiset::count, you first find the key in the tree, O(log(all elements)), and then count the matching elements, O(found elements).
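To make the two parts of that cost visible, here is a small sketch (the helper name count_by_equal_range is made up) that returns the same value as count() using equal_range and std::distance:

```cpp
#include <cstddef>
#include <iterator>
#include <set>

// Same result as s.count(x), spelled out as two steps:
// a logarithmic search for the run of equal keys, then a linear walk over it.
std::size_t count_by_equal_range(const std::multiset<int>& s, int x)
{
    auto range = s.equal_range(x);                   // O(log n)
    return static_cast<std::size_t>(
        std::distance(range.first, range.second));   // O(number of matches)
}
```

std::distance is linear here because multiset iterators are bidirectional, not random access.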
This site clearly states that the complexity of multiset::count is
Logarithmic in size and linear in the number of matches.
Or you can check out this one.
Logarithmic in the size of the container plus linear in the number of the elements found.
Well, I pulled out an interesting article for you. (Link)

To find sum of all consecutive sub-array of length k in a given array

I want to find the sums of all contiguous sub-arrays of length k for a given array of length n, given that k < n. For example, let the given array be arr[6]={1,2,3,4,5,6} and k=3; then the answer is (6,9,12,15).
It can be obtained as :
(1+2+3)=6,
(2+3+4)=9,
(3+4+5)=12,
(4+5+6)=15.
I have tried this using a sliding window of length k, but its time complexity is O(n). Is there any solution which takes even less time, such as O(log n)?
Unless you know certain specific properties of the array (e.g. the ordering of the elements, the range of the elements included in the array, etc.) then you would need to check each individual value, resulting in an O(n) complexity.
If, for instance, you knew that the sum of the values in the array was T (perhaps because you knew T itself or were given the range), then you could consider that all the elements except the first and last (K-1) elements are included in K different sums. The total of all the window sums would therefore be T*K minus some amount, which you could obtain by subtracting each of the first and last (K-1) values the appropriate number of times, resulting in an algorithm of complexity O(K). Note that this yields only the total of all the sums, not the individual sums.
But note that, in order to achieve a strategy similar to this, you would have to know some other specific information regarding the values in the array, may that be their range or their sum.
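For reference, the sliding-window approach mentioned in the question can be sketched as below (window_sums is an illustrative name). Since the output alone contains n-k+1 values, producing all of them cannot take fewer than that many steps.

```cpp
#include <cstddef>
#include <vector>

// Sums of every contiguous window of length k, computed in O(n) total:
// each step adds the element entering the window and subtracts the one leaving.
std::vector<long long> window_sums(const std::vector<int>& arr, std::size_t k)
{
    std::vector<long long> sums;
    if (k == 0 || k > arr.size()) return sums;   // no valid window

    long long current = 0;
    for (std::size_t i = 0; i < k; ++i) current += arr[i];
    sums.push_back(current);

    for (std::size_t i = k; i < arr.size(); ++i) {
        current += static_cast<long long>(arr[i]) - arr[i - k];
        sums.push_back(current);
    }
    return sums;
}
```

With arr = {1,2,3,4,5,6} and k = 3 this returns {6, 9, 12, 15}, matching the example in the question.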
You can use a segment tree data structure. Building it takes O(n) time, but then you can find the sum of any interval in O(log n) and modify any element of the array in O(log n).
https://en.wikipedia.org/wiki/Segment_tree

What would be the most efficient way to find a[i] = i in a sorted array?

Given an array a[], what would be the most efficient way to determine whether or not at least one element i satisfies the condition a[i] == i?
All the elements in the array are sorted and distinct, but they aren't necessarily integer types (i.e. they might be floating point types).
Several people have made claims about the relevance of “sorted”, “distinct” and “aren't necessarily integers”. In fact, proper selection of an efficient algorithm to solve this problem hinges on these characteristics. A more efficient algorithm would be possible if we could know that the values in the array were both distinct and integral, while a less efficient algorithm would be required if the values might be non-distinct, whether or not they were integral. And of course, if the array was not already sorted, you could sort it first (at average complexity O(n log n)) and then use the more efficient pre-sorted algorithm (i.e. for a sorted array), but in the unsorted case it would be more efficient to simply leave the array unsorted and run through it directly comparing the values in linear time (O(n)). Note that regardless of the algorithm chosen, best-case performance is O(1) (when the first element examined contains its index value); at any point during execution of any algorithm we might come across an element where a[i] == i at which point we return true; what actually matters in terms of algorithm performance in this problem is how quickly we can exclude all elements and declare that there is no such element a[i] where a[i] == i.
The problem does not state the sort order of a[], which is a pretty critical piece of missing information. If it’s ascending, the worst-case complexity will always be O(n), there’s nothing we can do to make the worst-case complexity better. But if the sort order is descending, even the worst-case complexity is O(log n): since values in the array are distinct and descending, there is only one possible index where a[i] could equal i, and basically all you have to do is a binary search to find the crossover point (where the ascending index values cross over the descending element values, if there even is such a crossover), and determine if a[c] == c at the crossover point index value c. Since that’s pretty trivial, I’ll proceed assuming that the sort order is ascending. Interestingly if the elements were integers, even in the ascending case there is a similar “crossover-like” situation (though in the ascending case there could be more than one a[i] == i match), so if the elements were integers, a binary search would also be applicable in the ascending case, in which case even the worst-case performance would be O(log n) (see Interview question - Search in sorted array X for index i such that X[i] = i). But we aren’t given that luxury in this version of the problem.
Here is how we might solve this problem:
Begin with the first element, a[0]. If its value is == 0, you’ve found an element which satisfies a[i] == i so return true. If its value is < 1, the next element (a[1]) could possibly contain the value 1, so you proceed to the next index. If, however, a[0] >= 1, you know (because the values are distinct) that the condition a[1] == 1 cannot possibly be true, so you can safely skip index 1. But you can even do better than that: For example, if a[0] == 12, you know (because the values are sorted in ascending order) that there cannot possibly be any elements that satisfy a[i] == i prior to element a[13]. Because the values in the array can be non-integral, we cannot make any further assumptions at this point, so the next element we can safely skip to directly is a[13] (e.g. a[1] through a[12] may all contain values between 12.000... and 13.000... such that a[13] could still equal exactly 13, so we have to check it).
Continuing that process yields an algorithm as follows:
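// Algorithm 1
#include <cmath>   // std::floor, std::ceil
#include <cstddef> // std::size_t, std::ptrdiff_t
bool algorithm1(const double* a, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i) // worst case is O(n)
    {
        if (a[i] == i)
            return true; // of course we could also return i here (as an int)...
        if (a[i] >= len)
            return false; // all later elements are larger still, so none can equal its index
        if (a[i] > i)
            i = static_cast<std::size_t>(std::floor(a[i])); // ++i then moves to the first index that can still match
    }
    return false; // ......in which case we’d want to return -1 here (an int)
}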
This has pretty good performance if many of the values in a[] are greater than their index value, and has excellent performance if all values in a[] are greater than n (it returns false after only one iteration!), but it has dismal performance if all values are less than their index value (it will return false after n iterations). So we return to the drawing board... but all we need is a slight tweak. Consider that the algorithm could have been written to scan backwards from n down to 0 just as easily as it can scan forward from 0 to n. If we combine the logic of iterating from both ends toward the middle, we get an algorithm as follows:
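// Algorithm 2 (uses <cmath> and <cstddef>)
bool algorithm2(const double* a, std::size_t len)
{
    // Signed indices, so that an empty array or j stepping below 0 cannot wrap around.
    std::ptrdiff_t i = 0, j = static_cast<std::ptrdiff_t>(len) - 1;
    for (; i <= j; ++i, --j) // worst case is still O(n); i <= j so the middle element is checked too
    {
        if (a[i] == i || a[j] == j)
            return true;
        if (a[i] > j || a[j] < i)
            return false; // no index in [i, j] can match; also keeps the casts below in range
        if (a[i] > i)
            i = static_cast<std::ptrdiff_t>(std::floor(a[i]));
        if (a[j] < j)
            j = static_cast<std::ptrdiff_t>(std::ceil(a[j]));
    }
    return false;
}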
This has excellent performance in both of the extreme cases (all values are less than 0 or greater than n), and has pretty good performance with pretty much any other distribution of values. The worst case is if all of the values in the lower half of the array are less than their index and all of the values in the upper half are greater than their index, in which case the performance degrades to the worst-case of O(n). Best case (either extreme case) is O(1), while average case is probably O(log n) but I’m deferring to someone with a math major to determine that with certainty.
Several people have suggested a “divide and conquer” approach to the problem, without specifying how the problem could be divided and what one would do with the recursively divided sub-problems. Of course such an incomplete answer would probably not satisfy the interviewer. The naïve linear algorithm and worst-case performance of algorithm 2 above are both O(n), while algorithm 2 improves the average-case performance to (probably) O(log n) by skipping (not examining) elements whenever it can. The divide-and-conquer approach can only outperform algorithm 2 if, in the average case, it is somehow able to skip more elements than algorithm 2 can skip. Let’s assume we divide the problem by splitting the array into two (nearly) equal contiguous halves , recursively, and decide if, with the resulting sub-problems, we are likely to be able to skip more elements than algorithm 2 could skip, especially in algorithm 2’s worst case. For the remainder of this discussion, let’s assume an input that would be worst-case for algorithm 2. After the first split, we can check both halves’ top & bottom elements for the same extreme case that results in O(1) performance for algorithm2, yet results in O(n) performance with both halves combined. This would be the case if all elements in the bottom half are less than 0 and all elements in the upper half are greater than n-1. In these cases, we can immediately exclude the bottom and/or top half with O(1) performance for any half we can exclude. Of course the performance of any half that cannot be excluded by that test remains to be determined after recursing further, dividing that half by half again until we find any segment whose top or bottom element contains its index value. That’s a reasonably nice performance improvement over algorithm 2, but it occurs in only certain special cases of algorithm 2’s worst case. 
All we’ve done with divide-and-conquer is decrease (slightly) the proportion of the problem space that evokes worst-case behavior. There are still worst-case scenarios for divide-and-conquer, and they exactly match most of the problem space that evokes worst-case behavior for algorithm 2.
So, given that the divide-and-conquer algorithm has fewer worst-case scenarios, doesn’t it make sense to go ahead and use a divide-and-conquer approach?
In a word, no. Well, maybe. If you know up front that about half of your data is less than 0 and half is greater than n, this special case would generally fare better with the divide-and-conquer approach. Or, if your system is multicore and your ‘n’ is large, it might be helpful to split the problem evenly between all of your cores, but once it’s split between them, I maintain that the sub-problems on each core are probably best solved with algorithm 2 above, avoiding further division of the problem and certainly avoiding recursion, as I argue below....
At each recursion level of a recursive divide-and-conquer approach, the algorithm needs some way to remember the as-yet-unsolved 2nd half of the problem while it recurses into the 1st half. Often this is done by having the algorithm recursively call itself first for one half and then for the other, a design which maintains this information implicitly on the runtime stack. Another implementation might avoid recursive function calls by maintaining essentially this same information on an explicit stack. In terms of space growth, algorithm 2 is O(1), but any recursive implementation is unavoidably O(log n) due to having to maintain this information on some sort of stack. But aside from the space issue, a recursive implementation has extra runtime overhead of remembering the state of as-yet-unrecursed-into subproblem halves until such time as they can be recursed into. This runtime overhead is not free, and given the simplicity of algorithm 2’s implementation above, I posit that such overhead is proportionally significant. Therefore I suggest that algorithm 2 above will roundly spank any recursive implementation for the vast majority of cases.
In the worst case, you can't do any better than checking every element. (Imagine something like a[i] = i + uniform_random(-.25, .25).) You'll need some information on what your input looks like.
Actually I would start from the last element, and do a basic check (for example, if you have 1000 elements, but the highest is 100, you know you need only check 0..100). In a worst case scenario you still need to check every element, but it should be faster to find the areas where a match may be possible. If it is as stated above (a[i] = i + [-0.25..0.25]), you are out of luck and need to search every single element.
For a sorted array, you can perform an interpolation search. It is similar to a binary search but, assuming an even distribution of values, can be faster.
I think the main problem here is your conflicting statements:
a[i] == i
All the elements in the array are sorted and distinct , they need not be integer always.
If the array's value is equal to its accessing subscript, that means it's an integer. If it's not an integer, and the elements are, say, char, what is considered "sorted"? ASCII value (A < B < C)?
If it were an array of chars, would we consider:
a[i] == i
to be true if
i == 65 && a[i] == 'A'
If I were in this interview I would be grilling the interviewer with follow up questions before answering. That said...
If all we know is what you stated, we can safely say that we can find the value in O(n) because that is the time to make one full pass of the array. With more details we can probably limit this to O(log(n)) with a binary search of the array.
Notice that all the elements in the array are sorted and distinct, so if the elements are integers and we construct a new array b with b[i]=a[i]-i, the elements of b are also sorted (non-decreasing), and what we need to find are the zeros in b. Binary search can solve the problem! Here is a link for counting the number of occurrences in a sorted array. You can also apply a similar divide-and-conquer technique to the original array without constructing an auxiliary array. The time complexity is O(log n). (If the elements may be non-integral, as the question allows, b is not necessarily sorted, for example a=[0.5,0.6,2] gives b=[0.5,-0.4,0], and this approach does not apply.)
Take this as an example:
a=[0,1,2,4,8]
b=[0,0,0,1,4]
What we need to find is exactly index 0,1,2
Hope it helps!
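A minimal sketch of that binary search on the original array, without building b. It assumes the elements are distinct integers in ascending order (it is not valid for floating-point data); find_fixed_point is a made-up name.

```cpp
#include <cstddef>
#include <vector>

// Returns some index i with a[i] == i, or -1 if there is none.
// Assumes a is sorted ascending with distinct *integer* values, so that
// b[i] = a[i] - i is non-decreasing and its zeros form one contiguous run.
long long find_fixed_point(const std::vector<long long>& a)
{
    long long lo = 0, hi = static_cast<long long>(a.size()) - 1;
    while (lo <= hi) {
        const long long mid = lo + (hi - lo) / 2;
        const long long value = a[static_cast<std::size_t>(mid)];
        if (value == mid) return mid;
        if (value < mid) lo = mid + 1;   // b[mid] < 0: zeros can only be to the right
        else             hi = mid - 1;   // b[mid] > 0: zeros can only be to the left
    }
    return -1;
}
```

For a = {0,1,2,4,8} the first probe is index 2, which already matches. To report all matching indices, one could search for both ends of the run of zeros instead.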

Search Algorithm to find the k lowest values in a list

I have a list that contains n double values and I need to find the k lowest double values in that list
k is much smaller than n
the initial list with the n double values is randomly ordered
the found k lowest double values are not required to be sorted
What algorithm would you recommend?
At the moment I use Quicksort to sort the whole list, and then I take the first k elements out of the sorted list. I expect there should be a much faster algorithm.
Thank you for your help!!!
You could model your solution to match the nlargest() code in Python's standard library.
Heapify the first k values on a maxheap.
Iterate over the remaining n - k values.
Compare each to the element of the top of the heap.
If the new value is lower, do a heapreplace operation (which replaces the topmost heap element with the new value and then sifts it downward).
The algorithm can be surprisingly efficient. For example, when n=100,000 and k=100, the number of comparisons is typically around 106,000 for randomly arranged inputs. This is only slightly more than 100,000 comparisons to find a single minimum value. And, it does about twenty times fewer comparisons than a full quicksort on the whole dataset.
The relative strength of various algorithms is studied and summarized at: http://code.activestate.com/recipes/577573-compare-algorithms-for-heapqsmallest
You can use a selection algorithm to find the kth lowest element and then iterate and return it and all elements that are lower than it. More work has to be done if the list can contain duplicates (making sure you don't end up with more elements than you need).
This solution is O(n).
Selection algorithm is implemented in C++ as nth_element()
Another alternative is to use a max heap of size k, and iterate the elements while maintaining the heap to hold all k smallest elements.
for each element x:
    if (heap.size() < k):
        heap.add(x)
    else if x < heap.max():
        heap.pop()
        heap.add(x)
When you are done - the heap contains k smallest elements.
This solution is O(nlogk)
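The pseudocode above translates fairly directly to C++ with std::priority_queue, which is a max-heap by default. This is a sketch only; the function name k_smallest is an assumption.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Keeps the k smallest values seen so far in a max-heap of size at most k.
// Runs in O(n log k); the result comes out in descending order.
std::vector<double> k_smallest(const std::vector<double>& values, std::size_t k)
{
    if (k == 0) return {};
    std::priority_queue<double> heap;   // top() is the largest value currently kept
    for (double x : values) {
        if (heap.size() < k) {
            heap.push(x);
        } else if (x < heap.top()) {
            heap.pop();                 // drop the largest of the kept values
            heap.push(x);
        }
    }
    std::vector<double> result;
    result.reserve(heap.size());
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```

For the input {5, 1, 9, 3, 7, 2} and k = 3 the heap ends up holding 1, 2 and 3.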
Take a look at partial_sort algorithm from C++ standard library.
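You can use std::nth_element. This is O(N) complexity on average because it doesn't fully sort the elements; it just rearranges them such that the element at the chosen position n is the one that would be there in sorted order, and none of the elements before it is greater than it.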
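A short sketch of the std::nth_element approach, working on a copy so that the caller's list is left untouched (k_smallest_by_selection is an illustrative name):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the k smallest values in unspecified order, average O(n).
// `values` is taken by value, so the caller's data is not reordered.
std::vector<double> k_smallest_by_selection(std::vector<double> values, std::size_t k)
{
    if (k >= values.size()) return values;
    // After this call the first k positions hold the k smallest values.
    std::nth_element(values.begin(),
                     values.begin() + static_cast<std::ptrdiff_t>(k),
                     values.end());
    values.resize(k);
    return values;
}
```

Duplicates need no special handling here, because exactly the first k positions are kept.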
You can use selection sort: it takes O(n) to select the lowest value. Once we have placed this lowest value at position 1, we can rescan the rest of the data set to find the second lowest value, and so on until we have the kth lowest value. This has complexity O(kn), which is effectively O(n) only when k is a small constant.

Fast Algorithm for finding largest values in 2d array

I have a 2D array (an image actually) that is size N x N. I need to find the indices of the M largest values in the array ( M << N x N) . Linearized index or the 2D coords are both fine. The array must remain intact (since it's an image). I can make a copy for scratch, but sorting the array will bugger up the indices.
I'm fine with doing a full pass over the array (ie. O(N^2) is fine). Anyone have a good algorithm for doing this as efficiently as possible?
Selection is sorting's austere sister (repeat this ten times in a row). Selection algorithms are less known than sort algorithms, but nonetheless useful.
You can't do better than O(N^2) (in N) here, since you must visit each element of the array at least once.
A good approach is to keep a priority queue made of the M largest elements. This gives something like O(N x N x log M).
You traverse the array, enqueuing pairs (elements, index) as you go. The queue keeps its elements sorted by first component.
Once the queue has M elements, instead of enqueuing you now:
Query the min element of the queue
If the current element of the array is greater, insert it into the queue and discard the min element of the queue
Else do nothing.
If M is large (comparable to N x N), sorting the array is preferable.
NOTE: @Andy Finkenstadt makes a good point (in the comments to your question): you definitely should traverse your array in the "direction of data locality": make sure that you read memory contiguously.
Also, this is trivially parallelizable, the only non parallelizable part is when you merge the queues when joining the sub processes.
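The priority-queue approach might look like the sketch below. It assumes the image is stored row-major in a flat std::vector<float>, so a linearized index idx corresponds to row idx / N and column idx % N; the names Entry and largest_indices are made up.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Entry = std::pair<float, std::size_t>;   // (pixel value, linearized index)

// Indices of the m largest values, found in one contiguous pass: O(N*N*log m).
// The image itself is never modified. Result is ordered by ascending value.
std::vector<std::size_t> largest_indices(const std::vector<float>& image, std::size_t m)
{
    if (m == 0) return {};
    // Min-heap: top() is the smallest of the values currently kept.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t idx = 0; idx < image.size(); ++idx) {
        if (heap.size() < m) {
            heap.emplace(image[idx], idx);
        } else if (image[idx] > heap.top().first) {
            heap.pop();                        // discard the current minimum
            heap.emplace(image[idx], idx);
        }
    }
    std::vector<std::size_t> result;
    while (!heap.empty()) {
        result.push_back(heap.top().second);
        heap.pop();
    }
    return result;
}
```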
You could copy the array into a single dimensioned array of tuples (value, original X, original Y ) and build a basic heap out of it in (O(n) time), provided you implement the heap as an array.
You could then retrieve the M largest tuples in O(M lg n) time and reference their original x and y from the tuple.
If you are going to make a copy of the input array in order to do a sort, that's way worse than just walking linearly through the whole thing to pick out numbers.
So the question is how big is your M? If it is small, you can store results (i.e. structs with 2D indexes and values) in a simple array or a vector. That'll minimize heap operations but when you find a larger value than what's in your vector, you'll have to shift things around.
If you expect M to get really large, then you may need a better data structure like a binary tree (std::set) or use sorted std::deque. std::set will reduce number of times elements must be shifted in memory, while if you use std::deque, it'll do some shifting, but it'll reduce number of times you have to go to the heap significantly, which may give you better performance.
Your problem doesn't use the 2 dimensions in any interesting way; it is easier to consider the equivalent problem in a 1D array.
There are 2 main ways to solve this problem:
Maintain a set of M largest elements, and iterate through the array. (Using a heap allows you to do this efficiently).
This is simple and is probably better in your case (M << N)
Use selection, (the following algorithm is an adaptation of quicksort):
Create an auxiliary array, containing the indexes [1..N].
Choose an arbitrary index (and corresponding value), and partition the index array so that indexes corresponding to smaller elements go to the left, and bigger elements go to the right.
Repeat the process, binary search style until you narrow down the M largest elements.
This is good for cases with large M. If you want to avoid worst case issues (the same quicksort has) then look at more advanced algorithms, (like median of medians selection)
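Rather than hand-writing the partitioning, the selection variant can be sketched with std::nth_element applied to an auxiliary index array. Note the assumptions: indices here are 0-based rather than [1..N], the data is a flat std::vector<float>, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Indices of the m largest values in unspecified order, average O(n).
// Only the auxiliary index array is reordered; `data` stays intact.
std::vector<std::size_t> largest_indices_by_selection(const std::vector<float>& data,
                                                      std::size_t m)
{
    std::vector<std::size_t> idx(data.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});   // 0, 1, 2, ...
    if (m >= idx.size()) return idx;
    // Order indices by descending value, so the first m slots get the largest values.
    std::nth_element(idx.begin(),
                     idx.begin() + static_cast<std::ptrdiff_t>(m),
                     idx.end(),
                     [&data](std::size_t l, std::size_t r) { return data[l] > data[r]; });
    idx.resize(m);
    return idx;
}
```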
How many times do you search for the largest value from the array?
If you only search 1 time, then just scan through it keeping the M largest ones.
If you do it many times, just insert the values into a sorted list (probably best implemented as a balanced tree).