O(n) algorithm to find out the element appearing more than n/2 times - c++

I was asked in an interview to give an O(n) algorithm to print an element that appears more than n/2 times in an array, if there is such an element. n is the size of the array.
I have no idea how to do this. Can anyone help?

It's the Boyer-Moore majority vote algorithm.
It also needs only O(1) extra space!
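A minimal C++17 sketch of the Boyer-Moore majority vote, assuming the input is a std::vector; the function name majority_element is illustrative. The first pass only produces a candidate, so a second pass checks that it really appears more than n/2 times, because the question asks for the element only if one exists.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Boyer-Moore majority vote: returns the element that appears more than
// n/2 times, or std::nullopt if there is none. O(n) time, O(1) extra space.
template <typename T>
std::optional<T> majority_element(const std::vector<T>& a) {
    // Pass 1: pairs of different values cancel out; whatever survives
    // is the only possible majority element.
    std::optional<T> candidate;
    std::size_t count = 0;
    for (const T& x : a) {
        if (count == 0) { candidate = x; count = 1; }
        else if (x == *candidate) ++count;
        else --count;
    }
    if (!candidate) return std::nullopt;  // empty input

    // Pass 2: the candidate is only guaranteed correct if a majority
    // exists, so count its occurrences to confirm.
    std::size_t occurrences = 0;
    for (const T& x : a)
        if (x == *candidate) ++occurrences;
    if (occurrences > a.size() / 2) return candidate;
    return std::nullopt;
}
```

For example, {3, 1, 3, 2, 3} yields 3, while {1, 1, 2, 2} yields no element, since 2 occurrences out of 4 is not more than n/2.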

In pseudocode:
int n = array.length
Hash<elementType,int> hash
foreach element in array
    hash[element] += 1
foreach entry in hash
    if entry.value > n/2
        print entry.key
        break
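The same counting idea in C++17, as a sketch: std::unordered_map stands in for the Hash in the pseudocode, so the running time is O(n) on average (hash-table operations are expected O(1)) and the extra space is O(n). majority_by_counting is a hypothetical name, and T must be hashable.

```cpp
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Counting approach: O(n) expected time, O(n) extra space.
template <typename T>
std::optional<T> majority_by_counting(const std::vector<T>& a) {
    std::unordered_map<T, std::size_t> counts;
    for (const T& x : a) ++counts[x];  // first loop of the pseudocode

    // Second loop: at most one entry can exceed n/2.
    for (const auto& [value, count] : counts)
        if (count > a.size() / 2) return value;
    return std::nullopt;
}
```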

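If a majority element exists, it is also the median value, which takes O(n) to find using the median-of-medians algorithm (one more O(n) pass is still needed to confirm that the median really appears more than n/2 times). In C++ you can find the median in one line with std::nth_element, which runs in linear time on average:
std::nth_element(begin, begin + n/2, begin + n);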
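A sketch of the median approach, assuming C++17 and a hypothetical function name. Because std::nth_element reorders its range, the vector is taken by value; the std::count pass rejects inputs that have no majority element.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Median-based check: a value occurring more than n/2 times must cover
// position n/2 of the sorted order, so the median is the only candidate.
template <typename T>
std::optional<T> majority_by_median(std::vector<T> a) {
    if (a.empty()) return std::nullopt;
    const std::size_t n = a.size();
    // Place the element that would sit at index n/2 after sorting there.
    std::nth_element(a.begin(), a.begin() + n / 2, a.end());
    const T candidate = a[n / 2];
    // The median is not always a majority, so verify it.
    const auto occurrences = std::count(a.begin(), a.end(), candidate);
    if (static_cast<std::size_t>(occurrences) > n / 2) return candidate;
    return std::nullopt;
}
```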

Related

Ternary Search to find the point in an array where the difference is minimum

Let A be an array of n positive integers.
How can I find some index k of A such that:
left = A[0] + A[1] + ... + A[k]
right = A[k+1] + A[k+2] + ... + A[n-1]
have the minimum absolute difference (that is, abs(left - right) is minimal)?
The absolute value of this difference is "parabolic" (it decreases until the minimum difference and then increases, like a U). I heard that ternary search is used to find the minimum of functions like this, but I don't know how to implement it here: I searched the internet and didn't find examples of ternary search over such functions.
EDIT: Suppose I can get the sum of any interval in O(1), and I need something faster than O(n); otherwise I wouldn't need ternary search.
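Under the EDIT's assumptions (all values positive, interval sums available in O(1)), here is a sketch that swaps in a plain binary search for the ternary search: since every A[i] > 0, the signed difference left(k) - right(k) is strictly increasing in k, so the minimum of its absolute value sits where the sign changes. Ternary search on abs(left - right) would also work, but binary search on the sign is simpler. The helper prefix_sums and the function best_split are hypothetical names; the prefix array is built once in O(n), after which each best_split call is O(log n).

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// prefix[i] = A[0] + ... + A[i-1]; built once in O(n), after which any
// interval sum is a difference of two entries, i.e. O(1).
std::vector<long long> prefix_sums(const std::vector<long long>& A) {
    std::vector<long long> prefix(A.size() + 1, 0);
    for (std::size_t i = 0; i < A.size(); ++i) prefix[i + 1] = prefix[i] + A[i];
    return prefix;
}

// Returns the k in [0, n-2] minimising |left(k) - right(k)|, where
// left(k) = A[0] + ... + A[k] and right(k) = A[k+1] + ... + A[n-1].
// Assumes n = prefix.size() - 1 >= 2 and every A[i] > 0. O(log n) time.
std::size_t best_split(const std::vector<long long>& prefix) {
    const std::size_t n = prefix.size() - 1;
    const long long total = prefix[n];
    // left(k) - right(k); strictly increasing because all values are positive.
    auto diff = [&](std::size_t k) { return 2 * prefix[k + 1] - total; };

    // Binary search for the first k in [0, n-2] with diff(k) >= 0.
    std::size_t lo = 0, hi = n - 1;  // half-open search range [lo, hi)
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (diff(mid) >= 0) hi = mid; else lo = mid + 1;
    }
    // |diff| is smallest at that first non-negative k or just before it.
    std::size_t best = (lo == n - 1) ? n - 2 : lo;
    if (best > 0 && std::llabs(diff(best - 1)) <= std::llabs(diff(best))) --best;
    return best;
}
```

For A = {1, 2, 3, 4, 5} this picks k = 2 (left = 6, right = 9).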
Let left(k) represent the sum of the values in the array, from A[0] through A[k]. It is trivial to prove that:
left(k+1)=left(k)+A[k+1]
That is, if you have already computed left for a given k, then left for k+1 is computed by adding the next element to it.
In other words:
If you iterate over the array, from element #0 to element #n-1 (where n is the size of the array), you can compute the running total for left simply by adding the next element in the array to left.
This might seem to be obvious and self-evident, but it helps to state this formally in order for the next step in the process to become equally obvious.
In the same fashion, given right(k) representing the sum of the values in the array starting with element #k, until the last element in the array, you can also prove the following:
right(k+1)=right(k)-A[k]
So you can find the k with the minimum difference between left(k) and right(k+1) (I'm using a slightly different notation than your question, because my notation is more convenient). Start with A[0] as left(0) and the sum of all values in the array as right(0), then compute right(1). Then iterate from the beginning of the array to the end, updating both left and right on the fly at each step and computing the difference between them. Finding where the difference is smallest then becomes trivial.
I can't think of any way to do this in less than O(n):
1) Computing the sum of all values in the array, for the initial value of right(0), is O(n).
2) The iteration over the array is, of course, O(n).
I don't believe that a logarithmic binary search will work here, since the values abs(left(k)-right(k)) themselves are not going to be in sorted order.
Incidentally, with this approach, you can also find the minimum difference when the array contains negative values too. The only difference is that since the difference is no longer parabolic, you simply have to iterate over the entire array, and just keep track of where abs(left-right) is the smallest.
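A sketch of the running-sum scan described above (best_split_linear is a hypothetical name). It uses the question's notation, right(k) = A[k+1] + ... + A[n-1], and because it checks every k it also handles negative values.

```cpp
#include <cstddef>
#include <cstdlib>
#include <numeric>
#include <vector>

// Linear scan with running sums: left grows by A[k], right shrinks by A[k].
// Returns the k in [0, n-2] minimising |left - right|. Assumes A.size() >= 2.
std::size_t best_split_linear(const std::vector<long long>& A) {
    long long left = 0;
    long long right = std::accumulate(A.begin(), A.end(), 0LL);  // sum of all values
    std::size_t best = 0;
    long long best_diff = -1;  // -1 means "no candidate yet"
    for (std::size_t k = 0; k + 1 < A.size(); ++k) {
        left += A[k];   // left  = A[0] + ... + A[k]
        right -= A[k];  // right = A[k+1] + ... + A[n-1]
        const long long d = std::llabs(left - right);
        if (best_diff < 0 || d < best_diff) { best_diff = d; best = k; }
    }
    return best;
}
```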
Trivial approach:
Compute all the sums A[0] + A[1] + ... + A[k] and A[k+1] + A[k+2] + ... + A[n-1] for every k < n.
Search for the k minimising abs(left - right) over all k < n.
O(n) in space and time.
Edit:
Computing all the sums can be done in O(n) with an incremental approach.

What algorithm is used to find the nth sorted subarray of an unordered array?

I had this question recently in an interview and failed, and now I'm looking for the answer.
Let's say I have a big array of n integers, all different.
If this array were sorted, I could subdivide it into x smaller arrays, all of size y except possibly the last one, which could be smaller.
I could then extract the nth subarray and return it, already sorted.
Example: array 4 2 5 1 6 3. If y=2 and I want the 2nd subarray, it would be 3 4.
What I did was simply sort the array and return the nth subarray, which takes O(n log n). But I was told there is a way to do it in O(n + y log y). I searched the internet and didn't find anything. Any ideas?
The algorithm you are looking for is a selection algorithm, which lets you find the k-th order statistic in linear time. The worst-case linear version (median of medians) is quite complex, but the C++ standard library conveniently provides std::nth_element, which runs in linear time on average.
The algorithm for finding the k-th sorted interval that the interviewers had in mind went like this:
1) Find b, the ((k-1)*y)-th order statistic, in O(N).
2) Find e, the (k*y)-th order statistic, in O(N).
3) There will be y numbers x with b < x <= e (for k = 1, simply take every x <= e). Store them in a separate array of size y. This operation takes O(N).
4) Sort the array of size y at O(y log y) cost.
The overall cost is O(N + N + N + y log y), i.e. O(N + y log y).
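A sketch of these steps in C++, under the question's assumption that all values are distinct; k is 1-based as in the steps, and kth_block_by_bounds is a hypothetical name. std::nth_element supplies the order statistics (linear on average rather than worst-case linear), and the last subarray is allowed to be shorter than y.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// k-th (1-based) sorted block of length y, following steps 1) to 4).
// Assumes distinct values, k >= 1, y >= 1 and (k-1)*y < a.size().
std::vector<int> kth_block_by_bounds(const std::vector<int>& a, std::size_t k, std::size_t y) {
    const std::size_t n = a.size();
    std::vector<int> work(a);  // nth_element reorders, so work on a copy

    // Step 2: e is the (k*y)-th smallest value, clamped for a short last block.
    const std::size_t e_rank = std::min(k * y, n) - 1;  // 0-based rank
    std::nth_element(work.begin(), work.begin() + e_rank, work.end());
    const int e = work[e_rank];

    // Step 1: b is the ((k-1)*y)-th smallest value; not needed when k == 1.
    const bool has_b = k > 1;
    int b = 0;
    if (has_b) {
        const std::size_t b_rank = (k - 1) * y - 1;
        std::nth_element(work.begin(), work.begin() + b_rank, work.end());
        b = work[b_rank];
    }

    // Step 3: copy the values in (b, e] in one O(n) pass.
    std::vector<int> block;
    block.reserve(y);
    for (int x : a)
        if ((!has_b || x > b) && x <= e) block.push_back(x);

    // Step 4: sort the small block, O(y log y).
    std::sort(block.begin(), block.end());
    return block;
}
```

With the question's example, kth_block_by_bounds({4, 2, 5, 1, 6, 3}, 2, 2) returns {3, 4}.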
You can combine std::nth_element and std::sort for this:
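#include <algorithm>
#include <cstddef>
#include <vector>
// Returns the k-th (0-based) sorted subarray of length y.
// Assumes y >= 1 and k*y < vec.size(); the last subarray may be shorter.
std::vector<int> kth_sorted_subarray(std::vector<int> vec, std::size_t k, std::size_t y)
{
    // Bounds of the wanted subarray, clamped to the end of the vector
    auto lower = vec.begin() + k*y;
    auto upper = lower + std::min<std::ptrdiff_t>(y, vec.end() - lower);
    // put right element at lower and partition vector by it
    std::nth_element(vec.begin(), lower, vec.end());
    // Same for upper - 1, but don't mess up lower (nothing to do when y == 1)
    if (upper - lower > 1)
        std::nth_element(lower + 1, upper - 1, vec.end());
    // Now sort the subarray
    std::sort(lower, upper);
    return std::vector<int>(lower, upper);
}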
[lower, upper) is now the k-th sorted subarray of length y, with the desired complexity on average.
To be checked for special cases like y = 1 before real world use, but this is the general idea.

Big O notation for duplicate function, C++

What is the Big O complexity of the function described in the screenshot?
It would take O(n) to go through all the numbers, but once it finds duplicates and removes them, what would that be? Would the removed part be a constant a? And then would the function have to iterate through the numbers again?
This is what I am thinking for Big O:
T(n) = n + a + (n-a), or something involving iterating through (n-a) steps after the first duplicate is found. Would Big O then be O(n)?
Here we want a Big O bound on the worst case. Let's say we need to remove all duplicates from the array A=[1..n]. The algorithm will start with the first element and check every remaining element; there are n-1 of them. Since all values happen to be different, it won't remove any from the array.
Next, the algorithm selects the second element and checks the remaining n-2 elements in the array. And so on.
When the algorithm arrives at the final element, it is done. The total number of comparisons is the sum (n-1) + (n-2) + ... + 2 + 1 + 0. Through the power of maths, this sum becomes (n-1)*n/2, and the dominating term is n^2, so the algorithm is O(n^2).
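The "power of maths" step is the usual trick of adding the sum to itself written in reverse order, so that every column adds up to n-1:

```latex
\begin{aligned}
S  &= 0 + 1 + \dots + (n-1) \\
S  &= (n-1) + (n-2) + \dots + 0 \\
2S &= \underbrace{(n-1) + (n-1) + \dots + (n-1)}_{n \text{ terms}} = n(n-1) \\
S  &= \frac{n(n-1)}{2} = \frac{n^2}{2} - \frac{n}{2} \in O(n^2)
\end{aligned}
```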
This algorithm is O(n^2), because for each element in the array you iterate over the array and count the occurrences of that element.
foreach item in array
    count = 0
    foreach other in array
        if item == other
            count += 1
    if count > 1
        remove item
As you see there are two nested loops in this algorithm which results in O(n*n).
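For comparison, here is a hypothetical C++ version of the nested-loop idea (the function in the screenshot is not available, so this is not that function). It keeps the first occurrence of each value and, instead of erasing from the array while scanning it, copies unique values into a new vector, which avoids the extra element shifting that erase() inside the loop would cause. With all values distinct it performs 0 + 1 + ... + (n-1) comparisons, i.e. O(n^2).

```cpp
#include <vector>

// Keeps the first occurrence of each value. The inner loop compares the
// current item with every value kept so far: two nested loops, O(n^2)
// comparisons in the worst case (all values distinct).
std::vector<int> remove_duplicates(const std::vector<int>& a) {
    std::vector<int> kept;
    for (int item : a) {
        bool seen = false;
        for (int other : kept) {
            if (item == other) { seen = true; break; }
        }
        if (!seen) kept.push_back(item);
    }
    return kept;
}
```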
Removing items doesn't affect the worst case. Consider an array containing only unique elements: no element is removed from it.
Note: A naive implementation of this algorithm could result in O(n^3) complexity.
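You start with the first element and go through all the other elements in the vector, which is n-1 comparisons; repeating that for each element, with one fewer comparison each time, gives n*(n-1)/2 comparisons in the worst case. The best case takes about n steps (for example, when all elements are 4).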

Split Array into two sets?

I have an array W with indices 0..N-1.
I need to split it into two sets of K and N-K elements.
But the condition is: sum(N-K) - sum(K) should be maximum.
How do I approach this?
I tried doing this:
Sort the array - std::sort(W,W+N), and then:
for(int i=0; i<K; ++i) less+=W[i];
for(int i=K; i<N; ++i) more+=W[i];
And then more-less
But I don't think this is the optimum way, or it may even be wrong for some of the cases.
Thanks.
UPDATE:
We have to choose K elements from W such that the difference between sum(K elements) and sum(remaining elements) is maximum.
Edit: Note that in your posted question, you seem to be expecting sort to sort from high-to-low. Both std::sort and std::nth_element put the low elements first. I have replaced K with (N-K) in the answer below to correct that.
Edit after UPDATE: Do the below twice, once for K and once for (N-K). Choose the optimal answer.
For your purposes, std::nth_element would be more efficient than std::sort.
std::nth_element( W, W+(N-K), W+N );
Your use of std::sort will use O(n log n) complexity to order all the elements within both your sets, which you don't need.
std::nth_element partitions in O(n) time on average, without completely sorting.
Note: your for loops may also be replaced with std::accumulate
less = std::accumulate( W, W+(N-K), 0 );
more = std::accumulate( W+(N-K), W+N, 0 );
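A sketch combining both notes for the UPDATE, where the K chosen elements may end up in either the low or the high group: it runs std::nth_element twice, once to bring the K smallest to the front and once to bring the K largest to the back, and returns the larger of the two differences. max_split_difference is a hypothetical name; it accumulates into long long (initial value 0LL) to avoid the int overflow an initial value of 0 could cause, and it assumes K <= N.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Largest possible |sum(chosen K) - sum(rest)|: either the chosen K are the
// K smallest (rest minus chosen) or the K largest (chosen minus rest).
// Takes w by value because nth_element reorders it. Assumes K <= w.size().
long long max_split_difference(std::vector<long long> w, std::size_t K) {
    const std::size_t N = w.size();
    const long long total = std::accumulate(w.begin(), w.end(), 0LL);

    // K smallest end up in [0, K) (no effect when K == N).
    std::nth_element(w.begin(), w.begin() + K, w.end());
    const long long smallest_k = std::accumulate(w.begin(), w.begin() + K, 0LL);

    // K largest end up in [N-K, N).
    std::nth_element(w.begin(), w.begin() + (N - K), w.end());
    const long long largest_k = std::accumulate(w.begin() + (N - K), w.end(), 0LL);

    return std::max(total - 2 * smallest_k, 2 * largest_k - total);
}
```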
You are to split the set of elements into two disjoint subsets A and B. You want sum(A) - sum(B) to be as high as possible.
Therefore, you want the sum(A) be as high as possible and sum(B) as low as possible.
Therefore, the set 'A' should contain the highest elements possible
and the set 'B' should contain the lowest elements possible.
By sorting the input set by element's value, and by assigning 'lowest elements' to B and 'highest elements' to A, you are guaranteed that the sum(A)-sum(B) will be max possible.
I do not see any cases where your approach would be wrong.
As to the 'being optimal' things, I did not analyze it at all. Drew's note seems quite probable.
It can be done using a max heap, in O(n + n log k) time.
Build a max heap of size k from the first k elements. We have to find the lowest k elements of the array, and the root of the heap is always the highest element in the heap.
Now iterate through the rest of the array. Compare each element with the root of the max heap. If it is smaller than the root, replace the root with it and restore the heap property. This takes O(n log k) time.
Find the sum of the elements in the heap.
Now find the sum of the rest of the array's elements and take the difference, in O(n) time.
Total time: O(n + n log k).
EDIT: You can also compute the sum of all elements of the array while traversing it for the heap. This saves one O(n) pass, and it can be solved in O(n log k).
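A sketch of the heap approach with std::priority_queue, which is a max-heap by default. As the EDIT suggests, it accumulates the total in the same pass. split_difference_with_heap is a hypothetical name; it returns sum(rest) - sum(k smallest), and replacing the root is done as pop() followed by push(), each O(log k).

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Sum of the k smallest elements via a max-heap of size k, with the total
// accumulated in the same pass. O(n log k) time, O(k) extra space.
// Returns sum(rest) - sum(k smallest). Assumes k <= w.size().
long long split_difference_with_heap(const std::vector<long long>& w, std::size_t k) {
    std::priority_queue<long long> heap;  // top() is the largest kept value
    long long total = 0;
    for (long long x : w) {
        total += x;
        if (heap.size() < k) {
            heap.push(x);              // fill the heap with the first k elements
        } else if (k > 0 && x < heap.top()) {
            heap.pop();                // drop the largest of the current k smallest
            heap.push(x);
        }
    }
    long long smallest_k = 0;
    while (!heap.empty()) { smallest_k += heap.top(); heap.pop(); }
    return (total - smallest_k) - smallest_k;
}
```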

Sort an array with recursion

To start off this is a homework assignment and I am just looking for some pointers on using recursion.
I have an array of pseudo-random integers of size n. I need to sort the array from lowest to highest. Below is the recursive sort function that I have created, but I know that I am missing a piece and I am not sure what.
template <typename T>
void sort_array_recur(T* random_array,T n)
{
    //stop case
    if(n = 1 )
    {
        if(random_array[n] < random_array[ n + 1 ])
        {
            T temp = random_array[n + 1];
            random_array[n] == random_array[n + 1];
            random_array[n + 1] == temp;
        }
    }
    else
    {
        sort_array_recur(random_array, (n - 1));
    }
}
I think what I am missing is some sort of insert function that also needs to be called recursively. I have also searched around and nothing seems particular to my situation (or at least I couldn't understand it as such). Thank you for your time in advance.
EDIT:
I guess I forgot to mention that the spec says "sort the first n-1 elements of an n-element array. Then place the nth element in its proper position within the n-1 sorted elements". I guess I don't understand how to sort the first n-1 elements of an array?
You are asked to use recursion. Your problem sorts a size n array. The first step is sorting n-1 elements of that array.
Consider m = n-1. Can you apply your problem to a size m array? i.e. sort the first m-1 elements and then place the m'th element in its correct position?
Consider k = m-1. Can you do the same with a size k array?
Do you see how you can use recursion with this problem?
Also consider how you will end the recursion; what will you do with a size 1 array?
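For reference, a sketch of the spec from the EDIT, which describes recursive insertion sort: recurse on the first n-1 elements, then insert the last one by shifting larger elements one step to the right. Unlike the code in the question, it uses std::size_t for n instead of T, does the insertion with a loop rather than a second recursive function, and treats n <= 1 as the stop case.

```cpp
#include <cstddef>

// Recursive insertion sort: sort a[0..n-2], then insert a[n-1] into place.
template <typename T>
void sort_array_recur(T* a, std::size_t n) {
    if (n <= 1) return;            // stop case: 0 or 1 elements are already sorted
    sort_array_recur(a, n - 1);    // sort the first n-1 elements
    T last = a[n - 1];             // the nth element, to be placed
    std::size_t i = n - 1;
    while (i > 0 && a[i - 1] > last) {
        a[i] = a[i - 1];           // shift larger elements one step right
        --i;
    }
    a[i] = last;                   // proper position within the sorted prefix
}
```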