What algorithm is used to find the nth sorted subarray of an unordered array? - C++

I had this question recently in an interview and failed it, and now I am searching for the answer.
Let's say I have a big array of n integers, all different.
If this array were ordered, I could subdivide it into x smaller
arrays, all of size y, except maybe the last one, which could be smaller.
I could then extract the nth subarray and return it, already sorted.
Example: Array 4 2 5 1 6 3. If y=2 and I want the 2nd subarray, it would be 3 4.
What I did was simply sort the array and return the nth subarray, which takes O(n log n). But I was told that there is a way to do it in O(n + y log y). I searched the internet and didn't find anything. Any ideas?

The algorithm you are looking for is a selection algorithm, which lets you find the k-th order statistic in linear time. The algorithm is quite complex, but the standard C++ library conveniently provides an implementation of it.
The algorithm for finding the k-th sorted interval that the interviewers had in mind goes like this:
Find the b=(k-1)*y-th order statistic in O(N)
Find the e=k*y-th order statistic in O(N)
There will be y numbers between b and e. Store them in a separate array of size y. This operation takes O(N)
Sort the array of size y at O(y * log2 y) cost.
The overall cost is O(N+N+N+y * log2 y), i.e. O(N+y * log2 y)

You can combine std::nth_element and std::sort for this:
std::vector<int> vec = muchData();
// Fix those bound iterators as needed
auto lower = vec.begin() + k*y;
auto upper = lower + y;
// put right element at lower and partition vector by it
std::nth_element(vec.begin(), lower, vec.end());
// Same for upper, but don't mess up lower
std::nth_element(lower + 1, upper - 1, vec.end());
// Now sort the subarray
std::sort(lower, upper);
[lower, upper) is now the k-th sorted subarray of length y, with the desired complexity on average.
Check special cases such as y = 1 (where upper - 1 comes before lower + 1) before real-world use, but this is the general idea.
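For reference, here is a minimal self-contained sketch of the same idea that also handles y = 1 and a shorter last subarray. The function name kth_sorted_chunk and the 1-based k are assumptions made for illustration; they do not come from either answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the k-th (1-based) chunk of length y that the
// fully sorted array would contain. The last chunk may be shorter than y.
std::vector<int> kth_sorted_chunk(std::vector<int> v, std::size_t k, std::size_t y) {
    assert(k >= 1 && y >= 1 && (k - 1) * y < v.size());
    const std::size_t b = (k - 1) * y;                // first index of the chunk
    const std::size_t e = std::min(k * y, v.size());  // one past its last index

    // Put the right element at index b; everything before it is <= it.
    std::nth_element(v.begin(), v.begin() + b, v.end());
    // Within the tail [b, end), put the right element at index e, so that
    // [b, e) holds exactly the values belonging to the chunk (unsorted).
    if (e < v.size())
        std::nth_element(v.begin() + b, v.begin() + e, v.end());
    // Sort only the chunk: O(y log y).
    std::sort(v.begin() + b, v.begin() + e);
    return std::vector<int>(v.begin() + b, v.begin() + e);
}
```

With the array from the question, kth_sorted_chunk({4, 2, 5, 1, 6, 3}, 2, 2) yields 3 4. The vector is taken by value so the caller's data is left untouched; taking it by reference would avoid the copy.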

Related

Moving window RMQ performance improvement

Say I have an array of integers A of length N, and an integer L <= N.
What I am trying to find is the minimum of each of the ranges [0, L-1], [1, L], [2, L+1], ..., [N-L, N-1]
(like a moving window of length L from left to right)
My algorithm now is O(N lg N) with O(N lg N) preprocess:
Save all numbers A[0...L-1] in a multiset S, and also store the numbers in a queue Q in order. The minimum of [0, L-1] is simply the first element of S. O(N lg N)
Pop the first element of Q, find this element in S and delete it. Then push A[L] into S. The minimum of [1, L] is simply the first element of S. O(lg N)
Repeat step 2 for every remaining range, moving to the next element each iteration. O(N)
Total is O(N lg N).
I wonder if there is any algorithm which can achieve better than this with following requirements:
Preprocess time (If needed) is O(N)
Query time is O(1)
I have done some research on RMQ. The nearest method I found uses a sparse table, which achieves O(1) query time but O(N lg N) preprocessing time. Another method, which reduces RMQ to the LCA problem, can meet the requirements but needs some restrictions on the array A.
So is it possible that, with no restriction on A, the requirements can be fulfilled when solving my problem?
Yes, use a deque. We will keep the elements sorted in ascending order, so the first element is always the minimum in [i - L + 1, i] for the current position i. We won't keep the actual elements, but their positions.
d = empty deque
for i = 0 to n-1:
    // get rid of elements that are too old
    while !d.empty && i - d.front + 1 > L:
        d.pop_front()
    // keep the deque sorted
    while !d.empty && A[d.back] > A[i]:
        d.pop_back()
    d.push_back(i)
    // A[d.front] is the minimum in [i - L + 1, i]
Since every element enters and leaves the deque at most once, this is O(n).
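The pseudocode above could be written in C++ roughly as follows. This is a sketch: the function name window_minima and the choice to return all minima in a vector are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper: returns the minimum of every window of length L,
// i.e. of [0, L-1], [1, L], ..., [N-L, N-1].
std::vector<int> window_minima(const std::vector<int>& A, std::size_t L) {
    assert(L >= 1 && L <= A.size());
    std::vector<int> result;
    std::deque<std::size_t> d;  // positions whose values are in ascending order
    for (std::size_t i = 0; i < A.size(); ++i) {
        // Drop positions that have fallen out of the window ending at i.
        while (!d.empty() && i - d.front() + 1 > L) d.pop_front();
        // Drop positions whose values can never be a minimum again.
        while (!d.empty() && A[d.back()] > A[i]) d.pop_back();
        d.push_back(i);
        // The first full window ends at i = L - 1.
        if (i + 1 >= L) result.push_back(A[d.front()]);
    }
    return result;
}
```

For A = 4 2 5 1 6 3 and L = 3, this returns 2 1 1 1.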

Big O notation for duplicate function, C++

What is the Big O notation for the function described in the screenshot?
It would take O(n) to go through all the numbers, but once it finds the duplicates and removes them, what would that be? Would the removed parts be a constant a? And would the function then have to iterate through the numbers again?
This is what I am thinking for Big O:
T(n) = n + a + (n-a), or something involving having to iterate through (n-a) steps after the first duplicate is found. Would Big O then be O(n)?
Big O notation here describes the worst case. Let's say we need to remove all duplicates from the array A=[1..n]. The algorithm starts with the first element and checks every remaining element; there are n-1 of them. Since all values happen to be different, it won't remove any from the array.
Next, the algorithm selects the second element and checks the remaining n-2 elements in the array. And so on.
When the algorithm arrives at the final element it is done. The total number of comparisons is the sum (n-1) + (n-2) + ... + 2 + 1 + 0. This arithmetic series sums to (n-1)*n/2, and the dominating term is n^2, so the algorithm is O(n^2).
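To make the counting concrete, here is a small sketch that counts comparisons. The screenshot is not available, so the function below is only an assumption about what it describes: keep the first occurrence of each value and erase later duplicates. The name remove_duplicates is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical quadratic duplicate removal: keeps the first occurrence of
// each value, erases later ones, and returns the number of comparisons made.
std::size_t remove_duplicates(std::vector<int>& a) {
    std::size_t comparisons = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = i + 1; j < a.size();) {
            ++comparisons;
            if (a[j] == a[i])
                a.erase(a.begin() + j);  // erase shifts the later elements left
            else
                ++j;
        }
    }
    return comparisons;
}
```

For five distinct values this makes 4 + 3 + 2 + 1 = 10 comparisons, which is (n-1)*n/2 for n = 5. For five equal values it makes only 4, because every comparison removes an element.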
This algorithm is O(n^2), because for each element in the array you iterate over the array and count the occurrences of that element.
foreach item in array
    count = 0
    foreach other in array
        if item == other
            count += 1
    if count > 1
        remove item
As you can see, there are two nested loops in this algorithm, which results in O(n*n).
Removed items don't affect the worst case. Consider an array containing only unique elements: no element is removed from it.
Note: A naive implementation of this algorithm could result in O(n^3) complexity.
Starting with the first element, you go through all the other elements in the vector, which is n-1 comparisons. Repeating that for each element gives n * (n-1) / 2 comparisons in the worst case. The best case is about n comparisons (for example, when all elements are 4).

m smallest values of vector with size n (c++11)

I need the average of the nClose smallest values (except the first zero) in a vector with n elements where we know that nClose + 1 < n, there are only non-negative numbers, and the vector contains at least one zero value. Furthermore, nClose will be a lot smaller than n, say that nClose will be around 10 and n will be around 500.
Normally I would use min_element to find the minimum, but that is useless here since I need several values. At the moment I use the following code
sort(diff.begin(), diff.end());
double sum = accumulate(diff.begin() + 1, diff.begin() + 1 + nClose, 0);
double avg = sum / nClose;
Due to the sort it runs in O(n log n), whereas we could do it in O(nClose*n) by just finding the minimum and removing it, then repeating this nClose times. Does anyone know how to accomplish this with the algorithms of C++11?
You can use std::nth_element for that.
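// afterwards the first nClose+1 positions hold the nClose+1 smallest values, in no particular order
nth_element(diff.begin(), diff.begin() + nClose + 1, diff.end());
// the zero adds nothing to the sum; 0.0 keeps the sum from being truncated to int
double sum = accumulate(diff.begin(), diff.begin() + 1 + nClose, 0.0);
double avg = sum / nClose;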
Regarding your remark about finding the minimum and removing it: this would probably be no more efficient than your current solution. Removing an element from a vector requires all elements after it to be moved one position to the left, which adds O(n) work per removal on top of the O(n) search, so the approach stays at O(nClose*n) but with extra data movement in every round.
Also, while this should be a pretty efficient solution, I'd warn you against putting too much weight on algorithmic complexity, as the constants may actually play a much bigger role than any advantage in Big O notation.
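Put together as a self-contained function, the nth_element approach might look like the sketch below. The name average_of_closest and the element type double are assumptions; the preconditions are the ones stated in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: average of the nClose smallest values, not counting
// one zero. Assumes diff holds only non-negative values, contains at least
// one zero, and nClose + 1 < diff.size().
double average_of_closest(std::vector<double> diff, std::size_t nClose) {
    assert(nClose >= 1 && nClose + 1 < diff.size());
    // Afterwards positions [0, nClose] hold the nClose + 1 smallest values
    // in no particular order; one of them is a zero.
    std::nth_element(diff.begin(), diff.begin() + nClose, diff.end());
    // The zero contributes nothing, so summing all nClose + 1 values is fine.
    // The 0.0 initial value makes accumulate sum in double.
    const double sum = std::accumulate(diff.begin(), diff.begin() + nClose + 1, 0.0);
    return sum / nClose;
}
```

For example, with the values 5 0 3 1 4 2 and nClose = 2, the three smallest values are 0, 1 and 2, giving an average of 1.5.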

Split Array into two sets?

I have an array W indexed 0..N-1.
I need to split its elements into two sets, of K and N-K elements.
The condition is that sum(N-K) - sum(K) should be maximum.
How do I approach this?
I tried doing this:
Sort the array - std::sort(W,W+N), and then:
for(int i=0; i<K; ++i) less+=W[i];
for(int i=K; i<N; ++i) more+=W[i];
And then more-less
But I don't think this is the optimum way, or it may even be wrong for some of the cases.
Thanks.
UPDATE:
We have to choose K elements from W such that the difference between sum(K elements) and sum(remaining elements) is maximum.
Edit: Note that in your posted question, you seem to be expecting sort to sort from high-to-low. Both std::sort and std::nth_element put the low elements first. I have replaced K with (N-K) in the answer below to correct that.
Edit after UPDATE: Do the below twice, once for K and once for (N-K). Choose the optimal answer.
std::nth_element would suit your purposes better than std::sort.
std::nth_element( W, W+(N-K), W+N );
Your use of std::sort will use O(n log n) complexity to order all the elements within both your sets, which you don't need.
std::nth_element will use O(n) complexity to partition without completely sorting.
Note: your for loops may also be replaced with std::accumulate
less = std::accumulate( W, W+(N-K), 0 );
more = std::accumulate( W+(N-K), W+N, 0 );
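As a self-contained sketch of the nth_element idea: the version below follows the question's original loops, putting the K smallest elements in one set and the rest in the other. The function name max_split_difference and the long long sums are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: sum of the N-K largest elements minus the sum of the
// K smallest elements.
long long max_split_difference(std::vector<int> w, std::size_t k) {
    assert(k <= w.size());
    // Partition so that the k smallest values occupy [0, k), unsorted.
    if (k < w.size())
        std::nth_element(w.begin(), w.begin() + k, w.end());
    // 0LL makes accumulate sum in long long to reduce the risk of overflow.
    const long long less = std::accumulate(w.begin(), w.begin() + k, 0LL);
    const long long more = std::accumulate(w.begin() + k, w.end(), 0LL);
    return more - less;
}
```

For W = 4 2 5 1 6 3 and K = 2 the sets are {1, 2} and {3, 4, 5, 6}, so the result is 18 - 3 = 15.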
You are to split the set of elements into two disjoint subsets A and B, and you want sum(A)-sum(B) to be as high as possible.
Therefore, you want sum(A) to be as high as possible and sum(B) as low as possible.
Therefore, set A should contain the highest elements possible
and set B should contain the lowest elements possible.
By sorting the input by element value and assigning the lowest elements to B and the highest elements to A, you are guaranteed that sum(A)-sum(B) is the maximum possible.
I do not see any cases where your approach would be wrong.
As for whether it is optimal, I did not analyze that at all. Drew's note seems quite plausible.
It can be done using a max heap in O(n + n log k) time.
We have to find the lowest k elements of the array. Make a max heap of the first k elements; the root of the heap will be the highest element in the heap.
Now iterate through the rest of the array, comparing each element with the root of the max heap. If it is smaller than the root, replace the root with it and heapify again. This takes O(n log k) time.
Find the sum of the elements in the heap.
Now you can find the sum of the rest of the elements of the array and get the difference. This takes O(n) time.
Total time: O(n + n log k)
EDIT: Perhaps you can compute the sum of all elements of the array while traversing it for the heap. This saves the extra O(n) pass, so the whole thing runs in O(n log k).
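The heap approach, including the single-pass total from the EDIT, can be sketched with std::priority_queue, which is a max heap by default. The function name max_split_difference_heap is an assumption for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical helper: sum of the N-k largest elements minus the sum of the
// k smallest, keeping the k smallest seen so far in a max heap.
long long max_split_difference_heap(const std::vector<int>& w, std::size_t k) {
    assert(k <= w.size());
    std::priority_queue<int> heap;  // max heap: top() is the largest kept value
    long long total = 0;            // sum of all elements
    long long heap_sum = 0;         // sum of the elements currently in the heap
    for (int x : w) {
        total += x;
        if (heap.size() < k) {
            heap.push(x);
            heap_sum += x;
        } else if (k > 0 && x < heap.top()) {
            // x belongs among the k smallest; evict the current largest.
            heap_sum += static_cast<long long>(x) - heap.top();
            heap.pop();
            heap.push(x);
        }
    }
    return (total - heap_sum) - heap_sum;
}
```

For W = 4 2 5 1 6 3 and k = 2 the heap ends up holding 1 and 2, so the result is (21 - 3) - 3 = 15. Unlike the nth_element version, this does not modify or copy the input.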

Find pair of elements in integer array such that abs(v[i]-v[j]) is minimized

Let's say we have an int array with 5 elements: 1, 2, 3, 4, 5
What I need to do is find the minimum absolute value of the differences between the array's elements:
We need to check them like this
1-2 2-3 3-4 4-5
1-3 2-4 3-5
1-4 2-5
1-5
And find the minimum absolute value of these subtractions. We can find it with 2 for loops. The question is, is there any algorithm for finding the value with only one for loop?
Sort the list and subtract each pair of adjacent elements; the smallest of those differences is the answer.
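A minimal sketch of that suggestion with std::sort, which makes it O(n log n). The function name min_abs_difference is an assumption, and the code assumes the difference of two neighbouring values does not overflow int.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper: smallest |v[i] - v[j]| over all pairs i != j.
int min_abs_difference(std::vector<int> v) {
    assert(v.size() >= 2);
    std::sort(v.begin(), v.end());
    int best = INT_MAX;
    // After sorting, the closest pair is always adjacent, and
    // v[i + 1] - v[i] is already non-negative.
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
        best = std::min(best, v[i + 1] - v[i]);
    return best;
}
```

For the array 1, 2, 3, 4, 5 from the question this returns 1, using a single for loop after the sort.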
The provably best-performing solution is asymptotically linear, O(n), up to constant factors.
This means that the time taken is proportional to the number of the elements in the array (which of course is the best we can do as we at least have to read every element of the array, which already takes O(n) time).
Here is one such O(n) solution (which also uses O(1) space if the list can be modified in-place):
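int mindiff(vector<int>& v)  // non-const: the vector is sorted in place
{
    IntRadixSort(v.begin(), v.end());
    int best = INT_MAX;  // from <climits>
    // i + 1 < v.size() avoids unsigned wrap-around when v is empty
    for (size_t i = 0; i + 1 < v.size(); i++)
    {
        int diff = abs(v[i] - v[i+1]);
        if (diff < best)
            best = diff;
    }
    return best;
}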
IntRadixSort is a linear-time fixed-width integer sorting algorithm described here:
http://en.wikipedia.org/wiki/Radix_sort
The concept is that you leverage the fixed bit width of ints by partitioning them in a fixed series of passes over the bit positions, i.e. partition them on the high bit (32nd), then on the next highest (31st), then on the next (30th), and so on, which only takes linear time.
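IntRadixSort itself is not shown in the answer. As a rough stand-in, here is a sketch of a radix sort for 32-bit ints. Note that it is a different variant from the one described above: it works from the least significant byte upwards, 8 bits per pass, and uses an O(n) temporary buffer rather than O(1) extra space. The name int_radix_sort is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for IntRadixSort: LSD radix sort, four 8-bit passes.
void int_radix_sort(std::vector<std::int32_t>& v) {
    std::vector<std::int32_t> buffer(v.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // count[d + 1] first holds how many keys have digit d in this pass.
        std::array<std::size_t, 257> count{};
        for (std::int32_t x : v) {
            // Flipping the sign bit makes unsigned order match signed order.
            const std::uint32_t key = static_cast<std::uint32_t>(x) ^ 0x80000000u;
            ++count[((key >> shift) & 0xFFu) + 1];
        }
        // Prefix sums: count[d] becomes the first output index for digit d.
        for (std::size_t d = 0; d < 256; ++d) count[d + 1] += count[d];
        // Stable scatter into the buffer by the current digit.
        for (std::int32_t x : v) {
            const std::uint32_t key = static_cast<std::uint32_t>(x) ^ 0x80000000u;
            buffer[count[(key >> shift) & 0xFFu]++] = x;
        }
        v.swap(buffer);
    }
}
```

Each pass is O(n) and the number of passes is fixed at four, so the whole sort is O(n) for fixed-width integers.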
The problem is equivalent to sorting. Any sorting algorithm could be used, and at the end, return the difference between the nearest elements. A final pass over the data could be used to find that difference, or it could be maintained during the sort. Before the data is sorted the min difference between adjacent elements will be an upper bound.
So to do it without two loops, use a sorting algorithm that does not have two loops. In a way it feels like semantics, but recursive sorting algorithms will do it with only one loop. If the issue is the n(n-1)/2 subtractions required by the simple two-loop case, you can use an O(n log n) algorithm.
No. Unless you know the list is sorted, you need two loops.
It's simple: iterate in a single for loop.
Keep the variables "minpos" and "maxpos", and "minneg" and "maxneg".
Check the sign of each value you encounter and store the maximum positive value in "maxpos"
and the minimum positive value in "minpos"; do the same in another if case for numbers
less than zero. Then store the difference maxpos-minpos in one variable and
maxneg-minneg in another, and print the larger of the two. That gives the
desired result.
I believe you already know how to find the max and min in one for loop.
Correction: the above finds the maximum difference. For the minimum you need to
take the max and second max instead of the max and min :)
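This might help you:
// a holds 5 elements; every pair (m, m+i) is visited with a single for loop
int end = 5;
int subtractmin = INT_MAX;  // from <climits>
int m = 0;
for (int i = 1; i < end; i++) {
    if (abs(a[m] - a[i + m]) < subtractmin)
        subtractmin = abs(a[m] - a[i + m]);
    if (i == end - 1 && m < 4) {  // last pair that starts at m
        m = m + 1;
        end = end - 1;
        i = 0;  // the loop's i++ restarts the offset at 1
    }
}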