Moving window RMQ performance improvement - C++

Say I have an array of integers A of length N, and an integer L <= N.
What I am trying to find is the minimum of each of the ranges [0, L-1], [1, L], [2, L+1], ..., [N-L, N-1]
(like a moving window of length L sliding from left to right).
My algorithm right now is O(N lg N) with O(N lg N) preprocessing:
Save all numbers A[0...L-1] in a multiset S, and also store them in a queue Q in order. The minimum of [0, L-1] is simply the first element of S. O(N lg N)
Pop the first element of Q, find it in S and delete it. Then insert A[L] into S. The minimum of [1, L] is simply the first element of S. O(lg N)
Repeat step 2 for every remaining range, moving one element forward each iteration. O(N)
Total is O(N lg N).
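Concretely, here is a rough sketch of that approach (assuming the input is a std::vector<int>; the queue is not strictly needed, because A[i - L] already tells me which element leaves the window):

#include <set>
#include <vector>

// Sliding-window minimum with a multiset, O(N lg N) overall.
std::vector<int> windowMinMultiset(const std::vector<int>& A, int L) {
    std::multiset<int> S(A.begin(), A.begin() + L);   // the first window [0, L-1]
    std::vector<int> mins;
    mins.push_back(*S.begin());                       // minimum of [0, L-1]
    for (int i = L; i < (int)A.size(); ++i) {
        S.erase(S.find(A[i - L]));                    // remove the element leaving the window
        S.insert(A[i]);                               // add the element entering the window
        mins.push_back(*S.begin());                   // minimum of [i-L+1, i]
    }
    return mins;
}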
I wonder if there is any algorithm that can do better than this, with the following requirements:
Preprocessing time (if needed) is O(N)
Query time is O(1)
I have done some research on RMQ; the nearest method I found uses a sparse table, which achieves O(1) query time but O(N lg N) preprocessing time. Another method, which reduces RMQ to the LCA problem, can meet the requirements but needs some restriction on the array A.
So is it possible that, with no restriction on A, the requirements can be fulfilled when solving my problem?

Yes, use a deque. We keep the values in increasing order from front to back, so the front element is always the minimum of [i - L + 1, i] for the current position i. We don't store the actual elements, but their positions.
d = empty deque
for i = 0 to n-1:
    // get rid of too old elements
    while !d.empty && i - d.front + 1 > L:
        d.pop_front()
    // keep the deque sorted
    while !d.empty && A[d.back] > A[i]:
        d.pop_back()
    d.push_back(i)
    // A[d.front] is the minimum in [i - L + 1, i]
Since every element enters and leaves the deque at most once, this is O(n).
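A minimal C++ version of that sketch (assuming a std::vector<int> input and window length L; names are illustrative):

#include <deque>
#include <vector>

// Sliding-window minimum with a monotonic deque, O(N) overall.
std::vector<int> windowMinDeque(const std::vector<int>& A, int L) {
    std::deque<int> d;                      // positions; A-values increase from front to back
    std::vector<int> mins;
    for (int i = 0; i < (int)A.size(); ++i) {
        while (!d.empty() && i - d.front() + 1 > L)
            d.pop_front();                  // front index fell out of the window
        while (!d.empty() && A[d.back()] > A[i])
            d.pop_back();                   // A[i] makes larger candidates at the back useless
        d.push_back(i);
        if (i >= L - 1)
            mins.push_back(A[d.front()]);   // minimum of [i-L+1, i]
    }
    return mins;
}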

Related

Time complexity of a recursive function in which recursion reduces size

I have to estimate the time complexity of Solve():
// These methods and list<Element> Elements belong to the Solver class
void Solver::Solve()
{
    while(List is not empty)
        Recursive();
}

void Solver::Recursive(some parameters)
{
    // O(n) in List size. When called directly from Solve() it will always return
    // a valid element. When called by recursion it may or may not return an element.
    Element WhatCanISolve = WhatCanISolve(some parameters);
    if(WhatCanISolve == null)
        return;

    // We reduce the GLOBAL problem size by one.
    List.remove(Element); // This is a list, and Element is pointed to by an iterator, so O(1)

    // Some simple O(1) operations

    // Now we call the recursive function twice.
    Recursive(some other parameters 1);
    Recursive(some other parameters 2);
}

// This function performs a search with the given parameters
Element Solver::WhatCanISolve(some parameters)
{
    // Iterates through the whole List, so O(n) in List size
    // Returns the first element matching the parameters
    // Returns a single Element or null
}
My first thought was that it should be somewhere around O(n^2).
Then I thought of
T(n) = n + 2T(n-1)
which (according to wolframalpha) expands to:
O(2^n)
However, I think that the second idea is false, since n is reduced between recursive calls.
I also did some benchmarking with large sets. Here are the results:
N t(N) in ms
10000 480
20000 1884
30000 4500
40000 8870
50000 15000
60000 27000
70000 44000
80000 81285
90000 128000
100000 204380
150000 754390
Your algorithm is still O(2^n), even though it reduces the problem size by one item each time. Your difference equation
T(n) = n + 2T(n-1)
does not account for the removal of an item at each step. But it only removes one item, so the equation should be T(n) = n + 2T(n-1) - 1. Following your example and saving the algebra by using WolframAlpha to solve this gives the solution T(n) = (c_1 + 4)*2^(n-1) - n - 2, which is still O(2^n). It removes one item, which is not a considerable amount given the other factors (especially the recursion).
A similar example that comes to mind is an n*n 2D matrix. Suppose you're only using it as a triangular matrix. Even though you remove one row to process for each column, iterating through every element still has complexity O(n^2), which is the same as if all elements were used (i.e. a square matrix).
For further evidence, I present a plot of your own collected running time data:
Presumably the time is quadratic. If WhatCanISolve returns nullptr if and only if the list is empty, then all the calls
Recursive(some other parameters 2);
will finish in O(1), because they are run with an empty list. This means the correct formula is actually
T(n) = C*n + T(n-1)
This means T(n) = O(n^2), which corresponds well to what we see on the plot.
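Expanding that recurrence (assuming T(0) is a constant) makes the quadratic bound explicit:
T(n) = C*n + T(n-1)
     = C*n + C*(n-1) + T(n-2)
     = ...
     = C*(n + (n-1) + ... + 1) + T(0)
     = C*n*(n+1)/2 + T(0)
     = O(n^2)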

What algorithm is used to find the nth sorted subarray of an unordered array?

I had this question recently in an interview and I failed, and now I'm searching for the answer.
Let's say I have a big array of n integers, all different.
If this array were ordered, I could subdivide it into x smaller
arrays, all of size y, except maybe the last one, which could be smaller.
I could then extract the nth subarray and return it, already sorted.
Example: Array 4 2 5 1 6 3. If y=2 and I want the 2nd subarray, it would be 3 4.
Now what I did is simply sort the array and return the nth subarray, which takes O(n log n). But I was told there exists a way to do it in O(n + y log y). I searched on the internet and didn't find anything. Ideas?
The algorithm you are looking for is the selection algorithm, which lets you find the k-th order statistic in linear time. The algorithm is quite complex, but the standard C++ library conveniently provides an implementation of it.
The algorithm for finding the k-th sorted interval that the interviewers had in mind goes like this:
Find the b = (k-1)*y-th order statistic in O(N)
Find the e = k*y-th order statistic in O(N)
There will be y numbers between b and e. Store them in a separate array of size y. This operation takes O(N)
Sort that array of size y at a cost of O(y log y)
The overall cost is O(N + N + N + y log y), i.e. O(N + y log y)
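A minimal sketch of those steps with std::nth_element doing the selection (k is 1-based here and the bounds check is left out; this is just an illustration, not code from the original answer):

#include <algorithm>
#include <vector>

// Returns the k-th (1-based) sorted block of y elements, O(N + y log y) on average.
std::vector<int> kthSortedBlock(std::vector<int> a, int k, int y) {
    int b = (k - 1) * y;                               // first position of the block
    int e = std::min<int>(k * y, (int)a.size());       // one past the last position
    std::nth_element(a.begin(), a.begin() + b, a.end());           // b-th order statistic
    std::nth_element(a.begin() + b, a.begin() + (e - 1), a.end()); // (e-1)-th order statistic
    std::vector<int> block(a.begin() + b, a.begin() + e);          // the y numbers in between
    std::sort(block.begin(), block.end());                         // O(y log y)
    return block;
}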
You can combine std::nth_element and std::sort for this:
std::vector<int> vec = muchData();
// Fix those bound iterators as needed
auto lower = vec.begin() + k*y;
auto upper = lower + y;
// put right element at lower and partition vector by it
std::nth_element(vec.begin(), lower, vec.end());
// Same for upper, but don't mess up lower
std::nth_element(lower + 1, upper - 1, vec.end());
// Now sort the subarray
std::sort(lower, upper);
[lower, upper) is now the k-th sorted subarray of length y, with the desired complexity on average.
To be checked for special cases like y = 1 before real world use, but this is the general idea.

Big O notation for duplicate function, C++

What is the Big O notation for the function described in the screenshot?
It would take O(n) to go through all the numbers, but once it finds the numbers and removes them, what would that be? Would the removed part be a constant a? And then would the function have to iterate through the numbers again?
This is what I am thinking for Big O:
T(n) = n + a + (n-a), or something involving having to iterate through (n-a) steps after the first duplicate is found. Then would big O be O(n)?
Big O notation considers the worst case. Let's say we need to remove all duplicates from the array A = [1..n]. The algorithm starts with the first element and checks every remaining element - there are n-1 of them. Since all values happen to be different, it won't remove any from the array.
Next, the algorithm selects the second element and checks the remaining n-2 elements in the array. And so on.
When the algorithm arrives at the final element it is done. The total number of comparisons is the sum (n-1) + (n-2) + ... + 2 + 1 + 0. Through the power of maths, this sum becomes (n-1)*n/2, and the dominating term is n^2, so the algorithm is O(n^2).
This algorithm is O(n^2), because for each element in the array you are iterating over the array and counting the occurrences of that element.
foreach item in array
    count = 0
    foreach other in array
        if item == other
            count += 1
    if count > 1
        remove item
As you can see, there are two nested loops in this algorithm, which results in O(n*n).
Removed items don't affect the worst case. Consider an array containing only unique elements: no element is removed from such an array.
Note: A naive implementation of this algorithm could result in O(n^3) complexity.
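For illustration, a rough C++ version of that nested-loop idea (names are hypothetical; note that each erase on a vector is itself O(n), the kind of detail that can push a careless implementation toward O(n^3)):

#include <vector>

// Remove every value that occurs more than once, naive O(n^2) scan
// (plus the cost of the erases themselves).
void removeDuplicates(std::vector<int>& a) {
    for (std::size_t i = 0; i < a.size(); ) {
        std::size_t count = 0;
        for (std::size_t j = 0; j < a.size(); ++j)
            if (a[j] == a[i])
                ++count;                    // count occurrences of a[i]
        if (count > 1)
            a.erase(a.begin() + i);         // O(n) shift; do not advance i
        else
            ++i;
    }
}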
Starting with the first element, you go through all the remaining elements in the vector, that's n-1 comparisons; doing that for each of the n elements gives (n * (n-1))/2 in the worst case. Roughly n comparisons is the best case (for example, when all elements are 4).

Split Array into two sets?

I have an array W of 0..N-1
I need to split them into two sets, say of K and N-K elements.
But the condition is: sum(the N-K set) - sum(the K set) should be maximum.
How do I approach this?
I tried doing this:
Sort the array - std::sort(W,W+N), and then:
for(int i=0; i<K; ++i) less+=W[i];
for(int i=K; i<N; ++i) more+=W[i];
And then compute more - less.
But I don't think this is the optimum way, or it may even be wrong for some of the cases.
Thanks.
UPDATE:
We have to choose K elements from W such that the difference between sum(the K elements) and sum(the remaining elements) is maximum.
Edit: Note that in your posted question, you seem to be expecting sort to sort from high-to-low. Both std::sort and std::nth_element put the low elements first. I have replaced K with (N-K) in the answer below to correct that.
Edit after UPDATE: Do the below twice, once for K and once for (N-K). Choose the optimal answer.
More optimal than std::sort would be std::nth_element for your purposes.
std::nth_element( W, W+(N-K), W+N );
Your use of std::sort will use O(n log n) complexity to order all the elements within both your sets, which you don't need.
std::nth_element will use O(n) complexity to partition without completely sorting.
Note: your for loops may also be replaced with std::accumulate
less = std::accumulate( W, W+(N-K), 0 );
more = std::accumulate( W+(N-K), W+N, 0 );
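Putting those pieces together, a minimal sketch (names are illustrative; use a wider accumulator such as long long if the sums might overflow int):

#include <algorithm>
#include <numeric>

// Puts the N-K smallest elements into the "less" part and returns more - less;
// average O(N) thanks to std::nth_element.
long long maxDifference(int* W, int N, int K) {
    std::nth_element(W, W + (N - K), W + N);               // partition, no full sort
    long long less = std::accumulate(W, W + (N - K), 0LL);
    long long more = std::accumulate(W + (N - K), W + N, 0LL);
    return more - less;
}

As the edit above notes, you can call this once with K and once with N-K and keep the larger result.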
You are to split the set of elements into two distinct, non-overlapping subsets A and B. You want sum(A) - sum(B) to be as high as possible.
Therefore, you want sum(A) to be as high as possible and sum(B) to be as low as possible.
Therefore, set A should contain the highest elements possible
and set B the lowest elements possible.
By sorting the input set by element value, and by assigning the lowest elements to B and the highest elements to A, you are guaranteed that sum(A) - sum(B) will be the maximum possible.
I do not see any cases where your approach would be wrong.
As to it being optimal, I did not analyze that at all. Drew's note seems quite probable.
It can be done using a max heap, in O(n + n log k) time.
Make a max heap of size k. We have to find the lowest k elements of the array. The root of the heap will be the highest element in the heap. Build the heap from the first k elements.
Now iterate through the rest of the array. Compare each array element with the root of the max heap. If it is smaller than the root, replace the root with it and heapify again. This takes O(n log k) time.
Find the sum of the elements in the heap.
Now find the sum of the rest of the elements of the array and take the difference. (O(n) time)
Total time: O(n + n log k)
EDIT: You can find the sum of all elements of the array while traversing it for the heap. This saves O(n) time, so it can be solved in O(n log k).
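A rough C++ sketch of that heap approach using std::priority_queue (a max heap by default; names are illustrative, not from the original answer):

#include <queue>
#include <vector>

// sum(remaining N-K elements) - sum(K smallest elements), O(n log k) time.
long long maxDifferenceHeap(const std::vector<int>& W, int K) {
    std::priority_queue<int> heap;      // max heap holding the K smallest seen so far
    long long total = 0;
    for (int x : W) {
        total += x;                     // running sum of all elements (the EDIT above)
        if ((int)heap.size() < K) {
            heap.push(x);
        } else if (!heap.empty() && x < heap.top()) {
            heap.pop();                 // drop the largest of the current K candidates
            heap.push(x);
        }
    }
    long long less = 0;                 // sum of the K smallest elements
    while (!heap.empty()) { less += heap.top(); heap.pop(); }
    return (total - less) - less;       // sum(rest) - sum(K smallest)
}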

Most efficient algorithm to get the union of 2 ordered lists

I need to find the union of 2 descending ordered lists (list1 and list2), where the union would be each element from both lists without duplicates. Assume the list elements are integers. I am using big O notation to determine the most efficient algorithm to solve this problem. I know the big O notation for the 1st, but I do not know the big O notation for the 2nd. Can someone tell me the big O notation of the 2nd algorithm so I can decide which algorithm to implement? If someone knows a better algorithm than one of these, could you help me understand that as well? Thanks in advance.
Here are my two algorithms...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Algorithm #1: O(N * log base2 N)
Starting at the first element of list1,
while(list1 is not at the end of the list) {
    if(the current element in list1 is not in list2) // Binary Search -> O(log base2 N)
        add the current element in list1 to list2
    go to the next element in list1 }
list2 is now the union of the 2 lists
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Algorithm #2: O(?)
Starting at the first elements of each list,
LOOP_START:
compare the current elements of the lists
whichever element is greater, put into a 3rd list called list3
go to the next element in the list whose element was just inserted into list3
branch to LOOP_START until either list1 or list2 are at the end of their respective list
insert the remaining elements from either list1 or list2 into list3 (the union)
list3 now contains the union of list1 and list2
Here's my assessment of the situation:
Your first algorithm runs in n log n time: you are doing the binary search for every element in the first list, right?
Your second algorithm is not entirely complete: you don't say what to do if the elements in the two lists are equal. However, given the right logic for dealing with equal elements, your second algorithm is like the merge part of merge sort: it will run in linear time (i.e. O(N)). It is optimal, in the sense that you cannot do better: you cannot merge two ordered lists without looking at every element in both lists at least once.
The second is O(n+m) while the first is O(n log(m) + m). Thus the second is significantly better.
With the following algorithm you can have the two lists merged in O(n+m).
[Sorry, I have used python for simplicity, but the algorithm is the same in every language]
Note that the algorithm also maintains the items sorted in the result list.
def merge(list1, list2):
    result = []
    i1 = 0
    i2 = 0

    # iterate over the two lists
    while i1 < len(list1) and i2 < len(list2):
        # if the current items are equal, add just one and go to the next two items
        if list1[i1] == list2[i2]:
            result.append(list1[i1])
            i1 += 1
            i2 += 1
        # if the item of list1 is greater than the item of list2, add it and go to the next item of list1
        elif list1[i1] > list2[i2]:
            result.append(list1[i1])
            i1 += 1
        # if the item of list2 is greater than the item of list1, add it and go to the next item of list2
        else:
            result.append(list2[i2])
            i2 += 1

    # add the remaining items of list1
    while i1 < len(list1):
        result.append(list1[i1])
        i1 += 1

    # add the remaining items of list2
    while i2 < len(list2):
        result.append(list2[i2])
        i2 += 1

    return result

print(merge([10, 8, 5, 1], [12, 11, 7, 5, 2]))
Output:
[12, 11, 10, 8, 7, 5, 2, 1]
Complexity Analysis:
Say the length of list 1 is N and that of list 2 is M.
Algorithm 1:
At the risk of sounding incredible, I would argue that the complexity of this algorithm, as stated, is O(N*M) and not O(N log M).
For each element in list 1 (O(N)), we are searching for it in list 2 (O(log M)). The complexity of this algorithm 'seems' to be O(N log M).
However, we are also inserting the element into list 2. This new element should be inserted in the proper place so that list 2 remains sorted for further binary search operations. If we are using an array as the data structure, the insertion takes O(M) time.
Hence the order of complexity is O(N*M) for the algorithm as is.
A modification can be made wherein the new element is inserted at the end of list 2 (the list is then no longer ordered) and we carry out the binary search from index 0 to M-1 rather than to the new size minus 1. In this case the complexity is O(N log M), since we carry out N binary searches in a list of length M.
To make the list ordered again, we have to merge the two ordered parts (0 to M-1 and M to newSize-1). This can be done in O(N+M) time (one merge operation, as in merge sort of an array of length N+M). Hence the net time complexity of this algorithm is
O(N log M + N + M)
Space complexity is O(max(N,M)), not counting the original lists' space and only considering the extra space required in list 2.
Algorithm 2:
At each iteration, we move at least one pointer forward. The total distance traveled by both pointers is N + M. Hence the order of time complexity in the worst case is O(N+M), which is better than the first algorithm.
However, the space complexity required in this case is larger: O(N+M).
Here is another approach:
Iterate through both lists and insert all the values into a hash set.
This removes all duplicates, and the result is the union of the two lists.
Two important notes: you'll lose the order of the numbers, and it takes additional space.
Time complexity: O(n + m)
Space Complexity: O(n + m)
If you need to maintain order of the result set, use some custom version of LinkedHashMap.
Actually, algorithm 2 would not work if the input lists were not sorted.
Sorting the arrays would take O(m*lg(m) + n*lg(n)).
You can build a hash table on the first list, then for each item of the second list check whether it exists in the hash table. This works in O(m+n) on average.
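A minimal C++ sketch of that idea using std::unordered_set (average-case O(m+n); note that the order of the inputs is not preserved):

#include <unordered_set>
#include <vector>

// Union of two integer lists via a hash table, O(n + m) on average.
std::vector<int> hashUnion(const std::vector<int>& a, const std::vector<int>& b) {
    std::unordered_set<int> seen(a.begin(), a.end());    // hash table on the first list
    std::vector<int> result(seen.begin(), seen.end());   // first list, duplicates removed
    for (int x : b)
        if (seen.insert(x).second)                       // true if x was not already present
            result.push_back(x);
    return result;
}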
There are a few things that need to be specified:
Do the input lists contain duplicates?
Must the result be ordered?
I'll assume that, using std::list, you can cheaply insert at the head or at the tail.
Let's say List 1 has N elements and List 2 has M elements.
Algorithm 1
It iterates over every item of List 1 searching for it in List 2.
Assuming that there may be duplicates and that the result must be ordered, the worst-case time for the search is when no element of List 1 exists in List 2, hence it's at least:
O(N × M).
To insert an item of List 1 in the right place, you need to iterate List 2 again up to the point of insertion. The worst case is when every item in List 1 is smaller (if List 2 is searched from the beginning) or greater (if List 2 is searched from the end). Since the previous items of List 1 have been inserted into List 2, there would be M iterations for the first item, M + 1 for the second, M + 2 for the third, etc., and M + N - 1 iterations for the last item, for an average of M + (N - 1) / 2 per item.
Something like:
N × (M + (N - 1) / 2)
For big-O notation, constant factors don't matter, so:
N × (M + (N - 1))
For big-O notation, non-variable additions don't matter, so:
O(N × (M + N))
Adding to the original O(N × M):
O(N × M) + O(N × (M + N))
O(N × M) + O(N × M + N²)
The second equation is just to make the constant factor elimination evident, e.g. 2 × (N × M), thus:
O(N × (M + N))
O(N² + N × M)
These two are equivalent, which ever you like the most.
Possible optimizations:
If the result doesn't have to be ordered, insertion can be O(1), hence the worst-case time is:
O(N × M)
Don't just test each List 1 item against List 2 by equality; also test e.g. greater than, so that you can stop searching List 2 when List 1's item is greater than List 2's item; this wouldn't reduce the worst case, but it would reduce the average case
Keep the List 2 iterator that points to where List 1's item was found to be greater than List 2's item, to make the sorted insertion O(1); on insertion, make sure to keep an iterator that starts at the inserted item, because although List 1 is ordered, it might contain duplicates; with these two, the worst-case time becomes:
O(N × M)
For the next iterations, search for List 1's item in the rest of List 2 with the iterator we kept; this reduces the worst case, because if you reach the end of List 2, you'll just be "removing" duplicates from List 1; with these three, the worst-case time becomes:
O(N + M)
By this point, the only difference between this algorithm and Algorithm 2 is that List 2 is changed to contain the result, instead of creating a new list.
Algorithm 2
This is the merging of the merge sort.
You'll be walking every element of List 1 and every element of List 2 once, and insertion is always made at the head or tail of the list, hence the worst-case time is:
O(N + M)
If there are duplicates, they're simply discarded. The result is more easily made ordered than not.
Final Notes
If there are no duplicates, insertion can be optimized in both cases. For instance, with doubly-linked lists, we can easily check if the last element in List 1 is greater than the first element in List 2 or vice-versa, and simply concatenate the lists.
This can be further generalized for any tail of List 1 and List 2. For instance, in Algorithm 1, if a List 1's item is not found in List 2, we can concatenate List 2 and the tail of List 1. In Algorithm 2, this is done in the last step.
The worst case, when List 1's items and List 2's items are interleaved, is not reduced, but again the average case is reduced, and in many cases by a big factor that makes a big difference in Real Life™.
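As a small illustration of that concatenation trick, here is a sketch with std::list, whose splice member joins two lists in O(1) (the lists are descending, as in the question):

#include <list>

// If every element of l1 is greater than every element of l2,
// the union is just l1 followed by l2; splice does this in O(1).
void concatIfDisjoint(std::list<int>& l1, std::list<int>& l2) {
    if (!l1.empty() && !l2.empty() && l1.back() > l2.front())
        l1.splice(l1.end(), l2);    // moves all of l2 to the end of l1
}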
I ignored:
Allocation times
Worst-case space differences between the algorithms
Binary search, because you mentioned lists, not arrays or trees
I hope I didn't make any blatant mistake.
I implemented a TypeScript (JS) based union operation for two arrays of objects in one of my previous projects. The data was too large and the default library functions like underscore or lodash were not performant enough. After some brainstorming I came up with the binary-search-based algorithm below. Hope it might help someone with performance tuning.
As far as complexity is concerned, the algorithm is binary-search based, so each lookup is O(log(N)).
Basically the code takes two unordered object arrays and a key name to compare on, and:
1) sorts the arrays
2) iterates through each element of the first array and deletes it from the second array
3) concatenates the resulting second array onto the first array.
private sortArrays = (arr1: Array<Object>, arr2: Array<Object>, propertyName: string): void => {
    function comparer(a, b) {
        if (a[propertyName] < b[propertyName])
            return -1;
        if (a[propertyName] > b[propertyName])
            return 1;
        return 0;
    }
    arr1.sort(comparer);
    arr2.sort(comparer);
}

private difference = (arr1: Array<Object>, arr2: Array<Object>, propertyName: string): Array<Object> => {
    this.sortArrays(arr1, arr2, propertyName);
    for (var i = 0; i < arr1.length; i++) {
        var obj = {
            loc: 0
        };
        if (this.OptimisedBinarySearch(arr2, arr2.length, obj, arr1[i], propertyName))
            arr2.splice(obj.loc, 1);
    }
    return arr2;
}

private OptimisedBinarySearch = (arr, size, obj, val, propertyName): boolean => {
    var first, mid, last;
    first = 0;
    last = size - 1;
    if (!arr.length)
        return false;
    while (arr[first][propertyName] <= val[propertyName] && val[propertyName] <= arr[last][propertyName]) {
        mid = first + Math.floor((last - first) / 2);
        if (val[propertyName] == arr[mid][propertyName]) {
            obj.loc = mid;
            return true;
        }
        else if (val[propertyName] < arr[mid][propertyName])
            last = mid - 1;
        else
            first = mid + 1;
    }
    return false;
}

private UnionAll = (arr1, arr2, propertyName): Array<Object> => {
    return arr1.concat(this.difference(arr1, arr2, propertyName));
}
//example
var YourFirstArray = [{x:1},{x:2},{x:3}]
var YourSecondArray= [{x:0},{x:1},{x:2},{x:3},{x:4},{x:5}]
var keyName = "x";
this.UnionAll(YourFirstArray, YourSecondArray, keyName)