I had this question recently in an interview and I failed, and now search for the answer.
Let's say I have a big array of n integers, all differents.
If this array was ordered, I could subdivide it in x smaller
arrays, all of size y, except maybe the last one, which could be less.
I could then extract the nth subarray and return it, already sorted.
Example : Array 4 2 5 1 6 3. If y=2 and I want the 2nd array, it would be 3 4.
Now what I did is simply sort the array and return the nth subarray, which takes O(n log n). But it was said to me that there exists a way to do it in O(n + y log y). I searched on internet and didn't find anything. Ideas ?
The algorithm you are looking for is Selection Algorithm, which lets you find k-th order statistics in linear time. The algorithm is quite complex, but the standard C++ library conveniently provides an implementation of it.
The algorithm for finding k-th sorted interval that the interviewers had in mind went like this:
Find b=(k-1)*y-th order statistics in O(N)
Find e=k*y-th order statistics in O(N)
There will be y numbers between b and e. Store them in a separate array of size y. This operation takes O(N)
Sort the array of size y for O(y * log2y) cost.
The overall cost is O(N+N+N+y * log2y), i.e. O(N+y * log2y)
You can combine std::nth_element and std::sort for this:
std::vector<int> vec = muchData();
// Fix those bound iterators as needed
auto lower = vec.begin() + k*y;
auto upper = lower + y;
// put right element at lower and partition vector by it
std::nth_element(vec.begin(), lower, vec.end());
// Same for upper, but don't mess up lower
std::nth_element(lower + 1, upper - 1, vec.end());
// Now sort the subarray
std::sort(lower, upper);
[lower, upper) is now the k-th sorted subarray of length y, with the desired complexity on average.
To be checked for special cases like y = 1 before real world use, but this is the general idea.
http://www.spoj.com/problems/LSORT/ It is a problem on spoj
It states that
You are given a permutation of n numbers that are between 1 to n and having no duplicates.
Task is to sort that permutation in ascending order.There is another array Q in which we are inserting elements from given permutation P.
You have to implement N steps to sort P. In the i-th step, P has N-i+1 remaining elements, Q has i-1 elements and you have to choose some x-th element (from the N-i+1 available elements) of P and put it to the left or to the right of Q. The cost of this step is equal to x * i. The total cost is the sum of costs of individual steps. After N steps, Q must be an ascending sequence. Your task is to minimize the total cost.
Input
The first line of the input file is T (T ≤ 10), the number of test cases. Then descriptions of T test cases follow. The description of each test case consists of two lines. The first line contains a single integer N (1 ≤ N ≤ 1000). The second line contains N distinct integers from the set {1, 2, .., N}, the N-element permutation P.
Output
For each test case your program should write one line, containing a single integer - the minimum total cost of sorting.
Now i have figured out the dp
My recurrence relation states that for getting most optimal values from elements having value i to j i will have to insert either $i$ at front or $j$ at back.
Cost of inserting i at front = dp[i+1][j]+cost of adding element i at front
Cost of inserting j at back = dp[i][j-1] +cost of adding element j at back
and i have to take minimum of these.answer would be dp[1][n]
for(l=1;l<=n;l++) //length of current permutation Q
{
for(i=1;i<=n-l+1;i++) //starting value of permutation Q
{
j=i+l-1; //ending value of permutation Q
dp[i][j]=min(dp[i+1][j]+l*xi,dp[i][j-1]+l*xj);//chosing wether to insert i at start or j at end
}
}
here xi=index of element i from start of permutation P.
and yi=index of element j from start of permutation P.
ans would be dp[1][n]
But am unable to figure out xi and xj
Please help
You can try re-thinking your DP state.
For me, I would use the dp[startQ][endQ] where dp[startQ][endQ] means the cost I have incurred to far to 'sort' values startQ to endQ in the array Q.
If you know what is in the array Q (integers startQ to endQ inclusive), one can easily re-construct the array of P by just removing/ignoring all the integers within startQ and endQ.
For each state, dp[startQ][endQ], since one can only add to the front or the back of Q,
dp[startQ][endQ] can only be:
dp[startQ][endQ-1] + cost of adding endQ
dp[startQ-1][endQ] + cost of adding startQ
with the base cases being
dp[i][i] = 0;
These states can be computed and the answer can be found at dp[1]][n]; (assuming it is one indexed).
However I haven't thought of a efficient way to compute x if it were to be coded in a top down manner, where as the whole computation can be performed in O(N^2 log N) using bottom-up DP with a data structure to compute x at every state.
I will leave the final details for you to code out :) but I can help more if required.
I need to find the union of 2 descending ordered lists (list1 and list2), where the union
would be each element from both lists without duplicates. Assume the list elements are integers. I am
using big O notation to determine the most efficient algorithm to solve this problem. I know the big
O notation for the 1st, but I do not know the big O notation for the 2nd. Can someone tell me the
big O notation of the 2nd algorithm so I can decide which algorithm to implement? If someone knows a
better algorithm than one of these, could you help me understand that as well? Thanks in advance.
Here are my two algorithms. . .
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Algorithm #1: O(N * log base2 N)
Starting at the first element of list1,
while(list1 is not at the end of the list) {
if(the current element in list1 is not in list2) // Binary Search -> O(log base2 N)
add the current element in list1 to list2
go to the next element in list1 }
list2 is now the union of the 2 lists
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Algorithm #2: O(?)
Starting at the first elements of each list,
LOOP_START:
compare the current elements of the lists
whichever element is greater, put into a 3rd list called list3
go to the next element in the list whose element was just inserted into list3
branch to LOOP_START until either list1 or list2 are at the end of their respective list
insert the remaining elements from either list1 or list2 into list3 (the union)
list3 now contains the union of list1 and list2
Here's my assessment of the situation
Your first algorithm runs in n log n time: you are doing the binary search for every element in the first list, right?
Your second algorithm is not entirely complete: you don't say what to do if the elements in the two lists are equal. However, given the right logic for dealing with equal elements, your second algorithm is like the merge part of the merge sort: it will run in linear time (i.e. N). It is optimal, in a sense that you cannot do better than that: you cannot merge two ordered lists without looking at every element in both list at least once.
The second is O(n+m) while the first is O(n log(m) + m). Thus the second is significantly better.
With the following algorithm you can have the two lists merged in O(n+m).
[Sorry, I have used python for simplicity, but the algorithm is the same in every language]
Note that the algorithm also maintains the items sorted in the result list.
def merge(list1, list2):
result = []
i1 = 0;
i2 = 0;
#iterate over the two lists
while i1 < len(list1) and i2 < len(list2):
#if the current items are equal, add just one and go to the next two items
if list1[i1] == list2[i2]:
result.append(list1[i1])
i1 += 1
i2 += 1
#if the item of list1 is greater than the item of list2, add it and go to next item of list1
elif list1[i1] > list2[i2]:
result.append(list1[i1])
i1 += 1
#if the item of list2 is greater than the item of list1, add it and go to next item of list2
else:
result.append(list2[i2])
i2 += 1
#Add the remaining items of list1
while i1 < len(list1):
result.append(list1[i1])
i1 += 1
#Add the remaining items of list2
while i2 < len(list2):
result.append(list2[i2])
i2 += 1
return result
print merge([10,8,5,1],[12,11,7,5,2])
Output:
[12, 11, 10, 8, 7, 5, 2, 1]
Complexity Analysis:
Say the length of list 1 is N and that of list 2 is M.
Algorithm 1:
At the risk of sounding incredible, I would accept that according to me the complexity of this algorithm as such is N * M and not NlogM.
For each element in list 1 (O(N)), we are searching it in list 2 (O(logM). The complexity of this algorithm 'seems' O(NlogM).
However, we are also inserting the element in list 2. This new element should be inserted in proper place so that the list 2 remains sorted for further binary search operations. If we are using array as the data structure, then the insertion would take O(M) time.
Hence the order of complexity is O(N*M) for the algorithm as is.
A modification can be done, wherein the new element is inserted at the end of the list 2 (the list is then no more ordered) and we carry out the binary search operation from index 0 to M-1 rather than the new size-1. In this case the complexity shall be O(N*logM) since we shall carry out N binary searches in the list of length M.
To make the list ordered again, we will have to merge the two ordered parts (0 to M-1 and M to newSize-1). This can be done in O(N+M) time (one merge operation in merge sort of array length N+M). Hence the net time complexity of this algorithm shall be
O(NlogM + N + M)
Space complexity is O(max(N,M)) not considering the original lists space and only considering the extra space required in list 2.
Algorithm 2:
At each iteration, we are moving atleast 1 pointer forward. The total distance to travel by both pointers is N + M. Hence the order of time complexity in worst case is O(N+M) which is better than 1st algorithm.
However, the space complexity required in this case is larger (O(N+M)).
Here is another approach:
Iterate through both lists, and insert all the values into a set.
This will remove all duplicates and the result will be the union of two lists.
Two important notes: You'll loose the order of the numbers. Also, it takes additional space.
Time complexity: O(n + m)
Space Complexity: O(n + m)
If you need to maintain order of the result set, use some custom version of LinkedHashMap.
Actually, algorithm 2 should not work if the input lists are not sorted.
To sort the array it is order O(m*lg(m)+ n*lg(n))
You can build a hash table on the first list, then for each item from the second list, you check if this item exists in the hash table. This works in O(m+n).
There are a few things that need to be specified:
Do the input lists contain duplicates?
Must the result be ordered?
I'll assume that, using std::list, you can cheaply insert at the head or at the tail.
Let's say List 1 has N elements and List 2 has M elements.
Algorithm 1
It iterates over every item of List 1 searching for it in List 2.
Assuming that there may be duplicates and that the result must be ordered, the worse case time for the search is that no element in List 1 exists in List 2, hence it's at least:
O(N × M).
To insert the item of List 1 in the right place, you need to iterate List 2 again until the point of insertion. The worse case will be when every item in List 1 is smaller (if List 2 is searched from the beginning) or greater (if List 2 is searched from the end). Since the previous items of List 1 have been inserted in List 2, there would be M iterations for the first item, M + 1 for the second, M + 2 for the third, etc. and M + N - 1 iterations for the last item, for an average of M + (N - 1) / 2 per item.
Something like:
N × (M + (N - 1) / 2)
For big-O notation, constant factors don't matter, so:
N × (M + (N - 1))
For big-O notation, non-variable additions don't matter, so:
O(N × (M + N))
Adding to the original O(N × M):
O(N × M) + O(N × (M + N))
O(N × M) + O(N × M + N2)
The second equation is just to make the constant factor elimination evident, e.g. 2 × (N × M), thus:
O(N × (M + N))
O(N2 + N × M)
These two are equivalent, which ever you like the most.
Possible optimizations:
If the result doesn't have to be ordered, insertion can be O(1), hence the worse time case is:
O(N × M)
Don't just test each List 1 item in List 2 by equality, test if each item by e.g. greater than, so that you can stop searching in List 2 when List 1's item is greater than List 2's item; this wouldn't reduce the worse case, but it would reduce the average case
Keep the List 2 iterator that points to where List 1's item was found to be greater than List 2's item, to make the sorted insertion O(1); on insertion make sure to keep an iterator that starts at the inserted item, because although List 1 is ordered, it might contain duplicates; with these two, the worse time case becomes:
O(N × M)
For the next iterations, search for List 1's item in the rest of List 2 with the iterator we kept; this reduces the worse case, because if you reach the end of List 2, you'll be just "removing" duplicates from List 1; with these three, the worse time case becomes:
O(N + M)
By this point, the only difference between this algorithm and Algorithm 2 is that List 2 is changed to contain the result, instead of creating a new list.
Algorithm 2
This is the merging of the merge sort.
You'll be walking every element of List 1 and every element of List 2 once, and insertion is always made at the head or tail of the list, hence the worse case time is:
O(N + M)
If there are duplicates, they're simply discarded. The result is more easily made ordered than not.
Final Notes
If there are no duplicates, insertion can be optimized in both cases. For instance, with doubly-linked lists, we can easily check if the last element in List 1 is greater than the first element in List 2 or vice-versa, and simply concatenate the lists.
This can be further generalized for any tail of List 1 and List 2. For instance, in Algorithm 1, if a List 1's item is not found in List 2, we can concatenate List 2 and the tail of List 1. In Algorithm 2, this is done in the last step.
The worse case, when List 1's items and List 2's items are interleaved, is not reduced, but again the average case is reduced, and in many cases by a big factor that makes a big difference In Real Life™.
I ignored:
Allocation times
Worse space differences in the algorithms
Binary search, because you mentioned lists, not arrays or trees
I hope I didn't make any blatant mistake.
I had implemented a typescript(js) based implementation of Union operation of 2 arrays of object in one of my previous projects. The data was too large and the default library functions like underscore or lodash were not optimistic. After some brain hunting i came up with the below binary search based algorithm. Hope it might help someone for performance tuning.
As far as complexity is concerned, the algorithm is binary search based and will end up to be O(log(N)).
Basically the code takes two unordered object arrays and a keyname to compare and:
1) sort the arrays
2) iterate through each element of first array and delete it in second array
3) concatenate resulting second array into first array.
private sortArrays = (arr1: Array<Object>, arr2: Array<Object>, propertyName: string): void => {
function comparer(a, b) {
if (a[propertyName] < b[propertyName])
return -1;
if (a[propertyName] > b[propertyName])
return 1;
return 0;
}
arr1.sort(comparer);
arr2.sort(comparer);
}
private difference = (arr1: Array<Object>, arr2: Array<Object>, propertyName: string): Array<Object> => {
this.sortArrays(arr1, arr2, propertyName);
var self = this;
for (var i = 0; i < arr1.length; i++) {
var obj = {
loc: 0
};
if (this.OptimisedBinarySearch(arr2, arr2.length, obj, arr1[i], propertyName))
arr2.splice(obj.loc, 1);
}
return arr2;
}
private OptimisedBinarySearch = (arr, size, obj, val, propertyName): boolean => {
var first, mid, last;
var count;
first = 0;
last = size - 1;
count = 0;
if (!arr.length)
return false;
while (arr[first][propertyName] <= val[propertyName] && val[propertyName] <= arr[last][propertyName]) {
mid = first + Math.floor((last - first) / 2);
if (val[propertyName] == arr[mid][propertyName]) {
obj.loc = mid;
return true;
}
else if (val[propertyName] < arr[mid][propertyName])
last = mid - 1;
else
first = mid + 1;
}
return false;
}
private UnionAll = (arr1, arr2, propertyName): Array<Object> => {
return arr1.concat(this.difference(arr1, arr2, propertyName));
}
//example
var YourFirstArray = [{x:1},{x:2},{x:3}]
var YourSecondArray= [{x:0},{x:1},{x:2},{x:3},{x:4},{x:5}]
var keyName = "x";
this.UnionAll(YourFirstArray, YourSecondArray, keyName)
With a C++ STL vector we are building a vector of N elements and for some
reason we chose to insert them at the front of the vector. Every element insertion at the front of a vector forces the shift of all existing elements by 1. This results in (1+2+3+...+N) overall shifts of vector elements, which is (N/2)(N+1) shifts.
My question is how the author came with (1+2+3+...N), I thought it should be 1+1+1..N as we are moving one element at one position to get empty at beginning?
Thanks!
From [vector.modifiers]/2 (which describes vector::insert):
Complexity: The complexity is linear in the number of elements inserted plus the distance to the end of the vector.
Each time that you add an element the distance to the end of the vector is increased by one.
The first time that you add an element, there is 1 to be inserted and the distance to the end is 0, so the complexity is 1 + 0 = 1. The second time, there is 1 to be inserted, and the distance to the end is 1, so the complexity is 1 + 1 = 2. The third time, the distance to the end is 2, so the complexity is 1 + 2 = 3. This is what creates the 1 + 2 + 3 + ... + N pattern that the author is describing.
At insertion n, there are n elements currently in the vector that needs to be shifted.
vector<int> values;
for (size_t i = 0; i < N; ++i)
{
//At this point there are `i` elements in the vector that need to be moved
//to make room for the new element
values.insert(values.begin(), 0);
}
The first Value is shifted N-1 times, each time a new value is inserted it has to move. The second value is shifted N-2 times because only N-2 values are added after it. Next value is shifted N-3 and so on. The last value is not shifted.
I don't know why the author speaks about N and not N-1. But the reason for your confusion is, that the author counts the shifts of a single value and you count the amount of shift prozesses involving more than one single value shift.
I was asked in an interview to give an O(n) algorithm to print an element that appears more than n/2 times in an array, if there is such an element. n is the size of the array.
I don't have any clue on how to do this. Can anyone help?
It's the Boyer's Voting algorithm.
It's also O(1) in space!.
Edit
For those complaining about the site color scheme (like me) ... here is the original paper.
In psuedocode:
int n = array.length
Hash<elementType,int> hash
foreach element in array
hash[element] += 1
foreach entry in hash
if entry.value > n/2
print entry.key
break
It's also the median value, which takes O(n) to find using the median-of-medians algorithm. In C++ you can do this in one line:
std::nth_element(begin, begin + n/2, begin + n)