Choose n largest elements in two vectors - C++

I have two vectors, each containing n unsorted elements. How can I get the n largest elements across the two vectors?
My solution is to merge the two vectors into one with 2n elements and then use the std::nth_element algorithm, but I found that's not very efficient, so does anyone have a more efficient solution? Really appreciated.

You may push the elements into a priority_queue and then pop n elements out.
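For example, a minimal sketch of that idea (the function name and int element type are illustrative):

#include <queue>
#include <vector>

// Push everything from both vectors into a max-heap, then pop the n largest.
// O(N log N) overall, where N is the total number of elements.
std::vector<int> nLargest(const std::vector<int>& a,
                          const std::vector<int>& b, int n)
{
    std::priority_queue<int> pq;   // max-heap by default
    for (int x : a) pq.push(x);
    for (int x : b) pq.push(x);
    std::vector<int> result;
    for (int i = 0; i < n && !pq.empty(); i++) {
        result.push_back(pq.top());
        pq.pop();
    }
    return result;
}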

Assuming that n is far smaller than N, this is quite efficient. Getting minElem is cheap, and sorted insertion into L is cheaper than sorting the two vectors when n << N.
L := SortedList()
For Each element in any of the vectors do
{
    minElem := smallest element in L
    if( size of L < n or element >= minElem )
    {
        add element to L
        if( size of L > n )
        {
            remove smallest element from L
        }
    }
}
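In C++, std::multiset can play the role of the SortedList; a minimal sketch of the pseudocode above, assuming int elements (the function name is mine):

#include <set>
#include <vector>

// Keep a multiset of the n largest elements seen so far; its smallest
// element (*L.begin()) plays the role of minElem in the pseudocode.
std::vector<int> nLargest(const std::vector<int>& a,
                          const std::vector<int>& b, size_t n)
{
    std::multiset<int> L;
    auto consider = [&](int element) {
        if (L.size() < n || element >= *L.begin()) {
            L.insert(element);
            if (L.size() > n)
                L.erase(L.begin());   // drop the smallest
        }
    };
    for (int x : a) consider(x);
    for (int x : b) consider(x);
    return std::vector<int>(L.begin(), L.end());   // ascending order
}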

vector<T> heap;
heap.reserve(n + 1);
vector<T>::iterator left = leftVec.begin(), right = rightVec.begin();
// Seed the heap with the first n available elements.
for (int i = 0; i < n; i++) {
    if (left != leftVec.end()) heap.push_back(*left++);
    else if (right != rightVec.end()) heap.push_back(*right++);
}
if (left == leftVec.end() && right == rightVec.end()) return heap;
// Min-heap: the smallest of the n candidates sits at the front.
make_heap(heap.begin(), heap.end(), greater<T>());
while (left != leftVec.end()) {
    heap.push_back(*left++);
    push_heap(heap.begin(), heap.end(), greater<T>());
    pop_heap(heap.begin(), heap.end(), greater<T>());  // evict the smallest
    heap.pop_back();
}
/* ... repeat for right ... */
return heap;
Note I use *_heap directly rather than priority_queue because priority_queue does not provide access to its underlying data structure. This is O(N log n), slightly better than the naive O(N log N) method if n << N.

You can do the "n'th element" algorithm conceptually in parallel on the two vectors quite easily (at least the simple variant that's only linear in the average case).
Pick a pivot.
Partition (std::partition) both vectors by that pivot. You'll have the first vector partitioned by some element with rank i and the second by some element with rank j. I'm assuming descending order here.
If i+j < n, recurse on the right side for the n-i-j greatest elements. If i+j > n, recurse on the left side for the n greatest elements. If you hit i+j==n, stop the recursion.
You basically just need to make sure to partition both vectors by the same pivot in every step. Given a decent pivot selection, this algorithm is linear in the average case (and works in-place).
See also: http://en.wikipedia.org/wiki/Selection_algorithm#Partition-based_general_selection_algorithm
Edit: (hopefully) clarified the algorithm a bit.
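A rough sketch of that recursion, assuming int elements; I've added a three-way split (greater / equal / smaller than the pivot) so that runs of pivot-equal values cannot cause infinite recursion. Function and variable names are mine:

#include <algorithm>
#include <vector>

// Rearranges a[aLo..aHi) and b[bLo..bHi) so that the n greatest elements of
// the combined ranges end up as a prefix of each range; returns how many of
// them land in a (the rest form a prefix of b's range). Average case O(N).
int selectTopN(std::vector<int>& a, int aLo, int aHi,
               std::vector<int>& b, int bLo, int bHi, int n)
{
    if (n <= 0 || (aLo == aHi && bLo == bHi)) return 0;

    // Pick a pivot from whichever range is nonempty.
    int pivot = (aLo != aHi) ? a[aLo] : b[bLo];

    // Partition both ranges: strictly greater first, then pivot-equal.
    auto aMid = std::partition(a.begin() + aLo, a.begin() + aHi,
                               [&](int x) { return x > pivot; });
    auto bMid = std::partition(b.begin() + bLo, b.begin() + bHi,
                               [&](int x) { return x > pivot; });
    auto aEq  = std::partition(aMid, a.begin() + aHi,
                               [&](int x) { return x == pivot; });
    auto bEq  = std::partition(bMid, b.begin() + bHi,
                               [&](int x) { return x == pivot; });

    int aGt = int(aMid - (a.begin() + aLo)), bGt = int(bMid - (b.begin() + bLo));
    int aEqN = int(aEq - aMid),              bEqN = int(bEq - bMid);
    int greater = aGt + bGt, equal = aEqN + bEqN;

    if (n < greater)            // too many candidates: recurse on the left sides
        return selectTopN(a, aLo, int(aMid - a.begin()),
                          b, bLo, int(bMid - b.begin()), n);
    if (n > greater + equal)    // not enough yet: recurse on the right sides
        return aGt + aEqN + selectTopN(a, int(aEq - a.begin()), aHi,
                                       b, int(bEq - b.begin()), bHi,
                                       n - greater - equal);
    // greater <= n <= greater + equal: the pivot band fills the rest.
    return aGt + std::min(n - greater, aEqN);
}

Calling selectTopN(a, 0, (int)a.size(), b, 0, (int)b.size(), n) leaves each vector's share of the n greatest elements as a prefix of that vector.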

Related

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. (C++)

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. Let's work with numbers.
Obviously the easiest way would be:
bool opposite(int* arr, int n) // n - array length
{
    for (int i = 0; i < n; ++i)
    {
        for (int j = 0; j < n; ++j)
        {
            if (arr[i] == -arr[j])
                return true;
        }
    }
    return false;
}
I would like to ask if any of you guys can think of an algorithm that has a complexity less than n^2.
My first idea was the following:
1) sort the array (algorithm with worst-case complexity: n.log(n))
2) create two new arrays, filled with the negative and positive numbers from the original array
(so far we've got -> n.log(n) + n + n = n.log(n))
3) ... somehow compare the two new arrays to determine whether they have opposite numbers
I'm not quite sure my ideas are correct, but I'm open to suggestions.
An important alternative solution is as follows. Sort the array. Create two pointers, one initially pointing to the front (smallest), one initially pointing to the back (largest). If the sum of the two pointed-to elements is zero, you're done. If it is larger than zero, then decrement the back pointer. If it is smaller than zero, then increment the front pointer. Continue until the two pointers meet.
This solution is often the one people are looking for; often they'll explicitly rule out hash tables and trees by saying you only have O(1) extra space.
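A quick sketch of that walk, assuming it's acceptable to sort a copy of the input (the function name is illustrative):

#include <algorithm>
#include <vector>

// Front/back pointer walk over a sorted copy; O(n log n) time, O(1) extra
// space beyond the copy.
bool hasOppositePair(std::vector<int> arr)
{
    if (arr.size() < 2) return false;
    std::sort(arr.begin(), arr.end());
    size_t front = 0, back = arr.size() - 1;
    while (front < back)
    {
        long long sum = (long long)arr[front] + arr[back];
        if (sum == 0) return true;
        if (sum > 0) back--;   // sum too large: move the back pointer down
        else front++;          // sum too small: move the front pointer up
    }
    return false;
}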
I would use a std::unordered_set and check whether the opposite of the number already exists in the set. If not, insert it into the set and check the next element.
std::vector<int> foo = {-10,12,13,14,10,-20,5,6,7,20,30,1,2,3,4,9,-30};
std::unordered_set<int> res;
for (auto e : foo)
{
    if (res.count(-e) > 0)
        std::cout << "opposite of " << e << " already exists\n";
    else
        res.insert(e);
}
Output:
opposite of 10 already exists
opposite of 20 already exists
opposite of -30 already exists
Note that you can simply add all of the elements to the unordered_set, and when adding x, check whether -x is already in the set. The complexity of this solution is O(n). (as @Hurkyl said, thanks)
UPDATE: A second idea is: sort the elements, and then for each element check (using binary search) whether the opposite element exists.
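A minimal sketch of that second idea, sorting a copy of the input; note it deliberately mirrors the naive version's behavior of letting a lone 0 match itself:

#include <algorithm>
#include <vector>

// Sort once (O(n log n)), then binary-search for -x for every x (O(n log n)).
bool hasOpposite(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    for (int x : v)
        if (std::binary_search(v.begin(), v.end(), -x))
            return true;   // a single 0 matches itself, as in the naive version
    return false;
}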
You can do this in O(n log n) with a Red Black tree.
t := empty tree
for each e in A[1..n]:
    if (-e) is in t:
        return true
    insert e into t
return false
In C++, you wouldn't implement a Red Black tree for this purpose however. You'd use std::set, because it guarantees O(log n) search and insertion.
std::set<int> s;
for (auto e : A) {
    if (s.count(-e) > 0) {
        return true;
    }
    s.insert(e);
}
return false;
As Hurkyl mentioned, you could do better by just using std::unordered_set, which is a hashtable. This gives you O(1) search and insertion in the average case, but O(n) for both operations in the worst case. The total complexity of the solution in the average case would be O(n).

Given a sorted array and a parameter k, find the count of pairs whose sum is greater than or equal to k in linear time

I am trying to find all pairs in an array with sum equal to k. My current solution takes O(n*log(n)) time (code snippet below). Can anybody help me find a better solution, maybe O(n) or O(lg n) (if it exists)?
int n, k, a, cnt = 0;   // n: element count, k: target sum, cnt: pair count
map<int,int> mymap;
map<int,int>::iterator it;
cin >> n >> k;
for (int i = 0; i < n; i++) {
    cin >> a;
    if (mymap.find(a) != mymap.end())
        mymap[a]++;
    else
        mymap[a] = 1;
}
for (it = mymap.begin(); it != mymap.end(); it++) {
    int val = it->first;
    if (mymap.find(k - val) != mymap.end()) {
        cnt += min(it->second, mymap.find(k - val)->second);
        it->second = 0;
    }
}
cout << cnt;
Another approach, which takes O(log n) in the best case and O(n log n) in the worst case for positive numbers, works as follows:
Find the element in the array that is equal to k/2, or if it doesn't exist, find the minimum element greater than k/2. All combinations of this element and all greater elements are of interest to us, because p + s >= k when p >= k/2 and s >= k/2. The array is sorted, so binary search with some modifications can be used. This step takes O(log n) time.
Elements less than k/2, paired with elements greater than or equal to their "mirror elements" (mirrored around k/2), are also of interest to us, because p + s >= k when p = k/2 - t and s >= k/2 + t. Here we need to loop through the elements less than k/2 and find their mirror elements (binary search). The loop should be stopped if the mirror element is greater than the last element of the array.
For instance, we have the array {1,3,5,8,11} and k = 10, so on the first step we have k/2 = 5 and the pairs {5,8}, {5,11}, {8,11}. The count of these pairs is given by the formula l * (l - 1)/2 where l = the count of elements >= k/2. In our case l = 3, so count = 3*2/2 = 3.
On the second step, the mirror element for the number 3 is 7 (5-2=3 and 5+2=7), so the pairs {3, 8} and {3, 11} are of interest. For the number 1 the mirror is 9 (5-4=1 and 5+4=9), so {1, 11} is what we are looking for.
So, if k/2 is less than the first array element, this algorithm is O(log n).
For negative numbers the algorithm is a little more complex but can be solved with the same complexity.
There exists a rather simple O(n) approach using the so-called "two pointers" or "two iterators" technique. The key idea is to have two iterators (not necessarily C++ iterators; indices would do too) running over the same array such that if the first iterator points to value x, then the second iterator points to the maximal element in the array that is less than k-x.
We will be increasing the first iterator, and while doing this we'll also change the second iterator to maintain this property. Note that as the first pointer increases, the corresponding position of the second pointer will only decrease, so on every iteration we can start from the position where we stopped at the previous iteration; we will never need to increase the second pointer. This is how we achieve O(n) time.
Code is like this (did not test this, but the idea should be clear)
vector<int> a;     // the given array, sorted in ascending order
int k;             // the threshold
long long ans = 0; // number of valid pairs
int r = (int)a.size() - 1;
for (int l = 0; l < (int)a.size(); l++) {
    while ((r >= 0) && (a[r] >= k - a[l]))
        r--;
    // now r is the maximal position in a such that a[r] < k - a[l], so all
    // elements right of both r and l form a needed pair with a[l]; taking
    // max(r, l) avoids pairing a[l] with itself or counting a pair twice
    ans += (int)a.size() - 1 - max(r, l);
}
Another approach which might be simpler to code, but a bit slower, is O(n log n) using binary search. For each element a[l] of the array, you can find the maximal position r so that a[r]<k-a[l] using binary search (this is the same r as in the first algorithm).
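A short sketch of that binary-search variant, assuming the array is sorted ascending (names are mine):

#include <algorithm>
#include <vector>

// For each a[l], count the elements after position l that are >= k - a[l].
long long countPairsAtLeastK(const std::vector<int>& a, int k)
{
    long long ans = 0;
    for (size_t l = 0; l + 1 < a.size(); l++) {
        // first position after l whose value is >= k - a[l]
        auto it = std::lower_bound(a.begin() + l + 1, a.end(), k - a[l]);
        ans += a.end() - it;   // everything from there on pairs with a[l]
    }
    return ans;
}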
@Drew Dormann - thanks for the remark.
Run through the array with two pointers, left and right.
Assuming left is the small side, start with left at location 0, and move right towards left until a[left] + a[right] >= k holds for the last time.
When this is achieved, total_count += (a.size - right + 1).
You then move left one step forward, and right needs to (maybe) move towards it. Repeat this until they meet.
When this is done, and let us say they met at location x, then total_count += choose(2, a.size - x).
Sort the array (n log n)
for (i = 1 to n)
    Start at the root
    if a[i] + curr_node >= k, go left and match = indexof(curr_node)
    else, go right
    If curr_node is a leaf node, add all nodes after a[match] to the list of valid pairs with a[i]
Step 2 also takes O(n log n). The for loop runs n times, and within the loop we perform a binary search for each node, i.e. log n steps. Hence the overall complexity of the algorithm is O(n log n).
This should do the work:
void count(int A[], int n) // n being the number of terms in array
{
    int i, j, k, count = 0;
    cin >> k;
    // note: this counts ordered pairs, including i == j
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            if (A[i] + A[j] >= k)
                count++;
    cout << "There are " << count << " such numbers";
}

Iterative Merge Sort works at the same speed as Bubblesort

I have tried to implement an iterative merge sort using nested loops. Although this algorithm does sort correctly (as in, after sorting, things are in the correct order), I know there is something wrong with this implementation: when I sort larger collections with it and compare timings against slower sorts, I end up getting slow times for this iterative implementation. For example, sorting 500 items takes 31 milliseconds with this implementation, just like bubble sort does.
int main()
{
    int size;
    cin >> size;
    //assume vector is already initialized with values & size
    vector<int> items(size);
    IterativeMergeSort(items, 0, size - 1);
}

void IterativeMergeSort(vector<int> &items, int start, int end)
{
    vector<int> temp(items.size());
    int left, middle, right;
    for (int outer = 1; outer < 2; outer *= 2)
    {
        for (int inner = start; inner < end; inner = inner * outer + 1)
        {
            left = outer - 1;
            middle = inner;
            right = inner + 1;
            ItMerge(items, left, middle, right, temp);
        }
    }
}

void ItMerge(vector<int> &items, int start, int mid, int end, vector<int> &temp)
{
    int first1 = start;
    int last1 = mid;
    int first2 = mid + 1;
    int last2 = end;
    int index = first1;
    while (first1 <= last1 && first2 <= last2)
    {
        if (items[first1] <= items[first2])
        {
            temp[index] = items[first1];
            first1++;
        }
        else
        {
            temp[index] = items[first2];
            first2++;
        }
        index++;
    }
    while (first1 <= last1)
    {
        temp[index] = items[first1];
        first1++;
        index++;
    }
    while (first2 <= last2)
    {
        temp[index] = items[first2];
        first2++;
        index++;
    }
    for (index = start; index <= end; index++)
    {
        items[index] = temp[index];
    }
}
Your algorithm isn't merge sort. It tries to be, but it isn't.
As I understand it, what is supposed to happen is that the inner loop steps over subsequences and merges them, while the outer loop controls the inner loop's sequence length, starting with 1 and doubling on every iteration until there are just two subsequences and they get merged.
But that's not what your algorithm is doing. The outer loop's condition is broken, so the outer loop will run exactly once. And the inner loop doesn't take roughly-equal subsequences in pairs. Instead, the right subsequence is exactly one element (mid is inner, right is inner+1) and the left subsequence is always everything used so far (left is outer-1, and outer is constant 1). So the algorithm will repeatedly merge the already-sorted left subsequence with a single-element right subsequence.
This means that in effect, your algorithm is insertion sort, except that you don't insert in place, but instead copy the sorted sequence to a buffer, inserting the new element at the right moment, then copy the result back. So it's a very inefficient insertion sort.
Below is a link to somewhat optimized examples of top-down and bottom-up merge sort. The bottom-up merge sort is a bit faster because it skips the recursive sequence used to repeatedly generate sub-pairs of indexes until a sub-pair represents a run of size 1. Most of the time is spent merging, so bottom-up isn't that much faster. The first pass of the bottom-up merge sort could be optimized by swapping pairs in place rather than copying them. The bottom-up merge sort ends up with the sorted data in either the temp or the original array. If the original array is wanted, then a pass count can be calculated, and if the count is odd, then the first pass swaps in place.
Both versions can sort 4 million 64-bit unsigned integers in less than a second on my system (Intel Core i7 2600K, 3.4 GHz).
merge_sort using vectors works well with less than 9 inputs
For a vector or array of integers, a counting / radix sort would be faster still.
I've finally figured it out.
In pseudocode:
for (outer = 1; outer < length; outer *= 2)
    for (inner = 0; inner < length; inner = inner + (outer * 2))
        left = inner
        middle = (inner + outer) - 1
        right = (inner + (outer * 2)) - 1
        merge(items, left, middle, right, temp)
After rethinking how the iterative merge sort is supposed to work and looking at a couple of implementations, all I needed in the merge method was to check whether the middle and right indexes passed in were greater than or equal to the vector size (that way we handle any values that could be out of bounds), then merge as usual. Studying a couple of existing implementations helped greatly in understanding it. Just to be sure that it works as well as a recursive merge sort, I timed both, and the recursive and iterative implementations produced identical times for 500, 1000, 5000, and 10K values to sort (in some cases the iterative solution produced a faster time).
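For reference, a C++ sketch of that corrected pseudocode; it reuses the question's ItMerge routine and does the bounds guard in the caller (by clamping right and skipping runs that have no right half) rather than inside the merge:

#include <algorithm>
#include <vector>

void ItMerge(std::vector<int>& items, int start, int mid, int end,
             std::vector<int>& temp);   // the merge routine from the question

void IterativeMergeSort(std::vector<int>& items)
{
    int length = (int)items.size();
    std::vector<int> temp(items.size());
    for (int outer = 1; outer < length; outer *= 2)   // run size 1, 2, 4, ...
    {
        // stop while a right-hand run still exists
        for (int inner = 0; inner < length - outer; inner += outer * 2)
        {
            int left   = inner;
            int middle = inner + outer - 1;
            int right  = std::min(inner + outer * 2 - 1, length - 1);
            ItMerge(items, left, middle, right, temp);
        }
    }
}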

Improvement of quick sort

The quicksort algorithm behaves badly when there are many copies of the same item (i.e. repetitive data). How can it be improved so that this issue is resolved?
int partition(int low, int high)
{
    // arr, n, and subarray are assumed to be globals defined elsewhere
    int j = low, i = low + 1;
    int PivotItem = arr[low];
    for (int i = 0, j = 0; i < n; i++)
    {
        if (arr[i] == PivotItem)
            subarray[j] = arr[i];
    }
    for (i = low + 1; i <= high; i++)
    {
        if (arr[i] < PivotItem)
        {
            j++;
            swap(arr[i], arr[j]);
        }
    }
    swap(arr[low], arr[j]);
    int PivotPoint = j;
    return PivotPoint;
}

void quick_sort(int low, int high)
{
    if (low >= high)   // >= also guards the empty sub-ranges the recursion can produce
        return;
    int PivotPoint = partition(low, high);
    quick_sort(low, PivotPoint - 1);
    quick_sort(PivotPoint + 1, high);
}
There is a special modification of QuickSort known as the Dutch national flag algorithm. It uses a three-way partition for items smaller than, equal to, and bigger than the pivot item's value.
I assume you mean the fact that quicksort compares elements with a <= comparator (or <, and then the result is symmetrical to the following explanation). If we look at the case where all elements equal the pivot x, you get quicksort's worst-case complexity, since you split the array into two very uneven parts, one of size n-1 and the other empty.
A quick fix to address this issue would be to use quicksort only with < and > to split the data into the two subarrays, and instead of a singular pivot, hold an array of all the elements equal to the pivot. Then recurse on the elements that are strictly larger than the pivot and the elements that are strictly smaller than the pivot, and combine the three arrays.
Illustration:
legend: X=pivot, S = smaller than pivot, L = larger than pivot
array = |SLLLSLXLSLXSSLLXLLLSSXSSLLLSSXSSLLLXSSL|
Choose pivot - X
Create L array of only strictly smaller elements: |SSSSSSSSSSSSSSS|
Create R array of only strictly larger elements: |LLLLLLLLLLLLLLLLLL|
Create "pivot array" |XXXXXX|
Now, recurse on L, recurse on R, and combine:
|SSSSSSSSSSSSSSS XXXXXX LLLLLLLLLLLLLLLLLL|
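A compact sketch of the Dutch-national-flag variant mentioned above, assuming int elements; the equal band stays in place and only the strictly-smaller and strictly-larger bands are recursed on:

#include <utility>
#include <vector>

void quick_sort_3way(std::vector<int>& a, int low, int high)
{
    if (low >= high) return;
    int pivot = a[low];
    int lt = low, i = low + 1, gt = high;
    // invariant: a[low..lt-1] < pivot, a[lt..i-1] == pivot, a[gt+1..high] > pivot
    while (i <= gt)
    {
        if (a[i] < pivot)
            std::swap(a[lt++], a[i++]);
        else if (a[i] > pivot)
            std::swap(a[i], a[gt--]);
        else
            i++;
    }
    quick_sort_3way(a, low, lt - 1);   // strictly smaller than the pivot
    quick_sort_3way(a, gt + 1, high);  // strictly larger than the pivot
}

With this partition, an all-equal input makes a single pass and then recurses on two empty ranges, avoiding quicksort's worst case on repetitive data.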

Array balancing point

What is the best way to solve this?
A balancing point of an N-element array A is an index i such that all elements at lower indexes have values <= A[i] and all elements at higher indexes have values >= A[i].
For example, given:
A[0]=4 A[1]=2 A[2]=7 A[3]=11 A[4]=9
one of the correct solutions is 2. All elements before A[2] are less than A[2], and all elements after A[2] are greater than A[2].
One solution that came to my mind is an O(n^2) solution. Is there any better solution?
Start by assuming A[0] is a pole. Then start walking the array; comparing each element A[i] in turn against A[0], and also tracking the current maximum.
As soon as you find an i such that A[i] < A[0], you know that A[0] can no longer be a pole, and by extension, neither can any of the elements up to and including A[i]. So now continue walking until you find the next value that's bigger than the current maximum. This then becomes the new proposed pole.
Thus, an O(n) solution!
In code:
int i_pole = 0;
int i_max = 0;
bool have_pole = true;
for (int i = 1; i < N; i++)
{
    if (A[i] < A[i_pole])
    {
        have_pole = false;
    }
    if (A[i] > A[i_max])
    {
        i_max = i;
        if (!have_pole)
        {
            i_pole = i;
        }
        have_pole = true;
    }
}
If you want to know where all the poles are, an O(n log n) solution would be to create a sorted copy of the array, and look to see where you get matching values.
EDIT: Sorry, but this doesn't actually work. One counterexample is [2, 5, 3, 1, 4].
Make two auxiliary arrays, each with as many elements as the input array, called MIN and MAX.
Each element M of MAX contains the maximum of all the elements in the input from 0..M. Each element M of MIN contains the minimum of all the elements in the input from M..N-1.
For each element M of the input array, compare its value to the corresponding values in MIN and MAX. If INPUT[M] == MIN[M] and INPUT[M] == MAX[M] then M is a balancing point.
Building MIN takes N steps, and so does MAX. Testing the array then takes N more steps. This solution has O(N) complexity and finds all balancing points. In the case of sorted input every element is a balancing point.
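A short sketch of that scheme (names are illustrative):

#include <algorithm>
#include <vector>

// MAX[m] = max of input[0..m]; MIN[m] = min of input[m..N-1].
// An element is a balancing point iff it equals both.
std::vector<int> balancingPoints(const std::vector<int>& input)
{
    int n = (int)input.size();
    std::vector<int> MAX(n), MIN(n), result;
    for (int m = 0; m < n; m++)
        MAX[m] = (m == 0) ? input[m] : std::max(MAX[m - 1], input[m]);
    for (int m = n - 1; m >= 0; m--)
        MIN[m] = (m == n - 1) ? input[m] : std::min(MIN[m + 1], input[m]);
    for (int m = 0; m < n; m++)
        if (input[m] == MIN[m] && input[m] == MAX[m])
            result.push_back(m);
    return result;
}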
Make a doubly-linked list such that the i-th node of this list contains A[i] and i. Traverse this list while the elements grow (tracking the maximum of these elements). If some A[bad] < maxSoFar, it can't be an MP. Remove it and go backward, removing elements until you find A[good] < A[bad] or reach the head of the list. Continue (starting with maxSoFar as the maximum) until you reach the end of the list. Every element in the resulting list is an MP, and every MP is in this list. Complexity is O(n), since the maximum number of steps is performed for a descending array - n steps forward and n removals.
Update
Oh my, I confused "any" with "every" in problem definition :).
You can combine bmcnett's and Oli's answers to find all the poles as quickly as possible.
std::vector<int> i_poles;
i_poles.push_back(0);
int i_max = 0;   // index of the running maximum
for (int i = 1; i < N; i++)
{
    // any remaining candidate larger than A[i] can no longer be a pole
    while (!i_poles.empty() && A[i] < A[i_poles.back()])
    {
        i_poles.pop_back();
    }
    if (A[i] >= A[i_max])
    {
        i_poles.push_back(i);
        i_max = i;   // keep the running maximum up to date
    }
}
You could use an array preallocated to size N if you wanted to avoid reallocations.