Merge sorted arrays - Efficient solution

The goal here is to merge multiple arrays, each of which is already sorted, into a single result array.
I've written the following solution and am wondering whether there is a way to improve it:
/*
Goal is to merge all sorted arrays
*/
void mergeAll(const vector< vector<int> >& listOfIntegers, vector<int>& result)
{
int totalNumbers = listOfIntegers.size();
vector<int> curpos;
int currow = 0 , minElement , foundMinAt = 0;
curpos.reserve(totalNumbers);
// Set the current traversal position to 0 for every array
for ( int i = 0; i < totalNumbers; ++i)
{
curpos.push_back(0);
}
for ( ; ; )
{
/* Find the first minimum
Which is basically the first element in the array that hasn't been fully traversed
*/
for ( currow = 0 ; currow < totalNumbers ; ++currow)
{
if ( curpos[currow] < listOfIntegers[currow].size() )
{
minElement = listOfIntegers[currow][curpos[currow] ];
foundMinAt = currow;
break;
}
}
/* If all the elements were traversed in all the arrays, then no further work needs to be done */
if ( !(currow < totalNumbers ) )
break;
/*
Traverse each of the array and find out the first available minimum value
*/
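for ( ;currow < totalNumbers; ++currow)
{
// Skip arrays that are already exhausted; reading them would go past their end
if ( curpos[currow] < listOfIntegers[currow].size() &&
listOfIntegers[currow][curpos[currow] ] < minElement )
{
minElement = listOfIntegers[currow][curpos[currow] ];
foundMinAt = currow;
}
}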
/*
Store the minimum into the resultant array
and increment the element traversed
*/
result.push_back(minElement);
++curpos[foundMinAt];
}
}
The corresponding main goes like this.
int main()
{
vector< vector<int> > myInt;
vector<int> result;
myInt.push_back(vector<int>() );
myInt.push_back(vector<int>() );
myInt.push_back(vector<int>() );
myInt[0].push_back(10);
myInt[0].push_back(12);
myInt[0].push_back(15);
myInt[1].push_back(20);
myInt[1].push_back(21);
myInt[1].push_back(22);
myInt[2].push_back(14);
myInt[2].push_back(17);
myInt[2].push_back(30);
mergeAll(myInt,result);
for ( int i = 0; i < result.size() ; ++i)
{
cout << result[i] << endl;
}
}

You can generalize the merge step of merge sort to work with multiple pointers. Initially, each pointer points to the beginning of its array. You keep these pointers ordered by the values they point to in a priority queue. In each step, you remove the smallest element from the heap in O(log n), where n is the number of arrays, and output the element the extracted pointer points to. You then advance this pointer by one position and, if it has not reached the end of its array, reinsert it into the priority queue in O(log n). Proceed this way until the heap is empty. If there are m elements in total, the complexity is O(m log n), and the elements are output in sorted order.
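The steps above can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the function name merge_with_heap is hypothetical, and it stores (value, list index) pairs in the heap instead of raw pointers.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: k-way merge of already-sorted lists using a min-heap.
std::vector<int> merge_with_heap(const std::vector<std::vector<int>>& lists) {
    // Each heap entry is (value, index of the list the value came from).
    using Entry = std::pair<int, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    std::vector<std::size_t> pos(lists.size(), 0);  // next unread position per list
    std::size_t total = 0;
    for (std::size_t i = 0; i < lists.size(); ++i) {
        total += lists[i].size();
        if (!lists[i].empty()) heap.push(Entry(lists[i][0], i));
    }

    std::vector<int> result;
    result.reserve(total);
    while (!heap.empty()) {
        Entry smallest = heap.top();  // smallest head among all lists
        heap.pop();
        result.push_back(smallest.first);
        std::size_t list = smallest.second;
        std::size_t next = ++pos[list];  // advance the cursor of that list
        if (next < lists[list].size()) heap.push(Entry(lists[list][next], list));
    }
    return result;
}
```

Called on the three vectors from the question's main, this should produce 10 12 14 15 17 20 21 22 30. The heap never holds more than one entry per input list, which is where the O(log n) per step comes from.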

Perhaps I'm misunderstanding the question...and I feel like I'm misunderstanding your solution.
That said, maybe this answer is totally off-base and not helpful.
But, especially with the number of vectors and push_back's you're already using, why do you not just use std::sort?
#include <algorithm>
void mergeAll(const vector<vector<int>> &origList, vector<int> &resultList)
{
for(int i = 0; i < origList.size(); ++i)
{
resultList.insert(resultList.end(), origList[i].begin(), origList[i].end());
}
std::sort(resultList.begin(), resultList.end());
}
I apologize if this is totally off from what you're looking for. But it's how I understood the problem and the solution.
std::sort runs in O(N log (N)) http://www.cppreference.com/wiki/stl/algorithm/sort

I've seen some solutions on the internet for merging two sorted arrays, but most of them were quite cumbersome. I changed some of the logic to provide the shortest version I could come up with:
void merge(const int list1[], int size1, const int list2[], int size2, int list3[]) {
// Declaration & Initialization
int index1 = 0, index2 = 0, index3 = 0;
// Loop until both arrays have reached their upper bound.
while (index1 < size1 || index2 < size2) {
// Take from the first array when the second is exhausted, or when
// the first still has elements and its current one is not larger.
// The bounds are checked before the elements are read, so neither
// array is accessed out of range.
if (index2 >= size2 || (index1 < size1 && list1[index1] <= list2[index2])) {
list3[index3] = list1[index1];
index1++;
}
else {
list3[index3] = list2[index2];
index2++;
}
index3++;
}
}

If you want to take advantage of multi-threading, a fairly good solution is to merge just two lists at a time.
For example, suppose you have 9 lists.
merge list 0 with 1.
merge list 2 with 3.
merge list 4 with 5.
merge list 6 with 7.
These can be performed concurrently.
Then:
merge list 0&1 with 2&3
merge list 4&5 with 6&7
Again these can be performed concurrently.
Then merge list 0,1,2&3 with list 4,5,6&7.
Finally, merge list 0,1,2,3,4,5,6&7 with list 8.
Job done.
I'm not sure about the complexity of that, but it seems the obvious solution and does have the bonus of being multi-threadable to some extent.
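A sequential sketch of this round-by-round scheme, using std::merge for each pair, might look like the code below. The name merge_pairwise is hypothetical. The merges inside one round do not depend on each other, so they are the part that could be handed to separate threads; that is left out here. Unlike the walk-through above, an unpaired list is simply carried into the next round instead of being merged last.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: merges sorted lists two at a time, round by round.
std::vector<int> merge_pairwise(std::vector<std::vector<int>> lists) {
    if (lists.empty()) return std::vector<int>();
    while (lists.size() > 1) {
        std::vector<std::vector<int>> next;
        next.reserve((lists.size() + 1) / 2);
        // Every iteration of this loop is independent of the others.
        for (std::size_t i = 0; i + 1 < lists.size(); i += 2) {
            std::vector<int> merged;
            merged.reserve(lists[i].size() + lists[i + 1].size());
            std::merge(lists[i].begin(), lists[i].end(),
                       lists[i + 1].begin(), lists[i + 1].end(),
                       std::back_inserter(merged));
            next.push_back(std::move(merged));
        }
        // An odd list out is carried over unchanged to the next round.
        if (lists.size() % 2 == 1) next.push_back(std::move(lists.back()));
        lists = std::move(next);
    }
    return std::move(lists.front());
}
```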

Consider the priority-queue implementation in this answer linked in a comment above: Merging 8 sorted lists in c++, which algorithm should I use
It's O(n lg m) time (where n = total number of items and m = number of lists).

All you need is two pointers (or just int index counters), checking for minimum between array A and B, copying the value over to the resultant list, and incrementing the pointer of the array the minimum came from. If you run out of elements on one source array, copy the remainder of the second to the resultant and you're done.
Edit:
You can trivially expand this to N arrays.
Edit:
Don't trivially expand this to N arrays :-). Do two at a time. Silly me.

If you are merging very many vectors together, you could speed things up by using a sort of tree to determine which vector contains the smallest element. This is probably not necessary for your application, but comment if it is and I'll try to work it out.

You could just stick them all into a multiset. That will handle the sorting for you.
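A minimal sketch of the multiset idea, with the hypothetical name merge_via_multiset, is shown below. Note that it does not exploit the fact that the inputs are already sorted, and every element costs a node allocation.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: inserts everything into a std::multiset and reads it
// back in order. Duplicates are kept, since multiset allows equal keys.
std::vector<int> merge_via_multiset(const std::vector<std::vector<int>>& lists) {
    std::multiset<int> sorted;
    for (const std::vector<int>& list : lists) {
        sorted.insert(list.begin(), list.end());
    }
    return std::vector<int>(sorted.begin(), sorted.end());
}
```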

Related

Sort array of n elements which has k sorted sections

What is the best way to sort a section-wise sorted array as depicted in the second image?
The problem is performing a quick-sort using Message Passing Interface. The solution is performing quick-sort on array sections obtained by using MPI_Scatter() then joining the sorted pieces using MPI_Gather().
The problem is that the array as a whole is unsorted but sections of it are.
Merging the sub-sections similarly to this solution seems like the best way of sorting the array, but considering that the sub-arrays are already within a single array, other sorting algorithms may prove better.
The inputs for a sort function would be the array, its length and the number of equally sized sorted sub-sections.
A signature would look something like int* sort(int* array, int length, int sections);
The sections parameter can have any value between 1 and 25. The length parameter value is greater than 0, a multiple of sections and smaller than 2^32.
This is what I am currently using:
int* merge(int* input, int length, int sections)
{
int* sub_sections_indices = new int[sections];
int* result = new int[length];
int section_size = length / sections;
for (int i = 0; i < sections; i++) //initialisation
{
sub_sections_indices[i] = 0;
}
int min, min_index, current_index;
for (int i = 0; i < length; i++) //merging
{
min_index = 0;
min = INT_MAX;
for (int j = 0; j < sections; j++)
{
if (sub_sections_indices[j] < section_size)
{
current_index = j * section_size + sub_sections_indices[j];
if (input[current_index] < min)
{
min = input[current_index];
min_index = j;
}
}
}
sub_sections_indices[min_index]++;
result[i] = min;
}
delete[] sub_sections_indices; // release the per-section cursors
return result;
}

Optimizing for performance
I think this answer, which maintains a min-heap of the smallest item of each sub-array, is the best way to handle arbitrary input. However, for small values of k, think somewhere between 10 and 100, it might be faster to implement the more naive solutions given in the question you linked to; while maintaining the min-heap is only O(log k) for each step, it might have a higher overhead for small values of k than the simple linear scan from the naive solutions.
All these solutions create a copy of the input, and they maintain O(k) state.
Optimizing for space
The only way to save space I see is to sort in-place. This will be a problem for the algorithms mentioned above. An in-place algorithm will have to swap elements, but any swap will likely destroy the property that each sub-array is sorted, unless the larger of the swapped pair is re-sorted into the sub-array it is being swapped to, which will result in an O(n²) algorithm. So if you really do need to conserve memory, I think a regular in-place sorting algorithm would have to be used, which defeats your purpose.
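One standard-library option that sits between the two approaches, and that is not part of the answer above, is std::inplace_merge applied to neighbouring sections. The sketch below assumes the layout from the question (equally sized sorted sections stored back to back); the name merge_sections_in_place is hypothetical. Be aware that std::inplace_merge is allowed to allocate a temporary buffer internally and only falls back to a slower, buffer-free algorithm when that allocation fails, so this is a convenience, not a guaranteed constant-space merge.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts `data`, assuming it consists of `sections`
// equally sized runs that are each already sorted.
void merge_sections_in_place(std::vector<int>& data, std::size_t sections) {
    const std::size_t n = data.size();
    if (sections == 0 || n == 0) return;
    const std::size_t section_size = n / sections;
    if (section_size == 0) return;  // layout assumption violated

    // Bottom-up: merge runs of width w into runs of width 2w.
    for (std::size_t width = section_size; width < n; width *= 2) {
        for (std::size_t lo = 0; lo + width < n; lo += 2 * width) {
            const std::size_t mid = lo + width;
            const std::size_t hi = std::min(lo + 2 * width, n);
            std::inplace_merge(data.begin() + lo, data.begin() + mid,
                               data.begin() + hi);
        }
    }
}
```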

Efficiently get largest 3 integers in C++ Linked List (unsorted)

I heard there were some std functions that do give the largest n integers of an array, but how about a linked list?
I would think a solution would be to have a few for loops to iterate over the linked list, but it seems as if there could be a simpler solution in the C++ libraries.
Thanks.

I would do it like this if you can't use another data structure:
typedef std::list<int> IntList;
IntList list; // fill with your values
int top[3];
for (size_t i = 0; i < 3; i++)
top[i] = std::numeric_limits<int>::min();
IntList::iterator it, end;
for (it = list.begin(), end = list.end(); it != end; ++it) {
const int& value = *it;
if (value > top[2]) {
top[0] = top[1];
top[1] = top[2];
top[2] = value;
} else if (value > top[1]) {
top[0] = top[1];
top[1] = value;
} else if (value > top[0]) {
top[0] = value;
}
}

Perhaps look into using a priority_queue.

The basic idea is to maintain a sorted list, priority queue, or heap of exactly N numbers. You push the first N values of your list into that, then you iterate through the remainder. If you encounter an item that is larger than the smallest value in your queue (or whatever), you remove that smallest element and push the new one in.
If you're only looking for N=3, then using a simple array is probably better than a priority queue or anything else. You can determine which element in that array is the minimum with just two comparisons. You always remember the index of the minimum element, and only update that when you replace it.
Interestingly, this approach would have the worst performance for a list that is sorted in ascending order. However, it is still essentially linear time complexity.
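The approach described above could be sketched like this for a std::list. The function name largest_n is hypothetical, and a std::priority_queue with std::greater is used as the min-heap of the N largest values seen so far.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <queue>
#include <vector>

// Hypothetical helper: returns the n largest values of `values`, largest first.
std::vector<int> largest_n(const std::list<int>& values, std::size_t n) {
    // Min-heap: top() is the smallest of the values currently kept.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int v : values) {
        if (heap.size() < n) {
            heap.push(v);                    // still filling the first n slots
        } else if (n > 0 && v > heap.top()) {
            heap.pop();                      // evict the current smallest
            heap.push(v);
        }
    }
    // Popping yields ascending order, so fill the result from the back.
    std::vector<int> result(heap.size());
    for (std::size_t i = result.size(); i-- > 0; ) {
        result[i] = heap.top();
        heap.pop();
    }
    return result;
}
```

The list is traversed once, and each element costs at most O(log n) heap work, so for a fixed small n such as 3 this stays linear in the length of the list.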

Heap corruption while freeing memory in a recursion function

I'm implementing an algorithm to select the Kth smallest element of an array. When I try to free heap memory, I get this error: CRT detected that the application wrote to memory after end of heap buffer ...
int SEQUENTIAL_SELECT(int *S , int k , int n)
{
if(n<=Q) // sort S and return the kth element directly
{
qsort(S,n,sizeof(int),compare);
return S[k];
}
// subdivide S into n/Q subsequences of Q elements each
int countSets = ceil((float)n/(float)Q);
//sort each subsequence and determine its median
int *medians = new int[countSets];
for(int i=0;i<countSets;i++)
{
if(i==countSets-1)
{
int size = Q - (n%Q);
qsort(&S[Q*i],size,sizeof(int),compare);
medians[i] = S[i*Q+size/2];
continue;
}
qsort(&S[Q*i],Q,sizeof(int),compare);
medians[i] = S[i*Q+Q/2];
}
// call SEQUENTIAL_SELECT recursively to find median of medians
int m = SEQUENTIAL_SELECT(medians,countSets/2,countSets);
delete[] medians;
int size = (3*n)/4;
int* s1 = new int[size]; // contains values less than m
int* s3 = new int[size]; // contains values greater than m
for(int i=0;i<size;i++)
{
s1[i] = INT_MAX;
s3[i] = INT_MAX;
}
int i1=0;
int i2=0;
int i3=0;
for(int i=0;i<n;i++)
{
if(S[i]>m)
s3[i3++] = S[i];
else if(S[i]<m)
s1[i1++] = S[i];
else
i2++; // count number of values equal to m
}
if( i1>=k )
m = SEQUENTIAL_SELECT(s1,k,i1);
else if( i1+i2+i3 >= k)
m = SEQUENTIAL_SELECT(s3,k-i1-i2,i3);
delete[] s3;
delete[] s1;
return m;
}

@Dcoder is certainly correct that Q - n%Q is incorrect. It should be n%Q. In addition, the computation size = (3*n)/4 is not reliable; try it with n = 6 (assuming, as seems certain, that Q is actually 5) given the vector [1, 2, 3, 4, 5, 0].
You could have avoided having a lot of eyes looking at your code by simply checking the values of the indexes at every array subscript assignment (although that wouldn't have caught the assignments inside of qsort, but more on that below).
It must surely have occurred to you that you are using an awful lot of memory to perform a simple operation, which could in fact be done in-place. Normally the reason to avoid doing an in-place operation would be that you need to preserve the original vector, but you're computing medians with qsort which sorts in-place, so the original vector is already modified. If that's acceptable, then there is no reason not to do the rest of the median-of-medians algorithm in-place. [1]
By the way, although I'm certainly not one of those who fears floating-point computations, there is no reason at all for countSets = ceil(float(n)/float(Q)). (n + Q - 1)/Q will work just fine. That idiom could usefully have been used in the computation of size as well, although I'm not at all sure where you got the 3n/4 computation from in the first place.
[Note 1] Hint: instead of grouping consecutively, divide the vector into five regions and find the median of the ith element of each region. Once you've found it, swap it with the ith element of the first region; once that is done, your first region -- the first fifth of the vector -- contains the medians and you can recurse on that subvector. That means actually writing out the median code as a series of comparisons, which is tedious but a lot faster than calling qsort. That also avoids the degenerate case I mentioned above, where the median-of-medians computation incorrectly returns the smallest element in the vector.
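The integer ceiling-division idiom mentioned above can be sketched as follows. Both helper names are hypothetical. last_group_size additionally assumes that the last group should hold all q elements when n is an exact multiple of q, a case in which n%q on its own would give 0.

```cpp
// Hypothetical helper: ceil(n / q) for positive ints, without floating point.
constexpr int ceil_div(int n, int q) {
    return (n + q - 1) / q;
}

// Hypothetical helper: number of elements in the last of ceil_div(n, q) groups.
// Assumption: a full last group has q elements, not 0.
constexpr int last_group_size(int n, int q) {
    return n % q == 0 ? q : n % q;
}
```

For n = 6 and q = 5 this gives two groups, the last one holding a single element.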

Array balancing point

What is the best way to solve this?
A balancing point of an N-element array A is an index i such that all elements at lower indexes have values <= A[i] and all elements at higher indexes have values >= A[i].
For example, given:
A[0]=4 A[1]=2 A[2]=7 A[3]=11 A[4]=9
one of the correct solutions is: 2. All elements before A[2] are less than A[2], and all elements after A[2] are greater than A[2].
The one solution that came to mind is O(n²). Is there a better solution?

Start by assuming A[0] is a pole. Then start walking the array; comparing each element A[i] in turn against A[0], and also tracking the current maximum.
As soon as you find an i such that A[i] < A[0], you know that A[0] can no longer be a pole, and by extension, neither can any of the elements up to and including A[i]. So now continue walking until you find the next value that's bigger than the current maximum. This then becomes the new proposed pole.
Thus, an O(n) solution!
In code:
int i_pole = 0;
int i_max = 0;
bool have_pole = true;
for (int i = 1; i < N; i++)
{
if (A[i] < A[i_pole])
{
have_pole = false;
}
if (A[i] > A[i_max])
{
i_max = i;
if (!have_pole)
{
i_pole = i;
}
have_pole = true;
}
}

If you want to know where all the poles are, an O(n log n) solution would be to create a sorted copy of the array, and look to see where you get matching values.
EDIT: Sorry, but this doesn't actually work. One counterexample is [2, 5, 3, 1, 4].

Make two auxiliary arrays, each with as many elements as the input array, called MIN and MAX.
Each element M of MAX contains the maximum of all the elements in the input from 0..M. Each element M of MIN contains the minimum of all the elements in the input from M..N-1.
For each element M of the input array, compare its value to the corresponding values in MIN and MAX. If INPUT[M] == MIN[M] and INPUT[M] == MAX[M] then M is a balancing point.
Building MIN takes N steps, and so does MAX. Testing the array then takes N more steps. This solution has O(N) complexity and finds all balancing points. In the case of sorted input every element is a balancing point.
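A minimal sketch of the MIN/MAX approach described above follows. The function name balancing_points and the array names max_prefix and min_suffix are hypothetical stand-ins for MAX and MIN.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns every index that is a balancing point of `a`.
std::vector<std::size_t> balancing_points(const std::vector<int>& a) {
    const std::size_t n = a.size();
    std::vector<std::size_t> points;
    if (n == 0) return points;

    // max_prefix[i] = max of a[0..i], min_suffix[i] = min of a[i..n-1]
    std::vector<int> max_prefix(n), min_suffix(n);
    max_prefix[0] = a[0];
    for (std::size_t i = 1; i < n; ++i) {
        max_prefix[i] = std::max(max_prefix[i - 1], a[i]);
    }
    min_suffix[n - 1] = a[n - 1];
    for (std::size_t i = n - 1; i-- > 0; ) {
        min_suffix[i] = std::min(min_suffix[i + 1], a[i]);
    }

    // a[i] is a balancing point when nothing before it is larger
    // and nothing after it is smaller.
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] == max_prefix[i] && a[i] == min_suffix[i]) points.push_back(i);
    }
    return points;
}
```

For the example array 4, 2, 7, 11, 9 from the question this yields the single index 2.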

Create a doubly-linked list such that the i-th node of this list contains A[i] and i. Traverse this list while the elements grow (keeping track of the maximum of these elements). If some A[bad] < maxSoFar, it can't be MP. Remove it and go backward, removing elements until you find A[good] < A[bad] or reach the head of the list. Continue (starting with maxSoFar as the maximum) until you reach the end of the list. Every element in the resulting list is MP and every MP is in this list. Complexity is O(n), since the maximum number of steps is performed for a descending array: n steps forward and n removals.
Update
Oh my, I confused "any" with "every" in problem definition :).

You can combine bmcnett's and Oli's answers to find all the poles as quickly as possible.
std::vector<int> i_poles;
i_poles.push_back(0);
int i_max = 0;
for (int i = 1; i < N; i++)
{
while (!i_poles.empty() && A[i] < A[i_poles.back()])
{
i_poles.pop_back();
}
if (A[i] >= A[i_max])
{
i_max = i; // track the running maximum, otherwise every test is against A[0]
i_poles.push_back(i);
}
}
You could use an array preallocated to size N if you wanted to avoid reallocations.

Finding smallest value in an array most efficiently

There are N values in the array, and one of them is the smallest value. How can I find the smallest value most efficiently?

If they are unsorted, you can't do much but look at each one, which is O(N), and when you're done you'll know the minimum.
Pseudo-code:
small = <biggest value> // such as std::numeric_limits<int>::max()
for each element in array:
if (element < small)
small = element
A better way, as Ben reminded me, is to just initialize small with the first element:
small = element[0]
for each element in array, starting from 1 (not 0):
if (element < small)
small = element
The above is wrapped in the algorithm header as std::min_element.
If you can keep your array sorted as items are added, then finding it will be O(1), since you can keep the smallest at front.
That's as good as it gets with arrays.
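As a rough sketch of the std::min_element route mentioned above: the wrapper name smallest is hypothetical, and it exists only to make the empty-range case explicit, because std::min_element returns the end iterator for an empty range and that iterator must not be dereferenced.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: writes the smallest value to `out` and returns true,
// or returns false and leaves `out` untouched if `values` is empty.
bool smallest(const std::vector<int>& values, int& out) {
    std::vector<int>::const_iterator it =
        std::min_element(values.begin(), values.end());
    if (it == values.end()) return false;  // empty input: there is no minimum
    out = *it;
    return true;
}
```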

You need to loop through the array, remembering the smallest value you've seen so far. Like this:
int smallest = INT_MAX;
for (int i = 0; i < array_length; i++) {
if (array[i] < smallest) {
smallest = array[i];
}
}

The STL contains a number of algorithms, and which one to use depends on the problem.
std::find
std::find_if
std::count
std::binary_search
std::equal_range
std::lower_bound
std::upper_bound
Which algorithm to use depends on your data.
This article contains a very good table to help choose the right algorithm.
In the special case where the minimum or maximum should be determined and you are using a std::vector or a plain array,
std::min_element
std::max_element
can be used.

If you want to be really efficient and you have enough time to spend, use SIMD instructions.
You can compare several pairs in one instruction:
r0 := min(a0, b0)
r1 := min(a1, b1)
r2 := min(a2, b2)
r3 := min(a3, b3)
__m64 _mm_min_pu8(__m64 a , __m64 b );
Today virtually every x86 computer supports it. Others have already written a min function for you:
http://smartdata.usbid.com/datasheets/usbid/2001/2001-q1/i_minmax.pdf
or use an existing library.

If the array is sorted in ascending or descending order, you can find it in O(1).
For an array in ascending order, the first element is the smallest; you can get it with arr[0] (0-based indexing).
If the array is sorted in descending order, the last element is the smallest; you can get it with arr[sizeOfArray-1].
If the array is not sorted, you have to iterate over the array to get the smallest element. In this case the time complexity is O(n), where n is the size of the array.
int arr[] = {5,7,9,0,-3,2,3,4,56,-7};
int sizeOfArray = sizeof(arr) / sizeof(arr[0]); // number of elements
int smallest_element=arr[0]; // assume the first element is the smallest one
for(int i =1;i<sizeOfArray;i++)
{
if(arr[i]<smallest_element)
{
smallest_element=arr[i];
}
}
You can also compute it while reading the input (when you have to find the smallest element of a given array):
int smallest_element;
int arr[100],n;
cin>>n;
for(int i = 0;i<n;i++)
{
cin>>arr[i];
if(i==0)
{
smallest_element=arr[i]; //smallest_element=arr[0];
}
else if(arr[i]<smallest_element)
{
smallest_element = arr[i];
}
}
You can also get the smallest element with a built-in function:
#include <algorithm>
int smallest_element = *min_element(arr,arr+n); //here n is the size of array
You can get the smallest element of any range by using this function,
such as:
int arr[] = {3,2,1,-1,-2,-3};
cout<<*min_element(arr,arr+3); //this will print 1,smallest element of first three element
cout<<*min_element(arr+2,arr+5); // -2, smallest element between third and fifth element (inclusive)
I have used an asterisk (*) before the min_element() call because it returns an iterator (here a pointer) to the smallest element, not the value itself.
All code is in C++.
You can find the maximum element the same way, with max_element.

Richie's answer is close. It depends upon the language. Here is a good solution for java:
int smallest = Integer.MAX_VALUE;
int array[]; // Assume it is filled.
int array_length = array.length;
for (int i = array_length - 1; i >= 0; i--) {
if (array[i] < smallest) {
smallest = array[i];
}
}
I go through the array in reverse order, because comparing "i" to "array_length" in the loop comparison requires a fetch and a comparison (two operations), whereas comparing "i" to "0" is a single JVM bytecode operation. If the work being done in the loop is negligible, then the loop comparison consumes a sizable fraction of the time.
Of course, others pointed out that encapsulating the array and controlling inserts will help. If getting the minimum was ALL you needed, keeping the list in sorted order is not necessary. Just keep an instance variable that holds the smallest inserted so far, and compare it to each value as it is added to the array. (Of course, this fails if you remove elements. In that case, if you remove the current lowest value, you need to do a scan of the entire array to find the new lowest value.)

An O(1) solution might be to just guess: the smallest number in your array will often be 0. 0 crops up everywhere, given that you are only looking at unsigned numbers. But even then: 0 is good enough. Also, looking through all elements for the smallest number is a real pain. Why not just use 0? It could actually be the correct result!
If the interviewer/your teacher doesn't like that answer, try 1, 2 or 3. They also end up being in most homework/interview-scenario numeric arrays...
On a more serious note: how often will you need to perform this operation on the array? The solutions above are all O(n). If you want to do that m times on a list you will be adding new elements to all the time, why not pay some time up front and create a heap? Then finding the smallest element can really be done in O(1), without resorting to cheating.

If finding the minimum is a one time thing, just iterate through the list and find the minimum.
If finding the minimum is a very common thing and you only need to operate on the minimum, use a Heap data structure.
A heap will be faster than doing a sort on the list but the tradeoff is you can only find the minimum.
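A minimal sketch of the heap option, assuming std::priority_queue with std::greater as the min-heap. The alias MinHeap and the function make_min_heap are hypothetical names.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Min-heap of ints: top() is always the smallest element currently stored.
using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;

// Hypothetical helper: builds the heap from existing values in O(n).
MinHeap make_min_heap(const std::vector<int>& values) {
    return MinHeap(values.begin(), values.end());
}
```

After that, top() reads the minimum in O(1), while push() and pop() cost O(log n) each, which is the trade-off described above: cheap repeated access to the minimum, but no cheap access to anything else.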

If you're developing your own array abstraction, you can get O(1) by storing the smallest value added so far in an extra member and comparing against it every time a new item is put into the array.
It should look something like this:
#include <climits>
#include <list>
class MyArray
{
public:
MyArray() : m_minValue(INT_MAX) {}
void add(int newValue)
{
// keep the running minimum up to date on every insert
if (newValue < m_minValue) m_minValue = newValue;
m_list.push_back( newValue );
}
int min() const
{
return m_minValue;
}
private:
int m_minValue;
std::list<int> m_list;
};

//find the min in an array list of #s
$array = array(45,545,134,6735,545,23,434);
$smallest = $array[0];
for($i=1; $i<count($array); $i++){
if($array[$i] < $smallest){
$smallest = $array[$i]; // remember the new minimum
}
}
echo $smallest; // 23

//smallest number in the array (assumes n holds the number of elements in x)//
double small = x[0];
for(t=1;t<n;t++)
{
if(x[t]<small)
{
small=x[t];
}
}
printf("\nThe smallest number is %0.2lf \n",small);

Procedure:
We can use the min_element(array, array+size) function. It returns an iterator (for a plain array, a pointer) to the minimum element, so *min_element(array, array+size) gives the minimum value of the array.
C++ implementation
#include<bits/stdc++.h>
using namespace std;
int main()
{
int num;
cin>>num;
int arr[10];
for(int i=0; i<num; i++)
{
cin>>arr[i];
}
cout<<*min_element(arr,arr+num)<<endl;
return 0;
}

int small=a[0];
for (int x = 1; x < a.length; x++)
{
if(a[x]<small)
small=a[x];
}

C++ code
#include <iostream>
using namespace std;
int main() {
const int n = 5; // the array size must be a compile-time constant
int arr[n] = {12,4,15,6,2};
int min = arr[0];
for (int i=1;i<n;i++){
if (min>arr[i]){
min = arr[i];
}
}
cout << min;
return 0;
}
