Sort array of n elements which has k sorted sections - c++

What is the best way to sort a section-wise sorted array, as depicted in the second image?
The problem is performing a quick-sort using Message Passing Interface. The solution is performing quick-sort on array sections obtained by using MPI_Scatter(), then joining the sorted pieces using MPI_Gather().
The problem is that the array as a whole is unsorted, but sections of it are.
Merging the sub-sections similarly to this solution seems like the best way of sorting the array, but considering that the sub-arrays are already within a single array other sorting algorithms may prove better.
The inputs for a sort function would be the array, its length, and the number of equally sized sorted sub-sections.
A signature would look something like int* sort(int* array, int length, int sections);
The sections parameter can have any value between 1 and 25. The length parameter value is greater than 0, a multiple of sections and smaller than 2^32.
This is what I am currently using:
int* merge(int* input, int length, int sections)
{
int* sub_sections_indices = new int[sections];
int* result = new int[length];
int section_size = length / sections;
for (int i = 0; i < sections; i++) //initialisation
{
sub_sections_indices[i] = 0;
}
int min, min_index, current_index;
for (int i = 0; i < length; i++) //merging
{
min_index = 0;
min = INT_MAX;
for (int j = 0; j < sections; j++)
{
if (sub_sections_indices[j] < section_size)
{
current_index = j * section_size + sub_sections_indices[j];
if (input[current_index] < min)
{
min = input[current_index];
min_index = j;
}
}
}
sub_sections_indices[min_index]++;
result[i] = min;
}
delete[] sub_sections_indices; // release the per-section cursors before returning
return result;
}

Optimizing for performance
I think this answer that maintains a min-heap of the smallest item of each sub-array is the best way to handle arbitrary input. However, for small values of k (think somewhere between 10 and 100), it might be faster to implement the more naive solutions given in the question you linked to: while maintaining the min-heap costs only O(log k) per step, it might have a higher overhead for small values of k than the simple linear scan from the naive solutions.
All these solutions create a copy of the input, and they maintain O(k) state.
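For reference, here is a minimal sketch of the min-heap variant, adapted to the case in the question where all sections live in one array. The name merge_sections and the use of std::vector are my own assumptions, not part of the original code, and it relies on the length being a multiple of the section count, as stated in the question.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: merges `sections` equally sized sorted runs that are
// stored back to back in `input` into one sorted vector.
std::vector<int> merge_sections(const std::vector<int>& input, std::size_t sections) {
    std::vector<int> result;
    if (sections == 0 || input.empty()) return result;
    const std::size_t section_size = input.size() / sections;
    if (section_size == 0) return result;  // fewer elements than sections
    result.reserve(input.size());

    // Each heap entry is (value, index into input). std::greater makes
    // top() the entry with the smallest value.
    using Entry = std::pair<int, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t s = 0; s < sections; ++s) {
        heap.push({input[s * section_size], s * section_size});
    }

    while (!heap.empty()) {
        const Entry smallest = heap.top();
        heap.pop();
        result.push_back(smallest.first);
        const std::size_t next = smallest.second + 1;
        // Push the successor only while it still belongs to the same section.
        if (next % section_size != 0) heap.push({input[next], next});
    }
    return result;
}
```

Each step costs O(log k) heap work, so the whole merge is O(n log k). Whether that beats the linear scan for k up to 25 is something to measure rather than assume.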
Optimizing for space
The only way to save space I see is to sort in-place. This will be a problem for the algorithms mentioned above. An in-place algorithm will have to swap elements, but any swap will likely destroy the property that each sub-array is sorted, unless the larger of the swapped pair is re-sorted into the sub-array it is being swapped to, which will result in an O(n²) algorithm. So if you really do need to conserve memory, I think a regular in-place sorting algorithm would have to be used, which defeats your purpose.

Related

Efficient algorithm to produce closest triplet from 3 arrays?

I need to implement an algorithm in C++ that, when given three arrays of unequal sizes, produces triplets a,b,c (one element contributed by each array) such that max(a,b,c) - min(a,b,c) is minimized. The algorithm should produce a list of these triplets, in order of size of max(a,b,c)-min(a,b,c). The arrays are sorted.
I've implemented the following algorithm (note that I now use arrays of type double), however it runs excruciatingly slowly (even when compiled using GCC with -O3 optimization, and other combinations of optimizations). The dataset (and, therefore, each array) has potentially tens of millions of elements. Is there a faster/more efficient method? A significant speed increase is necessary to accomplish the required task in a reasonable time frame.
void findClosest(vector<double> vec1, vector<double> vec2, vector<double> vec3){
//calculate size of each array
int len1 = vec1.size();
int len2 = vec2.size();
int len3 = vec3.size();
int i = 0; int j = 0; int k = 0; int res_i, res_j, res_k;
int diff = INT_MAX;
int iter = 0; int iter_bound = min(min(len1,len2),len3);
while(iter < iter_bound)
while(i < len1 && j < len2 && k < len3){
int minimum = min(min(vec1[i], vec2[j]), vec3[k]);
int maximum = max(max(vec1[i], vec2[j]), vec3[k]);
//if new difference less than previous difference, update difference, store
//resultants
if(fabs(maximum - minimum) < diff){ diff = maximum-minimum; res_i = i; res_j = j; res_k = k;}
//increment minimum value
if(vec1[i] == minimum) ++i;
else if(vec2[j] == minimum) ++j;
else ++k;
}
//"remove" triplet
vec1.erase(vec1.begin() + res_i);
vec2.erase(vec2.begin() + res_j);
vec3.erase(vec3.begin() + res_k);
--len1; --len2; --len3;
++iter_bound;
}
OK, you're going to need to be clever in a few ways to make this run well.
The first thing that you need is a priority queue, which is usually implemented with a heap. With that, the algorithm in pseudocode is:
Make a priority queue for possible triples in order of max - min, then how close median is to their average.
Make a pass through all 3 arrays, putting reasonable triples for every element into the priority queue
While the priority queue is not empty:
    Pull a triple out
    If all three of the triple are not used:
        Add triple to output
        Mark the triple used
    else:
        If you can construct reasonable triplets for unused elements:
            Add them to the queue
Now for this operation to succeed, you need to efficiently find elements that are currently unused. Doing that at first is easy, just keep an array of bools where you mark off the indexes of the used values. But once a lot have been taken off, your search gets long.
The trick for that is to have a vector of bools for individual elements, a second for whether both in a pair have been used, a third for where all 4 in a quadruple have been used and so on. When you use an element just mark the individual bool, then go up the hierarchy, marking off the next level if the one you're paired with is marked off, else stopping. This additional data structure of size 2n will require an average of marking 2 bools per element used, but allows you to find the next unused index in either direction in at most O(log(n)) steps.
The resulting algorithm will be O(n log(n)).
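The hierarchy of bools can be sketched roughly as follows. This is only an illustration under my own assumptions: the class name UsedTracker and its methods are hypothetical, the element count is padded to a power of two, and only the forward search is shown (the backward one is symmetric).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: level 0 holds one "used" flag per element, and each
// higher level holds one flag per pair of flags on the level below it.
class UsedTracker {
public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    explicit UsedTracker(std::size_t n) : size_(n) {
        std::size_t width = 1;
        while (width < n) width *= 2;  // pad to a power of two
        for (;; width /= 2) {
            levels_.emplace_back(width, false);
            if (width == 1) break;
        }
        // Padding slots count as used so that they are never returned.
        for (std::size_t i = n; i < levels_[0].size(); ++i) mark(i);
    }

    // Marks element i as used and moves up while the sibling block is full too.
    void mark(std::size_t i) {
        for (std::size_t level = 0; level < levels_.size(); ++level, i /= 2) {
            levels_[level][i] = true;
            if (level + 1 == levels_.size() || !levels_[level][i ^ 1]) break;
        }
    }

    // Returns the smallest unused index >= i, or npos if there is none.
    std::size_t next_unused(std::size_t i) const {
        if (i >= size_) return npos;
        std::size_t level = 0;
        // Climb: skip fully used blocks, switching to a coarser level whenever
        // the next block starts exactly where its parent block starts.
        while (levels_[level][i]) {
            ++i;
            if (i >= levels_[level].size()) return npos;
            while (i % 2 == 0 && level + 1 < levels_.size()) {
                i /= 2;
                ++level;
            }
        }
        // Descend: take the leftmost child that still has an unused element.
        while (level > 0) {
            --level;
            i *= 2;
            if (levels_[level][i]) ++i;
        }
        return i;
    }

private:
    std::size_t size_;
    std::vector<std::vector<bool>> levels_;
};
```

The flags take about 2n bits in total, and both the climb and the descent visit each level at most once, which is where the O(log(n)) bound comes from.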

Optimize counting sort?

Given that the input will be N numbers from 0 to N (with duplicates), how can I optimize the code below for both small and big arrays:
void countingsort(int* input, int array_size)
{
int max_element = array_size;//because no number will be > N
int *CountArr = new int[max_element+1]();
for (int i = 0; i < array_size; i++)
CountArr[input[i]]++;
for (int j = 0, outputindex = 0; j <= max_element; j++)
while (CountArr[j]--)
input[outputindex++] = j;
delete []CountArr;
}
Having a stable sort is not a requirement.
edit: In case it's not clear, I am talking about optimizing the algorithm.
IMHO there's nothing wrong here. I highly recommend this approach when max_element is small, numbers sorted are non sparse (i.e. consecutive and no gaps) and greater than or equal to zero.
A small tweak: I'd replace new / delete and just declare a fixed-size array on the stack, e.g. 256 for max_element.
int CountArr[256] = { }; // Declare and initialize with zeroes
As you bend these rules (sparse or negative numbers), you'd be struggling with this approach. You would need to find a good hashing function to remap the numbers onto your compact array. The more complex the hashing becomes, the more the benefit over well-established sorting algorithms diminishes.
In terms of complexity this cannot be beaten. It's O(N) and beats standard O(N log N) sorting by exploiting the extra knowledge that 0 <= x <= N. You cannot go below O(N) because you need to sweep through the input array at least once.
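As an illustration of the two linear passes, here is a sketch of the same counting sort written with std::vector, so that no manual new / delete is needed. The function name counting_sort is my own, and the code assumes every value lies in [0, input.size()], as in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch: counting sort for values known to lie in [0, input.size()].
void counting_sort(std::vector<int>& input) {
    // Pass 1: count how often each value occurs.
    std::vector<std::size_t> counts(input.size() + 1, 0);
    for (int value : input) ++counts[static_cast<std::size_t>(value)];

    // Pass 2: write each value back as many times as it was counted.
    auto out = input.begin();
    for (std::size_t value = 0; value < counts.size(); ++value) {
        out = std::fill_n(out, counts[value], static_cast<int>(value));
    }
}
```

Both passes are linear, so the total work stays O(N). Values outside the assumed range would index past the end of counts, so validate the input first if that range is not guaranteed.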

Array balancing point

What is the best way to solve this?
A balancing point of an N-element array A is an index i such that all elements on lower indexes have values <= A[i] and all elements on higher indexes have values >= A[i].
For example, given:
A[0]=4 A[1]=2 A[2]=7 A[3]=11 A[4]=9
one of the correct solutions is: 2. All elements before A[2] are less than A[2], and all elements after A[2] are greater than A[2].
One solution that came to mind is an O(n²) solution. Is there a better one?
Start by assuming A[0] is a pole. Then start walking the array; comparing each element A[i] in turn against A[0], and also tracking the current maximum.
As soon as you find an i such that A[i] < A[0], you know that A[0] can no longer be a pole, and by extension, neither can any of the elements up to and including A[i]. So now continue walking until you find the next value that's bigger than the current maximum. This then becomes the new proposed pole.
Thus, an O(n) solution!
In code:
int i_pole = 0;
int i_max = 0;
bool have_pole = true;
for (int i = 1; i < N; i++)
{
if (A[i] < A[i_pole])
{
have_pole = false;
}
if (A[i] > A[i_max])
{
i_max = i;
if (!have_pole)
{
i_pole = i;
}
have_pole = true;
}
}
If you want to know where all the poles are, an O(n log n) solution would be to create a sorted copy of the array, and look to see where you get matching values.
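EDIT: Sorry, but the sorted-copy approach doesn't actually work. One counterexample is [2, 5, 3, 1, 4]: the value 3 sits at the same index in the sorted copy, yet the larger 5 comes before it.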
Make two auxiliary arrays, each with as many elements as the input array, called MIN and MAX.
Each element M of MAX contains the maximum of all the elements in the input from 0..M. Each element M of MIN contains the minimum of all the elements in the input from M..N-1.
For each element M of the input array, compare its value to the corresponding values in MIN and MAX. If INPUT[M] == MIN[M] and INPUT[M] == MAX[M] then M is a balancing point.
Building MIN takes N steps, and so does MAX. Testing the array then takes N more steps. This solution has O(N) complexity and finds all balancing points. In the case of sorted input every element is a balancing point.
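A minimal sketch of the MIN/MAX idea, assuming the input is held in a std::vector<int>. The function name balancing_points is hypothetical, and prefix_max and suffix_min play the roles of MAX and MIN.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns every index i where a[i] equals both the maximum of a[0..i]
// and the minimum of a[i..n-1].
std::vector<std::size_t> balancing_points(const std::vector<int>& a) {
    const std::size_t n = a.size();
    std::vector<std::size_t> points;
    if (n == 0) return points;

    std::vector<int> prefix_max(n), suffix_min(n);
    prefix_max[0] = a[0];
    for (std::size_t i = 1; i < n; ++i) {
        prefix_max[i] = std::max(prefix_max[i - 1], a[i]);
    }
    suffix_min[n - 1] = a[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) {  // visits n-2 down to 0
        suffix_min[i] = std::min(suffix_min[i + 1], a[i]);
    }

    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] == prefix_max[i] && a[i] == suffix_min[i]) points.push_back(i);
    }
    return points;
}
```

For the example from the question, {4, 2, 7, 11, 9}, this yields only index 2.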
Create a doubly linked list such that the i-th node of this list contains A[i] and i. Traverse this list while the elements grow (tracking the maximum of these elements). If some A[bad] < maxSoFar, it can't be an MP. Remove it and go backward, removing elements until you find A[good] < A[bad] or reach the head of the list. Continue (starting with maxSoFar as maximum) until you reach the end of the list. Every element in the resulting list is an MP and every MP is in this list. Complexity is O(n), since the maximum number of steps is performed for a descending array: n steps forward and n removals.
Update
Oh my, I confused "any" with "every" in problem definition :).
You can combine bmcnett's and Oli's answers to find all the poles as quickly as possible.
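std::vector<int> i_poles;
i_poles.push_back(0);
int i_max = 0;
for (int i = 1; i < N; i++)
{
    // Any candidate larger than the current element cannot be a pole.
    while (!i_poles.empty() && A[i] < A[i_poles.back()])
    {
        i_poles.pop_back();
    }
    // A new candidate must be at least as large as everything before it.
    if (A[i] >= A[i_max])
    {
        i_max = i; // keep the running maximum up to date
        i_poles.push_back(i);
    }
}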
You could use an array preallocated to size N if you wanted to avoid reallocations.

Merge sorted arrays - Efficient solution

Goal here is to merge multiple arrays which are already sorted into a resultant array.
I've written the following solution and wondering if there is a way to improve the solution
/*
Goal is to merge all sorted arrays
*/
void mergeAll(const vector< vector<int> >& listOfIntegers, vector<int>& result)
{
int totalNumbers = listOfIntegers.size();
vector<int> curpos;
int currow = 0 , minElement , foundMinAt = 0;
curpos.reserve(totalNumbers);
// Set the current position that was traversed to 0 in all the array elements
for ( int i = 0; i < totalNumbers; ++i)
{
curpos.push_back(0);
}
for ( ; ; )
{
/* Find the first minimum
Which is basically the first element in the array that hasn't been fully traversed
*/
for ( currow = 0 ; currow < totalNumbers ; ++currow)
{
if ( curpos[currow] < listOfIntegers[currow].size() )
{
minElement = listOfIntegers[currow][curpos[currow] ];
foundMinAt = currow;
break;
}
}
/* If all the elements were traversed in all the arrays, then no further work needs to be done */
if ( !(currow < totalNumbers ) )
break;
/*
Traverse each of the array and find out the first available minimum value
*/
for ( ;currow < totalNumbers; ++currow)
{
if ( curpos[currow] < listOfIntegers[currow].size() && listOfIntegers[currow][curpos[currow] ] < minElement ) // skip arrays that are already fully traversed
{
minElement = listOfIntegers[currow][curpos[currow] ];
foundMinAt = currow;
}
}
/*
Store the minimum into the resultant array
and increment the element traversed
*/
result.push_back(minElement);
++curpos[foundMinAt];
}
}
The corresponding main goes like this.
int main()
{
vector< vector<int> > myInt;
vector<int> result;
myInt.push_back(vector<int>() );
myInt.push_back(vector<int>() );
myInt.push_back(vector<int>() );
myInt[0].push_back(10);
myInt[0].push_back(12);
myInt[0].push_back(15);
myInt[1].push_back(20);
myInt[1].push_back(21);
myInt[1].push_back(22);
myInt[2].push_back(14);
myInt[2].push_back(17);
myInt[2].push_back(30);
mergeAll(myInt,result);
for ( int i = 0; i < result.size() ; ++i)
{
cout << result[i] << endl;
}
}
You can generalize the merge step of the Merge Sort algorithm to work with multiple pointers. Initially, each pointer points to the beginning of its array. You maintain these pointers sorted (by the values they point to) in a priority queue. In each step, you remove the smallest element from the heap in O(log n), where n is the number of arrays. You then output the element pointed to by the extracted pointer. Now you advance this pointer by one position and, if you didn't reach the end of the array, reinsert it into the priority queue in O(log n). Proceed this way until the heap is empty. If there are a total of m elements, the complexity is O(m log n). The elements are output in sorted order this way.
Perhaps I'm misunderstanding the question...and I feel like I'm misunderstanding your solution.
That said, maybe this answer is totally off-base and not helpful.
But, especially with the number of vectors and push_back's you're already using, why do you not just use std::sort?
#include <algorithm>
void mergeAll(const vector<vector<int>> &origList, vector<int> &resultList)
{
for(int i = 0; i < origList.size(); ++i)
{
resultList.insert(resultList.end(), origList[i].begin(), origList[i].end());
}
std::sort(resultList.begin(), resultList.end());
}
I apologize if this is totally off from what you're looking for. But it's how I understood the problem and the solution.
std::sort runs in O(N log (N)) http://www.cppreference.com/wiki/stl/algorithm/sort
I've seen some solution on the internet to merge two sorted arrays, but most of them were quite cumbersome. I changed some of the logic to provide the shortest version I can come up with:
void merge(const int list1[], int size1, const int list2[], int size2, int list3[]) {
// Declaration & Initialization
int index1 = 0, index2 = 0, index3 = 0;
// Loop until both arrays have reached their upper bound.
while (index1 < size1 || index2 < size2) {
// Take from the first array when the second is exhausted, or when
// the first still has elements and its current one is not larger.
// The bounds checks come first so that neither array is read
// past its end.
if (index2 >= size2 || (index1 < size1 && list1[index1] <= list2[index2])) {
list3[index3] = list1[index1];
index1++;
}
else {
list3[index3] = list2[index2];
index2++;
}
index3++;
}
}
If you want to take advantage of multi-threading then a fairly good solution would be to just merge 2 lists at a time.
ie suppose you have 9 lists.
merge list 0 with 1.
merge list 2 with 3.
merge list 4 with 5.
merge list 6 with 7.
These can be performed concurrently.
Then:
merge list 0&1 with 2&3
merge list 4&5 with 6&7
Again these can be performed concurrently.
then merge list 0,1,2&3 with list 4,5,6&7
finally merge list 0,1,2,3,4,5,6&7 with list 8.
Job done.
I'm not sure on the complexity of that but it seems the obvious solution and DOES have the bonus of being multi-threadable to some extent.
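The rounds above can be sketched as follows. This is a single-threaded illustration built on std::merge; the name merge_pairwise is my own, and the threading itself is left out on purpose. The merges inside one round read disjoint inputs, which is what makes them candidates for running concurrently.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: merges sorted lists two at a time, round by round.
std::vector<int> merge_pairwise(std::vector<std::vector<int>> lists) {
    if (lists.empty()) return {};
    while (lists.size() > 1) {
        std::vector<std::vector<int>> next;
        next.reserve((lists.size() + 1) / 2);
        // Merges within one round are independent of each other.
        for (std::size_t i = 0; i + 1 < lists.size(); i += 2) {
            std::vector<int> merged;
            merged.reserve(lists[i].size() + lists[i + 1].size());
            std::merge(lists[i].begin(), lists[i].end(),
                       lists[i + 1].begin(), lists[i + 1].end(),
                       std::back_inserter(merged));
            next.push_back(std::move(merged));
        }
        // With an odd count, the last list waits for a later round.
        if (lists.size() % 2 == 1) next.push_back(std::move(lists.back()));
        lists = std::move(next);
    }
    return std::move(lists.front());
}
```

As for the complexity: every round touches each element at most once and the number of lists roughly halves per round, so with n elements in m lists the total is O(n log m), the same bound as the priority-queue approach.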
Consider the priority-queue implementation in this answer linked in a comment above: Merging 8 sorted lists in c++, which algorithm should I use
It's O(n lg m) time (where n = total number of items and m = number of lists).
All you need is two pointers (or just int index counters), checking for minimum between array A and B, copying the value over to the resultant list, and incrementing the pointer of the array the minimum came from. If you run out of elements on one source array, copy the remainder of the second to the resultant and you're done.
Edit:
You can trivially expand this to N arrays.
Edit:
Don't trivially expand this to N arrays :-). Do two at a time. Silly me.
If you are merging very many vectors together, then you could speed up performance by using a sort of tree to determine which vector contains the smallest element. This is probably not necessary for your application, but comment if it is and I'll try to work it out.
You could just stick them all into a multiset. That will handle the sorting for you.

Finding smallest value in an array most efficiently

There are N values in the array, and one of them is the smallest value. How can I find the smallest value most efficiently?
If they are unsorted, you can't do much but look at each one, which is O(N), and when you're done you'll know the minimum.
Pseudo-code:
small = <biggest value> // such as std::numeric_limits<int>::max()
for each element in array:
    if (element < small)
        small = element
A better way, which Ben reminded me of, is to just initialize small with the first element:
small = element[0]
for each element in array, starting from 1 (not 0):
    if (element < small)
        small = element
The above is wrapped in the algorithm header as std::min_element.
If you can keep your array sorted as items are added, then finding it will be O(1), since you can keep the smallest at front.
That's as good as it gets with arrays.
You need to loop through the array, remembering the smallest value you've seen so far. Like this:
int smallest = INT_MAX;
for (int i = 0; i < array_length; i++) {
if (array[i] < smallest) {
smallest = array[i];
}
}
The STL contains a number of algorithms; which one to use depends on the problem.
std::find
std::find_if
std::count
std::find
std::binary_search
std::equal_range
std::lower_bound
std::upper_bound
Which algorithm to use depends on your data.
This article contains a handy table to help choose the right algorithm.
In the special case where the minimum or maximum should be determined and you are using a std::vector or a ???* array,
std::min_element
std::max_element
can be used.
If you want to be really efficient and you have enough time to spend, use SIMD instructions.
You can compare several pairs in one instruction:
r0 := min(a0, b0)
r1 := min(a1, b1)
r2 := min(a2, b2)
r3 := min(a3, b3)
__m64 _mm_min_pu8(__m64 a , __m64 b );
Today every computer supports it. Others have already written a min function for you:
http://smartdata.usbid.com/datasheets/usbid/2001/2001-q1/i_minmax.pdf
or use an existing library.
If the array is sorted in ascending or descending order then you can find it with complexity O(1).
For an array of ascending order the first element is the smallest element, you can get it by arr[0] (0 based indexing).
If the array is sorted in descending order then the last element is the smallest element; you can get it by arr[sizeOfArray-1].
If the array is not sorted then you have to iterate over the array to get the smallest element. In this case the time complexity is O(n), where n is the size of the array.
int arr[] = {5,7,9,0,-3,2,3,4,56,-7};
int sizeOfArray = sizeof(arr) / sizeof(arr[0]); // number of elements in arr
int smallest_element = arr[0]; // assume the first element is the smallest one
for (int i = 1; i < sizeOfArray; i++)
{
    if (arr[i] < smallest_element)
    {
        smallest_element = arr[i];
    }
}
You can calculate it in input section (when you have to find smallest element from a given array)
int smallest_element;
int arr[100],n;
cin>>n;
for(int i = 0;i<n;i++)
{
cin>>arr[i];
if(i==0)
{
smallest_element=arr[i]; //smallest_element=arr[0];
}
else if(arr[i]<smallest_element)
{
smallest_element = arr[i];
}
}
Also you can get smallest element by built in function
#include <algorithm>
int smallest_element = *min_element(arr,arr+n); //here n is the size of array
You can get smallest element of any range by using this function
such as,
int arr[] = {3,2,1,-1,-2,-3};
cout<<*min_element(arr,arr+3); //this will print 1,smallest element of first three element
cout<<*min_element(arr+2,arr+5); // -2, smallest element between third and fifth element (inclusive)
I have used an asterisk (*) before the min_element() function because it returns a pointer to the smallest element, not the value itself.
All codes are in c++.
You can find the maximum element in opposite way.
Richie's answer is close. It depends upon the language. Here is a good solution for java:
int smallest = Integer.MAX_VALUE;
int array[]; // Assume it is filled.
int array_length = array.length;
for (int i = array_length - 1; i >= 0; i--) {
if (array[i] < smallest) {
smallest = array[i];
}
}
I go through the array in reverse order, because comparing "i" to "array_length" in the loop comparison requires a fetch and a comparison (two operations), whereas comparing "i" to "0" is a single JVM bytecode operation. If the work being done in the loop is negligible, then the loop comparison consumes a sizable fraction of the time.
Of course, others pointed out that encapsulating the array and controlling inserts will help. If getting the minimum was ALL you needed, keeping the list in sorted order is not necessary. Just keep an instance variable that holds the smallest inserted so far, and compare it to each value as it is added to the array. (Of course, this fails if you remove elements. In that case, if you remove the current lowest value, you need to do a scan of the entire array to find the new lowest value.)
An O(1) solution might be to just guess: The smallest number in your array will often be 0. 0 crops up everywhere. Given that you are only looking at unsigned numbers. But even then: 0 is good enough. Also, looking through all elements for the smallest number is a real pain. Why not just use 0? It could actually be the correct result!
If the interviewer/your teacher doesn't like that answer, try 1, 2 or 3. They also end up being in most homework/interview-scenario numeric arrays...
On a more serious note: How often will you need to perform this operation on the array? The solutions above are all O(n). If you want to do that m times on a list you will be adding new elements to all the time, why not pay some time up front and create a heap? Then finding the smallest element really can be done in O(1), without resorting to cheating.
If finding the minimum is a one time thing, just iterate through the list and find the minimum.
If finding the minimum is a very common thing and you only need to operate on the minimum, use a Heap data structure.
A heap will be faster than doing a sort on the list but the tradeoff is you can only find the minimum.
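A small sketch of the heap approach using the standard library. The alias MinHeap and the helper make_min_heap are names I made up for illustration; std::priority_queue with std::greater is the standard way to get a min-heap.

```cpp
#include <functional>
#include <queue>
#include <vector>

// std::greater turns std::priority_queue into a min-heap:
// top() is always the smallest value currently stored.
using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;

// Hypothetical helper that builds a min-heap from existing values.
MinHeap make_min_heap(const std::vector<int>& values) {
    return MinHeap(values.begin(), values.end());
}
```

With this, top() reads the minimum in O(1), while push() and pop() each cost O(log n), so the heap keeps working when elements are added or the current minimum is removed.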
If you're developing some kind of your own array abstraction, you can get O(1) if you store smallest added value in additional attribute and compare it every time a new item is put into array.
It should look something like this:
#include <climits>
#include <list>

class MyArray
{
public:
    MyArray() : m_minValue(INT_MAX) {}
    // Track the minimum while inserting, so that min() is O(1).
    void add(int newValue)
    {
        if (newValue < m_minValue) m_minValue = newValue;
        m_list.push_back(newValue);
    }
    int min() const
    {
        return m_minValue;
    }
private:
    int m_minValue;
    std::list<int> m_list;
};
//find the min in an array list of #s
$array = array(45,545,134,6735,545,23,434);
$smallest = $array[0];
for($i=1; $i<count($array); $i++){
    if($array[$i] < $smallest){
        $smallest = $array[$i]; // remember the new minimum
    }
}
echo $smallest;
// smallest number in the array
// n is assumed to hold the number of elements in x
double small = x[0];
for(t=1;t<n;t++)
{
    if(x[t]<small)
    {
        small=x[t];
    }
}
printf("\nThe smallest number is %0.2lf \n",small);
Procedure:
We can use the min_element(array, array+size) function. It returns an iterator (here, a pointer) to the minimum element, so *min_element(array, array+size) gives the minimum value of the array.
C++ implementation
#include<bits/stdc++.h>
using namespace std;
int main()
{
int num;
cin>>num;
int arr[10];
for(int i=0; i<num; i++)
{
cin>>arr[i];
}
cout<<*min_element(arr,arr+num)<<endl;
return 0;
}
int small=a[0];
for (int x = 1; x < a.length; x++)
{
    if(a[x]<small)
        small=a[x];
}
C++ code
#include <iostream>
using namespace std;
int main() {
const int n = 5; // must be a constant so that arr[n] can be initialized
int arr[n] = {12,4,15,6,2};
int min = arr[0];
for (int i=1;i<n;i++){
if (min>arr[i]){
min = arr[i];
}
}
cout << min;
return 0;
}