Iterative Merge Sort, works same speed as Bubblesort - c++

I have tried to implement an iterative Merge sort using nested loops. Although this algorithm does sort correctly (as in after sorting things are in correct order), I know there is something wrong with this implementation as I tried to sort larger collections with it and compare timings with slower sorts, and I end up getting slow times for this iterative implementation. For example, sorting 500 items gives a time of 31 milliseconds with this implementation just like bubble sort does.
int main()
{
int size;
cin >> size;
//assume vector is already initialized with values & size
vector<int> items(size);
IterativeMergeSort(items, 0, size - 1);
}
void IterativeMergeSort(vector<int> &items, int start, int end)
{
vector<int> temp(items.size());
int left, middle, right;
for(int outer = 1; outer < 2; outer *= 2)
{
for(int inner = start; inner < end; inner = inner * outer + 1)
{
left = outer - 1;
middle = inner;
right = inner + 1;
ItMerge(items, left, middle, right, temp);
}
}
}
void ItMerge(vector<int> &items, int start, int mid, int end, vector<int> &temp)
{
int first1 = start;
int last1 = mid;
int first2 = mid + 1;
int last2 = end;
int index = first1;
while(first1 <= last1 && first2 <= last2)
{
if(items[first1] <= items[first2])
{
temp[index] = items[first1];
first1++;
}
else
{
temp[index] = items[first2];
first2++;
}
index++;
}
while(first1 <= last1)
{
temp[index] = items[first1];
first1++;
index++;
}
while(first2 <= last2)
{
temp[index] = items[first2];
first2++;
index++;
}
for(index = start; index <= end; index++)
{
items[index] = temp[index];
}
}

Your algorithm isn't merge sort. It tries to be, but it isn't.
As I understand it, what is supposed to happen is that the inner loop steps over subsequences and merges them, while the outer loop controls the inner loop's sequence length, starting with 1 and doubling on every iteration until there are just two subsequences and they get merged.
But that's not what your algorithm is doing. The outer loop's condition is broken, so the outer loop will run exactly once. And the inner loop doesn't take roughly-equal subsequences in pairs. Instead, the right subsequence is exactly one element (mid is inner, right is inner+1) and the left subsequence is always everything used so far (left is outer-1, and outer is constant 1). So the algorithm will repeatedly merge the already-sorted left subsequence with a single-element right subsequence.
This means that in effect, your algorithm is insertion sort, except that you don't insert in place, but instead copy the sorted sequence to a buffer, inserting the new element at the right moment, then copy the result back. So it's a very inefficient insertion sort.

Below is a link to somewhat optimized examples of top-down and bottom-up merge sort. The bottom-up merge sort is a bit faster because it skips the recursion that top-down uses to repeatedly generate sub-pairs of indexes until a sub-pair represents a run of size 1. Most of the time is spent merging, so bottom-up isn't that much faster. The first pass of the bottom-up merge sort could be optimized by swapping pairs in place rather than copying them. The bottom-up merge sort ends up with the sorted data in either the temp or the original array. If the result is wanted in the original array, the pass count can be calculated up front, and if the count is odd, the first pass swaps in place.
Both versions can sort 4 million 64-bit unsigned integers in less than a second on my system (Intel Core i7 2600K, 3.4 GHz).
merge_sort using vectors works well with less than 9 inputs
For a vector or array of integers, a counting / radix sort would be faster still.

I've finally figured it out.
In pseudocode:
for (outer = 1; outer < length; outer *= 2)
    for (inner = 0; inner < length; inner += outer * 2)
        left = inner
        middle = (inner + outer) - 1
        right = (inner + (outer * 2)) - 1
        merge(items, left, middle, right, temp)
After rethinking how the iterative merge sort is supposed to work and looking at a couple of implementations, all I needed in the merge method was to check whether the middle and right indexes passed in were greater than or equal to the vector size (that way we handle any values that could be out of bounds), then merge as usual. Also, looking at this helped me greatly in understanding it; also this. Just to be sure that it works as well as a recursive merge sort, I timed both, and the recursive and iterative implementations produced identical times for 500, 1000, 5000, and 10K values (in some cases the iterative solution was faster).
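For reference, the loop structure above can be turned into a complete function roughly as follows. This is a minimal sketch, not the poster's actual code: the names iterativeMergeSort and mergeRuns are made up for illustration, and the bounds handling (clamping right to the last index and skipping a left run that has no right partner) is only one assumed way of doing the check described.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Merge the sorted runs items[left..mid] and items[mid+1..right] (inclusive bounds).
void mergeRuns(std::vector<int>& items, std::size_t left, std::size_t mid,
               std::size_t right, std::vector<int>& temp) {
    std::size_t a = left, b = mid + 1, out = left;
    while (a <= mid && b <= right) {
        // Take from the right run only when it is strictly smaller (keeps the sort stable).
        temp[out++] = (items[b] < items[a]) ? items[b++] : items[a++];
    }
    while (a <= mid) temp[out++] = items[a++];
    while (b <= right) temp[out++] = items[b++];
    std::copy(temp.begin() + left, temp.begin() + right + 1, items.begin() + left);
}

void iterativeMergeSort(std::vector<int>& items) {
    const std::size_t n = items.size();
    std::vector<int> temp(n);
    // width is the length of the already-sorted runs; it doubles on every pass.
    for (std::size_t width = 1; width < n; width *= 2) {
        // Stop as soon as a left run has no right partner: it is already sorted.
        for (std::size_t left = 0; left + width < n; left += 2 * width) {
            const std::size_t mid = left + width - 1;
            const std::size_t right = std::min(left + 2 * width, n) - 1;  // clamp to last index
            mergeRuns(items, left, mid, right, temp);
        }
    }
}
```

Calling iterativeMergeSort(items) on a std::vector<int> sorts it in ascending order; the temp buffer is allocated once rather than once per merge.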

Related

I have a question about merge sort algorithm

I've looked at the merge sort example code, but there's something I don't understand.
void mergesort(int left, int right)
{
if (left < right)
{
int sorted[LEN];
int mid, p1, p2, idx;
mid = (left + right) / 2;
mergesort(left, mid);
mergesort(mid + 1, right);
p1 = left;
p2 = mid + 1;
idx = left;
while (p1 <= mid && p2 <= right)
{
if (arr[p1] < arr[p2])
sorted[idx++] = arr[p1++];
else
sorted[idx++] = arr[p2++];
}
while (p1 <= mid)
sorted[idx++] = arr[p1++];
while (p2 <= right)
sorted[idx++] = arr[p2++];
for (int i = left; i <= right; i++)
arr[i] = sorted[i];
}
}
In this code, I don't understand the third while loop.
In detail, this code copies the elements at p1 and p2 in order into the 'sorted' array.
I want to know how these while loops create an ascending array.
I would appreciate it if you could write your answer in detail so that I can understand it.
why the array is sorted in ascending order
Merge sort divides an array of n elements into n runs of 1 element each. Each of those single-element runs can be considered sorted, since it contains only a single element. Pairs of single-element runs are merged to create sorted runs of 2 elements each. Pairs of 2-element runs are merged to create sorted runs of 4 elements each. The process continues until a sorted run equal to the size of the original array is created.
The example in the question is a top-down merge sort, which recursively splits the array in half until a base case of a single-element run is reached. After this, merging follows the call chain, depth first, left first. Most libraries use some variation of bottom-up merge sort (along with insertion sort used to detect or create small sorted runs). With a bottom-up merge sort there's no recursive splitting: an array of n elements is treated as n runs of 1 element each, and each merge pass merges even and odd runs, left to right. After ceiling(log2(n)) passes, the array is sorted.
The example code has an issue: it allocates an entire array on the stack at each level of recursion, which will result in a stack overflow for large arrays. The Wiki examples are better, although the bottom-up example should swap references rather than copy the array.
https://en.wikipedia.org/wiki/Merge_sort
For the question's code, sorted might as well be a global array, or at least be declared static (a single instance):
static int arr[LEN];
static int sorted[LEN];
void mergesort(int left, int right)
/* ... */
I'm a developer working in the field.
I was surprised to see you implementing merge sort.
Before we start: the time complexity of merge sort is O(n log n).
The reason can be found in the merge sort process.
First, let's assume that there is an unordered array.
Merge sort process:
Divide the array into n arrays of size 1, where n is the size of the array.
Create an array that is twice the size of the divided arrays.
Compare the elements of two adjacent divided arrays and put them into the created array in order, smaller element first.
Repeat this process, doubling the size each time, until the merged array reaches the size of the original array.
There is a reason why the time complexity of merge sort is O(n log n).
The log n factor arises because the array is repeatedly divided in half, which gives about log2(n) levels, and the factor of n arises because the merging at each level touches all n elements.
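The same argument can be written as a recurrence. The following is a sketch under simplifying assumptions that are not in the original answer: n is a power of two, and c is a constant bounding the cost per element of one merge.

```latex
\begin{align*}
T(1) &= c \\
T(n) &= 2\,T(n/2) + c\,n \\
     &= 4\,T(n/4) + 2\,c\,n \\
     &= 2^k\,T(n/2^k) + k\,c\,n \\
     &= c\,n + c\,n\log_2 n \qquad (k = \log_2 n) \\
     &= O(n \log n)
\end{align*}
```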

Efficient algorithm to produce closest triplet from 3 arrays?

I need to implement an algorithm in C++ that, when given three arrays of unequal sizes, produces triplets a,b,c (one element contributed by each array) such that max(a,b,c) - min(a,b,c) is minimized. The algorithm should produce a list of these triplets, in order of size of max(a,b,c)-min(a,b,c). The arrays are sorted.
I've implemented the following algorithm (note that I now use arrays of type double), however it runs excruciatingly slowly (even when compiled using GCC with -O3 optimization, and other combinations of optimizations). The dataset (and, therefore, each array) has potentially tens of millions of elements. Is there a faster/more efficient method? A significant speed increase is necessary to accomplish the required task in a reasonable time frame.
void findClosest(vector<double> vec1, vector<double> vec2, vector<double> vec3){
//calculate size of each array
int len1 = vec1.size();
int len2 = vec2.size();
int len3 = vec3.size();
int i = 0; int j = 0; int k = 0; int res_i, res_j, res_k;
int diff = INT_MAX;
int iter = 0; int iter_bound = min(min(len1,len2),len3);
while(iter < iter_bound)
while(i < len1 && j < len2 && k < len3){
int minimum = min(min(vec1[i], vec2[j]), vec3[k]);
int maximum = max(max(vec1[i], vec2[j]), vec3[k]);
//if new difference less than previous difference, update difference, store
//resultants
if(fabs(maximum - minimum) < diff){ diff = maximum-minimum; res_i = i; res_j = j; res_k = k;}
//increment minimum value
if(vec1[i] == minimum) ++i;
else if(vec2[j] == minimum) ++j;
else ++k;
}
//"remove" triplet
vec1.erase(vec1.begin() + res_i);
vec2.erase(vec2.begin() + res_j);
vec3.erase(vec3.begin() + res_k);
--len1; --len2; --len3;
++iter_bound;
}
OK, you're going to need to be clever in a few ways to make this run well.
The first thing that you need is a priority queue, which is usually implemented with a heap. With that, the algorithm in pseudocode is:
Make a priority queue for possible triples in order of max - min, then how close median is to their average.
Make a pass through all 3 arrays, putting reasonable triples for every element into the priority queue
While the priority queue is not empty:
    Pull a triple out
    If none of the three elements of the triple has been used:
        Add triple to output
        Mark the triple used
    else:
        If you can construct reasonable triplets for unused elements:
            Add them to the queue
Now for this operation to succeed, you need to efficiently find elements that are currently unused. Doing that at first is easy: just keep an array of bools where you mark off the indexes of the used values. But once a lot have been taken off, your search gets long.
The trick for that is to have a vector of bools for individual elements, a second for whether both elements in a pair have been used, a third for whether all 4 in a quadruple have been used, and so on. When you use an element, just mark the individual bool, then go up the hierarchy, marking off the next level if the one you're paired with is marked off, else stopping. This additional data structure of size 2n will require an average of marking 2 bools per element used, but allows you to find the next unused index in either direction in at most O(log(n)) steps.
The resulting algorithm will be O(n log(n)).
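The hierarchy of bools described above could look something like the sketch below. The class name UsedTracker and its methods markUsed and nextUnused are hypothetical names chosen for illustration, only the forward search is shown (the backward one would mirror it), and a missing sibling at the end of a level is treated as already used, which is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

class UsedTracker {
    // levels[0][i]: element i is used.
    // levels[k][b]: every element in block b of size 2^k is used.
    std::vector<std::vector<bool>> levels;

public:
    explicit UsedTracker(std::size_t n) {
        std::size_t len = n;
        levels.emplace_back(len, false);
        while (len > 1) {
            len = (len + 1) / 2;
            levels.emplace_back(len, false);
        }
    }

    // Mark element i, then propagate upward while the sibling block is full too.
    void markUsed(std::size_t i) {
        levels[0][i] = true;
        for (std::size_t k = 0; k + 1 < levels.size(); ++k) {
            const std::size_t sibling = i ^ 1;
            if (sibling < levels[k].size() && !levels[k][sibling]) break;
            i /= 2;
            levels[k + 1][i] = true;
        }
    }

    // Smallest unused index >= i, or the element count if there is none.
    std::size_t nextUnused(std::size_t i) const {
        const std::size_t n = levels[0].size();
        std::size_t k = 0;
        for (;;) {
            if (i >= levels[k].size()) return n;
            if (!levels[k][i]) break;  // this block still holds an unused element
            ++i;                       // block is full: step to the next block
            // A left child starts where its parent starts, so climbing is safe.
            while (i % 2 == 0 && k + 1 < levels.size()) { i /= 2; ++k; }
        }
        // Descend to the leftmost unused element inside the block found.
        while (k > 0) {
            --k;
            i *= 2;
            if (levels[k][i]) ++i;  // left half full, so the right half is not
        }
        return i;
    }
};
```

With one tracker per input array, nextUnused(i) replaces a linear scan over a plain array of bools when looking for the next element that can still take part in a triple.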

Why do these two variations on the "quick sorting" algorithm differ so much in performance?

I initially thought up some sorting algorithm to code in C++ for practice. People told me it's very inefficient (indeed, sorting a few hundred numbers took ~10 seconds). The algorithm was to remember the first element ("pivot") in a vector, then go through every other element, moving each element to the left of the pivot if it is smaller, or doing nothing otherwise. This would split the list into two smaller lists to sort; the rest is done through recursion.
So now I know that dividing the list into two and doing recursions like this is essentially what quicksorting does (although there are a lot of variations on how to do the partitioning). I didn't understand why my original code was so inefficient, so I wrote up a new one. Someone had mentioned that it is because of the insert() and erase() functions, so I made sure to not use those, but instead used swap().
Old (slow):
void sort(vector<T>& vec){
int size = vec.size();
if (size <= 1){ //this is the most basic case
return;
}
T pivot = vec[0];
int index = 0; //to help split the list later
for (int i = 1; i < size; ++i){ //moving (or not moving) the elements
if (vec[i] < pivot){
vec.insert(vec.begin(), vec[i]);
vec.erase(vec.begin() + i + 1);
++index;
}
}
if (index == 0){ //in case the 0th element is the smallest
vec.erase(vec.begin());
sort(vec);
vec.insert(vec.begin(), pivot);
}
else if(index == size - 1){ //in case the 0th element is the largest
vec.pop_back();
sort(vec);
vec.push_back(pivot);
}
//here is the main recursive portion
vector<T> left = vector<T>(vec.begin(), vec.begin() + index);
sort(left);
vector<T> right = vector<T>(vec.begin() + index + 1, vec.end());
sort(right);
//concatenating the sorted lists together
left.push_back(pivot);
left.insert(left.end(), right.begin(), right.end());
vec = left;
}
New (fast):
template <typename T>
void quickSort(vector<T>& vec, const int& left, const int& right){
if (left >= right){ //basic case
return;
}
T pivot = vec[left];
int j = left; //j will be the final index of the pivot before the next iteration
for (int i = left + 1; i <= right; ++i){
if (vec[i] < pivot){
swap(vec[i], vec[j]); //swapping the pivot and lesser element
++j;
swap(vec[i], vec[j]); //sending the pivot next to its original spot so it doesn't go to the right of any greater element
}
}
//recursion
quickSort(vec, left, j - 1);
quickSort(vec, j + 1, right);
}
The difference in performance is insane; the newer version can sort through tens of thousands of numbers in less than a second, while the first one can't do that with 100 numbers. What are erase() and insert() doing to slow it down, exactly? Is it really the erase() and insert() causing the bottleneck, or is there something else I am missing?
First of all, yes, insert() and erase() will be much slower than swap().
insert() will, in the best case, require every element after the spot where you're inserting into the vector to be moved to the next spot in the vector. Think about what happens if you shove yourself into the middle of a crowded line of people - everyone behind you will have to take one step back to make room for you. In the worst case, because inserting into the vector increases the vector's size, the vector may run out of space in its current memory location, leading to the entire vector (element by element) being copied into a new space where it has room to accommodate the newly inserted item. When an element in the middle of a vector is erase()'d, every element after it must be copied and moved up one space; just like how everyone behind you in a line would take one step up if you left said line. In comparison, swap() only moves the two elements being swapped.
In addition to that, I also noticed another major efficiency improvement between the two code samples:
In the first code sample, you have:
vector<T> left = vector<T>(vec.begin(), vec.begin() + index);
sort(left);
vector<T> right = vector<T>(vec.begin() + index + 1, vec.end());
sort(right);
which uses the range constructor of C++ vectors. Every time the code reaches this point, when it creates left and right, it is traversing the entirety of vec and copying each element one-by-one into the two new vectors.
In the newer, faster code, none of the elements are ever copied into a new vector; the entire algorithm takes place in the exact memory space in which the original numbers existed.
Vectors are arrays, so inserting and deleting elements anywhere other than at the end is done by relocating all the elements after that position to their new positions.
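To see the shifting described in these answers, one can count how often elements get copied or moved. The sketch below is only an illustration: Tracked, transfersForFrontInsert and transfersForSwap are hypothetical names, and the exact count for insert() depends on the standard library implementation, so only a lower bound should be relied on.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical element type that counts every copy or move made of it.
struct Tracked {
    int value = 0;
    static long transfers;  // total copies + moves so far

    Tracked() = default;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& o) : value(o.value) { ++transfers; }
    Tracked(Tracked&& o) noexcept : value(o.value) { ++transfers; }
    Tracked& operator=(const Tracked& o) { value = o.value; ++transfers; return *this; }
    Tracked& operator=(Tracked&& o) noexcept { value = o.value; ++transfers; return *this; }
};
long Tracked::transfers = 0;

// Element transfers caused by one insert at the front of an n-element vector.
long transfersForFrontInsert(std::size_t n) {
    std::vector<Tracked> v(n);
    v.reserve(n + 1);        // rule out reallocation so only the shifting is counted
    Tracked::transfers = 0;  // ignore whatever reserve() had to move
    v.insert(v.begin(), Tracked(42));
    return Tracked::transfers;
}

// Element transfers caused by swapping the first and last of n elements (n >= 1).
long transfersForSwap(std::size_t n) {
    std::vector<Tracked> v(n);
    Tracked::transfers = 0;
    std::swap(v.front(), v.back());
    return Tracked::transfers;
}
```

For a vector of 1000 elements, the front insert has to transfer at least 1000 elements, while the swap transfers a small constant number regardless of the vector's size.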

Sorting Optimization

I'm currently following an algorithms class and thus decided it would be good practice to implement a few of the sorting algorithms and compare them.
I implemented merge sort and quick sort and then compared their run time, along with the std::sort:
My computer isn't the fastest but for 1000000 elements I get on average after 200 attempts:
std::sort -> 0.620342 seconds
quickSort -> 2.2692 seconds
mergeSort -> 2.19048 seconds
I would like to ask if possible for comments on how to improve and optimize the implementation of my code.
void quickSort(std::vector<int>& nums, int s, int e, std::function<bool(int,int)> comparator = defaultComparator){
if(s >= e)
return;
int pivot;
int a = s + (rand() % (e-s));
int b = s + (rand() % (e-s));
int c = s + (rand() % (e-s));
//find median of the 3 random pivots
int min = std::min(std::min(nums[a],nums[b]),nums[c]);
int max = std::max(std::max(nums[a],nums[b]),nums[c]);
if(nums[a] < max && nums[a] > min)
pivot = a;
else if(nums[b] < max && nums[b] > min)
pivot = b;
else
pivot = c;
int temp = nums[s];
nums[s] = nums[pivot];
nums[pivot] = temp;
//partition
int i = s + 1, j = s + 1;
for(; j < e; j++){
if(comparator(nums[j] , nums[s])){
temp = nums[i];
nums[i++] = nums[j];
nums[j] = temp;
}
}
temp = nums[i-1];
nums[i-1] = nums[s];
nums[s] = temp;
//sort left and right of partition
quickSort(nums,s,i-1,comparator);
quickSort(nums,i,e,comparator);
}
Here s is the index of the first element, e the index of the element after the last. defaultComparator is just the following lambda function:
auto defaultComparator = [](int a, int b){ return a <= b; };
std::vector<int> mergeSort(std::vector<int>& nums, int s, int e, std::function<bool(int,int)> comparator = defaultComparator){
std::vector<int> sorted(e-s);
if(s == e)
return sorted;
int mid = (s+e)/2;
if(s == mid){
sorted[0] = nums[s];
return sorted;
}
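//pass the comparator along, otherwise the recursive calls fall back to the default one
std::vector<int> left = mergeSort(nums, s, mid, comparator);
std::vector<int> right = mergeSort(nums, mid, e, comparator);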
unsigned int i = 0, j = 0;
unsigned int c = 0;
while(i < left.size() || j < right.size()){
if(i == left.size()){
sorted[c++] = right[j++];
}
else if(j == right.size()){
sorted[c++] = left[i++];
}
else{
if(comparator(left[i],right[j]))
sorted[c++] = left[i++];
else
sorted[c++] = right[j++];
}
}
return sorted;
}
Thank you all
The first thing I see: you're passing a std::function<>, which involves a type-erased indirect call (comparable to a virtual call), one of the more expensive calling strategies, and one that usually prevents inlining. Give it a try with simply a template parameter T (which might be a function or a lambda type); the result will be direct function calls.
Second thing: never do this result-in-local-container (vector<int> sorted;) when optimizing and when an in-place variant exists. Do an in-place sort. The client should be aware of you sorting their vector; if they wish, they can make a copy in advance. You take a non-const reference for a reason. [1]
Third, there's a cost associated with rand(), and it's far from negligible. Unless you're sure you need the randomized variant of quicksort (and its protection against pathological input sequences), just use the first element as the pivot. Or the middle one.
Use std::swap() to swap two elements. It states the intent explicitly, and compilers typically generate code for it that is hard to beat. Whether the optimizer recognizes your intent to swap at these places without you being explicit can be verified from the assembly output.
The way you find the median of three elements is full of conditional moves / branches. It's simply nums[a] + nums[b] + nums[c] - max - min (note that this yields the median value rather than its index, and that the sum can overflow for large ints); getting nums[...], min and max at the same time could also be optimized further.
Avoid i++ when aiming at speed. While most optimizers will usually create good code, there's a small chance that it's suboptimal. Be explicit when optimizing (++i after the swap), but _only_when_optimizing_.
But the most important one: valgrind/callgrind/kcachegrind. Profile, profile, profile. Only optimize what's really slow.
[1] There's an exception to this rule: const containers that you build from non-const ones. These are usually in-house types and are shared across multiple threads, hence it's better to keep them const & copy when modification is needed. In this case, you'll allocate a new container (either const or not) in your function, but you'll probably keep const one for users' convenience on API.
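The first point above (a template parameter instead of std::function) could be sketched as follows. This is an illustration rather than a tuned implementation: it keeps the question's convention that e is one past the last element, takes the middle element as pivot in place of rand(), uses std::swap, and assumes a strict comparator such as a < b.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Sorts nums[s, e) in place; e is the index one past the last element.
// Compare is deduced at the call site, so comp(...) is a direct, inlinable call.
template <typename Compare>
void quickSort(std::vector<int>& nums, int s, int e, Compare comp) {
    if (e - s < 2) return;
    std::swap(nums[s], nums[s + (e - s) / 2]);  // middle element becomes the pivot
    int i = s + 1;
    for (int j = s + 1; j < e; ++j) {
        if (comp(nums[j], nums[s])) {  // nums[j] belongs before the pivot
            std::swap(nums[i], nums[j]);
            ++i;
        }
    }
    std::swap(nums[s], nums[i - 1]);  // put the pivot into its final place
    quickSort(nums, s, i - 1, comp);
    quickSort(nums, i, e, comp);
    // Note: many equal keys degrade this simple partition towards O(n^2).
}
```

A call such as quickSort(v, 0, static_cast<int>(v.size()), [](int a, int b) { return a < b; }) sorts ascending, and passing std::greater<int>() sorts descending.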
For quick sort, use a Hoare-like partition scheme.
http://en.wikipedia.org/wiki/Quicksort#Hoare_partition_scheme
Median of 3 only needs 3 if / swap statements (effectively a bubble sort). No need for min or max check.
if(nums[a] > nums[b])
std::swap(nums[a], nums[b]);
if(nums[b] > nums[c])
std::swap(nums[b], nums[c]);
if(nums[a] > nums[b])
std::swap(nums[a], nums[b]);
// use nums[b] as pivot value
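Putting the Hoare-style partition and the three-swap median together might look like the sketch below. Some choices here are assumptions for illustration and not part of the answer: the name quickSortHoare, inclusive lo/hi bounds, and using the first, middle and last elements as the three candidates.

```cpp
#include <utility>
#include <vector>

// Sorts nums[lo..hi] in place (both bounds inclusive).
void quickSortHoare(std::vector<int>& nums, int lo, int hi) {
    if (lo >= hi) return;
    const int mid = lo + (hi - lo) / 2;
    // Median of three: order nums[lo] <= nums[mid] <= nums[hi] with three swaps.
    if (nums[lo] > nums[mid]) std::swap(nums[lo], nums[mid]);
    if (nums[mid] > nums[hi]) std::swap(nums[mid], nums[hi]);
    if (nums[lo] > nums[mid]) std::swap(nums[lo], nums[mid]);
    const int pivot = nums[mid];

    // Hoare partition: two indices move towards each other and swap
    // elements that are on the wrong side of the pivot value.
    int i = lo - 1;
    int j = hi + 1;
    for (;;) {
        do { ++i; } while (nums[i] < pivot);
        do { --j; } while (nums[j] > pivot);
        if (i >= j) break;
        std::swap(nums[i], nums[j]);
    }
    // Now everything in [lo..j] is <= pivot and everything in [j+1..hi] is >= pivot.
    quickSortHoare(nums, lo, j);
    quickSortHoare(nums, j + 1, hi);
}
```

The initial call for a whole vector would be quickSortHoare(v, 0, static_cast<int>(v.size()) - 1). Unlike the partition in the question, this scheme splits runs of equal keys roughly in the middle.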
For merge sort, use an entry function that does a one time creation of a working vector, then pass that vector by reference to the actual merge sort function. For top down merge sort, the indices determine the start, middle, and end of each sub-vector.
If using top down merge sort, the code can avoid copying data by alternating the direction of merge depending on the level of recursion. This can be done using two mutually recursive functions, the first one where the result ends up in the original vector, the second one where the result ends up in the working vector. The first one calls the second one twice, then merges from the working vector back to the original vector, and vice versa for the second one. For the second one, if the size == 1, then it needs to copy 1 element from the original vector to the working vector. An alternative to two functions is to pass a boolean for which direction to merge.
If using bottom up merge sort (which will be a bit faster), then each pass swaps vectors. The number of passes needed is determined up front and in the case of an odd number of passes, the first pass swaps in place, so that the data ends up in the original vector after all merge passes are done.
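As a rough sketch of the entry-function idea from the first merge sort paragraph above: the working vector is created once and passed by reference. The names mergeSort and mergeSortRange are placeholders, e is one past the last element as in the question, and for brevity this version still copies each merged run back instead of alternating the merge direction.

```cpp
#include <algorithm>
#include <vector>

// Sorts nums[s, e) using work (same size as nums) as scratch space.
void mergeSortRange(std::vector<int>& nums, std::vector<int>& work, int s, int e) {
    if (e - s < 2) return;  // zero or one element: already sorted
    const int mid = s + (e - s) / 2;
    mergeSortRange(nums, work, s, mid);
    mergeSortRange(nums, work, mid, e);
    // Merge the two sorted halves into the scratch vector...
    std::merge(nums.begin() + s, nums.begin() + mid,
               nums.begin() + mid, nums.begin() + e,
               work.begin() + s);
    // ...and copy the merged run back (the step that alternating directions avoids).
    std::copy(work.begin() + s, work.begin() + e, nums.begin() + s);
}

// Entry function: allocates the working vector exactly once.
void mergeSort(std::vector<int>& nums) {
    std::vector<int> work(nums.size());
    mergeSortRange(nums, work, 0, static_cast<int>(nums.size()));
}
```

Compared with the version in the question, no vector is allocated or returned per recursive call; the only allocation is the single work vector in mergeSort.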

choose n largest elements in two vector

I have two vectors, each containing n unsorted elements. How can I get the n largest elements across these two vectors?
My solution is to merge the two vectors into one with 2n elements and then use the std::nth_element algorithm, but I found that's not very efficient. Does anyone have a more efficient solution? I'd really appreciate it.
You may push the elements into priority_queue and then pop n elements out.
Assuming that n is far smaller than N (the total number of elements), this is quite efficient. Getting minElem is cheap, and a sorted insertion into L is cheaper than sorting the two vectors if n << N.
L := SortedList()
For Each element in any of the vectors do
{
    minElem := smallest element in L (if L is not empty)
    if( size of L < n or element >= minElem )
    {
        add element to L
        if( size of L > n )
        {
            remove smallest element from L
        }
    }
}
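In C++ the sorted list from this pseudocode could be a std::multiset, for example. The following is a minimal sketch assuming int elements; the function name largestN and the helper lambda consider are made up for illustration.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Returns the n largest elements of a and b combined, in ascending order.
std::vector<int> largestN(const std::vector<int>& a, const std::vector<int>& b,
                          std::size_t n) {
    std::multiset<int> best;  // plays the role of the sorted list L
    auto consider = [&](int x) {
        if (best.size() < n) {
            best.insert(x);
        } else if (n > 0 && x > *best.begin()) {
            best.erase(best.begin());  // drop the current smallest
            best.insert(x);
        }
    };
    for (int x : a) consider(x);
    for (int x : b) consider(x);
    return std::vector<int>(best.begin(), best.end());
}
```

Each element costs at most O(log n) for the insert, so the whole pass is O(N log n) for N elements in total.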
vector<T> heap;
heap.reserve(n + 1);
auto left = leftVec.begin(), right = rightVec.begin();
// Seed the heap with the first n elements, taking from leftVec first.
for (int i = 0; i < n; i++) {
    if (left != leftVec.end()) heap.push_back(*left++);
    else if (right != rightVec.end()) heap.push_back(*right++);
}
// Both vectors exhausted already: everything seen is part of the result.
if (left == leftVec.end() && right == rightVec.end()) return heap;
// Min-heap: heap.front() is the smallest of the n candidates kept so far.
make_heap(heap.begin(), heap.end(), greater<T>());
while (left != leftVec.end()) {
    heap.push_back(*left++);
    push_heap(heap.begin(), heap.end(), greater<T>());
    pop_heap(heap.begin(), heap.end(), greater<T>()); // moves the smallest to the back
    heap.pop_back();                                  // and drops it
}
/* ... repeat for right ... */
return heap;
Note I use *_heap directly rather than priority_queue because priority_queue does not provide access to its underlying data structure. This is O(N log n), slightly better than the naive O(N log N) method if n << N.
You can do the "n'th element" algorithm conceptually in parallel on the two vectors quite easily (at least the simple variant that's only linear in the average case).
Pick a pivot.
Partition (std::partition) both vectors by that pivot. You'll have the first vector partitioned by some element with rank i and the second by some element with rank j. I'm assuming descending order here.
If i+j < n, recurse on the right side for the n-i-j greatest elements. If i+j > n, recurse on the left side for the n greatest elements. If you hit i+j==n, stop the recursion.
You basically just need to make sure to partition both vectors by the same pivot in every step. Given a decent pivot selection, this algorithm is linear in the average case (and works in-place).
See also: http://en.wikipedia.org/wiki/Selection_algorithm#Partition-based_general_selection_algorithm
Edit: (hopefully) clarified the algorithm a bit.