Why do these two variations on the "quick sorting" algorithm differ so much in performance? - c++

I initially thought up a sorting algorithm to code in C++ for practice. People told me it's very inefficient (indeed, sorting a few hundred numbers took ~10 seconds). The algorithm was to remember the first element ("pivot") in a vector, then go through every other element, moving each element to the left of the pivot if it is smaller, or leaving it in place otherwise. This splits the list into two smaller lists to sort; the rest is done through recursion.
So now I know that dividing the list into two and recursing like this is essentially what quicksort does (although there are a lot of variations on how to do the partitioning). I didn't understand why my original code was so inefficient, so I wrote up a new one. Someone had mentioned that it is because of the insert() and erase() functions, so I made sure not to use those and used swap() instead.
Old (slow):
template <typename T>
void sort(vector<T>& vec){
    int size = vec.size();
    if (size <= 1){ //this is the most basic case
        return;
    }
    T pivot = vec[0];
    int index = 0; //to help split the list later
    for (int i = 1; i < size; ++i){ //moving (or not moving) the elements
        if (vec[i] < pivot){
            vec.insert(vec.begin(), vec[i]);
            vec.erase(vec.begin() + i + 1);
            ++index;
        }
    }
    if (index == 0){ //in case the 0th element is the smallest
        vec.erase(vec.begin());
        sort(vec);
        vec.insert(vec.begin(), pivot);
    }
    else if(index == size - 1){ //in case the 0th element is the largest
        vec.pop_back();
        sort(vec);
        vec.push_back(pivot);
    }
    //here is the main recursive portion
    vector<T> left = vector<T>(vec.begin(), vec.begin() + index);
    sort(left);
    vector<T> right = vector<T>(vec.begin() + index + 1, vec.end());
    sort(right);
    //concatenating the sorted lists together
    left.push_back(pivot);
    left.insert(left.end(), right.begin(), right.end());
    vec = left;
}
New (fast):
template <typename T>
void quickSort(vector<T>& vec, const int& left, const int& right){
    if (left >= right){ //basic case
        return;
    }
    T pivot = vec[left];
    int j = left; //j will be the final index of the pivot before the next iteration
    for (int i = left + 1; i <= right; ++i){
        if (vec[i] < pivot){
            swap(vec[i], vec[j]); //swapping the pivot and lesser element
            ++j;
            swap(vec[i], vec[j]); //sending the pivot next to its original spot so it doesn't go to the right of any greater element
        }
    }
    //recursion
    quickSort(vec, left, j - 1);
    quickSort(vec, j + 1, right);
}
The difference in performance is insane; the newer version can sort through tens of thousands of numbers in less than a second, while the first one can't do that with 100 numbers. What are erase() and insert() doing to slow it down, exactly? Is it really the erase() and insert() causing the bottleneck, or is there something else I am missing?

First of all, yes, insert() and erase() will be much slower than swap().
insert() will, in the best case, require every element after the spot where you're inserting to be moved one spot further along the vector. Think about what happens if you shove yourself into the middle of a crowded line of people: everyone behind you has to take one step back to make room for you. In the worst case, because inserting increases the vector's size, the vector may run out of space in its current memory location, leading to the entire vector (element by element) being copied into a new space where it has room to accommodate the newly inserted item.
When an element in the middle of a vector is erase()'d, every element after it must be copied and moved up one space, just like how everyone behind you in a line would take one step up if you left said line.
In comparison, swap() only moves the two elements being swapped.
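To make the cost difference visible, here is a minimal micro-benchmark sketch; exact numbers will vary by machine, compiler, and optimization level, but the asymptotic gap is the point:
#include <chrono>
#include <iostream>
#include <utility>
#include <vector>

int main() {
    std::vector<int> v(100000, 42);

    auto t0 = std::chrono::steady_clock::now();
    v.insert(v.begin(), 7); //shifts all 100000 elements one slot right: O(n)
    v.erase(v.begin());     //shifts them all back one slot left: O(n)
    auto t1 = std::chrono::steady_clock::now();

    std::swap(v.front(), v.back()); //touches exactly two elements: O(1)
    auto t2 = std::chrono::steady_clock::now();

    using ns = std::chrono::nanoseconds;
    std::cout << "insert+erase: " << std::chrono::duration_cast<ns>(t1 - t0).count() << " ns\n";
    std::cout << "swap:         " << std::chrono::duration_cast<ns>(t2 - t1).count() << " ns\n";
}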
In addition to that, I also noticed another major efficiency improvement between the two code samples:
In the first code sample, you have:
vector<T> left = vector<T>(vec.begin(), vec.begin() + index);
sort(left);
vector<T> right = vector<T>(vec.begin() + index + 1, vec.end());
sort(right);
which uses the range constructor of C++ vectors. Every time the code reaches this point, when it creates left and right, it is traversing the entirety of vec and copying each element one-by-one into the two new vectors.
In the newer, faster code, none of the elements are ever copied into a new vector; the entire algorithm takes place in the exact memory space in which the original numbers existed.

Vectors are arrays, so inserting or deleting elements anywhere other than the end position is done by relocating all the elements that were after that position to their new positions.

Related

I have a question about the merge sort algorithm

I've looked at the merge sort example code, but there's something I don't understand.
void mergesort(int left, int right)
{
    //arr and LEN are globals defined elsewhere
    if (left < right)
    {
        int sorted[LEN];
        int mid, p1, p2, idx;
        mid = (left + right) / 2;
        mergesort(left, mid);
        mergesort(mid + 1, right);
        p1 = left;
        p2 = mid + 1;
        idx = left;
        while (p1 <= mid && p2 <= right)
        {
            if (arr[p1] < arr[p2])
                sorted[idx++] = arr[p1++];
            else
                sorted[idx++] = arr[p2++];
        }
        while (p1 <= mid)       //copy any leftovers of the left run
            sorted[idx++] = arr[p1++];
        while (p2 <= right)     //copy any leftovers of the right run
            sorted[idx++] = arr[p2++];
        for (int i = left; i <= right; i++)
            arr[i] = sorted[i];
    }
}
In this code, I don't understand the third while loop.
In detail, this code inserts arr[p1] and arr[p2] in order into the sorted array.
I want to know how this while loop creates an ascending array.
I would appreciate it if you could write your answer in detail so that I can understand why the array is sorted in ascending order.
Merge sort divides an array of n elements into n runs of 1 element each. Each of those single element runs can be considered sorted since they only contain a single element. Pairs of single element runs are merged to create sorted runs of 2 elements each. Pairs of 2 element runs are merged to create sorted runs of 4 elements each. The process continues until a sorted run equal in size to the original array is created.
The example in the question is a top down merge sort that recursively splits the array in half until a base case of a single element run is reached. After this, merging follows the call chain, depth first, left first. Most libraries use some variation of bottom up merge sort (along with insertion sort to detect or create small sorted runs). With a bottom up merge sort there's no recursive splitting: an array of n elements is treated as n runs of 1 element each, and each merge pass merges even and odd runs, left to right. After ceiling(log2(n)) passes, the array is sorted.
The example code has an issue: it allocates an entire array on the stack at each level of recursion, which will result in stack overflow for large arrays. The Wiki examples are better, although the bottom up example should swap references rather than copy the array.
https://en.wikipedia.org/wiki/Merge_sort
For the question's code, you might as well make sorted a global array, or at least declare it static (a single instance):
static int arr[LEN];
static int sorted[LEN];
void mergesort(int left, int right)
/* ... */
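To make the bottom up description concrete, here is a minimal, self-contained sketch of a bottom up merge sort. It is not the optimized version from the link; in particular it copies back after every pass instead of swapping references:
#include <algorithm>
#include <vector>

//bottom up merge sort: runs of size 1, 2, 4, ... are merged in ceiling(log2(n)) passes
void merge_sort_bottom_up(std::vector<int>& a) {
    std::size_t n = a.size();
    std::vector<int> tmp(n);
    for (std::size_t run = 1; run < n; run *= 2) {        //run size doubles each pass
        for (std::size_t lo = 0; lo < n; lo += 2 * run) { //merge a[lo..mid) with a[mid..hi)
            std::size_t mid = std::min(lo + run, n);
            std::size_t hi = std::min(lo + 2 * run, n);
            std::merge(a.begin() + lo, a.begin() + mid,
                       a.begin() + mid, a.begin() + hi,
                       tmp.begin() + lo);
        }
        std::copy(tmp.begin(), tmp.end(), a.begin());     //one copy back per pass
    }
}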
I'm a developer working in the field.
I was surprised to see you implementing merge sort.
Before we start, the time complexity of merge sort is O(n log n).
The reason can be found in the merge sort process!
First, let's assume that there is an unordered array.
The merge sort process:
Divide it into arrays of size 1, one per element of the original array.
Create an array that is twice the size of the divided arrays.
Compare the elements of the two divided arrays and put the smaller elements, in order, into the created array.
Repeat this process until it reaches the size of the original array.
There is a reason why the time complexity of merge sort is O(n log n). The log n factor appears because the array is repeatedly divided in half, and each of those log n levels merges all n elements once, giving n log n in total.
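As a short worked version of that argument, with c the per-element merge cost:
T(n) = 2T(n/2) + cn
     = 4T(n/4) + 2cn
     = 8T(n/8) + 3cn
     ...
     = nT(1) + cn*log2(n)
     = O(n log n)
Each of the log2(n) halving levels merges all n elements exactly once.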

Find out in linear time whether there is a pair in sorted vector that adds up to certain value

Given an std::vector of distinct elements sorted in ascending order, I want to develop an algorithm that determines whether there are two elements in the collection whose sum is a certain value, sum.
I've tried two different approaches with their respective trade-offs:
I can scan the whole vector and, for each element, apply binary search (std::lower_bound) on the vector to look for an element equal to the difference between sum and the current element. This is an O(n log n) time solution that requires no additional space.
I can traverse the whole vector and populate an std::unordered_set. Then, I scan the vector and, for each element, look up the difference between sum and the current element in the std::unordered_set. Since searching a hash table runs in constant time on average, this solution runs in linear time. However, it requires additional linear space for the std::unordered_set data structure.
Nevertheless, I'm looking for a solution that runs in linear time and requires no additional linear space. Any ideas? It seems that I'm forced to trade speed for space.
As the std::vector is already sorted and you can calculate the sum of a pair on the fly, you can achieve a linear time solution in the size of the vector with O(1) space.
The following is an STL-like implementation that requires no additional space and runs in linear time:
template<typename BidirIt, typename T>
bool has_pair_sum(BidirIt first, BidirIt last, T sum) {
    if (first == last)
        return false; // empty range
    for (--last; first != last;) {
        if ((*first + *last) == sum)
            return true; // pair found
        if ((*first + *last) > sum)
            --last; // decrease pair sum
        else // (*first + *last) < sum (trichotomy)
            ++first; // increase pair sum
    }
    return false;
}
The idea is to traverse the vector from both ends – front and back – in opposite directions at the same time and calculate the sum of the pair of elements while doing so.
At the very beginning, the pair consists of the elements with the lowest and the highest values, respectively. If the resulting sum is lower than sum, then advance first – the iterator pointing at the left end. Otherwise, move last – the iterator pointing at the right end – backward. This way, the resulting sum progressively approaches sum. If both iterators end up pointing at the same element and no pair whose sum is equal to sum has been found, then there is no such pair.
#include <iostream>
#include <vector>

auto main() -> int {
    std::vector<int> vec{1, 3, 4, 7, 11, 13, 17};
    std::cout << has_pair_sum(vec.begin(), vec.end(), 2) << ' ';
    std::cout << has_pair_sum(vec.begin(), vec.end(), 7) << ' ';
    std::cout << has_pair_sum(vec.begin(), vec.end(), 19) << ' ';
    std::cout << has_pair_sum(vec.begin(), vec.end(), 30) << '\n';
}
The output is:
0 1 0 1
Thanks to the generic nature of the function template has_pair_sum() and since it just requires bidirectional iterators, this solution works with std::list as well:
std::list<int> lst{1, 3, 4, 7, 11, 13, 17};
has_pair_sum(lst.begin(), lst.end(), 2);
I had the same idea as the one in the answer of 眠りネロク, but with a slightly more comprehensible implementation.
bool has_pair_sum(const std::vector<int>& v, int sum){
    if(v.empty())
        return false;
    std::vector<int>::const_iterator p1 = v.begin();
    std::vector<int>::const_iterator p2 = v.end(); // points one past the last element
    p2--; // now it points to the last element
    while(p1 != p2){
        if(*p1 + *p2 == sum)
            return true;
        else if(*p1 + *p2 < sum){
            p1++;
        }else{
            p2--;
        }
    }
    return false;
}
Well, since we are already given a sorted array, we can use the two pointer approach. We first keep a left pointer at the start of the array and a right pointer at the end. In each iteration we check whether the sum of the values at the left and right pointer indexes equals the given sum; if yes, we return from there. Otherwise we have to decide how to shrink the boundary, that is, whether to increase the left pointer or decrease the right pointer. If this temporary sum is greater than the given sum, we decrease the right pointer: increasing the left pointer could only keep the temporary sum the same or make it larger, never smaller, so decreasing the right pointer is what brings us nearer to the given sum. Similarly, if the temporary sum is less than the given sum, there is no point in decreasing the right pointer, as that can only keep the temporary sum the same or decrease it further, so we increase the left pointer to bring the temporary sum up toward the given sum. We repeat this process until we find an equal sum or the left pointer crosses the right pointer.
Below is the code for demonstration; let me know if something is not clear.
bool pairSumExists(vector<int> &a, int &sum){
    if(a.empty())
        return false;
    int len = a.size();
    int left_pointer = 0, right_pointer = len - 1;
    while(left_pointer < right_pointer){
        if(a[left_pointer] + a[right_pointer] == sum){
            return true;
        }
        if(a[left_pointer] + a[right_pointer] > sum){
            --right_pointer;
        }
        else if(a[left_pointer] + a[right_pointer] < sum){
            ++left_pointer;
        }
    }
    return false;
}

Iterative Merge Sort, works same speed as Bubblesort

I have tried to implement an iterative merge sort using nested loops. Although this algorithm does sort correctly (after sorting, things are in the correct order), I know there is something wrong with this implementation: when I sort larger collections with it and compare timings against slower sorts, I get slow times for this iterative implementation. For example, sorting 500 items takes 31 milliseconds with this implementation, just like bubble sort does.
int main()
{
    int size;
    cin >> size;
    //assume vector is already initialized with values & size
    vector<int> items(size);
    IterativeMergeSort(items, 0, size - 1);
}

void IterativeMergeSort(vector<int> &items, int start, int end)
{
    vector<int> temp(items.size());
    int left, middle, right;
    for(int outer = 1; outer < 2; outer *= 2)
    {
        for(int inner = start; inner < end; inner = inner * outer + 1)
        {
            left = outer - 1;
            middle = inner;
            right = inner + 1;
            ItMerge(items, left, middle, right, temp);
        }
    }
}
void ItMerge(vector<int> &items, int start, int mid, int end, vector<int> &temp)
{
    int first1 = start;
    int last1 = mid;
    int first2 = mid + 1;
    int last2 = end;
    int index = first1;
    while(first1 <= last1 && first2 <= last2)
    {
        if(items[first1] <= items[first2])
        {
            temp[index] = items[first1];
            first1++;
        }
        else
        {
            temp[index] = items[first2];
            first2++;
        }
        index++;
    }
    while(first1 <= last1)
    {
        temp[index] = items[first1];
        first1++;
        index++;
    }
    while(first2 <= last2)
    {
        temp[index] = items[first2];
        first2++;
        index++;
    }
    for(index = start; index <= end; index++)
    {
        items[index] = temp[index];
    }
}
Your algorithm isn't merge sort. It tries to be, but it isn't.
As I understand it, what is supposed to happen is that the inner loop steps over subsequences and merges them, while the outer loop controls the inner loop's sequence length, starting with 1 and doubling on every iteration until there are just two subsequences and they get merged.
But that's not what your algorithm is doing. The outer loop's condition is broken, so the outer loop will run exactly once. And the inner loop doesn't take roughly-equal subsequences in pairs. Instead, the right subsequence is exactly one element (mid is inner, right is inner+1) and the left subsequence is always everything used so far (left is outer-1, and outer is constant 1). So the algorithm will repeatedly merge the already-sorted left subsequence with a single-element right subsequence.
This means that in effect, your algorithm is insertion sort, except that you don't insert in place, but instead copy the sorted sequence to a buffer, inserting the new element at the right moment, then copy the result back. So it's a very inefficient insertion sort.
Below is a link to somewhat optimized examples of top down and bottom up merge sort. The bottom up merge sort is a bit faster because it skips the recursive sequence used to repeatedly generate sub-pairs of indexes until a sub-pair represents a run of size 1. Most of the time is spent merging, so bottom up isn't that much faster. The first pass of the bottom up merge sort could be optimized by swapping pairs in place rather than copying them. The bottom up merge sort ends up with the sorted data in either the temp or the original array. If the original array is wanted, then a pass count can be calculated, and if the count is odd, the first pass swaps in place.
Both versions can sort 4 million 64 bit unsigned integers in less than a second on my system (Intel Core i7 2600k 3.4ghz).
merge_sort using vectors works well with less than 9 inputs
For a vector or array of integers, a counting / radix sort would be faster still.
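For example, here is a minimal counting sort sketch; it is a hypothetical helper that assumes every value is a non-negative integer no larger than maxval:
#include <vector>

//counting sort: O(n + maxval) time, O(maxval) extra space
void counting_sort(std::vector<int>& a, int maxval) {
    std::vector<int> count(maxval + 1, 0);
    for (int x : a)
        ++count[x];                    //histogram of values
    std::size_t idx = 0;
    for (int v = 0; v <= maxval; ++v)  //rewrite the array in ascending order
        while (count[v]-- > 0)
            a[idx++] = v;
}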
I've finally figured it out.
In pseudocode:
for (outer = 1; outer < length; outer *= 2)
    for (inner = 0; inner < length; inner = inner + (outer * 2))
        left = inner
        middle = (inner + outer) - 1
        right = (inner + (outer * 2)) - 1
        merge(items, left, middle, right, temp)
After rethinking how the iterative merge sort is supposed to work and looking at a couple of implementations, all I needed in the merge method was to check whether the middle and right indexes passed in were greater than or equal to the vector size (that way we handle any values that could be out of bounds), then merge as usual; a sketch of this is below. Also, looking at this helped greatly in understanding it; also this. Just to be sure that it works as well as a recursive merge sort, I timed both, and both (recursive and iterative) implementations produced identical times for 500, 1000, 5000, and 10K values (in some cases the iterative solution was faster).
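In C++, the fixed driver might look like the following sketch. It reuses ItMerge from the question unchanged and clamps middle and right at the last index, which is equivalent to the bounds check described above:
void IterativeMergeSort(vector<int> &items, int start, int end)
{
    vector<int> temp(items.size());
    int length = end - start + 1;
    for (int outer = 1; outer < length; outer *= 2)               //run size: 1, 2, 4, ...
    {
        for (int inner = start; inner < end; inner += outer * 2)  //merge adjacent pairs of runs
        {
            int left = inner;
            int middle = min(inner + outer - 1, end);             //clamp at the last index
            int right = min(inner + outer * 2 - 1, end);          //clamp at the last index
            ItMerge(items, left, middle, right, temp);
        }
    }
}
When a run has no right-hand partner, middle == right and ItMerge simply copies that run through temp, which is harmless.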

C++, fastest STL container for recursively doing { delete[begin], insert[end], and summing entire array contents}

I have some code and an array where each iteration I delete the first element, add an element at the end, and then sum the contents of the array. Naturally, the array stays the same size. I have tried using both vector and list, but both seem pretty slow.
int length = 400;
//x (iteration count), y (container flag), and z (fill value) are defined elsewhere
vector<int> v_int(length, z);
list<int> l_int(length, z);
for(int q = 0; q < x; q++)
{
    int sum = 0;
    if(y) //if using vector
    {
        v_int.erase(v_int.begin()); //takes `length` amount of time to shift memory
        v_int.push_back(z);
        for(int w = 0; w < v_int.size(); w++)
            sum += v_int[w];
    }
    else //if using list
    {
        l_int.pop_front(); //constant time
        l_int.push_back(z);
        list<int>::iterator it;
        for(it = l_int.begin(); it != l_int.end(); it++) //seems to take much
            sum += *it;                                  //longer than vector does
    }
}
The problem is that erasing the first element of the vector requires every other element to be shifted down, which multiplies the time taken per iteration by the size of the vector. Using a linked list avoids this (constant time removal of elements) and should not sacrifice any time summing the array (linear time traversal), except that in my program it seems to take far longer to sum the contents than the vector does (at least one order of magnitude longer).
Is there a better container to use here? or a different way to approach the problem?
Why not keep a running sum with sum -= l_int.front(); sum += z?
Also, the data structure you're looking for with that delete/insert performance is a queue.
Efficient additions and deletions of elements at the ends of a container are what the deque was made for.
If you are just inserting at one end and deleting at the other then you can use a queue
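Putting those two suggestions together, here is a minimal sketch; length, x, and z are the question's variables (assumed defined by the caller), and the helper name is made up:
#include <deque>
#include <numeric>

long long rolling_sum(int length, int x, int z) {
    std::deque<int> d_int(length, z);
    //compute the full sum once, then maintain it incrementally
    long long sum = std::accumulate(d_int.begin(), d_int.end(), 0LL);
    for (int q = 0; q < x; q++) {
        sum -= d_int.front();  //O(1): no shifting, unlike vector::erase(begin())
        d_int.pop_front();
        d_int.push_back(z);    //O(1) at the back
        sum += z;              //running sum: no O(n) traversal per iteration
    }
    return sum;
}
This turns each iteration from O(length) into O(1), regardless of which container is used.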

How to implement a minimum heap sort to find the kth smallest element?

I've been implementing selection sort problems for class and one of the assignments is to find the kth smallest element in the array using a minimum heap. I know the procedure is:
heapify the array
delete the minimum (root) k times
return kth smallest element in the group
I don't have any problems creating a minimum heap. I'm just not sure how to go about properly deleting the minimum k times and successfully return the kth smallest element in the group. Here's what I have so far:
bool Example::min_heap_select(long k, long & kth_smallest) const {
    //duplicate test group (thanks, const!)
    Example test = Example(*this);
    //variable declaration and initialization
    int n = test._total;
    int i;
    //Heapifying stage (THIS WORKS CORRECTLY)
    for (i = n/2; i >= 0; i--) {
        //allows for heap construction
        test.percolate_down_protected(i, n);
    }//for
    //Delete min phase (THIS DOESN'T WORK)
    for(i = n-1; i >= (n-k+1); i--) {
        //deletes the min by swapping elements
        int tmp = test._group[0];
        test._group[0] = test._group[i];
        test._group[i] = tmp;
        //resumes perc down
        test.percolate_down_protected(0, i);
    }//for
    //IDK WHAT TO RETURN
    kth_smallest = test._group[0];
    return true;
}
void Example::percolate_down_protected(long i, long n) {
    //variable declaration and initialization:
    int currPos, child, r_child, tmp;
    currPos = i;
    tmp = _group[i];
    child = left_child(i);
    //set a sentinel and begin loop (no recursion allowed)
    while (child < n) {
        //calculates the right child's position
        r_child = child + 1;
        //for a min heap, follow the smaller of the two children
        if ((r_child < n) && (_group[r_child] < _group[child])) {
            child = r_child;
        }
        //find the correct spot
        if (tmp <= _group[child]) {
            break;
        }
        //make sure the smaller child is beneath the parent
        _group[currPos] = _group[child];
        //shift the tree down
        currPos = child;
        child = left_child(currPos);
    }
    //put tmp where it belongs
    _group[currPos] = tmp;
}
As I stated before, the minimum heap part works correctly. I understand what I want to do: it seems easy to delete the root k times, but after that, what index in the array do I return... 0? This almost works, but it doesn't work with k = n or k = 1. Would the kth smallest element be in the root? Any help would be much appreciated!
The only array index which is meaningful to the user is zero, which is the minimum element. So, after removing k - 1 elements, the k'th smallest element will be at zero.
Probably you should destroy the heap and return the value rather than asking the user to concern themself with the heap itself… but I don't know the details of the assignment.
Note that the C++ Standard Library has algorithms to help with this: make_heap, pop_heap, and nth_element.
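For illustration, here are hedged sketches of both library routes; the helper names are made up, and k is assumed 1-based with 1 <= k <= v.size():
#include <algorithm>
#include <functional>
#include <vector>

//min-heap route: build a heap, then pop the minimum k times
long kth_smallest_heap(std::vector<long> v, long k) {
    std::make_heap(v.begin(), v.end(), std::greater<long>());        //min-heap
    for (long i = 0; i < k; ++i)
        std::pop_heap(v.begin(), v.end() - i, std::greater<long>()); //min moves to the back
    return v[v.size() - k]; //where the k'th pop landed
}

//selection route: average O(n), no heap at all
long kth_smallest_nth(std::vector<long> v, long k) {
    std::nth_element(v.begin(), v.begin() + (k - 1), v.end());
    return v[k - 1];
}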
I am not providing a detailed answer, just explaining the key points of getting the k smallest elements out of a min-heap ordered tree. The approach uses skip lists.
First, form a skip list of nodes of the tree with just one element: the node corresponding to the root of the heap. The 1st minimum element is just the value stored at this node.
Now delete this node and insert its child nodes in the right positions so as to maintain the order of values. This step takes O(log k) time.
The second minimum value is then just the value at the first node in this skip list.
Repeat the above steps until you get all the k minimum elements. The overall time for this phase will be log(2) + log(3) + ... + log(k) = O(k log k). Forming the heap takes O(n) time, so the overall time complexity is O(n + k log k).
There is one more approach, without making a heap: Quickselect, which has an average time complexity of O(n) but a worst case of O(n^2).
The striking difference between the two approaches is that the first gives all k elements from the minimum up to the kth minimum, while quickselect gives only the kth minimum element.
Memory-wise, the former approach uses O(n) extra space while quickselect uses O(1).
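The same frontier idea can be sketched with std::priority_queue over heap-array indices instead of a skip list. This is a hypothetical helper, assuming the min-heap is stored in array form and that 1 <= k <= h.size():
#include <functional>
#include <queue>
#include <vector>

//returns the k'th smallest value in a valid array-form min-heap h (k is 1-based)
int kth_smallest_in_heap(const std::vector<int>& h, int k) {
    auto by_value = [&h](int a, int b) { return h[a] > h[b]; };
    std::priority_queue<int, std::vector<int>, decltype(by_value)>
        candidates(by_value);          //min-priority-queue over heap indices
    candidates.push(0);                //start the frontier at the root
    int value = h[0];
    for (int i = 0; i < k; ++i) {      //each pop yields the next smallest element
        int node = candidates.top();
        candidates.pop();
        value = h[node];
        int l = 2 * node + 1, r = 2 * node + 2; //children in array layout
        if (l < (int)h.size()) candidates.push(l);
        if (r < (int)h.size()) candidates.push(r);
    }
    return value;
}
Each of the k pops does O(log k) work on a frontier of at most k + 1 candidates, matching the O(k log k) bound above.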