How to make partition inside Quicksort function? - c++

This is the question that I am working on
// Sorts an array arr[low..high]
randQuickSort(arr[], low, high)
If low >= high, then EXIT.
While pivot 'x' is not a Central Pivot.
(i) Choose uniformly at random a number from [low..high].
Let the randomly picked number number be x.
(ii) Count elements in arr[low..high] that are smaller
than arr[x]. Let this count be sc.
(iii) Count elements in arr[low..high] that are greater
than arr[x]. Let this count be gc.
(iv) Let n = (high-low+1). If sc >= n/4 and
gc >= n/4, then x is a central pivot.
3. Partition arr[low..high] around the pivot x.
// Recur for smaller elements
randQuickSort(arr, low, sc-1)
// Recur for greater elements
randQuickSort(arr, high-gc+1, high)
I have done up to part (iv) and then got stuck at step 3, which is to partition arr[low..high] around the pivot. The problem is that my professor told me to do the partition inside the randQuickSort function itself, and everywhere I have searched, the only examples I can find use a separate partition function.
This is the code I have done so far
#include <iostream>
using namespace std;
void randQuickSort(int arr[], int low, int high);
int main(){
int arr[10] = {1,2,12,3,68,6,10,32,34,0};
randQuickSort(arr,0,9);
for(int i: arr)
cout << i;
}
void randQuickSort(int arr[], int low, int high){
int x,sc,gc,n;
if( low >= high )
exit(0);
while(true){
x = rand%10;
for(int i = low; i <= high; i++){
if(arr[i] < x)
sc++;
else if(arr[i] > x)
gc++;
}
n = (high-low+1);
if(sc >= n/4 && gc >= n/4)
break;
}
randQuickSort(arr,low,sc-1);
randQuickSort(arr,high-gc+1,high);
}
I would like some explanation of how to go on from here, because I don't know how I should start.
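For what it's worth, here is a minimal sketch (my own illustration, not necessarily what the professor expects) of how the partition step could be written directly inside randQuickSort once the while loop has found a central pivot index x. It assumes x is a random index into arr, e.g. x = low + rand() % (high - low + 1) (note that rand%10 in the code above is not a valid call, sc and gc need to be reset to 0 on every pass of the while loop, and the exit(0) should be a plain return), and it uses std::swap from <utility>:
// After the while loop has found a central pivot index x, a Lomuto-style sweep:
swap(arr[x], arr[high]);               // park the pivot value at arr[high]
int pivot = arr[high];
int store = low;                       // next free slot for an element < pivot
for (int i = low; i < high; i++)
{
    if (arr[i] < pivot)
        swap(arr[i], arr[store++]);    // grow the "smaller than pivot" block
}
swap(arr[store], arr[high]);           // the pivot lands at its final index 'store'
// Recurse on the two sides of the pivot's final position:
randQuickSort(arr, low, store - 1);
randQuickSort(arr, store + 1, high);
Here the pivot's final index store plays the role that the counts sc and gc play in the recursion bounds of the pseudocode.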

Related

How is this function returning the correct answer (trying to find the minimum value in a sorted and rotated array)?

I am trying to find the minimum element in an array that has been sorted and rotated (Edit: distinct elements).
I wrote a function that uses binary search: it splits the array into smaller components and checks whether the "mid" value of the current component is greater than the "high" value or less than the "starting" value, and moves towards one end or the other, because by my rough logic the smallest element must be "somewhere around there".
Thinking about it a little harder, I realized that this function would not return the correct answer for a fully sorted component/array, but for some reason, it does.
// Function to find the minimum element in sorted and rotated array.
int minNumber(int arr[], int low, int high)
{
int mid = (low + high) / 2;
if (low > high)
return -1;
if (low == high)
return arr[low];
if (mid == 0 || arr[mid - 1] > arr[mid])
return arr[mid];
else if (arr[mid] > arr[high])
{
return (minNumber(arr, mid + 1, high));
}
else if (arr[mid] < arr[low])
{
return (minNumber(arr, low, mid - 1));
}
}
Passing the array "1,2,3,4,5,6,7,8,9" through it returns 1. I thought maybe the rax register might have accidentally stored a 1 at some point and the function was just spitting it out, since there was no valid return statement for it to go through. Running it through the g++ debugger line by line didn't really help, and I'm still confused as to how this code works.
FULL CODE
// { Driver Code Starts
#include <bits/stdc++.h>
using namespace std;
// } Driver Code Ends
class Solution
{
public:
// Function to find the minimum element in sorted and rotated array.
int minNumber(int arr[], int low, int high)
{
int mid = (low + high) / 2;
if (low > high)
return -1;
if (low == high)
return arr[low];
if (mid == 0 || arr[mid - 1] > arr[mid])
return arr[mid];
else if (arr[mid] > arr[high])
{
return (minNumber(arr, mid + 1, high));
}
else if (arr[mid] < arr[low])
{
return (minNumber(arr, low, mid - 1));
}
}
};
// { Driver Code Starts.
int main()
{
int t;
cin >> t;
while (t--)
{
int n;
cin >> n;
int a[n];
for (int i = 0; i < n; ++i)
cin >> a[i];
Solution obj;
cout << obj.minNumber(a, 0, n - 1) << endl;
}
return 0;
} // } Driver Code Ends
Short answer: Undefined behavior doesn't mean it can't give the right answer.
<source>: In member function 'int Solution::minNumber(int*, int, int)':
<source>:31:5: warning: control reaches end of non-void function [-Wreturn-type]
When the array isn't rotated you hit that case and you are simply getting lucky.
Some nitpicking:
turn on the compiler warnings and read them
int is too small for a large array
int mid = (low + high) / 2; is UB for decently sized arrays, use std::midpoint
minNumber should take a std::span; then you could use std::size_t size = span.size(), span.first(size / 2) and span.last(size - size / 2)
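To make the point concrete, here is a sketch of one way to close the fall-through path (an illustration, not the only possible fix): if none of the earlier branches fire, then arr[low] <= arr[mid] <= arr[high], so the segment contains no rotation point, is already sorted, and its minimum is simply the first element.
int minNumber(int arr[], int low, int high)
{
    if (low > high)
        return -1;
    if (low == high)
        return arr[low];
    int mid = (low + high) / 2;                  // still subject to the overflow caveat above
    if (mid == 0 || arr[mid - 1] > arr[mid])
        return arr[mid];
    if (arr[mid] > arr[high])
        return minNumber(arr, mid + 1, high);
    if (arr[mid] < arr[low])
        return minNumber(arr, low, mid - 1);
    return arr[low];                             // segment is sorted: the first element is the minimum
}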

How to count the amount of comparisons in a quick sort algorithm?

I have the following quick sort algorithm:
int quicksort(int data[], size_t n, int &counter)
// Library facilities used: cstdlib
{
size_t pivot_index; // Array index for the pivot element
size_t n1; // Number of elements before the pivot element
size_t n2; // Number of elements after the pivot element
if (n > 1)
{
// Partition the array, and set the pivot index.
partition(data, n, pivot_index, counter);
// Compute the sizes of the subarrays.
n1 = pivot_index;
n2 = n - n1 - 1;
// Recursive calls will now sort the subarrays.
quicksort(data, n1, counter);
quicksort((data + pivot_index + 1), n2, counter);
}
return counter;
}
void partition(int data[], size_t n, size_t& pivot_index, int &counter){
int pivot = data[0];
size_t too_big_index = 1;
size_t too_small_index = n - 1;
while (too_big_index <= too_small_index)
{
while (++counter && (too_big_index < n) && (data[too_big_index] <= pivot)) too_big_index++;
while (++counter && data[too_small_index] > pivot ) too_small_index--;
counter++;
if (too_big_index < too_small_index) swap(data[too_big_index], data[too_small_index]);
};
pivot_index = too_small_index;
data[0] = data[pivot_index];
data[pivot_index] = pivot;
}
I have added the three counter increments in the partition function. However, the counter comes out with a value of 32019997 when using a sorted array of 8000 elements. I am using the left-most element as the pivot (I know that gives a terrible worst case on a sorted array), but unless I am mistaken, shouldn't the worst case be n^2, i.e. 64000000? So I assume the way I am counting comparisons is wrong, but I am not sure how.
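As a back-of-the-envelope check (my own tracing of the code above, so treat it as a sketch rather than a verified answer): on an already sorted array with the left-most element as pivot, each call to partition on an m-element subarray performs m + 2 counter increments (one failing check in the too_big loop, m checks in the too_small loop, plus the unconditional counter++), and the recursion partitions subarrays of sizes 8000, 7999, ..., 2. The total is then the sum over m = 2..8000 of (m + 2) = 32,003,999 + 15,998 = 32,019,997, exactly the reported value. So the counting looks consistent; n^2 is only an asymptotic bound, and the exact worst-case count here is closer to n^2/2 than to n^2.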

counting inversions with merge sort gives a negative number if the array length is 100000

I am still a beginner at programming and I am taking an online course (algorithms).
One of the practice questions was to count the number of inversions in a file containing 100000 randomly ordered numbers. I have tried this code on small data sets and it worked fine, but when passing the actual data set it gives a negative inversion count. I have tried various solutions from different platforms but still couldn't resolve it.
So this is my code:
#include "stdafx.h"
#include <iostream>
#include <conio.h>
#include <fstream>
using namespace std;
long merge(int a[], int start, int mid, int end)
{
int i = start;
int j = mid + 1;
int k = start;
int inversion=0;
int temp[100000];
while (i <= mid && j <= end)
{
if (a[i] < a[j])
{
temp[k++] = a[i++];
}
else
{
temp[k++] = a[j++];
inversion =inversion + (mid - i);
}
}
while (i <= mid)
{
temp[k++] = a[i++];
}
while (j <= end)
{
temp[k++] = a[j++];
}
for (int i = start; i <= end; i++)
{
a[i] = temp[i];
}
return inversion;
}
long Msort(int a[], int start,int end)
{
if (start >= end)
{
return 0;
}
int inversion = 0;
int mid = (start + end) / 2;
inversion += Msort(a, start, mid);
inversion += Msort(a, mid + 1, end);
inversion += merge(a, start, mid, end);
return inversion;
}
long ReadFromFile(char FileName[], int storage[],int n)
{
int b;
int count=0;
ifstream get(FileName);
if (!get)
{
cout << "no file found";
}
while (!get.eof())
{
get >> storage[count];
count++;
}
b = count;
return b;
}
int main()
{
int valuescount = 0;
int arr[100000];
char filename[] = { "file.txt" };
long n = sizeof(arr) / sizeof(arr[0]);
valuescount=ReadFromFile(filename, arr,n);
int no_Of_Inversions = Msort(arr, 0, valuescount -1);
cout << endl << "No of inversions are" << '\t' << no_Of_Inversions <<'\t';
cout <<endl<< "Total no of array values sorted"<< valuescount<<endl;
system("pause");
}
The issue with your code is not directly related to the input size. Rather, in an indirect way, the negative number of inversions you find is the result of an overflow in the variable inversion of the function merge.
Consider the case for your input size N = 100000. If this array of numbers is sorted in decreasing order, then every ordered pair in that array is an inversion. In other words, there will be N * (N-1) / 2 inversions to be counted. As you may have noticed, that value is higher than the upper bound of the unsigned int type. Consequently, when you try to count this value in a variable of type int, overflow occurs, leading to a negative result.
To remedy this issue, you should change the type of the variable inversion from int to long long in the functions merge and Msort. (You should also update the return types of merge and Msort.) Naturally, you should assign the return value of the Msort call in the main function to a variable of type long long as well; in other words, change the type of the variable no_Of_Inversions to long long too.
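A minimal sketch of those changes (only the types and signatures change; the function bodies stay as posted in the question):
long long merge(int a[], int start, int mid, int end)         // was: long merge(...)
long long Msort(int a[], int start, int end)                  // was: long Msort(...)
long long inversion = 0;                                       // was: int inversion = 0; (in both functions)
// and in main():
long long no_Of_Inversions = Msort(arr, 0, valuescount - 1);  // was: int no_Of_Inversions = ...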

Why does the quick sort algorithm duration increase when the array has duplicate values?

I am trying to measure the duration of both the Merge Sort and Quick Sort functions using std::chrono time calculations and randomly generated arrays of integers within some range [A, B]; the sizes of the arrays vary from 5000 to 100,000 integers.
The goal of my code is to show that when the method of picking the pivot in quick sort is improved, the quick sort function ends up taking less time to process the array than merge sort. The way I pick the pivot is the random-index method, to minimize the probability of hitting the (n^2) worst case. However, in some cases, which I describe below, quick sort ends up taking more time than merge sort, and I would like to know why this occurs.
case 1:
The range of the numbers in the array is small which increases the probability of having duplicate numbers in the array.
case 2:
When I use a local IDE like clion, the quick sort function takes a lot more time than merge sort; however, an online compiler like IDEONE.com gives similar results for both sorting algorithms (even when the range of the generated integers is small).
Here are the results I got in the mentioned cases (the first row of numbers is the merge sort results, the second row is the quick sort results):
1-clion results narrow range of numbers (-100, 600)
2-clion results with a wide range of numbers (INT_MIN, INT_MAX)
3-IDEONE results with a narrow range of numbers (-100, 600)
4- IDEONE results with a wide range of numbers (INT_MIN, INT_MAX)
#include <bits/stdc++.h>
#include <chrono>
#include <random>
using namespace std;
mt19937 gen(chrono::steady_clock::now().time_since_epoch().count());
int* generateArray(int size)
{
int* arr = new int[size];
uniform_int_distribution<> distribution(INT_MIN, INT_MAX);
for (int i=0; i < size; ++i)
{
arr[i] = distribution(gen);
}
return arr;
}
void merge(int* leftArr, int nL, int* rightArr, int nR, int* mainArr)
{
int i=0, j=0, k=0;
while (i < nL && j < nR)
{
if (leftArr[i] < rightArr[j]) { mainArr[k++] = leftArr[i++]; }
else { mainArr[k++] = rightArr[j++]; }
}
while (i < nL){ mainArr[k++] = leftArr[i++]; }
while (j < nR){ mainArr[k++] = rightArr[j++]; }
}
void mergeSort (int* mainArray, int arrayLength)
{
if (arrayLength < 2) { return; }
int mid = arrayLength/2;
int* leftArray = new int[mid];
int* rightArray = new int[arrayLength - mid];
for (int i=0; i<mid; ++i) {leftArray[i] = mainArray[i];}
for (int i = mid; i<arrayLength; ++i) {rightArray[i - mid] = mainArray[i];}
mergeSort(leftArray, mid);
mergeSort(rightArray, arrayLength-mid);
merge(leftArray, mid, rightArray, arrayLength-mid, mainArray);
delete[] leftArray;
delete[] rightArray;
}
int partition (int* arr, int left, int right)
{
uniform_int_distribution<> distribution(left, right);
int idx = distribution(gen);
swap(arr[right], arr[idx]);
int pivot = arr[right];
int partitionIndex = left;
for (int i = left; i < right; ++i)
{
if (arr[i] <= pivot)
{
swap(arr[i], arr[partitionIndex]);
partitionIndex++;
}
}
swap(arr[right], arr[partitionIndex]);
return partitionIndex;
}
void quickSort (int* arr, int left, int right)
{
if(left < right)
{
int partitionIndex = partition(arr, left, right);
quickSort(arr, left, partitionIndex-1);
quickSort(arr, partitionIndex+1, right);
}
}
int main()
{
vector <long long> mergeDuration;
vector <long long> quickDuration;
for (int i = 5000; i<= 100000; i += 5000)
{
int* arr = generateArray(i);
auto startTime = chrono::high_resolution_clock::now();
quickSort(arr, 0, i - 1);
auto endTime = chrono::high_resolution_clock::now();
long long duration = chrono::duration_cast<chrono::milliseconds>(endTime - startTime).count();
quickDuration.push_back(duration);
delete[] arr;
}
for (int i = 5000; i <= 100000; i += 5000 )
{
int* arr = generateArray(i);
auto startTime = chrono::high_resolution_clock::now();
mergeSort(arr, i);
auto endTime = chrono::high_resolution_clock::now();
long long duration = chrono::duration_cast<chrono::milliseconds>(endTime - startTime).count();
mergeDuration.push_back(duration);
delete[] arr;
}
for (int i = 0; i<mergeDuration.size(); ++i)
{
cout << mergeDuration[i] << " ";
}
cout << endl;
for (int i = 0; i<quickDuration.size(); ++i)
{
cout << quickDuration[i] << " ";
}
}
Quicksort is known to exhibit poor performance when the input set contains lots of duplicates. The solution is to use three-way partitioning as described on Wikipedia:
Repeated elements
With a partitioning algorithm such as the ones described above (even with one that chooses good pivot values), quicksort exhibits poor performance for inputs that contain many repeated elements. The problem is clearly apparent when all the input elements are equal: at each recursion, the left partition is empty (no input values are less than the pivot), and the right partition has only decreased by one element (the pivot is removed). Consequently, the algorithm takes quadratic time to sort an array of equal values.
To solve this problem (sometimes called the Dutch national flag problem), an alternative linear-time partition routine can be used that separates the values into three groups: values less than the pivot, values equal to the pivot, and values greater than the pivot. ... The values equal to the pivot are already sorted, so only the less-than and greater-than partitions need to be recursively sorted. In pseudocode, the quicksort algorithm becomes
algorithm quicksort(A, lo, hi) is
if lo < hi then
p := pivot(A, lo, hi)
left, right := partition(A, p, lo, hi) // note: multiple return values
quicksort(A, lo, left - 1)
quicksort(A, right + 1, hi)
The partition algorithm returns indices to the first ('leftmost') and to the last ('rightmost') item of the middle partition. Every item of the partition is equal to p and is therefore sorted. Consequently, the items of the partition need not be included in the recursive calls to quicksort.
The following modified quickSort gives much better results:
pair<int,int> partition(int* arr, int left, int right)
{
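// note: 'right' appears to be used here as an exclusive bound (one past the last
// element), so b = right - 1 below is the last index examined and the call site
// becomes quickSort(arr, 0, size) rather than quickSort(arr, 0, size - 1)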
int idx = left + (right - left) / 2;
int pivot = arr[idx]; // to be improved to median-of-three
int i = left, j = left, b = right - 1;
while (j <= b) {
auto x = arr[j];
if (x < pivot) {
swap(arr[i], arr[j]);
i++;
j++;
} else if (x > pivot) {
swap(arr[j], arr[b]);
b--;
} else {
j++;
}
}
return { i, j };
}
void quickSort(int* arr, int left, int right)
{
if (left < right)
{
pair<int, int> part = partition(arr, left, right);
quickSort(arr, left, part.first);
quickSort(arr, part.second, right);
}
}
Output:
0 1 2 3 4 5 6 7 8 9 11 11 12 13 14 15 16 19 18 19
0 0 0 1 0 1 1 1 1 1 2 3 2 2 2 2 3 3 3 3
0 1 2 3 4 5 6 6 8 8 9 12 11 12 13 14 16 17 18 19
0 0 1 1 1 2 3 3 3 4 4 4 5 6 5 6 7 7 8 8
So, the run with lots of duplicates is now much faster.
Why does the quick sort algorithm duration increase when the array has duplicate values?
This is only true when using a Lomuto-type partition scheme, where duplicate values cause the splitting to get worse.
When using a Hoare partition scheme, the algorithm's duration generally decreases when the array has duplicate values, because the splitting gets closer to the ideal case of splitting exactly in half, and the improved splitting compensates for the extra swaps on a typical system with a memory cache.
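For reference, here is a minimal sketch of a Hoare-style partition (my own illustration of the scheme mentioned above, not code from either answer). It assumes an inclusive right bound and std::swap; both scans deliberately stop on elements equal to the pivot, which is what keeps the splits balanced when there are many duplicates:
int hoarePartition(int* arr, int left, int right)          // 'right' is inclusive here
{
    int pivot = arr[left + (right - left) / 2];
    int i = left - 1;
    int j = right + 1;
    while (true)
    {
        do { ++i; } while (arr[i] < pivot);                // stops at values >= pivot
        do { --j; } while (arr[j] > pivot);                // stops at values <= pivot
        if (i >= j)
            return j;                                      // arr[left..j] <= pivot <= arr[j+1..right]
        swap(arr[i], arr[j]);
    }
}
void hoareQuickSort(int* arr, int left, int right)
{
    if (left < right)
    {
        int p = hoarePartition(arr, left, right);
        hoareQuickSort(arr, left, p);                      // the pivot's slot is not final,
        hoareQuickSort(arr, p + 1, right);                 // so index p stays on the left side
    }
}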

Insertion Sort algorithm by using binary search error

I ran the following code, which is an insertion sort algorithm that uses binary search instead of linear search to find the right position for the item being inserted, but two numbers in the result are not sorted correctly!
#include <iostream>
using namespace std;
void insertion_sort (int a[], int n /* the size of array */)
{
int i, temp,j;
for (i = 1; i < n; i++)
{
/* Assume items before a[i] are sorted. */
/* Pick an number */
temp = a[i];
/* Do binary search to find out the
point where b is to be inserted. */
int low = 0, high = i - 1, k;
while (high-low>1)
{
int mid = (high + low) / 2;
if (temp <= a[mid])
high = mid;
else
low = mid;
}
/* Shift items between high and i by 1 */
for (k = i; k > high; k--)
a[k] = a[k - 1];
a[high] = temp;
}
}
int main()
{
int A[15]={9,5,98,2,5,4,66,12,8,54,0,11,99,55,13};
insertion_sort(A,15);
for (int i=0; i<15; i++)
cout<<A[i]<<endl;
system("pause");
return 0;
}
Output:
Why?
#include <iostream>
using namespace std;
void insertion_sort (int a[], int n /* the size of array */)
{
int i, temp,j;
for (i = 1; i < n; i++)
{
/* Assume items before a[i] are sorted. */
/* Pick an number */
temp = a[i];
/* Do binary search to find out the
point where b is to be inserted. */
The high bound should be one past the end of the searched range (exclusive), because you may need to insert at the end, i.e. shift nothing.
// int low = 0, high = i - 1, k;
int low = 0, high = i, k;
Here the condition should be low < high, not low + 1 < high
// while (high-low>1)
while (low < high)
{
int mid = (high + low) / 2;
if (temp <= a[mid])
high = mid;
else
Once temp is strictly greater than a[mid], the lowest possible position to insert is mid + 1.
// low = mid;
low = mid + 1;
}
/* Shift items between high and i by 1 */
for (k = i; k > high; k--)
a[k] = a[k - 1];
a[high] = temp;
}
}
int main()
{
int A[15]={9,5,98,2,5,4,66,12,8,54,0,11,99,55,13};
insertion_sort(A,15);
for (int i=0; i<15; i++)
cout<<A[i]<<endl;
system("pause");
return 0;
}
A few things to be noticed here:
Binary search does not give you anything, as you need to shift all elements to make space anyway. So it actually increases the overall cost of your algorithm (though not asymptotically).
As this is C++ there is no need to declare k anywhere before the for loop in which it is used (just use for(int k; ...)).
Analyse your algorithm's beginning: in the first iteration, i = 1, so low = high = 0. Your while loop does not execute. Then, no matter whether the element should be moved or not, your for (k) loop swaps elements 0 and 1. This is error number 1.
Second iteration of i: the while loop does not execute once again, as low = 0 and high = 1, and once again no matter what you swap at least elements 1 and 2. Error number 2.
Now notice that every next iteration will, no matter what, move the element which was initially at index 0 (in your test code it is =9) further and further, to the last index.
So you can see, just after checking the first two iterations of the for(i) loop, that the assumption that elements before a[i] are sorted is wrong, and therefore the algorithm is wrong as well.
Easiest possible fix: initialize low and high as int low = -1, high = i;. What you wanted to do was to find indices low and high such that all elements from 0 to low are < a[i] and all elements from high to i-1 are ≥ a[i]. Your initialization didn't work since it didn't capture the corner cases when all elements a[0], ..., a[i-1] are greater than a[i] and the corner case when all these elements were less than a[i].
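Putting that fix into the original function, a minimal sketch (my own illustration of the low = -1, high = i initialization described above, with everything else kept as in the question):
void insertion_sort(int a[], int n)
{
    for (int i = 1; i < n; i++)
    {
        int temp = a[i];
        int low = -1, high = i;            // invariant: a[0..low] < temp and a[high..i-1] >= temp
        while (high - low > 1)
        {
            int mid = (high + low) / 2;
            if (temp <= a[mid])
                high = mid;
            else
                low = mid;
        }
        for (int k = i; k > high; k--)     // shift a[high..i-1] one slot to the right
            a[k] = a[k - 1];
        a[high] = temp;                    // temp goes where the first element >= temp used to be
    }
}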