Improvement of quick sort

Improvement of quick sort - c++

The quick sort algorithm behaves badly when there are many copies of the same item (that is, when the data is repetitive). How could it be improved to resolve this issue?
int partition (int low,int high)
{
int j=low,i=low+1;
int PivotItem=arr[low];
for(int i=0,j=0;i<n;i++)
{
if(arr[i]==PivotItem)
subarray[j]=arr[i];
}
for(i=low+1;i<=high;i++)
{
if(arr[i]<PivotItem)
{
j++;
swap(arr[i],arr[j]);
}
}
swap(arr[low],arr[j]);
int PivotPoint=j;
return PivotPoint;
}
void quick_sort(int low,int high)
{
if(low>=high) // also stops when the range is empty (low > high)
return ;
int PivotPoint=partition(low,high);
quick_sort(low,PivotPoint-1);
quick_sort(PivotPoint+1,high);
}

There is a special modification of quicksort based on the Dutch national flag algorithm. It uses a three-way partition that separates the items smaller than, equal to, and bigger than the pivot value.
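A minimal sketch of such a three-way partition, assuming the data lives in a std::vector<int> passed by reference rather than in the question's global arr. The name quick_sort_3way and the index names lt, i and gt are made up for this illustration.

```cpp
#include <utility>
#include <vector>

// Three-way quicksort on arr[low..high] (inclusive bounds).
// Hypothetical helper for illustration, not the asker's original code.
void quick_sort_3way(std::vector<int>& arr, int low, int high) {
    if (low >= high) return;          // zero or one element: already sorted
    const int pivot = arr[low];
    int lt = low;                     // arr[low..lt-1]   <  pivot
    int i = low;                      // arr[lt..i-1]     == pivot
    int gt = high;                    // arr[gt+1..high]  >  pivot
    while (i <= gt) {
        if (arr[i] < pivot) {
            std::swap(arr[lt++], arr[i++]);
        } else if (arr[i] > pivot) {
            std::swap(arr[i], arr[gt--]);  // do not advance i: new arr[i] is unexamined
        } else {
            ++i;                      // equal to pivot: leave it in the middle block
        }
    }
    // Elements equal to the pivot are in their final place and are skipped.
    quick_sort_3way(arr, low, lt - 1);
    quick_sort_3way(arr, gt + 1, high);
}
```

With all elements equal, the loop ends with an empty "smaller" and an empty "larger" range, so the call finishes after one linear pass instead of recursing n times.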

I assume you mean the fact that quick sort compares elements with a <= comparator (or with <, in which case the result is symmetrical to the explanation below). If we look at the case where all elements are equal to the pivot x, you get quicksort's worst-case complexity, since you split the array into two very uneven parts: one of size n-1, and the other empty.
A quick fix for this issue is to use only strict comparisons, < and >, to split the data into the two subarrays. Instead of a single pivot, hold an array of all the elements that are equal to the pivot. Then recurse on the elements that are strictly larger than the pivot and on the elements that are strictly smaller than the pivot, and combine the three arrays.
Illustration:
legend: X=pivot, S = smaller than pivot, L = larger than pivot
array = |SLLLSLXLSLXSSLLXLLLSSXSSLLLSSXSSLLLXSSL|
Choose pivot - X
Create L array of only strictly smaller elements: |SSSSSSSSSSSSSSS|
Create R array of only strictly larger elements: |LLLLLLLLLLLLLLLLLL|
Create "pivot array" |XXXXXX|
Now, recurse on L, recurse on R, and combine:
|SSSSSSSSSSSSSSS XXXXXX LLLLLLLLLLLLLLLLLL|
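The illustration above could be sketched as follows. This is only one possible reading of it: the function name quick_sort_split and the choice of the middle element as the pivot are assumptions, and the version trades extra memory for simplicity.

```cpp
#include <vector>

// Out-of-place quicksort that keeps a separate "pivot array" of equal elements.
// Hypothetical sketch; returns a new sorted vector and leaves the input untouched.
std::vector<int> quick_sort_split(const std::vector<int>& a) {
    if (a.size() <= 1) return a;
    const int pivot = a[a.size() / 2];   // assumed pivot choice
    std::vector<int> smaller, equal, larger;
    for (int x : a) {
        if (x < pivot) smaller.push_back(x);
        else if (x > pivot) larger.push_back(x);
        else equal.push_back(x);         // never recursed on
    }
    // "equal" always contains the pivot, so both recursive inputs are strictly
    // shorter than a and the recursion terminates.
    std::vector<int> result = quick_sort_split(smaller);
    result.insert(result.end(), equal.begin(), equal.end());
    const std::vector<int> right = quick_sort_split(larger);
    result.insert(result.end(), right.begin(), right.end());
    return result;
}
```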

Related

I have a question about merge sort algorithm

I've looked at the merge sort example code, but there's something I don't understand.
void mergesort(int left, int right)
{
if (left < right)
{
int sorted[LEN];
int mid, p1, p2, idx;
mid = (left + right) / 2;
mergesort(left, mid);
mergesort(mid + 1, right);
p1 = left;
p2 = mid + 1;
idx = left;
while (p1 <= mid && p2 <= right)
{
if (arr[p1] < arr[p2])
sorted[idx++] = arr[p1++];
else
sorted[idx++] = arr[p2++];
}
while (p1 <= mid)
sorted[idx++] = arr[p1++];
while (p2 <= right)
sorted[idx++] = arr[p2++];
for (int i = left; i <= right; i++)
arr[i] = sorted[i];
}
}
In this code, I don't understand the third while loop.
In detail, the code copies the elements at p1 and p2, in order, into the 'sorted' array.
I want to know how this while loop creates an ascending array.
I would appreciate it if you could write your answer in detail so that I can understand it.

why the array is sorted in ascending order
Merge sort divides an array of n elements into n runs of 1 element each. Each of those single-element runs can be considered sorted, since it contains only one element. Pairs of single-element runs are merged to create sorted runs of 2 elements each. Pairs of 2-element runs are merged to create sorted runs of 4 elements each. The process continues until a sorted run equal to the size of the original array is created.
The example in the question is a top down merge sort, that recursively splits the array in half until a base case of a single element run is reached. After this, merging follows the call chain, depth first left first. Most libraries use some variation of bottom up merge sort (along with insertion sort used to detect or create small sorted runs). With a bottom up merge sort, there's no recursive splitting, an array of n elements is treated as n runs of 1 element each, and starts merging even and odd runs, left to right, in a merge pass. After ceiling(log2(n)) passes, the array is sorted.
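A bottom-up merge sort along these lines might look like the sketch below. It assumes a std::vector<int> instead of the question's global array, uses std::merge for the merge step, and swaps the two buffers after each pass instead of copying back. The function name bottom_up_merge_sort is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Bottom-up merge sort: treat the input as n runs of 1 element and
// merge neighbouring runs pass by pass. Illustrative sketch only.
void bottom_up_merge_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    std::vector<int> buf(n);                       // one scratch buffer for all passes
    for (std::size_t width = 1; width < n; width *= 2) {
        for (std::size_t left = 0; left < n; left += 2 * width) {
            const std::size_t mid = std::min(left + width, n);
            const std::size_t right = std::min(left + 2 * width, n);
            // Merge the runs [left, mid) and [mid, right) into buf.
            std::merge(a.begin() + left, a.begin() + mid,
                       a.begin() + mid, a.begin() + right,
                       buf.begin() + left);
        }
        a.swap(buf);                               // the merged pass becomes the new input
    }
}
```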
The example code has an issue: it allocates an entire array on the stack for each level of recursion, which will result in stack overflow for large arrays. The Wiki examples are better, although the bottom up example should swap references rather than copy the array.
https://en.wikipedia.org/wiki/Merge_sort
For the question's code, might as well have sorted as a global array, or at least declared as static (a single instance):
static int arr[LEN];
static int sorted[LEN];
void mergesort(int left, int right)
/* ... */

I'm a developer working in the field.
I was surprised to see you implementing merge sort.
Before we start, the time complexity of merge sort is O(n log n).
The reason can be found in the merge sort process!
First, let's assume that there is an unordered array.
Merge sort process:
Divide the array into single-element arrays, one for each element.
Create an array that is twice the size of the divided arrays.
Compare the elements of two divided arrays and put the smaller elements, in order, into the created array.
Repeat this process until the merged array reaches the size of the original array.
There is a reason why the time complexity of merge sort is O(n log n).
In this process, the log n factor arises because the array is repeatedly divided in half, which gives about log2(n) levels, and the n factor arises because the merging at each level processes all n elements.
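The same argument can be written as a recurrence. This is a sketch under two assumptions that are not in the answer above: n is a power of two, and merging n elements costs at most cn steps for some constant c.

```latex
T(1) = c, \qquad T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + c\,n
\;\;\Longrightarrow\;\;
T(n) = c\,n\log_2 n + c\,n = O(n \log n)
```

Each of the log2(n) levels of halving contributes cn work, and the n single-element runs at the bottom contribute the final cn term.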

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. (C++)

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. Let's work with numbers.
Obviously the easiest way would be:
bool opposite(int* arr, int n) // n - array length
{
for(int i = 0; i < n; ++i)
{
for(int j = 0; j < n; ++j)
{
if(arr[i] == - arr[j])
return true;
}
}
return false;
}
I would like to ask if any of you guys can think of an algorithm that has a complexity less than n^2.
My first idea was the following:
1) sort array ( algorithm with worst case complexity: n.log(n) )
2) create two new arrays, filled with negative and positive numbers from the original array
( so far we've got -> n.log(n) + n + n = n.log(n))
3) ... compare somehow the two new arrays to determine if they have opposite numbers
I'm not sure my ideas are correct, but I'm open to suggestions.

An important alternative solution is as follows. Sort the array. Create two pointers, one initially pointing to the front (smallest), one initially pointing to the back (largest). If the sum of the two pointed-to elements is zero, you're done. If it is larger than zero, then decrement the back pointer. If it is smaller than zero, then increment the front pointer. Continue until the two pointers meet.
This solution is often the one people are looking for; often they'll explicitly rule out hash tables and trees by saying you only have O(1) extra space.
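A minimal sketch of the two-pointer approach. The function name has_opposite_pair is made up, and the vector is taken by value so that the caller's data is not reordered (which costs O(n) extra space; sort in place if that matters).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns true if two elements at different positions sum to zero.
// Illustrative sketch: sort, then walk inwards from both ends.
bool has_opposite_pair(std::vector<int> v) {
    if (v.size() < 2) return false;
    std::sort(v.begin(), v.end());
    std::size_t front = 0;
    std::size_t back = v.size() - 1;
    while (front < back) {
        // Widen to long long so the sum cannot overflow int.
        const long long sum = static_cast<long long>(v[front]) + v[back];
        if (sum == 0) return true;
        if (sum > 0) --back;   // sum too large: try a smaller "largest" value
        else ++front;          // sum too small: try a larger "smallest" value
    }
    return false;
}
```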

I would use a std::unordered_set and check whether the opposite of the number already exists in the set. If not, insert the number into the set and check the next element.
std::vector<int> foo = {-10,12,13,14,10,-20,5,6,7,20,30,1,2,3,4,9,-30};
std::unordered_set<int> res;
for (auto e : foo)
{
if(res.count(-e) > 0)
std::cout << "opposite of " << e << " already exists\n";
else
res.insert(e);
}
Output:
opposite of 10 already exists
opposite of 20 already exists
opposite of -30 already exists

Note that you can simply add all of the elements to an unordered_set, and when you are adding x, check whether -x is already in the set. The complexity of this solution is O(n) on average. (As @Hurkyl said, thanks.)
UPDATE: A second idea is to sort the elements and then, for each element, check (using binary search) whether the opposite element exists.
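The second idea could be sketched like this, using std::sort and std::binary_search. The function name has_opposite_bsearch is made up, and the handling of 0 and INT_MIN is an assumption about what "opposite" should mean in those edge cases.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <vector>

// Sort once, then binary-search for the negation of every element: O(n log n).
// Illustrative sketch only.
bool has_opposite_bsearch(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] == 0) {
            // Assumption: 0 only counts if a second 0 exists. In sorted order
            // the zeros are adjacent, so checking the next element is enough.
            if (i + 1 < v.size() && v[i + 1] == 0) return true;
            continue;
        }
        if (v[i] == INT_MIN) continue;  // -INT_MIN is not representable as int
        if (std::binary_search(v.begin(), v.end(), -v[i])) return true;
    }
    return false;
}
```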

You can do this in O(n log n) with a Red Black tree.
t := empty tree
for each e in A[1..n]
if (-e) is in t:
return true
insert e into t
return false
In C++, you wouldn't implement a Red Black tree for this purpose however. You'd use std::set, because it guarantees O(log n) search and insertion.
std::set<int> s;
for (auto e : A) {
if (s.count(-e) > 0) {
return true;
}
s.insert(e);
}
return false;
As Hurkyl mentioned, you could do better by just using std::unordered_set, which is a hashtable. This gives you O(1) search and insertion in the average case, but O(n) for both operations in the worst case. The total complexity of the solution in the average case would be O(n).

Efficient way to count number of swaps to insertion sort an array of integers in increasing order

Given an array of values of length n, is there a way to count the number of swaps that would be performed by insertion sort to sort that array in time better than O(n^2)?
For example :
arr[]={2 ,1, 3, 1, 2}; // Answer is 4.
Algorithm:
for i <- 2 to N
    j <- i
    while j > 1 and a[j] < a[j - 1]
        swap a[j] and a[j - 1] // I want to count these swaps
        j <- j - 1

If you want to count the number of swaps needed in insertion sort, then you want to find the following number: for each element, how many previous elements in the array are larger than it? The sum of these values is then the total number of swaps performed.
To find the number, you can use an order statistic tree, a balanced binary search tree that can efficiently tell you how many elements in the tree are smaller than some given element (and therefore, from the size of the tree, how many are larger). Specifically, an order statistic tree supports O(log n) insertion, deletion, lookup, and count of how many elements in the tree are less than some value. You can then count how many swaps will be performed as follows:
Initialize a new, empty order statistic tree.
Set count = 0
For each array element, in order:
Add the element to the order statistic tree.
Add to count the number of elements in the tree greater than the value added.
Return count.
This does O(n) iterations of a loop that takes O(log n) time, so the total work done is O(n log n), which is faster than the brute-force approach.
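The C++ standard library has no order statistic tree, so the sketch below swaps in a different structure with the same O(log n) counting ability: a Fenwick (binary indexed) tree over the ranks of the values. For each element it counts how many earlier elements are strictly greater, which is the number of swaps insertion sort performs for that element. The function name is made up for illustration.

```cpp
#include <algorithm>
#include <vector>

// Counts insertion-sort swaps (inversions) in O(n log n).
// Sketch using a Fenwick tree in place of an order statistic tree.
long long count_insertion_sort_swaps(const std::vector<int>& a) {
    // Map every value to its rank 1..m among the distinct values.
    std::vector<int> sorted(a);
    std::sort(sorted.begin(), sorted.end());
    sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());
    const int m = static_cast<int>(sorted.size());

    std::vector<int> tree(m + 1, 0);  // Fenwick tree of counts per rank
    long long swaps = 0;
    long long seen = 0;               // number of elements inserted so far
    for (int x : a) {
        const int rank = static_cast<int>(
            std::lower_bound(sorted.begin(), sorted.end(), x) - sorted.begin()) + 1;
        // Prefix sum: how many earlier elements are <= x.
        long long not_greater = 0;
        for (int i = rank; i > 0; i -= i & -i) not_greater += tree[i];
        swaps += seen - not_greater;  // the rest are strictly greater than x
        // Record x in the tree.
        for (int i = rank; i <= m; i += i & -i) ++tree[i];
        ++seen;
    }
    return swaps;
}
```

For the array {2, 1, 3, 1, 2} from the question this returns 4.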
If you want to count the number of swaps in selection sort, then you can use the fact that selection sort will only perform a swap on the kth pass if, after processing the first k-1 elements of the list, the element in position k is not the kth smallest element. If you can do this efficiently, then we have the following basic sketch of an algorithm:
Set total = 0
For k = 1 to n:
If the element at index k isn't the kth smallest element:
Swap it with the kth smallest element.
Increment total
Return total
So how do we implement this efficiently? We need to efficiently be able to check whether the element at a given index is the correct element, and also need to efficiently find the position of the element that really does belong at a given index otherwise. To do this, begin by creating a balanced binary search tree that maps each element to its position in the original array. This takes time O(n log n). Now that you have the balanced tree, we can augment the structure by assigning to each element in the tree the position in the sorted sequence that this element belongs. One way to do this is with an order statistic tree, and another would be to iterate over the tree with an inorder traversal, annotating each value in the tree with its position.
Using this structure, we can check in O(log n) time whether or not an element is in the right position by looking the element up in the tree (time O(log n)), then looking at the position in the sorted sequence at which it should be and at which position it's currently located (remember that we set this up when creating the tree). If it disagrees with our expected position, then it's in the wrong place, and otherwise it's in the right place. Also, we can efficiently simulate a swap of two elements by looking up those two elements in the tree (O(log n) time total) and then swapping their positions in O(1).
As a result, we can implement the above algorithm in time O(n log n) - O(n log n) time to build the tree, then n iterations of doing O(log n) work to determine whether or not to swap.
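A compact sketch of this idea, under the loud assumption that all values are distinct (with duplicates, the position lookup below is ambiguous). It uses a sorted copy to know which value belongs at each index and a std::map from value to current index as the balanced tree. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Counts the swaps selection sort performs, in O(n log n).
// ASSUMPTION: all values in a are distinct.
long long count_selection_sort_swaps(std::vector<int> a) {
    std::vector<int> sorted(a);
    std::sort(sorted.begin(), sorted.end());      // sorted[k] belongs at index k

    std::map<int, std::size_t> position;          // value -> current index in a
    for (std::size_t i = 0; i < a.size(); ++i) position[a[i]] = i;

    long long total = 0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        if (a[k] == sorted[k]) continue;          // already in place: no swap
        const std::size_t j = position[sorted[k]];
        position[a[k]] = j;                       // simulate the swap in the index map
        position[sorted[k]] = k;
        std::swap(a[k], a[j]);
        ++total;
    }
    return total;
}
```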
Hope this helps!

The number of interchanges of consecutive elements necessary to arrange them in their natural order is equal to the number of inversions in the given permutation.
So the solution to this problem is to find the number of inversions in the given array of numbers.
This can be solved in O(n log n) using merge sort.
In the merge step, if you copy an element from the right array, increment a global counter (that counts inversions) by the number of items remaining in the left array. This is done because the element from the right array that just got copied is involved in an inversion with all the elements still present in the left array.

I'm not sure, but I suspect finding the minimum number is a difficult problem. Unless there's a shortcut, you'll just be searching for optimal sorting networks, which you should be able to find good resources on with your favorite search engine (or Wikipedia).
If you only care about the big-O complexity, the answer is O(n log n), and you can probably get more concrete bounds (some actual constants in there) if you look at the analysis of some efficient in-place sorting algorithms like heapsort or smoothsort.

package insertoinSortAnalysis;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;
public class Solution {
private int[] originalArray;
public static void main(String[] args) {
Scanner sc;
try {
sc = new Scanner(System.in);
int TestCases = sc.nextInt();
for (int i = 0; i < TestCases; i++) {
int sizeofarray = sc.nextInt();
Solution s = new Solution();
s.originalArray = new int[sizeofarray];
for (int j = 0; j < sizeofarray; j++)
s.originalArray[j] = sc.nextInt();
s.devide(s.originalArray, 0, sizeofarray - 1);
System.out.println(s.count);
}
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public int[] devide(int[] originalArray, int low, int high) {
if (low < high) {
int mid = (low + high) / 2;
int[] result1 = devide(originalArray, low, mid);
int[] result2 = devide(originalArray, mid + 1, high);
return merge(result1, result2);
}
int[] result = { originalArray[low] };
return result;
}
private long count = 0;
private int[] merge(int[] array1, int[] array2) {
int lowIndex1 = 0;
int lowIndex2 = 0;
int highIndex1 = array1.length - 1;
int highIndex2 = array2.length - 1;
int result[] = new int[array1.length + array2.length];
int i = 0;
while (lowIndex2 <= highIndex2 && lowIndex1 <= highIndex1) {
int element = array1[lowIndex1];
while (lowIndex2 <= highIndex2 && element > array2[lowIndex2]) {
result[i++] = array2[lowIndex2++];
count += ((highIndex1 - lowIndex1) + 1);
}
result[i++] = element;
lowIndex1++;
}
while (lowIndex2 <= highIndex2 && lowIndex1 > highIndex1) {
result[i++] = array2[lowIndex2++];
}
while (lowIndex1 <= highIndex1 && lowIndex2 > highIndex2) {
result[i++] = array1[lowIndex1++];
}
return result;
}
}

Each swap in the insertion sort moves two adjacent elements - one up by one, one down by one - and 'corrects' a single crossing by doing so. So:
Annotate each item, X, with its initial array index, Xi.
Sort the items using a stable sort (you can use quicksort if you treat the 'initial position' annotation as a minor key)
Return half the sum of the absolute differences between each element's annotated initial position and its final position (i.e. just loop through the annotations summing abs(Xi - i)).
Just like most of the other answers, this is O(n) space and O(n*log n) time. If an in-place merge could be modified to count the crossings, that'd be better. I'm not sure it can though.

#include<stdio.h>
#include<string.h>
#include<iostream>
#include<algorithm>
using namespace std;
int a[200001];
int te[200001];
unsigned long long merge(int arr[],int temp[],int left,int mid,int right)
{
int i=left;
int j=mid;
int k=left;
unsigned long long int icount=0;
while((i<=mid-1) && (j<=right))
{
if(arr[i]<=arr[j])
temp[k++]=arr[i++];
else
{
temp[k++]=arr[j++];
icount+=(mid-i);
}
}
while(i<=mid-1)
temp[k++]=arr[i++];
while(j<=right)
temp[k++]=arr[j++];
for(int i=left;i<=right;i++)
arr[i]=temp[i];
return icount;
}
unsigned long long int mergesort(int arr[],int temp[],int left,int right)
{
unsigned long long int i=0;
if(right>left){
int mid=(left+right)/2;
i=mergesort(arr,temp,left,mid);
i+=mergesort(arr,temp,mid+1,right);
i+=merge(arr,temp,left,mid+1,right);
}
return i;
}
int main()
{
int t,n;
scanf("%d",&t);
while(t--){
scanf("%d",&n);
for(int i=0;i<n;i++){
scanf("%d",&a[i]);
}
printf("%llu\n",mergesort(a,te,0,n-1));
}
return 0;
}

choose n largest elements in two vector

I have two vectors, each containing n unsorted elements. How can I get the n largest elements across these two vectors?
My solution is to merge the two vectors into one with 2n elements and then use the std::nth_element algorithm, but I found that's not very efficient. Does anyone have a more efficient solution? I'd really appreciate it.

You may push the elements into priority_queue and then pop n elements out.
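A minimal sketch of that suggestion, assuming vectors of int. The function name n_largest is made up, and the result comes out in descending order because std::priority_queue is a max-heap by default.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Push everything into a max-heap, then pop the n largest.
// Illustrative sketch: O(N log N) pushes for N total elements.
std::vector<int> n_largest(const std::vector<int>& a,
                           const std::vector<int>& b,
                           std::size_t n) {
    std::priority_queue<int> pq;      // default comparator keeps the largest on top
    for (int x : a) pq.push(x);
    for (int x : b) pq.push(x);

    std::vector<int> out;
    out.reserve(n);
    while (out.size() < n && !pq.empty()) {
        out.push_back(pq.top());
        pq.pop();
    }
    return out;
}
```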

Assuming that n is far smaller than N this is quite efficient. Getting minElem is cheap and sorted inserting in L cheaper than sorting of the two vectors if n << N.
L := SortedList()
For Each element in any of the vectors do
{
minElem := smallest element in L
if( element >= minElem or if size of L < n)
{
add element to L
if( size of L > n )
{
remove smallest element from L
}
}
}

vector<T> heap;
heap.reserve(n + 1);
vector<T>::iterator left = leftVec.begin(), right = rightVec.begin();
for (int i = 0; i < n; i++) {
if (left != leftVec.end()) heap.push_back(*left++);
else if (right != rightVec.end()) heap.push_back(*right++);
}
if (left == leftVec.end() && right == rightVec.end()) return heap;
make_heap(heap.begin(), heap.end(), greater<T>());
while (left != leftVec.end()) {
heap.push_back(*left++);
push_heap(heap.begin(), heap.end(), greater<T>());
pop_heap(heap.begin(), heap.end(), greater<T>());
heap.pop_back();
}
/* ... repeat for right ... */
return heap;
Note I use *_heap directly rather than priority_queue because priority_queue does not provide access to its underlying data structure. This is O(N log n), slightly better than the naive O(N log N) method if n << N.

You can do the "n'th element" algorithm conceptually in parallel on the two vectors quite easily (at least the simple variant that's only linear in the average case).
Pick a pivot.
Partition (std::partition) both vectors by that pivot. You'll have the first vector partitioned by some element with rank i and the second by some element with rank j. I'm assuming descending order here.
If i+j < n, recurse on the right side for the n-i-j greatest elements. If i+j > n, recurse on the left side for the n greatest elements. If you hit i+j==n, stop the recursion.
You basically just need to make sure to partition both vectors by the same pivot in every step. Given a decent pivot selection, this algorithm is linear in the average case (and works in-place).
See also: http://en.wikipedia.org/wiki/Selection_algorithm#Partition-based_general_selection_algorithm
Edit: (hopefully) clarified the algorithm a bit.

Finding smallest value in an array most efficiently

There are N values in the array, and one of them is the smallest value. How can I find the smallest value most efficiently?

If they are unsorted, you can't do much but look at each one, which is O(N), and when you're done you'll know the minimum.
Pseudo-code:
small = <biggest value> // such as std::numeric_limits<int>::max()
for each element in array:
if (element < small)
small = element
A better way, which Ben reminded me of, is to just initialize small with the first element:
small = element[0]
for each element in array, starting from 1 (not 0):
if (element < small)
small = element
The above is wrapped in the algorithm header as std::min_element.
If you can keep your array sorted as items are added, then finding it will be O(1), since you can keep the smallest at front.
That's as good as it gets with arrays.

You need to loop through the array, remembering the smallest value you've seen so far. Like this:
int smallest = INT_MAX;
for (int i = 0; i < array_length; i++) {
if (array[i] < smallest) {
smallest = array[i];
}
}

The STL contains a bunch of algorithms, and which one to use depends on the problem.
std::find
std::find_if
std::count
std::binary_search
std::equal_range
std::lower_bound
std::upper_bound
Which algorithm to use depends on your data.
This article contains a very good table to help choose the right algorithm.
In the special case where the minimum or maximum should be determined and you are using a std::vector or a plain (pointer-based) array,
std::min_element
std::max_element
can be used.

If you want to be really efficient and you have enough time to spend, use SIMD instructions.
You can compare several pairs in one instruction:
r0 := min(a0, b0)
r1 := min(a1, b1)
r2 := min(a2, b2)
r3 := min(a3, b3)
__m64 _mm_min_pu8(__m64 a , __m64 b );
Nearly every x86 processor in use today supports these instructions. Others have already written a min function for you:
http://smartdata.usbid.com/datasheets/usbid/2001/2001-q1/i_minmax.pdf
or use an existing library.

If the array is sorted in ascending or descending order then you can find it with complexity O(1).
For an array of ascending order the first element is the smallest element, you can get it by arr[0] (0 based indexing).
If the array is sorted in descending order then the last element is the smallest element; you can get it with arr[sizeOfArray-1].
If the array is not sorted then you have to iterate over the array to get the smallest element. In this case the time complexity is O(n), where n is the size of the array.
int arr[] = {5,7,9,0,-3,2,3,4,56,-7};
int sizeOfArray = sizeof(arr) / sizeof(arr[0]); // number of elements in arr
int smallest_element = arr[0]; // assume the first element is the smallest one
for(int i =1;i<sizeOfArray;i++)
{
if(arr[i]<smallest_element)
{
smallest_element=arr[i];
}
}
You can calculate it in input section (when you have to find smallest element from a given array)
int smallest_element;
int arr[100],n;
cin>>n;
for(int i = 0;i<n;i++)
{
cin>>arr[i];
if(i==0)
{
smallest_element=arr[i]; //smallest_element=arr[0];
}
else if(arr[i]<smallest_element)
{
smallest_element = arr[i];
}
}
Also you can get smallest element by built in function
#include <algorithm>
int smallest_element = *min_element(arr,arr+n); //here n is the size of array
You can get smallest element of any range by using this function
such as,
int arr[] = {3,2,1,-1,-2,-3};
cout<<*min_element(arr,arr+3); //this will print 1,smallest element of first three element
cout<<*min_element(arr+2,arr+5); // -2, smallest element between third and fifth element (inclusive)
I have used an asterisk (*) before the min_element() function because it returns an iterator (for a plain array, a pointer) to the smallest element, not the value itself.
All codes are in c++.
You can find the maximum element in opposite way.

Richie's answer is close. It depends upon the language. Here is a good solution for java:
int smallest = Integer.MAX_VALUE;
int array[]; // Assume it is filled.
int array_length = array.length;
for (int i = array_length - 1; i >= 0; i--) {
if (array[i] < smallest) {
smallest = array[i];
}
}
I go through the array in reverse order, because comparing "i" to "array_length" in the loop comparison requires a fetch and a comparison (two operations), whereas comparing "i" to "0" is a single JVM bytecode operation. If the work being done in the loop is negligible, then the loop comparison consumes a sizable fraction of the time.
Of course, others pointed out that encapsulating the array and controlling inserts will help. If getting the minimum was ALL you needed, keeping the list in sorted order is not necessary. Just keep an instance variable that holds the smallest inserted so far, and compare it to each value as it is added to the array. (Of course, this fails if you remove elements. In that case, if you remove the current lowest value, you need to do a scan of the entire array to find the new lowest value.)

An O(1) solution might be to just guess: the smallest number in your array will often be 0. 0 crops up everywhere, given that you are only looking at unsigned numbers. But even then: 0 is good enough. Also, looking through all elements for the smallest number is a real pain. Why not just use 0? It could actually be the correct result!
If the interviewer/your teacher doesn't like that answer, try 1, 2 or 3. They also end up being in most homework/interview-scenario numeric arrays...
On a more serious note: how often will you need to perform this operation on the array? The solutions above are all O(n). If you want to do that m times on a list that you will be adding new elements to all the time, why not pay some time up front and create a heap? Then finding the smallest element can really be done in O(1), without resorting to cheating.

If finding the minimum is a one time thing, just iterate through the list and find the minimum.
If finding the minimum is a very common thing and you only need to operate on the minimum, use a Heap data structure.
A heap will be faster than doing a sort on the list but the tradeoff is you can only find the minimum.
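As a sketch of the heap approach in C++, std::priority_queue with std::greater gives a min-heap: push is O(log n) and reading the minimum with top() is O(1). The function running_minimums is a made-up example that records the minimum after each insertion.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Inserts values one by one into a min-heap and records the current
// minimum after every insertion. Illustrative sketch only.
std::vector<int> running_minimums(const std::vector<int>& values) {
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    std::vector<int> minimums;
    minimums.reserve(values.size());
    for (int v : values) {
        heap.push(v);                    // O(log n)
        minimums.push_back(heap.top());  // O(1): smallest element so far
    }
    return minimums;
}
```

Unlike a single tracked minimum, the heap also stays correct when the smallest element is removed with pop().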

If you're developing some kind of your own array abstraction, you can get O(1) if you store smallest added value in additional attribute and compare it every time a new item is put into array.
It should look something like this:
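#include <climits>
#include <list>
class MyArray
{
public:
    MyArray() : m_minValue(INT_MAX) {}
    void add(int newValue)
    {
        // Keep the running minimum up to date on every insertion.
        if (newValue < m_minValue) m_minValue = newValue;
        m_list.push_back(newValue);
    }
    int min() const
    {
        return m_minValue; // INT_MAX if nothing has been added yet
    }
private:
    int m_minValue;
    std::list<int> m_list;
};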

//find the min in an array list of #s
$array = array(45,545,134,6735,545,23,434);
$smallest = $array[0];
for($i=1; $i<count($array); $i++){
    if($array[$i] < $smallest){
        $smallest = $array[$i]; // remember the new minimum
    }
}
echo $smallest;

// smallest number in the array
// assumes n holds the number of elements in x
double small = x[0];
for(t=1;t<n;t++)
{
    if(x[t]<small)
    {
        small=x[t];
    }
}
printf("\nThe smallest number is %0.2lf \n",small);

Procedure:
We can use the min_element(array, array+size) function. It returns an iterator (for a plain array, a pointer) to the minimum element, so *min_element(array, array+size) gives the minimum value of the array.
C++ implementation
#include<bits/stdc++.h>
using namespace std;
int main()
{
int num;
cin>>num;
int arr[10];
for(int i=0; i<num; i++)
{
cin>>arr[i];
}
cout<<*min_element(arr,arr+num)<<endl;
return 0;
}

int small = a[0];
for (int x : a) // x is each element of a in turn, not an index
{
    if (x < small)
        small = x;
}

C++ code
#include <iostream>
using namespace std;
int main() {
const int n = 5; // array bounds must be compile-time constants in standard C++
int arr[n] = {12,4,15,6,2};
int min = arr[0];
for (int i=1;i<n;i++){
if (min>arr[i]){
min = arr[i];
}
}
cout << min;
return 0;
}
