Insertion Sort Optimization - c++

I'm trying to practice making some different sort functions and the insertion function that I came up with is giving me some trouble. I can sort lists that are less than 30K fairly quickly. But I have a list of 100K integers and it literally takes 15 minutes for the function to complete the sort. Everything is sorted correctly, but I don't believe it should take that long.
Am I missing something with my code that is making it take so long? Many thanks in advance.
void Sort::insertion_Sort(vector <int> v)
{
    int vecSize = v.size();
    //for loop to advance through the vector
    for (int i=0; i < vecSize; i++)
    {
        //declare some variables
        int cursor = i;
        int inputCursor = i-1;
        int temp = v[cursor];
        //check to see if we are considering only a single element
        if (cursor > 0)
        {
            //if there is more than 1 element, then we test the following.
            //1. is the cursor element less than the inputCursor(which
            //is the previous element)
            //2. is the input cursor greater than -1
            while (inputCursor > -1 && v[cursor] < v[inputCursor] )
            {
                //if so, we swap the variables
                //then move the cursors back to check
                //the previous element and see if we need to swap again.
                temp = v[cursor];
                v[cursor] = v[inputCursor];
                v[inputCursor] = temp;
                inputCursor--;
                cursor--;
            }
        }
    }
}

Insertion sort is an O(n^2) algorithm, so it is slow for large inputs. Sorting a list of 100k items takes roughly (100/30)^2, or about 11, times longer than sorting a list of 30k items. For inputs larger than 20 or so, you should use something like quicksort, which is O(n*log(n)) on average.
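As a rough illustration of that advice, the standard library's std::sort can stand in for a hand-written quicksort. The helper name sortInPlace is hypothetical, and taking the vector by reference (so the caller sees the sorted result) is an assumption about what the question intends.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: sorts the caller's vector in place.
// std::sort is specified to use O(n*log(n)) comparisons.
void sortInPlace(std::vector<int>& v)
{
    std::sort(v.begin(), v.end());
}
```

Calling sortInPlace(myVector) on a 100k-element vector should finish in a small fraction of the time the quadratic loop needs.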

The O(n^2) vs O(n*log(n)) problem, as pointed out by the other answer, is the center of this problem. I would suggest a binary search for the insertion point, as it stays closest to the insertion algorithm and is simpler to implement. It looks for the point of insertion by dividing the already sorted part of the vector in half and checking whether the integer to be inserted is greater than the integer in the middle. It then splits the chosen half again, and so on, recursively. Note that this only cuts the number of comparisons to O(n*log(n)); shifting the elements to make room is still O(n^2) in the worst case.
I think this is the best approach without starting from scratch.
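A minimal sketch of that idea, assuming the vector is passed by reference and using std::upper_bound for the binary search and std::rotate for the shift. The function name binaryInsertionSort is made up for illustration.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical binary insertion sort: the prefix [begin, it) is always sorted.
void binaryInsertionSort(std::vector<int>& v)
{
    for (auto it = v.begin(); it != v.end(); ++it)
    {
        // Binary search the sorted prefix for the first element greater than *it.
        auto pos = std::upper_bound(v.begin(), it, *it);
        // Move *it to pos, shifting [pos, it) one slot to the right.
        // This shift is linear, so the whole sort stays O(n^2) in moves.
        std::rotate(pos, it, it + 1);
    }
}
```

The comparisons drop to O(n*log(n)), but for 100k random integers the shifting still dominates, so a general-purpose O(n*log(n)) sort remains the faster choice.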

Related

1838. Frequency of the Most Frequent Element leetcode C++

I am trying LeetCode problem 1838. Frequency of the Most Frequent Element:
The frequency of an element is the number of times it occurs in an array.
You are given an integer array nums and an integer k. In one operation, you can choose an index of nums and increment the element at that index by 1.
Return the maximum possible frequency of an element after performing at most k operations.
I am getting a Wrong Answer error for a specific test case.
My code
int checkfreq(vector<int>nums,int k,int i)
{
    //int sz=nums.size();
    int counter=0;
    //int i=sz-1;
    int el=nums[i];
    while(k!=0 && i>0)
    {
        --i;
        while(nums[i]!=el && k>0 && i>=0)
        {
            ++nums[i];
            --k;
        }
    }
    counter=count(nums.begin(),nums.end(),el);
    return counter;
}
class Solution {
public:
    int maxFrequency(vector<int>& nums, int k) {
        sort(nums.begin(),nums.end());
        vector<int> nums2=nums;
        auto distinct=unique(nums2.begin(),nums2.end());
        nums2.resize(distance(nums2.begin(),distinct));
        int xx=nums.size()-1;
        int counter=checkfreq(nums,k,xx);
        for(int i=nums2.size()-2;i>=0;--i)
        {
            --xx;
            int temp=checkfreq(nums,k,xx);
            if(temp>counter)
                counter=temp;
        }
        return counter;
    }
};
Failing test case
Input
nums = [9968,9934,9996,9928,9934,9906,9971,9980,9931,9970,9928,9973,9930,9992,9930,9920,9927,9951,9939,9915,9963,9955,9955,9955,9933,9926,9987,9912,9942,9961,9988,9966,9906,9992,9938,9941,9987,9917,10000,9919,9945,9953,9994,9913,9983,9967,9996,9962,9982,9946,9924,9982,9910,9930,9990,9903,9987,9977,9927,9922,9970,9978,9925,9950,9988,9980,9991,9997,9920,9910,9957,9938,9928,9944,9995,9905,9937,9946,9953,9909,9979,9961,9986,9979,9996,9912,9906,9968,9926,10000,9922,9943,9982,9917,9920,9952,9908,10000,9914,9979,9932,9918,9996,9923,9929,9997,9901,9955,9976,9959,9995,9948,9994,9996,9939,9977,9977,9901,9939,9953,9902,9926,9993,9926,9906,9914,9911,9901,9912,9990,9922,9911,9907,9901,9998,9941,9950,9985,9935,9928,9909,9929,9963,9997,9977,9997,9938,9933,9925,9907,9976,9921,9957,9931,9925,9979,9935,9990,9910,9938,9947,9969,9989,9976,9900,9910,9967,9951,9984,9979,9916,9978,9961,9986,9945,9976,9980,9921,9975,9999,9922]
k = 1524
Output
Expected: 81
My code returns: 79
I tried to solve as many cases as I could. I realise this is a bruteforce approach, but don't understand why my code is giving the wrong answer.
My approach is to convert numbers from last into the specified number. I need to check these as we have to count how many maximum numbers we can convert. Then this is repeated for every number till second last number. This is basically what I was thinking while writing this code.

The reason for the different output is that your xx index is only decreased one unit at each iteration of the i loop. But that loop is iterating for the number of unique elements, while xx is an index in the original vector. When there are many duplicates, that means xx is coming nowhere near the start of the vector and so it misses opportunities there.
You could fix that problem by replacing:
--xx;
...with:
--xx;
while (xx >= 0 && nums[xx] == nums[xx+1]) --xx;
if (xx < 0) break;
That will solve the issue you raise. You can also drop the unique call, and the distinct, nums2 and i variables. The outer loop could just check that xx > 0.
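Putting that fix and the suggested simplifications together might look like the sketch below. checkfreq keeps the question's logic; the name maxFrequencyFixed and the by-value nums parameter are assumptions made so the sketch stands alone outside the Solution class. It is still the brute-force approach.

```cpp
#include <algorithm>
#include <vector>

// Same logic as the question's checkfreq: raise elements left of index i
// toward nums[i] one unit at a time, then count how many equal nums[i].
int checkfreq(std::vector<int> nums, int k, int i)
{
    int el = nums[i];
    while (k != 0 && i > 0)
    {
        --i;
        while (nums[i] != el && k > 0)
        {
            ++nums[i];
            --k;
        }
    }
    return static_cast<int>(std::count(nums.begin(), nums.end(), el));
}

// Hypothetical free-function version of maxFrequency with the fix applied.
int maxFrequencyFixed(std::vector<int> nums, int k)
{
    if (nums.empty()) return 0;
    std::sort(nums.begin(), nums.end());
    int xx = static_cast<int>(nums.size()) - 1;
    int counter = checkfreq(nums, k, xx);
    while (xx > 0)
    {
        --xx;
        // Skip duplicates so xx lands on the next smaller distinct value.
        while (xx >= 0 && nums[xx] == nums[xx + 1]) --xx;
        if (xx < 0) break;
        counter = std::max(counter, checkfreq(nums, k, xx));
    }
    return counter;
}
```

With the three sample inputs from the problem statement ([1,2,4] with k = 5, [1,4,8,13] with k = 5, [3,9,6] with k = 2) this returns 3, 2 and 1.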
Efficiency is your next problem
Your algorithm is not as efficient as needed, and other tests with huge input data will time out.
Hint 1: checkfreq's inner loop is incrementing nums[i] one unit at a time. Do you see a way to have it increase by a larger amount, so as to avoid that inner loop?
Hint 2 (harder): checkfreq is often incrementing the same value in different calls, even more so when k is large and the section of the vector that can be incremented is large. Can you think of a way to spare checkfreq from redoing that much work in subsequent calls, so that it only has to deal with what is different compared to what it calculated in the previous call?
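One possible reading of the two hints is a sliding window over the sorted array; this is an assumption about where the hints lead, not something the answer spells out. The cost of raising a whole window to its largest value is computed with one multiplication (hint 1), and a running sum is carried from one window to the next (hint 2). The name maxFrequencyWindow is hypothetical, and k is assumed to be non-negative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sliding-window sketch: O(n*log(n)) for the sort, O(n) for the scan.
int maxFrequencyWindow(std::vector<int> nums, long long k)
{
    std::sort(nums.begin(), nums.end());
    long long windowSum = 0;   // sum of nums[left..right]
    std::size_t left = 0;
    std::size_t best = 0;
    for (std::size_t right = 0; right < nums.size(); ++right)
    {
        windowSum += nums[right];
        // Cost of raising every element in the window to nums[right].
        while (static_cast<long long>(nums[right]) *
                   static_cast<long long>(right - left + 1) - windowSum > k)
        {
            windowSum -= nums[left];   // too expensive: drop the smallest element
            ++left;
        }
        best = std::max(best, right - left + 1);
    }
    return static_cast<int>(best);
}
```

Because left only ever moves forward, each element enters and leaves the window at most once.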

What is the use of last--; here?

I am new to C++ as well as algorithms. Can anyone help explain the use of the (last--;) in the middle of my code? The explanation I have got is that every pass through the array puts one more value in place, so we need to put a last-- there. I have tried to remove it, it doesn't affect anything, so is there a necessary to put a last--;?
void bubbleSort(int array[], int size)
{
    bool swap;
    int temp;
    int last = size - 1;
    do
    {
        swap = false;
        for (int count = 0; count < last; count++)
        {
            if (array[count] > array[count + 1])
            {
                temp = array[count];
                array[count] = array[count + 1];
                array[count + 1] = temp;
                swap = true;
            }
        }
        last--;
    } while (swap != false);
}

I have tried to remove it, it doesn't affect anything,
Well, have you tested performance?
Try a huge array and measure the time it takes to sort with and without that line.
The line makes sure that the inner loop doesn't visit numbers that already have been sorted.
If you delete the line, the inner loop will iterate size - 1 times on every pass. In the worst case that gives size x (size - 1) iterations.
With the line, the inner loop will iterate size - 1 times first, then size - 2, then size - 3... In the worst case that gives size x (size - 1) / 2 iterations, i.e. approx. half the iterations and thereby better performance.
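To see the difference without timing anything, the comparisons can simply be counted. The sketch below is a hypothetical instrumented variant of the question's function (the name bubbleSortCount and the shrink flag are not from the question); shrink == true keeps the last-- line, shrink == false drops it.

```cpp
#include <utility>
#include <vector>

// Bubble sort that returns how many comparisons it made.
long long bubbleSortCount(std::vector<int>& a, bool shrink)
{
    long long comparisons = 0;
    int last = static_cast<int>(a.size()) - 1;
    bool swapped;
    do
    {
        swapped = false;
        for (int i = 0; i < last; ++i)
        {
            ++comparisons;
            if (a[i] > a[i + 1])
            {
                std::swap(a[i], a[i + 1]);
                swapped = true;
            }
        }
        if (shrink) --last;   // the line the question asks about
    } while (swapped);
    return comparisons;
}
```

For the reversed input {5, 4, 3, 2, 1} this counts 10 comparisons with shrink == true and 20 with shrink == false, and both runs leave the array sorted.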

The for loop of your algorithm loops array elements and swaps adjacent elements if a[i]>a[i+1]. After one pass of this loop, the last element passed over must surely be the largest of all elements looped over, it has been bubbled up. So, in the next round of the while loop, the for loop does not have to consider this element again. This is ensured by last--.
If you remove that line, the algorithm will still work, but it will do up to twice as many comparisons, and the extra ones are all unnecessary.

It is merely an optimization of the algorithm.
After each pass, the far end of the array becomes sorted, so we don't need to check it anymore. Since the for loop is bounded by last, adding last--; just contracts the loop.
I have tried to remove it, it doesn't affect anything, so is there a necessary to put a last--;?
No, it is not necessary for the algorithm to work. It will work just as happily without it. It is purely an optimization that will have an impact on performance, especially with larger arrays.

find duplicate number in an array

I am debugging below problem and post the solution I am debugging and working on, the solution or similar is posted on a couple of forums, but I think the solution has a bug when num[0] = 0 or in general num[x] = x? Am I correct? Please feel free to correct me if I am wrong.
Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.
Note:
You must not modify the array (assume the array is read only).
You must use only constant, O(1) extra space.
Your runtime complexity should be less than O(n^2).
There is only one duplicate number in the array, but it could be repeated more than once.
int findDuplicate3(vector<int>& nums)
{
    if (nums.size() > 1)
    {
        int slow = nums[0];
        int fast = nums[nums[0]];
        while (slow != fast)
        {
            slow = nums[slow];
            fast = nums[nums[fast]];
        }
        fast = 0;
        while (fast != slow)
        {
            fast = nums[fast];
            slow = nums[slow];
        }
        return slow;
    }
    return -1;
}

Below is my code which uses Floyd's cycle-finding algorithm:
#include <iostream>
#include <vector>
using namespace std;
int findDup(vector<int>&arr){
    int len = arr.size();
    if(len>1){
        int slow = arr[0];
        int fast = arr[arr[0]];
        while(slow!=fast){
            slow = arr[slow];
            fast = arr[arr[fast]];
        }
        fast = 0;
        while(slow!=fast){
            slow = arr[slow];
            fast = arr[fast];
        }
        return slow;
    }
    return -1;
}
int main() {
    vector<int>v = {1,2,2,3,4};
    cout<<findDup(v)<<endl;
    return 0;
}
Comment: This works because zeroes aren't allowed, so the first element of the array isn't part of a cycle, and so the first element of the first cycle we find is referred to both outside and inside the cycle. If zeroes were allowed, this would fail if arr[0] were on a cycle. E.g., [0,1,1].

The sum of integers from 1 to N = (N * (N + 1)) / 2. You can use this to find the duplicate -- sum the integers in the array, then subtract the above formula from the sum. That's the duplicate.
Update: The above solution is based on the (possibly invalid) assumption that the input array consists of the values from 1 to N plus a single duplicate.

Start with two pointers to the first element: fast and slow.
Define a 'move' as advancing fast by 2 steps (positions) and slow by 1.
After each move, check if slow & fast point to the same node.
If there is a loop, at some point they will. This is because after they are both in the loop, fast is moving twice as quickly as slow and will eventually 'run into' it.
Say they meet after k moves. This is NOT NECESSARILY the repeated element, since it might not be the first element of the loop reached from outside the loop.
Call this element X.
Notice that fast has stepped 2k times, and slow has stepped k times.
Move fast back to zero.
Repeatedly advance fast and slow by ONE STEP EACH, comparing after each step.
Notice that after another k steps, slow will have moved a total of 2k steps and fast a total of k steps from the start, so they will again both be pointing to X.
Notice that if the prior step is on the loop for both of them, they were both pointing to X-1. If the prior step was only on the loop for slow, then they were pointing to different elements.
Ditto for X-2, X-3, ...
So in going forward, the first time they are pointing to the same element is the first element of the cycle reached from outside the cycle, which is the repeated element you're looking for.

Since you cannot use any additional space, using another hash table would be ruled out.
Now, coming to the approach of hashing on the existing array, it can be achieved if we are allowed to modify the array in place.
Algo:
1) Start with the first element.
2) Hash the first element and apply a transformation to the value at the hashed position. Let's say this transformation is making the value negative.
3) Proceed to the next element. Hash the element and, before applying the transformation, check if a transformation has already been applied.
4) If yes, then the element is a duplicate.
Code:
for(i = 0; i < size; i++)
{
    if(arr[abs(arr[i])] > 0)
        arr[abs(arr[i])] = -arr[abs(arr[i])];
    else
        cout<< abs(arr[i]) <<endl;
}
This transformation is required since, if we are to use a hashing approach, there has to be a collision when hashing the same key.
I can't think of a way in which hashing can be used without any additional space and without modifying the array.
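The marking steps above can be sketched as a function that returns the first duplicate it meets and then undoes its marks. The name findDuplicateByMarking is hypothetical, and the sketch assumes the problem's input shape (n + 1 values, each between 1 and n). It still writes to the array temporarily, so it does not satisfy the read-only rule.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical sketch: uses the sign of arr[value] as a "seen" flag.
// Assumes every value is in [1, arr.size() - 1].
int findDuplicateByMarking(std::vector<int>& arr)
{
    int duplicate = -1;
    for (std::size_t i = 0; i < arr.size(); ++i)
    {
        int idx = std::abs(arr[i]);
        if (arr[idx] > 0)
        {
            arr[idx] = -arr[idx];   // first time this value is seen
        }
        else
        {
            duplicate = idx;        // slot already negative: value seen before
            break;
        }
    }
    // Undo the marks so the caller gets the original contents back.
    for (int& x : arr) x = std::abs(x);
    return duplicate;
}
```

Restoring the signs at the end is a design choice of this sketch, not part of the answer's loop.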

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. (C++)

Create a function that checks whether an array has two opposite elements or not for less than n^2 complexity. Let's work with numbers.
Obviously the easiest way would be:
bool opposite(int* arr, int n) // n - array length
{
    for(int i = 0; i < n; ++i)
    {
        for(int j = 0; j < n; ++j)
        {
            if(arr[i] == - arr[j])
                return true;
        }
    }
    return false;
}
I would like to ask if any of you guys can think of an algorithm that has a complexity less than n^2.
My first idea was the following:
1) sort array ( algorithm with worst case complexity: n.log(n) )
2) create two new arrays, filled with negative and positive numbers from the original array
( so far we've got -> n.log(n) + n + n = n.log(n))
3) ... compare somehow the two new arrays to determine if they have opposite numbers
I'm not sure my ideas are correct, but I'm open to suggestions.

An important alternative solution is as follows. Sort the array. Create two pointers, one initially pointing to the front (smallest), one initially pointing to the back (largest). If the sum of the two pointed-to elements is zero, you're done. If it is larger than zero, then decrement the back pointer. If it is smaller than zero, then increment the front pointer. Continue until the two pointers meet.
This solution is often the one people are looking for; often they'll explicitly rule out hash tables and trees by saying you only have O(1) extra space.
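A minimal sketch of the two-pointer scan, assuming the input may be copied and sorted (sort the caller's array in place instead if the O(1) extra space requirement is strict). The name hasOppositePair is hypothetical, and a single 0 is assumed not to count as a pair because the two pointers must be at different positions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: true if two elements at different positions sum to zero.
bool hasOppositePair(std::vector<int> v)
{
    if (v.size() < 2) return false;
    std::sort(v.begin(), v.end());
    std::size_t front = 0;
    std::size_t back = v.size() - 1;
    while (front < back)
    {
        // Widen to long long so the sum cannot overflow.
        long long sum = static_cast<long long>(v[front]) + v[back];
        if (sum == 0) return true;
        if (sum > 0) --back;    // too large: try a smaller largest element
        else ++front;           // too small: try a larger smallest element
    }
    return false;
}
```

The sort costs O(n*log(n)) and the scan itself is O(n).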

I would use a std::unordered_set and check whether the opposite of the number already exists in the set. If not, insert it into the set and check the next element.
std::vector<int> foo = {-10,12,13,14,10,-20,5,6,7,20,30,1,2,3,4,9,-30};
std::unordered_set<int> res;
for (auto e : foo)
{
    if(res.count(-e) > 0)
        std::cout << "opposite of " << e << " already exists\n";
    else
        res.insert(e);
}
Output:
opposite of 10 already exists
opposite of 20 already exists
opposite of -30 already exists

You can simply add the elements to an unordered_set one at a time and, before adding x, check whether -x is already in the set. The complexity of this solution is O(n) on average. (as @Hurkyl said, thanks)
UPDATE: Second idea is: Sort the elements and then, for each element, check (using a binary search) whether the opposite element exists.
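The second idea might be sketched as follows, using std::sort and std::binary_search. The name hasOppositeBinarySearch is hypothetical. Two choices here are assumptions rather than part of the answer: a lone 0 does not count as its own opposite, and the most negative int is skipped because negating it would overflow.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical sketch: sort once, then binary-search for -x for each negative x.
bool hasOppositeBinarySearch(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    // Zero is its own opposite, so require at least two of them.
    if (std::count(v.begin(), v.end(), 0) >= 2) return true;
    for (int x : v)
    {
        if (x >= 0) break;   // sorted: every pair has exactly one negative member
        if (x == std::numeric_limits<int>::min()) continue;   // -x is not representable
        if (std::binary_search(v.begin(), v.end(), -x)) return true;
    }
    return false;
}
```

Each lookup is O(log n), so the total stays at O(n*log(n)), the same as the sort.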

You can do this in O(n log n) with a Red Black tree.
t := empty tree
for each e in A[1..n]
    if (-e) is in t:
        return true
    insert e into t
return false
In C++, you wouldn't implement a Red Black tree for this purpose however. You'd use std::set, because it guarantees O(log n) search and insertion.
std::set<int> s;
for (auto e : A) {
    if (s.count(-e) > 0) {
        return true;
    }
    s.insert(e);
}
return false;
As Hurkyl mentioned, you could do better by just using std::unordered_set, which is a hashtable. This gives you O(1) search and insertion in the average case, but O(n) for both operations in the worst case. The total complexity of the solution in the average case would be O(n).

Is this a shell sort or an insertion sort?

I'm just starting to learn about sorting algorithms and found one online. At first I thought it was a shell sort, but it's missing that distinct interval of "k" and the halving of the array, so I'm not sure if it is or not. My second guess is an insertion sort, but I'm just here to double check:
for(n = 1; n < num; n++)
{
    key = A[n];
    k = n;
    while((k > 0) && (A[k-1] > key))
    {
        A[k] = A[k-1];
        k = k-1;
    }
    A[k] = key;
}
Also if you can explain why that'd be helpful as well

Shell Sort consists of many insertion sorts that are performed on sub-arrays of the original array.
The code you have provided is insertion sort.
To get shell sort, you would roughly wrap your code in additional for loops that change h (the gap in shell sort) and the starting index of the sub-array. Inside, instead of moving from k to k-1, you move from k to k+h (or k-h, depending on which direction you do the insertion sort).
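As a rough sketch of that description, the question's loop can be wrapped in an outer loop over the gap h. The gap sequence used here (halving from size / 2 down to 1) is just one common choice and an assumption of this sketch; the name shellSort and the use of std::vector are also not from the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical shell sort: the question's insertion sort with step h instead of 1.
void shellSort(std::vector<int>& A)
{
    for (std::size_t h = A.size() / 2; h > 0; h /= 2)
    {
        // Gapped insertion sort; with h == 1 this is exactly the question's code.
        for (std::size_t n = h; n < A.size(); ++n)
        {
            int key = A[n];
            std::size_t k = n;
            while (k >= h && A[k - h] > key)
            {
                A[k] = A[k - h];   // move the larger element h places up
                k -= h;
            }
            A[k] = key;
        }
    }
}
```

The single loop over n handles all h interleaved sub-arrays at once, which avoids a separate loop over the starting index.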

I think you're right, that does look a lot like an insertion sort.
This fragment assumes A[0] is already inserted, which is why the loop starts at n = 1. Had it started at n == 0, the k > 0 check would fail immediately and execution would continue at A[k] = key;, simply storing the first element back into the array.
This fragment also assumes that A[0:n-1] is already sorted. It inspects A[n] and starts scanning the array backward, moving forward one place every element that is larger than the original A[n] key.
Once the scanning encounters an element less than or equal to the key, it inserts the key in the slot just after that element.

It's called insertion sort because the line A[k] = key inserts the current value in the correct position in the partially sorted array.
