What is the use of last--; here? - c++

I am new to C++ and to algorithms. Can anyone explain the purpose of last--; in the middle of my code? The explanation I was given is that each pass through the array puts one more value into its final place, so we need last-- there. I have tried to remove it, it doesn't affect anything, so is it necessary to have last--;?
void bubbleSort(int array[], int size)
{
bool swap;
int temp;
int last = size - 1;
do
{
swap = false;
for (int count = 0; count < last; count++)
{
if (array[count] > array[count + 1])
{
temp = array[count];
array[count] = array[count + 1];
array[count + 1] = temp;
swap = true;
}
}
last--;
} while (swap != false);
}

I have tried to remove it, it doesn't affect anything,
Well, have you tested performance?
Try a huge array and measure the time it takes to sort with and without that line.
The line makes sure that the inner loop doesn't visit numbers that already have been sorted.
If you delete the line, the inner loop will make size-1 comparisons on every pass. In the worst case that gives roughly size x size comparisons.
With the line, the inner loop makes size-1 comparisons on the first pass, then size-2, then size-3... In the worst case that gives size x (size-1) / 2 comparisons, i.e. approximately half as many, and thereby better performance.
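To see the difference without timing anything, one option is to count comparisons directly. The following is a minimal sketch, not the asker's exact function: bubbleSortComparisons, its shrink flag and the descending helper are names made up for this illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bubble sort on a copy of the input that returns how many comparisons it made.
// `shrink` is a hypothetical flag: true keeps the last-- step, false drops it.
long long bubbleSortComparisons(std::vector<int> array, bool shrink)
{
    long long comparisons = 0;
    if (array.empty()) return comparisons;
    std::size_t last = array.size() - 1;
    bool swapped;
    do
    {
        swapped = false;
        for (std::size_t count = 0; count < last; count++)
        {
            ++comparisons;
            if (array[count] > array[count + 1])
            {
                std::swap(array[count], array[count + 1]);
                swapped = true;
            }
        }
        // The step under discussion: the tail is already sorted, so skip it next time.
        if (shrink && last > 0) last--;
    } while (swapped);
    return comparisons;
}

// Worst-case input for bubble sort: the values size, size-1, ..., 1.
std::vector<int> descending(int size)
{
    std::vector<int> v;
    for (int i = size; i > 0; --i) v.push_back(i);
    return v;
}
```

For descending(1000), the call with shrink set to true makes 499500 comparisons and the call with shrink set to false makes 999000, which matches the size x (size-1) / 2 versus roughly size x size estimate.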

The for loop of your algorithm loops array elements and swaps adjacent elements if a[i]>a[i+1]. After one pass of this loop, the last element passed over must surely be the largest of all elements looped over, it has been bubbled up. So, in the next round of the while loop, the for loop does not have to consider this element again. This is ensured by last--.
If you remove that line, the algorithm will still work, but in the worst case it will do about twice as many comparisons, and the extra ones are all unnecessary.

It is merely an optimization of the algorithm.
After each pass, the far end of the array is sorted, so we don't need to check it anymore. Since the for loop is bounded by last, adding last--; just shortens the loop.
I have tried to remove it, it doesn't affect anything, so is it necessary to have last--;?
No, it is not necessary for the algorithm to work. It will work just as happily without it. It is purely an optimization that affects performance, especially with larger arrays.

Related

1838. Frequency of the Most Frequent Element leetcode C++

I am trying LeetCode problem 1838. Frequency of the Most Frequent Element:
The frequency of an element is the number of times it occurs in an array.
You are given an integer array nums and an integer k. In one operation, you can choose an index of nums and increment the element at that index by 1.
Return the maximum possible frequency of an element after performing at most k operations.
I am getting a Wrong Answer error for a specific test case.
My code
int checkfreq(vector<int>nums,int k,int i)
{
//int sz=nums.size();
int counter=0;
//int i=sz-1;
int el=nums[i];
while(k!=0 && i>0)
{
--i;
while(nums[i]!=el && k>0 && i>=0)
{
++nums[i];
--k;
}
}
counter=count(nums.begin(),nums.end(),el);
return counter;
}
class Solution {
public:
int maxFrequency(vector<int>& nums, int k) {
sort(nums.begin(),nums.end());
vector<int> nums2=nums;
auto distinct=unique(nums2.begin(),nums2.end());
nums2.resize(distance(nums2.begin(),distinct));
int xx=nums.size()-1;
int counter=checkfreq(nums,k,xx);
for(int i=nums2.size()-2;i>=0;--i)
{
--xx;
int temp=checkfreq(nums,k,xx);
if(temp>counter)
counter=temp;
}
return counter;
}
};
Failing test case
Input
nums = [9968,9934,9996,9928,9934,9906,9971,9980,9931,9970,9928,9973,9930,9992,9930,9920,9927,9951,9939,9915,9963,9955,9955,9955,9933,9926,9987,9912,9942,9961,9988,9966,9906,9992,9938,9941,9987,9917,10000,9919,9945,9953,9994,9913,9983,9967,9996,9962,9982,9946,9924,9982,9910,9930,9990,9903,9987,9977,9927,9922,9970,9978,9925,9950,9988,9980,9991,9997,9920,9910,9957,9938,9928,9944,9995,9905,9937,9946,9953,9909,9979,9961,9986,9979,9996,9912,9906,9968,9926,10000,9922,9943,9982,9917,9920,9952,9908,10000,9914,9979,9932,9918,9996,9923,9929,9997,9901,9955,9976,9959,9995,9948,9994,9996,9939,9977,9977,9901,9939,9953,9902,9926,9993,9926,9906,9914,9911,9901,9912,9990,9922,9911,9907,9901,9998,9941,9950,9985,9935,9928,9909,9929,9963,9997,9977,9997,9938,9933,9925,9907,9976,9921,9957,9931,9925,9979,9935,9990,9910,9938,9947,9969,9989,9976,9900,9910,9967,9951,9984,9979,9916,9978,9961,9986,9945,9976,9980,9921,9975,9999,9922]
k = 1524
Output
Expected: 81
My code returns: 79
I tried to solve as many cases as I could. I realise this is a bruteforce approach, but don't understand why my code is giving the wrong answer.
My approach is to convert numbers from last into the specified number. I need to check these as we have to count how many maximum numbers we can convert. Then this is repeated for every number till second last number. This is basically what I was thinking while writing this code.
The reason for the different output is that your xx index is only decreased one unit at each iteration of the i loop. But that loop is iterating for the number of unique elements, while xx is an index in the original vector. When there are many duplicates, that means xx is coming nowhere near the start of the vector and so it misses opportunities there.
You could fix that problem by replacing:
--xx;
...with:
--xx;
while (xx >= 0 && nums[xx] == nums[xx+1]) --xx;
if (xx < 0) break;
That will solve the issue you raise. You can also drop the unique call, and the distinct, nums2 and i variables. The outer loop could just check that xx > 0.
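Put together, the corrected brute-force version might look like the sketch below. It is an illustration under assumptions rather than the asker's code: checkFreq is a simplified stand-in for checkfreq that raises each element by the full difference at once, and maxFrequencyBruteForce is a hypothetical free function replacing the Solution method. The loop runs while xx >= 0 so that a single-element vector is handled without a separate first call.

```cpp
#include <algorithm>
#include <vector>

// Raise elements to the left of index i up to nums[i] while the budget k lasts,
// then count how many elements equal nums[i]. nums is taken by value on purpose,
// so every call starts from the unmodified sorted vector.
int checkFreq(std::vector<int> nums, long long k, int i)
{
    const int target = nums[i];
    for (int j = i - 1; j >= 0; --j)
    {
        const int need = target - nums[j];
        if (need > k) break;      // cannot afford to lift this element all the way
        k -= need;
        nums[j] = target;
    }
    return static_cast<int>(std::count(nums.begin(), nums.end(), target));
}

int maxFrequencyBruteForce(std::vector<int> nums, int k)
{
    std::sort(nums.begin(), nums.end());
    int best = 0;
    int xx = static_cast<int>(nums.size()) - 1;
    while (xx >= 0)
    {
        best = std::max(best, checkFreq(nums, k, xx));
        --xx;
        // Skip duplicates so that xx lands on the next smaller distinct value.
        while (xx >= 0 && nums[xx] == nums[xx + 1]) --xx;
    }
    return best;
}
```

This is still quadratic in the worst case, so it only addresses the wrong answer, not the running time.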
Efficiency is your next problem
Your algorithm is not as efficient as needed, and other tests with huge input data will time out.
Hint 1: checkfreq's inner loop is incrementing nums[i] one unit at a time. Do you see a way to increase it by a larger amount in one go, so as to avoid that inner loop?
Hint 2 (harder): checkfreq often increments the same values in different calls, even more so when k is large and the section of the vector that can be incremented is large. Can you think of a way to avoid having checkfreq redo that much work in subsequent calls, so that it only needs to deal with what is different from the previous call?
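Spoiler: one possible way to follow both hints is a sliding window over the sorted vector that keeps a running sum, so the cost of raising a whole window to its largest value is computed in O(1). This is only a sketch of that idea, not necessarily what the answerer had in mind. maxFrequencySlidingWindow and its local names are made up here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

int maxFrequencySlidingWindow(std::vector<int> nums, long long k)
{
    std::sort(nums.begin(), nums.end());
    long long windowSum = 0;   // sum of nums[left..right]
    std::size_t left = 0;
    std::size_t best = 0;
    for (std::size_t right = 0; right < nums.size(); ++right)
    {
        windowSum += nums[right];
        // Cost of raising every element in the window to nums[right]:
        // nums[right] * windowLength - windowSum. Shrink from the left while too expensive.
        while (static_cast<long long>(nums[right]) * static_cast<long long>(right - left + 1)
                   - windowSum > k)
        {
            windowSum -= nums[left];
            ++left;
        }
        best = std::max(best, right - left + 1);
    }
    return static_cast<int>(best);
}
```

After the sort, left and right each only move forward, so the scan itself is linear and the sort dominates the running time.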

What is the time complexity of below program?

Below is a program that finds the length of the longest substring without repeating characters, given a string str.
int test(string str) {
int left = 0, right = 0, ans = 0;
unordered_set<char> set;
while(left < str.size() and right < str.size()) {
if(set.find(str[right]) == set.end()) set.insert(str[right]);
else {
while(str[left] != str[right]){
set.erase(str[left]);
left++;
}
left++;
}
right++;
ans = (ans > set.size() ? ans : set.size());
}
return ans;
}
What is the time complexity of above solution? Is it O(n^2) or O(n) where n is the length of string?
Please note that I have gone through multiple questions on internet and also read about big oh but I am still confused. To me, it looks like O(n^2) complexity due to two while loops but I want to confirm from experts here.
It's O(n) on average.
What you see here is a sliding window technique (with variable window size, also called "two pointers technique").
Yes there are two loops, but if you look, any iteration of any of the two loops will always increase one of the pointers (either left or right).
In the first loop, either you call the second loop or you don't, but you will increase right at each iteration. The second loop always increases left.
Both left and right can have n different values (because both loops would stop when either right >= n or left == right).
So the first loop will have n executions (all the values of right from 0 to n-1) and the second loop can have at most n executions (all the possible values of left), which is a worst case of 2n = O(n) executions.
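The 2n bound can be checked by counting loop iterations. Below is a sketch of the same algorithm with a counter added. WindowStats, longestUniqueSubstring and the steps field are names invented for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

struct WindowStats
{
    std::size_t longest;   // length of the longest substring without repeats
    std::size_t steps;     // total iterations of the outer and inner loops
};

WindowStats longestUniqueSubstring(const std::string& str)
{
    std::size_t left = 0, right = 0, longest = 0, steps = 0;
    std::unordered_set<char> seen;
    while (left < str.size() && right < str.size())
    {
        ++steps;                          // one outer iteration: right advances by one
        if (seen.find(str[right]) == seen.end())
        {
            seen.insert(str[right]);
        }
        else
        {
            while (str[left] != str[right])
            {
                ++steps;                  // one inner iteration: left advances by one
                seen.erase(str[left]);
                left++;
            }
            left++;                       // step past the earlier copy of str[right]
        }
        right++;
        if (seen.size() > longest) longest = seen.size();
    }
    return {longest, steps};
}
```

For any input, steps never exceeds twice the string length, because each counted iteration moves either left or right forward by one.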
Worst case complexity
For the sake of completeness, please note that I wrote O(n) on average. The reason is that set.find has a complexity of O(1) on average but O(n) in the worst case. The same goes for set.erase. This is because unordered_set is implemented with a hash table, and in the very unlikely case of all your items being in the same bucket, it needs to iterate over all the items.
So even though we have O(n) iterations of the loop, some iterations could be O(n). It means that in some very unlikely cases, the execution could go up to O(n^2). You shouldn't really worry about it, as the probability of this happening is close to 0, and even though I don't know exactly what the hashing technique for char is in C++, I would bet that we will never end up with all characters in the same bucket.

Is this Insertion Sort implementation worst case O(n)?

I know that Insertion Sort is supposed to be worst case O(n^2), but I'm wondering why the following implementation isn't O(n).
void main()
{
//insertion sort runs from i = 1 to i = n, thus is worst case O(n)
for (
int i = 1,
placeholder = 0,
A[] = { 10,9,8,7,6,5,4,3,2,1 },
j = i;
i <= 10;
j-- > 0 && A[j - 1] > A[j]
? placeholder = A[j], A[j] = A[j - 1], A[j - 1] = placeholder
: j = ++i
)
{
for (
int x = 0;
x < 10; x++
)
cout << A[x] << ' ';
cout << endl;
}
system("pause");
}
There is only one for loop involved here and it runs from 1 to n. It seems to me that this would be the definition of O(n). What exactly am I missing here?
Sloppy terminology has led many people to false conclusions. This appears to be an example.
There is only one for loop involved here and it runs from 1 to n.
Yes, there is only one loop, but what is this "it" to which you refer? I really do mean for you to think about it. Should "it" refer to the loop? That would match a fairly common, yet sloppy, use of terminology, but a loop does not evaluate to a value. So a loop cannot actually run from one value to another. The sloppiness can be overlooked in simpler contexts, but not in yours.
Normally, the "it" would really refer to the loop control variable. With a simple loop, like for (int i = 0; i < 10; ++i), there is a one-to-one correspondence between iterations of the loop and values assigned to the control variable (which is i in my example). So there is an equivalence present, allowing one to refer to the loop when one really means the control variable. Saying that a loop runs from x to y really means that the control variable runs from x to y, and that there is one iteration of the loop per value assigned to the control variable. This correspondence fails in your code.
In your loop, the thing that runs from 1 to n is i. However, i is not incremented with each iteration of the loop, so "it runs from 1 to n" is not an accurate assessment of your loop. When i is 1, there are up to 2 iterations. That's not a one-to-one correspondence between iterations and values of i. As i increases, the divergence from one-to-one grows. Each value of i potentially corresponds to i+1 iterations, as j counts down from i to 0. The total number of iterations in the worst case scenario for n entries is the sum of the potential number of iterations for each value of i: 2 + 3 + ... + (n+1) = (n² + 3n)/2. That's O(n²).
Moral of the story: writing compact, cryptic code does not magically change the complexity of the algorithm being implemented. Cryptic code can make the complexity harder to pin down, but the main thing you've accomplished is making your code harder to read.
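To make the counting argument concrete, here is a sketch of a single-loop insertion sort that counts its own iterations. It is a readable rewrite of the idea, not the asker's exact loop. singleLoopInsertionSort is a made-up name, and it uses 0-based indices with i running from 1 to n-1, so the worst-case total is 2 + 3 + ... + n = (n-1)(n+2)/2 rather than the sum quoted above.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Insertion sort written with one loop. Returns the number of loop iterations.
long long singleLoopInsertionSort(std::vector<int>& a)
{
    long long iterations = 0;
    std::size_t i = 1;   // element currently being inserted
    std::size_t j = 1;   // current position of that element
    while (i < a.size())
    {
        ++iterations;
        if (j > 0 && a[j - 1] > a[j])
        {
            std::swap(a[j - 1], a[j]);   // still out of place: move it one step left
            --j;
        }
        else
        {
            ++i;                         // in place: only now does i advance
            j = i;
        }
    }
    return iterations;
}
```

With 10 elements in descending order this takes 54 iterations, while already sorted input takes only 9. There is one loop in the source, but i advances only on some of its iterations.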
That's a very odd way to write code. But you have 2 for loops in the definition. Nested loops are not always necessary to get O(n^2); you can get it with recursion too.
In simple terms, O(n^2) means that the number of operations performed grows in proportion to n^2 when the input size is n.
The code given is not correct C++ code, nor is it even close to pseudocode.
The correct code should be like this:
int main()
{
int i,j,key;
int A[]={10,9,8,7,6,5,4,3,2,1};
//cout<<"Array before sorting:"<<endl;
//for(i=0;i<10;i++)
//cout<<A[i]<<"\t";
//cout<<endl;
for(i=1;i<10;i++)
{
key=A[i];
for(j=i-1;j>=0 && A[j]>key;j--)
{
A[j+1]=A[j];
}
A[j+1]=key;
}
//cout<<"Array after sorting:"<<endl;
//for(i=0;i<10;i++)
//cout<<A[i]<<"\t";
//cout<<endl;
}
See, insertion sort has two loops. The outer loop selects the key, and the inner loop compares the elements before the key with the key. Therefore the worst-case time complexity is O(n^2) and not O(n): in the worst case, i.e. when the array is in reverse order, the inner loop runs i times for each of the n-1 values of i.

find duplicate number in an array

I am debugging the problem below and have posted the solution I am working on. This solution, or something similar, is posted on a couple of forums, but I think it has a bug when num[0] = 0, or in general when num[x] = x. Am I correct? Please feel free to correct me if I am wrong.
Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.
Note:
You must not modify the array (assume the array is read only).
You must use only constant, O(1) extra space.
Your runtime complexity should be less than O(n^2).
There is only one duplicate number in the array, but it could be repeated more than once.
int findDuplicate3(vector<int>& nums)
{
if (nums.size() > 1)
{
int slow = nums[0];
int fast = nums[nums[0]];
while (slow != fast)
{
slow = nums[slow];
fast = nums[nums[fast]];
}
fast = 0;
while (fast != slow)
{
fast = nums[fast];
slow = nums[slow];
}
return slow;
}
return -1;
}
Below is my code which uses Floyd's cycle-finding algorithm:
#include <iostream>
#include <vector>
using namespace std;
int findDup(vector<int>&arr){
int len = arr.size();
if(len>1){
int slow = arr[0];
int fast = arr[arr[0]];
while(slow!=fast){
slow = arr[slow];
fast = arr[arr[fast]];
}
fast = 0;
while(slow!=fast){
slow = arr[slow];
fast = arr[fast];
}
return slow;
}
return -1;
}
int main() {
vector<int>v = {1,2,2,3,4};
cout<<findDup(v)<<endl;
return 0;
}
This works because zeroes aren't allowed, so the first element of the array isn't part of a cycle, and so the first element of the first cycle we find is referred to both outside and inside the cycle. If zeroes were allowed, this would fail if arr[0] were on a cycle. E.g., [0,1,1].
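The point about zeroes can be checked with a small sketch. findDupFloyd below is the same algorithm as in the posts above, renamed and taking a const reference for this illustration.

```cpp
#include <vector>

// Floyd's cycle finding over the "index -> value" links of the array.
// Assumes n + 1 values in the range 1..n. Returns -1 for arrays of size < 2.
int findDupFloyd(const std::vector<int>& arr)
{
    if (arr.size() < 2) return -1;
    int slow = arr[0];
    int fast = arr[arr[0]];
    while (slow != fast)          // phase 1: meet somewhere inside the cycle
    {
        slow = arr[slow];
        fast = arr[arr[fast]];
    }
    fast = 0;
    while (slow != fast)          // phase 2: walk to the entry of the cycle
    {
        slow = arr[slow];
        fast = arr[fast];
    }
    return slow;
}
```

For valid inputs such as {1,3,4,2,2} and {3,1,3,4,2} this returns 2 and 3. For {0,1,1}, which breaks the 1..n precondition, index 0 points to itself, both pointers start at 0, and the function returns 0 although the duplicate is 1.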
The sum of integers from 1 to N = (N * (N + 1)) / 2. You can use this to find the duplicate -- sum the integers in the array, then subtract the above formula from the sum. That's the duplicate.
Update: The above solution is based on the (possibly invalid) assumption that the input array consists of the values from 1 to N plus a single duplicate.
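A sketch of the sum approach shows both where it works and where that assumption breaks. findDupBySum is a hypothetical name, and n is taken to be the array size minus one, as in the problem statement.

```cpp
#include <numeric>
#include <vector>

// Sum of the array minus 1 + 2 + ... + n, where the array holds n + 1 values.
// Only correct if every value from 1 to n appears and exactly one appears twice.
long long findDupBySum(const std::vector<int>& nums)
{
    const long long n = static_cast<long long>(nums.size()) - 1;
    const long long total = std::accumulate(nums.begin(), nums.end(), 0LL);
    return total - n * (n + 1) / 2;
}
```

For {1,3,4,2,2} this gives 12 - 10 = 2, which is correct. For {2,2,2,2,2}, where the duplicate is repeated more than once, it gives 10 - 10 = 0, which is not the duplicate.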
Start with two pointers to the first element: fast and slow.
Define a 'move' as advancing fast by 2 steps (positions) and slow by 1.
After each move, check if slow & fast point to the same node.
If there is a loop, at some point they will. This is because after they are both in the loop, fast is moving twice as quickly as slow and will eventually 'run into' it.
Say they meet after k moves. This is NOT NECESSARILY the repeated element, since it might not be the first element of the loop reached from outside the loop.
Call this element X.
Notice that fast has stepped 2k times, and slow has stepped k times.
Move fast back to zero.
Repeatedly advance fast and slow by ONE STEP EACH, comparing after each step.
Notice that after another k steps, slow will have moved a total of 2k steps and fast a total of k steps from the start, so they will again both be pointing to X.
Notice that if the prior step is on the loop for both of them, they were both pointing to X-1. If the prior step was only on the loop for slow, then they were pointing to different elements.
Ditto for X-2, X-3, ...
So in going forward, the first time they are pointing to the same element is the first element of the cycle reached from outside the cycle, which is the repeated element you're looking for.
Since you cannot use any additional space, using another hash table would be ruled out.
Now, coming to the approach of hashing on the existing array, it can be achieved if we are allowed to modify the array in place.
Algo:
1) Start with the first element.
2) Hash the first element and apply a transformation to the value of the hash. Let's say this transformation is making the value negative.
3) Proceed to the next element. Hash the element and, before applying the transformation, check whether a transformation has already been applied.
4) If yes, then element is a duplicate.
Code:
for(i = 0; i < size; i++)
{
if(arr[abs(arr[i])] > 0)
arr[abs(arr[i])] = -arr[abs(arr[i])];
else
cout<< abs(arr[i]) <<endl;
}
This transformation is required because, if we are to use a hashing approach, there has to be a collision when hashing the same key twice.
I can't think of a way in which hashing can be used without any additional space and without modifying the array.
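As a complete function, the marking idea above might look like the following sketch. findDupByMarking is a made-up name. It assumes n + 1 values in the range 1..n, and it deliberately modifies the array, so it does not satisfy the read-only constraint of the original problem.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Uses the sign of arr[value] as a "seen" flag for value.
// Returns the first value seen twice, or -1 if there is none.
int findDupByMarking(std::vector<int>& arr)
{
    for (std::size_t i = 0; i < arr.size(); i++)
    {
        const int value = std::abs(arr[i]);   // undo any earlier marking of this slot
        if (arr[value] > 0)
            arr[value] = -arr[value];         // first time we see value: mark its slot
        else
            return value;                     // slot already marked: value is a duplicate
    }
    return -1;
}
```

Unlike the loop above, which prints every repeated occurrence, this version stops at the first duplicate it finds.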

Insertion Sort Optimization

I'm trying to practice making some different sort functions and the insertion function that I came up with is giving me some trouble. I can sort lists that are less than 30K fairly quickly. But I have a list of 100K integers and it literally takes 15 minutes for the function to complete the sort. Everything is sorted correctly, but I don't believe it should take that long.
Am I missing something with my code that is making it take so long? Many thanks in advance.
void Sort::insertion_Sort(vector <int> v)
{
int vecSize = v.size();
//for loop to advance through the vector
for (int i=0; i < vecSize; i++)
{
//declare some variables
int cursor = i;
int inputCursor = i-1;
int temp = v[cursor];
//check to see if we are considering only a single element
if (cursor > 0)
{
//if there is more than 1 element, then we test the following.
//1. is the cursor element less than the inputCursor(which
//is the previous element)
//2. is the input cursor greater than -1
while (inputCursor > -1 && v[cursor] < v[inputCursor] )
{
//if so, we swap the variables
//then move the cursors back to check
//the previous element and see if we need to swap again.
temp = v[cursor];
v[cursor] = v[inputCursor];
v[inputCursor] = temp;
inputCursor--;
cursor--;
}
}
}
}
Insertion sort is an O(n^2) algorithm. It's slow for large inputs. It's going to take roughly 11 times longer to process a list of 100k items than a list of 30k items. For inputs larger than 20 or so, you should use something like quicksort, which is O(n*log(n)).
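If you want to keep insertion sort for practice, a somewhat leaner variant shifts elements instead of swapping them and takes the vector by reference. This is only a sketch with a made-up name, insertionSortShift. It reduces the constant factor, but the algorithm is still O(n^2). Note also that the function as posted takes the vector by value, so it sorts a copy and the caller's vector is left unchanged.

```cpp
#include <cstddef>
#include <vector>

// Insertion sort that sorts the caller's vector in place.
void insertionSortShift(std::vector<int>& v)
{
    for (std::size_t i = 1; i < v.size(); i++)
    {
        const int key = v[i];          // element to insert into the sorted prefix v[0..i-1]
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key)
        {
            v[j] = v[j - 1];           // shift the larger element one slot to the right
            --j;
        }
        v[j] = key;                    // one write for the key instead of a swap per step
    }
}
```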
The O(n^2) vs O(n*log(n)) problem, as pointed out by the other answer, is at the center of this problem. I would suggest a binary search algorithm, as it is more similar to the insertion algorithm and is simpler to implement. It would look for the insertion point by dividing the already sorted part of the vector in half and checking whether the integer to be inserted is greater than the integer in the middle. Then it would split the half on the chosen side again, and so on, recursively.
I think this is the best approach without starting from scratch.
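A sketch of that idea using the standard library: std::upper_bound does the binary search for the insertion point and std::rotate moves the element there. binaryInsertionSort is a hypothetical name. Keep in mind that the binary search only reduces the number of comparisons to O(n log n); the element moves done by std::rotate are still O(n^2) in the worst case.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Insertion sort that finds each insertion point by binary search.
void binaryInsertionSort(std::vector<int>& v)
{
    for (auto it = v.begin(); it != v.end(); ++it)
    {
        // First position in the sorted prefix [begin, it) whose value is greater than *it.
        auto position = std::upper_bound(v.begin(), it, *it);
        // Rotate [position, it] by one so that *it ends up at position.
        std::rotate(position, it, std::next(it));
    }
}
```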