1838. Frequency of the Most Frequent Element (LeetCode, C++)

I am trying LeetCode problem 1838. Frequency of the Most Frequent Element:
The frequency of an element is the number of times it occurs in an array.
You are given an integer array nums and an integer k. In one operation, you can choose an index of nums and increment the element at that index by 1.
Return the maximum possible frequency of an element after performing at most k operations.
I am getting a Wrong Answer error for a specific test case.
My code
int checkfreq(vector<int> nums, int k, int i)
{
    //int sz=nums.size();
    int counter = 0;
    //int i=sz-1;
    int el = nums[i];
    while (k != 0 && i > 0)
    {
        --i;
        while (nums[i] != el && k > 0 && i >= 0)
        {
            ++nums[i];
            --k;
        }
    }
    counter = count(nums.begin(), nums.end(), el);
    return counter;
}
class Solution {
public:
    int maxFrequency(vector<int>& nums, int k) {
        sort(nums.begin(), nums.end());
        vector<int> nums2 = nums;
        auto distinct = unique(nums2.begin(), nums2.end());
        nums2.resize(distance(nums2.begin(), distinct));
        int xx = nums.size() - 1;
        int counter = checkfreq(nums, k, xx);
        for (int i = nums2.size() - 2; i >= 0; --i)
        {
            --xx;
            int temp = checkfreq(nums, k, xx);
            if (temp > counter)
                counter = temp;
        }
        return counter;
    }
};
Failing test case
Input
nums = [9968,9934,9996,9928,9934,9906,9971,9980,9931,9970,9928,9973,9930,9992,9930,9920,9927,9951,9939,9915,9963,9955,9955,9955,9933,9926,9987,9912,9942,9961,9988,9966,9906,9992,9938,9941,9987,9917,10000,9919,9945,9953,9994,9913,9983,9967,9996,9962,9982,9946,9924,9982,9910,9930,9990,9903,9987,9977,9927,9922,9970,9978,9925,9950,9988,9980,9991,9997,9920,9910,9957,9938,9928,9944,9995,9905,9937,9946,9953,9909,9979,9961,9986,9979,9996,9912,9906,9968,9926,10000,9922,9943,9982,9917,9920,9952,9908,10000,9914,9979,9932,9918,9996,9923,9929,9997,9901,9955,9976,9959,9995,9948,9994,9996,9939,9977,9977,9901,9939,9953,9902,9926,9993,9926,9906,9914,9911,9901,9912,9990,9922,9911,9907,9901,9998,9941,9950,9985,9935,9928,9909,9929,9963,9997,9977,9997,9938,9933,9925,9907,9976,9921,9957,9931,9925,9979,9935,9990,9910,9938,9947,9969,9989,9976,9900,9910,9967,9951,9984,9979,9916,9978,9961,9986,9945,9976,9980,9921,9975,9999,9922]
k = 1524
Output
Expected: 81
My code returns: 79
I tried to handle as many cases as I could. I realise this is a brute-force approach, but I don't understand why my code gives the wrong answer.
My approach is to work backwards from the last element, converting the numbers before it into the chosen element and counting how many conversions the k operations allow. This is then repeated for every candidate element down to the second-to-last number. That is basically what I was thinking while writing this code.

The reason for the different output is that your xx index is only decreased by one at each iteration of the i loop. But that loop iterates over the number of unique elements, while xx is an index into the original vector. When there are many duplicates, xx never comes anywhere near the start of the vector, and so it misses opportunities there.
You could fix that problem by replacing:
--xx;
...with:
--xx;
while (xx >= 0 && nums[xx] == nums[xx+1]) --xx;
if (xx < 0) break;
That will solve the issue you raise. You can also drop the unique call, and the distinct, nums2 and i variables. The outer loop could just check that xx > 0.
Efficiency is your next problem
Your algorithm is not as efficient as needed, and other tests with huge input data will time out.
Hint 1: checkfreq's inner loop is incrementing nums[i] one unit at a time. Do you see a way to have it increase with a larger amount, so to avoid that inner loop?
Hint 2 (harder): checkfreq is often incrementing the same value in different calls -- even more so when k is large and the section of the vector that can be incremented is large. Can you think of a way to avoid that checkfreq needs to redo that much work in subsequent calls, and can only concentrate on what is different compared to what it had to calculate in the previous call?
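To illustrate where those hints lead, here is a minimal sketch of the sliding-window approach (my own code, not the only possible implementation): after sorting, a window [left, right] can be levelled up to nums[right] exactly when nums[right] * windowSize - windowSum <= k, and maintaining windowSum as a running sum means no element is ever incremented one step at a time.

#include <algorithm>
#include <vector>
using std::vector;

int maxFrequency(vector<int>& nums, int k) {
    std::sort(nums.begin(), nums.end());
    long long windowSum = 0; // running sum of the current window (hint 2)
    int left = 0, best = 1;
    for (int right = 0; right < (int)nums.size(); ++right) {
        windowSum += nums[right];
        // Cost of raising the whole window to nums[right] in one go (hint 1).
        while ((long long)nums[right] * (right - left + 1) - windowSum > k) {
            windowSum -= nums[left];
            ++left;
        }
        best = std::max(best, right - left + 1);
    }
    return best;
}

This is O(n log n) for the sort plus O(n) for the scan, since left and right each only ever move forward.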

How to convert a simple computer algorithm into a mathematical function in order to determine the big o notation?

At my university we are learning Big O notation. One question that I have is: how do you convert a simple computer algorithm, say a linear search, into a mathematical function, for example 2n^2 + 1?
Here is a simple and non-robust linear search that I have written in C++11. Note: I have disregarded all header files (iostream) and function parameters just for simplicity. I will just be using basic operators, loops, and data types in order to show the algorithm.
int array[5] = {1,2,3,4,5};
// Variable to hold the value we are searching for
int searchValue;
// Ask the user to enter a search value
cout << "Enter a search value: ";
cin >> searchValue;
// Create a loop to traverse through each element of the array and find
// the search value
for (int i = 0; i < 5; i++)
{
    if (searchValue == array[i])
    {
        cout << "Search Value Found!" << endl;
    }
    else
        // If S.V. not found then print out a message
        cout << "Sorry... Search Value not found" << endl;
}
In conclusion, how do you translate an algorithm into a mathematical function so that we can analyze how efficient an algorithm really is using big o notation? Thanks world.
First, be aware that it's not always possible to analyze the time complexity of an algorithm; there are some for which we do not know the complexity, so we have to rely on experimental data.
All of the methods involve counting the number of operations performed. So first we have to define the cost of basic operations like assignment, memory allocation, and control structures (if, else, for, ...). Some values I will use (working with different models can give different values):
Assignment takes constant time (ex: int i = 0;)
Basic operations take constant time (+ - * /)
Memory allocation is proportional to the memory allocated: allocating an array of n elements takes linear time.
Conditions take constant time (if, else, else if)
Loops take time proportional to the number of times their body is run.
Basic analysis
The basic analysis of a piece of code is: count the number of operations on each line, sum those costs, and you're done.
int i = 1;
i = i*2;
System.out.println(i);
For this, there is one operation on line 1, one on line 2, and one on line 3. Those operations take constant time: this is O(1).
for(int i = 0; i < N; i++) {
    System.out.println(i);
}
For a loop, count the number of operations inside the loop and multiply by the number of times the loop runs. There is one operation inside, which takes constant time, and it runs n times -> the complexity is n * 1 -> O(n).
for (int i = 0; i < N; i++) {
    for (int j = i; j < N; j++) {
        System.out.println(i+j);
    }
}
This one is trickier because the second loop starts its iteration based on i. Line 3 does 2 operations (addition + print), which take constant time, so line 3 itself takes constant time. Now, how many times line 3 runs depends on the value of i. Enumerate the cases:
When i = 0, j goes from 0 to N, so line 3 is run N times.
When i = 1, j goes from 1 to N, so line 3 is run N-1 times.
...
Now, summing all of this, we have to evaluate N + (N-1) + (N-2) + ... + 2 + 1. This sum equals N*(N+1)/2, which is quadratic, so the complexity is O(n^2).
And that's how it works for many cases: count the number of operations, sum all of them, get the result.
Amortized time
An important notion in complexity theory is amortized time. Let's take this example: running operation() n times:
for (int i = 0; i < N; i++) {
    operation();
}
If one says that operation takes amortized constant time, it means that running the n operations took linear time in total, even though one particular operation may have taken linear time on its own.
Imagine you have an empty array with room for 1000 elements. Now insert 1000 elements into it. Easy as pie: every insertion took constant time. Now insert one more element. For that, you have to create a new, bigger array, copy the data from the old array into the new one, and insert element 1001. The first 1000 insertions took constant time; the last one took linear time. In this case, we say that all insertions took amortized constant time, because the cost of that last insertion was amortized by the others.
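To make that concrete, here is a small sketch (my own illustration) that counts how many element copies a std::vector performs across a million push_backs; the average per insertion comes out as a small constant even though an individual reallocation copies the whole array.

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    long long copies = 0;
    auto lastCapacity = v.capacity();
    for (int i = 0; i < 1000000; ++i) {
        v.push_back(i);
        if (v.capacity() != lastCapacity) { // a reallocation happened
            copies += v.size() - 1;         // all old elements were copied over
            lastCapacity = v.capacity();
        }
    }
    // With doubling growth this prints a value near 1; other growth factors
    // give a different small constant, but never something that grows with n.
    std::cout << "copies per insertion: " << (double)copies / v.size() << "\n";
}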
Make assumptions
In some other cases, counting the number of operations requires making hypotheses. A perfect example of this is insertion sort, because it is simple and its running time depends on how the data is ordered.
First, we have to make some more assumptions. Sorting involves two elementary operations: comparing two elements and swapping two elements. Here I will consider both of them to take constant time. Here is the algorithm, where we want to sort array a:
for (int i = 0; i < a.length; i++) {
    int j = i;
    while (j > 0 && a[j] < a[j-1]) {
        swap(a, j, j-1);
        j--;
    }
}
The first loop is easy. No matter what happens inside, it will run n times, so the running time of the algorithm is at least linear. Now, to evaluate the second loop we have to make assumptions about how the array is ordered. Usually, we try to define the best-case, worst-case and average-case running time.
Best-case: We never enter the while loop. Is this possible? Yes: if a is a sorted array, then a[j] >= a[j-1] no matter what j is, so we never enter the second loop. The operations done in this case are the assignment on line 2 and the evaluation of the condition on line 3, both of which take constant time. Because of the first loop, those operations are run n times. So in the best case, insertion sort is linear.
Worst-case: We leave the while loop only when we reach the beginning of the array. That is, we swap every element all the way down to index 0, for every element in the array. This corresponds to an array sorted in reverse order. In this case, the first element is swapped 0 times, element 2 is swapped 1 time, element 3 is swapped 2 times, and so on, up to element n being swapped n-1 times. We already know the result of this sum: worst-case insertion sort is quadratic.
Average case: For the average case, we assume the items are randomly distributed in the array. If you're interested in the maths, it involves probabilities and you can find the proof in many places. The result is quadratic.
Conclusion
Those were basics about analyzing the time complexity of an algorithm. The cases were easy, but there are some algorithms which aren't as nice. For example, you can look at the complexity of the pairing heap data structure which is much more complex.

find duplicate number in an array

I am debugging the problem below and posting the solution I am working on. This solution (or similar) is posted on a couple of forums, but I think it has a bug when nums[0] = 0, or in general when nums[x] = x. Am I correct? Please feel free to correct me if I am wrong.
Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.
Note:
You must not modify the array (assume the array is read only).
You must use only constant, O(1) extra space.
Your runtime complexity should be less than O(n^2).
There is only one duplicate number in the array, but it could be repeated more than once.
int findDuplicate3(vector<int>& nums)
{
    if (nums.size() > 1)
    {
        int slow = nums[0];
        int fast = nums[nums[0]];
        while (slow != fast)
        {
            slow = nums[slow];
            fast = nums[nums[fast]];
        }
        fast = 0;
        while (fast != slow)
        {
            fast = nums[fast];
            slow = nums[slow];
        }
        return slow;
    }
    return -1;
}
Below is my code which uses Floyd's cycle-finding algorithm:
#include <iostream>
#include <vector>
using namespace std;

int findDup(vector<int>& arr){
    int len = arr.size();
    if (len > 1){
        int slow = arr[0];
        int fast = arr[arr[0]];
        while (slow != fast){
            slow = arr[slow];
            fast = arr[arr[fast]];
        }
        fast = 0;
        while (slow != fast){
            slow = arr[slow];
            fast = arr[fast];
        }
        return slow;
    }
    return -1;
}

int main() {
    vector<int> v = {1,2,2,3,4};
    cout << findDup(v) << endl;
    return 0;
}
Comment: This works because zeroes aren't allowed, so the first element of the array isn't part of a cycle, and so the first element of the first cycle we find is referred to both from outside and inside the cycle. If zeroes were allowed, this would fail whenever arr[0] were on a cycle. E.g., [0,1,1].
The sum of integers from 1 to N = (N * (N + 1)) / 2. You can use this to find the duplicate -- sum the integers in the array, then subtract the above formula from the sum. That's the duplicate.
Update: The above solution is based on the (possibly invalid) assumption that the input array consists of the values from 1 to N plus a single duplicate.
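Under that assumption, the whole thing reduces to a few lines; here is a sketch (the function name is mine):

#include <numeric>
#include <vector>

// Assumes nums holds exactly the values 1..N plus one extra duplicate.
int duplicateBySum(const std::vector<int>& nums) {
    long long n = nums.size() - 1;                  // the values span 1..n
    long long expected = n * (n + 1) / 2;           // sum of 1..n
    long long actual = std::accumulate(nums.begin(), nums.end(), 0LL);
    return static_cast<int>(actual - expected);     // the surplus is the duplicate
}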
Start with two pointers to the first element: fast and slow.
Define a 'move' as incrementing fast by 2 steps (positions) and slow by 1.
After each move, check whether slow and fast point to the same node.
If there is a loop, at some point they will. This is because after they are both in the loop, fast is moving twice as quickly as slow and will eventually 'run into' it.
Say they meet after k moves. This is NOT NECESSARILY the repeated element, since it might not be the first element of the loop reached from outside the loop.
Call this element X.
Notice that fast has stepped 2k times, and slow has stepped k times.
Move fast back to zero.
Repeatedly advance fast and slow by ONE STEP EACH, comparing after each step.
Notice that after another k steps, slow will have moved a total of 2k steps and fast a total of k steps from the start, so they will again both be pointing to X.
Notice that if the prior step is on the loop for both of them, they were both pointing to X-1. If the prior step was only on the loop for slow, then they were pointing to different elements.
Ditto for X-2, X-3, ...
So in going forward, the first time they are pointing to the same element is the first element of the cycle reached from outside the cycle, which is the repeated element you're looking for.
Since you cannot use any additional space, using another hash table is ruled out.
Now, coming to the approach of hashing on the existing array: it can be achieved if we are allowed to modify the array in place.
Algo:
1) Start with the first element.
2) Hash the first element and apply a transformation to the value at the hash. Let's say this transformation is making the value negative.
3) Proceed to the next element. Hash the element and, before applying the transformation, check whether a transformation has already been applied.
4) If yes, then the element is a duplicate.
Code:
for (int i = 0; i < size; i++)
{
    if (arr[abs(arr[i])] > 0)
        arr[abs(arr[i])] = -arr[abs(arr[i])];
    else
        cout << abs(arr[i]) << endl;
}
This transformation is required since, if we were to use a plain hashing approach, there would have to be a collision when hashing the same key.
I can't think of a way in which hashing can be used without any additional space and without modifying the array.

Segmentation Fault C++ (array too large?)

I'm working on Project Euler Problem 14, where I need to find the longest Collatz sequence with a starting number under 1,000,000. I've come up with an algorithm that works for smaller limits (say, 100): it stores the Collatz number of each starting value from 1 to 100 in an array and uses that array as a reference to speed up the computations for higher numbers.
My code is as follows:
#include <iostream>
using namespace std;

long even(long n){ //the even-collatz function
    n = n/2;
    return n;
}

long odd(long n){ //the odd collatz function
    n = 3*n + 1;
    return n;
}

int main(){
    long x, c = 0, y[1000000]; // x = the number we are finding the collatz number of, c = a counter of how many steps we've taken in the sequence, y is an array to store the collatz numbers
    for (x = 1; x < 1000000; x++){ //iterates from x=1 to 1 million
        long a = x; //sets a number a equal to the number we are currently trying to find the collatz number of
        long b = a;
        c = 0; //initializes counter at 0
        while (a != 0){ //loops infinitely; the only way to exit is through a break
            if (a % 2 == 0){ // detects if the number is even
                a = even(a); //applies the even-collatz function if so; sets a=a/2
                c = c + 1;
                if (y[a] != 0){ // checks if the collatz number of a is already discovered
                    y[b] = c + y[a]; //adds the current number of steps to the collatz number of a and
                    break; //exits the while loop
                }
            }
            else if (a == 1){ //checks if the new a is equal to one and
                y[b] = c; //if it is, writes the current value of c to y[b] and
                break; // exits the loop
            }
            else if (a % 2 == 1){ //same as the "even" block, except for odd numbers
                a = odd(a);
                c = c + 1;
                if (y[a] != 0){
                    y[b] = c + y[a];
                    break;
                }
            }
            //end of the while loop; we've applied the collatz function as many times as needed, incrementing the counter each time
        }
    }
    long z;
    for (int n = 0; n != 100; n++){
        if (y[n+1] > y[n]){
            z = y[n+1];
        }
    }
    cout << z << "\n";
}
The issue I'm having is that I get a segfault after x=1818 in the for loop. Through debugging, I've found that how quickly the segfault occurs depends on the size of array y, so I'm assuming that the array is just too big. From my (basic) understanding of segfaults, I think I'm just accessing memory that I'm "not allowed". Is there any way for me to circumvent this, or should I just start working towards another solution to this problem? I'm compiling using g++ on Ubuntu studio.
This array is probably too big for your system's default stack size; the simplest fix is to change its definition to:
std::vector<long> y(1000000);
and everything else can stay the same. You could use y.size() instead of the magic number 1000000 later in your loop.
For a starting number under N, the Collatz sequence can go way beyond N. For N == 1000000, consider x == 333335.
I would suggest you make y a vector<int> and expand it dynamically, or just make it an unordered_map<int, int>.
If y was too big for your stack, you would get a stack overflow exception as soon as main tried to run.
Your problem is more likely that a gets bigger than the size of y.
When I ran it through the debugger, a was 1417174 when x was 4255, so you might have a problem with your algorithm.
That said, you should either allocate it yourself, or make it static, as there is no guarantee that whatever compiler Project Euler uses will allow such a large stack size.
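Putting the two fixes together (heap storage instead of a stack array, plus a bounds check so intermediate values beyond the memo range are walked rather than indexed), one possible restructuring is sketched below; the variable names are mine, not the asker's.

#include <iostream>
#include <vector>

int main() {
    const long long LIMIT = 1000000;
    std::vector<long long> steps(LIMIT, 0); // heap-allocated memo; steps[1] stays 0
    long long bestStart = 1, bestLen = 0;
    for (long long x = 2; x < LIMIT; ++x) {
        long long n = x, c = 0;
        // Walk the sequence until we hit 1 or an already-memoized value.
        // Intermediate values can exceed LIMIT, so check bounds before lookup.
        while (n != 1 && (n >= LIMIT || steps[n] == 0)) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            ++c;
        }
        steps[x] = c + steps[n]; // steps[1] == 0, so this also covers n == 1
        if (steps[x] > bestLen) { bestLen = steps[x]; bestStart = x; }
    }
    std::cout << bestStart << " (" << bestLen << " steps)" << std::endl;
}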

Using a hash to find one duplicated and one missing number in an array

I had this question during an interview and am curious to see how it would be implemented.
Given an unsorted array of integers from 0 to x. One number is missing and one is duplicated. Find those numbers.
Here is what I came up with:
vector<int> counts(x+1, 0); // zero-initialized; a raw array would start with garbage
for (int i = 0; i <= x; i++) {
    counts[a[i]]++;
    if (counts[a[i]] == 2)
        cout << "Duplicate element: " << a[i]; // I realized I could find this here
}
for (int j = 0; j <= x; j++) {
    if (counts[j] == 0)
        cout << "Missing element: " << j;
    //if (counts[j] == 2)
    //    cout << "Duplicate element: " << j; // No longer needed here.
}
My initial solution was to create another array of size x+1, loop through the given array and index into my array at the values of the given array and increment. If after the increment any value in my array is two, that is the duplicate. However, I then had to loop through my array again to find any value that was 0 for the missing number.
I pointed out that this might not be the most time efficient solution, but wasn't sure how to speed it up when I was asked. I realized I could move finding the duplicate into the first loop, but that didn't help with the missing number. After waffling for a bit, the interviewer finally gave me the idea that a hash would be a better/faster solution. I have not worked with hashes much, so I wasn't sure how to implement that. Can someone enlighten me? Also, feel free to point out any other glaring errors in my code... Thanks in advance!
If the range of values is about the same as or smaller than the number of values in the array, then using a hash table will not help. In this case, there are x+1 possible values in an array of size x+1 (one missing, one duplicated), so a hash table isn't needed, just a histogram, which you've already coded.
If the assignment were changed to be looking for duplicate 32 bit values in an array of size 1 million, then the second array (a histogram) could need to be 2^32 = 4 billion counts long. This is when a hash table would help, since the hash table size is a function of the array size, not the range of values. A hash table of size 1.5 to 2 million would be large enough. In this case, you would have 2^32 - 2^20 = 4293918720 "missing" values, so that part of the assignment would go away.
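For that large-range scenario, the lookup would look something like this sketch (my own code; std::unordered_set stands in for a hand-rolled hash table):

#include <cstdint>
#include <unordered_set>
#include <vector>

// The set grows with the array size (about 1 million entries here),
// not with the 2^32 possible values.
int32_t firstDuplicate(const std::vector<int32_t>& values) {
    std::unordered_set<int32_t> seen;
    seen.reserve(values.size());
    for (int32_t v : values)
        if (!seen.insert(v).second) // insert() reports whether v was new
            return v;
    return -1; // sentinel of my choosing: no duplicate found
}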
Wiki article on hash tables:
Hash Table
If x were small enough (such that the sum of 0..x can be represented), you could compute the sum of the unique values in a, and subtract that from the sum of 0..x, to get the missing value, without needing the second loop.
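Here is a sketch of that single-pass variant (my own code, assuming the values are 0..x with exactly one duplicate and one missing):

#include <iostream>
#include <vector>

void findBoth(const std::vector<int>& a, int x) {
    std::vector<int> counts(x + 1, 0);
    long long uniqueSum = 0;
    int duplicate = -1;
    for (int v : a) {
        if (++counts[v] == 2) duplicate = v; // second sighting: the duplicate
        else uniqueSum += v;                 // count each distinct value once
    }
    long long total = (long long)x * (x + 1) / 2; // sum of 0..x
    std::cout << "Duplicate: " << duplicate
              << " Missing: " << (total - uniqueSum) << "\n";
}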
Here is a stab at a solution that uses an index (a true key-value hash doesn't make sense when the array is guaranteed to include only integers). Sorry OP, it's in Ruby:
values = mystery_array.sort.map.with_index { |n,i| n if n != i }.compact
missing_value,duplicate_value = mystery_array.include?(values[0] - 1) ? \
[values[-1] + 1, values[0]] : [values[0] - 1, values[-1]]
The functions used likely employ a non-trivial amount of looping behind the scenes, and this will create a (possibly very large) variable values which contains a range between the missing and/or duplicate value, as well as a second lookup loop, but it works.
Perhaps the interviewer meant to say Set instead of hash?
Sorting allowed?
auto first = std::begin(a);
auto last = std::end(a);

// sort it
std::sort(first, last);

// find the duplicate
auto first_duplicate = *std::adjacent_find(first, last);

// find the missing value
auto missing = std::adjacent_find(first, last, [](int x, int y) { return x + 2 == y; });
int missing_number = 0;
if (missing != last)
{
    missing_number = 1 + *missing;
}
else
{
    // no interior gap, so the missing value is at one of the ends
    if (*first != 0)
    {
        missing_number = 0;
    }
    else
    {
        missing_number = *(last - 1) + 1;
    }
}
Both could be done in a single hand-written loop, but I wanted to use only stl algorithms. Any better idea for handling the corner cases?
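For what it's worth, that single hand-written loop could be sketched like this (my own code, over a sorted copy): a[i] == a[i-1] marks the duplicate and a[i] == a[i-1] + 2 marks an interior gap, with the ends handled afterwards.

#include <algorithm>
#include <vector>

void findBothSorted(std::vector<int> a) { // taken by value: we sort a copy
    std::sort(a.begin(), a.end());
    int dup = -1, missing = -1;
    for (int i = 1; i < (int)a.size(); ++i) {
        if (a[i] == a[i - 1]) dup = a[i];
        else if (a[i] == a[i - 1] + 2) missing = a[i] - 1;
    }
    if (missing == -1) // no interior gap: the missing value is at an end
        missing = (a.front() != 0) ? 0 : a.back() + 1;
    // dup and missing now hold the two answers
}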
for (i = 0 to length) {                 // first loop
    for (j = 0 to length) {             // second loop
        if (t[i] == j+1) {
            if (counter == 0) {         // make sure duplicated number has not been found already
                for (k = i+1 to length) {   // search for duplicated number
                    if (t[k] == j+1) {
                        j+1 is the duplicated number;
                        if (missingIsFound)
                            exit;       // exit program, missing and dup are found
                        counter = 1;
                    } // end if t[k]..
                } // end loop for duplicated number
            } // end condition to search
            continue;                   // continue to first loop
        }
        else {
            j+1 is the missing number;
            if (duplicatedIsFound)
                exit;                   // exit program, missing and dup are found
            continue;                   // continue to first loop
        }
    } // end second loop
} // end first loop

Understanding Sum of subsets

I've just started learning backtracking algorithms at college. Somehow I've managed to make a program for the subset-sum problem. It works, but then I discovered that my program doesn't give all the possible combinations.
For example: there might be a hundred combinations for a target sum, but my program gives only 30.
Here is the code. It would be a great help if anyone could point out what my mistake is.
int tot = 0; //tot is the total sum of all the numbers in the set
int prob[500], d, s[100], top = -1, n; // n = number of elements in the set. prob[i] is the array with the set.

void subset()
{
    int i = 0, sum = 0; //sum - being updated at every iteration and check if it matches 'd'
    while (i < n)
    {
        if ((sum + prob[i] <= d) && (prob[i] <= d))
        {
            s[++top] = i;
            sum += prob[i];
        }
        if (sum == d) // d is the target sum
        {
            show(); // this function just displays the integer array 's'
            top = -1; // top points to the recent number added to the int array 's'
            i = s[top+1];
            sum = 0;
        }
        i++;
        while (i == n && top != -1)
        {
            sum -= prob[s[top]];
            i = s[top--] + 1;
        }
    }
}

int main()
{
    cout << "Enter number of elements : "; cin >> n;
    cout << "Enter required sum : "; cin >> d;
    cout << "Enter SET :\n";
    for (int i = 0; i < n; i++)
    {
        cin >> prob[i];
        tot += prob[i];
    }
    if (d <= tot)
    {
        subset();
    }
    return 0;
}
When I run the program:
Enter number of elements : 7
Enter the required sum : 12
Enter SET :
4 3 2 6 8 12 21
SOLUTION 1 : 4, 2, 6
SOLUTION 2 : 12
Although 4, 8 is also a solution, my program doesn't show it.
It's even worse when the number of inputs is 100 or more. There will be at least 10000 combinations, but my program shows 100.
The logic I am trying to follow:
Take the elements of the main SET into a subset as long as the
sum of the subset remains less than or equal to the target sum.
If the addition of a particular number to the subset sum would make
it larger than the target, don't take it.
Once it reaches the end
of the set and an answer has not been found, remove the most
recently taken number from the subset and start looking at the numbers
in the positions after the position of the number just removed
(since what I store in the array 's' is the positions of the
selected numbers within the main SET).
The solutions you are going to find depend on the order of the entries in the set due to your "as long as" clause in step 1.
If you take entries as long as they don't get you over the target, once you've taken e.g. '4' and '2', '8' will take you over the target, so as long as '2' is in your set before '8', you'll never get a subset with '4' and '8'.
You should either add a possibility to skip adding an entry (or add it to one subset but not to another) or change the order of your set and re-examine it.
It may be that a stack-free solution is possible, but the usual (and generally easiest!) way to implement backtracking algorithms is through recursion, e.g.:
int i = 0, n; // i needs to be visible to show()
int s[100];

// Considering only the subset of prob[] values whose indexes are >= start,
// print all subsets that sum to total.
void new_subsets(int start, int total) {
    if (total == 0) show(); // total == 0 means we already have a solution

    // Look for the next number that could fit
    while (start < n && prob[start] > total) {
        ++start;
    }

    if (start < n) {
        // We found a number, prob[start], that can be added without overflow.
        // Try including it by solving the subproblem that results.
        s[i++] = start;
        new_subsets(start + 1, total - prob[start]);
        i--;
        // Now try excluding it by solving the subproblem that results.
        new_subsets(start + 1, total);
    }
}
You would then call this from main() with new_subsets(0, d);. Recursion can be tricky to understand at first, but it's important to get your head around it -- try easier problems (e.g. generating Fibonacci numbers recursively) if the above doesn't make any sense.
Working instead with the solution you have given, one problem I can see is that as soon as you find a solution, you wipe it out and start looking for a new solution from the number to the right of the first number that was included in this solution (top = -1; i = s[top+1]; implies i = s[0], and there is a subsequent i++;). This will miss solutions that begin with the same first number. You should just do if (sum == d) { show(); } instead, to make sure you get them all.
I initially found your inner while loop pretty confusing, but I think it's actually doing the right thing: once i hits the end of the array, it will delete the last number added to the partial solution, and if this number was the last number in the array, it will loop again to delete the second-to-last number from the partial solution. It can never loop more than twice because numbers included in a partial solution are all at distinct positions.
I haven't analysed the algorithm in detail, but what struck me is that your algorithm doesn't account for the possibility that, after having one solution that starts with number X, there could be multiple solutions starting with that number.
A first improvement would be to avoid resetting your stack s and the running sum after you printed the solution.